kirtis111/store-sales-forecasting-e2e


Store Sales Forecasting - End-to-End Machine Learning Project

End-to-end retail sales forecasting system built with a production mindset, covering data validation, feature engineering, model explainability, batch inference, API serving, and CI.

This project demonstrates how historical sales, calendar effects, and macro-economic indicators can be used to forecast weekly store sales and support inventory, staffing, and promotion planning decisions.


Problem Statement

Retail teams require accurate weekly sales forecasts at the store level to:

  • avoid stock-outs during high-demand periods,
  • reduce over-stocking during low-demand weeks,
  • plan staffing and promotions effectively.

Sales patterns are influenced not only by historical trends, but also by holidays and macroeconomic drivers such as fuel price, CPI, and unemployment.


Dataset

  • Historical Walmart weekly sales data
  • Granularity: Store × Week
  • Features include:
    • Historical sales
    • Holiday indicator
    • Temperature
    • Fuel price
    • CPI
    • Unemployment

(Raw data is excluded from the repository for cleanliness and reproducibility.)
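Because the raw data is not shipped with the repository, the checks listed under "Data & Features" (schema, date parsing, duplicates, constraints) are worth illustrating on their own. The following is a minimal sketch, not the repository's actual validation code. The column names (`store`, `date`, `weekly_sales`, `holiday_flag`, and so on) and the function name `validate_sales` are assumptions chosen for illustration.

```python
import pandas as pd

# Assumed column names; the repository's actual schema may differ.
REQUIRED_COLUMNS = [
    "store", "date", "weekly_sales", "holiday_flag",
    "temperature", "fuel_price", "cpi", "unemployment",
]


def validate_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Return a validated copy of df, or raise ValueError on the first problem."""
    # Schema check: every expected column must be present.
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    out = df.copy()
    # Date parsing: errors="raise" fails loudly on unparseable dates.
    out["date"] = pd.to_datetime(out["date"], errors="raise")

    # Duplicates: the grain is one row per store per week.
    if out.duplicated(subset=["store", "date"]).any():
        raise ValueError("duplicate store-week rows")

    # Constraints: binary holiday flag, non-negative unemployment rate.
    if not out["holiday_flag"].isin([0, 1]).all():
        raise ValueError("holiday_flag must be 0 or 1")
    if (out["unemployment"] < 0).any():
        raise ValueError("unemployment must be non-negative")
    return out
```

Raising on the first violation, rather than logging and continuing, is one way to avoid the silent failures mentioned under "Engineering & Quality".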


Modeling Approach

Data & Features

  • Strict data validation (schema checks, date parsing, duplicates, constraints)
  • Leakage-safe time-series features:
    • Lag features (1, 2, 4, 8 weeks)
    • Rolling averages (4, 8 weeks)
    • Calendar features (year, month, week of year)
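The lag, rolling, and calendar features listed above can be sketched with pandas as follows. This is an illustrative sketch under assumptions, not the repository's feature code: the column names (`store`, `date`, `weekly_sales`) and the function name `add_time_features` are hypothetical, and `date` is assumed to already be a datetime column. The key leakage-safety point is that rolling means are computed on values shifted by one week, so the window never includes the week being predicted.

```python
import pandas as pd


def add_time_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add per-store lag, rolling-mean, and calendar features."""
    out = df.sort_values(["store", "date"]).copy()
    sales = out.groupby("store")["weekly_sales"]

    # Lag features: sales from 1, 2, 4, and 8 weeks earlier, per store.
    for lag in (1, 2, 4, 8):
        out[f"lag_{lag}"] = sales.shift(lag)

    # Rolling averages: shift by one week first so each window ends at
    # the previous week and excludes the target week.
    for window in (4, 8):
        out[f"roll_mean_{window}"] = sales.transform(
            lambda s, w=window: s.shift(1).rolling(w).mean()
        )

    # Calendar features derived from the week's date.
    out["year"] = out["date"].dt.year
    out["month"] = out["date"].dt.month
    out["week_of_year"] = out["date"].dt.isocalendar()["week"].astype(int)
    return out
```

The first rows of each store have missing lag and rolling values; whether to drop them or leave them for the model to handle is a separate design choice.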

Models

  • Baseline: Naive forecast (previous week's sales carried forward)
  • Final model: LightGBM regression
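The last-week baseline can be expressed in a few lines, which makes it a useful sanity check for the LightGBM model. The sketch below assumes hypothetical column names (`store`, `date`, `weekly_sales`) and a hypothetical function name; it is not taken from the repository.

```python
import pandas as pd


def last_week_baseline(df: pd.DataFrame) -> pd.Series:
    """Predict each store-week as that store's sales one week earlier."""
    ordered = df.sort_values(["store", "date"])
    # shift(1) within each store; the first week of every store has no forecast.
    return ordered.groupby("store")["weekly_sales"].shift(1)
```

The returned series keeps the original row index, so it can be assigned back to the frame as a prediction column and scored with the same metrics as the final model.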

Evaluation Strategy

  • Time-based train / validation / test split
  • Metrics aligned with retail decision-making:
    • WAPE (Weighted Absolute Percentage Error)
    • RMSE
  • Error slicing by:
    • Store
    • Holiday vs non-holiday weeks
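The evaluation strategy above can be sketched as follows. WAPE is taken here in its usual form, the sum of absolute errors divided by the sum of absolute actuals. The function names, the column names (`date`, `weekly_sales`, `prediction`), and the cut-off arguments are assumptions for illustration; the repository's own split dates are not stated in this README.

```python
import numpy as np
import pandas as pd


def time_split(df: pd.DataFrame, valid_start, test_start):
    """Split chronologically: train < valid_start <= valid < test_start <= test."""
    valid_start, test_start = pd.Timestamp(valid_start), pd.Timestamp(test_start)
    train = df[df["date"] < valid_start]
    valid = df[(df["date"] >= valid_start) & (df["date"] < test_start)]
    test = df[df["date"] >= test_start]
    return train, valid, test


def wape(y_true, y_pred) -> float:
    """Weighted Absolute Percentage Error: sum|y - yhat| / sum|y|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.abs(y_true - y_pred).sum() / np.abs(y_true).sum())


def rmse(y_true, y_pred) -> float:
    """Root mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def wape_by(df: pd.DataFrame, key: str) -> pd.Series:
    """WAPE per group, e.g. key="store" or key="holiday_flag"."""
    abs_err = (df["weekly_sales"] - df["prediction"]).abs()
    abs_actual = df["weekly_sales"].abs()
    return abs_err.groupby(df[key]).sum() / abs_actual.groupby(df[key]).sum()
```

Because WAPE weights errors by sales volume, high-volume stores and weeks dominate the score, which is one reason it suits inventory decisions better than an unweighted percentage error.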

Results

| Metric | Value |
| --- | --- |
| Test WAPE (overall) | 3.53% |
| Non-holiday WAPE | 3.42% |
| Holiday WAPE | 4.81% |

Key Insight:

Holiday weeks exhibit higher forecast error due to demand spikes, highlighting the need for special handling in retail planning.


Model Explainability (SHAP)

SHAP was used to interpret model behavior at both global and local levels.

Key Findings

  • Recent sales momentum (lag & rolling features) is the strongest driver of forecasts.
  • Holiday flag increases variance and uncertainty.
  • Macroeconomic indicators act as secondary stabilizing signals.

Artifacts generated:

  • Global feature importance (bar plot) [figure: shap_global_importance]
  • SHAP beeswarm plot [figure: shap_beeswarm]
  • Local explanation for an individual store-week prediction [figure: shap_local_example]

All explainability outputs are saved under docs/shap/.


Inference & Serving

Batch Inference

  • Generates CSV forecasts for the most recent weeks
  • Output: batch_predictions.csv
python -m src.inference.predict_batch
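The internals of `src.inference.predict_batch` are not shown in this README. As a rough sketch of the idea described above (score the most recent weeks and write a CSV), one could write something like the following. All names here (`select_recent_weeks`, `write_batch_predictions`, the `predict` callable, and the column names) are hypothetical.

```python
import pandas as pd


def select_recent_weeks(df: pd.DataFrame, n_weeks: int = 4) -> pd.DataFrame:
    """Keep rows belonging to the n_weeks most recent distinct dates."""
    dates = df["date"].drop_duplicates().sort_values()
    cutoff = dates.iloc[-n_weeks:].iloc[0]
    return df[df["date"] >= cutoff]


def write_batch_predictions(features: pd.DataFrame, predict, path_or_buf,
                            n_weeks: int = 4) -> pd.DataFrame:
    """Score the most recent weeks and write store, date, prediction as CSV.

    `predict` is any callable that maps a feature frame to one value per row,
    for example a fitted model's predict method wrapped to select its columns.
    """
    recent = select_recent_weeks(features, n_weeks)
    out = recent[["store", "date"]].copy()
    out["prediction"] = predict(recent)
    out.to_csv(path_or_buf, index=False)
    return out
```

Passing the model as a plain callable keeps the batch step testable without loading a trained model.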

[Screenshot: batch_predictions]

API (FastAPI)

A lightweight REST API for real-time predictions.

uvicorn src.inference.api:app --reload

Health check:

GET /health

Prediction:

POST /predict
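A small standard-library client for the two endpoints might look like the sketch below. The base URL uses uvicorn's default host and port (127.0.0.1:8000). The request payload is an assumption: the README does not document the `/predict` schema, so the field names in `main` are placeholders to be replaced with the fields the API actually expects.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # uvicorn's default host and port


def build_predict_request(features: dict) -> urllib.request.Request:
    """Build a JSON POST request for the /predict endpoint."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(request: urllib.request.Request, timeout: float = 5.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def main() -> None:
    # Requires the API to be running locally (see the uvicorn command above).
    print(call(urllib.request.Request(f"{BASE_URL}/health")))
    # Hypothetical payload; the real field names are defined by the API.
    payload = {"store": 1, "holiday_flag": 0, "lag_1": 1500000.0}
    print(call(build_predict_request(payload)))
```

Call `main()` with the server running to exercise both endpoints. FastAPI also serves interactive documentation at `/docs` by default, which shows the actual request schema.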

[Screenshots: store sales prediction via FastAPI]


Business Demo (Streamlit)

A Streamlit dashboard enables non-technical stakeholders to:

  • Select a store
  • Compare actual vs predicted sales
  • View recent forecasts and error slices
streamlit run app/streamlit_app.py

[Screenshots: Streamlit demo]


Engineering & Quality

  • Clean, modular project structure
  • Reproducible pipelines
  • GitHub Actions CI
  • Ruff linting
  • Pytest smoke tests
  • Defensive coding to avoid silent failures

[Screenshot: GitHub Actions CI]


Project Structure

store-sales-forecasting-e2e/
├── src/
│   ├── data/          # ingestion & validation
│   ├── features/      # feature engineering
│   ├── models/        # training, evaluation, SHAP
│   └── inference/     # batch + API inference
├── app/               # Streamlit dashboard
├── docs/              # evaluation & SHAP artifacts
├── tests/             # CI-safe tests
├── .github/workflows  # GitHub Actions CI

This project demonstrates:

  • End-to-end ML lifecycle ownership
  • Retail-focused metric selection and error analysis
  • Production-style inference (batch + API)
  • Explainable ML for stakeholder trust
  • Engineering discipline through CI and clean structure

Feel free to reach out if you’d like to discuss modeling decisions, evaluation strategy, or production trade-offs.

About

End-to-end retail sales forecasting using LightGBM with time-series features, SHAP explainability, FastAPI inference, Streamlit demo, and CI for production-ready ML workflows.
