Store Sales Forecasting - End-to-End Machine Learning Project

End-to-end retail sales forecasting system built with a production mindset — from data validation and feature engineering to model explainability, batch inference, API serving, and CI.

This project demonstrates how historical sales, calendar effects, and macro-economic indicators can be used to forecast weekly store sales and support inventory, staffing, and promotion planning decisions.

Problem Statement

Retail teams require accurate weekly sales forecasts at the store level to:

avoid stock-outs during high-demand periods,
reduce over-stocking during low-demand weeks,
plan staffing and promotions effectively.

Sales patterns are influenced not only by historical trends, but also by holidays and macroeconomic drivers such as fuel price, CPI, and unemployment.

Dataset

Historical Walmart weekly sales data
Granularity: Store × Week
Features include:
- Historical sales
- Holiday indicator
- Temperature
- Fuel price
- CPI
- Unemployment

(Raw data is excluded from the repository for cleanliness and reproducibility.)
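
The fields listed above lend themselves to a few cheap sanity checks before any modeling. The following is a minimal sketch, not the project's actual validation code. The column names (`Store`, `Date`, `Weekly_Sales`, `Holiday_Flag`, and so on) and the non-negative sales rule are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical column names; adjust to match the real CSV header.
REQUIRED_COLUMNS = [
    "Store", "Date", "Weekly_Sales", "Holiday_Flag",
    "Temperature", "Fuel_Price", "CPI", "Unemployment",
]


def validate_sales_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Return a validated, sorted copy of df; raise ValueError on the first problem."""
    # Schema check: every expected column must be present.
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    out = df.copy()
    # errors="raise" makes unparseable dates fail loudly instead of turning into NaT.
    out["Date"] = pd.to_datetime(out["Date"], errors="raise")

    # Duplicate check: each store should appear at most once per week.
    dupes = out.duplicated(subset=["Store", "Date"])
    if dupes.any():
        raise ValueError(f"{int(dupes.sum())} duplicate Store/Date rows")

    # Value constraints (assumed for this sketch).
    if not out["Holiday_Flag"].isin([0, 1]).all():
        raise ValueError("Holiday_Flag must be 0 or 1")
    if (out["Weekly_Sales"] < 0).any():
        raise ValueError("Weekly_Sales must be non-negative")

    return out.sort_values(["Store", "Date"]).reset_index(drop=True)
```

Raising on the first failure, rather than logging and continuing, keeps bad rows from silently reaching feature engineering.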

Modeling Approach

Data & Features

Strict data validation (schema checks, date parsing, duplicates, constraints)
Leakage-safe time-series features (computed only from weeks before the forecast week):
- Lag features (1, 2, 4, 8 weeks)
- Rolling averages (4, 8 weeks)
- Calendar features (year, month, week of year)
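
One way the lag, rolling, and calendar features above could be built with pandas is sketched below. This is an illustration under assumptions, not the repository's code: the column names `Store`, `Date`, and `Weekly_Sales` and the output feature names are hypothetical. The key point is that rolling averages are computed on a series shifted by one week, so the window for week t ends at week t-1.

```python
import pandas as pd


def add_time_series_features(
    df: pd.DataFrame,
    lags=(1, 2, 4, 8),
    windows=(4, 8),
) -> pd.DataFrame:
    """Add per-store lag, rolling-mean, and calendar features.

    Assumes (hypothetical) columns: Store, Date (datetime64), Weekly_Sales.
    """
    out = df.sort_values(["Store", "Date"]).reset_index(drop=True)

    # Lag features: sales k weeks earlier within the same store.
    for k in lags:
        out[f"lag_{k}"] = out.groupby("Store")["Weekly_Sales"].shift(k)

    # Rolling means: shift by one week first so the current week's
    # sales never enter its own feature window.
    for w in windows:
        out[f"roll_mean_{w}"] = out.groupby("Store")["Weekly_Sales"].transform(
            lambda s, w=w: s.shift(1).rolling(w).mean()
        )

    # Calendar features derived from the week's date.
    out["year"] = out["Date"].dt.year
    out["month"] = out["Date"].dt.month
    out["week_of_year"] = out["Date"].dt.isocalendar()["week"].astype(int)
    return out
```

Grouping by store before shifting prevents one store's history from leaking into the first rows of the next. The earliest weeks of each store have missing lag and rolling values and would need to be dropped or imputed before training.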

Models

Baseline: Naive forecast (previous week's sales carried forward)
Final model: LightGBM regression

Evaluation Strategy

Time-based train / validation / test split
Metrics aligned with retail decision-making:
- WAPE (Weighted Absolute Percentage Error)
- RMSE
Error slicing by:
- Store
- Holiday vs non-holiday weeks
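
For reference, WAPE divides the total absolute error by the total actual sales, so high-volume store-weeks carry more weight than in a plain average of percentage errors. The sketch below shows both metrics and a simple slicing helper in plain Python. The row layout (dicts with `actual`, `predicted`, and a slice key such as `is_holiday`) is an assumption for illustration, not the project's data format.

```python
import math
from collections import defaultdict


def wape(actual, predicted):
    """Weighted absolute percentage error: sum(|y - y_hat|) / sum(|y|)."""
    total_actual = sum(abs(a) for a in actual)
    if total_actual == 0:
        raise ValueError("WAPE is undefined when all actuals are zero")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / total_actual


def rmse(actual, predicted):
    """Root mean squared error."""
    if len(actual) == 0:
        raise ValueError("RMSE is undefined for empty input")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def wape_by_slice(rows, key):
    """Compute WAPE separately for each distinct value of rows[i][key]."""
    groups = defaultdict(lambda: ([], []))
    for row in rows:
        actuals, preds = groups[row[key]]
        actuals.append(row["actual"])
        preds.append(row["predicted"])
    return {value: wape(a, p) for value, (a, p) in groups.items()}


# Toy rows (made-up numbers, not project results).
rows = [
    {"store": 1, "is_holiday": False, "actual": 100.0, "predicted": 90.0},
    {"store": 1, "is_holiday": True, "actual": 200.0, "predicted": 240.0},
    {"store": 2, "is_holiday": False, "actual": 100.0, "predicted": 100.0},
]
overall = wape([r["actual"] for r in rows], [r["predicted"] for r in rows])
by_holiday = wape_by_slice(rows, "is_holiday")
```

Calling `wape_by_slice(rows, "store")` gives the per-store view with the same helper.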

Results

| Metric | Value |
| --- | --- |
| Test WAPE (overall) | 3.53% |
| Non-holiday WAPE | 3.42% |
| Holiday WAPE | 4.81% |

Key Insight:

Holiday weeks show higher forecast error than non-holiday weeks (4.81% vs 3.42% WAPE), likely because of demand spikes, so they warrant special handling in retail planning.

Model Explainability (SHAP)

SHAP was used to interpret model behavior at both global and local levels.

Key Findings

Recent sales momentum (lag & rolling features) is the strongest driver of forecasts.
The holiday flag is associated with greater forecast variance and uncertainty.
Macroeconomic indicators act as secondary stabilizing signals.

Artifacts generated:

Global feature importance (bar plot)
SHAP beeswarm plot
Local explanation for an individual store-week prediction

All explainability outputs are saved under docs/shap/.

Inference & Serving

Batch Inference

Generates CSV forecasts for the most recent weeks
Output: batch_predictions.csv


python -m src.inference.predict_batch

API (FastAPI)

A lightweight REST API for real-time predictions.


uvicorn src.inference.api:app --reload

Health check:


GET /health

Prediction:


POST /predict
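
A client call against the two endpoints above might look like the following sketch, which uses only the Python standard library. The request schema for `/predict` is not documented here, so the payload field names are purely hypothetical; check the API's request model before using them. The base URL assumes uvicorn's default host and port.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # uvicorn's default host and port


def build_predict_request(features: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for /predict."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_health(base_url: str = BASE_URL, timeout: float = 5.0) -> int:
    """Call GET /health and return the HTTP status code (requires a running server)."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return resp.status


# Hypothetical payload: field names are assumptions, not the API's actual schema.
example_payload = {"store": 1, "date": "2012-10-26", "holiday_flag": 0}
request = build_predict_request(example_payload)
```

With the server running, `urllib.request.urlopen(request)` sends the request and returns the response.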

Business Demo (Streamlit)

A Streamlit dashboard enables non-technical stakeholders to:

Select a store
Compare actual vs predicted sales
View recent forecasts and error slices


streamlit run app/streamlit_app.py

Engineering & Quality

Clean, modular project structure
Reproducible pipelines
GitHub Actions CI
Ruff linting
Pytest smoke tests
Defensive coding to avoid silent failures

Project Structure


store-sales-forecasting-e2e/
├── src/
│   ├── data/         # ingestion & validation
│   ├── features/     # feature engineering
│   ├── models/       # training, evaluation, SHAP
│   └── inference/    # batch + API inference
├── app/              # Streamlit dashboard
├── docs/             # evaluation & SHAP artifacts
├── tests/            # CI-safe tests
└── .github/workflows # GitHub Actions CI

This project demonstrates:

End-to-end ML lifecycle ownership
Retail-focused metric selection and error analysis
Production-style inference (batch + API)
Explainable ML for stakeholder trust
Engineering discipline through CI and clean structure

Feel free to reach out if you’d like to discuss modeling decisions, evaluation strategy, or production trade-offs.
