lstm-time-series-forecasting

LSTM time series forecasting for pension contributions using PyTorch with FastAPI and Streamlit


LSTM Time Series Forecasting - Monthly Contributions

Type: Sequence Modeling | Algorithm: LSTM (PyTorch) | Domain: Financial Forecasting


🎯 Business Problem

Cash flow unpredictability creates significant challenges for:

  • Resource Planning: Difficulty allocating staff and infrastructure
  • Investment Decisions: Uncertainty in capital deployment timing
  • Budget Management: Inaccurate forecasting leads to shortfalls or excess liquidity

Solution: Deploy an LSTM-based forecasting model to predict monthly contribution amounts 6 months in advance, enabling proactive decision-making.


📊 Dataset

Synthetic Time Series Data

  • Period: January 2018 - December 2022 (60 months)
  • Target: monthly_contributions_clp - Monthly contribution amounts in CLP
  • Features:
    • new_customers - New customer acquisitions
    • market_index - Market performance indicator
    • unemployment_rate - Chile unemployment rate (%)
    • gdp_growth - GDP growth rate (%)
  • Characteristics:
    • Trend component (upward growth)
    • Yearly seasonality (summer dips, year-end spikes)
    • Correlated with economic indicators
    • Realistic Chilean economic patterns

Data Splits

  • Train: 48 months (2018-01 to 2021-12)
  • Test: 6 months (2022-01 to 2022-06)
  • Validation: 6 months (2022-07 to 2022-12)
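The split above is chronological, so rows are never shuffled. A minimal sketch of such a split with pandas follows; the function name `chronological_split` and the placeholder frame are assumptions for illustration, not the project's actual code.

```python
import pandas as pd


def chronological_split(df: pd.DataFrame):
    """Split a monthly frame by date ranges without shuffling (hypothetical helper)."""
    df = df.sort_values("date").reset_index(drop=True)
    train = df[df["date"] <= pd.Timestamp("2021-12-31")]
    test = df[(df["date"] >= pd.Timestamp("2022-01-01"))
              & (df["date"] <= pd.Timestamp("2022-06-30"))]
    val = df[df["date"] >= pd.Timestamp("2022-07-01")]
    return train, test, val


# Placeholder data: 60 month-start dates with a dummy target column.
dates = pd.date_range("2018-01-01", periods=60, freq="MS")
frame = pd.DataFrame({"date": dates, "monthly_contributions_clp": range(60)})

train_df, test_df, val_df = chronological_split(frame)
print(len(train_df), len(test_df), len(val_df))
```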

🏗️ Architecture

Model Specifications

Type: Stacked LSTM
Layers: 2
Hidden Units: 128
Dropout: 0.2
Sequence Length: 12 months (lookback window)
Forecast Horizon: 6 months (direct multi-step)
Output: 6 values (one per forecast month)
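The specifications above could map onto a PyTorch module roughly as follows. This is a sketch under assumptions: the class name `ContributionLSTM`, the linear output head, and `n_features=5` (target plus four indicators, before any engineered features) are illustrative and may differ from the code in src/models/.

```python
import torch
from torch import nn


class ContributionLSTM(nn.Module):
    """Stacked LSTM with a direct multi-step head (hypothetical sketch)."""

    def __init__(self, n_features, hidden_size=128, num_layers=2,
                 dropout=0.2, horizon=6):
        super().__init__()
        # nn.LSTM applies `dropout` between stacked layers (not after the last one).
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden_size,
                            num_layers=num_layers, dropout=dropout,
                            batch_first=True)
        self.dropout = nn.Dropout(dropout)
        # One output per forecast month.
        self.head = nn.Linear(hidden_size, horizon)

    def forward(self, x):
        # x has shape (batch, sequence_length, n_features).
        out, _ = self.lstm(x)
        last_step = out[:, -1, :]  # hidden state at the final time step
        return self.head(self.dropout(last_step))


model = ContributionLSTM(n_features=5)
batch = torch.randn(4, 12, 5)  # 4 windows of 12 months, 5 features each
forecast = model(batch)
print(forecast.shape)
```

Each input window of 12 months yields 6 values in a single forward pass, one per forecast month.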

Feature Engineering

  • Lag Features: 1, 3, 6, 12-month lags
  • Rolling Statistics: 3 and 6-month mean/std
  • Calendar Features: Month (sin/cos encoded), Quarter
  • Scaling: MinMaxScaler per feature (fit on train only)
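The lag, rolling, and calendar features listed above can be sketched with pandas as shown here. The helper name `add_features`, the column naming scheme, and the choice to compute rolling statistics on past months only (via `shift(1)`) are assumptions, not confirmed details of src/features/.

```python
import numpy as np
import pandas as pd


def add_features(df, target="monthly_contributions_clp"):
    """Add lag, rolling, and calendar features (hypothetical helper)."""
    out = df.copy()
    for lag in (1, 3, 6, 12):
        out[f"{target}_lag_{lag}"] = out[target].shift(lag)
    # Assumption: shift by one month so rolling windows exclude the current value.
    past = out[target].shift(1)
    for window in (3, 6):
        out[f"{target}_roll_mean_{window}"] = past.rolling(window).mean()
        out[f"{target}_roll_std_{window}"] = past.rolling(window).std()
    # Cyclical encoding keeps December and January close together.
    month = out["date"].dt.month
    out["month_sin"] = np.sin(2 * np.pi * month / 12)
    out["month_cos"] = np.cos(2 * np.pi * month / 12)
    out["quarter"] = out["date"].dt.quarter
    return out


# Placeholder data: 24 months with a simple increasing target.
dates = pd.date_range("2018-01-01", periods=24, freq="MS")
demo = pd.DataFrame({"date": dates,
                     "monthly_contributions_clp": np.arange(24, dtype=float)})
features = add_features(demo)
print(features.columns.tolist())
```

The first 12 rows contain missing values in the 12-month lag column, so they would need to be dropped or otherwise handled before training.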

🚀 Quick Start

Installation

# After cloning the repository, enter the project directory
cd 07-lstm-time-series-forecasting

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Generate Synthetic Data

python data/generate_synthetic_data.py

Train Model

python scripts/train.py

Run API

# Start FastAPI server
bash scripts/run_api.sh

# API will be available at http://localhost:8000
# API docs at http://localhost:8000/docs

Launch Dashboard

# Start Streamlit dashboard
bash scripts/run_dashboard.sh

# Dashboard will be available at http://localhost:8501

📡 API Endpoints

POST /predict

Generate 6-month forecast with confidence intervals.

Request:

{
  "historical_data": {
    "date": ["2022-01-01", "2022-02-01", ...],
    "monthly_contributions_clp": [150000000, 155000000, ...],
    "new_customers": [520, 535, ...],
    "market_index": [1050, 1060, ...],
    "unemployment_rate": [0.072, 0.071, ...],
    "gdp_growth": [0.031, 0.032, ...]
  }
}

Response:

{
  "forecast": [158000000, 162000000, ...],
  "confidence_intervals": {
    "lower": [152000000, 156000000, ...],
    "upper": [164000000, 168000000, ...]
  },
  "forecast_dates": ["2023-01-01", "2023-02-01", ...],
  "model_info": {
    "model_type": "LSTM",
    "sequence_length": 12,
    "forecast_horizon": 6,
    "confidence_level": 0.95
  }
}
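A client for this endpoint might look like the sketch below, which uses only the standard library. The helpers `build_payload` and `post_predict` are hypothetical, the row values are placeholders, and sending 12 months of history is an assumption based on the 12-month sequence length. The URL is the local address from the Quick Start section.

```python
import json
from urllib import request

API_URL = "http://localhost:8000/predict"

COLUMNS = ["date", "monthly_contributions_clp", "new_customers",
           "market_index", "unemployment_rate", "gdp_growth"]


def build_payload(rows):
    """Convert per-month dicts into the column-oriented request body."""
    return {"historical_data": {col: [row[col] for row in rows] for col in COLUMNS}}


def post_predict(payload, url=API_URL):
    """POST the payload as JSON and return the decoded response."""
    body = json.dumps(payload).encode("utf-8")
    req = request.Request(url, data=body, method="POST",
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Placeholder history for 2022; the values are illustrative only.
rows = [
    {
        "date": f"2022-{month:02d}-01",
        "monthly_contributions_clp": 150_000_000 + 5_000_000 * (month - 1),
        "new_customers": 520 + 15 * (month - 1),
        "market_index": 1050 + 10 * (month - 1),
        "unemployment_rate": 0.072,
        "gdp_growth": 0.031,
    }
    for month in range(1, 13)
]
payload = build_payload(rows)

# Uncomment once the API server is running:
# result = post_predict(payload)
# print(result["forecast"])
```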

POST /predict/scenario

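Runs a scenario analysis by adjusting economic variables.

Request (here unemployment_rate_delta adds 1 percentage point, gdp_growth_delta subtracts 0.5 percentage points, and market_index_multiplier lowers the index by 5%):

{
  "adjustments": {
    "unemployment_rate_delta": 0.01,
    "gdp_growth_delta": -0.005,
    "market_index_multiplier": 0.95
  }
}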
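How the server applies these adjustments is not documented here. One plausible reading, sketched below, is that the deltas are added and the multiplier is applied to every month of the input window before forecasting. The function `apply_adjustments` is hypothetical and only illustrates that interpretation.

```python
import copy


def apply_adjustments(historical_data, adjustments):
    """Return an adjusted copy of the indicators (assumed semantics)."""
    adjusted = copy.deepcopy(historical_data)
    u_delta = adjustments.get("unemployment_rate_delta", 0.0)
    g_delta = adjustments.get("gdp_growth_delta", 0.0)
    m_mult = adjustments.get("market_index_multiplier", 1.0)
    # Deltas are additive; the market index is scaled multiplicatively.
    adjusted["unemployment_rate"] = [v + u_delta for v in adjusted["unemployment_rate"]]
    adjusted["gdp_growth"] = [v + g_delta for v in adjusted["gdp_growth"]]
    adjusted["market_index"] = [v * m_mult for v in adjusted["market_index"]]
    return adjusted


baseline = {
    "unemployment_rate": [0.072, 0.071],
    "gdp_growth": [0.031, 0.032],
    "market_index": [1050, 1060],
}
scenario = apply_adjustments(baseline, {
    "unemployment_rate_delta": 0.01,
    "gdp_growth_delta": -0.005,
    "market_index_multiplier": 0.95,
})
print(scenario)
```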

📈 Dashboard Features

  1. Forecast Explorer

    • Historical data visualization
    • 6-month forecast with confidence bands
    • Interactive Plotly charts
  2. Scenario Analysis

    • Adjust economic indicators via sliders
    • Real-time forecast updates
    • Compare multiple scenarios
  3. Model Performance

    • Training/validation loss curves
    • Error metrics (MAE, RMSE, MAPE)
    • Residual analysis
  4. Data Explorer

    • Time series decomposition
    • Trend, seasonality, residual components
    • Feature correlations
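For reference, the error metrics listed under Model Performance follow their standard definitions. The sketch below uses NumPy; the function names and the sample values are illustrative and not taken from the project.

```python
import numpy as np


def mae(actual, predicted):
    """Mean absolute error, in the units of the target."""
    return float(np.mean(np.abs(actual - predicted)))


def rmse(actual, predicted):
    """Root mean squared error; penalizes large misses more than MAE."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def mape(actual, predicted):
    """Mean absolute percentage error; undefined when an actual value is zero."""
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)


actual = np.array([100.0, 200.0, 400.0])
predicted = np.array([110.0, 190.0, 400.0])
print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```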

🧪 Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=html

# Run specific test file
pytest tests/test_model.py

🐳 Docker Deployment

Docker Compose

# Build and start all services
cd docker
docker-compose up --build

# Services:
# - API: http://localhost:8000
# - Dashboard: http://localhost:8501

Individual Containers

# API only
docker build -f docker/Dockerfile.api -t lstm-forecast-api .
docker run -p 8000:8000 lstm-forecast-api

# Dashboard only
docker build -f docker/Dockerfile.dashboard -t lstm-forecast-dashboard .
docker run -p 8501:8501 lstm-forecast-dashboard

📁 Project Structure

07-lstm-time-series-forecasting/
├── config.yaml                    # Configuration file
├── requirements.txt               # Dependencies
├── README.md                      # This file
├── data/
│   ├── generate_synthetic_data.py # Data generation script
│   ├── raw/                       # Raw data
│   ├── processed/                 # Train/test/val splits
│   └── external/                  # External data sources
├── notebooks/                     # Jupyter notebooks
├── src/
│   ├── models/                    # LSTM model & training
│   ├── features/                  # Feature engineering
│   ├── api/                       # FastAPI endpoints
│   ├── visualization/             # Plotting utilities
│   ├── utils/                     # Helper functions
│   └── dashboard/                 # Streamlit app
├── docker/                        # Docker files
├── scripts/                       # Execution scripts
├── tests/                         # Unit tests
├── docs/                          # Documentation
└── models/                        # Saved model checkpoints

🎓 Key Concepts

Why LSTM for Time Series?

  • Memory: Captures long-term dependencies in sequential data
  • Flexibility: Handles multiple input features simultaneously
  • Non-linear: Learns complex patterns (trend + seasonality)

Direct Multi-Step Forecasting

Instead of recursive prediction, where each forecast is fed back as an input and errors compound from step to step, the model outputs all 6 months in a single forward pass, which avoids that error accumulation.
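Training a direct multi-step model requires pairing each 12-month input window with the 6 months that follow it. A minimal windowing sketch with NumPy is shown here; the helper `make_windows` and the convention that the target sits in column 0 are assumptions for illustration.

```python
import numpy as np


def make_windows(values, seq_len=12, horizon=6, target_col=0):
    """Build (input window, next-`horizon` targets) pairs from a 2-D array."""
    inputs, targets = [], []
    for start in range(len(values) - seq_len - horizon + 1):
        end = start + seq_len
        inputs.append(values[start:end])                        # 12 months, all features
        targets.append(values[end:end + horizon, target_col])   # next 6 target values
    return np.stack(inputs), np.stack(targets)


# Placeholder array: 48 training months, 5 features per month.
series = np.arange(48 * 5, dtype=np.float32).reshape(48, 5)
X, y = make_windows(series)
print(X.shape, y.shape)
```

With 48 training months, a 12-month lookback, and a 6-month horizon, this yields 48 - 12 - 6 + 1 = 31 training samples.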

Confidence Intervals via Monte Carlo Dropout

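By keeping dropout enabled during inference and running many stochastic forward passes, we estimate uncertainty quantiles (such as the 95% interval returned by the API) from the spread of the predictions, without training additional models.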
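The idea can be sketched in PyTorch as follows. The function `mc_dropout_forecast`, the number of samples, and the small stand-in network are assumptions for illustration; the project's own inference code may differ.

```python
import torch
from torch import nn


def mc_dropout_forecast(model, x, n_samples=200, level=0.95):
    """Return mean, lower, and upper forecasts from stochastic forward passes."""
    model.eval()
    # Re-enable dropout only: nn.Dropout layers, plus nn.LSTM modules, whose
    # built-in inter-layer dropout follows the module's training flag.
    for module in model.modules():
        if isinstance(module, (nn.Dropout, nn.LSTM)):
            module.train()
    with torch.no_grad():
        # Shape: (n_samples, batch, horizon); each pass uses a fresh dropout mask.
        draws = torch.stack([model(x) for _ in range(n_samples)])
    alpha = (1.0 - level) / 2.0
    lower = torch.quantile(draws, alpha, dim=0)
    upper = torch.quantile(draws, 1.0 - alpha, dim=0)
    return draws.mean(dim=0), lower, upper


torch.manual_seed(0)
# Stand-in network with dropout, used only to make the sketch runnable.
stand_in = nn.Sequential(
    nn.Flatten(),
    nn.Linear(12 * 5, 32),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(32, 6),
)
window = torch.randn(2, 12, 5)  # 2 windows of 12 months, 5 features each
mean, lower, upper = mc_dropout_forecast(stand_in, window)
print(mean.shape, lower.shape, upper.shape)
```

Intervals obtained this way reflect the variability induced by dropout and are not guaranteed to be calibrated, so checking their empirical coverage on held-out data is advisable.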


📝 License

MIT License - feel free to use this project for learning and development.


πŸ‘€ Author

Created as part of a 10-project ML/DL portfolio showcasing end-to-end machine learning systems.

Technologies: Python, PyTorch, FastAPI, Streamlit, Docker
Domain: Financial Time Series Forecasting
Date: 2024
