customer-segmentation-pension

Customer segmentation for a pension fund using K-Means and GMM clustering, with FastAPI serving

02 – Customer Segmentation (Pension)

Unsupervised clustering of pension customers on synthetic data: auto-tuned cluster counts (K-Means + GMM), segment profiling, a lightweight recommendation layer, a Streamlit explorer, and a FastAPI service. MLflow logs experiments and artifacts locally.

Setup

  • Create a virtual environment inside the project root and install dependencies:
    python3 -m venv .venv && source .venv/bin/activate
    pip install -r requirements.txt
    
  • Always activate the venv before running the commands below.
  • Before running training or the MLflow UI, point MLflow at the local mlruns/ directory so the UI can find the recorded runs:
    export MLFLOW_TRACKING_URI="$(pwd)/mlruns"
    

Quickstart

  1. Train pipeline + recommenders (logs to MLflow experiment customer-segmentation-pension)
    PYTHONPATH=. python3 scripts/train.py --config configs/config.yaml
    
    Training generates:
    • Models: models/02-segmentation/cluster_pipeline.joblib, models/02-segmentation/recommenders.joblib
    • Artifacts: artifacts/metrics.json, artifacts/segment_profiles.json, artifacts/pca_projection.csv
    • Data: data/02-segmentation/raw/customers.csv (plus processed CSV)
    • MLflow run data (params, metrics, artifacts) under mlruns/
  2. Launch MLflow UI
    PYTHONPATH=. MLFLOW_TRACKING_URI="$(pwd)/mlruns" mlflow ui --backend-store-uri "$(pwd)/mlruns" --port 5001
    
    Open http://localhost:5001 and select customer-segmentation-pension.
    If the UI shows “No traces recorded”, see the fuller run order and troubleshooting notes in info.md.
  3. Run Streamlit explorer
    PYTHONPATH=. streamlit run src/app.py -- --config configs/config.yaml
    
    Default URL: http://localhost:8502; stop with Ctrl+C.
  4. Run FastAPI service (for external tools like Power BI)
    ./scripts/run_api.sh
    
    • Default URL: http://localhost:8000
    • Endpoints: /health, /segment, /recommend
    • Quick example:
      curl -X POST http://localhost:8000/recommend \
        -H "Content-Type: application/json" \
        -d '{"age":48,"tenure_years":10,"balance_total":1e6,"conservative_ratio":0.4,"variable_ratio":0.3,"contribution_consistency":0.8,"app_logins_monthly":12,"feature_usage_score":70,"support_contacts":2,"avg_transaction_amount":250000,"transaction_frequency":6,"last_transaction_days":12,"risk_tolerance":"Moderate","financial_goal":"Growth"}'
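The same request can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_recommend_request` and `post_recommend` are hypothetical and not part of the repo; the URL and payload fields are copied from the curl example above. The response schema is not documented here, so the sketch simply returns the parsed JSON.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/recommend"  # default URL from the Quickstart

# Example customer copied from the curl call above.
CUSTOMER = {
    "age": 48,
    "tenure_years": 10,
    "balance_total": 1e6,
    "conservative_ratio": 0.4,
    "variable_ratio": 0.3,
    "contribution_consistency": 0.8,
    "app_logins_monthly": 12,
    "feature_usage_score": 70,
    "support_contacts": 2,
    "avg_transaction_amount": 250000,
    "transaction_frequency": 6,
    "last_transaction_days": 12,
    "risk_tolerance": "Moderate",
    "financial_goal": "Growth",
}


def build_recommend_request(customer, url=API_URL):
    """Build a JSON POST request (hypothetical helper, not part of the repo)."""
    body = json.dumps(customer).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_recommend(customer, url=API_URL, timeout=10):
    """Send the request and return the decoded JSON response as-is."""
    request = build_recommend_request(customer, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API started via ./scripts/run_api.sh, calling `post_recommend(CUSTOMER)` should return whatever JSON the /recommend endpoint produces; swapping the URL for /segment would follow the same pattern, assuming that endpoint accepts the same payload.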
      

What it builds

  • Synthetic dataset (15k rows, pension schema) with train/val/test splits.
  • Auto-tuned clustering across K=3–8 for K-Means and GMM, scored with the silhouette, Davies-Bouldin, and Calinski-Harabasz (CH) indices.
  • Segment profiles (size, centroids, top differentiating features) stored as JSON.
  • Recommendation layer: per-offer propensity models (LogReg) using cluster-aware features.
  • Streamlit explorer: dataset overview, segment explorer (PCA scatter), customer lookup, and top-N offers.
  • MLflow tracking: params/metrics/artifacts for each training run.
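This README does not state how the three clustering metrics are combined into one choice of model and K. As one possible approach, the sketch below ranks every candidate per metric and picks the lowest mean rank. The mean-rank rule, the `pick_best` helper, and the sample scores are all assumptions made for illustration. In practice the scores would come from scikit-learn's `silhouette_score`, `davies_bouldin_score`, and `calinski_harabasz_score`.

```python
from statistics import mean

# Direction of each metric: True means "higher is better".
METRICS = {
    "silhouette": True,
    "davies_bouldin": False,  # lower values indicate better separation
    "calinski_harabasz": True,
}


def pick_best(candidates):
    """Return the candidate with the lowest mean rank across all metrics."""
    ranks = {i: [] for i in range(len(candidates))}
    for name, higher_is_better in METRICS.items():
        # Best candidate first for this metric.
        order = sorted(
            range(len(candidates)),
            key=lambda i: candidates[i][name],
            reverse=higher_is_better,
        )
        for rank, idx in enumerate(order, start=1):
            ranks[idx].append(rank)
    best = min(ranks, key=lambda i: mean(ranks[i]))
    return candidates[best]


# Illustrative, made-up scores; these are not results from this project.
candidates = [
    {"model": "kmeans", "k": 3, "silhouette": 0.31,
     "davies_bouldin": 1.20, "calinski_harabasz": 4100.0},
    {"model": "kmeans", "k": 4, "silhouette": 0.36,
     "davies_bouldin": 1.05, "calinski_harabasz": 4600.0},
    {"model": "gmm", "k": 4, "silhouette": 0.33,
     "davies_bouldin": 1.10, "calinski_harabasz": 4300.0},
]

best = pick_best(candidates)
print(best["model"], best["k"])  # prints "kmeans 4"
```

Rank aggregation avoids having to put the three indices on a common scale, which matters because the Calinski-Harabasz index is unbounded while the silhouette score lies in [-1, 1].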

Repo layout

  • configs/ – configuration (paths, tuning ranges, offers, MLflow).
  • data/02-segmentation/ – raw + processed CSVs.
  • models/02-segmentation/ – persisted cluster pipeline + recommender models.
  • artifacts/ – metrics, profiles, PCA projection for the app.
  • src/ – code: data generation, features, clustering, profiling, recommenders, app, api.
  • scripts/ – entrypoints to train, serve app, and run API.
  • tests/ – unit/smoke tests for pipeline pieces.

Key commands

  • Train: PYTHONPATH=. python3 scripts/train.py --config configs/config.yaml
  • App: PYTHONPATH=. streamlit run src/app.py -- --config configs/config.yaml
  • API: ./scripts/run_api.sh
  • Tests: pytest -q

Notes

  • Designed so that anonymized real data can be swapped in later, as long as the schema remains compatible.
  • MLflow uses a local mlruns/ backend by default; set MLFLOW_TRACKING_URI to use a remote server.
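The tracking-URI fallback described in the last note can be made explicit in code. The helper below is a hypothetical sketch and not taken from the repo. It assumes the local store lives in mlruns/ under the project root, as in the Setup section.

```python
import os
from pathlib import Path


def resolve_tracking_uri(project_root="."):
    """Return MLFLOW_TRACKING_URI if set, else the local mlruns/ directory."""
    uri = os.environ.get("MLFLOW_TRACKING_URI")
    if uri:
        # For example, a remote tracking server URL.
        return uri
    # Fall back to the local file store used in the Setup section.
    return str(Path(project_root).resolve() / "mlruns")
```

The result could then be passed to `mlflow.set_tracking_uri(...)` before any run is started, so that training and the UI agree on where runs are stored.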