Customer segmentation for pension fund using K-Means and GMM clustering with FastAPI serving
Unsupervised clustering of pension customers with synthetic data, auto-tuned clusters (K-Means + GMM), segment profiling, a lightweight recommendation layer, and a Streamlit explorer. MLflow logs experiments and artifacts locally.
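The auto-tuning mentioned above can be pictured with a small sketch. It is not the project's actual tuning code: the selection criteria (silhouette score for K-Means, BIC for the GMM), the helper names `pick_kmeans_k` and `pick_gmm_components`, and the synthetic blobs are all assumptions for illustration. The real ranges and criteria live in configs/config.yaml and src/.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture


def pick_kmeans_k(X, k_values, seed=0):
    """Return (best_k, scores): the k with the highest silhouette score."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get), scores


def pick_gmm_components(X, k_values, seed=0):
    """Return (best_k, bics): the component count with the lowest BIC."""
    bics = {}
    for k in k_values:
        gmm = GaussianMixture(n_components=k, random_state=seed).fit(X)
        bics[k] = gmm.bic(X)
    return min(bics, key=bics.get), bics


# Hypothetical stand-in data: three well-separated 2-D blobs,
# not the project's synthetic customer table.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(60, 2)) for c in centers])

best_k, _ = pick_kmeans_k(X, range(2, 7))
best_components, _ = pick_gmm_components(X, range(2, 7))
print("K-Means k:", best_k, "| GMM components:", best_components)
```

Silhouette rewards compact, well-separated clusters, while BIC penalises extra mixture components, so the two models may settle on different counts for real customer data.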
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
export MLFLOW_TRACKING_URI="$(pwd)/mlruns"
Train (runs are logged to the MLflow experiment customer-segmentation-pension):
PYTHONPATH=. python3 scripts/train.py --config configs/config.yaml
Training generates:
models/02-segmentation/cluster_pipeline.joblib, models/02-segmentation/recommenders.joblib
artifacts/metrics.json, artifacts/segment_profiles.json, artifacts/pca_projection.csv
data/02-segmentation/raw/customers.csv (plus processed CSV)
mlruns/
PYTHONPATH=. MLFLOW_TRACKING_URI="$(pwd)/mlruns" mlflow ui --backend-store-uri "$(pwd)/mlruns" --port 5001
Open http://localhost:5001 and select customer-segmentation-pension.
(… info.md if you see “No traces recorded”.)
PYTHONPATH=. streamlit run src/app.py -- --config configs/config.yaml
Default URL: http://localhost:8502; stop with Ctrl+C.
./scripts/run_api.sh
API endpoints: /health, /segment, /recommend
curl -X POST http://localhost:8000/recommend \
-H "Content-Type: application/json" \
-d '{"age":48,"tenure_years":10,"balance_total":1e6,"conservative_ratio":0.4,"variable_ratio":0.3,"contribution_consistency":0.8,"app_logins_monthly":12,"feature_usage_score":70,"support_contacts":2,"avg_transaction_amount":250000,"transaction_frequency":6,"last_transaction_days":12,"risk_tolerance":"Moderate","financial_goal":"Growth"}'
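The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above. The helper names `build_request` and `recommend` are hypothetical, and the response is returned as parsed JSON without assuming anything about its fields, since the response schema is not documented here.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/recommend"  # same URL as the curl example

# Same payload as the curl example.
customer = {
    "age": 48,
    "tenure_years": 10,
    "balance_total": 1e6,
    "conservative_ratio": 0.4,
    "variable_ratio": 0.3,
    "contribution_consistency": 0.8,
    "app_logins_monthly": 12,
    "feature_usage_score": 70,
    "support_contacts": 2,
    "avg_transaction_amount": 250000,
    "transaction_frequency": 6,
    "last_transaction_days": 12,
    "risk_tolerance": "Moderate",
    "financial_goal": "Growth",
}


def build_request(payload, url=API_URL):
    """Build a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def recommend(payload, url=API_URL, timeout=10):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(payload, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the API running (./scripts/run_api.sh), `recommend(customer)` should return whatever JSON the /recommend endpoint produces. `build_request` is kept separate so the payload can be inspected without a live server.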
configs/ – configuration (paths, tuning ranges, offers, MLflow).
data/02-segmentation/ – raw + processed CSVs.
models/02-segmentation/ – persisted cluster pipeline + recommender models.
artifacts/ – metrics, profiles, PCA projection for the app.
src/ – code: data generation, features, clustering, profiling, recommenders, app, api.
scripts/ – entrypoints to train, serve app, and run API.
tests/ – unit/smoke tests for pipeline pieces.
python3 scripts/train.py --config configs/config.yaml
streamlit run src/app.py -- --config configs/config.yaml
./scripts/run_api.sh
pytest -q
MLflow uses the local mlruns/ backend by default; set MLFLOW_TRACKING_URI to use a remote server.
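After a training run, a quick check that the expected outputs exist can save a confusing start-up error in the app or API. The sketch below is an illustrative helper, not part of the repository: the function name `missing_outputs` is hypothetical, and the path list is copied from the files that training is said to generate (the processed CSV is omitted because its file name is not given).

```python
from pathlib import Path

# Files the README says training generates, relative to the repo root.
EXPECTED_OUTPUTS = [
    "models/02-segmentation/cluster_pipeline.joblib",
    "models/02-segmentation/recommenders.joblib",
    "artifacts/metrics.json",
    "artifacts/segment_profiles.json",
    "artifacts/pca_projection.csv",
    "data/02-segmentation/raw/customers.csv",
]


def missing_outputs(root="."):
    """Return the expected output paths that are not present under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_OUTPUTS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_outputs()
    if missing:
        print("Missing training outputs:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected training outputs are present.")
```

Run it from the repository root after scripts/train.py finishes. An empty result means the Streamlit app and the API should find the files they load.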