Propensity modeling for cross-selling insurance products using CatBoost with FastAPI and Streamlit
Binary classification model to predict insurance product purchase propensity using CatBoost with comprehensive MLflow experiment tracking.
Business Objective: Develop a predictive model to identify customers most likely to purchase additional insurance products, enabling targeted cross-selling campaigns and improved conversion rates.
Technical Approach:
04-propensity-modeling-cross-selling/
├── data/
│   ├── generate_synthetic_data.py   # Synthetic data generation
│   ├── processed/                   # Processed training data
│   └── external/                    # External data sources
├── docker/
│   ├── Dockerfile.api               # FastAPI container
│   ├── Dockerfile.dashboard         # Streamlit container
│   ├── Dockerfile.mlflow            # MLflow UI container
│   └── docker-compose.yml           # Multi-service orchestration
├── mlruns/                          # MLflow experiment tracking
├── models/                          # Trained model artifacts
├── scripts/
│   ├── run_api.sh                   # Start API server
│   ├── run_dashboard.sh             # Start dashboard
│   └── run_mlflow_ui.sh             # Start MLflow UI
├── src/
│   ├── api/
│   │   └── app.py                   # FastAPI endpoints
│   ├── dashboard/
│   │   └── app.py                   # Streamlit UI
│   ├── features/
│   │   ├── __init__.py
│   │   └── build_features.py        # Feature engineering
│   ├── models/
│   │   ├── __init__.py
│   │   └── train.py                 # Training pipeline with MLflow
│   └── utils/
│       ├── __init__.py
│       └── mlflow_tracking.py       # MLflow utilities
├── tests/                           # Unit and integration tests
├── config.yaml                      # Project configuration
├── requirements.txt                 # Python dependencies
└── README.md                        # This file
cd 04-propensity-modeling-cross-selling
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
python data/generate_synthetic_data.py
python src/models/train.py
Option A: Using shell scripts
# Terminal 1: MLflow UI
./scripts/run_mlflow_ui.sh
# Terminal 2: FastAPI
./scripts/run_api.sh
# Terminal 3: Dashboard
./scripts/run_dashboard.sh
Option B: Using Docker Compose
# The compose file lives under docker/ (see the project tree)
docker-compose -f docker/docker-compose.yml up -d
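Once the three services are started, a quick reachability check can save some guessing. The snippet below is a minimal sketch rather than part of the project: ports 5000 (MLflow UI) and 8000 (FastAPI) come from this README, while 8501 is Streamlit's default port and is only an assumption about how the dashboard is configured. The helper names are hypothetical.

```python
import socket
from typing import Dict, Tuple

# Ports 5000 and 8000 appear in this README.
# 8501 is Streamlit's default and is an ASSUMPTION about the dashboard.
SERVICES: Dict[str, Tuple[str, int]] = {
    "mlflow": ("localhost", 5000),
    "api": ("localhost", 8000),
    "dashboard": ("localhost", 8501),
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_services(services: Dict[str, Tuple[str, int]] = SERVICES) -> Dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, (host, port) in services.items()}
```

Calling `check_services()` returns a dictionary such as `{"mlflow": True, "api": True, "dashboard": False}`, which only says a port is accepting connections, not that the service behind it is healthy.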
Experiment Tracking
Model Registry
Artifacts
# All training runs are automatically tracked
python src/models/train.py
# Start MLflow UI
./scripts/run_mlflow_ui.sh
# Visit: http://localhost:5000
# Models are automatically registered to "Staging" stage
# Navigate to MLflow UI → Models → propensity_catboost_model
# Promote to "Production" when ready
import mlflow.catboost
# Load from Model Registry
model = mlflow.catboost.load_model("models:/propensity_catboost_model/Production")
# Or load from specific run
model = mlflow.catboost.load_model("runs:/<run_id>/model")
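With a model loaded as above, cross-selling usually comes down to ranking customers by predicted purchase probability. The following is a rough sketch under assumptions: `score_customers` and `top_k` are hypothetical helpers, not functions from this repository, and the rows passed in are assumed to already contain the features in the order the model was trained on.

```python
from typing import List, Sequence, Tuple


def score_customers(model, rows: Sequence[Sequence]) -> List[float]:
    """Return the positive-class probability for each row.

    `model` can be any object with a scikit-learn style predict_proba,
    for example the CatBoost classifier returned by mlflow.catboost.load_model.
    """
    # Column 1 of predict_proba holds P(class == 1), i.e. purchase propensity.
    return [float(p[1]) for p in model.predict_proba(rows)]


def top_k(
    customer_ids: Sequence[str], probabilities: Sequence[float], k: int
) -> List[Tuple[str, float]]:
    """Return the k customers with the highest propensity, best first."""
    ranked = sorted(
        zip(customer_ids, probabilities), key=lambda pair: pair[1], reverse=True
    )
    return ranked[:k]
```

A campaign list could then be built with `top_k(ids, score_customers(model, rows), k=100)`, where `ids` and `rows` are whatever identifiers and feature rows the caller has prepared.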
mlflow:
experiment_name: "propensity_modeling_cross_selling"
tracking_uri: "./mlruns"
model_name: "propensity_catboost_model"
Demographics:
  age: Customer age (18-100)
Product Holdings:
  has_life_insurance: Binary flag
  has_apv: Binary flag (APV = Ahorro Previsional Voluntario, voluntary pension savings)
  total_products: Number of products
Customer Relationship:
  customer_lifetime_years: Years as customer
Engagement Metrics:
  interactions_last_6m: Support interactions in the last 6 months
  web_visits_monthly: Monthly website visits
  email_open_rate: Email engagement rate
Financial Metrics:
  annual_income_clp: Annual income in Chilean Pesos
  total_assets_clp: Total assets
  avg_monthly_balance_clp: Average account balance
Life Stage:
  life_stage: Single, Married_NoKids, Married_Kids, Divorced, Retired
  has_dependents: Binary flag
Behavioral:
  transaction_frequency_monthly: Monthly transaction count
  last_purchase_days: Days since last purchase
  customer_service_calls: Support call count
  mobile_app_usage_score: App engagement (0-100)
Credit Profile:
  credit_score: Credit score (300-850)
  employment_years: Years employed
  home_owner: Binary flag
  education_level: High_School, Bachelor, Master, PhD
Feature engineering creates additional features:
  engagement_score: Weighted engagement metric
  financial_stability_score: Combined financial health indicator
  wealth_score: Aggregated wealth metric
  life_stage_ordinal: Encoded life stage
  recency_score: Inverse of days since last purchase
curl -X POST "http://localhost:8000/predict" \
-H "Content-Type: application/json" \
-d '{
"age": 45,
"has_life_insurance": 0,
"has_apv": 1,
"total_products": 3,
"customer_lifetime_years": 8,
"interactions_last_6m": 7,
"web_visits_monthly": 5,
"email_open_rate": 0.65,
"annual_income_clp": 3500000,
"total_assets_clp": 25000000,
"life_stage": "Married_Kids",
"has_dependents": 1,
"avg_monthly_balance_clp": 2500000,
"transaction_frequency_monthly": 8,
"last_purchase_days": 45,
"customer_service_calls": 2,
"mobile_app_usage_score": 75.0,
"credit_score": 720,
"employment_years": 12,
"home_owner": 1,
"education_level": "Bachelor"
}'
Response:
{
"customer_id": null,
"prediction": 1,
"probability": 0.7834,
"confidence": "High"
}
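Before sending a payload like the one above, it can help to check it against the ranges and categories documented in the feature list. This is a minimal sketch, not the API's own validation: the ranges and allowed values are copied from this README, while the `validate_payload` helper and the choice to treat every checked field as required are assumptions.

```python
from typing import Any, Dict, List

# Ranges and categories copied from the feature descriptions in this README.
NUMERIC_RANGES = {
    "age": (18, 100),
    "mobile_app_usage_score": (0, 100),
    "credit_score": (300, 850),
}
CATEGORIES = {
    "life_stage": {"Single", "Married_NoKids", "Married_Kids", "Divorced", "Retired"},
    "education_level": {"High_School", "Bachelor", "Master", "PhD"},
}
BINARY_FLAGS = ("has_life_insurance", "has_apv", "has_dependents", "home_owner")


def validate_payload(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    errors: List[str] = []
    for name, (low, high) in NUMERIC_RANGES.items():
        value = payload.get(name)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{name}: expected a number")
        elif not low <= value <= high:
            errors.append(f"{name}: {value} is outside [{low}, {high}]")
    for name, allowed in CATEGORIES.items():
        if payload.get(name) not in allowed:
            errors.append(f"{name}: {payload.get(name)!r} is not one of {sorted(allowed)}")
    for name in BINARY_FLAGS:
        # ASSUMPTION: binary flags are sent as the integers 0 or 1, as in the example request.
        if payload.get(name) not in (0, 1):
            errors.append(f"{name}: expected 0 or 1")
    return errors
```

Running `validate_payload` on the example request body returns an empty list. Fields without a documented range, such as `email_open_rate`, are deliberately left unchecked.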
import requests
customers = [
{ ... }, # Customer 1 features
{ ... }, # Customer 2 features
]
response = requests.post(
"http://localhost:8000/batch_predict",
json={"customers": customers}
)
results = response.json()
print(f"High propensity customers: {results['high_propensity_count']}")
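For large customer lists it may be safer to send several smaller requests than one very large one. The sketch below assumes that approach; the `chunked` and `count_high_propensity` helpers and the batch size of 500 are hypothetical, and the only response field relied on is `high_propensity_count`, which is the one shown above.

```python
from typing import Dict, Iterator, List


def chunked(items: List[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def count_high_propensity(
    customers: List[Dict],
    url: str = "http://localhost:8000/batch_predict",
    batch_size: int = 500,  # ASSUMPTION: the API's real limit, if any, is not documented here
    timeout: float = 30.0,
) -> int:
    """Post customers in batches and sum the reported high-propensity counts."""
    import requests  # third-party; imported here so chunked() works without it

    total = 0
    for batch in chunked(customers, batch_size):
        response = requests.post(url, json={"customers": batch}, timeout=timeout)
        response.raise_for_status()  # surface HTTP errors instead of parsing an error body
        total += response.json()["high_propensity_count"]
    return total
```

Summing per-batch counts gives the same total as a single request only if the API scores each customer independently, which is an assumption about the service.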
Run tests:
pytest tests/
Run with coverage:
pytest --cov=src tests/
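# Build all services (the compose file lives under docker/)
docker-compose -f docker/docker-compose.yml build
# Start all services
docker-compose -f docker/docker-compose.yml up -d
# View logs
docker-compose -f docker/docker-compose.yml logs -f
# Stop services
docker-compose -f docker/docker-compose.yml down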
Target Metrics:
Current Performance (on test set):
Edit config.yaml to customize:
# Data configuration
data:
n_samples: 50000
test_size: 0.2
validation_size: 0.15
random_state: 42
# Model hyperparameters
model:
iterations: 1000
depth: 6
learning_rate: 0.1
l2_leaf_reg: 3.0
border_count: 128
# Training configuration
training:
n_trials: 50 # Optuna trials (0 to disable)
timeout: 3600 # Optimization timeout (seconds)
# MLflow configuration
mlflow:
experiment_name: "propensity_modeling_cross_selling"
tracking_uri: "./mlruns"
model_name: "propensity_catboost_model"
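To see what the `data` settings imply in row counts, the arithmetic can be sketched as follows. This assumes `test_size` and `validation_size` are both fractions of the full dataset; if the training script instead takes the validation fraction from what remains after the test split, the numbers differ. The `split_sizes` helper is hypothetical.

```python
from typing import Dict


def split_sizes(n_samples: int, test_size: float, validation_size: float) -> Dict[str, int]:
    """Return row counts per split, treating both sizes as fractions of n_samples."""
    if not 0 <= test_size + validation_size < 1:
        raise ValueError("test_size + validation_size must be in [0, 1)")
    n_test = round(n_samples * test_size)
    n_validation = round(n_samples * validation_size)
    # Whatever is left after test and validation goes to training.
    n_train = n_samples - n_test - n_validation
    return {"train": n_train, "validation": n_validation, "test": n_test}


# Values from the data section of config.yaml shown above.
print(split_sizes(n_samples=50000, test_size=0.2, validation_size=0.15))
# {'train': 32500, 'validation': 7500, 'test': 10000}
```

Under the alternative reading, the validation set would be 15% of the 40,000 non-test rows, that is 6,000 rows, leaving 34,000 for training.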
# Generate new data
python data/generate_synthetic_data.py --n_samples 50000
# Train with hyperparameter optimization
python src/models/train.py
# Model automatically logged to MLflow
# Review in MLflow UI and promote to Production
# List experiments (MLflow 2.x; on MLflow 1.x use `mlflow experiments list`)
mlflow experiments search
# Delete experiment (takes the experiment ID, not the name)
mlflow experiments delete -x <experiment_id>
# Create experiment
mlflow experiments create -n <experiment_name>
# List runs
mlflow runs list --experiment-id <experiment_id>
# Delete run
mlflow runs delete --run-id <run_id>
# Restore run
mlflow runs restore --run-id <run_id>
from mlflow.tracking import MlflowClient

client = MlflowClient()

# List registered models
# (search_registered_models replaces list_registered_models, which MLflow 2.0 removed)
for model in client.search_registered_models():
    print(model.name)

# Get the latest version in each stage
versions = client.get_latest_versions("propensity_catboost_model")

# Transition model stage
client.transition_model_version_stage(
    name="propensity_catboost_model",
    version=1,
    stage="Production",
)
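Rather than hard-coding `version=1`, the promotion step can look up whichever version currently sits in Staging. The helper below is a sketch under assumptions: `promote_latest_staging` is a hypothetical name, it takes the client as a parameter so it can be exercised without a tracking server, and it relies on the stage-based registry calls used above, which recent MLflow 2.x releases mark as deprecated in favor of aliases.

```python
from typing import Optional


def promote_latest_staging(
    client, model_name: str, archive_existing: bool = True
) -> Optional[str]:
    """Move the newest Staging version of `model_name` to Production.

    `client` is expected to behave like mlflow.tracking.MlflowClient.
    Returns the promoted version, or None if nothing is in Staging.
    """
    staging = client.get_latest_versions(model_name, stages=["Staging"])
    if not staging:
        return None
    version = staging[0].version
    client.transition_model_version_stage(
        name=model_name,
        version=version,
        stage="Production",
        # Archive whatever was in Production so only one version serves at a time.
        archive_existing_versions=archive_existing,
    )
    return version
```

With a real tracking server this would be called as `promote_latest_staging(MlflowClient(), "propensity_catboost_model")`.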
git checkout -b feature/amazing-feature
git commit -m 'Add amazing feature'
git push origin feature/amazing-feature
This project is licensed under the MIT License.
Note: This project uses synthetic data for demonstration purposes. Replace with actual customer data for production use while ensuring compliance with data privacy regulations.