propensity-modeling-cross-selling

Propensity modeling for cross-selling insurance products using CatBoost with FastAPI and Streamlit

🎯 Propensity Modeling for Cross-Selling

Binary classification model to predict insurance product purchase propensity using CatBoost with comprehensive MLflow experiment tracking.

📋 Project Overview

Business Objective: Develop a predictive model to identify customers most likely to purchase additional insurance products, enabling targeted cross-selling campaigns and improved conversion rates.

Technical Approach:

  • Algorithm: CatBoost (Gradient Boosting on Decision Trees)
  • Target Metric: AUC-ROC ≥ 0.82
  • Features: 22 customer attributes (demographics, engagement, financial, behavioral)
  • MLflow Integration: Complete experiment tracking, model registry, and deployment
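
To make the AUC-ROC target concrete, here is a minimal, dependency-free sketch (not the project's evaluation code). AUC-ROC equals the probability that a randomly chosen purchaser receives a higher score than a randomly chosen non-purchaser, with ties counted as half. The function name `auc_roc` and the sample labels and scores are illustrative assumptions.

```python
from itertools import product


def auc_roc(labels, scores):
    """Pairwise AUC-ROC: share of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    # A correctly ordered pair counts 1, a tie counts 0.5, a wrong order counts 0.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(positives, negatives)
    )
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = purchased) and model scores, for illustration only.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.8, 0.1]
auc = auc_roc(labels, scores)
print(f"AUC-ROC = {auc:.3f}, meets 0.82 target: {auc >= 0.82}")
```

The pairwise loop is quadratic and only meant to show the definition; a library routine such as scikit-learn's `roc_auc_score` computes the same quantity far more efficiently.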

🏗️ Project Structure

04-propensity-modeling-cross-selling/
├── data/
│   ├── generate_synthetic_data.py     # Synthetic data generation
│   ├── processed/                     # Processed training data
│   └── external/                      # External data sources
├── docker/
│   ├── Dockerfile.api                 # FastAPI container
│   ├── Dockerfile.dashboard           # Streamlit container
│   ├── Dockerfile.mlflow              # MLflow UI container
│   └── docker-compose.yml             # Multi-service orchestration
├── mlruns/                            # MLflow experiment tracking
├── models/                            # Trained model artifacts
├── scripts/
│   ├── run_api.sh                     # Start API server
│   ├── run_dashboard.sh               # Start dashboard
│   └── run_mlflow_ui.sh               # Start MLflow UI
├── src/
│   ├── api/
│   │   └── app.py                     # FastAPI endpoints
│   ├── dashboard/
│   │   └── app.py                     # Streamlit UI
│   ├── features/
│   │   ├── __init__.py
│   │   └── build_features.py          # Feature engineering
│   ├── models/
│   │   ├── __init__.py
│   │   └── train.py                   # Training pipeline with MLflow
│   └── utils/
│       ├── __init__.py
│       └── mlflow_tracking.py         # MLflow utilities
├── tests/                             # Unit and integration tests
├── config.yaml                        # Project configuration
├── requirements.txt                   # Python dependencies
└── README.md                          # This file

🚀 Quick Start

Prerequisites

  • Python 3.10+
  • Docker & Docker Compose (optional)
  • 8GB RAM minimum

Installation

  1. Clone and navigate to project:
cd 04-propensity-modeling-cross-selling
  2. Create virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt
  4. Generate synthetic data:
python data/generate_synthetic_data.py
  5. Train model:
python src/models/train.py
  6. Start services (choose one):

Option A: Using shell scripts

# Terminal 1: MLflow UI
./scripts/run_mlflow_ui.sh

# Terminal 2: FastAPI
./scripts/run_api.sh

# Terminal 3: Dashboard
./scripts/run_dashboard.sh

Option B: Using Docker Compose

docker-compose up -d

Access Points

  • MLflow UI: http://localhost:5000
  • FastAPI: http://localhost:8000

📊 MLflow Integration

Key Features

  1. Experiment Tracking

    • Automatic logging of hyperparameters
    • Metric tracking (AUC-ROC, Accuracy, F1-Score, etc.)
    • Training metadata (duration, data shape, iterations)
  2. Model Registry

    • Version control for models
    • Stage transitions (Staging → Production)
    • Model metadata and tags
  3. Artifacts

    • Confusion matrices
    • ROC curves
    • Feature importance plots
    • Trained CatBoost models

MLflow Workflow

  1. Training:
# All training runs are automatically tracked
python src/models/train.py
  2. View Experiments:
# Start MLflow UI
./scripts/run_mlflow_ui.sh
# Visit: http://localhost:5000
  3. Register Best Model:
# Models are automatically registered to "Staging" stage
# Navigate to MLflow UI → Models → propensity_catboost_model
# Promote to "Production" when ready
  4. Load Model for Inference:
import mlflow.catboost

# Load from Model Registry
model = mlflow.catboost.load_model("models:/propensity_catboost_model/Production")

# Or load from specific run
model = mlflow.catboost.load_model("runs:/<run_id>/model")

MLflow Configuration

mlflow:
  experiment_name: "propensity_modeling_cross_selling"
  tracking_uri: "./mlruns"
  model_name: "propensity_catboost_model"

🎯 Model Features

Input Features (22 attributes)

Demographics:

  • age: Customer age (18-100)

Product Holdings:

  • has_life_insurance: Binary flag
  • has_apv: Binary flag (APV = Ahorro Previsional Voluntario, voluntary pension savings)
  • total_products: Number of products

Customer Relationship:

  • customer_lifetime_years: Years as customer

Engagement Metrics:

  • interactions_last_6m: Support interactions
  • web_visits_monthly: Monthly website visits
  • email_open_rate: Email engagement rate

Financial Metrics:

  • annual_income_clp: Annual income in Chilean Pesos
  • total_assets_clp: Total assets
  • avg_monthly_balance_clp: Average account balance

Life Stage:

  • life_stage: Single, Married_NoKids, Married_Kids, Divorced, Retired
  • has_dependents: Binary flag

Behavioral:

  • transaction_frequency_monthly: Monthly transaction count
  • last_purchase_days: Days since last purchase
  • customer_service_calls: Support call count
  • mobile_app_usage_score: App engagement (0-100)

Credit Profile:

  • credit_score: Credit score (300-850)
  • employment_years: Years employed
  • home_owner: Binary flag
  • education_level: High_School, Bachelor, Master, PhD
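
The ranges and category values listed above can be checked before a record reaches the model. The following is a minimal sketch of such a check, not the validation that `src/api/app.py` actually performs; the function name `validate_customer` and the choice of which fields to check are assumptions made for illustration.

```python
# Ranges and categories are taken from the feature list; the helper itself is hypothetical.
NUMERIC_RANGES = {
    "age": (18, 100),
    "mobile_app_usage_score": (0, 100),
    "credit_score": (300, 850),
}
CATEGORIES = {
    "life_stage": {"Single", "Married_NoKids", "Married_Kids", "Divorced", "Retired"},
    "education_level": {"High_School", "Bachelor", "Master", "PhD"},
}
BINARY_FLAGS = ("has_life_insurance", "has_apv", "has_dependents", "home_owner")


def validate_customer(record):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for name, (low, high) in NUMERIC_RANGES.items():
        value = record.get(name)
        if not isinstance(value, (int, float)) or not (low <= value <= high):
            problems.append(f"{name} must be between {low} and {high}, got {value!r}")
    for name, allowed in CATEGORIES.items():
        if record.get(name) not in allowed:
            problems.append(f"{name} must be one of {sorted(allowed)}, got {record.get(name)!r}")
    for name in BINARY_FLAGS:
        if record.get(name) not in (0, 1):
            problems.append(f"{name} must be 0 or 1, got {record.get(name)!r}")
    return problems


record = {
    "age": 17, "mobile_app_usage_score": 75.0, "credit_score": 720,
    "life_stage": "Married_Kids", "education_level": "Bachelor",
    "has_life_insurance": 0, "has_apv": 1, "has_dependents": 1, "home_owner": 2,
}
for problem in validate_customer(record):
    print(problem)
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem in a record at once.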

Derived Features

Feature engineering creates additional features:

  • engagement_score: Weighted engagement metric
  • financial_stability_score: Combined financial health indicator
  • wealth_score: Aggregated wealth metric
  • life_stage_ordinal: Encoded life stage
  • recency_score: Inverse of days since last purchase

🔮 API Usage

Single Prediction

curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "age": 45,
    "has_life_insurance": 0,
    "has_apv": 1,
    "total_products": 3,
    "customer_lifetime_years": 8,
    "interactions_last_6m": 7,
    "web_visits_monthly": 5,
    "email_open_rate": 0.65,
    "annual_income_clp": 3500000,
    "total_assets_clp": 25000000,
    "life_stage": "Married_Kids",
    "has_dependents": 1,
    "avg_monthly_balance_clp": 2500000,
    "transaction_frequency_monthly": 8,
    "last_purchase_days": 45,
    "customer_service_calls": 2,
    "mobile_app_usage_score": 75.0,
    "credit_score": 720,
    "employment_years": 12,
    "home_owner": 1,
    "education_level": "Bachelor"
  }'

Response:

{
  "customer_id": null,
  "prediction": 1,
  "probability": 0.7834,
  "confidence": "High"
}
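
The response pairs a binary `prediction` with a `probability` and a `confidence` label. One plausible way to derive the other two fields from the probability is sketched below. The 0.5 decision threshold and the confidence bands (distance from the threshold of at least 0.25 for "High", at least 0.10 for "Medium") are assumptions chosen so that the example above comes out as "High"; the API's real rules are not documented here.

```python
def label_prediction(probability, threshold=0.5):
    """Map a purchase probability to prediction, probability, and confidence fields."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be within [0, 1]")
    # Confidence grows with the distance from the decision threshold (assumed bands).
    distance = abs(probability - threshold)
    if distance >= 0.25:
        confidence = "High"
    elif distance >= 0.10:
        confidence = "Medium"
    else:
        confidence = "Low"
    return {
        "prediction": int(probability >= threshold),
        "probability": round(probability, 4),
        "confidence": confidence,
    }


result = label_prediction(0.7834)
print(result)
```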

Batch Prediction

import requests

customers = [
    { ... },  # Customer 1 features
    { ... },  # Customer 2 features
]

response = requests.post(
    "http://localhost:8000/batch_predict",
    json={"customers": customers}
)

results = response.json()
print(f"High propensity customers: {results['high_propensity_count']}")
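
When the customers come from a CSV file, the request body can be assembled with the standard library. This is a sketch under assumptions: the CSV header is assumed to use the same names as the API fields, the helper names are hypothetical, and only three columns are shown for brevity, whereas a real request needs the full feature set.

```python
import csv
import io
import json


def parse_value(text):
    """Convert a CSV cell to int, then float, falling back to the raw value."""
    for cast in (int, float):
        try:
            return cast(text)
        except (TypeError, ValueError):
            continue
    return text


def csv_to_batch_payload(csv_text):
    """Build a {"customers": [...]} body shaped like the batch request above."""
    reader = csv.DictReader(io.StringIO(csv_text))
    customers = [{key: parse_value(value) for key, value in row.items()} for row in reader]
    return {"customers": customers}


# Truncated, made-up rows for illustration only.
csv_text = (
    "age,email_open_rate,life_stage\n"
    "45,0.65,Married_Kids\n"
    "31,0.20,Single\n"
)
payload = csv_to_batch_payload(csv_text)
print(json.dumps(payload, indent=2))
```

The resulting `payload` can be passed as the `json=` argument of the `requests.post` call shown above.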

📈 Dashboard Features

  1. Home Page: Model overview and performance metrics
  2. Single Prediction: Interactive form for individual predictions
  3. Batch Prediction: CSV upload for bulk predictions
  4. Model Insights: Feature importance and model details
  5. MLflow Experiments: View tracked experiments and metrics

🧪 Testing

Run tests:

pytest tests/

Run with coverage:

pytest --cov=src tests/

🐳 Docker Deployment

Build and Run

# Build all services
docker-compose build

# Start all services
docker-compose up -d

# View logs
docker-compose logs -f

# Stop services
docker-compose down

Service URLs

📊 Model Performance

Target Metrics:

  • AUC-ROC: ≥ 0.82
  • Accuracy: ≥ 0.78
  • Precision: ≥ 0.75
  • Recall: ≥ 0.70
  • F1-Score: ≥ 0.73

Current Performance (on test set):

  • AUC-ROC: 0.8234
  • Accuracy: 0.7856
  • Precision: 0.7856
  • Recall: 0.7234
  • F1-Score: 0.7534
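
For reference, the threshold-based metrics above all follow from the four confusion-matrix counts. The sketch below shows the formulas and a comparison against the targets. The counts are hypothetical and are not the project's test-set results, and the function assumes at least one predicted and one actual positive.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        # F1 is the harmonic mean of precision and recall.
        "f1": 2 * precision * recall / (precision + recall),
    }


TARGETS = {"accuracy": 0.78, "precision": 0.75, "recall": 0.70, "f1": 0.73}

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=720, fp=200, tn=1800, fn=280)
for name, target in TARGETS.items():
    status = "ok" if metrics[name] >= target else "below target"
    print(f"{name}: {metrics[name]:.4f} (target {target:.2f}) {status}")
```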

🔧 Configuration

Edit config.yaml to customize:

# Data configuration
data:
  n_samples: 50000
  test_size: 0.2
  validation_size: 0.15
  random_state: 42

# Model hyperparameters
model:
  iterations: 1000
  depth: 6
  learning_rate: 0.1
  l2_leaf_reg: 3.0
  border_count: 128

# Training configuration
training:
  n_trials: 50          # Optuna trials (0 to disable)
  timeout: 3600          # Optimization timeout (seconds)

# MLflow configuration
mlflow:
  experiment_name: "propensity_modeling_cross_selling"
  tracking_uri: "./mlruns"
  model_name: "propensity_catboost_model"

🔄 Model Retraining

# Generate new data
python data/generate_synthetic_data.py --n_samples 50000

# Train with hyperparameter optimization
python src/models/train.py

# Model automatically logged to MLflow
# Review in MLflow UI and promote to Production

📚 MLflow Commands

Experiment Management

# List experiments (MLflow 2.x renamed this command to `mlflow experiments search`)
mlflow experiments list

# Delete experiment (the CLI takes an experiment ID, not a name)
mlflow experiments delete -x <experiment_id>

# Create experiment
mlflow experiments create -n <experiment_name>

Run Management

# List runs
mlflow runs list --experiment-id <experiment_id>

# Delete run
mlflow runs delete --run-id <run_id>

# Restore run
mlflow runs restore --run-id <run_id>

Model Registry

from mlflow.tracking import MlflowClient

client = MlflowClient()

# List registered models (list_registered_models was removed in MLflow 2.0)
for model in client.search_registered_models():
    print(model.name)

# Get model versions
versions = client.get_latest_versions("propensity_catboost_model")

# Transition model stage
client.transition_model_version_stage(
    name="propensity_catboost_model",
    version=1,
    stage="Production"
)

🤝 Contributing

  1. Fork the repository
  2. Create feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open Pull Request

📝 License

This project is licensed under the MIT License.

👥 Authors

  • Data Science Team - Initial development

🙏 Acknowledgments

  • CatBoost team for the excellent gradient boosting library
  • MLflow team for experiment tracking tools
  • Open-source community

Note: This project uses synthetic data for demonstration purposes. Replace with actual customer data for production use while ensuring compliance with data privacy regulations.
