credit-risk-interpretability

Credit risk interpretability platform using RandomForest + SHAP with MLflow tracking and FastAPI serving

Credit Risk Interpretability Platform

A self-contained Python project that generates a synthetic population of Chilean credit applicants, trains and tracks a credit-risk model with MLflow, and exposes FastAPI endpoints that return predictions plus SHAP-powered explanations for denied customers. The README below walks through each layer so you can study the workflow end-to-end.

Project layout
Workflow overview
Dataset & feature table
Training & MLflow tracking
Explainability service (FastAPI + SHAP)
Notebooks and supporting material
Testing & validation
Operational notes & recommendations

Project layout

| Directory | Purpose |
| --- | --- |
| `src/data` | Synthetic Chilean dataset generator, summary helpers, CLI entry point (`generate-data`). |
| `src/model` | Model training pipeline (preprocessing, RandomForest, MLflow logging) and CLI (`train-model`). |
| `src/api` | FastAPI service that loads the MLflow artifact, caches a SHAP explainer, and serves `/predict` + `/explain`. |
| `tests` | Unit/integration tests plus fixtures derived from a reproducible 500-row sample. |
| `notebooks` | Narrative analysis (EDA, model performance, interpretability walk-through). |
| `docs` | Architecture diagrams (Mermaid flows) and future monitoring scaffolds. |
| `outputs/` | Runtime artifacts (confusion matrix, metrics, pipeline joblib) produced during training. |
| `mlruns/` | Default MLflow tracking store for experiments and model registry stages. |

Workflow overview

flowchart TD
  subgraph Dataset
    A[generate-data CLI] --> B[Chilean customer table]
  end
  B --> C["Train (preprocess + RandomForest)"]
  C --> D[MLflow run + artifacts]
  D --> E[FastAPI service picks latest model]
  E --> F[SHAP explainer + /predict + /explain]
  F --> G[Documentation/notebooks capture interpretability stories]

Synthetic data generation: generate-data produces 250k+ rows with demographics, housing, income, credit history, and macroeconomic features tailored to Chilean regions. Each row also carries a logistic approval score and the binary approval flag derived from it.
Training + MLflow: train-model loads the Parquet dataset, preprocesses numerics/categoricals, fits a RandomForestClassifier, computes metrics (ROC-AUC, Brier score, confusion matrix), logs everything to MLflow, and stores artifacts under outputs/.
Explainability service: run-api launches FastAPI; the service loads the MLflow pipeline, samples background data for SHAP, and exposes inference/explanation endpoints.
Documentation & notebooks: Notebooks highlight EDA, model performance, and SHAP-driven explanations for denied customers, tying the full workflow together.

Dataset & feature table

| Feature group | Columns | Notes |
| --- | --- | --- |
| Demographics | `age`, `gender`, `region`, `education_level`, `marital_status`, `housing_status`, `dependents` | Mix of categorical and ordinal values; region sampling favors Metropolitana and Valparaíso. |
| Financial profile | `income`, `employment_years`, `credit_history_months`, `num_loans`, `avg_monthly_payment` | Income is clipped to [200,000, 4,000,000] CLP; `avg_monthly_payment` scales with income. |
| Risk signals | `delinquency_rate`, `credit_inquiries`, `approval_score`, `approved` | Score combines income, education, regional boost, employment tenure, and delinquency; `approved` is derived from the score via a logistic probability. |
| Macros | `regional_cpi`, `regional_unemployment` | Drawn from normal distributions around Chilean averages to mimic macro cycles. |

The generator also exposes dataset_summary for quick sanity checks and save_dataset for CSV/Parquet exports.
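To make the score-to-flag step concrete, here is a minimal sketch of how a single applicant could be drawn. It is not the project's generator: the function name `draw_applicant`, the log-normal income parameters, and every coefficient are assumptions chosen for illustration. Only the income clipping range and the idea of turning a score into a logistic probability come from the table above.

```python
import math
import random

INCOME_MIN, INCOME_MAX = 200_000, 4_000_000  # CLP clipping range from the feature table


def draw_applicant(rng: random.Random) -> dict:
    """Draw one synthetic applicant (illustrative coefficients, not the project's)."""
    # Log-normal income, clipped to the documented range.
    income = min(max(rng.lognormvariate(13.5, 0.6), INCOME_MIN), INCOME_MAX)
    employment_years = rng.randint(0, 30)
    delinquency_rate = rng.betavariate(1.0, 12.0)  # mostly small, occasionally large

    # Linear score: income and tenure help, delinquency hurts (assumed weights).
    score = (
        -1.0
        + 1.2 * math.log(income / 800_000)
        + 0.05 * employment_years
        - 6.0 * delinquency_rate
    )
    # The logistic link turns the score into an approval probability...
    probability = 1.0 / (1.0 + math.exp(-score))
    # ...and a Bernoulli draw turns that probability into the binary flag.
    approved = int(rng.random() < probability)

    return {
        "income": round(income),
        "employment_years": employment_years,
        "delinquency_rate": round(delinquency_rate, 4),
        "approval_score": round(score, 4),
        "approved": approved,
    }


rng = random.Random(42)  # fixed seed so the sample is reproducible
sample = [draw_applicant(rng) for _ in range(500)]
approval_rate = sum(row["approved"] for row in sample) / len(sample)
```

Because the flag is sampled rather than thresholded, two applicants with the same score can end up with different outcomes, which is what keeps the downstream classification task from being trivially separable.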

Training & MLflow tracking

| Command | Purpose |
| --- | --- |
| `poetry run generate-data --rows 250000 --prefix data/chilean_credit_data` | Create the synthetic dataset (writes Parquet, optional CSV). |
| `poetry run train-model --data data/chilean_credit_data.parquet` | Fit the RandomForest, log metrics/params to MLflow, save confusion matrix + metrics JSON + pipeline joblib. |
| `mlflow ui` | Inspect tracked runs, compare experiments, and promote runs into the registry if desired. |

MLflow artifacts

Model artifact: stored as a scikit-learn pipeline under mlruns/<experiment_id>/<run_id>/artifacts/model. Point the FastAPI service at this path through the MODEL_URI environment variable.
Metrics/artifacts: outputs/confusion_matrix.png, outputs/metrics.json, and outputs/pipeline.joblib are logged with every run.
Parameters: Sample counts, train/test split size, and classifier hyperparameters are logged for reproducibility.
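For readers who want to see what the logged numbers mean, the sketch below computes ROC-AUC, the Brier score, and a confusion matrix by hand on six toy predictions and serializes them the way a metrics file might look. The project most likely relies on scikit-learn's `roc_auc_score`, `brier_score_loss`, and `confusion_matrix` instead; the pure-Python versions, the toy data, and the JSON key names here are assumptions for illustration only.

```python
import json


def brier_score(y_true, y_prob):
    """Mean squared gap between the predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_prob):
    """Chance that a random positive outranks a random negative (ties count half)."""
    positives = [p for y, p in zip(y_true, y_prob) if y == 1]
    negatives = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if pos > neg else 0.5 if pos == neg else 0.0
        for pos in positives
        for neg in negatives
    )
    return wins / (len(positives) * len(negatives))


def confusion_matrix(y_true, y_prob, threshold=0.5):
    """Counts laid out as [[TN, FP], [FN, TP]], matching scikit-learn's convention."""
    matrix = [[0, 0], [0, 0]]
    for y, p in zip(y_true, y_prob):
        matrix[y][int(p >= threshold)] += 1
    return matrix


# Toy labels (1 = approved) and predicted approval probabilities.
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.55, 0.1]

metrics = {
    "roc_auc": roc_auc(y_true, y_prob),
    "brier_score": brier_score(y_true, y_prob),
    "confusion_matrix": confusion_matrix(y_true, y_prob),
}
metrics_json = json.dumps(metrics, indent=2)  # hypothetical metrics.json payload
```

ROC-AUC only cares about ranking, while the Brier score also penalizes miscalibrated probabilities, so tracking both catches a model that orders customers well but reports overconfident scores.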

Interpretability integration: During training the approval score and feature counts are retained to align with downstream SHAP explanations.

Explainability service (FastAPI + SHAP)

Endpoints

| Path | Method | Description |
| --- | --- | --- |
| `/predict` | POST | Returns `{ approved: bool, probability: float }`. When `include_explanation=true`, the response also includes SHAP contributions, which are computed only when requested. |
| `/explain` | POST | Always computes the SHAP explanation and returns the base value plus per-feature contributions. Useful for offline denial review. |

Request schema

{
  "customer": {
    "age": 45,
    "gender": "f",
    "region": "Metropolitana",
    "education_level": "media",
    "marital_status": "casado",
    "housing_status": "arriendo",
    "dependents": 2,
    "income": 800000,
    "employment_years": 6,
    "credit_history_months": 84,
    "num_loans": 2,
    "avg_monthly_payment": 150000,
    "delinquency_rate": 0.05,
    "credit_inquiries": 1,
    "regional_cpi": 0.034,
    "regional_unemployment": 0.086
  },
  "include_explanation": true
}
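A client can assemble this payload with the standard library alone. The sketch below builds the POST request without sending it, so it runs even when the API is offline. The base URL (`http://localhost:8000`) and the helper name `build_request` are assumptions, and whether `/explain` accepts the same body as `/predict` should be checked against the service's actual schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed host and port; adjust to your deployment


def build_request(path: str, customer: dict, include_explanation: bool = False):
    """Build (but do not send) a JSON POST request for /predict or /explain."""
    payload = {"customer": customer, "include_explanation": include_explanation}
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same customer as in the request schema above.
customer = {
    "age": 45,
    "gender": "f",
    "region": "Metropolitana",
    "education_level": "media",
    "marital_status": "casado",
    "housing_status": "arriendo",
    "dependents": 2,
    "income": 800000,
    "employment_years": 6,
    "credit_history_months": 84,
    "num_loans": 2,
    "avg_monthly_payment": 150000,
    "delinquency_rate": 0.05,
    "credit_inquiries": 1,
    "regional_cpi": 0.034,
    "regional_unemployment": 0.086,
}

request = build_request("/predict", customer, include_explanation=True)

# With the API running, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```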

SHAP workflow

On startup, the service loads the preprocessor + classifier from MLflow and samples 1k rows from the background data path (BACKGROUND_DATA_PATH) to build a shap.TreeExplainer.
/predict uses the pipeline’s predict_proba to return a probability (approval threshold 0.5) and, when requested, adds SHAP contributions computed with the cached explainer for transparency dashboards.
/explain recomputes SHAP values for the request, exports the base value, and lists feature contributions so analysts can see why a customer was denied.
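The base value and contributions are easiest to read through SHAP's additivity property: the base value plus all per-feature contributions reconstructs the model output for that customer. The sketch below applies that to a made-up `/explain` payload and ranks the features that pushed toward denial. The field names (`base_value`, `contributions`), the numbers, and the helper `summarize_explanation` are hypothetical, and the example assumes contributions are expressed in probability units for the "approved" class.

```python
def summarize_explanation(base_value, contributions, top_k=3):
    """Rebuild the model output and rank the features that pushed toward denial."""
    # SHAP values are additive: base value + all contributions = model output.
    reconstructed = base_value + sum(contributions.values())
    # Negative contributions lower the approval probability; most negative first.
    denial_drivers = sorted(
        ((name, value) for name, value in contributions.items() if value < 0),
        key=lambda item: item[1],
    )[:top_k]
    return reconstructed, denial_drivers


# Hypothetical /explain payload; field names and numbers are invented.
explanation = {
    "base_value": 0.62,
    "contributions": {
        "delinquency_rate": -0.21,
        "income": -0.09,
        "credit_inquiries": -0.04,
        "employment_years": 0.05,
        "regional_unemployment": -0.02,
    },
}

probability, drivers = summarize_explanation(
    explanation["base_value"], explanation["contributions"]
)
```

In this made-up case the average applicant sits at 0.62, the customer ends at roughly 0.31, and `delinquency_rate` accounts for the largest share of the drop, which is the kind of statement a denial review would record.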

Lifespan diagram

flowchart TD
  subgraph API startup
    A[Load MLflow pipeline] --> B[Sample background data]
    B --> C[Initialize SHAP TreeExplainer]
  end
  subgraph Request
    D["/predict"] --> E[Transform input]
    E --> F[Predict probability]
    F --> G{explanation requested?}
    G -->|yes| H[Return SHAP contributions]
    G -->|no| I[Return prediction only]
  end
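The request branch of the diagram can be sketched without FastAPI by injecting the model and explainer as plain callables. This is not the service's handler: `handle_predict`, the stand-in functions, and the `explanation` response key are assumptions. What it does take from the README is the 0.5 threshold and the rule that SHAP work happens only when the caller asks for it.

```python
from typing import Callable


def handle_predict(
    customer: dict,
    include_explanation: bool,
    predict_proba: Callable[[dict], float],
    explain: Callable[[dict], dict],
    threshold: float = 0.5,
) -> dict:
    """Mirror the request branch of the diagram with injected callables."""
    probability = predict_proba(customer)
    response = {"approved": probability >= threshold, "probability": probability}
    if include_explanation:
        # SHAP is computed only on request, which keeps plain predictions cheap.
        response["explanation"] = explain(customer)
    return response


# Stand-ins for the loaded pipeline and the cached explainer (hypothetical).
explain_calls = []


def fake_predict_proba(customer: dict) -> float:
    return 0.31


def fake_explain(customer: dict) -> dict:
    explain_calls.append(customer)  # record the call to show when SHAP runs
    return {"base_value": 0.62, "contributions": {"delinquency_rate": -0.31}}


plain = handle_predict({"age": 45}, False, fake_predict_proba, fake_explain)
explained = handle_predict({"age": 45}, True, fake_predict_proba, fake_explain)
```

Passing the callables in, rather than importing a global model, also makes the branch easy to exercise in tests with a fixture model.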

Notebooks and supporting material

notebooks/01-eda.ipynb: Validate regional income spreads, approval rates, and highlight the distribution of approved vs. denied cases.
notebooks/02-model-performance.ipynb: Visualize the confusion matrix, ROC curve, Brier score, and calibration bucket analysis to evaluate classifier reliability.
notebooks/03-interpretability.ipynb: Simulate a denied customer, call /explain via the FastAPI test client, and record the SHAP force plot values.

Each notebook includes markdown commentary describing what to look for (e.g., “Why is the Maule customer denied?”) so learners can follow the reasoning for interpretability.

Testing & validation

| Suite | Command | Notes |
| --- | --- | --- |
| Unit (dataset) | `python -m pytest tests/test_data_generator.py` | Validates distributions (income bounds, region spread, binary approval). |
| Integration (API) | `python -m pytest tests/test_api.py` | Boots FastAPI, points to a fixture model, and checks `/predict` + `/explain` responses. |
| Notebook execution | `poetry run python -m nbconvert --to notebook --execute notebooks/01-eda.ipynb` | Ensures narrative notebooks run with current dependencies. |

Operational notes & recommendations

Fairness monitoring: Log approval ratios by region, gender, and education_level as MLflow metrics and compare them batch-to-batch. Trigger alerts if disparities exceed ~5%. Document findings in notebooks/01-eda.ipynb.
Drift detection: Regenerate a small synthetic sample when macro features (CPI/unemployment) shift; compare histograms against the training distribution to flag drift before retraining.
MLflow registry workflow: After validating ROC-AUC and Brier score, promote the run to Staging/Production. Only register runs that also preserve interpretable SHAP rankings (e.g., income, debt ratio, delinquency should remain dominant).
API observability: Run the FastAPI service behind your API gateway and logging stack. Emit structured JSON logs (for example with python-json-logger) for every /predict request, and record whether a SHAP explanation was returned.
Next steps for study:
1. Add a Streamlit or Literate UI that shows denied customers, their SHAP breakdowns, and simulation sliders for income or unemployment.
2. Schedule a GitHub Actions workflow that regenerates data + retrains + reruns tests on merge so the MLflow run IDs remain traceable.
3. Build a docs/interpretability-report.md that explains how SHAP features map to business policies (e.g., “High delinquency pushes the model to deny credit”).
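The fairness and drift recommendations above can be prototyped in a few lines before wiring them into MLflow. The sketch below computes per-group approval rates with their largest gap, and a population stability index (PSI) over shared histogram bins. PSI is swapped in here as one common way to "compare histograms"; the README does not name a specific drift metric. The function names, toy data, bin edges, the reading of "~5%" as 0.05 in absolute approval rate, and the 0.25 PSI rule of thumb are all assumptions.

```python
import bisect
import math
from collections import defaultdict


def approval_rate_gap(records, group_key):
    """Return per-group approval rates and the max-min gap between them."""
    totals = defaultdict(int)
    approvals = defaultdict(int)
    for record in records:
        totals[record[group_key]] += 1
        approvals[record[group_key]] += record["approved"]
    rates = {group: approvals[group] / totals[group] for group in totals}
    return rates, max(rates.values()) - min(rates.values())


def bin_shares(values, edges, eps=1e-6):
    """Share of values per bin; values outside the edges are ignored."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        index = bisect.bisect_right(edges, value) - 1
        if value == edges[-1]:
            index = len(counts) - 1  # include the right-most edge in the last bin
        if 0 <= index < len(counts):
            counts[index] += 1
    # eps keeps empty bins from producing log(0) or division by zero.
    return [max(count / len(values), eps) for count in counts]


def population_stability_index(expected, actual, edges):
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    return sum(
        (a - e) * math.log(a / e)
        for e, a in zip(bin_shares(expected, edges), bin_shares(actual, edges))
    )


# Fairness: toy batch with two regions.
records = (
    [{"region": "Metropolitana", "approved": a} for a in (1, 1, 1, 0)]
    + [{"region": "Maule", "approved": a} for a in (1, 0, 1, 0)]
)
rates, gap = approval_rate_gap(records, "region")
GAP_ALERT = 0.05  # assumed reading of the "~5%" threshold
fairness_alert = gap > GAP_ALERT

# Drift: toy unemployment samples for training vs. a recent batch.
train_unemployment = [0.07, 0.08, 0.08, 0.09, 0.09, 0.10]
recent_unemployment = [0.09, 0.10, 0.10, 0.11, 0.11, 0.12]
edges = [0.06, 0.08, 0.10, 0.12]
psi = population_stability_index(train_unemployment, recent_unemployment, edges)
drift_alert = psi > 0.25  # common rule of thumb, not a project setting
```

Both numbers are scalars, so they fit naturally as per-batch MLflow metrics that can be compared run to run.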

Quick reference commands

# regenerate dataset
poetry run generate-data --rows 250000 --prefix data/chilean_credit_data --csv

# train the RandomForest and log to MLflow
poetry run train-model --data data/chilean_credit_data.parquet

# launch explainability FastAPI
export MODEL_URI="mlruns/0/<run_id>/artifacts/model"
export BACKGROUND_DATA_PATH="data/chilean_credit_data.parquet"
poetry run run-api

# run the test suite and execute the EDA notebook
python -m pytest
poetry run python -m nbconvert --to notebook --execute notebooks/01-eda.ipynb

For a study-oriented walkthrough, follow the notebooks top-to-bottom: start with EDA, move to model diagnostics, and finish by replaying the SHAP explanations that justify why a customer was denied credit. Each notebook references the FastAPI service and MLflow artifacts so you can trace every step from synthetic data to an interpretable denial.
