
Telco Customer Churn Prediction

Built a production-ready churn prediction system that identifies which customers will leave, why, and how confident the model is — validated statistically, explained with SHAP, and deployed via FastAPI + Docker.


Business Dashboard

Churn Dashboard


Key Business Findings

  • 26.5% overall churn rate — 1 in 4 customers leaves every year
  • Month-to-month contracts churn at 42.7% vs 2.8% for two-year contracts — a 15x difference
  • New customers (0–12 months) churn at 47.4% — the single highest-risk group
  • 2,186 customers currently flagged as High Risk
  • Higher monthly charges (>$60) correlate strongly with churn — 33.7% churn rate

Retention recommendation: Target month-to-month customers in their first 12 months with a contract upgrade offer. This single segment accounts for the majority of preventable churn.


Model Comparison

Three models benchmarked to validate XGBoost's improvement over simpler baselines:

| Model | ROC-AUC | PR-AUC | Recall (Churn) | Accuracy |
|---|---|---|---|---|
| Dummy Classifier | 0.500 | 0.265 | 0.0% | 73.5% |
| Logistic Regression | 0.841 | 0.633 | 55.3% | 79.8% |
| XGBoost (ours) | 0.865 | 0.712 | 77.3% | 78.2% |

XGBoost outperforms Logistic Regression by 7.9 percentage points of PR-AUC (0.633 → 0.712) and 22 points of recall on the churn class (55.3% → 77.3%). PR-AUC is the primary metric here, since ROC-AUC overstates performance on imbalanced datasets.
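The gap between the two metrics is easy to reproduce. A minimal scikit-learn sketch on imbalanced synthetic data shaped like this problem (~26% positives); the data and model are stand-ins, not the repo's pipeline:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Telco data: ~26% positive (churn) class.
X, y = make_classification(n_samples=4000, weights=[0.74], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

roc = roc_auc_score(y_te, proba)           # reads optimistically on imbalanced data
pr = average_precision_score(y_te, proba)  # PR-AUC: focused on the minority (churn) class
print(f"ROC-AUC={roc:.3f}  PR-AUC={pr:.3f}")
```

On imbalanced data the ROC-AUC figure will typically sit well above the PR-AUC figure for the same model, which is exactly why PR-AUC is the honest headline number.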


SQL Analysis

Churn drivers validated directly via SQL on the customer dataset.

By Contract Type

| Contract | Customers | Churned | Churn Rate |
|---|---|---|---|
| Month-to-month | 3,875 | 1,655 | 42.7% |
| One year | 1,473 | 166 | 11.3% |
| Two year | 1,695 | 48 | 2.8% |

By Tenure Band

| Tenure | Customers | Churn Rate |
|---|---|---|
| 0–12 months | 2,186 | 47.4% |
| 13–24 months | 1,024 | 28.7% |
| 25–48 months | 1,594 | 20.4% |
| 49–72 months | 2,239 | 9.5% |

By Monthly Charges

| Charge Band | Customers | Churn Rate |
|---|---|---|
| High ($60–$90) | 2,392 | 33.7% |
| Very High (>$90) | 1,744 | 32.9% |
| Medium ($30–$60) | 1,254 | 26.1% |
| Low (<$30) | 1,653 | 9.8% |

Full query files in /sql_analysis/ — executed via DuckDB on data/churn_predictions.csv.
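All three segment tables follow the same GROUP BY pattern. The repo runs the queries with DuckDB over the predictions CSV; this sketch uses stdlib sqlite3 with a toy in-memory table so the query shape is runnable anywhere:

```python
import sqlite3

# The repo executes these queries via DuckDB on data/churn_predictions.csv.
# Stand-in: stdlib sqlite3 with a tiny in-memory table, same query shape.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (contract TEXT, churned INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Month-to-month", 1), ("Month-to-month", 0), ("One year", 0), ("Two year", 0)],
)

# Churn rate by contract type, mirroring the "By Contract Type" table above.
rows = conn.execute("""
    SELECT contract,
           COUNT(*)                                   AS customers,
           SUM(churned)                               AS churned,
           ROUND(100.0 * SUM(churned) / COUNT(*), 1)  AS churn_rate_pct
    FROM customers
    GROUP BY contract
    ORDER BY churn_rate_pct DESC
""").fetchall()
for row in rows:
    print(row)
```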


Statistical Validation

Churn drivers validated statistically before modeling — not just assumed from EDA:

  • Chi-Square Test — Contract type vs churn: statistically significant association confirmed
  • Independent T-Test — Monthly charges differ significantly between churned and retained customers
  • Effect Size (Cohen's d) — Quantified magnitude of pricing impact beyond p-value significance

This ensures the model is built on validated signals, not noise.
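The three tests map directly onto SciPy. A hedged sketch: the contingency table mirrors the SQL results above, while the monthly-charge samples are synthetic stand-ins, not the real data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Chi-square: contract type vs churn (counts taken from the SQL tables above).
table = np.array([[1655, 2220],   # month-to-month: churned / retained
                  [166, 1307],    # one year
                  [48, 1647]])    # two year
chi2, p_chi, dof, _ = stats.chi2_contingency(table)

# Welch t-test: monthly charges, churned vs retained (synthetic stand-in samples).
churned = rng.normal(74, 20, 500)
retained = rng.normal(61, 25, 1500)
t, p_t = stats.ttest_ind(churned, retained, equal_var=False)

# Cohen's d: effect size, i.e. magnitude beyond p-value significance.
pooled_sd = np.sqrt((churned.var(ddof=1) + retained.var(ddof=1)) / 2)
d = (churned.mean() - retained.mean()) / pooled_sd
print(f"chi2 p={p_chi:.2e}  t-test p={p_t:.2e}  Cohen's d={d:.2f}")
```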


Why This Approach

| Decision | Reason |
|---|---|
| PR-AUC over ROC-AUC | Dataset is imbalanced (26% churn); ROC-AUC overstates performance |
| F1-based threshold tuning | Default 0.5 threshold is suboptimal for imbalanced classes |
| scale_pos_weight | Handles class imbalance without synthetic oversampling |
| Calibration analysis | Ensures predicted probabilities are reliable for business decisions |
| SHAP over feature importance | Built-in XGBoost importance is biased toward high-cardinality features |
| Baseline comparison | Quantifies XGBoost's real improvement over simpler alternatives |
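The F1-based threshold tuning above can be sketched in a few lines: sweep every candidate threshold from the precision–recall curve and keep the one maximising F1 on the churn class. Data and model are again synthetic stand-ins:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Imbalanced stand-in data (~26% positives), as in the Telco set.
X, y = make_classification(n_samples=4000, weights=[0.74], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

# Sweep all candidate thresholds; pick the one that maximises F1.
prec, rec, thresholds = precision_recall_curve(y_te, proba)
f1 = 2 * prec * rec / np.clip(prec + rec, 1e-12, None)
best = np.argmax(f1[:-1])  # last prec/rec point has no matching threshold
print(f"best threshold={thresholds[best]:.3f}  F1={f1[best]:.3f}")
```

The chosen threshold usually lands well away from the default 0.5 on imbalanced data, which is the point of tuning it.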

Model Explainability (SHAP)

Global Feature Importance

SHAP summary

SHAP summary plot ranking all features by mean absolute impact on churn predictions.

Individual Customer Explanation

SHAP waterfall

SHAP waterfall plot showing exactly how each feature pushes a single customer's churn probability up or down.

Top 3 churn drivers: tenure · contract type · monthly charges — consistent across both statistical tests and SHAP.


Model Evaluation

ROC Curve

ROC Curve

Precision–Recall Curve

PR Curve

Calibration Curve

Calibration

Calibration confirms predicted probabilities align with observed churn frequencies — making scores reliable for business threshold decisions.
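The check behind that plot can be reproduced with scikit-learn's `calibration_curve`; the calibrated model here is a stand-in, not the repo's XGBoost artifact:

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, weights=[0.74], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

# Wrap a stand-in model in isotonic calibration, then compare predicted
# probabilities against observed positive frequencies per bin.
model = CalibratedClassifierCV(LogisticRegression(max_iter=1000), method="isotonic")
proba = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
frac_pos, mean_pred = calibration_curve(y_te, proba, n_bins=10)

# A well-calibrated model keeps |observed - predicted| small in every bin.
max_gap = np.max(np.abs(frac_pos - mean_pred))
print(f"max calibration gap across bins: {max_gap:.3f}")
```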


Application

Architecture

Streamlit UI
    │
    ▼
FastAPI (/predict)
    │
    ▼
Preprocessing Pipeline
    │
    ▼
XGBoost → Probability Calibration
    │
    ▼
Churn Score + SHAP Explanation
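The preprocessing → model → score chain in the diagram can be sketched as a single scikit-learn Pipeline. Column names and the gradient-boosting stand-in are assumptions for illustration; the real service loads the repo's calibrated XGBoost artifact:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy training frame; features are illustrative stand-ins for the Telco columns.
df = pd.DataFrame({
    "contract": ["Month-to-month", "Two year", "One year", "Month-to-month"] * 50,
    "tenure": [2, 60, 30, 5] * 50,
    "monthly_charges": [85.0, 25.0, 55.0, 95.0] * 50,
})
y = [1, 0, 0, 1] * 50

# Preprocessing: one-hot encode categoricals, pass numerics through.
pre = ColumnTransformer(
    [("contract", OneHotEncoder(handle_unknown="ignore"), ["contract"])],
    remainder="passthrough",
)
pipe = Pipeline([("pre", pre), ("model", GradientBoostingClassifier(random_state=0))])
pipe.fit(df, y)

# One customer in, one churn score out -- the shape of the /predict response.
new_customer = pd.DataFrame([{"contract": "Month-to-month", "tenure": 3,
                              "monthly_charges": 90.0}])
score = pipe.predict_proba(new_customer)[0, 1]
print(f"churn score: {score:.2f}")
```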

Streamlit Dashboard

Streamlit UI

Upload a customer dataset or enter individual details to get real-time churn probability scores with SHAP explanations.

FastAPI Inference API

FastAPI Swagger

REST API exposing /predict for JSON-based churn inference — documented via auto-generated Swagger UI.


Stack

| Layer | Tools |
|---|---|
| Business Dashboard | Power BI |
| Modeling | XGBoost · scikit-learn |
| Explainability | SHAP |
| Statistical Tests | SciPy — Chi-Square · T-Test · Cohen's d |
| SQL Analysis | DuckDB |
| API | FastAPI · Uvicorn |
| Frontend | Streamlit |
| Deployment | Docker · Docker Compose |
| Language | Python 3.9+ |

Project Structure

telco-churn/
├── src/
│   └── training/
│       ├── train.py            # Main training pipeline
│       ├── evaluate.py         # Model evaluation
│       └── baseline.py         # Baseline model comparison
├── api/                        # FastAPI inference API
├── app_streamlit/              # Streamlit frontend
├── configs/                    # Config-driven training params
├── models/                     # Saved model artifacts
├── reports/                    # Dashboard, plots, screenshots
├── data/                       # Raw data + predictions CSV
├── sql_analysis/               # SQL queries + DuckDB script
├── tests/
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
└── README.md

Run Locally

git clone https://github.com/pulipakav1/churn_rate.git
cd churn_rate

pip install -r requirements.txt

# Train model
python -m src.training.train

# Run baseline comparison
python src/training/baseline.py

# Export predictions for dashboard
python src/export_predictions.py

# Run SQL analysis
python sql_analysis/run_queries.py

# Start API
uvicorn api.main:app --reload --port 8000

# Start Streamlit UI
streamlit run app_streamlit/app.py

Docker

docker build -t telco-churn-api .
docker run -p 8000:8000 telco-churn-api
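The project structure above also lists a docker-compose.yml; a compose file along these lines could run the API and UI together (service names, ports, and commands here are assumptions, not the repo's actual file):

```yaml
# Hypothetical docker-compose.yml shape -- services, ports, and commands
# are illustrative assumptions.
services:
  api:
    build: .
    ports:
      - "8000:8000"
  ui:
    build: .
    command: streamlit run app_streamlit/app.py --server.port 8501
    ports:
      - "8501:8501"
    depends_on:
      - api
```

With a file like this in place, `docker compose up --build` starts both services.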

What Makes This Different

Most churn projects stop at a notebook with an accuracy score. This one:

  • Statistically validates churn drivers before modeling
  • Benchmarks against baselines — quantifies real improvement
  • Uses PR-AUC — the correct metric for imbalanced classification
  • Optimizes the decision threshold for business recall goals
  • Explains every prediction — SHAP at global and individual level
  • Deploys end-to-end — working API + UI in Docker
  • Delivers business insights — SQL analysis + Power BI dashboard for non-technical stakeholders

License

MIT — use and modify freely.

