Built a production-ready churn prediction system that identifies which customers will leave, why, and how confident the model is — validated statistically, explained with SHAP, and deployed via FastAPI + Docker.
- 26.5% overall churn rate — roughly 1 in 4 customers leaves each year
- Month-to-month contracts churn at 42.7% vs 2.8% for two-year contracts — a 15x difference
- New customers (0–12 months) churn at 47.4% — the single highest-risk group
- 2,186 customers currently flagged as High Risk
- Higher monthly charges (>$60) correlate strongly with churn — 33.7% churn rate
Retention recommendation: Target month-to-month customers in their first 12 months with a contract upgrade offer. This single segment accounts for the majority of preventable churn.
Three models benchmarked to validate XGBoost's improvement over simpler baselines:
| Model | ROC-AUC | PR-AUC | Recall (Churn) | Accuracy |
|---|---|---|---|---|
| Dummy Classifier | 0.500 | 0.265 | 0.0% | 73.5% |
| Logistic Regression | 0.841 | 0.633 | 55.3% | 79.8% |
| XGBoost (ours) | 0.865 | 0.712 | 77.3% | 78.2% |
XGBoost outperforms Logistic Regression by +7.9 points PR-AUC and +22 points recall on the churn class. PR-AUC is the primary metric because ROC-AUC overstates performance on imbalanced datasets.
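The metric choice can be sketched directly: on synthetic scores with the same ~26.5% positive rate (illustrative numbers, not the project's), ROC-AUC reads noticeably higher than PR-AUC for the very same model.

```python
# Why PR-AUC is the headline metric on imbalanced data (illustrative sketch).
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(42)
n = 2000
y = (rng.random(n) < 0.265).astype(int)   # ~26.5% positives, like the churn rate
scores = 0.3 * y + rng.random(n)          # imperfect model: class scores overlap

roc = roc_auc_score(y, scores)
pr = average_precision_score(y, scores)   # PR-AUC (average precision)
print(f"ROC-AUC: {roc:.3f}  PR-AUC: {pr:.3f}")
```

ROC-AUC benefits from the abundant true negatives; PR-AUC ignores them, which gives a harsher and more honest read at 26.5% prevalence.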
Churn drivers validated directly via SQL on the customer dataset.
| Contract | Customers | Churned | Churn Rate |
|---|---|---|---|
| Month-to-month | 3,875 | 1,655 | 42.7% |
| One year | 1,473 | 166 | 11.3% |
| Two year | 1,695 | 48 | 2.8% |
| Tenure | Customers | Churn Rate |
|---|---|---|
| 0–12 months | 2,186 | 47.4% |
| 13–24 months | 1,024 | 28.7% |
| 25–48 months | 1,594 | 20.4% |
| 49–72 months | 2,239 | 9.5% |
| Charge Band | Customers | Churn Rate |
|---|---|---|
| High ($60–$90) | 2,392 | 33.7% |
| Very High (>$90) | 1,744 | 32.9% |
| Medium ($30–$60) | 1,254 | 26.1% |
| Low (<$30) | 1,653 | 9.8% |
Full query files in `sql_analysis/` — executed via DuckDB on `data/churn_predictions.csv`.
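The aggregation behind the contract table is plain SQL. The repo runs its queries with DuckDB against `data/churn_predictions.csv`; the sketch below uses stdlib `sqlite3` on toy rows purely so it is self-contained, and the column names (`contract`, `churned`) are assumptions, not necessarily the repo's schema.

```python
# Sketch of the churn-rate-by-contract aggregation on toy rows.
import sqlite3

rows = [  # (contract, churned) — toy rows, not the real dataset
    ("Month-to-month", 1), ("Month-to-month", 1), ("Month-to-month", 0),
    ("Two year", 0), ("Two year", 0), ("One year", 0), ("One year", 1),
]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (contract TEXT, churned INTEGER)")
con.executemany("INSERT INTO customers VALUES (?, ?)", rows)

result = con.execute("""
    SELECT contract,
           COUNT(*)                       AS customers,
           SUM(churned)                   AS churned,
           ROUND(100.0 * AVG(churned), 1) AS churn_rate_pct
    FROM customers
    GROUP BY contract
    ORDER BY churn_rate_pct DESC
""").fetchall()
for row in result:
    print(row)
```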
Churn drivers validated statistically before modeling — not just assumed from EDA:
- Chi-Square Test — Contract type vs churn: statistically significant association confirmed
- Independent T-Test — Monthly charges differ significantly between churned and retained customers
- Effect Size (Cohen's d) — Quantified magnitude of pricing impact beyond p-value significance
This ensures the model is built on validated signals, not noise.
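A minimal sketch of the three tests with SciPy. The contingency counts come from the contract table above; the monthly-charge samples are synthetic stand-ins, so the t-statistic and effect size are illustrative only.

```python
# Sketch of the statistical validation: chi-square, t-test, Cohen's d.
import numpy as np
from scipy import stats

# Chi-square: contract type vs churn (counts from the contract table above)
table = np.array([[1655, 166, 48],       # churned: M2M, one-year, two-year
                  [2220, 1307, 1647]])   # retained per contract type
chi2, p_chi, dof, _ = stats.chi2_contingency(table)

# Independent t-test: monthly charges, churned vs retained (synthetic samples)
rng = np.random.default_rng(0)
charges_churned = rng.normal(74, 20, 500)
charges_retained = rng.normal(61, 25, 500)
t_stat, p_t = stats.ttest_ind(charges_churned, charges_retained, equal_var=False)

# Cohen's d: effect size beyond p-value significance (pooled SD)
def cohens_d(a, b):
    pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled

d = cohens_d(charges_churned, charges_retained)
print(f"chi2 p={p_chi:.2e}  t-test p={p_t:.2e}  Cohen's d={d:.2f}")
```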
| Decision | Reason |
|---|---|
| PR-AUC over ROC-AUC | Dataset is imbalanced (26% churn) — ROC-AUC overstates performance |
| F1-based threshold tuning | Default 0.5 threshold is suboptimal for imbalanced classes |
| `scale_pos_weight` | Handles class imbalance without synthetic oversampling |
| Calibration analysis | Ensures predicted probabilities are reliable for business decisions |
| SHAP over feature importance | Built-in XGBoost importance is biased toward high-cardinality features |
| Baseline comparison | Quantifies XGBoost's real improvement over simpler alternatives |
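Two of these decisions — the `scale_pos_weight` heuristic and F1-based threshold tuning — can be sketched on synthetic data. A logistic model stands in for XGBoost so the snippet stays self-contained; all numbers are illustrative.

```python
# Sketch: class-weight heuristic + F1-based decision-threshold tuning.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# ~26.5% positive class, mimicking the churn rate
X, y = make_classification(n_samples=3000, weights=[0.735], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

# scale_pos_weight heuristic for XGBoost: negatives / positives
spw = (y_tr == 0).sum() / (y_tr == 1).sum()

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)  # stand-in model
proba = clf.predict_proba(X_val)[:, 1]

# Sweep all candidate thresholds and pick the F1-maximizing one
prec, rec, thr = precision_recall_curve(y_val, proba)
f1 = 2 * prec * rec / (prec + rec + 1e-12)
best = thr[np.argmax(f1[:-1])]   # last (prec, rec) point has no threshold
print(f"scale_pos_weight~{spw:.2f}  best threshold={best:.2f}  F1={f1.max():.3f}")
```

The tuned threshold typically lands below the default 0.5, trading a little precision for the recall the retention team actually needs.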
SHAP summary plot ranking all features by mean absolute impact on churn predictions.
SHAP waterfall plot showing exactly how each feature pushes a single customer's churn probability up or down.
Top 3 churn drivers: tenure · contract type · monthly charges — consistent across both statistical tests and SHAP.
Calibration confirms predicted probabilities align with observed churn frequencies — making scores reliable for business threshold decisions.
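The reliability check can be sketched with `sklearn.calibration.calibration_curve`: bin predicted probabilities and compare each bin's mean prediction with its observed positive rate. The scores below are synthetic and well calibrated by construction.

```python
# Sketch of a calibration (reliability) check on synthetic scores.
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(1)
p_pred = rng.random(20000)                       # predicted churn probabilities
y = (rng.random(20000) < p_pred).astype(int)     # outcomes drawn at those rates

frac_pos, mean_pred = calibration_curve(y, p_pred, n_bins=10)
# For calibrated scores, each bin's observed rate tracks its mean prediction
max_gap = np.abs(frac_pos - mean_pred).max()
print(f"max |observed - predicted| across bins: {max_gap:.3f}")
```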
```
Streamlit UI
     │
     ▼
FastAPI (/predict)
     │
     ▼
Preprocessing Pipeline
     │
     ▼
XGBoost → Probability Calibration
     │
     ▼
Churn Score + SHAP Explanation
```
Upload a customer dataset or enter individual details to get real-time churn probability scores with SHAP explanations.
REST API exposing `/predict` for JSON-based churn inference — documented via auto-generated Swagger UI.
| Layer | Tools |
|---|---|
| Business Dashboard | Power BI |
| Modeling | XGBoost · scikit-learn |
| Explainability | SHAP |
| Statistical Tests | SciPy — Chi-Square · T-Test · Cohen's d |
| SQL Analysis | DuckDB |
| API | FastAPI · Uvicorn |
| Frontend | Streamlit |
| Deployment | Docker · Docker Compose |
| Language | Python 3.9+ |
```
telco-churn/
├── src/
│   └── training/
│       ├── train.py       # Main training pipeline
│       ├── evaluate.py    # Model evaluation
│       └── baseline.py    # Baseline model comparison
├── api/                   # FastAPI inference API
├── app_streamlit/         # Streamlit frontend
├── configs/               # Config-driven training params
├── models/                # Saved model artifacts
├── reports/               # Dashboard, plots, screenshots
├── data/                  # Raw data + predictions CSV
├── sql_analysis/          # SQL queries + DuckDB script
├── tests/
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
└── README.md
```
```shell
git clone https://github.com/pulipakav1/churn_rate.git
cd churn_rate
pip install -r requirements.txt

# Train model
python -m src.training.train

# Run baseline comparison
python src/training/baseline.py

# Export predictions for dashboard
python src/export_predictions.py

# Run SQL analysis
python sql_analysis/run_queries.py

# Start API
uvicorn api.main:app --reload --port 8000

# Start Streamlit UI
streamlit run app_streamlit/app.py
```

```shell
# Build and run with Docker
docker build -t telco-churn-api .
docker run -p 8000:8000 telco-churn-api
```

Most churn projects stop at a notebook with an accuracy score. This one:
- Statistically validates churn drivers before modeling
- Benchmarks against baselines — quantifies real improvement
- Uses PR-AUC — the correct metric for imbalanced classification
- Optimizes the decision threshold for business recall goals
- Explains every prediction — SHAP at global and individual level
- Deploys end-to-end — working API + UI in Docker
- Delivers business insights — SQL analysis + Power BI dashboard for non-technical stakeholders
MIT — use and modify freely.