rishi2001-bot/Customer_Churn

📉 Customer Retention Engine (Telco Churn Prediction)

📌 Project Overview

This project is an end-to-end Machine Learning application designed to predict whether a customer will leave (churn) a telecommunications company. By analyzing customer demographics, services, and billing patterns, the model identifies high-risk customers, enabling the business to take proactive retention measures.

The solution includes a full pipeline: Data Cleaning → EDA → Feature Engineering → Class Balancing (SMOTE) → Model Tuning → Deployment (Streamlit Web App).

💼 Business Problem

Acquiring a new customer is commonly estimated to cost 5 to 25 times more than retaining an existing one.

  • Goal: Predict Churn (Yes/No) with high sensitivity (Recall).
  • Value: By identifying at-risk customers early, the marketing team can target them with specific retention offers (e.g., discounts, contract upgrades), reducing revenue loss.

🛠️ Tech Stack

  • Language: Python
  • Libraries: Pandas, NumPy, Scikit-Learn, Matplotlib, Seaborn, Imbalanced-learn (SMOTE)
  • Deployment: Streamlit (Web Interface)
  • Model Serialization: Joblib

📊 Key Implementation Details

1. Data Cleaning & Handling

  • Converted TotalCharges from text to numeric, coercing non-numeric strings to missing values, then imputed those missing values with the column median.
  • Encoded categorical variables using One-Hot Encoding (with drop_first=True to reduce multicollinearity).
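
The two cleaning steps above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the tiny DataFrame and its values are made up, and only the TotalCharges column name and the drop_first=True setting come from the description above.

```python
import pandas as pd

# Tiny stand-in for the Telco data; the values are made up for illustration.
df = pd.DataFrame({
    "TotalCharges": ["29.85", " ", "108.15", "1889.5"],
    "Contract": ["Month-to-month", "Two year", "Month-to-month", "One year"],
})

# Non-numeric strings (such as a blank) become NaN instead of raising an error.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")

# Fill the resulting gaps with the column median.
median_charge = df["TotalCharges"].median()
df["TotalCharges"] = df["TotalCharges"].fillna(median_charge)

# One-hot encode, dropping the first level of each encoded column.
encoded = pd.get_dummies(df, columns=["Contract"], drop_first=True)
print(encoded)
```

Dropping the first level leaves one fewer indicator column than there are categories, which removes the exact linear dependence between the indicators.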

2. Handling Class Imbalance

  • The dataset is imbalanced (73% Non-Churn / 27% Churn).
  • Solution: Applied SMOTE (Synthetic Minority Over-sampling Technique) to the training set only, generating synthetic churner examples. Keeping the test set untouched avoids leaking synthetic data into evaluation, and balancing the training data reduces the model's bias toward the majority class.

3. Model Selection

Compared Logistic Regression (Baseline) vs. Random Forest Classifier.

  • Winner: Random Forest
  • Metric: Optimized for ROC-AUC and Recall (minimizing False Negatives is critical for retention).
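
The comparison could look something like the sketch below, which fits both models and reports ROC-AUC and Recall on a held-out set. It runs on synthetic data, and the hyperparameters (max_iter, n_estimators) are placeholder assumptions, so the scores it prints say nothing about the results of this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared churn features.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.73, 0.27], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Hypothetical settings; the project's tuned values are not stated.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    churn_proba = model.predict_proba(X_test)[:, 1]  # probability of class 1
    results[name] = {
        "roc_auc": roc_auc_score(y_test, churn_proba),
        "recall": recall_score(y_test, model.predict(X_test)),
    }

for name, scores in results.items():
    print(name, scores)
```

ROC-AUC is computed from predicted probabilities, while Recall depends on the decision threshold (0.5 by default), so lowering the threshold is one way to trade precision for fewer False Negatives.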

4. Key Business Insights

The model identified the top drivers of churn:

  1. Contract Type: Customers on "Month-to-Month" contracts are highly likely to leave.
  2. Tenure: New customers (0-12 months) are the most volatile.
  3. Internet Service: Fiber Optic users showed higher churn rates (likely due to higher costs or competition).
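
The README does not say how these drivers were ranked. One common option for a Random Forest is its impurity-based feature importances, sketched below. The column names are hypothetical one-hot names and the labels are random placeholders, so the printed ranking is meaningless and only the mechanics carry over.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500

# Hypothetical feature names; values and labels are random placeholders.
X = pd.DataFrame({
    "tenure": rng.integers(0, 72, n),
    "Contract_Month-to-month": rng.integers(0, 2, n),
    "InternetService_Fiber optic": rng.integers(0, 2, n),
})
y = rng.integers(0, 2, n)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Rank features by impurity-based importance (the values sum to 1).
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

Impurity-based importances tend to favor high-cardinality numeric features, so permutation importance on held-out data is a reasonable cross-check.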

🚀 How to Run the Project

Prerequisites

Ensure you have Python installed. Clone this repository and install the dependencies.

git clone https://github.com/YOUR_USERNAME/churn-prediction.git
cd churn-prediction
pip install -r requirements.txt
