This project is an end-to-end Machine Learning application designed to predict whether a customer will leave (churn) a telecommunications company. By analyzing customer demographics, services, and billing patterns, the model identifies high-risk customers, enabling the business to take proactive retention measures.
The solution includes a full pipeline: Data Cleaning → EDA → Feature Engineering (SMOTE) → Model Tuning → Deployment (Streamlit Web App).
Acquiring a new customer is 5-25x more expensive than retaining an existing one.
- Goal: Predict
Churn(Yes/No) with high sensitivity (Recall). - Value: By identifying at-risk customers early, the marketing team can target them with specific retention offers (e.g., discounts, contract upgrades), reducing revenue loss.
- Language: Python
- Libraries: Pandas, NumPy, Scikit-Learn, Matplotlib, Seaborn, Imbalanced-learn (SMOTE)
- Deployment: Streamlit (Web Interface)
- Model Serialization: Joblib
- Handled missing values in
TotalChargesby converting string errors to numeric and imputing with the median. - Encoded categorical variables using
One-Hot Encoding(withdrop_first=Trueto reduce multicollinearity).
- The dataset was heavily imbalanced (73% Non-Churn / 27% Churn).
- Solution: Used SMOTE (Synthetic Minority Over-sampling Technique) on the training set only to generate synthetic examples of churners. This prevented the model from being biased toward the majority class.
Compared Logistic Regression (Baseline) vs. Random Forest Classifier.
- Winner: Random Forest
- Metric: Optimized for ROC-AUC and Recall (minimizing False Negatives is critical for retention).
The model identified the top drivers of churn:
- Contract Type: Customers on "Month-to-Month" contracts are highly likely to leave.
- Tenure: New customers (0-12 months) are the most volatile.
- Internet Service: Fiber Optic users showed higher churn rates (likely due to higher costs or competition).
Ensure you have Python installed. Clone this repository and install the dependencies.
git clone [https://github.com/YOUR_USERNAME/churn-prediction.git](https://github.com/YOUR_USERNAME/churn-prediction.git)
cd churn-prediction
pip install -r requirements.txt