An end-to-end Machine Learning project that predicts whether a telecom customer is likely to churn using customer demographics, account information, and service usage patterns.
- Data Cleaning & Preprocessing
- Exploratory Data Analysis (EDA)
- Feature Engineering
- Machine Learning Pipeline using Scikit-Learn
- Logistic Regression
- Random Forest Classifier
- XGBoost Classifier
- Feature Importance Analysis
- Interactive Streamlit Web Application
- Model Serialization using Joblib
Dataset: Telco Customer Churn Dataset
The dataset contains customer information such as:
- Gender
- Senior Citizen Status
- Partner & Dependents
- Contract Type
- Internet Service
- Monthly Charges
- Total Charges
- Tenure
- Payment Method
- Churn Status
Target Variable:
Churn
0 -> Customer Stays
1 -> Customer Leaves
Key findings from EDA:
| Contract | Churn Rate |
|---|---|
| Month-to-month | 42.7% |
| One Year | 11.3% |
| Two Year | 2.8% |
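Month-to-month customers churned at nearly four times the rate of one-year customers (42.7% vs. 11.3%) and roughly fifteen times the rate of two-year customers (2.8%).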
Customers who churned generally had much lower tenure than retained customers.
Customers with higher monthly charges showed a higher tendency to churn.
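The contract-level churn rates above can be reproduced with a simple group-by. The snippet below is a minimal sketch on a toy frame, not the project's notebook code; the column names `Contract` and `Churn` and the raw `Yes`/`No` labels are assumptions about how the CSV is laid out.

```python
import pandas as pd

# Toy stand-in for the Telco data (the real file has one row per customer).
# Column names and the "Yes"/"No" labels are assumptions about the raw CSV.
df = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year",
                 "Two year", "Month-to-month", "Two year"],
    "Churn": ["Yes", "No", "No", "No", "Yes", "No"],
})

# Encode the target as described above: 0 = customer stays, 1 = customer leaves.
df["Churn"] = df["Churn"].map({"No": 0, "Yes": 1})

# The mean of a 0/1 column is the churn rate within each contract group.
churn_rate = df.groupby("Contract")["Churn"].mean().mul(100).round(1)
print(churn_rate)
```

The same pattern should work for other categorical columns, such as payment method or internet service.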
Numerical Features:
- SeniorCitizen
- tenure
- MonthlyCharges
- TotalCharges
Processing:
- Missing Value Imputation
- Standard Scaling
Categorical Features:
- Gender
- Partner
- Dependents
- Internet Service
- Contract
- Payment Method
- Other service-related attributes
Processing:
- Missing Value Imputation
- One-Hot Encoding
Implemented using:
ColumnTransformer
Pipeline
Logistic Regression:
- Baseline Model
- Fast and interpretable
Random Forest:
- Ensemble Learning
- Bagging-based approach
XGBoost:
- Gradient Boosting Framework
- Handles complex feature interactions
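The preprocessing and baseline model described above could be wired together roughly as follows. This is a sketch under stated assumptions rather than the contents of `src/train.py`: the imputation strategies, `max_iter`, the reduced categorical column list, and the tiny synthetic frame are all illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_features = ["SeniorCitizen", "tenure", "MonthlyCharges", "TotalCharges"]
# Subset of the categorical columns, kept short for the example.
categorical_features = ["Contract", "InternetService", "PaymentMethod"]

# Numerical branch: impute, then scale (median strategy is an assumption).
numeric_transformer = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])

# Categorical branch: impute, then one-hot encode.
categorical_transformer = Pipeline([
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer([
    ("num", numeric_transformer, numeric_features),
    ("cat", categorical_transformer, categorical_features),
])

# Full pipeline: preprocessing followed by the baseline classifier.
model = Pipeline([
    ("preprocess", preprocessor),
    ("classifier", LogisticRegression(max_iter=1000)),
])

# Synthetic rows, only to show that the pipeline fits and predicts end to end.
X = pd.DataFrame({
    "SeniorCitizen": [0, 1, 0, 0, 1, 0],
    "tenure": [1, 34, 2, 45, 5, 60],
    "MonthlyCharges": [70.5, 56.9, 99.0, 42.3, 89.1, 25.0],
    "TotalCharges": [70.5, 1889.5, np.nan, 1840.8, 445.5, 1500.0],
    "Contract": ["Month-to-month", "One year", "Month-to-month",
                 "One year", "Month-to-month", "Two year"],
    "InternetService": ["Fiber optic", "DSL", "Fiber optic",
                        "DSL", "Fiber optic", "No"],
    "PaymentMethod": ["Electronic check", "Mailed check", "Electronic check",
                      "Bank transfer (automatic)", "Electronic check", "Mailed check"],
})
y = [1, 0, 1, 0, 1, 0]

model.fit(X, y)
churn_probability = model.predict_proba(X)[:, 1]  # probability of class 1 (churn)
```

Swapping the `classifier` step for `RandomForestClassifier` or `XGBClassifier` would leave the preprocessing untouched, which is the main reason to keep everything inside one pipeline.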
| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Logistic Regression | 79.53% | 63.75% | 53.30% | 58.06% |
| Random Forest | 77.68% | 60.71% | 45.45% | 51.99% |
| XGBoost | 79.62% | 64.65% | 51.52% | 57.34% |
Logistic Regression achieved the highest F1 Score and Recall, making it the preferred model for deployment.
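As a quick sanity check, F1 is the harmonic mean of precision and recall, so the F1 column can be recomputed from the other two. The sketch below uses the rounded values from the table, so a difference of about 0.01 percentage points from the reported figures is expected; the function and variable names are illustrative.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the results table, as fractions.
reported = {
    "Logistic Regression": (0.6375, 0.5330),
    "Random Forest": (0.6071, 0.4545),
    "XGBoost": (0.6465, 0.5152),
}

# Recompute F1 in percent and pick the model with the highest score.
f1_by_model = {
    name: round(100 * f1_from(p, r), 2) for name, (p, r) in reported.items()
}
best_model = max(f1_by_model, key=f1_by_model.get)
print(f1_by_model)
print(best_model)
```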
Top churn-driving features:
- Contract_Month-to-month
- InternetService_Fiber optic
- TechSupport_No
- Contract_Two year
- OnlineSecurity_No
- tenure
- MonthlyCharges
These insights aligned closely with findings from the exploratory data analysis.
The project includes a Streamlit-based web interface where users can:
- Enter customer details
- Predict churn probability
- Identify customers at risk of leaving
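The prediction step behind such an interface might look like the helper below. It is a hedged sketch, not the code in `app.py`: `predict_churn`, `RISK_THRESHOLD`, and the 0.5 cut-off are hypothetical, and a stand-in object replaces the trained pipeline so the snippet runs on its own.

```python
import pandas as pd

RISK_THRESHOLD = 0.5  # hypothetical cut-off for flagging a customer as at risk


def predict_churn(pipeline, customer: dict, threshold: float = RISK_THRESHOLD):
    """Return (churn probability, at-risk flag) for a single customer record."""
    # A fitted scikit-learn pipeline expects 2-D input, so wrap the record
    # in a one-row DataFrame whose columns match the training features.
    row = pd.DataFrame([customer])
    probability = float(pipeline.predict_proba(row)[0][1])
    return probability, probability >= threshold


class FakePipeline:
    """Stand-in for the trained model so the example runs without the .pkl file."""

    def predict_proba(self, X):
        # Always report a 70% churn probability, one row per input row.
        return [[0.3, 0.7] for _ in range(len(X))]


# In the real app the pipeline would presumably come from
# joblib.load("models/churn_pipeline.pkl") and the dict from Streamlit widgets.
probability, at_risk = predict_churn(
    FakePipeline(), {"tenure": 2, "Contract": "Month-to-month"}
)
print(f"Churn probability: {probability:.0%}, at risk: {at_risk}")
```

Keeping the prediction logic in a plain function, separate from the UI code, makes it easier to test without launching Streamlit.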
Run locally:
streamlit run app.py
Project Structure:
customer-churn-prediction/
│
├── app.py
├── requirements.txt
├── README.md
│
├── data/
│ └── Dataset.csv
│
├── models/
│ └── churn_pipeline.pkl
│
├── notebooks/
│ └── churn_analysis.ipynb
│
├── src/
│ └── train.py
│
└── screenshots/
Clone the repository:
git clone https://github.com/your-username/customer-churn-prediction.git
Move into the project directory:
cd customer-churn-prediction
Install dependencies:
pip install -r requirements.txt
Tech Stack:
- Python
- Pandas
- NumPy
- Scikit-Learn
- XGBoost
- Streamlit
- Joblib
- Matplotlib
Future Improvements:
- Hyperparameter Tuning
- ROC-AUC Analysis
- SHAP Explainability
- Cross Validation
- Docker Deployment
- Cloud Deployment (Render / Streamlit Cloud)
If you found this project useful, consider giving it a ⭐ on GitHub.