A data science project that classifies customer churn in the telecommunications industry. It covers exploratory data analysis (EDA), feature engineering, and the implementation of multiple machine learning models in Python.
telco-churn-classification/
├── data/ # Dataset files (original and processed)
├── notebooks/ # Jupyter notebooks for EDA and modeling
├── scripts/ # Python scripts for modular workflows
├── results/ # Output visuals (plots, confusion matrices, etc.)
├── requirements.txt # List of Python dependencies
└── README.md # Project overview and instructions

The goal of this project is to predict whether a customer will churn based on features such as tenure, contract type, services subscribed, and billing behavior. This helps businesses reduce churn by proactively identifying high-risk customers.
- Distribution analysis of numerical and categorical features
- Correlation heatmaps and feature importance
- Churn rate breakdown by customer attributes
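
As an illustration of the churn rate breakdown listed above, here is a minimal sketch using pandas. The toy rows and the column names (`Contract`, `tenure`, `Churn`) are assumptions made for this example, and `churn_rate_by` is a hypothetical helper, not a function taken from the project's notebooks.

```python
import pandas as pd

# Hypothetical toy rows; the column names are assumed for illustration only.
df = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year",
                 "Two year", "Month-to-month", "Two year"],
    "tenure": [1, 5, 24, 60, 3, 48],
    "Churn": ["Yes", "No", "No", "No", "Yes", "No"],
})


def churn_rate_by(frame, column):
    """Return the share of churned customers for each value of `column`."""
    churned = frame["Churn"].eq("Yes")  # boolean Series: True where churned
    return churned.groupby(frame[column]).mean().sort_values(ascending=False)


rates = churn_rate_by(df, "Contract")
print(rates)
```

The same helper can be called with any other categorical column to compare churn rates across customer attributes.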
Implemented and evaluated the following models:
- Logistic Regression
- Decision Tree
- Random Forest
- Support Vector Machines (SVM)
- k-Nearest Neighbors (k-NN)
Models are evaluated based on accuracy, precision, recall, F1-score, and ROC-AUC.
Example outputs include confusion matrices and ROC curves.
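
The comparison described above could be sketched roughly as follows with scikit-learn. This is not the project's actual code: the data is synthetic (generated with `make_classification` as a stand-in for the real dataset), and the class weights, hyperparameters, and the `results` dictionary are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data; the class ratio is arbitrary.
X, y = make_classification(n_samples=600, n_features=10,
                           weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Scale-sensitive models are wrapped in a pipeline with StandardScaler.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": make_pipeline(
        StandardScaler(), SVC(probability=True, random_state=0)),
    "k-NN": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
        "roc_auc": roc_auc_score(y_test, proba),
    }

for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Because churn data is typically imbalanced, accuracy alone can be misleading, which is why recall, F1-score, and ROC-AUC are reported alongside it.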
- Python 3.7+
- Recommended: Use a virtual environment
git clone https://github.com/Anindo21/telco-churn-classification.git
cd telco-churn-classification
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
jupyter notebook
Navigate to notebooks/ and open the desired .ipynb file.
This project uses the Telco Customer Churn dataset available on Kaggle. It includes customer account information, services used, and churn labels.
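
A minimal preprocessing sketch for data of this shape is shown below. The toy rows, the column names (`tenure`, `Contract`, `TotalCharges`, `Churn`), the blank-string handling, and the `prepare` helper are assumptions for illustration; check them against the actual CSV before relying on them.

```python
import pandas as pd

# Hypothetical toy rows; one TotalCharges value is blank to mimic a missing entry.
raw = pd.DataFrame({
    "tenure": [1, 34, 0],
    "Contract": ["Month-to-month", "One year", "Two year"],
    "TotalCharges": ["29.85", "1889.5", " "],
    "Churn": ["No", "No", "Yes"],
})


def prepare(frame):
    """Return a model-ready copy: numeric charges, 0/1 label, one-hot contract."""
    out = frame.copy()
    # Non-numeric strings become NaN, then are filled with 0.0 (an assumed choice).
    out["TotalCharges"] = pd.to_numeric(
        out["TotalCharges"], errors="coerce").fillna(0.0)
    # Encode the churn label as an integer target.
    out["Churn"] = out["Churn"].map({"No": 0, "Yes": 1})
    # One-hot encode the categorical contract type.
    return pd.get_dummies(out, columns=["Contract"])


prepared = prepare(raw)
print(prepared.dtypes)
```

Filling missing charges with zero is only one option; dropping those rows or imputing a median would be equally reasonable depending on the analysis.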
This project is licensed under the MIT License. See the LICENSE file for more information.
Feel free to ⭐ the repository and contribute!

