Customer Churn Prediction: A Comprehensive ML and Power BI Project

This repository predicts customer churn for a telecom company. It covers a detailed analysis of customer data, machine learning models, and a Power BI dashboard that visualizes the insights.

Overview

Customer churn is a significant problem for telecom companies, as retaining customers is often more cost-effective than acquiring new ones. This project aims to predict which customers are likely to churn (i.e., stop using the company's services) using various machine learning techniques and to visualize the key factors leading to churn using a Power BI dashboard.

Dataset

The dataset used for this project is a sample telecom customer dataset, containing information such as customer demographics, account information, and usage details.
It includes features like gender, SeniorCitizen, Partner, Dependents, tenure, PhoneService, MultipleLines, InternetService, OnlineSecurity, TechSupport, Contract, PaymentMethod, MonthlyCharges, TotalCharges, and the target variable Churn.

Exploratory Data Analysis

Data Overview: Loaded the dataset and explored its structure, data types, and basic statistics.
Data Cleaning:
- Converted TotalCharges to a numeric data type; entries that could not be parsed became missing values.
- Identified the missing values, which were concentrated in the TotalCharges column.
- Removed the records with missing values.
Feature Engineering:
- Grouped tenure into bins to create a new feature, tenure_group.
- Dropped customerID (an identifier with no predictive value) and tenure (superseded by tenure_group).
- Converted categorical variables to dummy variables.
Visualizations:
- Plotted distributions of key variables and their relationship with churn.
- Used heatmaps and bar plots to understand correlations and feature importance.
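
The cleaning and feature engineering steps above can be sketched with pandas roughly as follows. This is a minimal illustration, not the notebook's actual code: the four-row DataFrame is made up, and the 12-month bin edges and label format for tenure_group are assumptions.

```python
import pandas as pd

# Tiny made-up sample mimicking a few columns of the dataset (not real data).
df = pd.DataFrame({
    "customerID": ["A1", "B2", "C3", "D4"],
    "tenure": [1, 14, 30, 72],
    "Contract": ["Month-to-month", "One year", "Two year", "Month-to-month"],
    "TotalCharges": ["29.85", " ", "1889.5", "7000.1"],
    "Churn": ["Yes", "No", "No", "No"],
})

# Convert TotalCharges to numeric; unparseable entries become NaN.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")

# Remove records with missing values in TotalCharges.
df = df.dropna(subset=["TotalCharges"]).copy()

# Group tenure into 12-month bins (bin edges and labels are assumptions).
edges = list(range(0, 73, 12))  # 0, 12, ..., 72
labels = [f"{lo + 1}-{lo + 12}" for lo in edges[:-1]]
df["tenure_group"] = pd.cut(df["tenure"], bins=edges, labels=labels)

# Drop the identifier and the raw tenure column.
df = df.drop(columns=["customerID", "tenure"])

# Encode the target as 0/1 and the categorical features as dummy variables.
df["Churn"] = (df["Churn"] == "Yes").astype(int)
encoded = pd.get_dummies(df, columns=["Contract", "tenure_group"])
```

Binning before encoding means each tenure range becomes its own indicator column, which is one way to let a model treat new and long-standing customers differently.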

Data Preprocessing

Handling Class Imbalance:
- Applied SMOTEENN (Synthetic Minority Over-sampling Technique and Edited Nearest Neighbors) to address the class imbalance in the target variable (Churn).
Data Splitting:
- Split the data into training and test sets (80% training, 20% testing).

Model Building

Decision Tree Classifier:
- Built an initial model using a Decision Tree Classifier.
- Achieved an accuracy of ~78% on the imbalanced dataset, but with low recall for the minority class (churned customers).
Improving Model Performance:
- Applied SMOTEENN to resample the data and improve model performance.
- Retrained the Decision Tree model, achieving an accuracy of ~93% with improved recall (0.98) and precision (0.91) for the minority class.
Random Forest Classifier:
- Built a Random Forest Classifier on both the original and resampled datasets.
- The resampled model achieved an overall accuracy of ~94%, with high recall (0.96) and precision (0.94) for the minority class.
Dimensionality Reduction:
- Applied PCA (Principal Component Analysis) to reduce dimensionality; however, it did not improve model performance significantly.
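
The model comparison above can be sketched with scikit-learn as shown below. The data is synthetic, and the hyperparameters (max_depth, n_estimators) are assumptions rather than the values used in the notebooks, so the scores will not match the figures reported here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn features; class 1 plays the role of "churned".
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.75, 0.25], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Hyperparameters here are illustrative assumptions.
models = {
    "decision_tree": DecisionTreeClassifier(max_depth=6, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model and record accuracy plus minority-class recall and precision.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "recall_churn": recall_score(y_test, pred, pos_label=1),
        "precision_churn": precision_score(
            y_test, pred, pos_label=1, zero_division=0
        ),
    }

for name, metrics in scores.items():
    print(name, metrics)
```

Reporting recall and precision for the churn class alongside accuracy matters because accuracy alone can look acceptable even when most churners are missed.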

Results and Evaluation

The final Random Forest model with SMOTEENN resampling performed best, with an accuracy of ~94% and, for the churn class, a recall of 0.96 and a precision of 0.94.
The model is saved as model.sav using pickle for deployment.
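
Saving and reloading a model with pickle could look like the sketch below. The small model trained on synthetic data and the temporary directory are assumptions for illustration; only the file name model.sav comes from this project.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small stand-in model trained on synthetic data.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# Write the model to model.sav inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "model.sav")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Load it back, as a deployment script would before serving predictions.
with open(path, "rb") as f:
    loaded = pickle.load(f)

# The reloaded model should reproduce the original predictions.
same = bool((loaded.predict(X) == model.predict(X)).all())
```

Pickle files should only be loaded from trusted sources, and generally with the same scikit-learn version that produced them.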

Power BI Dashboard

A Power BI dashboard was created to visualize the key insights from the data and the model. The dashboard includes:

Customer Demographics: Breakdown by gender, tenure, senior citizen status, etc.
Churn Analysis: Visualizations of churn distribution across different features (e.g., contract type, payment method, online security).
Risk Segmentation: Classification of customers into risk groups (Non-risky, Low-risk, Risky, High-risk).
Churn Reasons: Key reasons for customer churn, such as lack of tech support or online security.
Interactive Features: An "Ask a Question" feature for dynamic exploration of the data.

How to Run

Clone the Repository:
```
git clone https://github.com/yourusername/telecom-churn-prediction.git
cd telecom-churn-prediction
```

Install the Required Packages: With Python 3.x available, install the dependencies listed in requirements.txt:
```
pip install -r requirements.txt
```
Run the Flask Application:
```
python app.py
```
The app will run on http://127.0.0.1:5000/, where you can input customer data and get churn predictions.
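
For readers curious how a Flask app can serve churn predictions, here is a minimal sketch. It is not the contents of app.py: the /predict route, the JSON payload shape, and the 0.5 threshold are all hypothetical, and a stand-in model trained on synthetic data replaces model.sav so the example runs on its own.

```python
from flask import Flask, jsonify, request
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in model on synthetic data; a real app would load model.sav instead.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])  # hypothetical route name
def predict():
    # Hypothetical payload: {"features": [f1, f2, ...]}
    features = request.get_json()["features"]
    proba = float(model.predict_proba([features])[0][1])
    # The 0.5 decision threshold is an assumption.
    return jsonify({"churn_probability": proba, "churn": proba >= 0.5})


# Exercise the endpoint in-process with Flask's test client (no server needed).
client = app.test_client()
response = client.post("/predict", json={"features": X[0].tolist()})
result = response.get_json()
```

Returning the probability alongside the label lets a front end show how confident the model is rather than only a yes/no answer.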

Conclusion

The project successfully predicted customer churn using a combination of data preprocessing, feature engineering, model building, and visualization. The Random Forest model with SMOTEENN resampling showed the best performance, and the Power BI dashboard provided clear insights into the factors influencing churn.

Files in the Repository

app.py: Flask application for deploying the churn prediction model.
model.sav: Serialized model file for the Random Forest Classifier.
Churn-Prediction-Analysis-Dashboard.pdf: PDF version of the Power BI dashboard showcasing the churn analysis.
requirements.txt: List of required Python packages.
README.md: Project documentation.

