Feature_Engineering_Telco_Customer_Churn

Customer Churn Prediction

This project aims to predict customer churn in a telecom company using machine learning. Churn refers to the loss of customers over time, and predicting it allows companies to take proactive steps to retain valuable customers.

Project Overview

In this project, we explore two approaches to predicting customer churn:

Base Model: A model built using only the raw data with minimal preprocessing.
Feature Engineering Model: A model where we apply advanced feature engineering techniques to enhance the predictive power of the model. The primary goal is to compare the performance of these two models and highlight the importance of feature engineering in improving predictive accuracy.

1. Project Structure

  /customer-churn-prediction
├── data
│   ├── customer_data.csv      # Raw dataset containing customer information
│   └── churn_data_processed.csv  # Processed data after feature engineering
├── notebooks
│   ├── churn_model_base.ipynb    # Jupyter notebook for base model
│   └── churn_model_feature_eng.ipynb  # Jupyter notebook for model with feature engineering
├── scripts
│   ├── preprocessing.py         # Script for data preprocessing (feature engineering)
│   └── model_training.py        # Script to train the model
├── README.md                   # Project documentation
└── requirements.txt            # List of Python dependencies

2. Problem Definition

The task is to predict whether a customer will churn or not based on various attributes such as customer tenure, payment method, service usage, and demographic information.

Target variable:

Churn: Binary variable indicating whether a customer has churned (1) or not (0).

Input features include:

Demographic information: Age, gender, senior citizen status.
Account information: Contract type, tenure, payment method.
Service usage: Whether the customer has subscribed to services like phone, internet, tech support, etc.

3. Approach

A. Base Model (Raw Data)

In the base model, we used the raw dataset with minimal preprocessing steps. The key steps in this approach include:

-Handling missing values. -Encoding categorical variables using One-Hot Encoding.

Splitting the dataset into training and testing sets. The model used is CatBoostClassifier, a gradient boosting algorithm. After training the model, we evaluated its performance using the following metrics:
Accuracy: 0.79
Recall: 0.65
Precision: 0.50
F1 Score: 0.56
AUC: 0.74 The performance was decent, but there was room for improvement in handling feature relationships and data transformation.

B. Feature Engineering Model

In the feature engineering model, we enhanced the dataset by creating new features, handling outliers, and adding domain-specific knowledge. The steps taken in this approach include:

Handling missing data: We imputed missing values with the median or appropriate methods.
Outlier Detection and Capping: We detected and capped outliers using the Interquartile Range (IQR) method.
Feature Extraction: We created several new features:
- NEW_TENURE_YEAR: Categorized customer tenure into bins (e.g., 0-1 year, 1-2 years).
- NEW_Engaged: A flag for customers who have a one- or two-year contract.
- NEW_noProt: A flag indicating customers without protection services.
- NEW_TotalServices: Count of services a customer has subscribed to.
- NEW_AVG_Charges: Average monthly charges per customer.

After performing these transformations, the model was retrained using the same CatBoostClassifier algorithm.

The performance of this model improved:

Accuracy: 0.7847
Recall: 0.6331
Precision: 0.493
F1 Score: 0.5544
AUC: 0.7292

Although the improvement in accuracy was modest, the model now has more refined features that could allow it to capture more nuanced patterns. The recall and precision scores are more balanced.

4. Importance of Feature Engineering

Why Feature Engineering Matters:

Feature engineering plays a crucial role in the performance of machine learning models. It helps uncover hidden relationships in the data and can significantly improve model predictions. In this project, the following feature engineering steps made a noticeable difference:

Categorization of Tenure: The transformation of the continuous tenure variable into categorical bins helps the model understand customer behavior based on the length of their relationship with the company.
Service Usage Features: Features like NEW_TotalServices and NEW_FLAG_ANY_STREAMING provide insights into customer engagement and can help predict churn better.
Outlier Capping: Removing or adjusting extreme values in numeric features ensures that the model is not misled by rare, extreme data points.
Domain Knowledge Features: Adding flags for specific behaviors, such as contract type and protection services, helps the model capture the importance of customer retention strategies.

Comparison of Models:

While the base model gives a decent starting point, the feature engineering model provides more meaningful features that help capture underlying patterns in the data. This improves the model's performance, especially in terms of recall, which is crucial for predicting churn.

5. Model Evaluation

The following evaluation metrics were used to assess the models:

Metric	Base Model	Feature Engineering Model
Accuracy	0.7847	0.79
Recall	0.6331	0.65
Precision	0.493	0.50
F1 Score	0.5544	0.56
AUC	0.7292	0.74

6. Conclusion

Feature engineering has a clear impact on the predictive power of the model, as seen in the improved performance metrics, even if the improvements are modest.
While both models provide valuable insights, the feature engineering model helps to uncover more complex relationships in the data, offering a more refined approach to churn prediction.
Future Work: We could explore hyperparameter tuning, model selection, or incorporating more advanced techniques like ensemble methods for further improvements.

Kaggle

For details about the project and access to the dataset, visit: Feature_Engineering_Telco_Customer_Churn

Contact

If you have any questions or suggestions, feel free to reach out. Your feedback on the project is highly appreciated.

Email: baydemirhatice@hotmail.com

Linkedln: https://www.linkedin.com/in/haticebaydemir/

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
README.md		README.md
feature-engineering-telco-customer-churn.ipynb		feature-engineering-telco-customer-churn.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Feature_Engineering_Telco_Customer_Churn

Customer Churn Prediction

Project Overview

1. Project Structure

2. Problem Definition

3. Approach

A. Base Model (Raw Data)

B. Feature Engineering Model

4. Importance of Feature Engineering

Why Feature Engineering Matters:

Comparison of Models:

5. Model Evaluation

6. Conclusion

Kaggle

Contact

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Feature_Engineering_Telco_Customer_Churn

Customer Churn Prediction

Project Overview

1. Project Structure

2. Problem Definition

3. Approach

A. Base Model (Raw Data)

B. Feature Engineering Model

4. Importance of Feature Engineering

Why Feature Engineering Matters:

Comparison of Models:

5. Model Evaluation

6. Conclusion

Kaggle

Contact

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages