Skip to content

vivek8031/Customer-Churn-Prediction

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

15 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

๐Ÿ“ˆ Customer Churn Prediction with LightGBM, CatBoost, TabTransformer, and AutoGluon-Tabular in Amazon SageMaker ๐Ÿš€

Open In Studio Lab ย ย ย ย ย ย  Preview in nbviewer ย ย ย ย ย ย  Open in Google Colab

This repository contains a Jupyter notebook ๐Ÿ““ that demonstrates how to use machine learning (ML) ๐Ÿง  for the automated identification of unhappy customers ๐Ÿ˜ž, also known as customer churn prediction. The notebook employs Amazon SageMaker's implementation of LightGBM ๐Ÿ’ก, CatBoost ๐Ÿฑ, TabTransformer ๐Ÿ”„, and AutoGluon-Tabular ๐Ÿ“Š algorithms to train and host a customer churn prediction model using Amazon SageMaker's Automatic Model Tuning (AMT) ๐ŸŽ›๏ธ.

๐Ÿ“Overview

Losing customers is costly for any business. ๐Ÿ’ธ Identifying unhappy customers early gives businesses a chance to offer incentives to stay. However, ML models rarely give perfect predictions. This notebook showcases how to incorporate the relative costs of prediction mistakes when determining the financial outcome of using ML.

The ML models used in this notebook are trained in two scenarios:

  1. Training a tabular model on the customer churn dataset with AMT. ๐Ÿ“Š
  2. Using the trained tabular model to perform inference, i.e., classifying new samples. ๐Ÿงฎ

In the end, the performance of the four models trained with AMT is compared on the same test data. ๐Ÿ“ˆ

๐Ÿ“˜Notebook Contents

The notebook contains the following sections:

  1. Set Up: This part involves setting up the notebook and importing necessary libraries. ๐Ÿ“š

  2. Data Preparation and Visualization: This section deals with preparing the dataset for the machine learning models and visualizing the data for a better understanding of it. ๐Ÿ“‰ The data visualization includes histograms for each numeric feature. The dataset used is a publicly available set containing 5,000 records with 21 attributes describing the profile of a customer of an unknown US mobile operator. The data is processed and visualized to understand its distribution and its relation with the target variable 'Churn?'. This section also includes converting the target attribute to binary and moving it to the first column of the dataset to meet the requirements of SageMaker's built-in tabular algorithms.

  3. Train A LightGBM Model with AMT: This part involves training a LightGBM model with AMT and also includes sub-sections on retrieving training artifacts, setting training parameters, starting the training process, deploying and running inference on the trained model, and evaluating the prediction results. ๐ŸŽฏ

  4. Train A CatBoost model with AMT: Similar to the previous section, this part focuses on the CatBoost model. ๐Ÿฑโ€๐Ÿ’ป

  5. Train A TabTransformer model with AMT: This section focuses on training a TabTransformer model with AMT. ๐ŸŒ€

  6. Train An AutoGluon-Tabular model: This section is about training an AutoGluon-Tabular model. ๐ŸŽ๏ธ

  7. Compare Prediction Results of Four Trained Models on the Same Test Data: In the final section, the performance of the four trained models - LightGBM, CatBoost, TabTransformer, and AutoGluon-Tabular - is compared using the same test data. The comparison includes a plot of the confusion matrix for each model. ๐Ÿ“Š

๐Ÿค–Models Used

This notebook uses the following machine learning models:

๐ŸŒณLightGBM

LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It is designed to be distributed and efficient with the following advantages: faster training speed and higher efficiency, lower memory usage, better accuracy, parallel and GPU learning supported, capable of handling large-scale data.

๐Ÿˆโ€โฌ›CatBoost

CatBoost is an open-source gradient boosting on decision trees library with categorical features support. The CatBoost library can be used to solve both classification and regression tasks. The library provides all the functionality necessary to solve classification, regression and ranking tasks, supports GPU acceleration and handling categorical features.

๐Ÿ“ŠTabTransformer

TabTransformer is a model introduced in the paper "TabTransformer: Tabular Data Modeling Using Contextual Embeddings". It treats each feature as a token and applies a transformer model to capture the interactions between features. It's especially powerful for tabular data with high cardinality categorical features.

๐Ÿš€AutoGluon-Tabular

AutoGluon-Tabular is an automated machine learning model designed for tabular data. It automatically prepares the data, selects the necessary features, and chooses the best model and hyperparameters to solve the problem. AutoGluon-Tabular can save a lot of time and effort with great performance.

๐Ÿ–ผ๏ธImages

image

LightGBM with AMT CatBoost with AMT TabTransformer with AMT AutoGluon-Tabular
Accuracy 0.895556 0.953333 0.960000 0.980000
F1 0.894855 0.953229 0.960526 0.980044
AUC 0.965412 0.991407 0.992040 0.997926

Each column represents a different model, and the rows represent different metrics (Accuracy, F1, and AUC) for each model. ๐Ÿ“

๐ŸงนCleaning Up

To avoid incurring unnecessary charges, remember to delete the endpoint corresponding to the trained model. Instructions on how to do this are provided at the end of the notebook. ๐Ÿงพ

๐Ÿ”งUsage

You can use this notebook to evaluate the performance of LightGBM, CatBoost, TabTransformer, and AutoGluon-Tabular on your own dataset. The notebook was tested in Amazon SageMaker Studio on ml.t3.medium instance with Python 3 (Data Science) kernel. ๐Ÿ

About

๐Ÿ“ˆ Harness the power of LightGBM, CatBoost, TabTransformer & AutoGluon-Tabular in Amazon SageMaker ๐Ÿš€ for customer churn prediction. ๐Ÿ’ก๐Ÿฑ๐Ÿ”„๐Ÿ“Š We employ Automatic Model Tuning (AMT) to train ML models ๐Ÿง , identifying unhappy customers ๐Ÿ˜ž early to prevent business loss. ๐Ÿ’ธ Comparison of models included! ๐Ÿ“

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors