This project aims to predict whether a credit card customer will churn (i.e., close their account). By identifying customers who are likely to churn, the bank can take proactive steps to retain them. This is a binary classification problem, and a neural network model is built using TensorFlow and Keras to make the predictions.
The dataset used for this project is the "Churn Modelling" dataset from Kaggle. It contains information about bank customers who have held a credit card.
You can find the dataset here: Credit Card Customer Churn Prediction Dataset
The dataset has 10,000 rows and the following 14 columns:
• RowNumber: Row number
• CustomerId: Unique identifier for each customer
• Surname: Customer's surname
• CreditScore: Customer's credit score
• Geography: Customer's country (France, Spain, Germany)
• Gender: Customer's gender
• Age: Customer's age
• Tenure: Number of years the customer has been with the bank
• Balance: Customer's account balance
• NumOfProducts: Number of bank products the customer uses
• HasCrCard: Whether the customer has a credit card (1 = yes, 0 = no)
• IsActiveMember: Whether the customer is an active member (1 = yes, 0 = no)
• EstimatedSalary: Customer's estimated salary
• Exited: Whether the customer has churned (1 = yes, 0 = no). This is the target variable.
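As a quick sanity check on the columns listed above, the following is a minimal sketch using pandas. The helper names `check_schema` and `churn_rate` are hypothetical and not taken from the notebook; they only illustrate how the expected schema and the share of churned customers could be verified after loading.

```python
import pandas as pd

# Column names as listed above; "Exited" is the target variable.
EXPECTED_COLUMNS = [
    "RowNumber", "CustomerId", "Surname", "CreditScore", "Geography",
    "Gender", "Age", "Tenure", "Balance", "NumOfProducts", "HasCrCard",
    "IsActiveMember", "EstimatedSalary", "Exited",
]


def check_schema(df: pd.DataFrame) -> list:
    """Return the expected columns that are missing from df (empty if none)."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


def churn_rate(df: pd.DataFrame) -> float:
    """Return the fraction of rows with Exited == 1."""
    return float(df["Exited"].mean())
```

Assuming the CSV has been downloaded locally (the file name depends on your download), a typical use would be `df = pd.read_csv(path)` followed by `check_schema(df)` and `churn_rate(df)`.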
The project follows these steps:
- Data Loading and Exploration: The data is loaded using pandas, and initial exploration is done to understand the data's structure and features.
- Data Preprocessing:
  • Unnecessary columns (RowNumber, CustomerId, Surname) are dropped.
  • Categorical features (Geography, Gender) are converted into numerical format using one-hot encoding.
- Train-Test Split: The data is split into training (80%) and testing (20%) sets.
- Feature Scaling: The numerical features are standardized with StandardScaler so that features on very different scales (for example, Balance and Tenure) have comparable magnitudes during training.
- Model Building: A sequential neural network is built using TensorFlow/Keras with the following architecture:
  • An input layer with 11 neurons and a ReLU activation function.
  • A hidden layer with 11 neurons and a ReLU activation function.
  • An output layer with 1 neuron and a sigmoid activation function for binary classification.
- Model Compilation and Training: The model is compiled with the Adam optimizer and binary cross-entropy loss function. It is then trained for 100 epochs.
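The steps above can be sketched as follows. This is an illustration under stated assumptions, not the notebook's exact code: the function names `preprocess` and `build_model`, the random seed, and `drop_first=True` are assumptions. With `drop_first=True`, one-hot encoding yields 11 input features, which is consistent with the 11-neuron first layer described above. Fitting the scaler on the training set only is also an assumption about the workflow.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def preprocess(df: pd.DataFrame, seed: int = 42):
    """Drop ID columns, one-hot encode, split 80/20, and scale."""
    df = df.drop(columns=["RowNumber", "CustomerId", "Surname"])
    # Assumption: drop_first=True, giving 2 Geography dummies + 1 Gender dummy.
    df = pd.get_dummies(df, columns=["Geography", "Gender"], drop_first=True)

    X = df.drop(columns=["Exited"]).astype("float64")
    y = df["Exited"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )

    # Fit the scaler on the training set only, then apply it to both sets,
    # so that no test-set statistics leak into training.
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    return X_train, X_test, y_train.to_numpy(), y_test.to_numpy()


def build_model(n_features: int):
    """Dense(11, relu) -> Dense(11, relu) -> Dense(1, sigmoid), compiled with Adam."""
    # Imported here so preprocess() can be used without TensorFlow installed.
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(11, activation="relu"),
        keras.layers.Dense(11, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


# Example usage (validation_split=0.2 is an assumption):
# X_train, X_test, y_train, y_test = preprocess(df)
# model = build_model(X_train.shape[1])
# history = model.fit(X_train, y_train, epochs=100, validation_split=0.2)
```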
The model's performance is evaluated based on its accuracy. The final accuracy on the test set is approximately 85%.
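To make the accuracy figure concrete: the sigmoid output is a probability, so it has to be turned into a 0/1 label before being compared with the true labels. The sketch below assumes a 0.5 decision threshold, which the source does not state; the function name `accuracy_from_probabilities` is hypothetical.

```python
import numpy as np


def accuracy_from_probabilities(y_true, y_prob, threshold=0.5):
    """Threshold sigmoid outputs and return the fraction of correct labels."""
    y_true = np.asarray(y_true).ravel()
    # Keras returns predictions with shape (n, 1); ravel() flattens them.
    y_pred = (np.asarray(y_prob).ravel() > threshold).astype(int)
    return float((y_pred == y_true).mean())


# Three of the four predictions below match the true labels.
print(accuracy_from_probabilities([0, 1, 1, 0], [0.1, 0.9, 0.4, 0.2]))  # 0.75
```

With a trained model, the probabilities would come from something like `model.predict(X_test)`.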
Figures: Training and Validation Loss; Training and Validation Accuracy.
To run this project, you will need to have a Python environment with the following libraries installed:
• pandas
• numpy
• scikit-learn
• tensorflow
• matplotlib
You can then open the Jupyter Notebook file (credit-card-customer-churn-prediction.ipynb) and run the cells sequentially.
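Before opening the notebook, it can help to confirm that the libraries listed above are importable. The following is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical. Note that scikit-learn is imported under the module name `sklearn`.

```python
import importlib.util

# Maps the pip package name to the module name used in import statements.
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "tensorflow": "tensorflow",
    "matplotlib": "matplotlib",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of required libraries that cannot be imported."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


print("Missing packages:", missing_packages())
```

Any names printed can then be installed with pip before running the notebook.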