- 📈 Project Overview
- 🔍 Features
- 🛠️ Technologies Used
- 📁 Project Structure
- 🚀 Installation
- 💡 Usage
- 🗃️ Data Description
- 🤖 Model Training
- 🖥️ Application
- 📜 License
The Customer Churn Analysis project aims to predict whether a customer will leave a telecommunications company (churn) based on various features such as usage patterns, demographics, and service details. By accurately predicting churn, the company can proactively address customer concerns, improve retention strategies, and enhance overall customer satisfaction.
-
View Consolidated Dataset:
- Explore the entire dataset with easy-to-understand metrics and visualizations.
-
Geospatial Insights:
- See where customers are located and how churn patterns look on an interactive map.
-
New Customer Prediction:
- Enter details about new customers to predict if they might leave, with helpful visual feedback.
-
Comprehensive Data Processing:
- A strong process for loading, cleaning, and transforming data to ensure high-quality inputs for modeling.
-
Model Training & Evaluation:
- Train an XGBoost gradient-boosted classifier to predict customer churn, and evaluate it on held-out data.
- Programming Languages: Python
- Web Framework: Streamlit
- Data Processing: Pandas, Joblib
- Machine Learning: Scikit-learn, XGBoost
- Visualization: Kepler.gl
-
Clone the Repository
git clone https://github.com/yourusername/CustomerChurnModel.git
cd CustomerChurnModel
-
Install Dependencies
pip install -r requirements.txt
If requirements.txt is not present, install the necessary packages manually:
pip install streamlit pandas numpy scikit-learn xgboost plotly keplergl streamlit-keplergl joblib
-
Prepare the Data
Ensure that the raw data files are placed in the ./data/raw/ directory as follows:
- services.xlsx
- demographics.xlsx
- location.xlsx
- status.xlsx
Note: Replace the placeholder data with your actual datasets.
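Before running the processing step, it can help to confirm that all four files are where the scripts expect them. The following is a minimal sketch using only the standard library; the helper name `find_missing_raw_files` is hypothetical and not part of the repository.

```python
from pathlib import Path

# File names come from the list above; the helper itself is an illustrative assumption.
EXPECTED_FILES = ("services.xlsx", "demographics.xlsx", "location.xlsx", "status.xlsx")


def find_missing_raw_files(raw_dir="./data/raw"):
    """Return the expected raw files that are not present in raw_dir."""
    raw_path = Path(raw_dir)
    return [name for name in EXPECTED_FILES if not (raw_path / name).is_file()]


if __name__ == "__main__":
    missing = find_missing_raw_files()
    if missing:
        print("Missing raw data files:", ", ".join(missing))
    else:
        print("All raw data files are in place.")
```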
-
Process the Data
Run the data processing script to merge, clean, and save the processed data.
python scripts/data_processing.py
-
Train the Model
Execute the model training script to build and save the churn prediction model.
python scripts/model_training.py
-
Run the Streamlit Application
Launch the web application to interact with the churn prediction system.
streamlit run app/streamlit_app.py
The app will be accessible at http://localhost:8501.
- Go to the New Customer Prediction section.
- Input relevant customer details such as tenure, monthly charges, services subscribed, and demographics.
- Click on Predict Churn to receive a probability score and risk assessment.
- Visual indicators and key risk factors will help interpret the prediction.
- Access the Geospatial Insights section to visualize customer locations and churn patterns on an interactive map.
- Understand regional trends and identify hotspots of customer churn.
- Navigate to the View Dataset section.
- Explore key metrics like total customers, average tenure, and monthly charges.
- Use the tabs to explore churn analysis, demographic insights, or the raw data.
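The headline metrics mentioned for the dataset view (total customers, average tenure, monthly charges) could be computed roughly as follows. This is a sketch, not the app's actual code: the function name `summarize` and the added churn rate are assumptions, while the column names follow the Data Description section.

```python
import pandas as pd


def summarize(df):
    """Compute headline metrics from the merged customer dataset."""
    return {
        "total_customers": int(df["Customer ID"].nunique()),
        "average_tenure_months": float(df["Tenure in Months"].mean()),
        "average_monthly_charge": float(df["Monthly Charge"].mean()),
        # Churn Value is assumed to be 1 for churned customers and 0 otherwise.
        "churn_rate": float(df["Churn Value"].mean()),
    }


if __name__ == "__main__":
    # Tiny in-memory example standing in for the processed dataset.
    sample = pd.DataFrame({
        "Customer ID": ["A1", "B2", "C3"],
        "Tenure in Months": [12, 24, 36],
        "Monthly Charge": [50.0, 70.0, 90.0],
        "Churn Value": [1, 0, 0],
    })
    print(summarize(sample))
```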
-
services.xlsx
- Columns: Customer ID, Tenure in Months, Phone Service, Internet Service, Streaming, Monthly Charge, Total Charges
-
demographics.xlsx
- Columns: Customer ID, Age, Gender
-
location.xlsx
- Columns: Customer ID, City, State, Zip Code, Latitude, Longitude
-
status.xlsx
- Columns: Customer ID, Churn Value, Churn Category, Churn Reason
- The raw datasets are merged on Customer ID to form a consolidated dataset.
- Non-essential columns are dropped, and data types are appropriately set.
- Missing values are handled, and features are scaled for modeling.
- The final processed data is saved as merged.parquet in the ./data/processed/ directory.
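The merge and cleaning steps above can be sketched with pandas as shown below. This is an illustration under stated assumptions rather than the contents of scripts/data_processing.py: the function names, the choice of an inner join, the dropped columns, and the median fill for missing numeric values are all hypothetical. Feature scaling is left out here to keep the sketch short.

```python
from functools import reduce

import pandas as pd


def merge_on_customer_id(frames, key="Customer ID"):
    """Inner-join a list of DataFrames on the shared customer key."""
    return reduce(lambda left, right: left.merge(right, on=key, how="inner"), frames)


def clean(merged, drop_columns=("Churn Category", "Churn Reason")):
    """Drop columns not used for modeling and fill missing numeric values.

    The dropped columns and the median fill are assumptions for illustration.
    """
    cleaned = merged.drop(columns=list(drop_columns), errors="ignore")
    numeric_cols = cleaned.select_dtypes(include="number").columns
    cleaned[numeric_cols] = cleaned[numeric_cols].fillna(cleaned[numeric_cols].median())
    return cleaned


if __name__ == "__main__":
    # Tiny in-memory stand-ins for three of the Excel files.
    services = pd.DataFrame({"Customer ID": ["A1", "B2"], "Monthly Charge": [70.5, None]})
    demographics = pd.DataFrame({"Customer ID": ["A1", "B2"], "Age": [34, 58]})
    status = pd.DataFrame({
        "Customer ID": ["A1", "B2"],
        "Churn Value": [0, 1],
        "Churn Reason": [None, "Price"],
    })
    processed = clean(merge_on_customer_id([services, demographics, status]))
    print(processed)
    # With real data, each frame would be read with pd.read_excel(...) and the
    # result written with processed.to_parquet("./data/processed/merged.parquet").
```

Reading .xlsx files with pandas typically requires the openpyxl package, and writing Parquet requires pyarrow or fastparquet, so those may need to be installed alongside the packages listed under Installation.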
- XGBoost Classifier: Used for its performance and ability to handle complex datasets.
-
Load Data:
- We start by loading the cleaned data from a file called merged.parquet.
-
Prepare Data:
- Convert categories (like gender or service type) into numbers so the model can understand them.
- Scale numerical values (like charges) to ensure they are on a similar range.
-
Split Data:
- Divide the data into two parts: one for training the model and one for testing how well it works.
-
Tune Model Settings:
- Adjust hyperparameters (such as how deep each tree can grow) to find the version of the model that predicts churn most accurately.
-
Evaluate Model:
- Check how well the model performs using various metrics (like accuracy) to see if it’s making good predictions.
-
Save Model:
- Save the best version of the model and its settings so we can use it later without retraining.
-
Located at scripts/model_training.py
-
Execute using:
python scripts/model_training.py
-
File: app/streamlit_app.py
-
Launch Command:
streamlit run app/streamlit_app.py
-
Dataset:
- Displays key metrics and interactive visualizations.
- Tabs for churn analysis, demographics, and raw data exploration.
-
Geospatial Insights:
- Interactive map showcasing customer locations and churn density.
-
New Customer Prediction:
- Input form for new customer details.
- Predicts churn probability with visual indicators and risk factors.
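To illustrate how a churn probability might be turned into the risk assessment shown to the user, here is a minimal sketch. The function name `risk_level` and the 0.4 and 0.7 cut-offs are assumptions chosen for illustration; the app's actual thresholds are not documented here. For scikit-learn style classifiers such as XGBClassifier, the probability would typically come from `model.predict_proba(features)[:, 1]`.

```python
def risk_level(probability, medium_at=0.4, high_at=0.7):
    """Map a churn probability in [0, 1] to a coarse risk label.

    The default cut-offs are illustrative assumptions, not values from the app.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= high_at:
        return "High"
    if probability >= medium_at:
        return "Medium"
    return "Low"


if __name__ == "__main__":
    for p in (0.12, 0.55, 0.91):
        print(f"{p:.0%} churn probability -> {risk_level(p)} risk")
```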
