This repository contains the solution developed for a Data Science competition judged by BCG X. The objective was to isolate key drivers of churn and build a predictive model to identify customers at risk of churn.
churn_bcgx/
├── data/ # Raw and processed datasets
├── inference/ # Scripts for model inference
├── main/ # Main execution scripts
├── modelling/ # Model training, evaluation, and prediction code
├── supplementary/ # Additional resources and documentation
├── .gitignore # Git ignore file
└── LICENSE # Project license (Apache 2.0)
- data/: Contains raw and processed datasets used for training and evaluation.
- inference/: Scripts and utilities for running model inference on new data.
- main/: Main scripts to execute the pipeline or key project steps.
- modelling/: Code for model training, validation, and prediction.
- supplementary/: Additional resources, documentation, or supporting materials.
- Clone the repository:
  git clone https://github.com/marcolomele/churn_bcgx.git
  cd churn_bcgx
- Set up a virtual environment and install dependencies using requirements.txt.
- Explore the main/ directory for entry-point scripts to reproduce results or run the pipeline.
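The environment setup step can be sketched with Python's standard library alone. This is a minimal illustration, not the project's own tooling: the environment directory name `.venv`, the helper names `venv_python` and `setup_environment`, and the presence of requirements.txt at the repository root are all assumptions.

```python
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Return the path of the interpreter inside the virtual environment."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def setup_environment(env_dir: Path, requirements: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment and install the listed dependencies.

    `env_dir` and `requirements` are hypothetical paths chosen by the caller.
    Installation is skipped when pip is not requested or the file is missing.
    """
    # Create the environment; with_pip=True bootstraps pip into it.
    venv.create(env_dir, with_pip=with_pip)
    python = venv_python(env_dir)
    if with_pip and requirements.is_file():
        # Use the environment's own interpreter so packages land inside it.
        subprocess.run(
            [str(python), "-m", "pip", "install", "-r", str(requirements)],
            check=True,
        )
    return python
```

Calling `setup_environment(Path(".venv"), Path("requirements.txt"))` from the repository root would create the environment and install the dependencies, provided the requirements file is where this sketch assumes it is.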
- The data/ directory contains all datasets used in the project.
- All model development, training, and evaluation scripts are located in the modelling/ directory.
- The inference/ directory contains notebooks on inferring churn drivers via data analysis.
- Additional work that uses a small language model to generate customers' emotional involvement is in the supplementary/ directory.
- Identified multiple churn drivers that align with business intuition. The final model achieved strong performance in predicting customer churn.
- See Churn Modelling Presentation.pdf for an overview of our methods and the results.
- See Churn Modelling Report.pdf for an in-depth explanation of our methods and the results.
This project is licensed under the Apache-2.0 License. See the LICENSE file for details.