meysamesh/santander-ctp-ml-project

Santander Customer Transaction Prediction

Overview

This project uses a binary classification model to predict whether a customer will make a specific transaction. The work includes data understanding, exploratory analysis, model development, and performance evaluation.

Summary

You can find the project summary in 02_Non-Technical-Summary.md

PRD

All the product requirements are in 03_PRD.md

Requirements

Setup Instructions

1. Create and Activate a Virtual Environment

Before running the notebooks, fork the repository and set up a fresh virtual environment.

2. Install Dependencies

Use the commands below for your operating system.
If installation errors occur (especially on Apple Silicon), removing strict version pins in requirements.txt may help.


Environment Setup

Please make sure you have forked the repo and set up a new virtual environment.

Note:

  • If errors occur during environment setup, try removing the version pins for the failing packages in the requirements file.
  • In some cases, Graphviz needs to be installed for the transformers library.
  • Install HDF5 if you have not already done so.
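The first note above (removing versions from failing packages) can be done by hand, but a small script makes it repeatable. The following is a minimal sketch, not part of the repository: the function name `unpin` and the sample lines in the usage example are hypothetical, and the regular expression only covers simple `name==version` style requirements.

```python
import re

# Package name (with optional extras) followed by a version specifier.
_REQUIREMENT = re.compile(
    r"^(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*(?:\[[^\]]*\])?)\s*"
    r"(?P<spec>(?:==|>=|<=|~=|!=|<|>)[^;#]*)"
)


def unpin(lines, failing):
    """Drop version specifiers for the packages named in `failing`.

    Other lines (comments, untouched packages) are returned unchanged.
    """
    wanted = {name.lower() for name in failing}
    result = []
    for line in lines:
        stripped = line.strip()
        match = _REQUIREMENT.match(stripped)
        if match and match.group("name").split("[")[0].lower() in wanted:
            # Keep the name plus anything after the specifier
            # (environment markers or trailing comments).
            rest = stripped[match.end():]
            result.append(f"{match.group('name')} {rest}".strip())
        else:
            result.append(line.rstrip("\n"))
    return result


if __name__ == "__main__":
    # Made-up sample lines; the real requirements.txt will differ.
    sample = ["lightgbm==0.0.1", "numpy>=0.0.1", "# a comment"]
    for line in unpin(sample, failing=["lightgbm"]):
        print(line)
```

To apply it to a real file, read the lines from `requirements.txt`, pass the names of the packages that failed to install, and write the result to a new file so the original pins are preserved.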

Prerequisites: Install Graphviz and HDF5

Check if graphviz is already installed by running:

dot -V
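The same check can be scripted from Python, which is convenient at the top of a notebook. This is a minimal sketch using only the standard library; the function name `graphviz_version` is hypothetical and not part of the repository.

```python
import shutil
import subprocess


def graphviz_version():
    """Return the banner printed by `dot -V`, or None if `dot` is not on PATH."""
    dot_path = shutil.which("dot")
    if dot_path is None:
        return None
    # `dot -V` typically writes its version banner to stderr, so capture both streams.
    result = subprocess.run([dot_path, "-V"], capture_output=True, text=True)
    return (result.stdout + result.stderr).strip()


if __name__ == "__main__":
    version = graphviz_version()
    print(version if version else "Graphviz (dot) was not found on PATH.")
```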

If you haven't installed it yet, follow the instructions below for your operating system.

macOS

Update Homebrew and install graphviz and hdf5:

brew update
brew install graphviz
brew install hdf5

Restart your terminal and verify the installation:

dot -V

Windows

Update chocolatey and install graphviz:

choco upgrade chocolatey
choco install graphviz

Press Y for the standard installation.

For hdf5, visit this website to install hdf5.

Restart your terminal and verify the installation:

dot -V

Linux

Ubuntu / Debian

sudo apt-get update
sudo apt-get install -y build-essential cmake libomp-dev graphviz libhdf5-dev

Fedora / RHEL / CentOS

sudo dnf install -y gcc-c++ cmake libomp graphviz hdf5-devel

Python Environment Setup

macOS

For macOS with Intel chips:

pyenv local 3.11.3
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

For macOS with Apple Silicon chips (M1/M2/M3):

pyenv local 3.11.3
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements_silicon.txt

If LightGBM fails to install:

brew install cmake libomp
pip install lightgbm
pip install -r requirements.txt

Windows

PowerShell:

pyenv local 3.11.3
python -m venv .venv
.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
pip install -r requirements.txt

Git Bash:

pyenv local 3.11.3
python -m venv .venv
source .venv/Scripts/activate
python -m pip install --upgrade pip
pip install -r requirements.txt

If LightGBM fails due to CMake or compiler errors:

pip install cmake
pip install lightgbm

Linux

pyenv local 3.11.3
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

If LightGBM fails to install:

pip install lightgbm
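After following the steps for your platform, it can help to confirm that the virtual environment is active and that the key packages import. The sketch below uses only the standard library. The package list is an assumption based on the models named in this README (with `sklearn` assumed for Logistic Regression); adjust it to match the requirements file.

```python
import importlib.util
import sys

# Assumed import names; adjust to match the packages in requirements.txt.
PACKAGES = ("lightgbm", "xgboost", "sklearn")


def check_environment(packages=PACKAGES):
    """Report interpreter details and which packages can be imported."""
    return {
        "python": ".".join(map(str, sys.version_info[:3])),
        # Inside a venv, sys.prefix differs from sys.base_prefix.
        "in_venv": sys.prefix != sys.base_prefix,
        "packages": {
            name: importlib.util.find_spec(name) is not None for name in packages
        },
    }


if __name__ == "__main__":
    report = check_environment()
    print(f"Python {report['python']} (virtual environment: {report['in_venv']})")
    for name, found in report["packages"].items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Run it with the interpreter from `.venv`. A `MISSING` entry points to the package that needs to be reinstalled.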

Models Evaluated

  • Logistic Regression — baseline linear model.
  • LightGBM — gradient-boosted tree model optimized for speed and efficiency, designed to handle large datasets and high-dimensional features with fast training and strong predictive performance.
  • XGBoost — tree-based boosting model used to benchmark performance and validate robustness across model families.
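The comparison between a linear baseline and a boosted-tree model can be sketched as follows. This is an illustration under several assumptions, not the project's actual pipeline: the data is synthetic rather than the Santander dataset, ROC AUC is an assumed metric (this README does not name one), and scikit-learn's `HistGradientBoostingClassifier` stands in for LightGBM and XGBoost so that the sketch needs only scikit-learn. The function name `compare_models` is hypothetical.

```python
def compare_models(n_samples=2000, n_features=20, seed=0):
    """Return {model name: validation ROC AUC} on synthetic data.

    Returns an empty dict if scikit-learn is not installed.
    """
    try:
        from sklearn.datasets import make_classification
        from sklearn.ensemble import HistGradientBoostingClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import train_test_split
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
    except ImportError:
        return {}

    # Synthetic, imbalanced binary data; NOT the Santander dataset.
    X, y = make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=8,
        weights=[0.9, 0.1],
        random_state=seed,
    )
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )

    models = {
        # Linear baseline; scaling helps the solver converge.
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        # Stand-in for LightGBM / XGBoost so the sketch needs only scikit-learn.
        "gradient_boosting": HistGradientBoostingClassifier(random_state=seed),
    }

    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # Score with the predicted probability of the positive class.
        scores[name] = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
    return scores


if __name__ == "__main__":
    for name, auc in compare_models().items():
        print(f"{name}: ROC AUC = {auc:.3f}")
```

Swapping in `lightgbm.LGBMClassifier` or `xgboost.XGBClassifier` should work with the same loop, since both expose `fit` and `predict_proba` in the scikit-learn style.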
