Loan Default Prediction

Problem Statement

This project predicts whether a banking customer will default on a loan, using supervised machine learning on a customer-level banking dataset. The objective is to help financial institutions identify low-risk customers (non-defaulters), improve lending decisions, and mitigate default risk.

Dataset

The dataset used in this project is titled HACKATHON_TRAINING_DATA.CSV. It contains customer-level information including:

Credit limits and loan details
Monthly outstanding balances and debits
Risk indicators such as CRIFF scores and repayment grades
Account behavior trends over 12 months
KYC status, digital banking indicators, and more

The key target variable is TARGET, where:

0 indicates a non-defaulter
1 indicates a defaulter

A few key columns include: ACCT_AGE, LIMIT, OUTS, LOAN_TENURE, INSTALAMT, KYC_SCR, CRIFF_33, INCOME_BAND1, CREDIT_HISTORY_LENGTH1, PRODUCT_TYPE, ALL_LON_LIMIT, LATEST_NPA_TENURE, NO_YRS_RG3, and others. Monthly transactional fields are also present, such as ONEMNTHSDR, TWOMNTHOUTSTANGBAL, THREEMNTHAVGMTD, etc.

The project is a machine learning system that predicts loan defaults using XGBoost and LightGBM, with SHAP explainability, SMOTE-Tomek resampling, and a prototype UI for credit officers.

📁 Project Structure

📦 Loan-Default-Prediction
 ┣ 📂 backend/            # FastAPI app serving the model
 ┣ 📂 frontend/           # UI for credit officers
 ┣ 📂 assets/             # plots and images
 ┣ 📓 01_Data_Cleaning.ipynb
 ┣ 📓 02_Model_Pipeline.ipynb
 ┣ 📄 docker-compose.yml
 ┣ 📄 requirements.txt
 ┗ 📄 README.md

📊 Results

Model	Accuracy	F1-Score	Precision	Recall
XGBoost	~91%	~0.87	~0.85	~0.89
LightGBM	~93%	~0.89	~0.88	~0.91

🔍 Methodology

1. Data Cleaning & Feature Engineering

Converted binary flags (Y/N) to numeric (1/0)
Parsed text durations like 2 yrs 3 mon into total months
Engineered features: overspend_ratio, max_consec_overspend, outbal_slope, slope_MTD
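
The first two cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names `duration_to_months` and `flags_to_numeric` and the column names `KYC_FLAG` and `TENURE_TEXT` are hypothetical, and the exact duration format in the dataset is assumed to look like the `2 yrs 3 mon` example.

```python
import re

import pandas as pd

# Matches an optional "<n> yr(s)" part followed by an optional "<n> mon(th)(s)" part.
_DURATION_RE = re.compile(
    r"^\s*(?:(\d+)\s*yrs?)?\s*(?:(\d+)\s*mon(?:th)?s?)?\s*$", re.IGNORECASE
)


def duration_to_months(text):
    """Convert a string like '2 yrs 3 mon' to total months; None if unparseable."""
    if not isinstance(text, str):
        return None
    match = _DURATION_RE.match(text)
    if match is None or not any(match.groups()):
        return None
    years, months = (int(g) if g else 0 for g in match.groups())
    return years * 12 + months


def flags_to_numeric(series):
    """Map 'Y'/'N' flags to 1/0; any other value becomes missing (NaN)."""
    cleaned = series.astype(str).str.strip().str.upper()
    return cleaned.map({"Y": 1, "N": 0})


# Hypothetical column names, used only for illustration.
df = pd.DataFrame(
    {
        "KYC_FLAG": ["Y", "N", "y"],
        "TENURE_TEXT": ["2 yrs 3 mon", "11 mon", "1 yr"],
    }
)
df["KYC_FLAG"] = flags_to_numeric(df["KYC_FLAG"])
df["TENURE_MONTHS"] = df["TENURE_TEXT"].map(duration_to_months)
print(df)
```

Returning `None` for unparseable durations keeps bad values visible as missing data instead of silently turning them into zero months.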

2. Handling Class Imbalance

Used SMOTE-Tomek (SMOTE oversampling of the minority class, followed by Tomek-link removal as undersampling) to balance defaulters and non-defaulters

3. Feature Selection

Trained a preliminary XGBoost model
Used SHAP values to select the top 30 most impactful features
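
The selection step can be sketched as below. The README does not say how SHAP values were aggregated, so ranking by mean absolute SHAP value is an assumption (it is a common convention). The function name `top_k_by_mean_abs_shap` is hypothetical, and the input matrix is assumed to have been computed beforehand, for example with `shap.TreeExplainer(model).shap_values(X)`; here it is replaced by random numbers.

```python
import numpy as np


def top_k_by_mean_abs_shap(shap_values, feature_names, k=30):
    """Rank features by mean absolute SHAP value and return the top k names."""
    shap_values = np.asarray(shap_values, dtype=float)
    if shap_values.ndim != 2 or shap_values.shape[1] != len(feature_names):
        raise ValueError("expected shap_values of shape (n_samples, n_features)")
    importance = np.abs(shap_values).mean(axis=0)
    order = np.argsort(importance)[::-1][:k]  # indices, most important first
    return [feature_names[i] for i in order]


# Random stand-in for a real SHAP matrix; the scales make some columns
# matter more than others. These are not real results.
rng = np.random.default_rng(0)
names = ["ACCT_AGE", "LIMIT", "OUTS", "LOAN_TENURE"]
fake_shap = rng.normal(size=(100, 4)) * np.array([0.1, 2.0, 0.5, 1.0])
print(top_k_by_mean_abs_shap(fake_shap, names, k=2))
```

The returned names can then be used to subset the training frame, for example `X_train[selected]`, before fitting the final models.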

4. Model Training

XGBoost with grid search hyperparameter tuning
LightGBM with a decision threshold tuned on the precision-recall curve
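
One way to pick a threshold from the precision-recall curve is sketched below. The README does not state which criterion was optimized, so maximizing F1 is an assumption, and `best_f1_threshold` is a hypothetical helper. In practice `y_score` would be the model's predicted probability of default on a validation set, for example `model.predict_proba(X_val)[:, 1]`; the toy arrays here are made up.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def best_f1_threshold(y_true, y_score):
    """Return (threshold, f1) for the point on the PR curve with the highest F1."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # The final (precision=1, recall=0) point has no threshold, so drop it.
    precision, recall = precision[:-1], recall[:-1]
    denom = precision + recall
    f1 = np.divide(
        2 * precision * recall, denom, out=np.zeros_like(denom), where=denom > 0
    )
    best = int(np.argmax(f1))
    return float(thresholds[best]), float(f1[best])


# Toy labels and scores, for illustration only.
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1])
y_score = np.array([0.05, 0.1, 0.2, 0.35, 0.4, 0.45, 0.7, 0.9])
threshold, f1 = best_f1_threshold(y_true, y_score)
print(f"threshold={threshold:.2f}, F1={f1:.3f}")

# A sample is classified as a defaulter when its score is >= threshold.
y_pred = (y_score >= threshold).astype(int)
```

A different criterion, such as a minimum recall on defaulters, would only change how the index `best` is chosen.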

🚀 Getting Started

Option 1 — Run Locally

1. Clone the repo

git clone https://github.com/Shradd7/Loan-Fraud-Detection.git
cd Loan-Fraud-Detection

2. Install dependencies

pip install -r requirements.txt

3. Run the notebooks in order

01_Data_Cleaning.ipynb
02_Model_Pipeline.ipynb

Running 02_Model_Pipeline.ipynb will save model.pkl into the backend/ folder.

4. Start the backend API

cd backend
uvicorn main:app --reload

5. Open the frontend

Open frontend/index.html directly in your browser.

6. Test the API

Visit http://localhost:8000/docs in your browser to see the interactive API documentation auto-generated by FastAPI.
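
The same information is available programmatically. FastAPI serves its OpenAPI schema at /openapi.json by default, so a small script can list the routes the backend exposes. This is a sketch that assumes the app keeps that default URL; `fetch_openapi` and `list_routes` are hypothetical helper names, and the README does not document the actual endpoints.

```python
import json
from urllib.request import urlopen


def fetch_openapi(base_url="http://localhost:8000"):
    """Download the OpenAPI schema (FastAPI's default location is /openapi.json)."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)


def list_routes(spec):
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dict."""
    http_methods = {"get", "post", "put", "patch", "delete", "head", "options"}
    routes = []
    for path, operations in spec.get("paths", {}).items():
        for method in operations:
            # Skip non-operation keys such as "parameters" or "summary".
            if method.lower() in http_methods:
                routes.append((method.upper(), path))
    return sorted(routes)
```

With the backend from step 4 running, `list_routes(fetch_openapi())` returns the available method and path pairs, which can then be called from the frontend or a test script.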

Option 2 — Run with Docker

1. Make sure Docker Desktop is installed

Download from https://www.docker.com/products/docker-desktop

2. Clone the repo

git clone https://github.com/Shradd7/Loan-Fraud-Detection.git
cd Loan-Fraud-Detection

3. Start everything with one command

docker-compose up

4. Access the services

Backend API → http://localhost:8000
API Docs → http://localhost:8000/docs
Frontend → http://localhost:3000

🛠️ Tools & Libraries

Category	Tools
Modeling	XGBoost, LightGBM, Scikit-learn
Explainability	SHAP
Resampling	imbalanced-learn (SMOTE-Tomek)
Backend	FastAPI, Uvicorn
Frontend	HTML, CSS, JavaScript
Data	Pandas, NumPy, SciPy

🔮 Future Work

Add real-time model monitoring and drift detection
Integrate live credit bureau APIs
A/B test decision thresholds across different loan products
Harden the existing Docker setup for production deployment

👤 Author

Shradd7 — GitHub
