Shradd7/Loan-Fraud-Detection

Loan Default Prediction

Problem Statement

This project uses supervised machine learning to predict, from a banking dataset, whether a customer will default on a loan. The objective is to help financial institutions identify low-risk customers (non-defaulters), improve lending decisions, and mitigate default risk.

Dataset

The dataset used in this project is titled HACKATHON_TRAINING_DATA.CSV. It contains customer-level information including:

  • Credit limits and loan details
  • Monthly outstanding balances and debits
  • Risk indicators such as CRIFF scores and repayment grades
  • Account behavior trends over 12 months
  • KYC status, digital banking indicators, and more

The key target variable is TARGET, where:

  • 0 indicates a non-defaulter
  • 1 indicates a defaulter

A few key columns include: ACCT_AGE, LIMIT, OUTS, LOAN_TENURE, INSTALAMT, KYC_SCR, CRIFF_33, INCOME_BAND1, CREDIT_HISTORY_LENGTH1, PRODUCT_TYPE, ALL_LON_LIMIT, LATEST_NPA_TENURE, NO_YRS_RG3, and others. Monthly transactional fields are also present, such as ONEMNTHSDR, TWOMNTHOUTSTANGBAL, THREEMNTHAVGMTD, etc.

A machine learning system that predicts loan defaults using XGBoost and LightGBM, with SHAP explainability, SMOTE-Tomek resampling, and a prototype UI for credit officers.


📁 Project Structure

📦 Loan-Default-Prediction
 ┣ 📂 backend/            # FastAPI app serving the model
 ┣ 📂 frontend/           # UI for credit officers
 ┣ 📂 assets/             # plots and images
 ┣ 📓 01_Data_Cleaning.ipynb
 ┣ 📓 02_Model_Pipeline.ipynb
 ┣ 📄 requirements.txt
 ┗ 📄 README.md

📊 Results

Model      Accuracy   F1-Score   Precision   Recall
XGBoost    ~91%       ~0.87      ~0.85       ~0.89
LightGBM   ~93%       ~0.89      ~0.88       ~0.91
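
As a quick sanity check on the table, F1 is the harmonic mean of precision and recall, so it can be recomputed from the other two columns. The sketch below does that with the approximate values listed above; the helper name `f1_from` is only illustrative and is not taken from the notebooks.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Approximate precision/recall values copied from the results table.
xgb_f1 = f1_from(0.85, 0.89)
lgbm_f1 = f1_from(0.88, 0.91)

print(round(xgb_f1, 2), round(lgbm_f1, 2))
```

Both values round to the F1 scores reported in the table (0.87 and 0.89), so the three columns are mutually consistent.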

🔍 Methodology

1. Data Cleaning & Feature Engineering

  • Converted binary flags (Y/N) to numeric (1/0)
  • Parsed text durations like 2 yrs 3 mon into total months
  • Engineered features: overspend_ratio, max_consec_overspend, outbal_slope, slope_MTD
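
The cleaning steps above could look roughly like the following sketch. The function names are hypothetical, and the feature definitions are assumptions inferred from the feature names (for example, that max_consec_overspend counts consecutive months with the outstanding balance above the limit, and that the slope features are least-squares trends); the notebook may define them differently.

```python
import re

def flag_to_int(value):
    """Map a Y/N flag to 1/0; return None for anything else (e.g. missing)."""
    text = str(value).strip().upper()
    if text == "Y":
        return 1
    if text == "N":
        return 0
    return None

# Optional "<n> yr(s)" part followed by an optional "<n> mon(s)" part.
_DURATION = re.compile(
    r"^\s*(?:(\d+)\s*yrs?)?\s*(?:(\d+)\s*mons?)?\s*$", re.IGNORECASE
)

def duration_to_months(text):
    """Convert a string like '2 yrs 3 mon' into a total number of months."""
    match = _DURATION.match(str(text))
    if not match or not any(match.groups()):
        return None  # unparseable or empty input
    years = int(match.group(1) or 0)
    months = int(match.group(2) or 0)
    return 12 * years + months

def max_consecutive_overspend(balances, limit):
    """Assumed definition: longest run of months with balance above the limit."""
    longest = current = 0
    for balance in balances:
        current = current + 1 if balance > limit else 0
        longest = max(longest, current)
    return longest

def slope(values):
    """Assumed definition: least-squares slope of values against index 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den
```

For example, `duration_to_months("2 yrs 3 mon")` gives 27, and `max_consecutive_overspend([120, 130, 90, 110, 115, 125], 100)` gives 3.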

2. Handling Class Imbalance

  • Used SMOTE-Tomek (oversampling + undersampling) to balance defaulters and non-defaulters

3. Feature Selection

  • Trained a preliminary XGBoost model
  • Used SHAP values to select the top 30 most impactful features
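
One common way to turn SHAP values into a feature ranking is to average their absolute values per feature. The sketch below assumes that criterion (the notebook may rank differently) and uses a made-up toy matrix in place of real SHAP output; `top_k_features` is a hypothetical helper name.

```python
import numpy as np

def top_k_features(shap_values, feature_names, k=30):
    """Rank features by mean absolute SHAP value and return the top k names."""
    shap_values = np.asarray(shap_values, dtype=float)
    importance = np.abs(shap_values).mean(axis=0)  # one score per feature
    order = np.argsort(importance)[::-1][:k]       # highest score first
    return [feature_names[i] for i in order]

# Toy SHAP matrix (3 samples x 4 features), NOT real project output.
# In practice it would come from something like
# shap.TreeExplainer(model).shap_values(X) on the preliminary XGBoost model.
toy_shap = [
    [0.10, -0.50, 0.02, 0.30],
    [-0.20, 0.40, 0.01, -0.10],
    [0.15, -0.60, 0.00, 0.20],
]
names = ["ACCT_AGE", "LIMIT", "KYC_SCR", "OUTS"]
selected = top_k_features(toy_shap, names, k=2)
print(selected)
```

With the toy numbers above, the two selected columns are LIMIT and OUTS; the downstream models would then be trained on the selected columns only.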

4. Model Training

  • XGBoost with grid search hyperparameter tuning
  • LightGBM with a decision threshold tuned on the precision-recall curve
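
The threshold tuning step can be sketched as a sweep over candidate cut-offs, scoring each one on precision and recall. The selection criterion is not stated here, so this example assumes the threshold that maximizes F1; the function names and toy probabilities are illustrative only. scikit-learn's `precision_recall_curve` performs the same kind of sweep on real model output.

```python
def precision_recall_f1(y_true, y_prob, threshold):
    """Precision, recall and F1 when predicting 1 for probabilities >= threshold."""
    tp = fp = fn = 0
    for truth, prob in zip(y_true, y_prob):
        pred = 1 if prob >= threshold else 0
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def best_threshold(y_true, y_prob):
    """Try each distinct predicted probability as a cut-off; keep the best F1."""
    best = (0.5, -1.0)  # (threshold, f1); assumed criterion: maximize F1
    for candidate in sorted(set(y_prob)):
        _, _, f1 = precision_recall_f1(y_true, y_prob, candidate)
        if f1 > best[1]:
            best = (candidate, f1)
    return best

# Toy labels and predicted default probabilities, not project data.
y_true = [0, 0, 1, 0, 1, 1]
y_prob = [0.10, 0.35, 0.40, 0.45, 0.80, 0.90]
threshold, f1 = best_threshold(y_true, y_prob)
print(threshold, round(f1, 3))
```

On this toy data the default cut-off of 0.5 would miss the defaulter scored at 0.40, while the tuned threshold of 0.40 recovers it at the cost of one false positive.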

🚀 Getting Started

Option 1 — Run Locally

1. Clone the repo

git clone https://github.com/Shradd7/Loan-Fraud-Detection.git
cd Loan-Fraud-Detection

2. Install dependencies

pip install -r requirements.txt

3. Run the notebooks in order

01_Data_Cleaning.ipynb
02_Model_Pipeline.ipynb

Running 02_Model_Pipeline.ipynb will save model.pkl into the backend/ folder.

4. Start the backend API

cd backend
uvicorn main:app --reload

5. Open the frontend

Open frontend/index.html directly in your browser.

6. Test the API

Visit http://localhost:8000/docs in your browser to see the interactive API documentation auto-generated by FastAPI.


Option 2 — Run with Docker

1. Make sure Docker Desktop is installed

Download from https://www.docker.com/products/docker-desktop

2. Clone the repo

git clone https://github.com/Shradd7/Loan-Fraud-Detection.git
cd Loan-Fraud-Detection

3. Start everything with one command

docker-compose up

4. Access the services

🛠️ Tools & Libraries

Category         Tools
Modeling         XGBoost, LightGBM, Scikit-learn
Explainability   SHAP
Resampling       imbalanced-learn (SMOTE-Tomek)
Backend          FastAPI, Uvicorn
Frontend         HTML, CSS, JavaScript
Data             Pandas, NumPy, SciPy

🔮 Future Work

  • Add real-time model monitoring and drift detection
  • Integrate live credit bureau APIs
  • A/B test decision thresholds across different loan products
  • Harden the existing Docker setup for production deployment

👤 Author

Shradd7 (GitHub)
