This project predicts non-defaulters in a banking dataset using supervised machine learning. The objective is to help financial institutions identify low-risk customers, improve lending decisions, and mitigate default risk.
The dataset used in this project is titled HACKATHON_TRAINING_DATA.CSV. It contains customer-level information including:
- Credit limits and loan details
- Monthly outstanding balances and debits
- Risk indicators such as CRIFF scores and repayment grades
- Account behavior trends over 12 months
- KYC status, digital banking indicators, and more
The key target variable is TARGET, where:
- 0 indicates a non-defaulter
- 1 indicates a defaulter
A few key columns include:
ACCT_AGE, LIMIT, OUTS, LOAN_TENURE, INSTALAMT, KYC_SCR, CRIFF_33, INCOME_BAND1, CREDIT_HISTORY_LENGTH1, PRODUCT_TYPE, ALL_LON_LIMIT, LATEST_NPA_TENURE, NO_YRS_RG3, and others. Monthly transactional fields are also present, such as ONEMNTHSDR, TWOMNTHOUTSTANGBAL, THREEMNTHAVGMTD, etc.
A machine learning system that predicts loan defaults using XGBoost and LightGBM, with SHAP explainability, SMOTE-Tomek resampling, and a prototype UI for credit officers.
📦 Loan-Default-Prediction
┣ 📂 backend/ # FastAPI app serving the model
┣ 📂 frontend/ # UI for credit officers
┣ 📂 assets/ # plots and images
┣ 📓 01_Data_Cleaning.ipynb
┣ 📓 02_Model_Pipeline.ipynb
┣ 📄 requirements.txt
┗ 📄 README.md
| Model | Accuracy | F1-Score | Precision | Recall |
|---|---|---|---|---|
| XGBoost | ~91% | ~0.87 | ~0.85 | ~0.89 |
| LightGBM | ~93% | ~0.89 | ~0.88 | ~0.91 |
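As a quick sanity check on the table above, F1 is the harmonic mean of precision and recall, so the reported F1 values can be recomputed from the other two columns. The snippet below is only an arithmetic sketch using the approximate figures from the table; the function name is illustrative and not taken from the notebooks.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Approximate (precision, recall) pairs copied from the results table.
reported = {"XGBoost": (0.85, 0.89), "LightGBM": (0.88, 0.91)}

for model, (p, r) in reported.items():
    print(f"{model}: F1 = {f1_from_precision_recall(p, r):.2f}")
# XGBoost: F1 = 0.87
# LightGBM: F1 = 0.89
```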
- Converted binary flags (Y/N) to numeric (1/0)
- Parsed text durations like "2 yrs 3 mon" into total months
- Engineered features: overspend_ratio, max_consec_overspend, outbal_slope, slope_MTD
- Used SMOTE-Tomek (oversampling + undersampling) to balance defaulters and non-defaulters
- Trained a preliminary XGBoost model
- Used SHAP values to select the top 30 most impactful features
- XGBoost with grid search hyperparameter tuning
- LightGBM with optimized decision threshold based on precision-recall curve
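The last item above mentions tuning the decision threshold from the precision-recall curve. This README does not say which criterion the notebook optimizes, so the sketch below assumes one common choice: pick the threshold that maximizes F1 for the positive class (label 1). It is written in plain Python so the logic is visible; in a notebook, `sklearn.metrics.precision_recall_curve` would typically supply the same precision and recall values. The function name and the toy data are illustrative only.

```python
def best_f1_threshold(y_true, y_score):
    """Return (threshold, f1) that maximizes F1 for the positive class.

    A sample is predicted positive when its score is >= threshold.
    Ties are resolved in favor of the lowest threshold.
    """
    best_threshold, best_f1 = 0.5, -1.0
    # Every distinct score is a candidate cut-off on the precision-recall curve.
    for threshold in sorted(set(y_score)):
        pairs = list(zip(y_true, y_score))
        tp = sum(1 for y, s in pairs if s >= threshold and y == 1)
        fp = sum(1 for y, s in pairs if s >= threshold and y == 0)
        fn = sum(1 for y, s in pairs if s < threshold and y == 1)
        if tp == 0:
            continue  # precision and F1 are undefined or zero here
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_threshold, best_f1 = threshold, f1
    return best_threshold, best_f1


# Toy labels and predicted probabilities (made up for illustration).
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_score = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.90]

threshold, f1 = best_f1_threshold(y_true, y_score)
print(f"best threshold = {threshold:.2f}, F1 = {f1:.3f}")
```

On this toy data the default cut-off of 0.5 would miss the positive sample scored 0.30, which is why a tuned threshold can trade a little precision for higher recall.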
1. Clone the repo
git clone https://github.com/Shradd7/Loan-Fraud-Detection.git
cd Loan-Fraud-Detection
2. Install dependencies
pip install -r requirements.txt
3. Run the notebooks in order
01_Data_Cleaning.ipynb
02_Model_Pipeline.ipynb
Running 02_Model_Pipeline.ipynb will save model.pkl into the backend/ folder.
4. Start the backend API
cd backend
uvicorn main:app --reload
5. Open the frontend
Open frontend/index.html directly in your browser.
6. Test the API
Visit http://localhost:8000/docs in your browser to see the
interactive API documentation auto-generated by FastAPI.
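Besides the interactive page, the API can be inspected from a script. FastAPI serves its OpenAPI schema at /openapi.json by default, so the sketch below downloads that schema and lists the available routes. It assumes the backend is running on localhost:8000 as in step 4 and that the schema URL has not been changed in backend/main.py; the helper names are illustrative and not part of this repository.

```python
import json
from urllib.request import urlopen


def fetch_openapi(base_url="http://localhost:8000"):
    """Download the OpenAPI schema (FastAPI's default location is /openapi.json)."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)


def list_routes(spec):
    """Return sorted (METHOD, path) pairs found in an OpenAPI schema dict."""
    http_methods = {"get", "post", "put", "patch", "delete", "options", "head"}
    routes = []
    for path, operations in spec.get("paths", {}).items():
        for method in operations:
            # Skip non-operation keys such as "parameters" or "summary".
            if method.lower() in http_methods:
                routes.append((method.upper(), path))
    return sorted(routes)


# With the backend running:
# for method, path in list_routes(fetch_openapi()):
#     print(method, path)
```

Listing the routes this way avoids guessing endpoint names; the request and response formats for each route are shown on the /docs page.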
1. Make sure Docker Desktop is installed
Download from https://www.docker.com/products/docker-desktop
2. Clone the repo
git clone https://github.com/Shradd7/Loan-Fraud-Detection.git
cd Loan-Fraud-Detection
3. Start everything with one command
docker-compose up
4. Access the services
- Backend API → http://localhost:8000
- API Docs → http://localhost:8000/docs
- Frontend → http://localhost:3000
| Category | Tools |
|---|---|
| Modeling | XGBoost, LightGBM, Scikit-learn |
| Explainability | SHAP |
| Resampling | imbalanced-learn (SMOTE-Tomek) |
| Backend | FastAPI, Uvicorn |
| Frontend | HTML, CSS, JavaScript |
| Data | Pandas, NumPy, SciPy |
- Add real-time model monitoring and drift detection
- Integrate live credit bureau APIs
- A/B test decision thresholds across different loan products
- Containerize with Docker for production deployment
Shradd7 — GitHub