🏦 Loan Default Prediction

📌 Overview

This project predicts whether a customer will default on a loan using machine learning classification models. Because the dataset is simple and structured, the emphasis is on model evaluation, class imbalance, and metric interpretation rather than on feature engineering.


📊 Dataset

  • Source: Kaggle

  • Rows: 10,000

  • Columns:

    • Employed (0/1)
    • Bank Balance
    • Annual Salary
    • Defaulted? (target)

⚙️ Preprocessing

  • Minimal preprocessing required: the dataset is clean and has no missing values
  • Train-test split applied
  • Main effort spent on handling class imbalance
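
The README does not say whether the train-test split was stratified. With an imbalanced target, one common option is to split each class separately so that both sets keep the same default rate (in scikit-learn, the `stratify` argument of `train_test_split` does this). A minimal standard-library sketch of the idea follows; the `stratified_split` helper and the toy labels are illustrative assumptions, not the project's code or data.

```python
import random

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with every class split at the same rate."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels (hypothetical, not the Kaggle data): 1000 rows, 30 defaulters.
labels = [1] * 30 + [0] * 970
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)

train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), train_rate, test_rate)
```

Both subsets end up with the same share of defaulters, so recall measured on the test set is not distorted by an unlucky draw of the rare class.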

🤖 Models Used

  • Logistic Regression (Standard)
  • Logistic Regression (Class Balanced)
  • Random Forest
  • XGBoost
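
The README does not state how the "Class Balanced" variant was configured. A common choice, assumed here, is the inverse-frequency heuristic that scikit-learn applies for `class_weight="balanced"`: each class gets weight `n_samples / (n_classes * class_count)`. The sketch below computes those weights for toy labels (hypothetical counts, not the real data); `balanced_class_weights` is an illustrative helper.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Toy labels (hypothetical): 30 defaulters out of 1000 customers.
labels = [1] * 30 + [0] * 970
weights = balanced_class_weights(labels)

# Ratio of negatives to positives, the quantity usually suggested
# for XGBoost's scale_pos_weight parameter on imbalanced data.
neg_to_pos_ratio = labels.count(0) / labels.count(1)

print(weights)
print(neg_to_pos_ratio)
```

With these weights, an error on a defaulter costs the model about as much as errors on 32 non-defaulters, which is why recall rises and precision falls in the balanced variant.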

📈 Evaluation Metrics

  • Confusion Matrix
  • Precision
  • Recall
  • F1-Score

Special focus was given to recall, because missing a defaulter (a false negative) is treated as more costly than raising a false alarm (a false positive).
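
For reference, all three scores can be derived from the confusion matrix counts. The sketch below uses hypothetical counts chosen for easy arithmetic; they are not the project's actual confusion matrices.

```python
def classification_metrics(tp, fp, fn):
    """Precision, recall and F1 for the positive (defaulter) class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

# Hypothetical counts: 100 true defaulters, 80 of them flagged,
# plus 320 non-defaulters flagged by mistake.
precision, recall, f1 = classification_metrics(tp=80, fp=320, fn=20)
print(round(precision, 2), round(recall, 2), round(f1, 2))
```

Here recall is high (0.8) while precision is low (0.2), the same pattern that the imbalance-aware models in this project show.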


📊 Results Summary

| Model | Precision | Recall | F1 Score |
|---|---|---|---|
| Logistic (Standard) | 0.63 | 0.28 | 0.38 |
| Logistic (Balanced) | 0.19 | 0.87 | 0.31 |
| Random Forest | 0.22 | 0.75 | 0.34 |
| XGBoost | 0.23 | 0.81 | 0.36 |
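
As a sanity check, F1 is the harmonic mean of precision and recall, so it can be recomputed from the other two columns. The sketch below does this with the rounded values reported above; small differences are expected because the inputs are already rounded to two decimals.

```python
# (precision, recall, reported F1) copied from the results table.
reported = {
    "Logistic (Standard)": (0.63, 0.28, 0.38),
    "Logistic (Balanced)": (0.19, 0.87, 0.31),
    "Random Forest": (0.22, 0.75, 0.34),
    "XGBoost": (0.23, 0.81, 0.36),
}

def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

recomputed = {name: f1_from(p, r) for name, (p, r, _) in reported.items()}

for name, (p, r, f1) in reported.items():
    print(f"{name}: reported F1 {f1:.2f}, recomputed {recomputed[name]:.3f}")

best_f1 = max(reported, key=lambda name: reported[name][2])
best_recall = max(reported, key=lambda name: reported[name][1])
print("Highest F1:", best_f1)
print("Highest recall:", best_recall)
```

The recomputed values agree with the table to within rounding. Ranking by F1 and ranking by recall pick different models, which is why the metric used for comparison has to be stated explicitly.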

🔍 Key Insights

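  • The dataset is highly imbalanced, so accuracy alone is misleading
  • Standard Logistic Regression misses most defaulters (recall 0.28), although it has the highest precision (0.63)
  • Balanced Logistic Regression raises recall to 0.87, the highest of the four models, but precision falls to 0.19 because of many more false positives
  • Random Forest reaches recall 0.75 with precision 0.22
  • XGBoost has the best F1 (0.36) among the three high-recall models, with recall 0.81 and precision 0.23; standard Logistic Regression still has the highest F1 overall (0.38)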
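
To illustrate why accuracy misleads on imbalanced data, consider a trivial baseline that predicts "no default" for everyone. The 3% default rate below is a hypothetical figure for illustration; the README does not report the actual class ratio.

```python
# Toy labels (hypothetical): 30 defaulters among 1000 customers.
labels = [1] * 30 + [0] * 970

# Baseline that never predicts a default.
predictions = [0] * len(labels)

correct = sum(1 for p, y in zip(predictions, labels) if p == y)
accuracy = correct / len(labels)

caught = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
recall = caught / sum(labels)

print(accuracy, recall)
```

The baseline scores 97% accuracy while catching no defaulters at all, so precision, recall and F1 on the positive class are the more informative measures.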

⚖️ Trade-off Analysis

  • Higher recall → better detection of defaulters
  • Higher precision → fewer false alarms

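The choice of model depends on business priorities, in particular on how the cost of a missed default compares with the cost of wrongly flagging a reliable customer.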
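
The same trade-off appears within a single model when the decision threshold on its predicted probability is moved. The sketch below sweeps three thresholds over toy scores; the scores and labels are made up for illustration, and `precision_recall_at` is a hypothetical helper, not part of the project.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when scores >= threshold are flagged as defaults."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy predicted default probabilities and true labels (hypothetical).
scores = [0.95, 0.80, 0.70, 0.60, 0.45, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

results = {t: precision_recall_at(scores, labels, t) for t in (0.25, 0.50, 0.75)}
for threshold, (precision, recall) in results.items():
    print(f"threshold={threshold:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Lowering the threshold from 0.75 to 0.25 raises recall from 0.50 to 1.00 while precision drops from 1.00 to about 0.57, so the threshold can be set according to which error is more expensive.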


📈 Visualization

  • Confusion Matrix comparison
  • Precision-Recall trade-off

🚀 Conclusion

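With recall as the priority, XGBoost achieved the best balance between detecting defaulters and controlling false positives among the high-recall models (recall 0.81, F1 0.36), ahead of Random Forest (recall 0.75, F1 0.34). Tree-based models can capture non-linear relationships, but they did not uniformly outperform the linear models here: Balanced Logistic Regression reached the highest recall (0.87) at the lowest precision (0.19), and standard Logistic Regression had the highest precision (0.63) and F1 (0.38) while missing most defaulters.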


🔮 Future Work

  • Hyperparameter tuning
  • Threshold optimization
  • ROC / PR curve analysis
  • Deployment using Streamlit

📜 License

This project is licensed under the GNU General Public License.
