Skip to content

End-to-end credit-risk classification project using tabular loan data, with a reproducible scikit-learn pipeline, Jupyter notebooks and clear evaluation of model performance.

Notifications You must be signed in to change notification settings

abailey81/Credit-Classification

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Credit Classification

Supervised learning project to predict whether a loan will default using tabular customer and loan features. The aim is to build a clear, reproducible baseline credit-risk model that can be extended to more advanced approaches.

1. Overview

This repository contains a small end-to-end workflow for loan default prediction:

  • Exploratory data analysis of the loan dataset
  • Data cleaning and feature engineering
  • Training and evaluation of supervised learning models
  • Interpretation of model performance and limitations

The work was originally developed as part of a university assignment, but the structure is organised to resemble a real data science project rather than a single notebook.

2. Data

The project assumes a single CSV file with one row per loan and a binary target variable indicating default vs non-default, together with customer and loan attributes.

The dataset is stored locally and not committed to the repository. A typical local layout is:

  • data/raw/dataset.csv – original dataset
  • data/processed/ – any cleaned or engineered versions

You can adjust the paths inside the notebook or scripts if your filenames differ.

3. Methodology

The modelling workflow follows a standard supervised learning pipeline for credit risk:

  1. Preprocessing and feature engineering

    • Handling missing values
    • Encoding categorical variables
    • Scaling or normalising numerical features where appropriate
  2. Model training

    • Baseline models such as logistic regression
    • Optionally, comparison with tree-based methods (for example random forests or gradient boosting)
  3. Evaluation

    • Train/validation split or cross-validation
    • Metrics including ROC-AUC, accuracy, precision, recall and confusion matrices
    • Qualitative discussion of where the model performs well or poorly

Most of the experimentation currently lives in notebooks/Loan prediction.ipynb. As the project evolves, more logic can be refactored into reusable modules under src/.

4. Repository structure

.
├── notebooks/          # Jupyter notebooks for exploration and modelling
│   └── Loan prediction.ipynb
├── src/                # Python modules (data prep, training, evaluation)
├── requirements.txt    # Python dependencies
├── .gitignore          # Ignore rules (data, caches, IDE files, etc.)
└── README.md           # Project documentation

The data/, report/ and docs/ folders are expected to exist locally but are not tracked by git, so they do not appear in the GitHub view.

5. Getting started

  1. Clone the repository

    git clone git@github.com:abailey81/Credit-Classification.git
    cd Credit-Classification
  2. (Optional) Create and activate a virtual environment

    python -m venv .venv
    source .venv/bin/activate    # macOS / Linux
    # .venv\Scripts\activate     # Windows
  3. Install dependencies

    pip install -r requirements.txt
  4. Add the dataset

    Place your CSV in data/raw/dataset.csv (or update the notebook path accordingly).

  5. Run the analysis

    jupyter notebook notebooks/Loan\ prediction.ipynb

    From there you can reproduce the analysis, adjust features, or try alternative models.

6. Reproducibility and next steps

  • Dependencies are listed in requirements.txt.
  • Data files are kept out of version control to avoid exposing sensitive information.
  • Random seeds can be fixed in the notebook to make results more stable between runs.

Planned improvements include refactoring more code into src/, adding configuration files for experiments, logging model outputs, and extending evaluation to include calibration, scorecards and monitoring.

About

End-to-end credit-risk classification project using tabular loan data, with a reproducible scikit-learn pipeline, Jupyter notebooks and clear evaluation of model performance.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published