# Blockchain Fraud Detection

This is a machine learning pipeline for detecting fraudulent blockchain transactions. It classifies transactions into three risk tiers (low, moderate, and high) using behavioral and transactional features, with built-in handling for severe class imbalance via SMOTE oversampling.

## Problem

Online fraud in blockchain ecosystems costs billions annually. This project applies supervised classification to 78,600 blockchain transactions, comparing seven ML algorithms including gradient boosting (XGBoost, LightGBM, CatBoost), neural networks (MLP), and a stacking ensemble to identify the most reliable fraud indicators. Key challenges include severe class imbalance (80.8% low-risk vs 8.3% high-risk) and the need for interpretable risk signals.

## Data Leakage Discovery

Exploratory analyses and initial results showed 100% accuracy across KNN, Logistic Regression, and Random Forest. A systematic investigation revealed two features that leak the target label:

- `risk_score` — a pre-computed risk metric with completely non-overlapping ranges per class (low_risk: 15–59, moderate_risk: 62–84, high_risk: 90–100). This single feature achieves 100% accuracy on its own.
- `transaction_type` — contains the values "scam" and "phishing", which map exclusively to high_risk, and "purchase" / "transfer", which map exclusively to low_risk.

Both features were removed from the pipeline. The investigation also confirmed that the train/test split is sound (stratified, zero index overlap), that there are no meaningful duplicate rows, and that a label-shuffle sanity check ruled out pipeline bugs.
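The two checks described above can be sketched on synthetic data: train on a single suspect feature and look for near-perfect accuracy, then shuffle the labels to confirm the pipeline itself is honest. The leaky feature below is a hypothetical stand-in mimicking `risk_score`'s non-overlapping per-class ranges; scikit-learn is assumed.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n = 3000
# Three risk tiers with roughly the real class proportions
y = rng.choice(3, size=n, p=[0.808, 0.109, 0.083])

# A leaking feature: non-overlapping value ranges per class, like risk_score
ranges = {0: (15, 59), 1: (62, 84), 2: (90, 100)}
leaky = np.array([rng.uniform(*ranges[c]) for c in y]).reshape(-1, 1)

clf = RandomForestClassifier(n_estimators=50, random_state=0)

# Check 1: a single feature alone should never score this well
leak_acc = cross_val_score(clf, leaky, y, cv=5).mean()
print(f"single-feature accuracy: {leak_acc:.3f}")  # near 1.0 -> leakage

# Check 2: with shuffled labels, accuracy collapses toward the
# no-signal baseline, ruling out a bug in the CV machinery itself
null_acc = cross_val_score(clf, leaky, rng.permutation(y), cv=5).mean()
print(f"label-shuffle accuracy:  {null_acc:.3f}")
```

A feature that scores near 100% on its own while the shuffled-label run does not is a strong leakage signal.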

## Key Results (After Leakage Fix)

With leaking features removed, models produce realistic scores on the held-out test set (15% of data, stratified):

| Model | Test Accuracy | F1 (macro) | ROC-AUC (macro) | CV Accuracy (5-fold) |
|---|---|---|---|---|
| CatBoost | 85.3% | 0.608 | 0.911 | 90.8% ± 0.061 |
| LightGBM | 85.1% | 0.601 | 0.910 | 92.0% ± 0.064 |
| XGBoost | 82.6% | 0.640 | 0.907 | 91.1% ± 0.044 |
| Stacking Ensemble | 81.7% | 0.646 | 0.905 | N/A (ensemble) |
| Random Forest | 80.3% | 0.649 | 0.904 | 92.5% ± 0.017 |
| MLP (Neural Net) | 72.8% | 0.654 | 0.903 | 88.4% ± 0.004 |
| Logistic Regression | 63.1% | 0.574 | 0.886 | 82.3% ± 0.001 |

## Top Predictive Features

Permutation importance (accuracy-based, 30 repeats) identifies the features that matter most once the leaking columns are removed. The ranking is consistent across all seven models:

| Feature | Importance (CatBoost) | Role |
|---|---|---|
| `hour_of_day` | 0.079 | Strongest signal: certain hours carry significantly higher fraud risk, suggesting time-based behavioral patterns |
| `amount` | 0.055 | Larger or unusually sized transactions are more indicative of moderate- and high-risk activity |
| `age_group` | 0.035 | Account maturity matters: newer accounts show different risk profiles than established or veteran ones |
| `session_duration` | 0.008 | Weak individual signal, though it contributes in ensemble context |
| `purchase_pattern` | 0.004 | Minimal standalone impact |
| `login_frequency` | −0.001 | Negligible: permuting this feature does not degrade accuracy |
| `location_region` | −0.001 | Negligible: geographic region alone is not predictive |

**Key takeaway:** After removing the leaking features, no single remaining feature dominates the way `risk_score` did. The top three features (`hour_of_day`, `amount`, `age_group`) combine behavioral timing, transaction size, and account maturity, a plausible fraud signal that aligns with domain knowledge. The modest importance scores (all < 0.08) confirm that the models are learning a genuine multi-feature pattern rather than relying on a single shortcut.
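Accuracy-based permutation importance of the kind used above can be reproduced with scikit-learn's `permutation_importance` (`n_repeats=30`, matching the setup described). A minimal sketch on synthetic data with one informative and one pure-noise feature:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 2000
signal = rng.normal(size=n)
noise = rng.normal(size=n)
y = (signal > 0).astype(int)          # label depends only on `signal`
X = np.column_stack([signal, noise])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Shuffle each feature in turn and measure the drop in test accuracy
result = permutation_importance(
    model, X_te, y_te, scoring="accuracy", n_repeats=30, random_state=0)

for name, imp in zip(["signal", "noise"], result.importances_mean):
    print(f"{name}: {imp:+.3f}")
```

The informative feature shows a large accuracy drop when permuted, while the noise feature's importance hovers around zero (and can come out slightly negative, as with `login_frequency` and `location_region` above).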

## Architecture

```
fraud_data.csv
    │
    ▼
┌──────────┐    ┌──────────────┐    ┌────────────┐
│  Loader  │───▶│ Preprocessor │───▶│  Splitter  │
│ validate │    │ encode/scale │    │ stratified │
└──────────┘    └──────────────┘    │  + SMOTE   │
                                    └─────┬──────┘
                           ┌──────────────┼──────────────┐
                           ▼              ▼              ▼
                       train set      val set        test set
                           │              │              │
                           ▼              │              │
                    ┌─────────────┐       │              │
                    │ Model Train │◀──────┘              │
                    │ XGB / LGBM /│  hyperparam          │
                    │ CB / RF / … │  selection           │
                    └──────┬──────┘                      │
                           │                             │
                           ▼                             ▼
                    ┌─────────────┐            ┌──────────────┐
                    │  Serialize  │            │   Evaluate   │
                    │  (joblib)   │            │ metrics/plots│
                    └─────────────┘            └──────────────┘
```
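The Serialize and Evaluate stages at the bottom of the diagram can be sketched as a joblib round-trip plus held-out metrics. The model, data, and file path here are illustrative stand-ins; the project's actual orchestration lives in `src/pipeline.py`.

```python
import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
X = rng.normal(size=(1000, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # toy binary target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Serialize stage: persist the fitted model to disk
joblib.dump(model, "model.joblib")

# Evaluate stage: reload and score on the untouched test set
loaded = joblib.load("model.joblib")
pred = loaded.predict(X_te)
print(f"accuracy: {accuracy_score(y_te, pred):.3f}")
print(f"macro F1: {f1_score(y_te, pred, average='macro'):.3f}")
```

Reloading before evaluation also doubles as a check that serialization is lossless: the loaded model must produce identical predictions.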

## Dataset

78,600 blockchain transactions with 14 features:

| Feature | Type | Description |
|---|---|---|
| `location_region` | Categorical | Europe, Asia, N. America, S. America, Africa |
| `purchase_pattern` | Categorical | focused, random, high_value |
| `age_group` | Categorical | new, established, veteran |
| `hour_of_day` | Numerical | 0–23 |
| `amount` | Numerical | Transaction amount |
| `login_frequency` | Numerical | Login count per session |
| `session_duration` | Numerical | Minutes per session |
| `anomaly` | Target | low_risk (80.8%), moderate_risk (10.9%), high_risk (8.3%) |

`risk_score` and `transaction_type` are present in the raw data but dropped during preprocessing due to target leakage (see above).
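The drop-and-encode step might look like the sketch below: remove the two leaking columns, one-hot encode the categoricals, and scale the numericals. The column lists follow the table above; the actual implementation in `src/data/preprocessing.py` may differ.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

LEAKY = ["risk_score", "transaction_type"]
CATEGORICAL = ["location_region", "purchase_pattern", "age_group"]
NUMERICAL = ["hour_of_day", "amount", "login_frequency", "session_duration"]

# Two illustrative raw rows with the columns from the table above
df = pd.DataFrame({
    "location_region": ["Europe", "Asia"],
    "purchase_pattern": ["focused", "random"],
    "age_group": ["new", "veteran"],
    "hour_of_day": [3, 14],
    "amount": [120.0, 980.5],
    "login_frequency": [2, 7],
    "session_duration": [12.0, 45.0],
    "risk_score": [95, 20],              # leaks the label -> dropped
    "transaction_type": ["scam", "purchase"],  # leaks the label -> dropped
})

X = df.drop(columns=LEAKY)
pre = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ("num", StandardScaler(), NUMERICAL),
])
X_enc = pre.fit_transform(X)
print(X_enc.shape)  # one-hot columns for the categoricals + 4 scaled numericals
```

`handle_unknown="ignore"` keeps prediction from crashing on category values that never appeared in training, which matters for the `predict --input new_transactions.csv` path.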

## Setup

Requirements: Python 3.9+

```bash
git clone https://github.com/aengusmartindonaire/blockchain-fraud-detection.git
cd blockchain-fraud-detection
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
```

## Usage

**Train all models:**

```bash
python cli.py train --model all
```

**Train a specific model:**

```bash
python cli.py train --model xgboost
```

**Evaluate a trained model:**

```bash
python cli.py evaluate --model xgboost
```

**Predict on new data:**

```bash
python cli.py predict --model catboost --input new_transactions.csv
python cli.py predict --model catboost --input new_transactions.csv --output predictions.csv
```

**Generate visualizations:**

```bash
python cli.py visualize --type all
python cli.py visualize --type distributions
python cli.py visualize --type evaluation --model xgboost
```

**Run tests:**

```bash
pytest tests/ -v
```

## Project Structure

```
blockchain-fraud-detection/
├── cli.py                        # CLI entry point (train/evaluate/predict/visualize)
├── config/
│   └── default.yaml              # All hyperparameters and settings
├── data/
│   └── raw/fraud_data.csv        # Source dataset (78,600 records)
├── src/
│   ├── data/
│   │   ├── loader.py             # CSV loading and schema validation
│   │   ├── preprocessing.py      # Encoding, scaling, outlier handling
│   │   └── splitter.py           # Stratified split + SMOTE
│   ├── models/
│   │   ├── base.py               # Abstract model interface
│   │   ├── xgboost_model.py      # XGBoost
│   │   ├── lightgbm_model.py     # LightGBM
│   │   ├── catboost_model.py     # CatBoost
│   │   ├── random_forest.py      # Random Forest
│   │   ├── mlp.py                # Multi-Layer Perceptron
│   │   ├── logistic.py           # Logistic Regression
│   │   ├── stacking.py           # Stacking Ensemble (RF+XGB+LGBM+MLP)
│   │   └── registry.py           # Model lookup by name
│   ├── evaluation/
│   │   ├── metrics.py            # Accuracy, precision, recall, F1, ROC-AUC, cross-val
│   │   └── importance.py         # Permutation feature importance
│   ├── visualization/
│   │   ├── distributions.py      # Class distribution charts
│   │   ├── tuning_curves.py      # Hyperparameter tuning plots
│   │   ├── evaluation_plots.py   # Confusion matrices, ROC curves, model comparison
│   │   └── feature_plots.py      # Feature importance bars, box plots
│   └── pipeline.py               # End-to-end orchestrator
├── tests/                        # 115 tests
├── notebooks/                    # Exploratory analysis
├── outputs/                      # Generated models, figures, reports
├── Makefile                      # Common commands
├── requirements.txt
└── pyproject.toml
```

## Tech Stack

- Python 3.9+
- scikit-learn — classifiers, preprocessing, metrics, stacking ensemble
- XGBoost — gradient boosting (XGBClassifier)
- LightGBM — gradient boosting (LGBMClassifier)
- CatBoost — gradient boosting with native categorical support
- imbalanced-learn — SMOTE oversampling
- pandas / numpy — data manipulation
- matplotlib / seaborn — visualization
- PyYAML — configuration
- joblib — model serialization
- Click — CLI framework
- pytest — testing

## License

MIT
