This is a machine learning pipeline for detecting fraudulent blockchain transactions. It classifies transactions into three risk tiers (low, moderate, and high) using behavioral and transactional features, with built-in handling for severe class imbalance via SMOTE oversampling.
Online fraud in blockchain ecosystems costs billions annually. This project applies supervised classification to 78,600 blockchain transactions, comparing seven ML algorithms including gradient boosting (XGBoost, LightGBM, CatBoost), neural networks (MLP), and a stacking ensemble to identify the most reliable fraud indicators. Key challenges include severe class imbalance (80.8% low-risk vs 8.3% high-risk) and the need for interpretable risk signals.
Exploratory analysis and initial model runs showed a suspicious 100% test accuracy across KNN, Logistic Regression, and Random Forest. A systematic investigation revealed two features that leak the target label:
- `risk_score` — A pre-computed risk metric with completely non-overlapping ranges per class (low_risk: 15–59, moderate_risk: 62–84, high_risk: 90–100). This single feature achieves 100% accuracy on its own.
- `transaction_type` — Contains values `"scam"` and `"phishing"` that map exclusively to `high_risk`, and `"purchase"`/`"transfer"` that map exclusively to `low_risk`.
Both features were removed from the pipeline. The investigation also confirmed no issues with the train/test split (stratified, zero index overlap), no meaningful duplicate rows, and the label-shuffle sanity check ruled out pipeline bugs.
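The kind of single-feature probe that exposes this leakage can be sketched as follows. The data here is synthetic, generated to mimic the non-overlapping `risk_score` ranges reported above; this is a minimal illustration, not the repo's actual investigation code:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the leaky column: each class draws risk_score
# from a completely non-overlapping range, as observed in the raw data.
ranges = {"low_risk": (15, 59), "moderate_risk": (62, 84), "high_risk": (90, 100)}
X, y = [], []
for label, (lo, hi) in ranges.items():
    X.append(rng.uniform(lo, hi, size=500))
    y.extend([label] * 500)
X = np.concatenate(X).reshape(-1, 1)
y = np.array(y)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# A shallow tree on this single feature separates the classes perfectly --
# the signature of target leakage rather than genuine predictive power.
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))  # 1.0
```

Any feature that lets a depth-2 tree hit 100% held-out accuracy on its own deserves this kind of scrutiny before it reaches a model.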
With leaking features removed, models produce realistic scores on the held-out test set (15% of data, stratified):
| Model | Test Accuracy | F1 (macro) | ROC-AUC (macro) | CV Accuracy (5-fold) |
|---|---|---|---|---|
| CatBoost | 85.3% | 0.608 | 0.911 | 90.8% +/- 0.061 |
| LightGBM | 85.1% | 0.601 | 0.910 | 92.0% +/- 0.064 |
| XGBoost | 82.6% | 0.640 | 0.907 | 91.1% +/- 0.044 |
| Stacking Ensemble | 81.7% | 0.646 | 0.905 | N/A (ensemble) |
| Random Forest | 80.3% | 0.649 | 0.904 | 92.5% +/- 0.017 |
| MLP (Neural Net) | 72.8% | 0.654 | 0.903 | 88.4% +/- 0.004 |
| Logistic Regression | 63.1% | 0.574 | 0.886 | 82.3% +/- 0.001 |
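As a rough illustration of how a stacking ensemble like the one in the table is wired, here is a scikit-learn-only sketch on synthetic data. The repo's ensemble stacks RF + XGBoost + LightGBM + MLP; this sketch substitutes two sklearn base learners and a logistic-regression meta-learner to stay dependency-light, so it is a simplified stand-in rather than the project's configuration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Toy 3-class, imbalanced dataset standing in for the transaction features.
X, y = make_classification(n_samples=2000, n_features=7, n_informative=5,
                           n_classes=3, weights=[0.8, 0.11, 0.09],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=0)

# Base learners produce out-of-fold predictions (cv=5); the meta-learner
# is trained on those predictions rather than the raw features.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                              random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_tr, y_tr)
print(f"test accuracy: {stack.score(X_te, y_te):.3f}")
```

The out-of-fold scheme matters: fitting the meta-learner on in-sample base predictions would overstate their reliability, which is the stacking analogue of the leakage issue above.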
Permutation importance (accuracy-based, 30 repeats) identifies the features that matter most once the leaking columns are removed. The ranking is consistent across all seven models:
| Feature | Importance (CatBoost) | Role |
|---|---|---|
| `hour_of_day` | 0.079 | Strongest signal - certain hours carry significantly higher fraud risk, suggesting time-based behavioral patterns |
| `amount` | 0.055 | Larger or unusually sized transactions are more indicative of moderate- and high-risk activity |
| `age_group` | 0.035 | Account maturity matters - newer accounts show different risk profiles than established or veteran ones |
| `session_duration` | 0.008 | Weak individual signal, though it contributes in ensemble context |
| `purchase_pattern` | 0.004 | Minimal standalone impact |
| `login_frequency` | −0.001 | Negligible - permuting this feature does not degrade accuracy |
| `location_region` | −0.001 | Negligible - geographic region alone is not predictive |
Key takeaway: After removing the leaking features, no single remaining
feature dominates the way risk_score did. The top three features
(hour_of_day, amount, age_group) combine behavioral timing, transaction
size, and account maturity — a reasonable fraud signal that aligns with
domain knowledge. The relatively modest importance scores (all < 0.08)
confirm that the models are learning a genuine multi-feature pattern rather
than relying on a single shortcut.
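The protocol behind the table (accuracy-based permutation importance, 30 repeats) maps directly onto scikit-learn's `permutation_importance`. The sketch below runs it on a synthetic dataset, so the feature names are purely illustrative labels, not the real columns:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Illustrative names attached to synthetic columns.
feature_names = ["hour_of_day", "amount", "age_group", "session_duration",
                 "purchase_pattern", "login_frequency", "location_region"]
X, y = make_classification(n_samples=3000, n_features=7, n_informative=4,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Mean accuracy drop when each column is shuffled, averaged over 30 repeats --
# the same protocol as the table above. Near-zero or negative values mean
# the feature carries no signal the model relies on.
result = permutation_importance(model, X_te, y_te, scoring="accuracy",
                                n_repeats=30, random_state=0)
for name, mean in sorted(zip(feature_names, result.importances_mean),
                         key=lambda t: -t[1]):
    print(f"{name:>18}: {mean:+.3f}")
```

Computing importance on the held-out set, as here, measures what the model actually uses at prediction time rather than what it memorized during training.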
```
fraud_data.csv
      │
      ▼
┌──────────┐    ┌──────────────┐    ┌────────────┐
│  Loader  │───▶│ Preprocessor │───▶│  Splitter  │
│ validate │    │ encode/scale │    │ stratified │
└──────────┘    └──────────────┘    │  + SMOTE   │
                                    └─────┬──────┘
                       ┌──────────────────┼──────────────┐
                       ▼                  ▼              ▼
                   train set           val set        test set
                       │                  │              │
                       ▼                  │              │
                ┌─────────────┐           │              │
                │ Model Train │◀──────────┘              │
                │ XGB / LGBM /│  hyperparam              │
                │ CB / RF / … │  selection               │
                └──────┬──────┘                          │
                       │                                 │
                       ▼                                 ▼
                ┌─────────────┐                  ┌──────────────┐
                │  Serialize  │                  │   Evaluate   │
                │  (joblib)   │                  │ metrics/plots│
                └─────────────┘                  └──────────────┘
```
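The Splitter stage can be sketched with scikit-learn alone. The 70/15/15 proportions here are an assumption (only the 15% test share is stated above), and SMOTE, which comes from imbalanced-learn, appears only as a comment because it must touch the training partition exclusively:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic labels with roughly the dataset's class proportions.
y = rng.choice(["low_risk", "moderate_risk", "high_risk"],
               size=10000, p=[0.808, 0.109, 0.083])
X = rng.normal(size=(10000, 7))

# Assumed 70/15/15 split, stratified twice so every partition keeps the
# original class ratios.
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.15 / 0.85, stratify=y_tmp, random_state=42)

for name, part in [("train", y_train), ("val", y_val), ("test", y_test)]:
    frac = (part == "high_risk").mean()
    print(f"{name}: {len(part)} rows, {frac:.1%} high_risk")

# SMOTE (imbalanced-learn) is then fit on the *training* partition only,
# e.g. X_train, y_train = SMOTE().fit_resample(X_train, y_train),
# so synthetic minority samples never reach the validation or test sets.
```

Stratifying both splits is what keeps the 8.3% high_risk share intact in each partition; an unstratified split could leave the smallest class badly underrepresented in a 15% slice.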
78,600 blockchain transactions with 14 features:
| Feature | Type | Description |
|---|---|---|
| `location_region` | Categorical | Europe, Asia, N. America, S. America, Africa |
| `purchase_pattern` | Categorical | focused, random, high_value |
| `age_group` | Categorical | new, established, veteran |
| `hour_of_day` | Numerical | 0–23 |
| `amount` | Numerical | Transaction amount |
| `login_frequency` | Numerical | Login count per session |
| `session_duration` | Numerical | Minutes per session |
| `anomaly` | Target | low_risk (80.8%), moderate_risk (10.9%), high_risk (8.3%) |
`risk_score` and `transaction_type` are present in the raw data but dropped during preprocessing due to target leakage (see above).
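Dropping the leaky columns is a one-liner in pandas; the tiny frame below is a hypothetical stand-in for the real CSV, used only to show the shape of the step:

```python
import pandas as pd

# Tiny in-memory stand-in for data/raw/fraud_data.csv.
df = pd.DataFrame({
    "hour_of_day": [3, 14, 22],
    "amount": [120.0, 9800.0, 45.5],
    "transaction_type": ["purchase", "scam", "transfer"],  # leaks the label
    "risk_score": [31, 95, 44],                            # leaks the label
    "anomaly": ["low_risk", "high_risk", "low_risk"],
})

LEAKY = ["risk_score", "transaction_type"]

# Drop the leaking columns (and the target) before any model sees the data;
# errors="ignore" keeps the step idempotent if a column is already absent.
X = df.drop(columns=LEAKY + ["anomaly"], errors="ignore")
y = df["anomaly"]
print(list(X.columns))  # ['hour_of_day', 'amount']
```

Doing this once, early in the pipeline, guarantees every downstream model and every cross-validation fold sees the same leak-free feature set.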
Requirements: Python 3.9+

```bash
git clone https://github.com/aengusmartindonaire/blockchain-fraud-detection.git
cd blockchain-fraud-detection
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
```

Train all seven models, or a single one:

```bash
python cli.py train --model all
python cli.py train --model xgboost
```

Evaluate a trained model on the held-out test set:

```bash
python cli.py evaluate --model xgboost
```

Score new transactions:

```bash
python cli.py predict --model catboost --input new_transactions.csv
python cli.py predict --model catboost --input new_transactions.csv --output predictions.csv
```

Generate figures:

```bash
python cli.py visualize --type all
python cli.py visualize --type distributions
python cli.py visualize --type evaluation --model xgboost
```

Run the test suite:

```bash
pytest tests/ -v
```

```
blockchain-fraud-detection/
├── cli.py                      # CLI entry point (train/evaluate/predict/visualize)
├── config/
│   └── default.yaml            # All hyperparameters and settings
├── data/
│   └── raw/fraud_data.csv      # Source dataset (78,600 records)
├── src/
│   ├── data/
│   │   ├── loader.py           # CSV loading and schema validation
│   │   ├── preprocessing.py    # Encoding, scaling, outlier handling
│   │   └── splitter.py         # Stratified split + SMOTE
│   ├── models/
│   │   ├── base.py             # Abstract model interface
│   │   ├── xgboost_model.py    # XGBoost
│   │   ├── lightgbm_model.py   # LightGBM
│   │   ├── catboost_model.py   # CatBoost
│   │   ├── random_forest.py    # Random Forest
│   │   ├── mlp.py              # Multi-Layer Perceptron
│   │   ├── logistic.py         # Logistic Regression
│   │   ├── stacking.py         # Stacking Ensemble (RF+XGB+LGBM+MLP)
│   │   └── registry.py         # Model lookup by name
│   ├── evaluation/
│   │   ├── metrics.py          # Accuracy, precision, recall, F1, ROC-AUC, cross-val
│   │   └── importance.py       # Permutation feature importance
│   ├── visualization/
│   │   ├── distributions.py    # Class distribution charts
│   │   ├── tuning_curves.py    # Hyperparameter tuning plots
│   │   ├── evaluation_plots.py # Confusion matrices, ROC curves, model comparison
│   │   └── feature_plots.py    # Feature importance bars, box plots
│   └── pipeline.py             # End-to-end orchestrator
├── tests/                      # 115 tests
├── notebooks/                  # Exploratory analysis
├── outputs/                    # Generated models, figures, reports
├── Makefile                    # Common commands
├── requirements.txt
└── pyproject.toml
```
- Python 3.9+
- scikit-learn — classifiers, preprocessing, metrics, stacking ensemble
- XGBoost — gradient boosting (XGBClassifier)
- LightGBM — gradient boosting (LGBMClassifier)
- CatBoost — gradient boosting with native categorical support
- imbalanced-learn — SMOTE oversampling
- pandas / numpy — data manipulation
- matplotlib / seaborn — visualization
- PyYAML — configuration
- joblib — model serialization
- Click — CLI framework
- pytest — testing
MIT