An implementation of Graph Neural Networks for Bitcoin transaction fraud detection featuring principled uncertainty quantification using Monte Carlo Dropout.
This project implements a GraphSAGE-based fraud detection system on the Elliptic Bitcoin Dataset. It prioritizes reliable uncertainty estimation for high-stakes financial applications: predictions are well calibrated (ECE = 0.045, below the 0.05 target), and predictive uncertainty is decomposed into epistemic and aleatoric components.
- Bayesian Uncertainty Estimation: Monte Carlo Dropout (T=30 stochastic forward passes) for per-transaction predictive uncertainty.
- Class Imbalance Mitigation: Inverse frequency weighting (7.63x) for the fraud class.
- Temporal Analysis: Detection of distribution drift through time-series uncertainty monitoring.
- Rigorous Evaluation: Comprehensive ablations for dropout rates, hidden dimensions, and feature engineering.
- Model Calibration: Calibration curves and risk-coverage analysis for selective prediction.
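
The epistemic/aleatoric split mentioned above is commonly computed from the T stochastic forward passes as an entropy decomposition. The following is a minimal, dependency-free sketch of that standard technique, not the code in `uncertainty.py`: the function names and the toy probability vectors are hypothetical, and it assumes the model's per-pass class probabilities are already available as plain lists.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def decompose_uncertainty(mc_probs):
    """Split predictive uncertainty for a single node.

    mc_probs: list of T class-probability vectors, one per stochastic
    forward pass with dropout left active at inference time.
    Returns (total, aleatoric, epistemic) in nats.
    """
    T = len(mc_probs)
    n_classes = len(mc_probs[0])
    mean_probs = [sum(p[c] for p in mc_probs) / T for c in range(n_classes)]
    total = entropy(mean_probs)                        # entropy of the averaged prediction
    aleatoric = sum(entropy(p) for p in mc_probs) / T  # average entropy of each pass
    epistemic = total - aleatoric                      # disagreement between passes
    return total, aleatoric, epistemic

# Hypothetical outputs for two nodes (T=3 for brevity; the project uses T=30).
agreeing = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
disagreeing = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]

print(decompose_uncertainty(agreeing))
print(decompose_uncertainty(disagreeing))
```

When every pass returns the same distribution, the epistemic term is zero and all uncertainty is attributed to the data. When the passes disagree, the epistemic term grows, which is the signal that temporal drift monitoring would typically track.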
GraphSAGE Model:
├── 2 Graph Convolutional Layers (64 hidden dims)
├── Dropout (p=0.5) for uncertainty quantification
├── RobustScaler preprocessing
└── Node degree features (in/out-degree)
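
As an illustration of the node degree features listed above, in- and out-degree can be counted directly from a directed edge list. This is a sketch under assumptions: the function name, the `[in_degree, out_degree]` column order, and the toy edge list are hypothetical, and the project may compute or scale these features differently.

```python
from collections import Counter

def degree_features(edges, num_nodes):
    """Return [in_degree, out_degree] for each node of a directed graph.

    edges: iterable of (source, target) node-index pairs.
    """
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    # Counter returns 0 for nodes that never appear on that side of an edge.
    return [[in_deg[n], out_deg[n]] for n in range(num_nodes)]

# Toy graph: payments flow from the first transaction to the second.
edges = [(0, 1), (0, 2), (1, 2), (3, 2)]
feats = degree_features(edges, num_nodes=4)
print(feats)
```

The resulting two columns would then be appended to the existing node feature matrix before scaling.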
The model utilizes Negative Log-Likelihood Loss with class weights, achieving an F1-score of 0.42 and PR-AUC of 0.40 after threshold tuning.
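
To make the class-weighted loss concrete, here is a small pure-Python sketch of inverse frequency weighting combined with a weighted negative log-likelihood. The helper names and the label counts (chosen so the minority weight comes out at 7.63) are hypothetical; the README does not state how the project derives its weight. The normalization by total weight matches what `torch.nn.NLLLoss` does when given `weight` with `reduction='mean'`.

```python
import math

def inverse_frequency_weights(labels):
    """Weight each class by (majority count / class count); the majority class gets 1.0."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    largest = max(counts.values())
    return {c: largest / n for c, n in counts.items()}

def weighted_nll(log_probs, labels, weights):
    """Class-weighted negative log-likelihood, normalized by the total weight."""
    num = sum(-weights[y] * lp[y] for lp, y in zip(log_probs, labels))
    den = sum(weights[y] for y in labels)
    return num / den

# Hypothetical training labels: 763 licit (0) and 100 illicit (1).
train_labels = [0] * 763 + [1] * 100
weights = inverse_frequency_weights(train_labels)

# Two toy predictions given as log-probabilities over [licit, illicit].
log_probs = [[math.log(0.8), math.log(0.2)], [math.log(0.3), math.log(0.7)]]
loss = weighted_nll(log_probs, [0, 1], weights)
print(weights, loss)
```

With these weights, an error on an illicit transaction contributes roughly 7.6 times as much to the loss as an error on a licit one.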
| Metric | Value | Interpretation |
|---|---|---|
| F1 Score | 0.4209 | +8.9% improvement via post-hoc threshold optimization |
| PR-AUC | 0.3979 | Effective handling of 7.6:1 class imbalance |
| ECE | 0.0450 | Well-calibrated confidence estimates |
| Entropy-AUC | 0.1400 | Strong separation of correct/incorrect predictions by uncertainty |
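
For readers unfamiliar with the ECE row, the sketch below shows one common way to compute Expected Calibration Error using equal-width confidence bins. The function name, the choice of 10 bins, and the toy inputs are assumptions; the project's own binning scheme is not specified here.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins.

    confidences: predicted probability of the chosen class, in [0, 1]
    correct: 1 if the prediction matched the label, else 0
    """
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, hit))

    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of samples.
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece

# Toy example: four predictions in two confidence bins.
ece = expected_calibration_error([0.95, 0.95, 0.55, 0.55], [1, 1, 1, 0])
print(round(ece, 4))
```

A value near zero means that, within each bin, average confidence closely tracks observed accuracy.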
- Dropout Rate: A rate of 0.2 provides the best balance between uncertainty quantification and regularization.
- Hidden Dimensions: 64-dimensional layers provide optimal capacity without overfitting.
- Feature Engineering: Degree features enhance F1-score by 3% and significantly improve uncertainty separation.
# Clone repository
git clone https://github.com/ridash2005/GNN-Based-Fraud-Detection.git
cd GNN-Based-Fraud-Detection
# Install dependencies
pip install torch torch-geometric scikit-learn pandas numpy matplotlib seaborn
The Elliptic Bitcoin Dataset consists of:
- 203,769 Bitcoin transactions (nodes)
- 166-dimensional node features
- Temporal graph structure (49 time steps)
- Binary labels: licit (0) vs illicit (1)
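
The binary labels above have to be derived from the raw class file. The sketch below assumes the publicly distributed encoding, in which the `class` column holds `"1"` for illicit, `"2"` for licit, and `"unknown"` for unlabeled transactions, with a `txId` column as the key; verify this against your copy of the data. The function name and the in-memory sample are hypothetical, and `load_data.py` may handle this differently.

```python
import csv
import io

# Assumed raw encoding: "1" = illicit, "2" = licit, anything else = unlabeled.
RAW_TO_BINARY = {"1": 1, "2": 0}

def parse_labels(csv_text):
    """Map raw class strings to binary labels; unlabeled rows map to None."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["txId"]: RAW_TO_BINARY.get(row["class"]) for row in reader}

# Tiny in-memory stand-in for the class file.
sample = "txId,class\n101,unknown\n102,2\n103,1\n"
labels = parse_labels(sample)

# Only labeled transactions can be used for supervised training and evaluation.
labeled = {tx: y for tx, y in labels.items() if y is not None}
print(labeled)
```

Unlabeled transactions stay in the graph for message passing but are excluded from the loss and from the reported metrics in a typical setup.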
GNN-Based-Fraud-Detection/
├── graphge/
│ ├── src/
│ │ ├── load_data.py # Data loading utilities
│ │ ├── models.py # GraphSAGE implementation
│ │ └── uncertainty.py # MC Dropout functions
│ └── results/
│ ├── metrics.csv # Performance logs
│ └── figures/ # Calibration and ablation plots
├── GNN_Fraud_Detection_Pipeline.ipynb # Primary implementation pipeline
├── Uncertainty_Quantification_Study.ipynb # Detailed Bayesian UQ analysis
├── Extended_Experimental_Ablations.ipynb # Comprehensive ablation experiments
├── Detailed_Report.md # Technical experimental report
└── README.md
@software{graphge2025,
author = {Rickarya Das},
title = {GraphGE: Uncertainty-Aware Fraud Detection with GraphSAGE},
year = {2025},
url = {https://github.com/ridash2005/GNN-Based-Fraud-Detection}
}
- Hamilton et al. (2017) - "Inductive Representation Learning on Large Graphs"
- Gal & Ghahramani (2016) - "Dropout as a Bayesian Approximation"
- Weber et al. (2019) - "Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks"
MIT License - see LICENSE file for details.
Rickarya Das - GitHub