🕵️ Fraud Detection ML Project

End-to-End Machine Learning System for Financial Fraud Detection
Achieved 94% Detection Accuracy with Random Forest & XGBoost

📌 About the Project

A complete end-to-end Machine Learning pipeline for detecting fraudulent financial transactions. The system handles highly imbalanced datasets, trains and compares multiple ML models, and deploys a real-time prediction interface using Streamlit — all in Python.

This project simulates a real-world fraud detection system used by banks and fintech companies to protect customers from unauthorized transactions.

🎯 Business Problem

Financial fraud costs billions of dollars globally every year. Traditional rule-based systems miss sophisticated fraud patterns. This ML-powered system:

Automatically learns fraud patterns from historical data
Detects fraudulent transactions in real-time
Handles heavily imbalanced datasets (fraud is rare — <1% of transactions)
Provides interpretable results for business decision-making

✨ Features

✅ End-to-end ML Pipeline — Data → Preprocessing → Training → Evaluation → Deployment
✅ Imbalanced Dataset Handling — Techniques for rare fraud class detection
✅ Multiple ML Models — Logistic Regression, Random Forest, XGBoost compared
✅ Saved Pipeline — .pkl model ready for production deployment
✅ Streamlit App — Interactive real-time fraud prediction interface
✅ Comprehensive Evaluation — Accuracy, Precision, Recall, F1, AUC-ROC
✅ Visual Analysis — Confusion Matrix, ROC Curve, Feature Importance
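
This README does not say which imbalance technique the notebook uses. One common option, shown here only as a sketch on synthetic data, is class weighting in scikit-learn: `class_weight="balanced"` weights each class inversely to its frequency, so the rare fraud class counts more during training. The data, class sizes, and variable names below are illustrative assumptions, not taken from the project.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Synthetic stand-in data (assumption): 990 legitimate rows, 10 fraud rows.
rng = np.random.default_rng(0)
n_legit, n_fraud = 990, 10
X = np.vstack([
    rng.normal(0.0, 1.0, size=(n_legit, 2)),   # legitimate transactions
    rng.normal(3.0, 1.0, size=(n_fraud, 2)),   # fraudulent transactions
])
y = np.array([0] * n_legit + [1] * n_fraud)

# "balanced" weight for a class = n_samples / (n_classes * class_count)
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)
class_weights = dict(zip([0, 1], weights))
print(class_weights)  # fraud (class 1) gets a much larger weight than class 0

# The same weighting can be requested directly on the estimator.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X, y)
```

With 10 fraud rows out of 1,000, the fraud class receives a weight of 1000 / (2 × 10) = 50, while the legitimate class receives roughly 0.5. Resampling approaches are an alternative; which one fits best depends on the data.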

📊 Model Performance

Model	Accuracy	AUC Score
Logistic Regression	Baseline	—
Random Forest	98.7%	—
XGBoost	—	0.99
Overall Detection	94%	—

🏆 Best Results: Random Forest reached 98.7% accuracy and XGBoost reached 0.99 AUC. A dash in the table marks a metric that is not reported.
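
For reference, the metrics named in this README can be computed with scikit-learn as sketched below. The labels and scores are a tiny hand-made example chosen so the numbers are easy to check by hand. They are not the project's results.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hand-made example (assumption): 6 legitimate and 4 fraudulent transactions.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
# Hypothetical model scores: estimated probability of fraud per transaction.
y_score = [0.1, 0.2, 0.1, 0.3, 0.2, 0.6, 0.9, 0.8, 0.7, 0.4]
# Turn scores into hard labels with a 0.5 threshold.
y_pred = [int(s >= 0.5) for s in y_score]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    # AUC is computed from the scores, not the thresholded labels.
    "auc": roc_auc_score(y_true, y_score),
}
print(tn, fp, fn, tp)
print(metrics)
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds.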

🛠️ Tech Stack

Technology	Purpose
Python 3.9+	Core programming language
Pandas & NumPy	Data manipulation & numerical operations
Scikit-learn	ML models, preprocessing & evaluation
XGBoost	Gradient boosting model
Matplotlib & Seaborn	Data visualization
Streamlit	Interactive web app deployment
Pickle (.pkl)	Model serialization & saving
Jupyter Notebook	Analysis & experimentation

📁 Project Structure

fraud-detection-ml-project/
│
├── analyse_model.ipynb              # Main analysis & training notebook
├── fraud_detection.py               # Streamlit app for real-time prediction
├── fraud_detection_pipline.pkl      # Saved ML pipeline (production ready)
├── Streamlit output 1.pdf           # Streamlit app screenshot — normal transaction
├── Streamlit output 2.pdf           # Streamlit app screenshot — fraud detected
├── .gitignore                       # Git ignore file
├── LICENSE                          # MIT License
└── README.md                        # Project documentation

🔬 How It Works

Raw Transaction Data
        ↓
Data Preprocessing
  → Handle missing values
  → Feature engineering
  → Handle class imbalance
        ↓
Model Training & Comparison
  → Logistic Regression (baseline)
  → Random Forest
  → XGBoost
        ↓
Model Evaluation
  → Accuracy, Precision, Recall, F1
  → ROC-AUC Curve
  → Confusion Matrix
        ↓
Save Best Model → fraud_detection_pipline.pkl
        ↓
Streamlit App → Real-time Prediction
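
The flow above can be sketched in a few lines of scikit-learn. This is a minimal illustration under stated assumptions, not the notebook's actual code: the data is synthetic, the preprocessing steps (median imputation, standard scaling) and the Random Forest settings are guesses at a reasonable setup, and the pickle is written to a temporary file named `example_pipeline.pkl` so the repository's saved pipeline is not touched.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for raw transaction data (assumption): 5% fraud.
rng = np.random.default_rng(42)
n_legit, n_fraud = 1900, 100
X = np.vstack([
    rng.normal(0.0, 1.0, size=(n_legit, 3)),
    rng.normal(2.5, 1.0, size=(n_fraud, 3)),
])
y = np.concatenate([np.zeros(n_legit, dtype=int), np.ones(n_fraud, dtype=int)])
X[rng.random(X.shape) < 0.02] = np.nan  # inject some missing values

# Stratify so the rare class appears in both splits at the same rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Preprocessing and model bundled together, so one object can be saved.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", RandomForestClassifier(
        n_estimators=50, class_weight="balanced", random_state=42)),
])
pipeline.fit(X_train, y_train)

# Save the fitted pipeline with pickle, then load it back.
model_path = Path(tempfile.mkdtemp()) / "example_pipeline.pkl"
with open(model_path, "wb") as f:
    pickle.dump(pipeline, f)
with open(model_path, "rb") as f:
    restored = pickle.load(f)

original_preds = pipeline.predict(X_test)
restored_preds = restored.predict(X_test)
```

Saving the whole pipeline rather than only the model means the same imputation and scaling are applied at prediction time. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.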

🚀 How to Run

Step 1 — Clone Repository

git clone https://github.com/rakesh4407/fraud-detection-ml-project.git
cd fraud-detection-ml-project

Step 2 — Install Dependencies

pip install pandas numpy scikit-learn xgboost matplotlib seaborn streamlit jupyter

Step 3 — Run Analysis Notebook

jupyter notebook analyse_model.ipynb

Step 4 — Launch Streamlit App

streamlit run fraud_detection.py
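
The contents of `fraud_detection.py` are not shown in this README. As a rough sketch of the prediction step such an app might wrap, the hypothetical helper below scores a single transaction with any fitted classifier that exposes `predict_proba`. The function name `score_transaction`, the single `amount`-like feature, the 0.5 threshold, and the stand-in model are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def score_transaction(model, features, threshold=0.5):
    """Return the fraud probability and a yes/no flag for one transaction.

    `model` is any fitted classifier with predict_proba;
    `features` is a flat sequence of numeric feature values.
    """
    row = np.asarray(features, dtype=float).reshape(1, -1)
    fraud_probability = float(model.predict_proba(row)[0, 1])
    return {
        "fraud_probability": fraud_probability,
        "is_fraud": fraud_probability >= threshold,
    }


# Stand-in model (assumption): one feature, larger values are fraudulent.
X = np.array([[10.0], [20.0], [30.0], [900.0], [950.0], [1000.0]])
y = np.array([0, 0, 0, 1, 1, 1])
demo_model = LogisticRegression(max_iter=1000).fit(X, y)

low = score_transaction(demo_model, [15.0])
high = score_transaction(demo_model, [980.0])
print(low, high)
```

In a Streamlit app, the feature values would typically come from input widgets such as `st.number_input`, and the model would be the pipeline loaded from the `.pkl` file instead of the stand-in used here.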

📸 App Screenshots

Normal Transaction	Fraud Detected
See `Streamlit output 1.pdf`	See `Streamlit output 2.pdf`

💡 Key Learnings

🔄 Handling class imbalance is critical in fraud detection
🌲 Random Forest outperforms Logistic Regression for complex fraud patterns
📈 AUC-ROC is a better metric than accuracy for imbalanced datasets
⚡ XGBoost reached a near-perfect AUC (0.99) on this project's data
🚀 Streamlit enables rapid ML model deployment without web dev knowledge
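
To see why accuracy alone can mislead on imbalanced data, consider a trivial "model" that labels every transaction as legitimate. The counts below are made-up numbers for illustration (0.5% fraud), not figures from this project.

```python
# Made-up dataset (assumption): 10,000 transactions, 50 of them fraudulent.
n_transactions = 10_000
n_fraud = 50

y_true = [1] * n_fraud + [0] * (n_transactions - n_fraud)
y_pred = [0] * n_transactions  # always predict "legitimate"

# Accuracy: share of all predictions that are correct.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / n_transactions

# Recall: share of actual fraud cases that were caught.
caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = caught / n_fraud

print(f"accuracy={accuracy:.3f}, recall={recall:.3f}")
# prints: accuracy=0.995, recall=0.000
```

The baseline scores 99.5% accuracy while catching no fraud at all, which is why recall, precision, and AUC-ROC are more informative for this kind of problem.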

👨‍💻 Author

Rakesh G

BCA (H) — Artificial Intelligence & Data Science
K.R. Mangalam University, New Delhi | CGPA: 9.22/10
Dean's Award Recipient | IBM Certified Data Scientist

🏷️ Topics

python machine-learning fraud-detection scikit-learn xgboost random-forest streamlit data-science classification imbalanced-dataset fintech jupyter-notebook

📜 License

This project is licensed under the MIT License — see the LICENSE file for details.

⭐ If you found this useful, please star this repository!
