A machine learning–powered web application that detects fraudulent credit card transactions in bulk using an XGBoost model. Built with Flask, Scikit-Learn, and Pandas, this project automates batch risk scoring of transaction files and regenerates its own model artifacts when they are missing.
FraudGuard Batch Analyzer lets users upload CSV files containing anonymized transaction data. It uses a trained XGBoost classifier, tuned with RandomizedSearchCV, to predict whether each transaction is Fraudulent or Valid. The system includes:
- Interactive Dashboard: Visualizes risk statistics and high-risk transactions.
- Batch Processing: Scores every transaction in an uploaded CSV file in a single run.
- Self-Healing: Automatically regenerates model artifacts if they are missing.
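The self-healing behavior in the list above can be pictured as a load-or-rebuild check. The following is a minimal sketch, not the code in `app.py`: the function name `load_or_regenerate` and the `regenerate` callback are hypothetical, and the standard-library `pickle` module stands in for joblib to keep the example dependency-free.

```python
import pickle
from pathlib import Path


def load_or_regenerate(artifact_path, regenerate):
    """Return the artifact stored at artifact_path, rebuilding it if missing.

    `regenerate` is a hypothetical zero-argument callable (for example, a
    function that retrains the model) used only when the file does not exist.
    """
    path = Path(artifact_path)
    if not path.exists():
        # Artifact is missing: rebuild it and persist it for the next start-up.
        path.parent.mkdir(parents=True, exist_ok=True)
        artifact = regenerate()
        with path.open("wb") as fh:
            pickle.dump(artifact, fh)
        return artifact
    # Artifact is present: load it without retraining.
    with path.open("rb") as fh:
        return pickle.load(fh)
```

In a layout like this project's, the callback would presumably wrap whatever training routine `fit.py` exposes, so a fresh clone without a `models/` directory can still start.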
(Screenshots coming soon)
- Source: Credit Card Fraud Detection Dataset 2023 – Kaggle
- Classes:
  - 0 → Valid Transaction
  - 1 → Fraudulent Transaction
The dataset is artificially balanced, so classes 0 and 1 contain the same number of samples.
- Features:
  - V1-V28: Anonymized features.
  - Amount: Transaction amount.
| Step | Description |
|---|---|
| Imputation | Missing values handled using median strategy |
| Scaling | Standardized with StandardScaler |
| Dimensionality Reduction | PCA (n_components=24) |
| Classifier | XGBClassifier (n_estimators=600, max_depth=10, learning_rate=0.1) |
Final model artifacts are serialized with joblib as:
models/
├── pipe.pkl
└── feat_names.pkl
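The steps in the table can be sketched as a single Scikit-Learn `Pipeline`. This is an illustrative reconstruction from the table, not the contents of `fit.py`: the step names, `build_pipeline`, and `save_artifacts` are assumptions, and the classifier is injectable so the sketch can also be tried with another estimator when XGBoost is not installed.

```python
from pathlib import Path

import joblib
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline(classifier=None):
    """Assemble imputation, scaling, PCA, and a classifier in one pipeline."""
    if classifier is None:
        # Imported lazily so the rest of the sketch works without xgboost.
        from xgboost import XGBClassifier

        classifier = XGBClassifier(
            n_estimators=600, max_depth=10, learning_rate=0.1
        )
    return Pipeline(
        [
            ("imputer", SimpleImputer(strategy="median")),
            ("scaler", StandardScaler()),
            ("pca", PCA(n_components=24)),
            ("clf", classifier),  # step names are hypothetical
        ]
    )


def save_artifacts(pipe, feature_names, model_dir="models"):
    """Serialize the fitted pipeline and its feature names with joblib."""
    out_dir = Path(model_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    joblib.dump(pipe, out_dir / "pipe.pkl")
    joblib.dump(list(feature_names), out_dir / "feat_names.pkl")
```

Keeping the preprocessing inside the pipeline means the same imputation, scaling, and PCA fitted during training are applied at prediction time, which is why a single `pipe.pkl` is enough to score new files.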
We rigorously tested multiple algorithms including Random Forest, SVC, and XGBoost to find the optimal architecture. Using RandomizedSearchCV, we identified that XGBoost with PCA feature extraction yielded the best balance of speed and accuracy.
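A randomized search of this kind might look like the sketch below. It is not the code from `research.py`: the search space, the `search_pipeline` helper, and the use of a Random Forest as the estimator are all assumptions chosen to keep the example small and free of the XGBoost dependency.

```python
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def search_pipeline(X, y, n_iter=5, random_state=0):
    """Sample n_iter configurations and return the fitted search object."""
    pipe = Pipeline(
        [
            ("scaler", StandardScaler()),
            ("pca", PCA()),
            ("clf", RandomForestClassifier(random_state=random_state)),
        ]
    )
    # Hypothetical search space; "step__param" addresses a pipeline step.
    param_distributions = {
        "pca__n_components": [16, 20, 24],
        "clf__n_estimators": [25, 50, 100],
        "clf__max_depth": [4, 8, None],
    }
    search = RandomizedSearchCV(
        pipe,
        param_distributions,
        n_iter=n_iter,
        cv=3,
        scoring="f1",
        random_state=random_state,
    )
    return search.fit(X, y)
```

After fitting, `search.best_params_` and `search.best_score_` give the winning configuration and its cross-validated score; comparing several estimators would repeat the search with a different `clf` step.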
Here is the Classification Report for the final model:
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| 0 | 1.00 | 1.00 | 1.00 | 5000 |
| 1 | 1.00 | 1.00 | 1.00 | 5000 |
| accuracy | | | 1.00 | 10000 |
| macro avg | 1.00 | 1.00 | 1.00 | 10000 |
| weighted avg | 1.00 | 1.00 | 1.00 | 10000 |
You can find the detailed research code in the research.py file included in the repo. However, for the best viewing experience, use the HTML copy of the notebook which is available in the research.html file.
The model achieves near-perfect performance on the provided dataset. To check that this was not an artifact of data leakage, a label-shuffling diagnostic test was run: with shuffled labels, performance dropped to random chance (~50%), which indicates that the pipeline is not leaking label information.
The dataset is already anonymized, balanced, and pre-processed (PCA-transformed), which significantly simplifies the classification task. As such, these results should be viewed as a demonstration of modeling correctness rather than real-world deployability.
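The label-shuffling diagnostic mentioned above can be sketched as follows. This is one common way to run such a check, not necessarily how it was done in `research.py`; the helper name `label_shuffle_check` and the 70/30 split are assumptions.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import train_test_split


def label_shuffle_check(model, X, y, random_state=0):
    """Return (accuracy with real labels, accuracy with shuffled labels).

    Both models are scored on the same untouched test labels. If the
    shuffled-label score stays far above chance, something other than the
    features (for example, leakage) is carrying label information.
    """
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=random_state, stratify=y
    )
    rng = np.random.default_rng(random_state)
    real_score = clone(model).fit(X_tr, y_tr).score(X_te, y_te)
    # Permuting the training labels breaks the feature-label relationship.
    shuffled_score = clone(model).fit(X_tr, rng.permutation(y_tr)).score(X_te, y_te)
    return real_score, shuffled_score
```

On a balanced dataset such as this one, chance-level accuracy is about 50%, so a shuffled-label score near that value is the expected healthy outcome.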
FRAUD DETECTION/
├── Dataset/
│ └── creditcard_2023.csv # Primary dataset
├── models/
│ ├── feat_names.pkl # Serialized feature names
│ └── pipe.pkl # Serialized machine learning pipeline
├── processed/ # Directory for analyzed output files (Ignored)
├── static/ # Static assets for the web application
├── templates/
│ ├── index.html # Upload page
│ └── dashboard.html # Results dashboard
├── uploads/ # Temporary storage for user uploads (Ignored)
|
├── .gitignore # Files to exclude from version control
├── app.py # Main Flask application file
├── fit.py # Script for training and saving the model
├── LICENSE # Licensing information
├── research.py # Marimo notebook for model research
└── requirements.txt # Python package dependencies
git clone https://github.com/ByteBard58/Fruad-Detection
cd Fruad-Detection
pip install -r requirements.txt

Create a .env file in the root directory:

DATA_PATH="Dataset/creditcard_2023.csv"

Run the application:

python app.py

To explore the research process interactively:

marimo edit research.py

This command will open the notebook in your default browser.
Coming Soon
Users upload a CSV file containing transaction data. The system:
- Validates the columns.
- Processes the file using the pre-trained pipeline.
- Generates a Risk Dashboard with key insights.
- Allows downloading of the processed file with risk probability scores appended.
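The steps above can be sketched as one scoring function. This is a minimal illustration, not the code in `app.py`: the function name `score_batch`, the output column names `fraud_probability` and `prediction`, and the 0.5 threshold are all assumptions.

```python
import pandas as pd


def score_batch(df, pipeline, feature_names, threshold=0.5):
    """Validate columns, score each row, and append the results.

    `pipeline` is any fitted estimator exposing predict_proba, such as the
    object loaded from models/pipe.pkl; `feature_names` is the list of
    expected columns, such as the one loaded from models/feat_names.pkl.
    """
    missing = [col for col in feature_names if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    # Select columns in training order so extra columns are ignored.
    proba = pipeline.predict_proba(df[feature_names])[:, 1]

    scored = df.copy()
    scored["fraud_probability"] = proba  # hypothetical column name
    scored["prediction"] = (proba >= threshold).astype(int)
    return scored
```

The returned frame could then be written out with `scored.to_csv(...)` for download, and a dashboard could summarize it with simple aggregates such as `scored["prediction"].mean()`.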
Note: The UI/UX design was implemented with assistance from AI coding tools.
- Languages: Python, HTML, CSS, JavaScript
- Libraries: Flask, Scikit-Learn, Pandas, NumPy, XGBoost, Joblib, Marimo
- Dataset Source: Kaggle Credit Card Fraud Detection
Sakib ( ByteBard58 )
Student | Aspiring Computer Engineer | AI & ML Enthusiast
📍 GitHub Profile: ByteBard58
I appreciate you taking the time to look over my work. I hope you found it interesting and enjoyable. If you could star it on GitHub, it would be really appreciated. 🌟
Do not hesitate to contact me if you have any questions, recommendations, or topics you would like to discuss. My [GitHub profile page](http://www.github.com/ByteBard58) has my contact details.
Have a great day!