This repository contains the code for a credit card fraud detection system built with machine learning techniques. The project is designed to help identify fraudulent transactions in real time, minimizing financial losses and protecting consumers.
The goal is to build a model that classifies each transaction as either normal or fraudulent.
-
Importing Libraries
- Pandas & NumPy: For data manipulation.
- Matplotlib & Seaborn: For visualization.
- Scikit-learn: For model building (Logistic Regression, Decision Tree, Random Forest) and performance metrics.
- Imbalanced-learn (SMOTE): To handle imbalanced datasets.
- Joblib: For model serialization.
-
Loading and Exploring the Dataset
- Data Source: The dataset is available on Kaggle and contains transactions labeled as normal or fraudulent.
- Data Cleaning:
- Duplicate rows are removed.
- The dataset is checked for null values; none are present.
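The cleaning steps above can be sketched with pandas as follows. This is a minimal sketch, not the repository's exact code: it assumes the Kaggle CSV uses the columns Time, Amount, and Class (1 = fraud), and the file name creditcard.csv used in the usage note is a hypothetical placeholder for wherever you saved the download.

```python
import pandas as pd


def load_and_clean(path_or_df):
    """Load the transactions and apply the cleaning steps described above.

    Accepts a CSV path (hypothetical, e.g. "creditcard.csv") or an
    already-loaded DataFrame, which makes the function easy to test.
    """
    df = pd.read_csv(path_or_df) if isinstance(path_or_df, str) else path_or_df.copy()

    # Remove exact duplicate rows.
    n_before = len(df)
    df = df.drop_duplicates().reset_index(drop=True)
    print(f"Removed {n_before - len(df)} duplicate rows")

    # Confirm that no column contains null values.
    null_counts = df.isnull().sum()
    if null_counts.any():
        raise ValueError(f"Unexpected nulls:\n{null_counts[null_counts > 0]}")

    return df
```

Usage, assuming a local copy of the dataset: df = load_and_clean("creditcard.csv").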
-
Data Visualization
- Histograms and box plots are used to examine the distribution of transaction amounts.
- A count plot shows the number of fraudulent and non-fraudulent transactions.
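The three plots described above can be sketched as below. As a deliberate swap to keep the example dependency-light, it uses plain Matplotlib bars instead of Seaborn's countplot (sns.countplot(x="Class", data=df) would be the Seaborn equivalent). It assumes the same Amount and Class column names as the Kaggle dataset.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt


def plot_overview(df):
    """Histogram and box plot of Amount, plus a count plot of Class."""
    fig, (ax_hist, ax_box, ax_count) = plt.subplots(1, 3, figsize=(15, 4))

    # Distribution of transaction amounts.
    ax_hist.hist(df["Amount"].to_numpy(), bins=50)
    ax_hist.set_title("Transaction amount")
    ax_hist.set_xlabel("Amount")

    # The box plot makes the long right tail (large outlier amounts) visible.
    ax_box.boxplot(df["Amount"].to_numpy())
    ax_box.set_title("Amount (box plot)")

    # Number of normal (0) vs. fraudulent (1) transactions.
    counts = df["Class"].value_counts().sort_index()
    ax_count.bar([str(c) for c in counts.index], counts.values)
    ax_count.set_title("Class counts")
    ax_count.set_xlabel("Class (0 = normal, 1 = fraud)")

    fig.tight_layout()
    return fig
```

Call fig.savefig("overview.png") on the returned figure to write it to disk; the file name is only an example.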
-
Handling Class Imbalance
- The dataset has a class imbalance, with far fewer fraudulent transactions.
- Two techniques are used to handle the imbalance:
- Undersampling: Randomly removing normal (majority-class) transactions until the classes are balanced.
- SMOTE: Generating synthetic samples for the fraudulent (minority) class.
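Both resampling techniques listed above can be sketched as follows. This is an illustration under assumptions, not the project's code: it assumes label 1 is the minority (fraud) class, and the SMOTE helper requires the imbalanced-learn package, which is imported only when that function is called.

```python
import numpy as np
import pandas as pd


def undersample(X_train, y_train, random_state=42):
    """Randomly drop majority-class (0) rows until both classes have equal counts."""
    train = X_train.assign(Class=np.asarray(y_train))
    fraud = train[train["Class"] == 1]
    normal = train[train["Class"] == 0].sample(n=len(fraud), random_state=random_state)
    # Shuffle so the classes are not stored in contiguous blocks.
    balanced = pd.concat([normal, fraud]).sample(frac=1, random_state=random_state)
    return balanced.drop(columns="Class"), balanced["Class"]


def oversample_smote(X_train, y_train, random_state=42):
    """Create synthetic minority-class rows with SMOTE (needs imbalanced-learn)."""
    from imblearn.over_sampling import SMOTE

    return SMOTE(random_state=random_state).fit_resample(X_train, y_train)
```

Apply either function to the training split only (for example after train_test_split(X, y, stratify=y)), so the test set keeps the real class ratio and the reported metrics are not inflated by synthetic or resampled rows.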
-
Feature Engineering
- The Amount column is standardized (zero mean, unit variance) so that its scale matches the other features.
- The Time column is dropped because it is not considered relevant to the classification task.
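The two feature-engineering steps above map onto scikit-learn's StandardScaler and a pandas column drop. This is a minimal sketch assuming the Kaggle column names; the scaler is returned so the same transformation can be reused on new transactions at prediction time.

```python
from sklearn.preprocessing import StandardScaler


def engineer_features(df):
    """Standardize Amount and drop Time, as described above."""
    df = df.copy()
    scaler = StandardScaler()
    # StandardScaler expects 2-D input, hence the double brackets; ravel() flattens back.
    df["Amount"] = scaler.fit_transform(df[["Amount"]]).ravel()
    df = df.drop(columns=["Time"])
    return df, scaler
```

For a stricter setup, fit the scaler on the training split only and call scaler.transform on the test data, so no statistics from the test set leak into training.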
-
Model Selection
- Three models are evaluated:
- Logistic Regression
- Decision Tree
- Random Forest
- Performance metrics include:
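- Accuracy: Percentage of correct predictions. On imbalanced data this can be misleading, since predicting "normal" for every transaction already scores very high.
- Precision: Of the transactions flagged as fraudulent, the fraction that are actually fraudulent.
- Recall: Of the actually fraudulent transactions, the fraction the model flags.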
- F1-Score: Harmonic mean of precision and recall.
- Confusion Matrix: Evaluates true positives, true negatives, false positives, and false negatives.
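The comparison described above can be sketched with scikit-learn as follows. The hyperparameters (max_iter=1000, n_estimators=100, random_state=42) are illustrative assumptions, not the values used in the repository.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.tree import DecisionTreeClassifier


def evaluate_models(X_train, y_train, X_test, y_test, random_state=42):
    """Fit the three candidate models and compute the metrics listed above."""
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Decision Tree": DecisionTreeClassifier(random_state=random_state),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=random_state),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            # zero_division=0 avoids a warning if a model never predicts fraud.
            "precision": precision_score(y_test, y_pred, zero_division=0),
            "recall": recall_score(y_test, y_pred, zero_division=0),
            "f1": f1_score(y_test, y_pred, zero_division=0),
            # Rows are true labels, columns are predictions: [[TN, FP], [FN, TP]].
            "confusion_matrix": confusion_matrix(y_test, y_pred, labels=[0, 1]),
        }
    return models, results
```

Because fraud is rare, precision and recall (or F1) are the more informative columns when comparing the three models; accuracy will be close to 100% for almost any reasonable classifier.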
-
Final Model: Random Forest
- Random Forest achieved the best performance, with an accuracy of 99.99%.
- Random Forest is chosen as the final model because it outperformed the other models on accuracy, precision, and recall.
- The model is saved as CerditCardRandomForestModel.pkl using joblib for later use.
-
Deploying the Model
- The trained Random Forest model can be loaded and applied to new transactions to predict whether they are fraudulent.
- This approach fits real-world fraud detection, where minimizing both false positives and false negatives is critical.
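Saving and reloading the model with joblib, as described above, can be sketched as follows. The default file name is the one this README mentions; the helper function names are hypothetical, and new transactions are assumed to be preprocessed exactly like the training data.

```python
import joblib

MODEL_PATH = "CerditCardRandomForestModel.pkl"  # file name used in this repository


def save_model(model, path=MODEL_PATH):
    """Serialize the trained model to disk with joblib."""
    joblib.dump(model, path)


def predict_transactions(new_transactions, path=MODEL_PATH):
    """Load the saved model and label each transaction (0 = normal, 1 = fraud)."""
    model = joblib.load(path)
    # Inputs must match the training features: Amount standardized with the
    # training scaler, Time dropped, same column order.
    return model.predict(new_transactions)
```

Only load .pkl files you trust: unpickling can execute arbitrary code.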
-
Clone the repository:
git clone https://github.com/MuhammadRuby/Crdit_Card_Fraud_Detection.git
-
Create and activate a virtual environment, then install the dependencies:
python -m venv venv
source venv/bin/activate        (Linux/macOS)
venv\Scripts\activate.bat       (Windows)
pip install -r requirements.txt
-
Download the dataset from Kaggle: Kaggle Dataset Link
-
Edit the dataset path in the code, then run it.
Muhammad Ruby