This project focuses on detecting fraudulent credit card transactions using machine learning models. Since fraud cases are significantly fewer than legitimate transactions, an under-sampling technique is used to balance the dataset and improve model performance.
- Source: (e.g., Kaggle’s "Credit Card Fraud Detection" dataset)
- Features: 28 principal components (PCA-transformed) + `Time` and `Amount`
- Target: `Class` (0 = Legitimate, 1 = Fraudulent)
- Imbalance Issue: Fraud cases account for <1% of total transactions
- Undersampling: Reducing the number of majority class samples (legitimate transactions) to match the minority class (fraudulent transactions).
- Why Undersampling? Helps prevent the model from being biased toward non-fraudulent transactions while maintaining a reasonable dataset size for training.
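
The undersampling step described above can be sketched with pandas as follows. This is a minimal illustration, not necessarily the notebook's exact code: the helper name `undersample` and the toy DataFrame are hypothetical, and only the `Class` column name is taken from the dataset description.

```python
import pandas as pd


def undersample(df, target="Class", random_state=42):
    """Randomly drop majority-class rows until both classes are the same size.

    Assumes a binary target where 1 marks the minority (fraud) class.
    """
    fraud = df[df[target] == 1]
    # Draw as many legitimate rows as there are fraud rows.
    legit = df[df[target] == 0].sample(n=len(fraud), random_state=random_state)
    # Combine and shuffle so the classes are interleaved.
    balanced = pd.concat([fraud, legit]).sample(frac=1, random_state=random_state)
    return balanced.reset_index(drop=True)


# Toy stand-in for the real data: 10 fraud rows among 1,000 transactions.
toy = pd.DataFrame({"Amount": range(1000), "Class": [1] * 10 + [0] * 990})
balanced = undersample(toy)
print(balanced["Class"].value_counts())
```

Random undersampling discards most legitimate transactions, so the balanced set is small; fixing `random_state` keeps the selection reproducible between runs.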
- Python Libraries: Pandas, NumPy, Matplotlib, Seaborn
- Machine Learning: Scikit-learn (Logistic Regression, Decision Trees, Random Forest, etc.)
- Google Colab Features: GPU acceleration, Google Drive integration
- Open the Colab notebook (`Credit_Card_Fraud_Detection.ipynb`).
- Upload the dataset (if required).
- Run the cells sequentially to preprocess data, apply undersampling, train models, and evaluate performance.
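
As a rough picture of what the training and evaluation cells do, here is a minimal scikit-learn sketch. It is an assumption-laden stand-in rather than the notebook's code: the data is synthetic (generated with `make_classification` to mimic an already balanced set with 30 features), and the choice of two models, the 80/20 split, and the hyperparameters are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an already undersampled (balanced) dataset.
X, y = make_classification(
    n_samples=1000,
    n_features=30,
    n_informative=10,
    weights=[0.5, 0.5],
    random_state=42,
)

# Hold out 20% of the rows for evaluation, keeping the class ratio intact.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Fit each model and record its accuracy on the held-out rows.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {scores[name]:.3f}")
```

Accuracy is a reasonable summary here only because the evaluated set is balanced; on the original data, where fraud is under 1% of transactions, a model that always predicts "legitimate" would also score above 99%.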
To use locally:

    pip install -r requirements.txt

Then, open the notebook with:

    jupyter notebook Credit_Card_Fraud_Detection.ipynb

- Evaluation Metrics: Accuracy