This project uses machine learning to detect fraudulent financial transactions. We clean the data, engineer meaningful features, train multiple models, and identify the best one for real-time deployment.
Fraud is rare (~1%) but costly. Manual detection is slow and outdated. Can we automatically flag fraud before money is lost?
- Python: pandas, scikit-learn, matplotlib
- Models: KNN, Decision Trees, Random Forest, Gradient Boosting, Logistic Regression
- Extras: Isolation Forest (outliers), ANOVA F-test (feature selection), Hyperparameter Tuning
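
As a rough illustration of how the tools above fit together, here is a minimal sketch that compares the five listed models with an ANOVA F-test selection step, then tunes one of them. It is not the project's actual pipeline. The synthetic data from `make_classification`, the choice of `k=10` features, the fold count, and the tiny parameter grid are all assumptions made for the example, and the Isolation Forest outlier step is left out.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the transaction data (assumption): ~1% positive class.
X, y = make_classification(
    n_samples=3000, n_features=20, n_informative=5, n_redundant=2,
    weights=[0.99, 0.01], flip_y=0.0, class_sep=1.5, random_state=42,
)

models = {
    "KNN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Stratified folds keep the rare fraud class present in every split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

scores = {}
for name, model in models.items():
    # Scale, keep the 10 features with the highest ANOVA F-score, then fit.
    pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=10), model)
    scores[name] = cross_val_score(pipe, X, y, cv=cv, scoring="f1").mean()

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} mean F1 = {score:.3f}")

# Hyperparameter tuning for one candidate; the grid is illustrative only.
rf_pipe = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=10),
    RandomForestClassifier(n_estimators=100, random_state=42),
)
search = GridSearchCV(
    rf_pipe,
    param_grid={"randomforestclassifier__max_depth": [5, None]},
    scoring="f1",
    cv=cv,
)
search.fit(X, y)
print("Best Random Forest params:", search.best_params_)
```

Putting the scaler and feature selector inside the pipeline means they are refit on each training fold, so the held-out fold never influences which features are kept.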
- Best Model: Random Forest / Gradient Boosting
- F1 Score: ~0.85
- KNN underperformed on recall
- Data cleaning is critical
- Imbalanced data makes accuracy misleading; evaluate with F1, precision, and recall
- Ensemble methods outperform simpler models
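
To see why accuracy misleads on data with roughly 1% fraud, the short sketch below computes all four metrics from confusion-matrix counts. The counts are invented for illustration (10,000 transactions, 100 of them fraudulent) and are not results from this project. The helper `summarize` is a hypothetical name.

```python
def summarize(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when the model never predicts fraud.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def show(label, metrics):
    line = " ".join(f"{name}={value:.3f}" for name, value in metrics.items())
    print(f"{label}: {line}")


# Hypothetical counts: 10,000 transactions, 100 fraudulent (1%).
never_flags = summarize(tp=0, fp=0, fn=100, tn=9900)    # predicts "legit" for everything
catches_most = summarize(tp=80, fp=20, fn=20, tn=9880)  # finds 80 of the 100 frauds

show("never flags ", never_flags)
# never flags : accuracy=0.990 precision=0.000 recall=0.000 f1=0.000
show("catches most", catches_most)
# catches most: accuracy=0.996 precision=0.800 recall=0.800 f1=0.800
```

A model that never flags fraud still scores 99% accuracy, while its F1 of 0 shows that it is useless for this task.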
- Deploy the best model for real-time scoring
- Add deep learning for more complex fraud patterns
- Enable continuous retraining
- Built with love by Nitika Aggarwal