Objective: Perform data cleaning and Exploratory Data Analysis (EDA) on a dataset of choice. Dataset: Credit Card Fraud Detection (Kaggle).
I chose this dataset to analyze financial anomalies, which aligns with my interest in Blockchain Security and Fintech.
- High Imbalance: The dataset is heavily skewed; fraud accounts for only about 0.17% of transactions (492 of 284,807).
- Transaction Patterns: Most fraudulent transactions are small (mostly under $100), which may help them avoid triggering bank security alerts.
- Correlations: The heatmap shows that certain anonymized features (V17, V14, V12) have the strongest negative correlations with the 'Class' label, making them useful indicators of fraud.
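
The three observations above can be checked with a few lines of Pandas. The following is a minimal sketch, not the notebook's actual code: the helper names (`fraud_rate`, `share_below`, `correlations_with_label`) are hypothetical, and it assumes the Kaggle column names `Class` (1 = fraud) and `Amount`, with the CSV saved locally as `creditcard.csv`.

```python
import pandas as pd


def fraud_rate(df: pd.DataFrame, label: str = "Class") -> float:
    """Return the percentage of rows labelled as fraud (label == 1)."""
    return float(df[label].mean() * 100)


def share_below(df: pd.DataFrame, threshold: float = 100.0,
                label: str = "Class", amount: str = "Amount") -> float:
    """Return the percentage of fraudulent transactions with amount < threshold."""
    fraud_amounts = df.loc[df[label] == 1, amount]
    return float((fraud_amounts < threshold).mean() * 100)


def correlations_with_label(df: pd.DataFrame, label: str = "Class") -> pd.Series:
    """Pearson correlation of each numeric column with the label, most negative first."""
    corr = df.select_dtypes(include="number").corr()[label].drop(label)
    return corr.sort_values()


# Example usage (assumes the file name below; adjust to your local path):
# df = pd.read_csv("creditcard.csv")
# print(f"Fraud rate: {fraud_rate(df):.2f}%")
# print(f"Fraud under $100: {share_below(df):.1f}%")
# print(correlations_with_label(df).head(3))
```

Sorting the correlations in ascending order puts the most negatively correlated features at the top, which is where V17, V14, and V12 would be expected to appear.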
- Python (Pandas, NumPy)
- Seaborn & Matplotlib (for visualizations such as heatmaps and histograms)
- Jupyter Notebook
Click on the .ipynb file above to view the code and the output graphs directly in GitHub.