Credit Card Fraud Detection — Advanced SQL Project

Author: Duncan Chicho (NC-Dan)
Tool: Microsoft SQL Server (T-SQL)
Dataset: Credit Card Fraud Detection — ULB Machine Learning Group (Kaggle)
Scale: 283,726 clean transactions | 473 confirmed fraud cases
Complexity: Advanced — Views, Stored Procedures, Indexes, Anomaly Detection

Business Problem

Credit card fraud costs the global financial industry billions annually. This project analyses 283,726 real European credit card transactions to:

Detect — identify fraud patterns by amount, time and behaviour
Profile — build a statistical profile of fraudulent transactions
Monitor — create reusable Views and Stored Procedures for ongoing fraud surveillance

Dataset Context

Source: Real anonymised transactions from European cardholders (2 days)
Features: V1-V28 are PCA-transformed to protect cardholder privacy
Challenge: Only 0.1667% of transactions are fraudulent — finding 473 cases among 283,253 legitimate transactions is the core analytical challenge
Industry relevance: This imbalance mirrors real production fraud data

Database Architecture

Data Quality Audit & Cleaning

Issue	Detail	Action Taken
NULLs in V2-V27	21 per column — import precision loss	Retained — imputation would corrupt anonymised PCA features
NULLs in V28	237 values	Retained — same reason
Duplicate rows	1,081 identified	Deleted — ROW_NUMBER() partition method
Fraud count corrected	492 → 473 after deduplication	19 duplicate fraud rows removed
Fraud rate corrected	0.173% → 0.1667%	Recalculated post-cleaning

Cleaning method: ROW_NUMBER() OVER (PARTITION BY Time, Amount, Class, V1, V2, V3 ORDER BY TransactionID) — kept lowest TransactionID per duplicate group.

Key Findings

Finding 1 — The Needle in the Haystack

473 fraud cases hidden among 283,253 legitimate transactions
Fraud rate: 0.1667% — 1 in every 600 transactions
A model predicting everything as legitimate would be 99.83% accurate — and completely useless
This is why statistical anomaly detection is essential

Finding 2 — Fraudsters Spend More, But Not Too Much

Fraud avg amount: $123.87 vs Legitimate avg: $88.41 — 40% higher
Fraud MaxAmount: $2,125.87 vs Legitimate MaxAmount: $25,691.16
Fraudsters deliberately avoid extremely large transactions — they stay below detection thresholds
This behaviour is known as structuring in financial crime analysis

Finding 3 — The Test Transaction Pattern

73.6% of all fraud (348 cases) involves amounts under $100
Micro transactions (<$10): second highest fraud rate at 0.2457%
Small transactions ($10-$100): lowest fraud rate at 0.0849%
Pattern: criminals use tiny amounts to test stolen cards before escalating

Finding 4 — Large Transactions Are Highest Risk

Large transactions (>$1,000): 0.2937% fraud rate — highest of all bands
Only 3,064 large transactions exist — but 9 are fraudulent
Combined with micro test transactions — fraud follows a clear two-stage pattern

Finding 5 — Fraud Peaks at 2am

Hour 2am: 1.4550% fraud rate — nearly 10x the dataset average
Hour 4am: 1.0459% — second highest
Top 3 fraud hours are all between midnight and 5am
Fraudsters operate when human monitoring is at its lowest

Finding 6 — Midnight Fraud Is Bolder

Hour 0 (midnight): only 6 fraud cases but avg amount $303.35 — highest night avg
Hour 2am: 48 cases but avg only $78.19 — high volume, lower amounts
Two distinct night fraud profiles: bold midnight operators vs high-volume 2am attackers

Finding 7 — Fraud Hides in Normal Transactions

93.9% of fraud (444 of 473 cases) occurs in Normal transactions (Z-Score ≤ 2)
Only 29 fraud cases appear in Anomalous (high Z-Score) transactions
Z-Score detection alone would miss 94% of fraud
Multi-factor detection combining amount, time and behavioural signals is required

Finding 8 — Mid-Range Fraud Carries the Most Value

$100-$1,000 fraud: 116 transactions | $38,497.60 total value — 66% of all fraud value
$0-$100 fraud: 348 transactions | $6,856.46 total value — high volume, low value
Total fraud value across all 473 cases: $58,591.39

Finding 9 — The Complete Fraud Profile

A typical fraud transaction is:

Amount between $0-$100 (73.6% probability)
Occurring between midnight and 5am (highest risk window)
Appearing statistically normal (93.9% of cases)
Part of a sequence — micro test followed by mid-range escalation

Finding 10 — Imbalanced Data Requires Specialist Techniques

Standard accuracy metrics are misleading on 0.1667% fraud rate data
Precision, Recall and F1-Score are required for proper fraud model evaluation
SQL anomaly detection (Z-Score) is a first-pass filter — not a complete solution
Recommended next step: Python scikit-learn with SMOTE oversampling for full model

Recommendations

1. Implement real-time night monitoring (midnight–5am) Fraud rate at 2am is 10x the daytime average. Automated alerts for transactions in this window — especially amounts between $50-$500 — would flag the highest risk period with minimal false positives.

2. Flag micro-transaction sequences A card making multiple transactions under $10 within a short window is exhibiting test transaction behaviour. Rule-based flagging of 3+ micro-transactions within 1 hour from the same origin would catch this pattern early.

3. Build a multi-factor fraud score Z-Score alone catches only 6% of fraud. A composite score combining:

Transaction amount relative to card history
Time of day risk weight
Micro-transaction sequence detection
V1, V2, V14 feature values (strongest PCA signals in literature) would dramatically improve detection rate.

4. Prioritise mid-range fraud recovery $100-$1,000 fraud represents 66% of total fraud value at lower transaction volumes. Chargeback and recovery efforts should prioritise this band for maximum financial impact.

Advanced SQL Skills Demonstrated

Views — 3 production-ready Views for fraud monitoring
Stored Procedures — 2 parameterised procedures for analyst self-service
Indexes — Clustered PK + 2 Nonclustered indexes for query performance
Anomaly Detection — Z-Score calculation using CTEs and CROSS JOIN
Data Cleaning — ROW_NUMBER() deduplication, NULL audit and documentation
Window Functions — PERCENTILE_CONT, ROW_NUMBER, SUM OVER
Multi-level CTEs — chained CTEs for statistical calculations
IDENTITY columns — auto-incrementing Primary Key
DROP PROCEDURE IF EXISTS — safe procedure management

Project Files

File	Description
01_setup_and_data_quality.sql	Import verification, NULL audit, deduplication, PK, Indexes
02_views_and_setup.sql	3 Views: vw_FraudTransactions, vw_LegitTransactions, vw_TransactionSummary
03_fraud_analysis.sql	Amount comparison, fraud by band, fraud by hour using Views
04_stored_procedures.sql	sp_FraudByAmountThreshold, sp_FraudByHourRange, Z-Score anomaly detection

Dataset Source

Credit Card Fraud Detection
ULB Machine Learning Group — real anonymised European cardholder transactions

| Other SQL Projects |

🔗 Olist E-Commerce SQL Analysis
🔗 IBM HR Attrition Analysis — SQL

| PowerBI Analyst Projects |
🔗Olist-ecommerce-powerbi-analysis

| Excel Projects |

Connect on LinkedIn 🔗 linkedin.com/in/duncanalyst

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
A1.png		A1.png
A1_setup_and_data_quality.sql		A1_setup_and_data_quality.sql
A2.png		A2.png
A2_views_and_setup.sql		A2_views_and_setup.sql
A3.png		A3.png
A3_fraud_analysis.sql		A3_fraud_analysis.sql
A4.png		A4.png
A4_stored_procedures.sql		A4_stored_procedures.sql
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Credit Card Fraud Detection — Advanced SQL Project

Business Problem

Dataset Context

Database Architecture

Data Quality Audit & Cleaning

Key Findings

Finding 1 — The Needle in the Haystack

Finding 2 — Fraudsters Spend More, But Not Too Much

Finding 3 — The Test Transaction Pattern

Finding 4 — Large Transactions Are Highest Risk

Finding 5 — Fraud Peaks at 2am

Finding 6 — Midnight Fraud Is Bolder

Finding 7 — Fraud Hides in Normal Transactions

Finding 8 — Mid-Range Fraud Carries the Most Value

Finding 9 — The Complete Fraud Profile

Finding 10 — Imbalanced Data Requires Specialist Techniques

Recommendations

Advanced SQL Skills Demonstrated

Project Files

Dataset Source

| Other SQL Projects |

| PowerBI Analyst Projects |

🔗Olist-ecommerce-powerbi-analysis

| Excel Projects |

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Credit Card Fraud Detection — Advanced SQL Project

Business Problem

Dataset Context

Database Architecture

Data Quality Audit & Cleaning

Key Findings

Finding 1 — The Needle in the Haystack

Finding 2 — Fraudsters Spend More, But Not Too Much

Finding 3 — The Test Transaction Pattern

Finding 4 — Large Transactions Are Highest Risk

Finding 5 — Fraud Peaks at 2am

Finding 6 — Midnight Fraud Is Bolder

Finding 7 — Fraud Hides in Normal Transactions

Finding 8 — Mid-Range Fraud Carries the Most Value

Finding 9 — The Complete Fraud Profile

Finding 10 — Imbalanced Data Requires Specialist Techniques

Recommendations

Advanced SQL Skills Demonstrated

Project Files

Dataset Source

| Other SQL Projects |

| PowerBI Analyst Projects |

🔗Olist-ecommerce-powerbi-analysis

| Excel Projects |

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages