A card payments and customer analytics project for the German retail banking market. Covers 250,000 customers, 367K cards, and 1.5M transactions from 2023-2025.
Built end-to-end across SQL, Excel, and Power BI covering the full analyst toolkit a German bank would use.
- Revenue grew 48% over 3 years (EUR 42.3M to EUR 62.6M) with steady +21% YoY growth.
- 23% of customers (Champions tier) drive 63% of revenue..
- Premium credit cards generate 58x more profit per card than Standard debit (EUR 394 vs EUR 6.71).
- 7,400 high-value customers at risk of churning represent EUR 7.7M in revenue.
- 46% of fraud incidents happen in the afternoon (12:00-17:00), driving fraud-team staffing recommendations.
Full writeup in INSIGHTS.md.
High-level KPIs, monthly revenue trend, and segment-level revenue concentration.
Customer distribution, fraud patterns by hour, and a churn-risk operational list.
Eight KPI cards, yearly summary, and the top 10 churn risks with conditional formatting.
Monthly revenue PivotTable with grouped year/month structure and a clustered column chart.
CardPulse_Analytics/
├── README.md # This file
├── INSIGHTS.md # Business analysis writeup
├── CardPulse BI Analysis.pbix # Power BI dashboards (2 pages, 11 measures)
├── Excel Analysis.xlsx # Excel workbook (5 sheets)
├── CardPulse_01_Data_Generation.ipynb # Python data generation notebook
├── Data/ # 8 reference CSVs (no full transactions,cards and customers files due to Github web limit)
├── Screenshots/ # Dashboard images
└── sql/ # 18 SQL files in 5 folders
├── 01_revenue_overview/
├── 02_customer_segmentation/
├── 03_merchant_analysis/
├── 04_fraud_analysis/
└── 05_lifecycle_retention/
| Layer | Tool | Purpose |
|---|---|---|
| Data generation | Python (Pandas, NumPy, Faker) | Synthetic dataset with German locale |
| Database | PostgreSQL on Supabase (Frankfurt EU) | Cloud SQL backend |
| SQL client | DBeaver | Query development and result export |
| Tabular analysis | Excel (PivotTables, XLOOKUP, Power Query) | Pre-aggregated analysis layer |
| Dashboarding | Power BI Desktop (DAX, data model) | Two interactive dashboards |
Nine relational tables modeled on real banking schemas:
| Table | Rows | Purpose |
|---|---|---|
| customers | 250,000 | KYC and segmentation data |
| accounts | 250,000 | One checking account per customer |
| cards | 367,610 | Card issuance (1-3 cards per customer) |
| transactions | 1,499,985 | Card transaction fact table |
| merchants | 48,960 | German merchant brands by location |
| merchant_categories | 25 | Real ISO 18245 MCC codes |
| branches | 174 | Bank branches across 15 German cities |
| card_products | 10 | Card tier catalog (Standard / Premium / Business) |
| fx_rates | 4,384 | Daily EUR exchange rates |
Date range: 2023-01-01 to 2025-12-31 (36 months) Total rows: ~2.42M MCC codes: Real-world Visa/Mastercard codes (e.g., 5411 = Grocery, 4511 = Airlines) Merchant brands: German market (Edeka, Aldi, Lidl, Lufthansa, Deutsche Telekom, etc.)
The full transactions.csv (163 MB), cards.csv (25.7 MB), and customers.csv (25.7 MB) are excluded from this repo due to GitHub web upload size limits. The 6 smaller reference CSVs are included in /Data/. To regenerate the full dataset, run CardPulse_01_Data_Generation.ipynb in Google Colab.
Five themes covering executive-level questions a Head of Cards would ask:
Theme 1 - Revenue & Volume Overview (3 queries)
- Yearly volume and growth
- Monthly trends with YoY/MoM growth (window functions)
- Revenue by card tier
Theme 2 - Customer Segmentation (4 queries)
- Segment revenue contribution (Pareto analysis)
- Top 10 cities by volume
- Demographics (age x gender x segment)
- Cards-per-customer distribution
Theme 3 - Merchant & Category Analysis (3 queries)
- Top 10 merchants by location AND by brand
- Spending by MCC category group
- Online vs in-store channel split
Theme 4 - Fraud Analysis (3 queries)
- Fraud rate by category
- Fraud by hour of day (staffing insight)
- Customer fraud exposure by city
Theme 5 - Customer Lifecycle & Retention (5 queries)
- RFM segmentation (Recency, Frequency, Monetary)
- Customer activity levels (Active / Casual / Dormant)
- Cohort analysis by signup year
- Card product profitability (fees + interchange - rewards)
- Churn risk operational list
All queries use joins, CTEs, window functions, and conditional aggregations.
| Sheet | Skills Used |
|---|---|
| KPI Dashboard | INDEX formulas, custom number formatting, conditional formatting |
| Monthly Trends | PivotTable + PivotChart with grouped year/month |
| Customer Segments | PivotTable with Slicer (interactive filtering) |
| Merchant Lookup | XLOOKUP with error handling |
| Power Query Demo | Filter and group-by transformation pipeline |
Page 1 - Executive Overview
- 4 KPI cards (Revenue, Transactions, Customers, Cards)
- Monthly Revenue Trend (line chart)
- Revenue by Customer Segment (bar chart)
- YoY Revenue Growth (column chart)
- Top 5 Merchants (table)
Page 2 - Cards & Customer Deep-Dive
- 3 KPI cards (Avg Transaction, Fraud Rate, Churn Risks)
- Customer Distribution by Segment (donut chart)
- Top 10 Churn Risks (table)
- Churn Risk by City (stacked bar chart)
- Fraud Incidents by Hour of Day (stacked column chart)
Eleven DAX measures power the visuals, including ratio calculations with DIVIDE() and handle null denominators safely.
- Clone or download this repo
- Open
CardPulse_01_Data_Generation.ipynbin Google Colab - Mount Google Drive and run all cells (~10 minutes for full dataset)
- Load CSVs into a PostgreSQL database (e.g., Supabase)
- Run SQL files in numeric order from
/sql/ - Open
Excel Analysis.xlsxto explore PivotTables - Open
CardPulse BI Analysis.pbixin Power BI Desktop
Pavan Raj Kotagiri LinkedIn



