ChinookAnalytics is an end-to-end analytical data platform built entirely in SQL and PostgreSQL.
It transforms raw transactional music-store data into validated, structured, and decision-ready analytical assets using a layered architecture, strict integrity enforcement, reconciliation-driven validation, and executive-level reporting.
This is structured analytical engineering — not exploratory SQL.
Author: Kevin Mota da Costa
Portfolio: https://costakevinn.github.io
LinkedIn: https://linkedin.com/in/costakevinnn
ChinookAnalytics was designed to demonstrate production-style analytical engineering principles:
- Layered data architecture (stg → core → marts)
- Referential and domain integrity enforcement
- Deterministic pipeline execution
- Revenue reconciliation across aggregation layers
- Executive-ready business metrics
The goal is to build analytics that are reliable, financially consistent, and reproducible.
flowchart TD
A[Raw Source Data<br>SQLite / CSV] --> B[STG Layer<br>Raw Ingestion]
B --> C[CORE Layer<br>Validated Relational Model]
C --> D[MARTS Layer<br>Analytical Models]
D --> E[Executive Report<br>Business Insights]
subgraph Data Governance
C
end
subgraph Business Analytics
D
E
end
Layered execution:
stg → core → profiling → marts → validation
Every aggregation reconciles exactly to source revenue totals.
- Raw ingestion from SQLite / CSV
- Minimal transformation
- Source traceability preserved
- Fully validated relational model
- Referential integrity enforcement
- Domain constraints applied
- Clean transactional structure
- Business-aligned analytical models
- Pre-aggregated executive metrics
- Revenue & customer analytics
- Revenue reconciliation
- Aggregation consistency checks
- Automated financial verification
All marts reconcile 1:1 with core totals.
The platform materializes analytics across:
- Revenue evolution (MoM / YoY growth)
- Geographic revenue concentration
- Customer Lifetime Value (LTV)
- Revenue dependency risk (Top-N concentration)
- Artist & genre revenue distribution
- High-value invoice decomposition
All metrics are pre-computed, validated, and reproducible.
- Total Revenue: 2328.60
- Invoices: 412
- Customers: 59
- Countries: 24
- Top-5 Countries: 58.78% of revenue
- Top-10 Artists: 30.98% of revenue
- Top-10 Customers: 19.38% of revenue
All aggregations reconcile exactly to the core layer.
ChinookAnalytics/
├── data/raw/ # Source SQLite + exported CSV
├── sql/
│ ├── stg/ # Raw ingestion layer
│ ├── core/ # Validated relational model
│ ├── marts/ # Analytical models
│ └── validation/ # Reconciliation scripts
├── scripts/ # Automated pipeline execution
├── docker-compose.yml # PostgreSQL environment
└── docs/ # Data discovery & executive report
Each SQL layer has a single responsibility, mirroring production analytics engineering standards.
PostgreSQL 16
- Relational modeling (3NF)
- Keys & constraints
- ETL / ELT pipelines
- Analytical data marts
- Revenue reconciliation frameworks
- Docker
- Containerized PostgreSQL
- Deterministic execution
- Revenue analytics
- Customer analytics
- Concentration risk metrics
- Executive reporting
Run the full pipeline locally:
docker compose up -d
./scripts/02_run_pipeline.shThe pipeline executes deterministically from ingestion to validated analytical outputs.
This project is part of my Machine Learning & Data Engineering portfolio: 👉 https://costakevinn.github.io
MIT License — see LICENSE for details.