A structured SQL analytics project built on a simulated CMS-style healthcare data warehouse — covering enrollment tracking, claims utilization, provider performance, and program outcomes across four interrelated administrative tables.
The schema and queries mirror real-world workflows used in health plan analytics, ACO reporting, and CMS administrative data environments, making this directly applicable to payer, provider, and health policy analytics roles.
enrollment ──────┐
├──► claims ──► providers
programs ────────┘
| Table | Description |
|---|---|
enrollment |
Member enrollment records — program assignment, start/end dates |
claims |
Medical claims — procedure codes, diagnosis codes, service dates, allowed amounts, provider/member IDs |
providers |
Provider reference — name, specialty, state |
programs |
Health program metadata |
- Row counts and null value profiling across enrollment and claims tables
- Duplicate member enrollment detection by program
- Missing value rates flagged for downstream imputation decisions
- Monthly enrollment trends with month-over-month percentage changes by program
- Member tenure distribution across five buckets (< 1 month → 12+ months)
- Identifies retention patterns and program growth/attrition signals
- Per Member Per Month (PMPM) cost and claims metrics by program — the standard KPI in managed care and health plan reporting
- Top 10 procedure codes ranked by frequency, total cost, and % of total claims
- Surfaces high-cost, high-volume service categories driving utilization
- High-volume provider identification using quartile rankings
- Z-score outlier detection within specialty groups — flags statistically anomalous providers for quality review or audit
- Specialty-adjusted benchmarking accounts for case mix differences across provider types
- Member re-enrollment rates following disenrollment events
- Days-to-reenrollment distribution — measures program stickiness and gap-in-coverage patterns
| Concept | Applied In |
|---|---|
| PMPM (Per Member Per Month) | Claims utilization section |
| CMS administrative data structure | Overall schema design |
| Procedure code (CPT) frequency analysis | Claims utilization section |
| Provider outlier / anomaly detection | Provider performance section |
| Member tenure & retention | Enrollment analysis section |
| Re-enrollment / gap-in-coverage | Program outcomes section |
| Data suppression & quality flags | Data quality section |
| Tool | Purpose |
|---|---|
pandas + SQL (via sqlite3 / pandasql) |
Query execution and data manipulation |
numpy |
Z-score calculations for provider outlier detection |
Jupyter Notebook |
Interactive SQL + Python environment |
# 1. Clone
git clone https://github.com/SaiTejaPortfolioDS/Healthcare-analytics-sql.git
cd Healthcare-analytics-sql
# 2. Install dependencies
pip install pandas numpy jupyter pandasql
# 3. Launch notebook
jupyter notebook Healthcare_analytics_sql.ipynbAll data is simulated within the notebook — no external data files required.
Mohan Venkata Pavan Sai Teja Kattiboyina MS Business Analytics & AI — University of Texas at Dallas Healthcare Operations Research Associate (Co-Author) — UTD Optimization & Scheduling Lab