Skip to content

SaiTejaPortfolioDS/Healthcare-analytics-sql

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 

Repository files navigation

Healthcare Analytics — SQL & Python

Python SQL Jupyter Domain

Overview

A structured SQL analytics project built on a simulated CMS-style healthcare data warehouse — covering enrollment tracking, claims utilization, provider performance, and program outcomes across four interrelated administrative tables.

The schema and queries mirror real-world workflows used in health plan analytics, ACO reporting, and CMS administrative data environments, making this directly applicable to payer, provider, and health policy analytics roles.


Data Warehouse Schema

enrollment ──────┐
                 ├──► claims ──► providers
programs ────────┘
Table Description
enrollment Member enrollment records — program assignment, start/end dates
claims Medical claims — procedure codes, diagnosis codes, service dates, allowed amounts, provider/member IDs
providers Provider reference — name, specialty, state
programs Health program metadata

Analytical Sections

1. Data Quality Assessment

  • Row counts and null value profiling across enrollment and claims tables
  • Duplicate member enrollment detection by program
  • Missing value rates flagged for downstream imputation decisions

2. Enrollment Analysis

  • Monthly enrollment trends with month-over-month percentage changes by program
  • Member tenure distribution across five buckets (< 1 month → 12+ months)
  • Identifies retention patterns and program growth/attrition signals

3. Claims Utilization

  • Per Member Per Month (PMPM) cost and claims metrics by program — the standard KPI in managed care and health plan reporting
  • Top 10 procedure codes ranked by frequency, total cost, and % of total claims
  • Surfaces high-cost, high-volume service categories driving utilization

4. Provider Performance & Outlier Detection

  • High-volume provider identification using quartile rankings
  • Z-score outlier detection within specialty groups — flags statistically anomalous providers for quality review or audit
  • Specialty-adjusted benchmarking accounts for case mix differences across provider types

5. Program Outcomes

  • Member re-enrollment rates following disenrollment events
  • Days-to-reenrollment distribution — measures program stickiness and gap-in-coverage patterns

Healthcare Domain Concepts

Concept Applied In
PMPM (Per Member Per Month) Claims utilization section
CMS administrative data structure Overall schema design
Procedure code (CPT) frequency analysis Claims utilization section
Provider outlier / anomaly detection Provider performance section
Member tenure & retention Enrollment analysis section
Re-enrollment / gap-in-coverage Program outcomes section
Data suppression & quality flags Data quality section

Tech Stack

Tool Purpose
pandas + SQL (via sqlite3 / pandasql) Query execution and data manipulation
numpy Z-score calculations for provider outlier detection
Jupyter Notebook Interactive SQL + Python environment

Setup & Usage

# 1. Clone
git clone https://github.com/SaiTejaPortfolioDS/Healthcare-analytics-sql.git
cd Healthcare-analytics-sql

# 2. Install dependencies
pip install pandas numpy jupyter pandasql

# 3. Launch notebook
jupyter notebook Healthcare_analytics_sql.ipynb

All data is simulated within the notebook — no external data files required.


Author

Mohan Venkata Pavan Sai Teja Kattiboyina MS Business Analytics & AI — University of Texas at Dallas Healthcare Operations Research Associate (Co-Author) — UTD Optimization & Scheduling Lab

LinkedIn Portfolio

About

SQL analytics on a CMS-style healthcare data warehouse — PMPM metrics, provider outlier detection, enrollment trends, and program outcomes across 5 analytical sections

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors