Skip to content

scale600/dba-azure-project

Repository files navigation

Hospital Quality Data Pipeline — Azure

A personal project that pulls US hospital quality data from the CMS (Centers for Medicare & Medicaid Services) public API, stores it in Azure SQL Database, and exposes it through a REST API and a web dashboard.

The goal was to connect all the pieces end-to-end: automated ETL, cloud infrastructure as code, a queryable API layer, and BI visualization — using only managed Azure services.

Live: https://www.dba-azure.techcloudup.com

Note: The ETL pipeline and REST API code are fully implemented and runnable locally. Deploying the Azure Function App requires a Pay-As-You-Go subscription (Free Trial does not allocate vCPU quota for compute resources). The database, dashboard, and monitoring infrastructure are live on Azure.


Architecture

CMS Public API (Hospital Info + Quality Metrics)
        │  HTTP GET (no auth, quarterly updates)
        ▼
Azure Function App  (Python 3.11)
  ├── Timer Trigger  — CRON 0 0 0,12 * * *  (ETL, twice daily)
  └── HTTP Trigger   — REST API (5 endpoints)
        │
        ▼
Azure SQL Database  (Serverless, GP_S_Gen5_1)
  ├── dbo.Hospital              — hospital master (~5,400 records)
  ├── dbo.HospitalVisitMetrics  — quality metric snapshots
  └── dbo.ETL_Log               — ETL run audit log
        │
        ├──▶ Azure Static Web Apps  — web dashboard (Chart.js)
        └──▶ Power BI Service       — DirectQuery reports

Azure Key Vault        — connection strings (no secrets in code)
Application Insights   — API performance monitoring + ETL alerts

Tech Stack

Layer Technology
Language Python 3.11
Cloud Azure (Function App, SQL Database, Key Vault, Static Web Apps, Application Insights)
Infrastructure Bicep (IaC)
Data sources CMS Hospital General Info API, CMS Unplanned Hospital Visits API
Web dashboard Chart.js, HTML/CSS (build-time DB snapshot)
BI Power BI Desktop + Service (DirectQuery)
Dependencies azure-functions, azure-identity, azure-keyvault-secrets, pyodbc, requests

Project Structure

.
├── function_app.py          # Azure Functions v2 — HTTP Trigger (REST API)
├── requirements.txt
├── .env.example             # local env var template (no real values)
│
├── etl/
│   ├── etl_runner.py        # ETL entry point (Timer Trigger → cms_client → db_client)
│   ├── cms_client.py        # CMS API fetch + field mapping
│   └── db_client.py         # MERGE upsert + ETL_Log writes
│
├── api/
│   └── db.py                # SQL query helpers (REST API)
│
├── scripts/
│   └── build_dashboard.py   # embeds DB snapshot into site/index.html before deploy
│
├── sql/
│   ├── 01_schema.sql        # table DDL
│   ├── 02_indexes.sql       # indexes
│   └── 03_queries.sql       # ad-hoc analysis queries
│
├── infra/
│   ├── main.bicep           # core infrastructure (SQL, Key Vault, App Insights)
│   └── func.bicep           # Function App deployment
│
├── site/
│   ├── index.html           # live dashboard (Chart.js, build-time DB snapshot)
│   ├── docs.html            # project docs (renders dba-azure-project.md via marked.js)
│   └── staticwebapp.config.json
│
├── bi/
│   ├── dax_measures.dax     # Power BI DAX measures
│   └── powerbi_setup.md     # Power BI connection setup guide
│
├── monitoring/
│   └── dashboard.json       # Azure Monitor dashboard definition
│
└── docs/
    └── screenshots/         # dev verification screenshots

Local Setup

Prerequisites

1. Configure environment variables

cp .env.example .env
# fill in the values in .env
Variable Description
AZURE_KEY_VAULT_URL Key Vault endpoint (https://<name>.vault.azure.net/)
DB_CONNECTION_STRING Azure SQL connection string (local dev only; production reads from Key Vault)
AzureWebJobsStorage local Function App storage (UseDevelopmentStorage=true)

2. Install dependencies

pip install -r requirements.txt

3. Run ETL manually (local)

python -c "
import os, dotenv
dotenv.load_dotenv()
from etl.etl_runner import run
run()
"

4. Run the REST API locally

func start

Test at http://localhost:7071/api/hospitals?state=CA.

5. Build the dashboard (optional)

# embeds a DB snapshot into site/index.html
python scripts/build_dashboard.py

REST API

Base URL: https://<function-app>.azurewebsites.net/api

Method Path Description
GET /hospitals list hospitals with filter + pagination
GET /hospitals/{id} single hospital detail
GET /hospitals/{id}/metrics quality metric history for a hospital
GET /states/summary aggregated stats per state
GET /metrics/top top hospitals by quality score for a given measure

Query parameters

GET /hospitals

Parameter Type Example
state string CA
rating int 4 (4 stars and above)
emergency string Y
limit int 20 (default)
offset int 0

GET /hospitals/{id}/metrics

Parameter Type Description
measure_id string e.g. EDAC_30_AMI
limit int number of snapshots (default: 10)

Error format

{
  "error": "NOT_FOUND",
  "message": "Hospital with facility_id '999999' not found.",
  "status": 404
}

Database Schema

-- hospital master data
CREATE TABLE dbo.Hospital (
    FacilityID        NVARCHAR(10)  NOT NULL PRIMARY KEY,
    FacilityName      NVARCHAR(200) NOT NULL,
    State             NCHAR(2),
    HospitalType      NVARCHAR(100),
    EmergencyServices NCHAR(1),     -- 'Y' / 'N'
    OverallRating     TINYINT,      -- 1–5, NULL if not rated
    Latitude          DECIMAL(9,6),
    Longitude         DECIMAL(9,6),
    UpdatedAt         DATETIME2 DEFAULT GETDATE()
);

-- quality metric snapshots
CREATE TABLE dbo.HospitalVisitMetrics (
    MetricID           INT IDENTITY PRIMARY KEY,
    FacilityID         NVARCHAR(10) NOT NULL REFERENCES dbo.Hospital(FacilityID),
    MeasureID          NVARCHAR(20) NOT NULL,
    Score              DECIMAL(8,2),
    ComparedToNational NVARCHAR(10), -- 'Better' / 'Same' / 'Worse'
    PeriodStart        DATE,
    PeriodEnd          DATE,
    CollectedAt        DATETIME2 DEFAULT GETDATE()
);

-- ETL run audit log
CREATE TABLE dbo.ETL_Log (
    LogID            INT IDENTITY PRIMARY KEY,
    RunStart         DATETIME2 DEFAULT GETDATE(),
    RunEnd           DATETIME2,
    HospitalsLoaded  INT,
    MetricsLoaded    INT,
    Status           NVARCHAR(20), -- 'SUCCESS' / 'FAILED'
    ErrorMessage     NVARCHAR(MAX)
);

ETL uses MERGE for upserts, so repeated runs don't produce duplicate records.


Infrastructure (Bicep)

# deploy core infrastructure (SQL, Key Vault, App Insights)
az deployment group create \
  --resource-group rg-dba-project \
  --template-file infra/main.bicep

# deploy Function App
az deployment group create \
  --resource-group rg-dba-project \
  --template-file infra/func.bicep

The Function App's Managed Identity is granted Key Vault Secrets User on the vault. At runtime, DefaultAzureCredential fetches the connection string — no secrets in code or config files.


Cost Control

Current low-cost operating mode, applied on 2026-06-18:

Resource Setting Cost impact
Function App func-dba-xvel6ncdvwsre Stopped Prevents Timer/API traffic from waking the database
SQL Database HospitalDB Serverless GP_S_Gen5_1, auto-pause after 15 minutes Compute cost stops while paused
SQL Database HospitalDB Max size reduced from 32 GB to 5 GB Lowers provisioned data storage cost without deleting data
Dashboard site/index.html Shows API DISABLED notice; live API refresh and full exports disabled Keeps the public dashboard usable from embedded snapshot data without waking the API/SQL path

Useful commands:

# stop ETL/API triggers so they do not resume SQL compute
az functionapp stop \
  --resource-group rg-dba-project \
  --name func-dba-xvel6ncdvwsre

# keep SQL serverless auto-pause at the minimum delay
az sql db update \
  --resource-group rg-dba-project \
  --server sql-dba-xvel6ncdvwsre \
  --name HospitalDB \
  --auto-pause-delay 15

# keep provisioned SQL storage at the current low-cost size
az sql db update \
  --resource-group rg-dba-project \
  --server sql-dba-xvel6ncdvwsre \
  --name HospitalDB \
  --max-size 5GB

# verify current cost-control state
az functionapp show \
  --resource-group rg-dba-project \
  --name func-dba-xvel6ncdvwsre \
  --query "{name:name,state:state}"

az sql db show \
  --resource-group rg-dba-project \
  --server sql-dba-xvel6ncdvwsre \
  --name HospitalDB \
  --query "{status:status,pausedDate:pausedDate,autoPauseDelay:autoPauseDelay,maxSizeBytes:maxSizeBytes}"

Avoid opening Azure Portal Query Editor, SSMS, Power BI DirectQuery, local API tests, or dashboard build scripts while minimizing cost. Any login attempt can resume a paused serverless SQL database.

To re-enable live dashboard/API behavior, start the Function App, set API_DISABLED = false in scripts/build_dashboard.py, rebuild, then redeploy Azure Static Web Apps.

To refresh only the embedded snapshot, use the Manual refresh button in the dashboard header and run the GitHub Actions workflow. This is intentionally a privileged, manual action because it queries Azure SQL and resumes serverless compute.


Live URLs

URL Description
https://www.dba-azure.techcloudup.com live dashboard
https://www.dba-azure.techcloudup.com/docs.html project documentation

Security Notes

  • .env is in .gitignore — never committed
  • Connection strings live in Azure Key Vault only
  • Code reads secrets at runtime via DefaultAzureCredential
  • Azure SQL enforces TLS 1.2; TDE is enabled by default

About

Automated pipeline that ingests US hospital quality data from CMS public APIs into Azure SQL Database, exposes it via a custom REST API (Azure Functions), and visualizes it in a live Chart.js dashboard. IaC with Bicep, secrets in Key Vault.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors