- 📌 Project Overview
- 🎯 Objectives
- 📂 Project Structure
- 🛠️ Tools & Technologies
- 🏗️ Data Architecture
- ⭐ Star Schema Design
- ⚙️ Step-by-Step Implementation
- 📊 Data Analytics
- ✅ Key Outcomes
This project showcases a real-time healthcare data engineering pipeline built on Microsoft Azure.
It simulates hospital operations such as patient admissions, transfers, and discharges, processes the live stream using Databricks, and delivers insights through Power BI dashboards connected to Synapse SQL Pool.
The solution bridges data engineering and data analytics, from ingestion to insight, enabling hospitals to track patient movement and bed utilization efficiently.
- Stream real-time hospital data via Azure Event Hub.
- Build a multi-layer ETL pipeline (Bronze → Silver → Gold) in Azure Databricks.
- Design a Star Schema in Synapse SQL Pool for analytical queries.
- Visualize patient and department metrics in Power BI.
- Maintain version control using Git.
real-time-patient-flow-azure/
│
├── databricks-notebooks/ # Transformation notebooks
│ ├── 01_bronze_rawdata.py
│ ├── 02_silver_cleandata.py
│ └── 03_gold_transform.py
├── simulator/ # Data simulation scripts
│ └── patient_flow_generator.py
├── powerbi/ # Power BI report
│ └── Hospital_dashboard.pbix
├── sqlpool-queries/ # SQL scripts for Synapse
│ └── SQL_queries.sql
├── Architecure Diagram/ # Flowchart
│ └── Flowchart
└── README.md # Project documentation
- Azure Event Hub – Real-time data ingestion
- Azure Databricks – PySpark-based data transformation
- Azure Data Lake Gen2 – Layered data storage
- Azure Synapse SQL Pool – Data warehouse for analytics
- Power BI – Dashboard and visualization
- Python 3.9+ – Core programming
- Git – Version control
The pipeline follows a medallion architecture:
- Bronze Layer: Raw JSON data from Event Hub stored in ADLS.
- Silver Layer: Cleaned and structured data (validated types, null handling).
- Gold Layer: Analytical models and star schema tables for BI.
The Gold layer data in Synapse follows a star schema for optimized analytics:
- Fact Table:
  - FactPatientFlow – Patient visits, timestamps, wait times, discharge
- Dimension Tables:
  - DimDepartment – Department details
  - DimPatient – Patient demographic info
  - DimTime – Date and time dimension
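To make the fact/dimension relationship concrete, here is a minimal sketch that builds the four tables in an in-memory SQLite database and runs one analytical join. SQLite stands in for Synapse SQL Pool only so the example runs anywhere; every column name (`department_key`, `wait_minutes`, and so on) and every row of sample data is an assumption, not the project's actual schema.

```python
import sqlite3

# In-memory SQLite is a stand-in for the Synapse dedicated SQL pool.
conn = sqlite3.connect(":memory:")

# Hypothetical columns: the README names the tables but not their fields.
conn.executescript("""
CREATE TABLE DimDepartment (department_key INTEGER PRIMARY KEY, department_name TEXT);
CREATE TABLE DimPatient    (patient_key INTEGER PRIMARY KEY, gender TEXT, age INTEGER);
CREATE TABLE DimTime       (time_key INTEGER PRIMARY KEY, full_date TEXT, hour INTEGER);
CREATE TABLE FactPatientFlow (
    visit_id       INTEGER PRIMARY KEY,
    patient_key    INTEGER REFERENCES DimPatient(patient_key),
    department_key INTEGER REFERENCES DimDepartment(department_key),
    time_key       INTEGER REFERENCES DimTime(time_key),
    wait_minutes   REAL,
    discharged     INTEGER
);
""")

# Made-up sample rows, just enough to exercise the join.
conn.executemany("INSERT INTO DimDepartment VALUES (?, ?)",
                 [(1, "Emergency"), (2, "Cardiology")])
conn.executemany("INSERT INTO DimPatient VALUES (?, ?, ?)",
                 [(1, "Female", 34), (2, "Male", 61), (3, "Female", 47)])
conn.executemany("INSERT INTO DimTime VALUES (?, ?, ?)",
                 [(1, "2024-01-01", 9), (2, "2024-01-01", 14)])
conn.executemany("INSERT INTO FactPatientFlow VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, 1, 1, 1, 30.0, 1),
                  (2, 2, 1, 1, 50.0, 0),
                  (3, 3, 2, 2, 20.0, 1)])

# Typical star-schema query: aggregate the fact table, label it via a dimension.
rows = conn.execute("""
    SELECT d.department_name, COUNT(*) AS visits, AVG(f.wait_minutes) AS avg_wait
    FROM FactPatientFlow AS f
    JOIN DimDepartment AS d ON d.department_key = f.department_key
    GROUP BY d.department_name
    ORDER BY d.department_name
""").fetchall()

for name, visits, avg_wait in rows:
    print(f"{name}: {visits} visits, average wait {avg_wait:.1f} min")
```

The fact table holds only keys and measures, so a report that slices by department, patient attribute, or time needs just one join per dimension.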
- Created Event Hub namespace and patient-flow hub.
- Configured consumer groups for Databricks streaming.
- Developed the Python script patient_flow_generator.py to stream fake patient data (departments, wait time, discharge status) to Event Hub.
- Producer Code
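The simulator step above can be sketched roughly as follows. This is not the project's actual `patient_flow_generator.py`: the field names, the department list, and the 30% discharge probability are all assumptions chosen for illustration, and the Event Hub send is left out so the sketch runs without Azure credentials.

```python
import json
import random
import uuid
from datetime import datetime, timedelta, timezone

# Hypothetical department list; the real simulator may use different values.
DEPARTMENTS = ["Emergency", "Cardiology", "Pediatrics", "Surgery"]


def make_patient_event(rng, now=None):
    """Build one fake patient event as a plain dict (all fields are assumed)."""
    now = now or datetime.now(timezone.utc)
    admitted = now - timedelta(minutes=rng.randint(0, 600))
    discharged = rng.random() < 0.3  # assumed share of discharged patients
    return {
        "patient_id": str(uuid.UUID(int=rng.getrandbits(128))),
        "gender": rng.choice(["Male", "Female"]),
        "department": rng.choice(DEPARTMENTS),
        "admission_time": admitted.isoformat(),
        "wait_time_minutes": rng.randint(0, 180),
        "discharge_status": "discharged" if discharged else "admitted",
    }


def event_batch(n, seed=None):
    """Return n events serialized as JSON strings, ready to hand to a producer."""
    rng = random.Random(seed)
    return [json.dumps(make_patient_event(rng)) for _ in range(n)]


if __name__ == "__main__":
    for payload in event_batch(3, seed=42):
        print(payload)
```

In a real run, each JSON string would be passed to an Event Hub producer (for example the `azure-eventhub` package's `EventHubProducerClient`) instead of being printed.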
- Configured Azure Data Lake Storage (ADLS Gen2).
- Created three containers:
- bronze/ → Raw JSON from Event Hub
- silver/ → Clean and structured data
- gold/ → Transformed fact and dimension tables.
- Notebook 1: Reads Event Hub stream into Bronze.
- Notebook 2: Cleans and validates schema.
- Notebook 3: Aggregates and prepares star schema tables.
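The kind of rule the Silver notebook applies (type validation and null handling) can be illustrated in plain Python. The notebooks themselves are PySpark; this sketch swaps in ordinary functions so it runs without a Spark cluster, and the field names and fallback values (such as `"Unknown"` for a missing department) are assumptions rather than the project's actual rules.

```python
import json
from datetime import datetime


def clean_event(raw_json):
    """Return a cleaned record, or None when the event cannot be repaired."""
    try:
        event = json.loads(raw_json)
    except json.JSONDecodeError:
        return None  # malformed JSON is dropped

    # A record without a patient identifier cannot be joined later.
    if not isinstance(event, dict) or not event.get("patient_id"):
        return None

    # The admission timestamp must parse as ISO 8601.
    try:
        admitted = datetime.fromisoformat(event["admission_time"])
    except (KeyError, TypeError, ValueError):
        return None

    # Wait time is optional: coerce to int, fall back to None if invalid.
    try:
        wait = int(event.get("wait_time_minutes"))
    except (TypeError, ValueError):
        wait = None
    if wait is not None and wait < 0:
        wait = None

    return {
        "patient_id": str(event["patient_id"]),
        "department": str(event.get("department") or "Unknown").strip(),
        "admission_time": admitted,
        "wait_time_minutes": wait,
    }


if __name__ == "__main__":
    sample = ('{"patient_id": "p1", "department": " Emergency ", '
              '"admission_time": "2024-01-01T10:00:00", "wait_time_minutes": "15"}')
    print(clean_event(sample))
```

The split between "drop the row" and "keep the row with a null" is a design choice: fields needed for joins are mandatory, while measures may be missing.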
- Created dedicated SQL Pool.
- Executed schema and fact/dimension creation queries from:
- Version control with Git:
Once the data pipeline was established and a Star Schema implemented in Synapse SQL Pool, the next step was to build an interactive dashboard in Power BI.
- Connected Azure Synapse SQL Pool to Power BI using a direct SQL connection.
- Imported FactPatientFlow and Dimension tables.
- Established relationships for Star Schema-based reporting.
The Healthcare Patient Flow Dashboard provides insights into:
- Bed Occupancy Rate by department and gender.
- Patient Flow Trends (admissions, wait times).
- Department-Level KPIs (length of stay, total patients).
- Interactive Filters & Slicers for gender.
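As an illustration of how a metric like bed occupancy rate could be derived, the sketch below uses the common definition of occupied beds divided by available beds, per department. The README does not state the dashboard's exact formula, so this definition, the `bed_capacity` figures, and the record layout are all assumptions.

```python
from collections import Counter


def bed_occupancy_rate(active_patients, bed_capacity):
    """Return occupancy as a percentage per department.

    active_patients: iterable of dicts with a "department" key (assumed layout).
    bed_capacity: dict mapping department name to its number of beds (assumed).
    """
    occupied = Counter(p["department"] for p in active_patients)
    return {
        dept: round(100.0 * occupied.get(dept, 0) / beds, 1)
        for dept, beds in bed_capacity.items()
        if beds > 0  # skip departments without beds to avoid division by zero
    }


if __name__ == "__main__":
    patients = [{"department": "Emergency"}] * 3 + [{"department": "Cardiology"}]
    capacity = {"Emergency": 10, "Cardiology": 4, "Surgery": 5}
    print(bed_occupancy_rate(patients, capacity))
```

In Power BI the same ratio would typically be expressed as a measure over the fact table, sliced by the department and patient dimensions.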
- End-to-End Pipeline: From real-time ingestion → transformation → warehouse → analytics.
- Scalable Architecture: Easily adaptable for different hospital datasets.
- Business Insights: Hospital admins can monitor bed usage, patient flow, and department efficiency in real time.
- Technical Impact: Delivered actionable hospital insights using Power BI and demonstrated a complete Azure data engineering lifecycle in one project.
Author: Sahil Bodke
LinkedIn: Sahil Bodke
Contact: work.bodke@gmail.com

