SahiLmb/Real-time-Hospital-flow-Analytics

End-to-End Healthcare Data Engineering Pipeline

Azure PySpark Azure Data Factory Azure Synapse Python Databricks PowerBI Git


📌 Project Overview

This project showcases a real-time healthcare data engineering pipeline built on Microsoft Azure.
It simulates hospital operations such as patient admissions, transfers, and discharges, processes the live stream using Databricks, and delivers insights through Power BI dashboards connected to Synapse SQL Pool.

The solution bridges data engineering and data analytics, from ingestion to insight, so hospitals can track patient movement and bed utilization efficiently.

Pipeline

[Figure: Architecture diagram]


🎯 Objectives

  • Stream real-time hospital data via Azure Event Hub.
  • Build a multi-layer ETL pipeline (Bronze → Silver → Gold) in Azure Databricks.
  • Design a Star Schema in Synapse SQL Pool for analytical queries.
  • Visualize patient and department metrics in Power BI.
  • Maintain version control using Git.

📂 Project Structure

real-time-patient-flow-azure/
│
├── databricks-notebooks/  # Transformation notebooks
│   ├── 01_bronze_rawdata.py
│   ├── 02_silver_cleandata.py
│   └── 03_gold_transform.py
├── simulator/             # Data simulation scripts
│   └── patient_flow_generator.py
├── powerbi/               # Power BI report
│   └── Hospital_dashboard.pbix
├── sqlpool-queries/       # SQL scripts for Synapse
│   └── SQL_queries.sql
├── Architecure Diagram/   # Flowchart
│   └── Flowchart
└── README.md              # Project documentation

🛠️ Tools & Technologies

  • Azure Event Hub – Real-time data ingestion
  • Azure Databricks – PySpark-based data transformation
  • Azure Data Lake Gen2 – Layered data storage
  • Azure Synapse SQL Pool – Data warehouse for analytics
  • Power BI – Dashboard and visualization
  • Python 3.9+ – Core programming
  • Git – Version control

🏗️ Data Architecture

The pipeline follows a medallion architecture:

  • Bronze Layer: Raw JSON data from Event Hub stored in ADLS.
  • Silver Layer: Cleaned and structured data (validated types, null handling).
  • Gold Layer: Analytical models and star schema tables for BI.
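
As a rough illustration of what the Bronze → Silver step involves, the sketch below parses raw JSON events, validates types, and handles nulls in plain Python. The field names (`patient_id`, `department`, `admission_time`, `wait_time_min`) and the cleaning rules are assumptions made for this example, not the schema used in the notebooks, where the equivalent logic would be written in PySpark.

```python
import json
from datetime import datetime
from typing import Optional

# Hypothetical required fields; the real event schema may differ.
REQUIRED = ("patient_id", "department", "admission_time")


def to_silver(raw_line: str) -> Optional[dict]:
    """Turn one raw Bronze JSON line into a cleaned Silver record, or None if unusable."""
    try:
        event = json.loads(raw_line)
    except json.JSONDecodeError:
        return None  # malformed JSON is kept in Bronze only
    if not isinstance(event, dict):
        return None
    if any(event.get(field) in (None, "") for field in REQUIRED):
        return None  # drop records missing a required field

    try:
        admitted = datetime.fromisoformat(event["admission_time"])
    except (TypeError, ValueError):
        return None  # unparseable timestamp

    # Optional numeric field: coerce to int, default to 0 when null or invalid.
    try:
        wait = int(event.get("wait_time_min") or 0)
    except (TypeError, ValueError):
        wait = 0

    return {
        "patient_id": str(event["patient_id"]),
        "department": str(event["department"]).strip().title(),
        "admission_time": admitted,
        "wait_time_min": max(wait, 0),  # negative waits are treated as 0
    }


bronze = [
    '{"patient_id": 101, "department": " cardiology ", "admission_time": "2024-05-01T08:30:00", "wait_time_min": "15"}',
    '{"patient_id": 102, "department": null, "admission_time": "2024-05-01T09:00:00"}',
    'not json',
]
silver = [rec for rec in map(to_silver, bronze) if rec is not None]
print(len(silver), silver[0]["department"], silver[0]["wait_time_min"])
```

Only the first sample line survives: the second has a null department and the third is not valid JSON. Whether bad records should be dropped or routed to a quarantine location is a design choice this sketch does not settle.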

⭐ Star Schema Design

The Gold layer data in Synapse follows a star schema for optimized analytics:

  • Fact Table: FactPatientFlow (patient visits, timestamps, wait times, discharge)
  • Dimension Tables:
    • DimDepartment – Department details
    • DimPatient – Patient demographic info
    • DimTime – Date and time dimension
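
To make the fact/dimension relationship concrete, here is a minimal sketch that builds the four tables in an in-memory SQLite database and runs a typical star-schema query. SQLite is only a stand-in so the example runs locally; a Synapse dedicated SQL pool uses T-SQL with its own distribution and indexing options, which are not shown. All column names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical columns; only the table names come from the project.
conn.executescript("""
CREATE TABLE DimDepartment (department_key INTEGER PRIMARY KEY, department_name TEXT);
CREATE TABLE DimPatient    (patient_key INTEGER PRIMARY KEY, gender TEXT, age INTEGER);
CREATE TABLE DimTime       (time_key INTEGER PRIMARY KEY, full_date TEXT, hour INTEGER);
CREATE TABLE FactPatientFlow (
    flow_id        INTEGER PRIMARY KEY,
    patient_key    INTEGER REFERENCES DimPatient(patient_key),
    department_key INTEGER REFERENCES DimDepartment(department_key),
    time_key       INTEGER REFERENCES DimTime(time_key),
    wait_time_min  INTEGER,
    discharged     INTEGER  -- 1 = discharged, 0 = still admitted
);
""")

# Made-up sample rows for illustration.
conn.executemany("INSERT INTO DimDepartment VALUES (?, ?)", [(1, "Cardiology"), (2, "Emergency")])
conn.executemany("INSERT INTO DimPatient VALUES (?, ?, ?)", [(1, "F", 54), (2, "M", 37), (3, "F", 29)])
conn.executemany("INSERT INTO DimTime VALUES (?, ?, ?)", [(1, "2024-05-01", 8), (2, "2024-05-01", 9)])
conn.executemany(
    "INSERT INTO FactPatientFlow VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 1, 1, 1, 20, 1), (2, 2, 2, 1, 45, 0), (3, 3, 2, 2, 15, 1)],
)

# The fact table holds the measures; the dimension supplies the label to group by.
rows = conn.execute("""
    SELECT d.department_name, COUNT(*) AS visits, AVG(f.wait_time_min) AS avg_wait
    FROM FactPatientFlow AS f
    JOIN DimDepartment AS d ON d.department_key = f.department_key
    GROUP BY d.department_name
    ORDER BY d.department_name
""").fetchall()
print(rows)
```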

⚙️ Step-by-Step Implementation

1. Event Hub Setup

  • Created Event Hub namespace and patient-flow hub.
  • Configured consumer groups for Databricks streaming.

2. Data Simulation

  • Developed a Python script, patient_flow_generator.py, to stream fake patient data (departments, wait time, discharge status) to Event Hub.
  • Producer Code
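
A simulator along these lines might look like the sketch below. It is not the contents of patient_flow_generator.py: the event fields, the department list, and the `send_events` helper are assumptions for illustration. The sending part relies on the `azure-eventhub` package and needs a real connection string, so only the event generation runs here.

```python
import json
import random
import uuid
from datetime import datetime, timezone

DEPARTMENTS = ["Emergency", "Cardiology", "Pediatrics", "Oncology"]  # hypothetical list


def make_event(rng: random.Random) -> dict:
    """Build one fake patient-flow event (hypothetical field names)."""
    return {
        "patient_id": str(uuid.UUID(int=rng.getrandbits(128))),
        "department": rng.choice(DEPARTMENTS),
        "wait_time_min": rng.randint(0, 120),
        "discharged": rng.random() < 0.3,
        "event_time": datetime.now(timezone.utc).isoformat(),
    }


def send_events(conn_str: str, hub_name: str, count: int = 10) -> None:
    """Send `count` events in a single batch. Requires `pip install azure-eventhub`."""
    # Imported lazily so the generator above works without the Azure SDK installed.
    from azure.eventhub import EventData, EventHubProducerClient

    rng = random.Random()
    producer = EventHubProducerClient.from_connection_string(conn_str, eventhub_name=hub_name)
    with producer:
        batch = producer.create_batch()
        for _ in range(count):
            batch.add(EventData(json.dumps(make_event(rng))))
        producer.send_batch(batch)


# Generate one sample event locally without contacting Azure.
sample = make_event(random.Random(42))
print(json.dumps(sample, indent=2))
```

A continuous stream would call `send_events` in a loop with a short sleep between batches; the batch size and interval are tuning choices rather than anything fixed by Event Hub.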

3. Storage Setup

  • Configured Azure Data Lake Storage (ADLS Gen2).
  • Created three containers:
    • bronze/ → Raw JSON from Event Hub
    • silver/ → Clean and structured data
    • gold/ → Transformed fact and dimension tables
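
When reading or writing these containers from Databricks, each layer is usually addressed by an `abfss://` URI. The helper below is a small sketch of that convention; the storage account name `mystorageaccount` and the `patient_flow` dataset folder are placeholders, not values from this project.

```python
STORAGE_ACCOUNT = "mystorageaccount"  # hypothetical account name
LAYERS = ("bronze", "silver", "gold")


def layer_path(layer: str, dataset: str = "") -> str:
    """Return the abfss:// URI for a medallion container, optionally with a sub-path."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    base = f"abfss://{layer}@{STORAGE_ACCOUNT}.dfs.core.windows.net/"
    return base + dataset.strip("/")


print(layer_path("bronze", "patient_flow"))
```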

4. Databricks Processing


5. Synapse SQL Pool

  • Created dedicated SQL Pool.
  • Executed the schema and fact/dimension creation queries from the sqlpool-queries/ folder.

6. Version Control

  • Tracked notebooks, scripts, and SQL queries with Git.

📊 Data Analytics

Once the data pipeline was established and a Star Schema implemented in Synapse SQL Pool, the next step was to build an interactive dashboard in Power BI.

🔗 Synapse → Power BI Connection

  • Connected Azure Synapse SQL Pool to Power BI using a direct SQL connection.
  • Imported FactPatientFlow and Dimension tables.
  • Established relationships for Star Schema-based reporting.

📈 Dashboard Features

The Healthcare Patient Flow Dashboard provides insights into:

  • Bed Occupancy Rate by department and gender.
  • Patient Flow Trends (admissions, wait times).
  • Department-Level KPIs (length of stay, total patients).
  • Interactive filters and slicers for gender.
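
For reference, two of these KPIs can be worked out as in the sketch below. It uses the common definitions (occupied beds divided by total beds, and mean time between admission and discharge); the actual measures in the Power BI report may be defined differently, and the function names and sample figures here are made up.

```python
from datetime import datetime


def occupancy_rate(occupied_beds: int, total_beds: int) -> float:
    """Percentage of beds in use; returns 0.0 when no beds are configured."""
    if total_beds <= 0:
        return 0.0
    return round(100.0 * occupied_beds / total_beds, 1)


def avg_length_of_stay_hours(stays: list[tuple[str, str]]) -> float:
    """Mean hours between admission and discharge for completed stays (ISO timestamps)."""
    if not stays:
        return 0.0
    total_seconds = sum(
        (datetime.fromisoformat(out) - datetime.fromisoformat(adm)).total_seconds()
        for adm, out in stays
    )
    return round(total_seconds / len(stays) / 3600, 1)


# 18 of 24 beds occupied.
print(occupancy_rate(18, 24))

# One 12-hour stay and one 24-hour stay.
print(avg_length_of_stay_hours([
    ("2024-05-01T08:00:00", "2024-05-01T20:00:00"),
    ("2024-05-01T10:00:00", "2024-05-02T10:00:00"),
]))
```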


✅ Key Outcomes

  • End-to-End Pipeline: From real-time ingestion → transformation → warehouse → analytics.
  • Scalable Architecture: Easily adaptable for different hospital datasets.
  • Business Insights: Hospital admins can monitor bed usage, patient flow, and department efficiency in real time.
  • Technical Impact: Demonstrated a complete Azure data engineering lifecycle in one project, delivering hospital insights through Power BI.

Author: Sahil Bodke

LinkedIn: Sahil Bodke

Contact: work.bodke@gmail.com
