This project simulates how a hospital might use data engineering and ETL pipelines to manage cost forecasts, explore coefficients through data science, and handle department queries.
Objective: Forecast hospital resource demand (beds, ICU units, staff) and patient cost using historical data. Provide analytics dashboards and ML-driven insights for hospital administrators.
┌──────────────┐        ┌────────────────┐
│ Data Gen /   │ ─────▶ │ Raw GCS Bucket │
│ Ingestion    │        └────────────────┘
└────┬─────────┘
     ▼
┌─────────────────┐        ┌──────────────┐
│ Kafka / PubSub  │ ─────▶ │ BigQuery     │ ◀──────┐
└─────────────────┘        └────┬─────────┘        │
                                ▼                  ▼
                           ┌──────────────┐   ┌────────────┐
                           │ dbt          │   │ Jupyter DS │
                           │ Transform    │   │ Notebooks  │
                           └────┬─────────┘   └────┬───────┘
                                ▼                  ▼
                      ┌─────────────────┐    ┌──────────────────┐
                      │ Dash / Airflow  │ ◀▶ │ REST API (Flask) │
                      │ Admin Dashboard │    └──────────────────┘
                      └─────────────────┘
Data Simulation
This project simulates the following datasets:
- Patient Admission Events (Daily)
  | patient_id | admit_date | dept | severity | diagnosis | age | insurance_type |
- Procedures / Billing Records
  | procedure_id | patient_id | procedure | cost | performed_by | date |
- Staffing Schedule
  | staff_id | role | department | shift_date | hours |
- Bed / ICU Occupancy
  | bed_id | dept | patient_id | start_time | end_time | is_ICU |
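As a rough illustration, the four column lists above could be pinned down as Python dataclasses. The class names and every field type here are assumptions chosen for the sketch (the README only names the columns), so treat this as one possible shape rather than the project's actual schema.

```python
import datetime as dt
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Admission:
    patient_id: str
    admit_date: dt.date
    dept: str
    severity: str          # assumed to be a categorical label
    diagnosis: str
    age: int
    insurance_type: str


@dataclass
class Procedure:
    procedure_id: str
    patient_id: str
    procedure: str
    cost: float            # currency and units are not stated in the README
    performed_by: str      # assumed to reference a staff_id
    date: dt.date


@dataclass
class StaffShift:
    staff_id: str
    role: str
    department: str
    shift_date: dt.date
    hours: float


@dataclass
class BedOccupancy:
    bed_id: str
    dept: str
    patient_id: str
    start_time: dt.datetime
    end_time: Optional[dt.datetime]  # assumed empty while the bed is still occupied
    is_ICU: bool


def column_names(record_type):
    """Return the column names of a record type, in declaration order."""
    return [f.name for f in fields(record_type)]
```

Calling `column_names(Admission)` gives the header row for the admissions file, which keeps the CSV layout and the in-code record definition in one place.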
A demo of this project will be hosted in the future.
- Access Apache Airflow
The Admin Portal runs through Airflow. Log in with User: admin, Password: admin.
- Verify DAGs
Once logged in, you should see your DAGs under the DAGs tab.
- Trigger a DAG
From the Airflow UI, click on a DAG and manually trigger it to start the workflow.
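Besides the UI, a DAG run can usually be started over HTTP. The sketch below builds a request for the Airflow 2.x stable REST API (`POST /api/v1/dags/{dag_id}/dagRuns`). It assumes the webserver is reachable at `http://localhost:8080`, that the basic-auth API backend is enabled, and that the admin/admin login above also works for the API; none of these are confirmed by this README, and other Airflow versions may expose a different path.

```python
import base64
import json
import urllib.request


def build_trigger_request(dag_id, base_url="http://localhost:8080",
                          user="admin", password="admin", conf=None):
    """Build (but do not send) a POST request that starts one DAG run."""
    url = f"{base_url.rstrip('/')}/api/v1/dags/{dag_id}/dagRuns"
    body = json.dumps({"conf": conf or {}}).encode("utf-8")
    # HTTP basic auth header; assumes the basic-auth API backend is enabled.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


def trigger_dag(dag_id, **kwargs):
    """Send the request and return the decoded JSON response from Airflow."""
    with urllib.request.urlopen(build_trigger_request(dag_id, **kwargs)) as resp:
        return json.load(resp)
```

For example, `trigger_dag("generate_admission_dag")` would request a run of the admissions generator, provided the assumptions above hold for your deployment.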
- generate_admission_dag
Generates 1000 simulated patient admissions. Running this DAG creates a .csv of this admission data every 24 hours.
Randomized data is generated for the following patient fields: patient_id, admit_date, dept, severity, diagnosis, age, insurance_type.
Daily files are uploaded to a GCS bucket.
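The generation step described above might look roughly like the following, using only the standard library. The category values, the `patient_id` format, the age range, and the output file name are all hypothetical placeholders, since the README does not list them; the GCS upload is left out of this sketch.

```python
import csv
import random
from datetime import date
from pathlib import Path

# Hypothetical category values; the project's real lists are not shown here.
DEPARTMENTS = ["ER", "Cardiology", "Oncology", "Pediatrics"]
SEVERITIES = ["low", "medium", "high"]
DIAGNOSES = ["fracture", "pneumonia", "chest pain", "infection"]
INSURANCE_TYPES = ["private", "public", "uninsured"]

FIELDS = ["patient_id", "admit_date", "dept", "severity",
          "diagnosis", "age", "insurance_type"]


def generate_admissions(n, admit_date, seed=None):
    """Return n randomized admission rows for a single day."""
    rng = random.Random(seed)  # a seed makes a run reproducible
    rows = []
    for i in range(n):
        rows.append({
            # Assumed ID format: date plus a zero-padded counter.
            "patient_id": f"P{admit_date:%Y%m%d}-{i:04d}",
            "admit_date": admit_date.isoformat(),
            "dept": rng.choice(DEPARTMENTS),
            "severity": rng.choice(SEVERITIES),
            "diagnosis": rng.choice(DIAGNOSES),
            "age": rng.randint(0, 100),  # assumed age range
            "insurance_type": rng.choice(INSURANCE_TYPES),
        })
    return rows


def write_admissions_csv(rows, out_dir, admit_date):
    """Write the rows to a dated .csv file and return its path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"admissions_{admit_date.isoformat()}.csv"
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return path
```

A daily task could then call `write_admissions_csv(generate_admissions(1000, date.today()), "data/admissions", date.today())` before handing the file to the upload step.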
-
Receives .csv files from Google Cloud Storage.
Files output by generate_admission_dag are meant to be held in GCS to simulate reports sent by an admissions department.
This DAG downloads those files from the GCS bucket and stores them locally in /opt/airflow/data/admissions/.
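A minimal sketch of the download step, assuming the `google-cloud-storage` client library and default application credentials. The function names, the `prefix` argument, and the `.csv` filter are illustrative choices; the bucket name is not given in this README and has to be supplied by the caller.

```python
from pathlib import Path


def download_admission_files(client, bucket_name, prefix="",
                             dest_dir="/opt/airflow/data/admissions/"):
    """Download every .csv object under prefix into dest_dir.

    client is expected to behave like google.cloud.storage.Client.
    Returns the list of local paths that were written.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    downloaded = []
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        if not blob.name.endswith(".csv"):
            continue  # skip folder placeholders and non-CSV objects
        # Keep only the file name, dropping any "folder" part of the object key.
        target = dest / Path(blob.name).name
        blob.download_to_filename(str(target))
        downloaded.append(target)
    return downloaded


def make_client():
    """Create a real GCS client; imported lazily so the module loads without it."""
    from google.cloud import storage
    return storage.Client()
```

Inside an Airflow task this could be called as `download_admission_files(make_client(), bucket_name)`, where `bucket_name` is whatever bucket the generator DAG uploads to. Passing the client in as an argument also makes the function easy to exercise with a stub in tests.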


