This project is part of the Healthcare Analytics course, where we explore hospital inpatient data during the COVID-19 pandemic. The analysis includes preprocessing datasets, building predictive models to estimate hospital length of stay (HLOS), and conducting trials using both MLP and RNN models.
The project consists of three key datasets:
- Data-at-admission: Contains patient details and measurements taken at the time of admission.
- Days-breakdown: Provides time-series data with daily vital signs and blood work results.
- Hospital-length-of-stay: Tracks the length of stay in the hospital for each patient.
The goal is to predict the hospital length of stay (HLOS) for inpatients using the available features.
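Since Days-breakdown stores one row per patient per day, a recurrent model needs it reshaped into a 3-D array of shape (patients, days, features). Below is a minimal sketch with pandas and NumPy that zero-pads shorter stays. The helper `to_padded_sequences`, the column names `patient_id`, `day`, `heart_rate`, and `crp`, and the toy values are all hypothetical; the sheet's real column names are not listed in this README.

```python
import numpy as np
import pandas as pd


def to_padded_sequences(days: pd.DataFrame, id_col: str, day_col: str,
                        feature_cols: list[str]) -> tuple[np.ndarray, list]:
    """Hypothetical helper: stack per-day rows into a zero-padded (patients, days, features) array."""
    # Sort so each patient's rows are in chronological order before grouping.
    ordered = days.sort_values([id_col, day_col])
    groups = [(pid, g[feature_cols].to_numpy(dtype=float))
              for pid, g in ordered.groupby(id_col, sort=True)]
    max_days = max(seq.shape[0] for _, seq in groups)
    out = np.zeros((len(groups), max_days, len(feature_cols)))
    for i, (_, seq) in enumerate(groups):
        # Real days fill the front; trailing time steps stay 0 as padding.
        out[i, : seq.shape[0], :] = seq
    return out, [pid for pid, _ in groups]


# Toy example with hypothetical column names (not the real sheet's schema).
days = pd.DataFrame({
    "patient_id": [1, 1, 1, 2],
    "day": [1, 2, 3, 1],
    "heart_rate": [88, 92, 85, 101],
    "crp": [12.0, 9.5, 7.1, 30.2],
})
X_seq, ids = to_padded_sequences(days, "patient_id", "day", ["heart_rate", "crp"])
print(X_seq.shape)  # (2, 3, 2)
```

Keras's `Masking` layer skips a time step only when every feature at that step equals `mask_value`, so zero padding is usually safe; a sentinel value outside the data range removes any remaining ambiguity.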
Data Preprocessing:
- Dropped irrelevant columns and handled missing values using median imputation and KNN imputation.
- Converted categorical variables (e.g., comorbidities, intubation status) into numerical form using encoding techniques like MultiLabelBinarizer and LabelEncoder.
- Standardized numerical features using StandardScaler.
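The preprocessing steps above can be sketched with scikit-learn as follows. The toy DataFrame, its column names (`age`, `oxygen_sat`, `comorbidities`, `intubated`), and the choice of which column receives median versus KNN imputation are assumptions for illustration, not the notebook's actual code.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.preprocessing import LabelEncoder, MultiLabelBinarizer, StandardScaler

# Hypothetical admission data: column names and values are illustrative only.
adm = pd.DataFrame({
    "age": [64, 71, np.nan, 55],
    "oxygen_sat": [91.0, np.nan, 88.0, 95.0],
    "comorbidities": [["diabetes", "copd"], [], ["diabetes"], ["hypertension"]],
    "intubated": ["no", "yes", "no", "no"],
})
numeric_cols = ["age", "oxygen_sat"]

# Median imputation for age, then KNN imputation for oxygen_sat (age acts as the distance feature).
adm[["age"]] = SimpleImputer(strategy="median").fit_transform(adm[["age"]])
adm[numeric_cols] = KNNImputer(n_neighbors=2).fit_transform(adm[numeric_cols])

# One binary column per comorbidity, since a patient can have several.
mlb = MultiLabelBinarizer()
comorb = pd.DataFrame(
    mlb.fit_transform(adm["comorbidities"]),
    columns=[f"comorb_{c}" for c in mlb.classes_],
    index=adm.index,
)

# Encode intubation status as integers in alphabetical order ("no" -> 0, "yes" -> 1).
adm["intubated"] = LabelEncoder().fit_transform(adm["intubated"])

# Standardize numeric features to zero mean and unit variance.
adm[numeric_cols] = StandardScaler().fit_transform(adm[numeric_cols])

X = pd.concat([adm.drop(columns="comorbidities"), comorb], axis=1)
print(X.columns.tolist())
```

In a real pipeline the imputers and the scaler should be fit on the training split only and then applied to the test split, so test-set statistics do not leak into training. scikit-learn documents `LabelEncoder` for target labels; `OrdinalEncoder` is its feature-side counterpart.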
Exploratory Data Analysis (EDA):
- Performed summary statistics and visualized important features.
- Identified key trends and patterns in the dataset.
Modeling:
- Multilayer Perceptron (MLP): Built an MLP model using MLPRegressor for HLOS prediction.
- Recurrent Neural Network (RNN): Built an RNN model using SimpleRNN and LSTM layers to capture the sequential nature of the data.
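A minimal `MLPRegressor` sketch, trained on synthetic data standing in for the preprocessed features and the HLOS target. The layer sizes, `batch_size`, and `max_iter` are illustrative values, not the hyperparameters chosen in the notebook.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data: 200 "patients", 6 standardized features, HLOS in days.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 5 + 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Two ReLU hidden layers; sizes and iteration budget are illustrative assumptions.
mlp = MLPRegressor(hidden_layer_sizes=(64, 32), activation="relu",
                   batch_size=32, max_iter=1000, random_state=0)
mlp.fit(X_train, y_train)

mse = mean_squared_error(y_test, mlp.predict(X_test))
print(f"test MSE: {mse:.3f}")
```

A useful sanity check is that the test MSE falls well below the variance of `y_test`, which is the MSE of always predicting the mean stay.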
Optimization:
- Ran multiple trials to find the best hyperparameters for both MLP and RNN models.
- Evaluated model performance using Mean Squared Error (MSE) and R-squared metrics.
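The README does not describe the search procedure. The sketch below assumes a simple loop over a hand-picked list of `hidden_layer_sizes` values, repeating each configuration over 5 seeded train/test splits (matching the 5 trials reported in the results) and averaging MSE and R-squared. The helper `run_trials`, the candidate layer sizes, and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor


def run_trials(X, y, hidden_layer_sizes, n_trials=5):
    """Hypothetical helper: fit one MLP config over n_trials seeded splits; return mean MSE and mean R2."""
    mses, r2s = [], []
    for seed in range(n_trials):
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
        model = MLPRegressor(hidden_layer_sizes=hidden_layer_sizes, batch_size=32,
                             max_iter=500, random_state=seed)
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        mses.append(mean_squared_error(y_te, pred))
        r2s.append(r2_score(y_te, pred))
    return float(np.mean(mses)), float(np.mean(r2s))


# Synthetic stand-in data (not the hospital dataset).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = 5 + 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=200)

# Compare candidate architectures by mean MSE across trials.
results = {h: run_trials(X, y, h) for h in [(16,), (64, 32)]}
best = min(results, key=lambda h: results[h][0])
for h, (mse, r2) in results.items():
    print(h, f"mean MSE={mse:.3f}", f"mean R2={r2:.3f}")
print("best by mean MSE:", best)
```

scikit-learn's `GridSearchCV` is an alternative that uses k-fold cross-validation instead of repeated random splits.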
To run the project, you need to install the following dependencies:
- Python 3.x
- Required libraries:
pip install pandas matplotlib seaborn scikit-learn tensorflow openpyxl notebook
The project includes the following files:
- Canada_Hosp1_COVID_InpatientData.xlsx: The dataset, containing three sheets: Data-at-admission, Days-breakdown, and Hospital-length-of-stay.
- healthcare_analytics_project.ipynb: The Jupyter Notebook containing the full implementation.
- README.md: This file, describing the project structure and how to run the analysis.
- Clone the repository:
git clone <repository-url>
- Navigate to the project directory:
cd healthcare-analytics-project
- Open the Jupyter notebook:
jupyter notebook healthcare_analytics_project.ipynb
The project evaluates the following models:
- MLP: Achieved an average MSE of X over 5 trials.
- RNN (LSTM): Achieved an average MSE of Y over 5 trials.
Both models are compared based on their predictive accuracy and computational performance.
Possible future work:
- Experiment with additional models such as GRU or Transformer-based architectures.
- Refine hyperparameters for better performance.
- Explore more advanced feature engineering techniques.
This project is licensed under the MIT License.