MedRisk is a machine learning-based clinical decision support system that predicts the risk of a patient being readmitted to the hospital within 30 days of discharge. The goal is to help healthcare providers identify high-risk patients early and take preventive care actions.
Hospital readmissions are costly and often preventable. Identifying high-risk patients before discharge allows hospitals to:
- Provide additional monitoring
- Schedule early follow-ups
- Improve quality of care
- Reduce healthcare costs
This project uses historical patient data to build a predictive model that estimates readmission risk.
The objective is to develop a machine learning model that:
- Analyzes patient demographics, medical history, and hospital stay details
- Predicts the likelihood of 30-day readmission
- Provides interpretable results to support clinical decision-making
This is a binary classification problem:
- 0: No readmission
- 1: Readmission within 30 days
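As a minimal sketch of how this label could be derived from raw records, the function below compares a discharge date with the next admission date. The function name, its parameters, and the choice to count a return on exactly day 30 as a readmission are assumptions for illustration; the project's actual labeling rule is not documented here.

```python
from datetime import date
from typing import Optional


def readmission_label(discharge: date,
                      next_admission: Optional[date],
                      window_days: int = 30) -> int:
    """Return 1 if the patient was readmitted within the window, else 0.

    Hypothetical helper: assumes one discharge date per stay and the date
    of the next admission (None if the patient never returned).
    """
    if next_admission is None:
        return 0
    days_until_return = (next_admission - discharge).days
    # Assumption: day 30 itself still counts as "within 30 days".
    return int(0 <= days_until_return <= window_days)


label_early = readmission_label(date(2024, 1, 1), date(2024, 1, 20))
label_late = readmission_label(date(2024, 1, 1), date(2024, 3, 1))
label_none = readmission_label(date(2024, 1, 1), None)
```

Here `label_early` is 1, while `label_late` and `label_none` are 0.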
- Data preprocessing and feature engineering
- Handling missing and categorical data
- Imbalanced data handling using SMOTE
- Model training using:
  - Logistic Regression
  - Random Forest
  - XGBoost (final model)
- Model evaluation using:
  - Precision, Recall, F1-score
  - ROC-AUC score
- Model interpretability using feature importance / SHAP
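The preprocessing and training steps listed above could be wired together roughly as follows. This is a sketch under stated assumptions, not the project's code: the toy records, the column names (`age`, `length_of_stay`, `admission_type`, `readmitted`), and the imputation strategies are all hypothetical. Logistic Regression stands in as the simplest of the listed models.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy records; real column names will differ.
records = pd.DataFrame({
    "age": [71, 45, np.nan, 63, 58, 80, 39, 67],
    "length_of_stay": [9, 2, 4, np.nan, 3, 12, 1, 7],
    "admission_type": ["emergency", "elective", "emergency", np.nan,
                       "elective", "emergency", "elective", "emergency"],
    "readmitted": [1, 0, 0, 1, 0, 1, 0, 1],
})

numeric_cols = ["age", "length_of_stay"]
categorical_cols = ["admission_type"]

# Numeric columns: fill gaps with the median, then standardize.
# Categorical columns: fill gaps with the most frequent value, then one-hot encode.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", LogisticRegression(max_iter=1000)),
])

X = records[numeric_cols + categorical_cols]
y = records["readmitted"]
model.fit(X, y)

# Estimated probability of class 1 (readmission within 30 days) per patient.
risk = model.predict_proba(X)[:, 1]
```

Keeping imputation and encoding inside the pipeline means the same transformations are applied at training and prediction time.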
The dataset contains historical hospital records with features such as:
- Patient age and gender
- Admission type and length of stay
- Number of lab tests and diagnoses
- Medication details
- Previous hospital visits
Note: This project uses publicly available healthcare-style datasets for educational and research purposes.
- Python: Core programming language
- Pandas & NumPy: Data manipulation
- Scikit-learn: ML models and preprocessing
- XGBoost: Advanced gradient boosting model
- Imbalanced-learn (SMOTE): Handling class imbalance
- SHAP / Feature Importance: Model explainability
- Streamlit / Flask: Web app interface (optional)
Because readmitted patients are a minority class, accuracy alone is not reliable: a model that predicts no readmission for every patient can still score high accuracy. The following metrics are used:
- Recall: To identify as many high-risk patients as possible
- Precision: To avoid unnecessary false alarms
- F1 Score: Balance between precision and recall
- ROC-AUC: Overall model discrimination ability
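To make these metrics concrete, here is a small dependency-free sketch that computes them from their definitions. The labels, scores, and the 0.5 threshold are made-up illustration data, not project results. In practice the equivalent functions in `sklearn.metrics` would normally be used.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 8 non-readmitted patients, 2 readmitted.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.6, 0.25, 0.35, 0.7, 0.4]
y_pred = [int(s >= 0.5) for s in scores]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)

# A model that never flags anyone: 80% accurate here, yet recall is zero.
always_negative = [0] * len(y_true)
baseline_accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
baseline_recall = precision_recall_f1(y_true, always_negative)[1]
```

On this toy data, precision, recall, and F1 are each 0.5 and ROC-AUC is 0.9375. The always-negative baseline reaches 0.8 accuracy with a recall of 0.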
- Load and preprocess hospital dataset
- Handle missing values and encode categorical features
- Balance data using SMOTE
- Train multiple ML models
- Select best-performing model (XGBoost)
- Evaluate performance using medically relevant metrics
- Interpret predictions using SHAP/feature importance
- Deploy as a simple web application for risk prediction
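The balancing step in the workflow above relies on SMOTE, which creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below illustrates that core idea in plain Python. It is a simplified illustration with a hypothetical function name and toy feature vectors, not the imbalanced-learn implementation the project uses.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from minority-class feature vectors.

    Simplified SMOTE-style sketch: pick a minority sample, pick one of its
    k nearest minority neighbours (Euclidean distance), and place a new
    point at a random position on the line segment between them.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class rows: [age, length_of_stay]
readmitted = [[71.0, 9.0], [63.0, 6.0], [80.0, 12.0], [67.0, 7.0]]
new_rows = smote_like_oversample(readmitted, n_new=5)
```

Oversampling should be applied to the training split only. Synthetic points in the validation or test data would distort the evaluation metrics.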
- Use real-time hospital EHR data
- Add lab result trends and vital signs
- Integrate with hospital management systems
- Perform external validation across multiple hospitals
This project is for educational and research purposes only. It is not intended to replace professional medical judgment. The system is designed as a decision-support tool to assist healthcare providers.
Developed as a machine learning healthcare analytics project to demonstrate real-world ML problem solving, model interpretability, and responsible AI usage.