Healthcare Outcome Prediction Using Logistic Regression

Project Overview

This project focuses on predicting a patient’s health outcome using medical diagnostic data. The goal is to explore how machine learning can assist in identifying patterns related to disease risk, while remaining transparent, interpretable, and aligned with academic best practices.

Rather than building a complex or black-box system, this project intentionally uses a simple and explainable classification model to understand how medical features relate to health outcomes.

Project Goals

Apply machine learning to a real healthcare-related dataset
Practice structured data cleaning and preprocessing
Use logistic regression for binary classification
Evaluate model performance using clear and interpretable metrics
Understand the limitations of predictive models in healthcare contexts

Dataset Description

The dataset contains medical and demographic information collected from patients. Each row represents one patient, and the target variable indicates whether the patient tested positive or negative for a specific medical condition.

Typical features include:

Glucose levels
Blood pressure
Body mass index (BMI)
Age
Other physiological measurements

The target variable is binary:

Positive diagnosis
Negative diagnosis

This dataset is well-suited for introductory healthcare classification tasks due to its structured format and clear outcome variable.

Data Preparation and Cleaning

Before modeling, the dataset was carefully prepared:

Invalid zero values in medical measurements (for example, a blood pressure or BMI of zero, which is physiologically impossible) were identified and treated as missing data
Missing values were replaced using column mean imputation
Features were verified for consistency and correctness
The dataset was split into training and testing sets

These steps help ensure that the model is trained on realistic medical values rather than placeholder zeros.
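The cleaning steps can be sketched with NumPy alone. This is a minimal illustration, not the notebook's code: the function name `clean_and_split`, the `zero_invalid_cols` parameter, and the 80/20 split are assumptions. The list above does not say whether the column means were computed before or after the split. In this sketch they are computed on the training rows only, a common choice that keeps test-set information out of training.

```python
import numpy as np


def clean_and_split(X, y, zero_invalid_cols, test_frac=0.2, seed=0):
    """Hypothetical helper: treat impossible zeros as missing, split, then mean-impute.

    X: 2-D array, one row per patient, one column per feature.
    y: 1-D array of 0/1 outcomes.
    zero_invalid_cols: indices of columns where 0 is not a valid measurement
        (assumed, e.g. glucose, blood pressure, BMI).
    """
    X = np.array(X, dtype=float)  # copy so the caller's data is untouched
    y = np.asarray(y)

    # 1. Impossible zeros become NaN so they count as missing, not as real readings.
    for col in zero_invalid_cols:
        X[X[:, col] == 0, col] = np.nan

    # 2. Shuffle row indices and split into training and test sets.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_frac))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    X_train, X_test = X[train_idx], X[test_idx]

    # 3. Column means from the training rows only (NaNs ignored),
    #    used to fill the gaps in both sets.
    col_means = np.nanmean(X_train, axis=0)
    X_train = np.where(np.isnan(X_train), col_means, X_train)
    X_test = np.where(np.isnan(X_test), col_means, X_test)

    return X_train, X_test, y[train_idx], y[test_idx]
```

Calling `clean_and_split(X, y, zero_invalid_cols=[0, 1, 2])` on a feature matrix returns arrays with no NaNs and no impossible zeros left in the listed columns. scikit-learn's `SimpleImputer(strategy="mean")` performs the same fill-with-training-mean step when fitted on the training set.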

Model Used

Logistic Regression

Logistic Regression was selected because:

It is widely used in healthcare analytics
It produces interpretable probability-based outputs
It aligns with coursework concepts
It avoids unnecessary model complexity
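For reference, the standard logistic regression model is written below. This shows why its outputs are probabilities and why its coefficients are interpretable. The notation is introduced here and does not come from the notebook: x is a patient's feature vector, w the learned coefficients, and b the intercept.

```latex
p(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b) = \frac{1}{1 + e^{-(\mathbf{w}^\top \mathbf{x} + b)}}

\log \frac{p(y = 1 \mid \mathbf{x})}{1 - p(y = 1 \mid \mathbf{x})} = \mathbf{w}^\top \mathbf{x} + b = b + \sum_{j} w_j x_j
```

The second line is what makes the model interpretable. Holding the other features fixed, a one-unit increase in feature x_j adds w_j to the log-odds, which multiplies the odds of a positive outcome by e^{w_j}.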

The focus of this project is understanding how the model works, not maximizing performance at all costs.
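To show what "how the model works" involves, here is a NumPy-only sketch that fits logistic regression by batch gradient descent on the mean log-loss. It is an illustration, not the notebook's implementation. The function names, learning rate, and iteration count are assumptions. In practice a library implementation such as scikit-learn's `LogisticRegression` would normally be used; it has its own solvers and applies L2 regularisation by default.

```python
import numpy as np


def fit_logistic_regression(X, y, lr=0.1, n_iter=2000):
    """Fit weights w and intercept b by batch gradient descent (no regularisation)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardise features so one learning rate suits every column;
    # mu and sigma are returned so new data is scaled the same way.
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant columns
    Z = (X - mu) / sigma

    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))  # predicted P(y = 1)
        error = p - y                            # d(log-loss)/d(logit) per row
        w -= lr * (Z.T @ error) / len(y)
        b -= lr * error.mean()
    return w, b, mu, sigma


def predict_proba(X, w, b, mu, sigma):
    """Probability of a positive outcome for each row of X."""
    Z = (np.asarray(X, dtype=float) - mu) / sigma
    return 1.0 / (1.0 + np.exp(-(Z @ w + b)))


def predict(X, w, b, mu, sigma, threshold=0.5):
    """Hard 0/1 labels; the 0.5 threshold is a default, not a clinical choice."""
    return (predict_proba(X, w, b, mu, sigma) >= threshold).astype(int)
```

Because the features are standardised, each entry of `w` is the change in log-odds per one standard deviation of that feature. This puts the coefficients on a comparable scale when reading them as rough indicators of feature influence.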

Model Evaluation

The model was evaluated using:

Accuracy – to measure overall prediction correctness
Confusion Matrix – to examine true positives, false positives, true negatives, and false negatives

The confusion matrix is especially important in healthcare because accuracy alone treats all errors alike. A false negative (a missed positive diagnosis) can have very different real-world consequences from a false positive.
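Both metrics come from the same four counts. The following standard-library sketch assumes outcomes are encoded as 1 for a positive diagnosis and 0 for a negative one; this encoding is an assumption, not taken from the notebook. The function names are illustrative. The 2×2 layout (rows are actual classes, columns are predicted classes, negative first) matches the convention of scikit-learn's `confusion_matrix` for labels 0 and 1.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels (1 = positive, 0 = negative)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if p == 1 and t == 1:
            tp += 1      # correctly flagged positive
        elif p == 1 and t == 0:
            fp += 1      # false alarm
        elif p == 0 and t == 0:
            tn += 1      # correctly cleared negative
        else:
            fn += 1      # missed positive
    return tp, fp, tn, fn


def confusion_matrix_2x2(y_true, y_pred):
    """Rows = actual (0, 1), columns = predicted (0, 1)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return [[tn, fp], [fn, tp]]


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)
```

With the made-up labels `y_true = [1, 0, 1, 1, 0, 0, 1, 0]` and `y_pred = [1, 0, 0, 1, 0, 1, 1, 0]`, accuracy is 0.75, yet the matrix `[[3, 1], [1, 3]]` shows that one of the four actual positives was missed. Accuracy alone hides this kind of error.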

Key Insights

Certain medical features show strong relationships with patient outcomes
Logistic regression's coefficients and predicted probabilities make its classification behavior easy to inspect
The confusion matrix highlights where the model succeeds and fails
Even simple models can provide meaningful signals when data is cleaned properly

Limitations

This model is trained on historical data and does not replace medical diagnosis
The dataset lacks contextual factors such as lifestyle or genetics
Accuracy alone is insufficient for real healthcare deployment
Ethical and clinical validation would be required in real-world use

This project is strictly educational and exploratory.

Future Improvements

Possible extensions include:

Adding precision, recall, and F1-score analysis
Comparing logistic regression with other classifiers
Exploring feature importance more deeply
Using cross-validation for more robust evaluation
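The first and last of these extensions can be sketched with the standard library. Precision, recall, and F1 are computed from the same confusion-matrix counts used for accuracy. The k-fold generator produces index splits, so the model can be refitted and scored on each fold. Function names and defaults are illustrative assumptions. With imbalanced outcomes, stratified folds (for example scikit-learn's `StratifiedKFold`) are often preferred over the plain shuffled folds shown here.

```python
import random


def precision_recall_f1(tp, fp, fn):
    """Precision, recall (sensitivity) and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # of flagged patients, share truly positive
    recall = tp / (tp + fn) if tp + fn else 0.0     # of truly positive patients, share flagged
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)           # harmonic mean of the two
    return precision, recall, f1


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) lists for plain shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size
```

For cross-validation, any preprocessing that learns from data, such as mean imputation, is refitted on each training fold. The scores from the k test folds are then averaged, and their spread gives a rough sense of how stable the estimate is.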

Author

Isaac Wanlemvo
Software Engineering & AI Student
