An end-to-end ML-powered web application that predicts risk for multiple diseases in real time.
- About the Project
- Supported Diseases
- Features
- Tech Stack
- How It Works
- Getting Started
- Usage
- Project Structure
- API Endpoints
- Model Performance
- Key Learnings
- Future Improvements
- Contributing
- License
- Contact
Health-Insight is a full-stack web application that integrates multiple machine learning models with a Flask backend to deliver real-time disease risk predictions directly in the browser.
Designed with production-style considerations in mind, this project addresses challenges such as feature consistency between training and inference, model serialization, dynamic form generation, and robust error handling, so it can serve as a template for real-world ML deployment rather than only a demo.
⚠️ Disclaimer: This application is intended for educational and research purposes only. It is not a substitute for professional medical advice, diagnosis, or treatment. Always consult a qualified healthcare provider.
| Disease | Model File | Dataset |
|---|---|---|
| Diabetes | `diabetes.pkl` | `diabetes.csv` |
| Heart Disease | `heart.pkl` | `heart.csv` |
| Kidney Disease | `kidney.pkl` | `kidney.csv` |
| Liver Disease | `liver.pkl` | `liver.csv` |
| Cancer | `cancer.pkl` | `cancer.csv` |
- Individual ML model per disease: each condition has its own dedicated `RandomForestClassifier`
- Dynamic input forms: automatically generated based on disease-specific feature sets
- Real-time predictions: instant inference via Flask REST backend
- Model persistence: serialized and loaded using Pickle for fast startup
- Input validation: client- and server-side checks before inference
- Categorical encoding: handles mixed data types in production
- Modular architecture: clean separation of training, inference, and UI layers
- Responsive UI: works across desktop and mobile browsers
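The dynamic forms and server-side validation listed above could both be driven by one per-disease field specification. The sketch below is a minimal illustration, not the project's actual code: `FIELD_SPECS`, `form_fields`, and `validate_inputs` are hypothetical names, the field names come from the diabetes example body in this README, and the numeric bounds are placeholder assumptions.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical per-disease field specification: (name, type, min, max).
# Field names follow the diabetes example in this README; the bounds are
# illustrative assumptions, not clinically validated limits.
FIELD_SPECS = {
    "diabetes": [
        ("pregnancies", int, 0, 20),
        ("glucose", float, 0, 300),
        ("blood_pressure", float, 0, 200),
        ("skin_thickness", float, 0, 100),
        ("insulin", float, 0, 900),
        ("bmi", float, 0, 70),
        ("diabetes_pedigree", float, 0, 3),
        ("age", int, 1, 120),
    ],
}


def form_fields(disease: str) -> List[str]:
    """Return the ordered field names used to render a form for `disease`."""
    return [name for name, _, _, _ in FIELD_SPECS[disease]]


def validate_inputs(disease: str, raw: Dict[str, Any]) -> Tuple[List[float], List[str]]:
    """Convert raw form values to numbers in spec order and collect errors."""
    values: List[float] = []
    errors: List[str] = []
    for name, kind, low, high in FIELD_SPECS[disease]:
        if name not in raw or raw[name] in ("", None):
            errors.append(f"{name}: missing")
            continue
        try:
            value = kind(raw[name])
        except (TypeError, ValueError):
            errors.append(f"{name}: expected {kind.__name__}")
            continue
        if not low <= value <= high:
            errors.append(f"{name}: must be between {low} and {high}")
            continue
        values.append(value)
    return values, errors
```

Because the form fields and the validated values both follow the spec order, the list returned by `validate_inputs` can be handed to a model in the same column order that was used at training time.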
| Layer | Technology |
|---|---|
| Backend | Python 3.x, Flask |
| ML Framework | Scikit-learn (RandomForestClassifier) |
| Data Handling | Pandas, NumPy |
| Serialization | Pickle |
| Frontend | HTML5, CSS3, JavaScript |
| Templating | Jinja2 |
User selects a disease
│
▼
Dynamic form rendered with disease-specific input fields
│
▼
User submits health parameters
│
▼
Flask validates & encodes inputs
│
▼
Correct .pkl model loaded for the selected disease
│
▼
RandomForestClassifier runs inference
│
▼
Prediction result displayed on result.html
- Training Phase: Each disease has a standalone training script (`training/<disease>.py`) that preprocesses the dataset, trains a `RandomForestClassifier`, and saves the model as a `.pkl` file.
- Inference Phase: When a user submits a form, Flask loads the corresponding `.pkl` model, applies the same preprocessing pipeline, and returns the prediction.
- Feature Consistency: Feature names and encoding schemes are kept consistent between training and inference to prevent silent prediction errors.
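The training and inference phases above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the repository's training script: it uses synthetic data from `make_classification` in place of `datasets/diabetes.csv`, the `train_and_save` and `load_and_predict` helpers are hypothetical, and storing the feature names next to the model in one pickle is just one possible way to keep training and inference aligned.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Column names taken from the diabetes example in this README.
FEATURES = ["pregnancies", "glucose", "blood_pressure", "skin_thickness",
            "insulin", "bmi", "diabetes_pedigree", "age"]


def train_and_save(path: Path) -> None:
    """Training phase: fit a forest and pickle it with its feature order."""
    # Synthetic stand-in for the real dataset (assumption for this sketch).
    X, y = make_classification(n_samples=200, n_features=len(FEATURES),
                               random_state=0)
    frame = pd.DataFrame(X, columns=FEATURES)
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(frame, y)
    with path.open("wb") as fh:
        pickle.dump({"model": model, "features": FEATURES}, fh)


def load_and_predict(path: Path, inputs: dict) -> int:
    """Inference phase: reload the bundle and predict in training column order."""
    with path.open("rb") as fh:
        bundle = pickle.load(fh)
    # Build the row from the saved feature list so the order never drifts.
    row = pd.DataFrame([[inputs[name] for name in bundle["features"]]],
                       columns=bundle["features"])
    return int(bundle["model"].predict(row)[0])


model_path = Path(tempfile.mkdtemp()) / "diabetes.pkl"
train_and_save(model_path)
sample = {"pregnancies": 2, "glucose": 138, "blood_pressure": 62,
          "skin_thickness": 35, "insulin": 0, "bmi": 33.6,
          "diabetes_pedigree": 0.627, "age": 47}
prediction = load_and_predict(model_path, sample)
print("Predicted class:", prediction)
```

Since the model here is trained on synthetic data, the printed class is meaningless as a health prediction; the point is only that the inference row is assembled from the same feature list that was saved at training time.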
Ensure Python 3.x, pip, and Git are installed on your system.
- Clone the repository:
git clone https://github.com/your-username/Health-Insight.git
cd Health-Insight
- Create and activate a virtual environment (recommended):
python -m venv venv
source venv/bin/activate # macOS/Linux
venv\Scripts\activate # Windows
- Install all dependencies:
pip install -r requirements.txt
- (Optional) Retrain the models:
python training/diabetes.py
python training/heart.py
python training/kidney.py
python training/liver.py
python training/cancer.py
Pre-trained `.pkl` files are included in the `models/` directory, so retraining is optional.
- Start the Flask development server:
python app.py
- Open your browser and visit:
http://127.0.0.1:5000/
- Select a disease, fill in the health parameters, and click Predict to receive your risk assessment instantly.
Health-Insight/
│
├── app.py                  # Main Flask app: routes & inference logic
├── requirements.txt        # All Python dependencies
│
├── models/                 # Serialized trained ML models
│   ├── diabetes.pkl
│   ├── heart.pkl
│   ├── kidney.pkl
│   ├── liver.pkl
│   └── cancer.pkl
│
├── training/               # Standalone training scripts per disease
│   ├── diabetes.py
│   ├── heart.py
│   ├── kidney.py
│   ├── liver.py
│   └── cancer.py
│
├── datasets/               # Raw CSV datasets used for training
│   ├── diabetes.csv
│   ├── heart.csv
│   ├── kidney.csv
│   ├── liver.csv
│   └── cancer.csv
│
├── templates/              # Jinja2 HTML templates
│   ├── index.html          # Landing page: disease selector
│   ├── form.html           # Dynamic input form
│   └── result.html         # Prediction result display
│
├── static/                 # Static assets
│   ├── css/                # Stylesheets
│   └── js/                 # JavaScript files
│
└── README.md
| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Home page: disease selection |
| GET | `/predict/<disease>` | Load input form for selected disease |
| POST | `/predict/<disease>` | Submit form and return prediction result |
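The three routes in the table above could be wired up roughly as follows. This is a hedged sketch rather than the project's `app.py`: it renders inline templates with `render_template_string` instead of `form.html` and `result.html`, and `StubModel`, `MODELS`, and `FIELDS` are hypothetical stand-ins for the pickled classifiers and their feature lists.

```python
from flask import Flask, abort, render_template_string, request

app = Flask(__name__)

# Hypothetical feature lists; only two diabetes fields are shown for brevity.
FIELDS = {"diabetes": ["glucose", "bmi"]}


class StubModel:
    """Stand-in for an unpickled classifier exposing predict()."""

    def predict(self, rows):
        # Toy rule for illustration only, not a trained model.
        return [1 if row[0] > 125 else 0 for row in rows]


# In a real app this would map each disease to its loaded .pkl model.
MODELS = {"diabetes": StubModel()}

FORM = """<form method="post">
{% for name in fields %}<input name="{{ name }}" placeholder="{{ name }}">{% endfor %}
<button>Predict</button></form>"""


@app.route("/")
def index():
    # Home page: list the diseases that have a model.
    return render_template_string(
        "{% for d in diseases %}<a href='/predict/{{ d }}'>{{ d }}</a> {% endfor %}",
        diseases=sorted(MODELS),
    )


@app.route("/predict/<disease>", methods=["GET", "POST"])
def predict(disease):
    if disease not in MODELS:
        abort(404)  # unknown disease name in the URL
    fields = FIELDS[disease]
    if request.method == "GET":
        return render_template_string(FORM, fields=fields)
    try:
        # Read the submitted values in the model's expected field order.
        row = [float(request.form[name]) for name in fields]
    except (KeyError, ValueError):
        return "Invalid or missing input", 400
    label = MODELS[disease].predict([row])[0]
    return render_template_string("Prediction: {{ label }}", label=label)
```

With Flask installed, the routes can be exercised without a browser through `app.test_client()`, for example `app.test_client().post("/predict/diabetes", data={"glucose": "138", "bmi": "33.6"})`.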
Example POST body for diabetes prediction:
{
"pregnancies": 2,
"glucose": 138,
"blood_pressure": 62,
"skin_thickness": 35,
"insulin": 0,
"bmi": 33.6,
"diabetes_pedigree": 0.627,
"age": 47
}
Results from training on the provided datasets. Metrics may vary with different train/test splits.
| Disease | Algorithm | Accuracy |
|---|---|---|
| Diabetes | RandomForestClassifier | ~76–80% |
| Heart Disease | RandomForestClassifier | ~82–86% |
| Kidney Disease | RandomForestClassifier | ~96–99% |
| Liver Disease | RandomForestClassifier | ~72–76% |
| Cancer | RandomForestClassifier | ~94–97% |
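Because the accuracy ranges above depend on how the data is split, k-fold cross-validation is one way to report a more stable estimate. The sketch below is illustrative only: it runs on synthetic data from `make_classification`, so its scores say nothing about the datasets in this project, and the choice of 5 folds is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for one of the CSV datasets (assumption for this sketch).
X, y = make_classification(n_samples=300, n_features=8, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)

# Accuracy on each of 5 folds; the spread shows sensitivity to the split.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f} over {len(scores)} folds")
```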
- Feature consistency: Ensuring training feature names/order exactly match inference inputs to prevent silent errors
- Categorical encoding in production: Handling label encoding and one-hot encoding at inference time without refitting
- Real-world ML deployment: Debugging shape mismatches, missing values, and dtype inconsistencies
- Modular backend design: Separating training logic from inference for clean, maintainable code
- Flask routing patterns: Building dynamic, parameterized routes for multi-model applications
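The encoding lesson above can be made concrete with a small pandas sketch. It assumes one-hot encoding via `pd.get_dummies` and a hypothetical `chest_pain` categorical column; the idea is to record the encoded column list at training time and reindex each inference row to it, so nothing is refitted on a single request.

```python
import pandas as pd

# Training time: encode the full dataset and remember the resulting columns.
train = pd.DataFrame({
    "age": [63, 45, 58],
    "chest_pain": ["typical", "atypical", "none"],  # hypothetical column
})
train_encoded = pd.get_dummies(train, columns=["chest_pain"], dtype=int)
TRAIN_COLUMNS = list(train_encoded.columns)  # would be saved next to the model


def encode_for_inference(raw: dict) -> pd.DataFrame:
    """Encode one request and align it to the training columns."""
    row = pd.get_dummies(pd.DataFrame([raw]), columns=["chest_pain"], dtype=int)
    # Missing dummy columns become 0; columns for unseen categories are dropped.
    return row.reindex(columns=TRAIN_COLUMNS, fill_value=0)


encoded = encode_for_inference({"age": 51, "chest_pain": "atypical"})
print(encoded.to_dict(orient="records")[0])
```

A category never seen in training ends up with all of its dummy columns set to 0 in this sketch; whether that is acceptable or should be rejected during validation is a design decision for the application.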
- REST API endpoints with JSON responses for mobile/external integration
- User authentication and prediction history dashboard
- Model monitoring, drift detection & automated retraining pipeline
- Dockerized deployment with `docker-compose`
- SHAP-based explainability: show which features drove the prediction
- Confidence scores alongside binary predictions
- CI/CD pipeline with GitHub Actions
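For the confidence-score item above, scikit-learn's `RandomForestClassifier` already exposes `predict_proba`, which returns per-class probabilities. A minimal sketch on synthetic data follows (an assumption: the project's own models and datasets are not used here).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary data stands in for a real disease dataset.
X, y = make_classification(n_samples=200, n_features=8, random_state=1)
model = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)

# predict_proba returns one row per sample with one column per class,
# ordered as in model.classes_.
proba = model.predict_proba(X[:1])[0]
label = model.classes_[proba.argmax()]
confidence = proba.max()
print(f"predicted class {label} with confidence {confidence:.2f}")
```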
Contributions are welcome and appreciated!
- Fork the repository
- Create a feature branch: `git checkout -b feature/your-feature`
- Commit your changes: `git commit -m "Add: your feature description"`
- Push to your branch: `git push origin feature/your-feature`
- Open a Pull Request
Please follow PEP 8 coding standards and include docstrings for any new functions.
Distributed under the MIT License. See LICENSE for more information.
Your Name - ganesh1a0576@gmail.com
GitHub: Ganesh-a0576
Project Link: https://github.com/your-username/Health-Insight
⭐ If this project helped you, please consider giving it a star; it means a lot!
Made with ❤️ and Python