y-kanchan/Health.env

🦟 Health.Env - Dengue Risk Prediction System


An intelligent machine learning system for predicting dengue outbreak risk levels across Indian cities using real-time environmental and demographic data. Features a modern glassmorphic dashboard with interactive data visualizations powered by an ensemble of ML models, with XGBoost achieving 96.57% accuracy.


📊 Overview

Health.Env combines a Flask REST API with multiple trained ML classifiers and a browser-based dashboard to forecast weekly dengue outbreak risk for 30 major Indian cities. The system passes weather, environmental, and epidemiological indicators through a standardized preprocessing pipeline and returns a probabilistic risk classification (Low, Moderate, or High) with a confidence score and actionable recommendations.

🏆 Model Comparison

We trained and evaluated 5 machine learning algorithms on 15,600 weekly dengue outbreak records (2015-2023, 30 Indian cities):

| Rank | Model | Accuracy | Precision | Recall | F1-Score | Status |
|------|-------|----------|-----------|--------|----------|--------|
| 🥇 | XGBoost | 96.57% | 96.75% | 96.57% | 96.59% | ✅ Production Model |
| 🥈 | Random Forest | 96.57% | 96.75% | 96.57% | 96.59% | ✅ Tied Accuracy |
| 🥉 | Decision Tree | 96.44% | 96.59% | 96.44% | 96.46% | ✅ High Accuracy |
| 4️⃣ | K-Nearest Neighbors | 95.32% | 95.46% | 95.32% | 95.38% | ✅ Instance-Based |
| 5️⃣ | Logistic Regression | 89.68% | 89.55% | 89.68% | 89.60% | ⚠️ Baseline |
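In every row of the table, recall equals accuracy. That pattern is what support-weighted averaging produces in a multi-class setting, so the sketch below assumes the metrics were weighted by class frequency (the README does not state the averaging mode). The labels are toy data, not project results.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Return accuracy plus support-weighted precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / n
    precision = recall = f1 = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        predicted = sum(p == label for p in y_pred)
        actual = support[label]
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / actual if actual else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = actual / n  # share of true samples belonging to this class
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy labels for illustration only
y_true = ["Low", "Low", "Moderate", "High", "High", "High"]
y_pred = ["Low", "Moderate", "Moderate", "High", "High", "Low"]
metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

Weighted recall sums the correctly classified samples of each class and divides by the total, which is the definition of accuracy; precision and F1 can still differ from it.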

Why XGBoost is Deployed:

  • Ties Random Forest for the highest accuracy (96.57%)
  • Gradient boosting builds trees sequentially, each one correcting the errors of those before it
  • Built-in feature importance scores support interpretation for healthcare decisions
  • Faster inference than the Random Forest ensemble
  • L1/L2 regularization helps limit overfitting on unseen data
  • Class imbalance can be addressed through weighting (scale_pos_weight applies to binary tasks; a three-class problem like this one typically uses per-sample weights instead)
  • Widely used for tabular ML tasks, including healthcare applications

Model Selection Rationale: Although XGBoost and Random Forest achieve identical accuracy (96.57%), we chose XGBoost as the primary production model because:

  1. Interpretability: Built-in feature importance (gain, cover, weight) for medical professionals
  2. Performance: Gradient boosting sequentially corrects prediction errors
  3. Scalability: An ensemble of boosted trees, typically shallower than the 100+ trees in Random Forest
  4. Robustness: Regularization (gamma, lambda, alpha) helps prevent overfitting on seasonal patterns
  5. Speed: Comparable training time (0.94s vs 0.84s) and more efficient inference
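To make "sequentially corrects prediction errors" concrete, here is a minimal sketch of gradient boosting with one-split regression stumps and squared loss. It is a toy illustration of the general idea, not XGBoost's actual algorithm (which also uses second-order gradients and regularization), and the rainfall and case numbers are made up.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimises squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]

def boost(xs, ys, n_trees=20, learning_rate=0.3):
    """Fit stumps one after another, each on the residuals left by the previous ones."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        trees.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return base, trees, predictions

def mse(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)

# Hypothetical data: weekly rainfall (mm) vs reported cases
xs = [10, 50, 120, 200, 300, 450]
ys = [2, 3, 10, 25, 40, 60]

baseline_error = mse(ys, [sum(ys) / len(ys)] * len(ys))
one_tree_error = mse(ys, boost(xs, ys, n_trees=1)[2])
many_trees_error = mse(ys, boost(xs, ys, n_trees=20)[2])
print(baseline_error, one_tree_error, many_trees_error)
```

Each added stump fits what the earlier ones got wrong, so the training error shrinks as trees accumulate. The boosted model is therefore an ensemble too, built in sequence rather than independently as in Random Forest.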

KNN Model: A K-Nearest Neighbors model is also trained and available for predictions. It uses distance-weighted voting to classify risk levels based on similarity to historical cases, making it useful for:

  • Explainable predictions ("similar to outbreak X in city Y")
  • Local pattern recognition
  • Cases where instance-based reasoning is preferred
  • Comparison with tree-based ensemble methods
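Distance-weighted voting can be sketched in a few lines. The version below is a minimal illustration assuming inverse-distance weights and Euclidean distance on already-scaled features; the project's actual KNN configuration (value of k, metric, weighting) is not documented here, and the data points are hypothetical.

```python
import math
from collections import defaultdict

def knn_predict(train, query, k=3):
    """Classify `query` by an inverse-distance-weighted vote of its k nearest neighbours.

    `train` is a list of (feature_vector, label) pairs.
    """
    neighbours = sorted(
        ((math.dist(features, query), label) for features, label in train),
        key=lambda pair: pair[0],
    )[:k]
    votes = defaultdict(float)
    for distance, label in neighbours:
        # Closer neighbours get larger weights; the epsilon avoids division by zero
        votes[label] += 1.0 / (distance + 1e-9)
    total = sum(votes.values())
    probabilities = {label: weight / total for label, weight in votes.items()}
    return max(probabilities, key=probabilities.get), probabilities

# Hypothetical scaled feature vectors with their historical risk labels
train = [
    ((0.0, 0.0), "Low"),
    ((0.2, 0.1), "Low"),
    ((0.5, 0.6), "Moderate"),
    ((1.0, 1.0), "High"),
    ((1.1, 0.9), "High"),
]
label, probabilities = knn_predict(train, (0.95, 1.0), k=3)
print(label, probabilities)
```

Because the vote is built from specific historical records, the nearest neighbours themselves can be shown to the user as the explanation for a prediction.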

The API serves predictions from all 5 models simultaneously (XGBoost, Random Forest, Decision Tree, Logistic Regression, KNN), allowing users to:

  • Compare results across different algorithms
  • Choose based on specific requirements (speed vs accuracy, interpretability vs performance)
  • Ensemble predictions for higher confidence
  • Analyze model agreement for risk assessment
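One way to combine the five outputs is a majority vote plus averaged class probabilities. The sketch below assumes the `predictions` structure shown in the POST /predict example of this README; the `summarize` helper is hypothetical and not part of the API.

```python
from collections import Counter

def summarize(predictions):
    """Majority vote, agreement ratio, and mean class probabilities across models."""
    votes = Counter(result["risk_level"] for result in predictions.values())
    majority, count = votes.most_common(1)[0]
    classes = ("low", "moderate", "high")
    mean_probabilities = {
        c: sum(r["probabilities"][c] for r in predictions.values()) / len(predictions)
        for c in classes
    }
    return {
        "majority": majority,
        "agreement": count / len(predictions),
        "mean_probabilities": mean_probabilities,
    }

# Values copied from the example /predict response in this README
predictions = {
    "xgboost_model": {"risk_level": "High",
                      "probabilities": {"low": 1.42, "moderate": 4.35, "high": 94.23}},
    "random_forest_model": {"risk_level": "High",
                            "probabilities": {"low": 1.58, "moderate": 4.55, "high": 93.87}},
    "decision_tree_model": {"risk_level": "High",
                            "probabilities": {"low": 2.10, "moderate": 6.70, "high": 91.20}},
    "knn_dengue_model": {"risk_level": "High",
                         "probabilities": {"low": 3.20, "moderate": 8.30, "high": 88.50}},
    "logistic_regression_model": {"risk_level": "Moderate",
                                  "probabilities": {"low": 15.40, "moderate": 52.30, "high": 32.30}},
}
summary = summarize(predictions)
print(summary)
```

A low agreement ratio is a signal to treat the prediction with more caution, even when the primary model reports high confidence.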

✨ Key Features

🤖 Machine Learning

  • 5 Trained Models: XGBoost, Random Forest, Decision Tree, K-Nearest Neighbors, Logistic Regression
  • All 5 officially evaluated with comprehensive comparison metrics
  • XGBoost as primary production model (96.57% accuracy, tied with Random Forest)
  • KNN provides instance-based predictions with 95.32% accuracy
  • Trained on 15,600 weekly historical records (2015-2023)
  • StandardScaler preprocessing for normalized predictions
  • Multi-class classification: Low, Moderate, High risk levels
  • Real-time predictions via REST API endpoints
  • All models available for prediction and comparison

📊 Dual Dashboard System

1. Main Dashboard (/)

Real-time dengue risk prediction interface:

  • Interactive input controls for 12 environmental/demographic parameters
  • Live prediction with confidence scores and risk visualization
  • Risk-specific actionable recommendations
  • 4 interactive Plotly charts displaying real training data:
    • Monthly Risk Trend (seasonal patterns)
    • Rainfall vs Temperature scatter (risk-colored)
    • Feature Importance from models
    • City Risk Distribution across India
  • Theme toggle (dark/light mode)
  • Offline fallback mode

2. Model Comparison Page (/compare)

Comprehensive ML model analysis dashboard:

  • 🏆 Best Model Card: Highlights XGBoost with all metrics (96.57% accuracy)
  • 📊 Accuracy Bar Chart: Visual comparison of all 5 models
  • 🎯 Radar Chart: Multi-dimensional performance view
  • 📈 Grouped Bar Chart: Side-by-side metric comparison
  • 📋 Performance Table: Detailed metrics with rankings
  • 💡 Insights Section: AI-generated analysis of model strengths

🎨 Modern UI/UX

  • Glassmorphic design with smooth animations
  • Fully responsive (desktop → tablet → mobile)
  • Dark/light theme with localStorage persistence
  • Interactive charts with zoom and export
  • Navigation between dashboard and comparison pages
  • Real data from the 2015-2023 training history

🚀 Quick Start

Prerequisites

  • Python 3.8+ (Python 3.11 recommended)
  • pip package manager
  • Modern web browser

Installation

# Clone repository
git clone https://github.com/Chanu716/Health.env.git
cd Health.env

# Create virtual environment
python -m venv .venv
.\.venv\Scripts\Activate.ps1  # Windows
# source .venv/bin/activate    # macOS/Linux

# Install dependencies
pip install -r requirements.txt

Running Locally

# Start Flask API
python api.py

# Access dashboards
# Main: http://localhost:5000/
# Comparison: http://localhost:5000/compare

🔌 API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| / | GET | Main prediction dashboard |
| /compare | GET | Model comparison page |
| /health | GET | API health check |
| /predict | POST | Get dengue risk predictions (all models) |
| /model-info | GET | Model metadata |
| /api/feature-importance | GET | Feature weights |
| /api/training-stats | GET | Training data statistics |
| /api/model-comparison | GET | All model metrics |

Example: POST /predict

Request:

{
  "city": "Mumbai",
  "month": "July",
  "temperature": 32.5,
  "rainfall": 1200,
  "humidity": 85,
  "aqi": 150,
  "mosquito": 0.75,
  "population": 5000,
  "cases": 450
}

Response:

{
  "success": true,
  "predictions": {
    "xgboost_model": {
      "risk_level": "High",
      "confidence": 94.23,
      "probabilities": {"low": 1.42, "moderate": 4.35, "high": 94.23}
    },
    "random_forest_model": {
      "risk_level": "High",
      "confidence": 93.87,
      "probabilities": {"low": 1.58, "moderate": 4.55, "high": 93.87}
    },
    "decision_tree_model": {
      "risk_level": "High",
      "confidence": 91.20,
      "probabilities": {"low": 2.10, "moderate": 6.70, "high": 91.20}
    },
    "knn_dengue_model": {
      "risk_level": "High",
      "confidence": 88.50,
      "probabilities": {"low": 3.20, "moderate": 8.30, "high": 88.50}
    },
    "logistic_regression_model": {
      "risk_level": "Moderate",
      "confidence": 52.30,
      "probabilities": {"low": 15.40, "moderate": 52.30, "high": 32.30}
    }
  }
}
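A client call to this endpoint might look like the following standard-library sketch. It assumes the API is running at the local address from "Running Locally" and accepts a JSON body with a `Content-Type: application/json` header; the `build_request` and `predict` helpers are hypothetical names, not part of the project.

```python
import json
import urllib.request

API_URL = "http://localhost:5000/predict"  # local default; adjust for other deployments

def build_request(payload, url=API_URL):
    """Build a JSON POST request for the /predict endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def predict(payload, timeout=10):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(payload), timeout=timeout) as response:
        return json.load(response)

# Same fields as the example request above
payload = {
    "city": "Mumbai",
    "month": "July",
    "temperature": 32.5,
    "rainfall": 1200,
    "humidity": 85,
    "aqi": 150,
    "mosquito": 0.75,
    "population": 5000,
    "cases": 450,
}
request = build_request(payload)

# With the API running locally, the call would look like:
# result = predict(payload)
# for name, output in result["predictions"].items():
#     print(f"{name}: {output['risk_level']} ({output['confidence']:.2f}%)")
```

In the example response, each model's `confidence` matches the probability of its predicted class, so a client can recompute it from `probabilities` as a sanity check.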

📁 Project Structure

Health.env/
├── api.py                          # Flask REST API
├── templates/
│   ├── index.html                  # Main dashboard
│   └── compare.html                # Model comparison
├── static/
│   ├── script.js                   # Dashboard logic
│   ├── compare.js                  # Comparison page logic
│   └── styles.css                  # Styling
├── models/
│   ├── xgboost_model.pkl           # 96.57% (primary production)
│   ├── random_forest_model.pkl     # 96.57% (tied accuracy)
│   ├── decision_tree_model.pkl     # 96.44% (fastest)
│   ├── knn_dengue_model.pkl        # KNN (instance-based)
│   ├── logistic_regression_model.pkl # 89.68% (baseline)
│   ├── scaler.pkl                  # StandardScaler (15 features)
│   ├── feature_names.pkl           # Core 12 features
│   └── feature_names_15.pkl        # Extended 15 features
├── data/
│   ├── dengue_data_cleaned.csv
│   └── dengue_india_weekly_with_nulls.csv
├── results/
│   └── model_comparison_results.csv
└── requirements.txt

🧪 Model Details

Features (12 Core Variables)

  1. Year & Week (temporal)
  2. Temperature (°C)
  3. Rainfall (mm)
  4. Humidity (%)
  5. Air Quality Index (AQI)
  6. Mosquito Density (0-1)
  7. Population Density
  8. Dengue Cases Reported
  9. Latitude & Longitude
  10. Month Number

Training Configuration

  • Dataset: 15,600 records (2015-2023)
  • Split: 80-20 stratified
  • Preprocessing: StandardScaler normalization
  • Validation: 5-fold cross-validation
  • Tuning: GridSearchCV for hyperparameters
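The split and scaling steps can be sketched without any ML library. The example below is an illustration under assumptions: it mimics a stratified 80-20 split and StandardScaler-style normalization (mean and population standard deviation fitted on the training rows only) on toy data, and the helper names are hypothetical rather than taken from the project's notebooks.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev

def stratified_split(labels, test_fraction=0.2, seed=42):
    """Split row indices so each class keeps roughly the same share in both sets."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * test_fraction)
        test_idx.extend(indices[:cut])
        train_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)

def fit_scaler(rows):
    """Per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # constant columns scale by 1
    return means, stds

def transform(rows, means, stds):
    """Apply (value - mean) / std column by column."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Toy data: [temperature, rainfall] rows with risk labels
rows = [[25.0 + i, 100.0 * i] for i in range(10)]
labels = ["Low"] * 5 + ["High"] * 5

train_idx, test_idx = stratified_split(labels)
train_rows = [rows[i] for i in train_idx]
test_rows = [rows[i] for i in test_idx]

# Fit on the training rows only, then reuse the same statistics for the test rows
means, stds = fit_scaler(train_rows)
train_scaled = transform(train_rows, means, stds)
test_scaled = transform(test_rows, means, stds)
```

Fitting the scaler on the training portion alone keeps test-set statistics out of preprocessing, which is also why a saved scaler such as `scaler.pkl` has to be reused unchanged at prediction time.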

Supported Cities (30)

Mumbai, Delhi, Bangalore, Hyderabad, Chennai, Kolkata, Pune, Ahmedabad, Jaipur, Lucknow, Kanpur, Nagpur, Indore, Thane, Bhopal, Visakhapatnam, Pimpri-Chinchwad, Patna, Vadodara, Ghaziabad, Ludhiana, Agra, Nashik, Faridabad, Meerut, Rajkot, Kalyan-Dombivli, Vasai-Virar, Varanasi, Srinagar


🤝 Contributing

Contributions welcome! Areas for improvement:

  • Add LSTM/GRU for time-series forecasting
  • Implement geographic heatmaps
  • Mobile app (React Native/Flutter)
  • Real-time data integration
  • Multi-language support

📄 License

MIT License - See LICENSE file


👨‍💻 Author

Karri Chanikya Sri Hari Narayana Dattu


🙏 Acknowledgments

  • Scikit-learn & XGBoost teams
  • Flask & Plotly.js communities
  • Indian health departments for data
  • Open-source ML community

Built with ❤️ for public health | Preventing dengue outbreaks through AI

Version 2.0 | December 2025
