An intelligent machine learning system for predicting dengue outbreak risk levels across Indian cities using real-time environmental and demographic data. It features a modern glassmorphic dashboard with interactive data visualizations, backed by five trained ML models, with XGBoost achieving 96.57% accuracy.
Health.env combines a Flask REST API with multiple trained ML classifiers and a browser-based dashboard to forecast weekly dengue outbreak risk for 30 major Indian cities. The system processes weather, environmental, and epidemiological indicators through a standardized preprocessing pipeline, returning probabilistic risk classifications (Low, Moderate, High) with confidence scores and actionable recommendations.
We trained and evaluated 5 machine learning algorithms on 15,600 weekly dengue outbreak records (2015-2023, 30 Indian cities):
| Rank | Model | Accuracy | Precision | Recall | F1-Score | Status |
|---|---|---|---|---|---|---|
| 🥇 | XGBoost | 96.57% | 96.75% | 96.57% | 96.59% | ✅ Production Model |
| 🥈 | Random Forest | 96.57% | 96.75% | 96.57% | 96.59% | ✅ Tied Accuracy |
| 🥉 | Decision Tree | 96.44% | 96.59% | 96.44% | 96.46% | ✅ High Accuracy |
| 4️⃣ | K-Nearest Neighbors | 95.32% | 95.46% | 95.32% | 95.38% | ✅ Instance-Based |
| 5️⃣ | Logistic Regression | 89.68% | 89.55% | 89.68% | 89.60% | Baseline |
Why XGBoost is Deployed:
- Highest accuracy (96.57%), tied with Random Forest
- Gradient boosting: each new tree is trained to correct the errors of the trees before it
- Built-in feature importance scores that support interpretable healthcare decisions
- Typically faster inference, since its boosted trees are usually shallower than Random Forest's fully grown trees
- L1/L2 regularization (alpha, lambda) to help generalization on unseen data
- Class imbalance can be handled with per-sample weights (`sample_weight`); `scale_pos_weight` only applies to binary objectives
- Widely used for tabular data in healthcare ML applications
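For the three-class Low/Moderate/High problem, class rebalancing is usually done with one weight per training sample. The sketch below computes "balanced" weights using the same formula as scikit-learn's `compute_sample_weight("balanced", y)`, which is `n_samples / (n_classes * count(class))`.

The following are assumptions for illustration only:
- the toy labels;
- the helper name `balanced_sample_weights`;
- whether the project reweights its data at all.

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Give each sample the weight n_samples / (n_classes * count(its class)),
    so every class contributes the same total weight during training."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return [n_samples / (n_classes * counts[label]) for label in labels]


# Hypothetical, deliberately imbalanced toy labels (not the project's real distribution).
y = ["Low"] * 6 + ["Moderate"] * 3 + ["High"] * 1
weights = balanced_sample_weights(y)
print(weights[0], weights[6], weights[9])  # one Low, one Moderate, one High sample
# These weights could be passed to XGBClassifier().fit(X, y_encoded, sample_weight=weights),
# where y_encoded holds integer class labels.
```

The rare "High" samples receive the largest weights, so the model cannot score well by ignoring them.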
Model Selection Rationale: While XGBoost and Random Forest achieve identical accuracy (96.57%), we chose XGBoost as the primary production model because:
- Interpretability: Built-in feature importance (gain, cover, weight) for medical professionals
- Performance: Gradient boosting sequentially corrects prediction errors
- Scalability: A compact ensemble of shallow boosted trees, typically smaller than a Random Forest of 100+ fully grown trees
- Robustness: Regularization (gamma, lambda, alpha) prevents overfitting on seasonal patterns
- Speed: Slightly slower training than Random Forest (0.94s vs 0.84s), but more efficient inference
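For reference, the standard XGBoost formulation shows where the gamma, lambda and alpha parameters enter training. This is the general algorithm, not a project-specific derivation. The symbols are introduced here:
- T is the number of leaves in a tree;
- w_j is the weight of leaf j;
- η is the learning rate;
- G and H are sums of first- and second-order gradients of the loss l over the samples reaching a node.

```latex
% Additive boosting: tree f_t is fit to the errors left by the previous t-1 trees
\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \eta \, f_t(x_i)

% Regularized objective minimized when growing f_t
\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t)}\right) + \Omega(f_t),
\qquad
\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 + \alpha \sum_{j=1}^{T} \lvert w_j \rvert

% Gain of a candidate split (shown for \alpha = 0)
\text{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda}
  - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma
```

A split is kept only when its gain is positive, so a larger γ prunes more aggressively. A larger λ shrinks the optimal leaf weights w_j* = -G_j / (H_j + λ) toward zero.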
KNN Model: A K-Nearest Neighbors model is also trained and available for predictions. It uses distance-weighted voting to classify risk levels based on similarity to historical cases, making it useful for:
- Explainable predictions ("similar to outbreak X in city Y")
- Local pattern recognition
- Cases where instance-based reasoning is preferred
- Comparison with tree-based ensemble methods
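Distance-weighted voting and "similar historical case" explanations can be sketched with scikit-learn's `KNeighborsClassifier(weights="distance")` and its `kneighbors` method.

The following are made up for illustration; the project's actual KNN settings are not documented here:
- the three toy features and their values;
- the choice k = 3;
- the `historical-case-N` identifiers.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical toy history: [temperature_C, rainfall_mm, humidity_pct] per city-week.
X_train = np.array([
    [24.0, 50.0, 55.0],
    [26.0, 80.0, 60.0],
    [29.0, 400.0, 75.0],
    [30.0, 600.0, 80.0],
    [32.0, 1100.0, 85.0],
    [33.0, 1300.0, 90.0],
])
y_train = np.array(["Low", "Low", "Moderate", "Moderate", "High", "High"])
case_ids = [f"historical-case-{i}" for i in range(len(X_train))]  # hypothetical labels

# Scale first so rainfall (hundreds of mm) does not dominate the distance metric.
scaler = StandardScaler().fit(X_train)
knn = KNeighborsClassifier(n_neighbors=3, weights="distance")  # closer cases get larger votes
knn.fit(scaler.transform(X_train), y_train)

query = scaler.transform([[32.5, 1200.0, 85.0]])
print("predicted risk:", knn.predict(query)[0])

# The neighbours themselves are the explanation: "this week looks like cases X, Y, Z".
distances, indices = knn.kneighbors(query)
for d, i in zip(distances[0], indices[0]):
    print(f"similar to {case_ids[i]} ({y_train[i]}), scaled distance {d:.2f}")
```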
The API serves predictions from all 5 models simultaneously (XGBoost, Random Forest, Decision Tree, Logistic Regression, KNN), allowing users to:
- Compare results across different algorithms
- Choose based on specific requirements (speed vs accuracy, interpretability vs performance)
- Combine predictions into an ensemble (e.g., majority vote) for a more robust estimate
- Analyze model agreement for risk assessment
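A minimal sketch of combining the five per-model results on the client side. It assumes the `/predict` response shape shown in this README: a `predictions` object keyed by model name, where each entry has a `risk_level` and percentage `probabilities`.

`summarize_predictions` is a hypothetical helper, not part of the API. The unweighted average is one simple choice; weighting models by their validation accuracy is another.

```python
from collections import Counter

CLASSES = ("low", "moderate", "high")


def summarize_predictions(response):
    """Combine per-model /predict results into one ensemble summary.

    - majority_label: most common risk_level across models (on a tie, the label
      encountered first wins, because Counter.most_common keeps insertion order)
    - agreement: fraction of models that voted for the majority label
    - mean_probabilities: unweighted average of each class probability (%)
    """
    preds = list(response["predictions"].values())
    votes = Counter(p["risk_level"] for p in preds)
    label, n_votes = votes.most_common(1)[0]
    mean_probs = {
        c: sum(p["probabilities"][c] for p in preds) / len(preds) for c in CLASSES
    }
    return {
        "majority_label": label,
        "agreement": n_votes / len(preds),
        "mean_probabilities": mean_probs,
    }


# Values copied from the example /predict response in this README.
example = {
    "success": True,
    "predictions": {
        "xgboost_model": {"risk_level": "High", "probabilities": {"low": 1.42, "moderate": 4.35, "high": 94.23}},
        "random_forest_model": {"risk_level": "High", "probabilities": {"low": 1.58, "moderate": 4.55, "high": 93.87}},
        "decision_tree_model": {"risk_level": "High", "probabilities": {"low": 2.10, "moderate": 6.70, "high": 91.20}},
        "knn_dengue_model": {"risk_level": "High", "probabilities": {"low": 3.20, "moderate": 8.30, "high": 88.50}},
        "logistic_regression_model": {"risk_level": "Moderate", "probabilities": {"low": 15.40, "moderate": 52.30, "high": 32.30}},
    },
}
summary = summarize_predictions(example)
print(summary["majority_label"], summary["agreement"])
print({c: round(v, 2) for c, v in summary["mean_probabilities"].items()})
```

A low agreement value (models split between classes) is a useful signal to treat a prediction with extra caution.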
- 5 Trained Models: XGBoost, Random Forest, Decision Tree, K-Nearest Neighbors, Logistic Regression
- All 5 evaluated with the same set of comparison metrics
- XGBoost as primary production model (96.57% accuracy, tied with Random Forest)
- KNN provides instance-based predictions with 95.32% accuracy
- Trained on 15,600 weekly historical records (2015-2023)
- StandardScaler preprocessing to standardize input features
- Multi-class classification: Low, Moderate, High risk levels
- Real-time predictions via REST API endpoints
- All models available for prediction and comparison
Real-time dengue risk prediction interface:
- Interactive input controls for 12 environmental/demographic parameters
- Live prediction with confidence scores and risk visualization
- Risk-specific actionable recommendations
- 4 interactive Plotly charts displaying real training data:
  - Monthly Risk Trend (seasonal patterns)
  - Rainfall vs Temperature scatter (risk-colored)
  - Feature Importance from models
  - City Risk Distribution across India
- Theme toggle (dark/light mode)
- Offline fallback mode
Comprehensive ML model analysis dashboard:
- Best Model Card: Highlights XGBoost with all metrics (96.57% accuracy)
- Accuracy Bar Chart: Visual comparison of all 5 models
- Radar Chart: Multi-dimensional performance view
- Grouped Bar Chart: Side-by-side metric comparison
- Performance Table: Detailed metrics with rankings
- Insights Section: AI-generated analysis of model strengths
- Glassmorphic design with smooth animations
- Fully responsive (desktop → tablet → mobile)
- Dark/light theme with localStorage persistence
- Interactive charts with zoom and export
- Navigation between dashboard and comparison pages
- Charts built from the real 2015-2023 training data
- Python 3.8+ (Python 3.11 recommended)
- pip package manager
- Modern web browser
# Clone repository
git clone https://github.com/Chanu716/Health.env.git
cd Health.env
# Create virtual environment
python -m venv .venv
.\.venv\Scripts\Activate.ps1 # Windows
# source .venv/bin/activate # macOS/Linux
# Install dependencies
pip install -r requirements.txt
# Start Flask API
python api.py
# Access dashboards
# Main: http://localhost:5000/
# Comparison: http://localhost:5000/compare

| Endpoint | Method | Description |
|---|---|---|
| / | GET | Main prediction dashboard |
| /compare | GET | Model comparison page |
| /health | GET | API health check |
| /predict | POST | Get dengue risk predictions (all models) |
| /model-info | GET | Model metadata |
| /api/feature-importance | GET | Feature weights |
| /api/training-stats | GET | Training data statistics |
| /api/model-comparison | GET | All model metrics |
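A minimal standard-library client for the `/predict` endpoint. It assumes two things:
- the API accepts a JSON body sent with `Content-Type: application/json`;
- the API listens on the default `http://localhost:5000` from the setup steps.

`build_predict_request` and `predict` are illustrative helper names, not part of the project.

```python
import json
import urllib.request

API_URL = "http://localhost:5000"  # default Flask address (assumed)


def build_predict_request(payload, base_url=API_URL):
    """Create a POST request for /predict carrying the payload as JSON."""
    return urllib.request.Request(
        f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(payload, base_url=API_URL, timeout=10):
    """Send the request and decode the JSON response (requires the API to be running)."""
    request = build_predict_request(payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage once `python api.py` is running:
# result = predict({"city": "Mumbai", "month": "July", "temperature": 32.5,
#                   "rainfall": 1200, "humidity": 85, "aqi": 150,
#                   "mosquito": 0.75, "population": 5000, "cases": 450})
# for model, pred in result["predictions"].items():
#     print(f"{model}: {pred['risk_level']} ({pred['confidence']}%)")
```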
Request:
{
  "city": "Mumbai",
  "month": "July",
  "temperature": 32.5,
  "rainfall": 1200,
  "humidity": 85,
  "aqi": 150,
  "mosquito": 0.75,
  "population": 5000,
  "cases": 450
}

Response:
{
  "success": true,
  "predictions": {
    "xgboost_model": {
      "risk_level": "High",
      "confidence": 94.23,
      "probabilities": {"low": 1.42, "moderate": 4.35, "high": 94.23}
    },
    "random_forest_model": {
      "risk_level": "High",
      "confidence": 93.87,
      "probabilities": {"low": 1.58, "moderate": 4.55, "high": 93.87}
    },
    "decision_tree_model": {
      "risk_level": "High",
      "confidence": 91.20,
      "probabilities": {"low": 2.10, "moderate": 6.70, "high": 91.20}
    },
    "knn_dengue_model": {
      "risk_level": "High",
      "confidence": 88.50,
      "probabilities": {"low": 3.20, "moderate": 8.30, "high": 88.50}
    },
    "logistic_regression_model": {
      "risk_level": "Moderate",
      "confidence": 52.30,
      "probabilities": {"low": 15.40, "moderate": 52.30, "high": 32.30}
    }
  }
}

Health.env/
├── api.py                              # Flask REST API
├── templates/
│   ├── index.html                      # Main dashboard
│   └── compare.html                    # Model comparison
├── static/
│   ├── script.js                       # Dashboard logic
│   ├── compare.js                      # Comparison page logic
│   └── styles.css                      # Styling
├── models/
│   ├── xgboost_model.pkl               # 96.57% (primary production)
│   ├── random_forest_model.pkl         # 96.57% (tied accuracy)
│   ├── decision_tree_model.pkl         # 96.44% (fastest)
│   ├── knn_dengue_model.pkl            # 95.32% (instance-based)
│   ├── logistic_regression_model.pkl   # 89.68% (baseline)
│   ├── scaler.pkl                      # StandardScaler (15 features)
│   ├── feature_names.pkl               # Core 12 features
│   └── feature_names_15.pkl            # Extended 15 features
├── data/
│   ├── dengue_data_cleaned.csv
│   └── dengue_india_weekly_with_nulls.csv
├── results/
│   └── model_comparison_results.csv
└── requirements.txt
- Year & Week (temporal)
- Temperature (°C)
- Rainfall (mm)
- Humidity (%)
- Air Quality Index (AQI)
- Mosquito Density (0-1)
- Population Density
- Dengue Cases Reported
- Latitude & Longitude
- Month Number
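The request body carries a city name and a month name rather than all twelve model inputs. The API therefore presumably derives the remaining features before scaling. The sketch below shows one possible mapping. The following are all assumptions:
- the feature order (the deployed order is whatever `feature_names.pkl` stores);
- the two-city coordinate table, whose values are approximate;
- passing `year` and `week` in explicitly.

`standardize` reproduces the arithmetic of `StandardScaler.transform`, z = (x - mean) / std.

```python
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Hypothetical column order; the deployed order is defined by feature_names.pkl.
FEATURE_ORDER = [
    "year", "week", "temperature", "rainfall", "humidity", "aqi",
    "mosquito_density", "population_density", "cases",
    "latitude", "longitude", "month_number",
]

# Approximate city-centre coordinates, for illustration only.
CITY_COORDS = {
    "Mumbai": (19.08, 72.88),
    "Delhi": (28.61, 77.21),
}


def build_feature_vector(req, year, week):
    """Map a /predict-style request dict onto the 12 core features (hypothetical mapping)."""
    lat, lon = CITY_COORDS[req["city"]]
    values = {
        "year": year,
        "week": week,
        "temperature": req["temperature"],
        "rainfall": req["rainfall"],
        "humidity": req["humidity"],
        "aqi": req["aqi"],
        "mosquito_density": req["mosquito"],
        "population_density": req["population"],
        "cases": req["cases"],
        "latitude": lat,
        "longitude": lon,
        "month_number": MONTHS.index(req["month"]) + 1,  # "July" -> 7
    }
    return [float(values[name]) for name in FEATURE_ORDER]


def standardize(vector, means, stds):
    """Per-feature z-score, the same arithmetic as StandardScaler.transform."""
    return [(x - m) / s for x, m, s in zip(vector, means, stds)]


request = {
    "city": "Mumbai", "month": "July", "temperature": 32.5, "rainfall": 1200,
    "humidity": 85, "aqi": 150, "mosquito": 0.75, "population": 5000, "cases": 450,
}
features = build_feature_vector(request, year=2023, week=28)
print(dict(zip(FEATURE_ORDER, features)))
```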
- Dataset: 15,600 records (2015-2023)
- Split: 80-20 stratified
- Preprocessing: StandardScaler normalization
- Validation: 5-fold cross-validation
- Tuning: GridSearchCV for hyperparameters
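The training protocol above can be sketched with scikit-learn. It covers the stratified 80-20 split, StandardScaler, 5-fold cross-validation and GridSearchCV.

The following are assumptions:
- the data is synthetic, generated with `make_classification`;
- the parameter grid is a placeholder, since the project's real grids are not listed;
- the scaler sits inside a `Pipeline`, so each CV fold fits it only on that fold's training portion and no validation data leaks into the scaling statistics.

A `DecisionTreeClassifier` keeps the sketch dependency-light. `xgboost.XGBClassifier` follows the scikit-learn estimator interface and could be swapped into the `clf` step.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 12-feature, 3-class (Low/Moderate/High) dataset.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, n_classes=3, random_state=0
)

# 80-20 split that preserves class proportions in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", DecisionTreeClassifier(random_state=0)),
])
param_grid = {"clf__max_depth": [4, 8, None], "clf__min_samples_leaf": [1, 5]}

# 5-fold stratified cross-validation over the grid, refitting the best model on all of X_train.
search = GridSearchCV(
    pipe,
    param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(X_train, y_train)
print("best params:", search.best_params_)
print(f"held-out accuracy: {search.score(X_test, y_test):.4f}")
```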
Mumbai, Delhi, Bangalore, Hyderabad, Chennai, Kolkata, Pune, Ahmedabad, Jaipur, Lucknow, Kanpur, Nagpur, Indore, Thane, Bhopal, Visakhapatnam, Pimpri-Chinchwad, Patna, Vadodara, Ghaziabad, Ludhiana, Agra, Nashik, Faridabad, Meerut, Rajkot, Kalyan-Dombivli, Vasai-Virar, Varanasi, Srinagar
Contributions welcome! Areas for improvement:
- Add LSTM/GRU for time-series forecasting
- Implement geographic heatmaps
- Mobile app (React Native/Flutter)
- Real-time data integration
- Multi-language support
MIT License - See LICENSE file
Karri Chanikya Sri Hari Narayana Dattu
- GitHub: @Chanu716
- Project: Health.env
- Scikit-learn & XGBoost teams
- Flask & Plotly.js communities
- Indian health departments for data
- Open-source ML community
Built with ❤️ for public health | Preventing dengue outbreaks through AI
Version 2.0 | December 2025