AgriCast-ML — End-to-End Climate Forecasting System for Corn Yield

Developed during the DAWN Internship (ESSIC, University of Maryland; USDA NIFA-funded)

Overview

Built a multi-decade climate–yield forecasting framework integrating 50+ years of USDA yield and meteorological data (1M+ records) to model Corn Belt production under heterogeneous climate regimes.

The project combined large-scale ETL, spatial joins, seasonal climate feature engineering, and cluster-specific ensemble modeling to improve predictive performance and extract interpretable climate drivers.

Technical Contributions

Designed an end-to-end ETL pipeline integrating USDA county-level yield data with NOAA meteorological records
Performed station–county spatial assignment using geographic joins
Detrended yield to isolate climate-driven variability
Clustered regions into climate regimes (k-means, 3 clusters)
Engineered seasonal aggregates and anomaly-based extreme-heat features
Built residual-stacked ensemble models (RF → XGBoost → MLP)
Applied PCA dimensionality reduction and MICE imputation
Conducted hyperparameter optimization and seasonal boundary tuning
Executed experiments on HPC infrastructure (Zaratan cluster)
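
Two of the steps above, yield detrending and anomaly-based extreme-heat features, can be sketched as follows. This is a minimal illustration and not the repository's implementation: the per-county linear trend, the column names (`county`, `year`, `yield_bu`), the function names, and the z-score threshold of 1.5 are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def detrend_yield(df: pd.DataFrame) -> pd.DataFrame:
    """Subtract a per-county linear trend in year from yield.

    A linear trend is an assumption here; the README does not state
    which trend form the project used.
    """
    out = df.copy()
    out["yield_detrended"] = np.nan
    for _, grp in out.groupby("county"):
        slope, intercept = np.polyfit(grp["year"], grp["yield_bu"], deg=1)
        trend = slope * grp["year"] + intercept
        out.loc[grp.index, "yield_detrended"] = grp["yield_bu"] - trend
    return out


def extreme_heat_days(tmax, clim_mean, clim_std, z_threshold=1.5):
    """Count days whose Tmax anomaly exceeds z_threshold standard deviations.

    clim_mean and clim_std are a (hypothetical) long-term climatology for
    the same location and season.
    """
    z = (np.asarray(tmax, dtype=float) - clim_mean) / clim_std
    return int((z > z_threshold).sum())


# Synthetic demo: a yield series that is pure trend, so residuals are ~0.
years = np.arange(2000, 2010)
demo = pd.DataFrame({
    "county": ["A"] * len(years),
    "year": years,
    "yield_bu": 100.0 + 2.0 * (years - 2000),
})
detrended = detrend_yield(demo)
hot_days = extreme_heat_days([30, 31, 35, 38, 29], clim_mean=30.0, clim_std=4.0)
```

The detrended column is what a climate model would then be trained on, so that technology-driven yield growth is not attributed to weather.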

Modeling Architecture

Baseline → Random Forest → Residual Stacking

Random Forest captures nonlinear climate effects
XGBoost models residual structure
MLP refines remaining error
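
The three-stage chain above can be sketched as a small residual-stacking wrapper, where each stage is fit on whatever error the previous stages left behind. This is an illustrative sketch under stated assumptions, not the project's code: the `ResidualStack` class is hypothetical, scikit-learn's `GradientBoostingRegressor` stands in for XGBoost to keep the example dependency-light, and all hyperparameters and data are made up.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.neural_network import MLPRegressor


class ResidualStack:
    """Fit each stage on the residual left by the stages before it."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, X, y):
        residual = np.asarray(y, dtype=float).copy()
        for model in self.stages:
            model.fit(X, residual)
            # The next stage only sees what this stage failed to explain.
            residual = residual - model.predict(X)
        return self

    def predict(self, X):
        # The final prediction is the sum of all stage predictions.
        return sum(model.predict(X) for model in self.stages)


# Synthetic data with a nonlinear signal, for demonstration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=300)

stack = ResidualStack([
    RandomForestRegressor(n_estimators=50, random_state=0),
    GradientBoostingRegressor(n_estimators=50, random_state=0),  # stand-in for XGBoost
    MLPRegressor(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
]).fit(X, y)
pred = stack.predict(X)
```

Note that this sketch computes residuals in-sample, which tends to understate the error a Random Forest leaves behind; using out-of-fold predictions for the residuals is a common way to reduce that bias.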

Cluster-specific configurations were applied to account for heterogeneous climate–yield relationships across regions.
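
One way to picture cluster-specific configuration is a lookup from cluster label to model settings, with one model fit per cluster. The sketch below assumes k-means on standardized climate features and a Random Forest per cluster; the `CLUSTER_CONFIGS` values, function names, and synthetic data are hypothetical and are not taken from the repository.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler

# Hypothetical per-cluster settings; the actual values are not given in this README.
CLUSTER_CONFIGS = {
    0: {"n_estimators": 20, "max_depth": 8},
    1: {"n_estimators": 30, "max_depth": 6},
    2: {"n_estimators": 20, "max_depth": 10},
}


def assign_climate_clusters(climate_features, n_clusters=3, seed=0):
    """Standardize the features, then label each row with a k-means cluster."""
    scaled = StandardScaler().fit_transform(climate_features)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return km.fit_predict(scaled)


def fit_per_cluster(X, y, labels):
    """Fit one model per cluster, using that cluster's configuration."""
    models = {}
    for cluster_id in np.unique(labels):
        mask = labels == cluster_id
        config = CLUSTER_CONFIGS[int(cluster_id)]
        model = RandomForestRegressor(random_state=0, **config)
        models[int(cluster_id)] = model.fit(X[mask], y[mask])
    return models


# Synthetic demo: three well-separated "climate regimes" of 40 rows each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
climate = np.vstack([c + rng.normal(scale=0.5, size=(40, 2)) for c in centers])
labels = assign_climate_clusters(climate)
y = 0.3 * climate[:, 0] + rng.normal(scale=0.1, size=len(climate))
models = fit_per_cluster(climate, y, labels)
```

At prediction time, a new observation would first be assigned to a cluster and then routed to that cluster's model.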

Results

R² improved from 0.30 to 0.76 (a relative gain of roughly 150%)
Stable performance across three climate clusters
Identified seasonal Tmax and extreme heat anomalies as dominant yield drivers
Translated model outputs into interpretable climate-risk indicators that support an ongoing research publication

Tools

Python • Pandas • NumPy • Scikit-learn • XGBoost • GeoPandas • PCA • MICE • HPC (Zaratan)

Data Access Note

Raw USDA and meteorological inputs are not included due to internship data agreements.
This repository documents the modeling framework and pipeline architecture used during the DAWN Internship.
