DTSC 301 — Group 4
Amanda Hawbecker, Chikamso Ezeaku, Kobe Duncan
This project analyzes changes in energy usage patterns and power outages across Maryland's 24 county-level jurisdictions (23 counties plus Baltimore City) from 2020 to 2024, examining how electricity demand and grid stability evolved across three distinct eras:
- COVID-19 work-from-home era (2020 – mid-2021)
- Post-pandemic recovery (mid-2021 – 2022)
- AI data center growth (2023 – 2024)
A Random Forest ML model (compared against XGBoost) predicts monthly outage intensity
by county, with SHAP-based feature attribution. Results are presented through an
interactive Streamlit dashboard and a printed symposium poster
(reports/md_power_grid_poster.pptx).
- Python 3.12+
- Git LFS (required for the Eagle-I raw data files)
Windows:
winget install GitHub.GitLFS
# or download from https://git-lfs.com/
Mac:
brew install git-lfs
Linux:
sudo apt install git-lfs # Debian/Ubuntu
Then initialise LFS for your git user (also one-time):
git lfs install
git clone https://github.com/Prophet5Programs/MD_Energy_Usage.git
cd MD_Energy_Usage
Git LFS will automatically download the large Eagle-I raw CSVs (~92 MB) during the clone.
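If the clone ran before Git LFS was installed, the raw CSVs may be small pointer stubs instead of real data. The sketch below is one way to detect that, not a script that ships with the repo. It relies on the fact that an LFS pointer file starts with a fixed `version https://git-lfs.github.com/spec/v1` line. The helper names and the `eaglei_outages_MD_*.csv` glob are assumptions based on the file naming shown in the project layout.

```python
from pathlib import Path

# First bytes of every Git LFS pointer file (per the LFS pointer format).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if `path` is an LFS pointer stub rather than real data."""
    with path.open("rb") as fh:
        return fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_unfetched(raw_dir: Path, pattern: str = "eaglei_outages_MD_*.csv") -> list[Path]:
    """List files in `raw_dir` matching `pattern` that are still pointer stubs."""
    return sorted(p for p in raw_dir.glob(pattern) if is_lfs_pointer(p))


# Example (run from the project root):
#   for stub in find_unfetched(Path("data/raw")):
#       print(f"still an LFS pointer: {stub}")
```

If any files are reported, running `git lfs pull` from the project root should replace the stubs with the actual CSVs.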
python -m venv .venv
# Windows
.venv\Scripts\activate
# Mac / Linux
source .venv/bin/activate
pip install -r requirements.txt
jupyter notebook notebooks/01_data_cleaning.ipynb
Run all cells top-to-bottom. The notebook expects the data files that are already in the repo; no additional downloads needed.
If you want to regenerate any raw data from scratch, the fetch scripts are in src/.
Each script is standalone — run from the project root:
# Eagle-I outage data (2020–2024) — downloads ~6 GB from Figshare, saves ~92 MB locally
# Requires: curl.exe (ships with Windows 10/11) or curl on Mac/Linux
python src/fetch_eaglei_md.py
# NOAA storm events (2020–2024)
python src/fetch_noaa_storm_events.py
# ZIP-to-county crosswalk
python src/fetch_crosswalk.py
# EIA Form 861 annual electricity sales
# (requires EIA zip archives already in data/raw/ — see script header for download links)
python src/fetch_eia_annual.py
# Maryland data center directory (scraped from datacentermap.com HTML)
python src/fetch_datacentermap.py # outputs data/external/md_datacenters_raw.csv
# Curated data center CSV with MW estimates and county/year metadata
python src/build_datacenter_csv.py # outputs data/external/md_datacenters.csv
MD_Energy_Usage/
├── data/
│ ├── raw/ # Per-year Eagle-I CSVs (Git LFS)
│ │ └── eaglei_outages_MD_YYYY.csv
│ ├── processed/ # Cleaned & aggregated datasets
│ │ ├── eaglei_outages_MD_monthly.csv # 24 counties × 60 months
│ │ ├── storm_events_MD_2020_2024.csv
│ │ ├── weather_clean.parquet / .csv
│ │ └── eia_clean.csv
│ └── external/ # Reference / lookup tables
│ ├── zip_county_crosswalk.csv
│ ├── census_acs_county_md.csv
│ ├── eia_annual_md_2020_2024.csv
│ ├── md_tree_canopy_county.csv
│ ├── md_datacenters_raw.csv # Scraped from datacentermap.com
│ ├── md_datacenters.csv # Curated facility list with MW estimates
│ └── md_datacenters_county_annual.csv # County-year DC load estimates
├── notebooks/
│ ├── 01_data_cleaning.ipynb ← start here
│ ├── 02_feature_engineering.ipynb
│ ├── 03_eda.ipynb
│ ├── 04_modeling.ipynb
│ └── 05_datacenter_analysis.ipynb ← AI/data center energy impact & counterfactual
├── src/
│ ├── fetch_eaglei_md.py
│ ├── fetch_noaa_storm_events.py
│ ├── fetch_crosswalk.py
│ ├── fetch_eia_annual.py
│ ├── fetch_datacentermap.py ← scrapes datacentermap.com MD listings
│ ├── build_datacenter_csv.py ← builds curated md_datacenters.csv
│ ├── data_cleaning.py ← cleaning pipeline (mirrors notebook 01)
│ ├── feature_engineering.py ← builds model_features.csv
│ ├── eda.py ← EDA charts
│ ├── modeling.py ← RF + XGBoost training & SHAP
│ ├── datacenter_analysis.py ← AI/data-center counterfactual
│ └── energy_estimator.py ← LBNL-based MW → MWh estimates
├── models/
│ ├── rf_best.pkl ← trained Random Forest
│ ├── xgb_best.pkl ← trained XGBoost
│ └── best_model_name.txt
├── dashboard/
│ └── app.py ← Streamlit dashboard
├── reports/
│ ├── md_power_grid_poster.pptx ← final symposium poster
│ ├── poster_md_energy(2).pdf ← printed PDF
│ ├── midterm_progress_presentation.pptx
│ ├── charts/ ← EDA + poster figures
│ │ └── poster/ ← print-resolution poster charts
│ ├── shap_summary.png, shap_bar.png, shap_waterfall_*.png
│ └── POSTER_PLAN.md
├── docs/superpowers/ ← ML pipeline specs & plans
├── requirements.txt
└── README.md
| Dataset | Source | Granularity | Status |
|---|---|---|---|
| Eagle-I Power Outages | Brelsford et al. (2023), Figshare 10.6084/m9.figshare.24237376 | 15-min county, aggregated monthly | ✅ Complete |
| NOAA Storm Events | NOAA Storm Events Database | County, per event | ✅ Complete |
| Maryland Weather | NOAA GHCN | Daily station → monthly | ✅ Complete |
| EIA Form 861 Sales | EIA.gov | Annual, MD utilities | ✅ Complete |
| Census ACS 5-year | Census Bureau API | Annual, county | ✅ Complete |
| ZIP-to-County Crosswalk | Census Bureau | — | ✅ Complete |
| Tree Canopy / Forest Cover | Chesapeake Conservancy / UMD (2022) | County, 2018 imagery | ✅ Complete |
| MD Data Center Directory | datacentermap.com (scraped Mar 2026) | Facility-level (41 sites) | ✅ Complete |
| MD Data Center Energy Estimates | LBNL 2023 benchmarks + press releases | County-year, 2020–2024 | ✅ Complete (estimates) |
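The data center energy estimates in the last row convert facility capacity (MW) into annual energy (MWh). The sketch below shows the basic arithmetic only: energy = capacity × average load factor × hours in the year. It is not the code in src/energy_estimator.py. The function names, the dictionary keys, and any load factor passed in are hypothetical, and the real benchmark values would come from the LBNL report.

```python
import calendar
from collections import defaultdict


def hours_in_year(year: int) -> int:
    """8,784 hours in a leap year (2020, 2024), otherwise 8,760."""
    return 24 * (366 if calendar.isleap(year) else 365)


def annual_mwh(capacity_mw: float, load_factor: float, year: int) -> float:
    """Energy (MWh) = capacity (MW) x average load factor x hours in the year."""
    if not 0.0 <= load_factor <= 1.0:
        raise ValueError("load_factor must be between 0 and 1")
    return capacity_mw * load_factor * hours_in_year(year)


def county_year_totals(facilities):
    """Sum per-facility estimates into {(county, year): MWh}.

    Each facility is a dict with hypothetical keys:
    county, year, capacity_mw, load_factor.
    """
    totals = defaultdict(float)
    for f in facilities:
        key = (f["county"], f["year"])
        totals[key] += annual_mwh(f["capacity_mw"], f["load_factor"], f["year"])
    return dict(totals)
```

Because the load factor is an assumption rather than a measurement, totals produced this way are best read as estimates, which matches the "(estimates)" status in the table.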
- Which counties and times of year are most at risk for power outages?
- How do storm severity and temperature extremes correlate with outage frequency?
- What is the relationship between customers affected and population density?
- How did COVID-era behavioral changes affect energy usage and outage risk?
- Did AI data center growth (2023–2024) measurably shift county-level demand or outage patterns?
- Models: Random Forest (primary) + XGBoost (comparison), trained via src/modeling.py / notebooks/04_modeling.ipynb
- Target: log(outage_events + 1), the monthly county-level outage intensity
- Train/test split: 2020–2023 train, 2024 holdout
- Key features: temperature extremes (CDD/HDD), storm severity, era dummies, population density, tree canopy %, prior-month outage lag, rolling 3-month average, same-month-prior-year, cyclical month encoding
- Interpretability: SHAP summary, bar, dependence, and waterfall plots in reports/
- Artifacts: serialized models in models/ (rf_best.pkl, xgb_best.pkl)
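Several of the items above (the log target, the history-based features, the cyclical month encoding, and the year-based holdout) can be illustrated with a small standard-library sketch. It is not the code in src/feature_engineering.py. The function names, the output keys, and the choice to build the rolling average from the three prior months only (so the current month's target does not leak into its own features) are assumptions.

```python
import math


def cyclical_month(month: int) -> tuple[float, float]:
    """Map month 1..12 onto the unit circle so December sits next to January."""
    angle = 2 * math.pi * (month - 1) / 12
    return math.sin(angle), math.cos(angle)


def build_rows(monthly_events):
    """Build one feature dict per county-month.

    monthly_events: {(county, year, month): outage_events}.
    History-based features are None when the earlier months are missing.
    """
    rows = []
    for (county, year, month), events in sorted(monthly_events.items()):
        idx = year * 12 + (month - 1)  # consecutive month index

        def past(k):
            # Outage count k months before the current row, or None.
            y, m = divmod(idx - k, 12)
            return monthly_events.get((county, y, m + 1))

        window = [past(k) for k in (1, 2, 3)]
        month_sin, month_cos = cyclical_month(month)
        rows.append({
            "county": county, "year": year, "month": month,
            "target": math.log1p(events),  # log(outage_events + 1)
            "lag_1": past(1),
            "roll_3": sum(window) / 3 if None not in window else None,
            "same_month_prior_year": past(12),
            "month_sin": month_sin, "month_cos": month_cos,
        })
    return rows


def split_by_year(rows, holdout_year=2024):
    """Train on years before `holdout_year`, test on `holdout_year` itself."""
    train = [r for r in rows if r["year"] < holdout_year]
    test = [r for r in rows if r["year"] == holdout_year]
    return train, test
```

Splitting by year rather than at random keeps the 2024 holdout strictly in the future relative to the training data, which matters when lag features carry information from earlier months.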
End-to-end reproduction (after setup):
jupyter notebook notebooks/02_feature_engineering.ipynb # builds data/processed/model_features.csv
jupyter notebook notebooks/03_eda.ipynb # generates EDA charts
jupyter notebook notebooks/04_modeling.ipynb # trains models + SHAP
jupyter notebook notebooks/05_datacenter_analysis.ipynb # data-center counterfactual
streamlit run dashboard/app.py
The dashboard surfaces county-level outage trends, era comparisons, weather/storm drivers, the data-center growth story, and live model predictions with SHAP explanations.
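Because the models are trained on log(outage_events + 1), a raw prediction is on the log scale and needs to be inverted before it is shown as an outage count. The sketch below shows that inversion. Whether dashboard/app.py already does this internally is not stated here, and the helper name is hypothetical.

```python
import math


def to_outage_events(log_prediction: float) -> float:
    """Invert log(outage_events + 1); clamp small negative results to zero."""
    return max(0.0, math.expm1(log_prediction))
```

For example, a prediction of `math.log1p(9)` maps back to roughly 9 outage events, and any negative prediction is reported as 0.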
The final poster lives at reports/md_power_grid_poster.pptx (PDF export:
reports/poster_md_energy(2).pdf). Print-resolution figures are in
reports/charts/poster/. The poster narrative is documented in
reports/POSTER_PLAN.md.
- Brelsford et al. (2023) — Eagle-I Recorded Electricity Outages 2014–2022 — Figshare
- EIA Electricity Data
- NOAA Storm Events Database
- Census Bureau ACS
- Chesapeake Conservancy — Technical Study on Changes in Forest Cover and Tree Canopy in Maryland (2022)
- datacentermap.com — Maryland Data Centers (scraped March 2026)
- Shehabi, A. et al. (2023) — 2022 United States Data Center Energy Usage Report — Lawrence Berkeley National Laboratory
- IEA (2024) — Data Centres and Data Transmission Networks — International Energy Agency