Electrical Load Forecasting & Grid Analytics Framework

Quick Start (< 60 Seconds)

Install Dependencies

[!NOTE] Requirements: Python 3.10 is the tested version. The CI/CD pipeline runs on CPU only; a GPU is optional for LSTM training.
```
pip install -r requirements.txt
```
Verify Installation (Sample Mode)

Runs the full pipeline on a tiny, self-contained sample dataset. No external ICED data required.
```
python main.py --sample
```
Run Full Pipeline (Real Data)

Requires raw ICED datasets in data/raw/.
```
python main.py all
```

Project Overview

This repository implements a production-grade forecasting and analytics pipeline for electrical power systems. Using real-world data from the Intelligent Climate & Energy Database (ICED) by NITI Aayog, the project focuses on accurate long-term demand forecasting, short-term peak stress analysis, and grid capacity utilization metrics.

The core objective is to compare statistical (SARIMA) and deep learning (LSTM) approaches for modeling complex load profiles while enforcing strict software engineering standards for reproducibility and scalability.

Why This Project Matters

Accurate load forecasting is the backbone of modern power grid operations. As grids integrate more renewable energy sources and face increasing demand variability, the cost of forecasting errors rises significantly.

Under-forecasting risks blackouts and grid instability during peak hours.
Over-forecasting leads to wasted generation capacity and higher operational costs.

This project provides a rigorous framework for benchmarking forecasting models and analyzing critical grid characteristics like the Load Duration Curve (LDC).

Datasets Used

All data is sourced from NITI Aayog's ICED portal.

Yearly Hourly National Demand
- Contains hourly power demand (MW) for a full year.
- Used for training long-term forecasting models (SARIMA, LSTM).
Peak-Day Hourly Demand
- High-resolution hourly profiles for specific "stress days" (historical peak load days).
- Used to validate the model's ability to handle extreme events.
Load Duration Curve (LDC)
- Ranks demand levels from highest to lowest against the share of the year during which each level is met or exceeded.
- Used to calculate Base Load vs. Peak Load requirements.

Methodology

1. Data Engineering

The pipeline begins with a robust ETL (Extract, Transform, Load) layer.

Schema Validation: Enforces strict column type checks to reject malformed data immediately.
Continuity Checks: Identifies missing hourly timestamps and re-indexes the series.
Imputation: Uses time-based interpolation to fill gaps without introducing look-ahead bias.
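
The three checks above can be sketched in pandas as follows. This is a minimal illustration, not the repository's `data_loader.py`: the column names `timestamp` and `demand_mw` and the function name `validate_and_fill` are assumptions made for the example.

```python
import pandas as pd

EXPECTED_COLUMNS = ["timestamp", "demand_mw"]  # assumed schema, not the repo's


def validate_and_fill(df: pd.DataFrame):
    """Validate the schema, restore missing hourly stamps, interpolate gaps."""
    # Schema validation: fail fast on unexpected columns or dtypes.
    if list(df.columns) != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {list(df.columns)}")
    if not pd.api.types.is_datetime64_any_dtype(df["timestamp"]):
        raise TypeError("timestamp must be a datetime column")
    if not pd.api.types.is_numeric_dtype(df["demand_mw"]):
        raise TypeError("demand_mw must be numeric")

    series = df.set_index("timestamp")["demand_mw"].sort_index()

    # Continuity check: compare against a complete hourly index.
    full_index = pd.date_range(
        series.index.min(), series.index.max(), freq=pd.Timedelta(hours=1)
    )
    n_missing = len(full_index.difference(series.index))

    # Re-index (missing hours become NaN), then interpolate along the time axis.
    filled = series.reindex(full_index).interpolate(method="time")
    return filled, n_missing


# Toy input with the 02:00 and 03:00 readings missing.
raw = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2023-01-01 00:00", "2023-01-01 01:00", "2023-01-01 04:00"]
        ),
        "demand_mw": [100.0, 110.0, 140.0],
    }
)
clean, n_missing = validate_and_fill(raw)
print(n_missing, clean.tolist())
```

Linear interpolation draws on the observation that follows each gap, so applying it separately inside each chronological split is one way to keep test-period values out of the training series.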

2. SARIMA Modeling

We implement Seasonal AutoRegressive Integrated Moving Average (SARIMA) as the statistical baseline.

Seasonality: Captures daily (24h) and weekly (168h) cycles.
Optimization: Uses parallel execution (Joblib) to perform a grid search for optimal (p,d,q)x(P,D,Q,s) hyperparameters based on validation set MAPE.

3. LSTM Modeling

We employ Long Short-Term Memory (LSTM) networks to capture non-linear temporal dependencies.

Architecture: Stacked LSTM layers with Dropout for regularization.
Vectorization: Input sequence generation is fully vectorized using NumPy stride tricks, offering significant speedups over iterative methods.
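
As an illustration of the vectorized approach, the sketch below builds supervised (input window, target) pairs with `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20+). The function name `make_sequences` and the `lookback`/`horizon` parameters are assumptions for this example; the repository may use a different stride-tricks call.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_sequences(series, lookback, horizon=1):
    """Return X of shape (n, lookback, 1) and y of shape (n, horizon)."""
    series = np.asarray(series, dtype=np.float32)
    # Each row is one window of lookback inputs followed by horizon targets.
    windows = sliding_window_view(series, lookback + horizon)
    X = windows[:, :lookback, np.newaxis]  # add a feature axis for the LSTM
    y = windows[:, lookback:]
    return X, y


demo = np.arange(10)
X, y = make_sequences(demo, lookback=3)
print(X.shape, y.shape)
```

The returned arrays are read-only views over the original buffer, so no window is copied; call `.copy()` on them if a downstream step needs to write in place.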

4. Peak Day Analysis

A focused module that isolates the single highest-demand day of the year.

Trains a short-horizon model on data preceding the peak event.
Quantifies the "Peak Error %" to measure safety margins for grid planning.
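
The sketch below shows one way to isolate the peak day and score a forecast against it. Two things here are assumptions rather than the repository's definitions: "Peak Error %" is taken to be the absolute difference between forecast and actual daily maxima relative to the actual maximum, and a previous-day persistence forecast stands in for the short-horizon model.

```python
import numpy as np
import pandas as pd


def find_peak_day(demand: pd.Series) -> pd.Timestamp:
    """Calendar day (midnight timestamp) containing the highest hourly demand."""
    return demand.idxmax().normalize()


def split_at_peak_day(demand: pd.Series):
    """Return (history strictly before the peak day, actuals on the peak day)."""
    peak_day = find_peak_day(demand)
    next_day = peak_day + pd.Timedelta(days=1)
    history = demand[demand.index < peak_day]
    actual = demand[(demand.index >= peak_day) & (demand.index < next_day)]
    return history, actual


def peak_error_pct(actual, forecast):
    """Assumed definition: |max(forecast) - max(actual)| / max(actual) * 100."""
    actual_peak = float(np.max(actual))
    forecast_peak = float(np.max(forecast))
    return abs(forecast_peak - actual_peak) / actual_peak * 100.0


# Three toy days of hourly data; the third day is scaled up to be the peak.
idx = pd.date_range("2023-06-01", periods=72, freq=pd.Timedelta(hours=1))
hours = np.arange(72) % 24
scale = np.repeat([1.0, 1.0, 1.25], 24)
demand = pd.Series((100.0 + hours) * scale, index=idx)

history, actual = split_at_peak_day(demand)
forecast = history.iloc[-24:].to_numpy()  # persistence stand-in for the model
error = peak_error_pct(actual.to_numpy(), forecast)
print(find_peak_day(demand).date(), round(error, 2))
```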

5. Load Duration Curve Analytics

Analyzing the LDC allows us to segment the demand into:

Base Load: The minimum load present throughout the year (typically met by coal/nuclear).
Peak Load: The maximum load seen only for a few hours (met by gas peakers/hydro).
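
A load duration curve and the base/peak split described above can be computed in a few lines of NumPy. The function names and the five-value demand array below are illustrative assumptions, not repository code or ICED data.

```python
import numpy as np


def load_duration_curve(demand_mw):
    """Sort demand in descending order against the percent of time exceeded."""
    sorted_load = np.sort(np.asarray(demand_mw, dtype=float))[::-1]
    n = sorted_load.size
    pct_time = np.arange(1, n + 1) / n * 100.0
    return pct_time, sorted_load


def base_peak_summary(demand_mw):
    """Base load = minimum demand, peak load = maximum demand."""
    load = np.asarray(demand_mw, dtype=float)
    base, peak = float(load.min()), float(load.max())
    return {"base_mw": base, "peak_mw": peak, "base_to_peak": base / peak}


toy_demand = [60.0, 80.0, 100.0, 55.0, 90.0]  # toy values in MW
pct_time, sorted_load = load_duration_curve(toy_demand)
summary = base_peak_summary(toy_demand)
print(sorted_load.tolist(), summary)
```

Plotting `sorted_load` against `pct_time` gives the curve itself; the gap between the base and peak values is the capacity that is needed for only part of the year.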

Model Performance Summary

| Model | RMSE | MAPE (%) | Notes |
|--------|------|----------|-------|
| LSTM | Low | ~1.06% | Best performance, captures non-linearity well. |
| SARIMA | High | ~2.33% | Good baseline but struggles with complex patterns. |

Note: The LSTM model consistently outperformed SARIMA on the test set, demonstrating the value of deep learning for complex time-series data.
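
For reference, the two metrics in the table follow their standard definitions, sketched below in NumPy. The function names are assumptions and may not match `src/metrics.py`; the two-point input is a toy example, not project output.

```python
import numpy as np


def rmse(actual, forecast):
    """Root mean squared error, in the units of the series (MW here)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Undefined if actual has zeros."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100.0)


toy_actual = [100.0, 200.0]
toy_forecast = [110.0, 190.0]
toy_rmse = rmse(toy_actual, toy_forecast)
toy_mape = mape(toy_actual, toy_forecast)
print(toy_rmse, toy_mape)
```

RMSE penalizes large misses more heavily and is scale-dependent, while MAPE is scale-free, which is why the two are usually reported together.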

Key Visualizations

1. LSTM Forecast vs Actual

Visual proof of the LSTM model's ability to track demand (Purple: Actual, Dashed: Forecast).

⚠️ Note: This plot is generated using the lightweight sample dataset for CI and reproducibility verification. Error values shown here are not representative of real-world model performance.

2. Peak Day Stress Test

Forecasting the single highest demand day of the year.

⚠️ Note: This plot is generated using the lightweight sample dataset for CI and reproducibility verification. Error values shown here are not representative of real-world model performance.

3. Load Duration Curve (LDC)

Illustrating the grid's capacity utilization.

⚠️ Note: This plot is generated using the lightweight sample dataset for CI and reproducibility verification. Error values shown here are not representative of real-world model performance.

Additional Insights:

Peak-Day Forecast Error: ~2.0% (Indicates high reliability during stress events).
National Base Load: Approximately 55% of Peak Load.

Project Structure

.
├── data/
│   ├── Raw/            # Immutable source Excel files
│   └── Processed/      # Cleaned and validated CSVs
├── src/
│   ├── models/
│   │   ├── sarima.py   # Statistical forecasting pipeline
│   │   ├── lstm.py     # Deep learning forecasting pipeline
│   │   ├── peak_day.py # Peak event analysis
│   │   └── ldc.py      # LDC analytics
│   ├── data_loader.py  # ETL and validation logic
│   ├── metrics.py      # Standardized evaluation metrics
│   └── visualization.py  # Plotting utilities
├── plots/              # Generated reports and figures
├── main.py             # CLI Entry point
└── requirements.txt    # Project dependencies

How to Run

1. Setup Environment

Ensure you have Python 3.8+ installed (3.10 is the tested version). A virtual environment is recommended.

pip install -r requirements.txt

2. Run Full Pipeline

To execute the ETL process, train all models, and generate the comparison report:

python main.py all

3. Run Specific Modules

You can also run individual components of the pipeline:

python main.py lstm       # Train and evaluate LSTM
python main.py sarima     # Train and evaluate SARIMA
python main.py peak_day   # Run peak day analysis
python main.py ldc        # Generate Load Duration Curve

Reproducibility & Determinism

This project enforces determinism to ensure results can be replicated.

Random seeds are fixed for NumPy (np.random.seed) and TensorFlow (tf.random.set_seed) in src/config.py.
Data splitting uses strictly chronological cutoffs (no random shuffling of time-series).
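
The two practices above can be sketched as follows. The seed value, the split fractions, and the function names are assumptions for illustration, not the contents of `src/config.py`; the TensorFlow call is left as a comment so the example runs with NumPy alone.

```python
import random

import numpy as np

SEED = 42  # hypothetical value


def set_seeds(seed=SEED):
    """Fix the random state of the standard library and NumPy."""
    random.seed(seed)
    np.random.seed(seed)
    # With TensorFlow installed, tf.random.set_seed(seed) would be called here.


def chronological_split(series, train_frac=0.7, val_frac=0.15):
    """Split in time order with no shuffling: train, then validation, then test."""
    n = len(series)
    train_end = int(round(n * train_frac))
    val_end = train_end + int(round(n * val_frac))
    return series[:train_end], series[train_end:val_end], series[val_end:]


set_seeds()
first_draw = np.random.rand(3)
set_seeds()
second_draw = np.random.rand(3)  # identical to first_draw after re-seeding

data = np.arange(100)
train, val, test = chronological_split(data)
print(len(train), len(val), len(test))
```

Because every validation index is later than every training index, and every test index later than both, no future observation can leak into an earlier split.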

Key Observations

The LSTM model tracks hourly variation more closely but requires more computational resources for training.
SARIMA provides interpretable components (trend/seasonality) but is slower at inference over long horizons because its forecasts are generated recursively.
The Load Duration Curve shows that base load is roughly 55% of peak load, so the remaining ~45% of peak capacity is needed for only part of the year, highlighting the economic challenge of sizing grid infrastructure for peak demand.

Future Work

Multi-Region Modeling: Extending the pipeline to forecast demand for specific regional grids (North, South, East, West).
Renewables Integration: Incorporating solar/wind generation profiles as exogenous variables.
Probabilistic Forecasting: Moving beyond point forecasts to provide prediction intervals and quantile forecasts (e.g., p90, p95) for better risk management.

Disclaimer

This project is for academic and research purposes. The datasets are property of NITI Aayog / ICED. While the code strives for accuracy, these forecasts should not be used for critical real-time grid operations without further validation.

Connect

Connect on LinkedIn
