This project aims to predict U.S. monthly energy prices using machine learning and time series forecasting techniques. The dataset is sourced from the U.S. Energy Information Administration (EIA). The project follows a structured approach, including data wrangling, exploratory data analysis (EDA), preprocessing, and modeling.
📂 springboard_dsc_capstone3
│-- 📂 data # Raw and processed datasets
│-- 📂 figures # Model forecast figures
│-- 📂 notebooks # Jupyter notebooks with analysis and model development
│-- 📂 scripts # Python scripts for converting data and defining functions
│-- Capstone Three Report.pdf # Project report with final findings
│-- glossary.pdf # Explanation from EIA of terms used in data
│-- model-metrics.pdf # Information on final models
│-- README.md # Project documentation
- Data is collected from the U.S. Energy Information Administration (EIA).
- Includes historical monthly energy prices and related economic indicators.
The project is divided into four main phases: data wrangling, exploratory data analysis (EDA), preprocessing, and modeling.
- Load and clean the dataset.
- Handle missing values and outliers.
- Visualize time series trends.
- Perform stationarity tests (ADF, KPSS).
- Examine correlations using heatmaps and scatter plots.
- Scale and normalize features.
- Create lag features for machine learning models.
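
The preprocessing steps above can be sketched with pandas as follows. This is an assumption-laden illustration: the lag set `(1, 2, 12)`, the helper names `make_lag_features` and `standardize`, and the synthetic `price` series are all hypothetical. Scaling statistics are taken from the training window only, so nothing from the test window leaks into the features.

```python
import numpy as np
import pandas as pd


def make_lag_features(series, lags=(1, 2, 12)):
    """Build a frame with the target `y` and one column per lag (hypothetical helper)."""
    frame = pd.DataFrame({"y": series})
    for lag in lags:
        # shift(lag) moves past values forward so row t holds the value at t - lag.
        frame[f"lag_{lag}"] = series.shift(lag)
    # The first max(lags) rows have no complete history and are dropped.
    return frame.dropna()


def standardize(train, test):
    """Z-score both frames using the mean and std of the training frame only."""
    mean = train.mean()
    std = train.std(ddof=0)
    return (train - mean) / std, (test - mean) / std


# Synthetic monthly series for illustration only (not EIA data).
index = pd.date_range("2015-01-01", periods=36, freq="MS")
price = pd.Series(np.arange(36, dtype=float), index=index, name="price")

features = make_lag_features(price)
X = features.drop(columns="y")

# Chronological split: the last 6 months are held out.
X_train, X_test = X.iloc[:-6], X.iloc[-6:]
X_train_scaled, X_test_scaled = standardize(X_train, X_test)
print(features.head())
```
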
- ARIMA/SARIMAX for traditional time series forecasting.
- VAR (Vector Autoregression) for multi-variable forecasting.
- Facebook Prophet for automated time series forecasting.
- Exponential Smoothing for trend-based forecasting.
- Cross-validation using one-step-ahead forecasting.
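
One-step-ahead cross-validation is often implemented as an expanding-window (walk-forward) loop: at each step the model sees only the observations before time `t` and forecasts `t`. The sketch below uses a hand-written simple exponential smoothing forecaster as a stand-in so that it runs with NumPy alone; it is not the project's SARIMAX, VAR, Prophet, or Exponential Smoothing code. The names `ses_forecast` and `walk_forward`, the smoothing factor, and the `min_train` size are assumptions.

```python
import numpy as np


def ses_forecast(history, alpha=0.5):
    """One-step-ahead simple exponential smoothing forecast (illustrative stand-in)."""
    level = history[0]
    for value in history[1:]:
        # The level is a weighted average of the newest value and the old level.
        level = alpha * value + (1 - alpha) * level
    return level


def walk_forward(series, forecaster, min_train=24):
    """Return expanding-window one-step-ahead forecasts and the matching actuals."""
    values = np.asarray(series, dtype=float)
    preds = []
    for t in range(min_train, len(values)):
        # Only observations strictly before t are visible to the forecaster.
        preds.append(forecaster(values[:t]))
    return np.array(preds), values[min_train:]


# Synthetic trend-plus-seasonality series for illustration only (not EIA data).
rng = np.random.default_rng(42)
months = np.arange(60)
series = 50 + 0.3 * months + 5 * np.sin(2 * np.pi * months / 12) + rng.normal(size=60)

preds, actuals = walk_forward(series, ses_forecast, min_train=24)
fold_mae = np.mean(np.abs(actuals - preds))
print(f"one-step-ahead MAE over {len(preds)} forecasts: {fold_mae:.3f}")
```

Any forecaster that maps a history array to a single next-step prediction can be passed in place of `ses_forecast`, for example a wrapper that refits a statsmodels model on each window.
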
- Mean Absolute Error (MAE)
- Root Mean Squared Error (RMSE)
- Akaike Information Criterion (AIC)
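
For reference, the three metrics above can be computed as in the following sketch. MAE and RMSE compare forecasts with actuals; AIC is defined as `2k - 2 ln(L)`, where `k` is the number of estimated parameters and `L` is the maximized likelihood, so it needs a fitted model's log-likelihood rather than forecasts. The function names and the toy numbers are illustrative and are not results from this project.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error: the average size of the forecast errors."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more heavily than MAE."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L); lower is better."""
    return 2 * n_params - 2 * log_likelihood


# Toy values for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 12.0]

print(mae(y_true, y_pred))   # 1.25
print(rmse(y_true, y_pred))  # about 1.658
print(aic(log_likelihood=-10.0, n_params=3))  # 26.0
```

Because AIC depends on the likelihood, it is only comparable between models fitted to the same data; MAE and RMSE are in the units of the target and can be compared across model families.
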
- The best-performing models were SARIMAX and Exponential Smoothing.
- Include additional economic indicators as exogenous variables.
- Experiment with deep learning models such as LSTMs.
- Improve feature engineering for better predictive power.
For any questions, feel free to reach out or open an issue in this repository.
Author: Ben Takacs