This repository presents a comprehensive time series analysis and forecasting project based on historical monthly sunspot observations. Sunspot activity is a well-studied natural phenomenon characterized by long-term cyclic behavior, making it a classic benchmark problem for time series modeling and forecasting.
The main objective of this project is to analyze the temporal structure of sunspot cycles and evaluate the performance of different forecasting approaches under a consistent and leakage-aware experimental setup. The workflow follows a standard data science pipeline, progressing from data understanding and exploratory analysis to feature construction and predictive modeling, with an emphasis on interpretability, temporal consistency, and fair model comparison.
- Model 1 – Linear Regression, Ridge & Lasso (Regularized Linear Baselines)
- Model 2 – Support Vector Regression (SVR, RBF Kernel)
- Model 3 – Random Forest Regressor
- Model 4 – LightGBM (Gradient Boosting Decision Trees)
- Model 5 – MLP Regressor (Baseline Neural Network)
- Model 5.1 – DNN (Advanced MLP, PyTorch)
- Model 6 – K-Nearest Neighbors (KNN, PyTorch – GPU Accelerated)
- Model 7 – Stacked LSTM (PyTorch)
Sunspots are temporary phenomena on the Sun’s photosphere that appear as dark spots compared to the surrounding areas. They are regions of reduced surface temperature caused by intense magnetic field activity that inhibits convection. Sunspots typically appear in pairs with opposite magnetic polarity, and their frequency varies according to an approximately 11-year solar cycle.
The dataset used in this project consists of monthly mean total sunspot numbers, representing the average number of observed sunspots per month over a long historical period. Due to its length and well-documented cyclical structure, this dataset is widely used in time series analysis, forecasting, and signal processing research.
- Frequency: Monthly
- Time span: January 1749 – August 2017
- Target variable: Monthly mean total sunspot number
The raw dataset is stored at:
data/raw/sunspots.csv
The sunspot dataset originates from the Solar Influences Data Analysis Center (SIDC), which is part of the Royal Observatory of Belgium and specializes in solar physics research. The data is publicly available and has been distributed through platforms such as Quandl.
- Primary source: Solar Influences Data Analysis Center (SIDC)
- Institution: Royal Observatory of Belgium
- Reference: https://en.wikipedia.org/wiki/Sunspot
The authors acknowledge SIDC and related data providers for making this long-term solar activity dataset openly accessible.
Before applying any transformations or modeling techniques, the dataset is inspected to ensure its suitability for time series analysis. This initial inspection focuses on verifying:
- Temporal continuity and correct chronological ordering
- Presence of missing, duplicated, or invalid values
- Overall scale, variability, and range of the target variable
Basic summary statistics and head–tail inspections confirm that the dataset is clean, numeric, and well-structured. No imputation or interpolation is required at this stage, allowing the analysis to proceed directly to exploratory data analysis without altering the original signal.
Exploratory Data Analysis is conducted to understand the temporal dynamics, statistical properties, and anomaly behavior of the sunspot time series. The analysis focuses on visual inspection, temporal dependencies, distributional characteristics, and the identification of unusual patterns that may influence forecasting performance.
All EDA figures referenced in this section are stored under the assets/ directory.
This visualization provides a global overview of monthly sunspot activity over time. The plot shows pronounced cyclic behavior with repeating peaks and troughs that align with known solar cycles. Variations in peak magnitudes across cycles suggest non-stationary variance, motivating the use of both trend-aware and nonlinear forecasting models.
The heat map visualizes sunspot intensity across months and years simultaneously. Vertical bands of high intensity reveal the temporal extent of solar maxima, while darker regions indicate prolonged solar minima. The lack of strong horizontal patterns confirms that intra-year seasonality is weak compared to long-term cyclic behavior.
The distribution of sunspot values is right-skewed, with a high density of low-activity observations and a long tail corresponding to strong solar cycles. Box plot analysis further highlights the presence of extreme values, which can disproportionately affect error metrics during evaluation.
The autocorrelation plot shows strong positive correlation across many lags, confirming long-term temporal dependence. This persistence indicates that past observations contain substantial predictive information even at distant time steps.
The lag-1 scatter plot demonstrates a nonlinear relationship between consecutive observations. While a general positive dependency exists, the dispersion increases at higher values, suggesting that linear assumptions may be insufficient for capturing the underlying dynamics.
Seasonal pattern analysis confirms that sunspot activity does not follow a strong monthly seasonal cycle. Instead, variability is dominated by multi-year oscillations, aligning with the physical nature of solar activity cycles.
The exponential moving average smooths short-term fluctuations while preserving long-term trends. This visualization highlights the gradual rise and fall of solar cycles and provides insight into the underlying signal structure without high-frequency noise.
Trend analysis reveals extended periods of increasing and decreasing sunspot activity. These slow-moving dynamics suggest that models capable of learning long-range dependencies may perform better than short-memory approaches.
ACF and PACF plots jointly indicate that the series exhibits both long-memory behavior and complex lag dependencies. Significant correlations at large lags motivate the use of multiple lag-based and window-based features in later stages.
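For reference, the lag-k autocorrelation underlying these plots can be computed directly. The sketch below uses NumPy on a toy series with a 132-month period standing in for the solar cycle (the project's figures presumably use statsmodels' `plot_acf` / `plot_pacf`):

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[: n - k], x[k:]) / denom for k in range(nlags + 1)])

# Long-memory toy series: a slow sine (period 132 months) plus noise
rng = np.random.default_rng(0)
t = np.arange(600)
series = np.sin(2 * np.pi * t / 132) + 0.2 * rng.standard_normal(600)

rho = acf(series, nlags=24)
print(rho[1])  # strong positive lag-1 autocorrelation, as in the sunspot series
```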
As expected for a monthly aggregated astronomical dataset, no meaningful weekday-based pattern is observed. This result serves as a sanity check and confirms correct temporal handling of the data.
This section focuses on identifying unusual or extreme patterns in sunspot activity that deviate from the expected temporal structure. Rather than treating anomalies as noise to be removed, the analysis aims to understand whether these extreme observations correspond to meaningful physical phenomena, such as periods of intense solar activity.
Anomaly detection is approached from both statistical decomposition and forecasting-based residual analysis, allowing anomalies to be examined from complementary perspectives.
Classical decomposition separates the series into trend, seasonal, and residual components. The residual component retains structured variance, suggesting that meaningful information remains beyond simple additive trend and seasonality modeling.
In this approach, anomalies are detected using a forecasting-residual strategy: a GRU model produces one-step-ahead predictions, and the absolute forecast error is monitored over time.
To avoid using a fixed global threshold (which would be unfair under changing cycle amplitudes), a dynamic threshold is constructed using an EWMA-smoothed error baseline. An observation is flagged as anomalous when its one-step GRU error rises above this smooth, adaptive threshold, capturing unexpected local deviations while remaining robust to regular solar-cycle behavior.
- Anomalies detected: 43 / 3133
- Anomaly rate: 1.372%
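A minimal sketch of this dynamic-threshold idea: an EWMA baseline of the forecast error plus an EWMA deviation band, with points flagged when the error escapes the band. The smoothing factor `alpha` and multiplier `k` are illustrative values, not the project's settings:

```python
import numpy as np

def ewma_threshold_flags(errors, alpha=0.1, k=3.0):
    """Flag points whose error exceeds an EWMA baseline plus k EWMA deviations."""
    errors = np.asarray(errors, dtype=float)
    baseline = np.empty_like(errors)
    dev = np.empty_like(errors)
    baseline[0], dev[0] = errors[0], 0.0
    for t in range(1, len(errors)):
        baseline[t] = alpha * errors[t] + (1 - alpha) * baseline[t - 1]
        dev[t] = alpha * abs(errors[t] - baseline[t]) + (1 - alpha) * dev[t - 1]
    return errors > baseline + k * dev

rng = np.random.default_rng(1)
errs = np.abs(rng.normal(0.0, 1.0, 500))  # stand-in for one-step GRU errors
errs[200] = 12.0                          # inject one abrupt error spike
flags = ewma_threshold_flags(errs)
print(flags[200])  # the injected spike is flagged
```

Because the baseline adapts over time, the same absolute error can be anomalous during a quiet minimum yet unremarkable near a cycle peak, which is exactly the fairness property motivated above.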
A Transformer-based forecaster is used to detect anomalies by analyzing where the model fails to predict abrupt local changes. Compared to GRU, the Transformer tends to be more sensitive to subtle and rapid irregularities, especially around cycle peaks and transition regions, where short-term dynamics can shift quickly.
Anomalies are flagged when the forecast error becomes unusually high relative to the typical error behavior, indicating that the observed pattern is difficult to explain with the learned temporal structure.
- Anomalies detected: 84 / 3133
- Anomaly rate: 2.681%
In Colab, the raw monthly sunspot series is converted into a supervised learning table using leakage-safe transformations (only past values are used via shift()).
- Input: data/raw/sunspots.csv
- Output: data/processed/sunspots_train_engineered.csv, data/processed/sunspots_test_engineered.csv
- Calendar: Year
- Cyclical month encoding: Month_sin = sin(2π · Month / 12), Month_cos = cos(2π · Month / 12)
- Lag features: Lag_1, Lag_2, Lag_3, Lag_6, Lag_12, Lag_132 (132 months ≈ one solar cycle)
- Rolling stats (computed on shifted series): Rolling_Mean_12, Rolling_Std_12, Rolling_Mean_132
- Trend signals: Diff_1, Expanding_Mean
After dropping NaN rows introduced by lag/rolling operations, the dataset is split chronologically with an 80/20 ratio to create train and test sets.
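These transformations can be sketched in pandas. The toy series and column names below mirror the feature list, but the exact schema of the project's CSVs is an assumption:

```python
import numpy as np
import pandas as pd

# Toy monthly series standing in for data/raw/sunspots.csv
idx = pd.date_range("1749-01-01", periods=400, freq="MS")
df = pd.DataFrame(
    {"SunspotNumber": np.abs(np.sin(np.arange(400) / 21.0)) * 150}, index=idx
)

# Calendar + cyclical month encoding
df["Year"] = df.index.year
month = df.index.month
df["Month_sin"] = np.sin(2 * np.pi * month / 12)
df["Month_cos"] = np.cos(2 * np.pi * month / 12)

# Leakage-safe lags: only past values, via shift()
for lag in (1, 2, 3, 6, 12, 132):
    df[f"Lag_{lag}"] = df["SunspotNumber"].shift(lag)

# Rolling/trend stats on the shifted series so the current value never leaks in
shifted = df["SunspotNumber"].shift(1)
df["Rolling_Mean_12"] = shifted.rolling(12).mean()
df["Rolling_Std_12"] = shifted.rolling(12).std()
df["Rolling_Mean_132"] = shifted.rolling(132).mean()
df["Diff_1"] = shifted.diff(1)
df["Expanding_Mean"] = shifted.expanding().mean()

# Drop NaN rows introduced by lag/rolling windows, then split chronologically
df = df.dropna()
split = int(len(df) * 0.8)
train, test = df.iloc[:split], df.iloc[split:]
print(len(train), len(test))
```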
The dataset is split chronologically into training and test sets using an 80/20 ratio, ensuring that future observations are never used during training.
The visualization highlights the clear temporal separation between the two partitions, mirroring real-world forecasting conditions and preventing information leakage across the split boundary.
All models are trained and evaluated under a shared, consistent experimental setup to ensure fair comparison.
- Input: Feature-engineered datasets created in the previous step (sunspots_train_engineered.csv, sunspots_test_engineered.csv)
- Target: SunspotNumber
- Split: Fixed chronological 80/20 train–test split
- Preprocessing: Applied only on training data (then reused on test data where required)
- Metrics: RMSE (primary), supported by visual prediction vs. ground-truth plots
- Reproducibility: Fixed random seeds where applicable
This setup guarantees that performance differences arise from model behavior, not from data handling or evaluation inconsistencies.
This section presents the modeling approaches evaluated in the project. A diverse set of models is considered, ranging from simple linear baselines to advanced nonlinear and deep learning architectures. All models are trained on the same feature-engineered dataset and evaluated under the experimental constraints described above.
These models provide interpretable linear baselines on the engineered feature set (lag features + rolling statistics). Since many predictors are highly correlated, regularization is required to stabilize coefficient estimates.
All linear models are trained on standardized features. Scaling is fitted only on the training set and then applied to the test set.
Xᵗʳᵃⁱⁿˢ = (Xᵗʳᵃⁱⁿ − μ) / σ
Xᵗᵉˢᵗˢ = (Xᵗᵉˢᵗ − μ) / σ
This step is mandatory because Ridge and Lasso penalties are scale-sensitive.
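In scikit-learn this train-only fitting looks like the following (synthetic data in place of the engineered features):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(50.0, 20.0, size=(200, 5))
X_test = rng.normal(50.0, 20.0, size=(50, 5))

scaler = StandardScaler().fit(X_train)  # μ and σ estimated on the train split only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)     # the same μ, σ reused on the test split
```

Fitting the scaler on the training split alone keeps the test period unseen, consistent with the leakage-aware setup described earlier.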
Ordinary Least Squares minimizes the residual sum of squares:
β̂ = arg min ‖ y − Xβ ‖²
OLS is simple and highly interpretable, but sensitive to multicollinearity among engineered features.
Ridge adds an ℓ₂ penalty to shrink coefficients smoothly:
β̂ʳⁱᵈᵍᵉ = arg min ( ‖ y − Xβ ‖² + α ‖ β ‖² )
• Reduces variance under correlated predictors
• Improves numerical stability
• Hyperparameter selected via 5-fold CV
Best Ridge parameter: α = 10.0
Lasso applies an ℓ₁ penalty, encouraging sparsity:
β̂ˡᵃˢˢᵒ = arg min ( ‖ y − Xβ ‖² + α ‖ β ‖₁ )
• Can set some coefficients exactly to zero
• Acts as implicit feature selection
Best Lasso parameter: α = 0.1
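The CV-based selection of α can be sketched as follows; the synthetic collinear features mimic correlated lags, and the α grid is illustrative rather than the project's exact search space:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic, strongly correlated regressors (like neighboring lag features)
rng = np.random.default_rng(0)
base = rng.normal(size=(300, 1))
X = base + 0.05 * rng.normal(size=(300, 6))  # 6 nearly identical columns
y = X @ np.ones(6) + 0.1 * rng.normal(size=300)

alphas = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}
ridge = GridSearchCV(Ridge(), alphas, cv=5).fit(X, y)
lasso = GridSearchCV(Lasso(max_iter=10_000), alphas, cv=5).fit(X, y)

print(ridge.best_params_, lasso.best_params_)
```

On such collinear inputs, Ridge spreads weight smoothly across the redundant columns, while Lasso tends to concentrate it on a subset, which is the implicit feature selection noted above.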
- Linear Regression: 22.7456
- Lasso: 22.7500
- Ridge: 22.7579
All three models produce nearly identical predictions, indicating that the engineered features already capture strong linear structure; regularization mainly affects coefficient stability rather than forecast shape.
Support Vector Regression (SVR) is used as a nonlinear baseline to capture smooth but complex relationships between engineered lag/rolling features and the target. We use the RBF kernel, which maps inputs into a higher-dimensional space and fits a maximum-margin regression function.
SVR is highly sensitive to feature scales, so the same StandardScaler fitted on the training split is reused:
Xᵗʳᵃⁱⁿˢ = scaler(Xᵗʳᵃⁱⁿ)
Xᵗᵉˢᵗˢ = scaler(Xᵗᵉˢᵗ)
SVR solves a regularized ε-insensitive regression problem:
minimize ½‖w‖² + C · Σ(ξᵢ + ξᵢ*)
subject to |yᵢ − f(xᵢ)| ≤ ε (with slack variables ξ)
Key hyperparameters:
- C: penalty for errors (higher C → tighter fit, higher variance risk)
- γ (gamma): RBF kernel width (smaller γ → smoother function)
- ε (epsilon): size of the “no-penalty” tube around the regression curve
A 3-fold GridSearchCV is performed over an expanded parameter grid:
- Kernel: rbf
- C: {50 … 3000}
- γ: {scale, 0.001 … 0.1}
- ε: {0.1 … 10}
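A scaled-down sketch of this search; the full 1080-candidate grid is abbreviated here and the toy data replaces the engineered features:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 2))
y = np.sin(X[:, 0]) * 10 + 0.5 * rng.normal(size=300)

# Illustrative subset of the grid described above
param_grid = {
    "kernel": ["rbf"],
    "C": [50, 500, 1500],
    "gamma": ["scale", 0.001, 0.1],
    "epsilon": [0.1, 1.0, 3.0],
}
search = GridSearchCV(
    SVR(), param_grid, cv=3, scoring="neg_root_mean_squared_error"
)
search.fit(X, y)
print(search.best_params_)
```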
Search cost:
- 1080 candidates × 3 folds = 3240 fits
- Training time: 1913.03s
Best parameters:
- C = 1500
- γ = 0.001
- ε = 3
- kernel = rbf
- SVR RMSE: 22.7760
- R²: 0.8934
The optimized SVR produces smooth predictions that follow the long-term cycle reasonably well, but it may still under-react to sharp local changes around extreme peaks due to the ε-insensitive loss and the smoothing effect of the RBF kernel.
Saved model:
V2-models/model_svr_optimized.pkl
Random Forest Regressor is evaluated as a strong nonlinear, tree-based ensemble model. By averaging predictions from many decision trees trained on bootstrapped samples, the model can capture complex feature interactions while remaining relatively robust to noise.
Unlike linear and kernel-based models, Random Forest does not require feature scaling, making it well-suited for the heterogeneous set of lag, rolling, and trend features used in this project.
To respect temporal ordering, hyperparameter tuning is performed using TimeSeriesSplit (3 folds) instead of random cross-validation. A narrow but expressive grid is explored:
- n_estimators: number of trees
- max_depth: tree depth (controls bias–variance trade-off)
- min_samples_leaf / min_samples_split: regularization via minimum node sizes
- max_features: fraction of features considered at each split
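A minimal sketch of the time-series-aware search; the grid is deliberately much smaller than the 432-combination search actually run, and the data is synthetic:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X[:, 0] * 3 + np.abs(X[:, 1]) + 0.1 * rng.normal(size=300)

# Illustrative subset of the tuned hyperparameters
param_grid = {
    "n_estimators": [50],
    "max_depth": [5, 25],
    "min_samples_leaf": [1, 3],
    "max_features": [0.5],
}
tscv = TimeSeriesSplit(n_splits=3)  # folds never train on future observations
search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid, cv=tscv)
search.fit(X, y)
print(search.best_params_)
```

Unlike shuffled K-fold, each `TimeSeriesSplit` fold validates strictly after its training window, matching the chronological evaluation used everywhere else in the project.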
Search details:
- 432 parameter combinations
- 3 time-series folds → 1296 fits
- Training time: 4577.15s
Best configuration:
- n_estimators: 400
- max_depth: 25
- max_features: 0.5
- min_samples_leaf: 3
- min_samples_split: 2
- RMSE: 22.8724
- R²: 0.8925
The Random Forest model produces stable predictions that follow the general structure of the sunspot cycles. However, its piecewise-constant nature limits its ability to smoothly track rapid transitions around extreme peaks.
Feature importance scores confirm that short-term temporal dependence dominates prediction performance:
Top contributing features:
- Lag_1 (0.46)
- Lag_2 (0.20)
- Rolling_Mean_12 (0.15)
- Lag_3 (0.08)
- Lag_6 (0.02)
This ranking highlights the critical role of recent history and short-term smoothing, while longer-horizon features contribute more subtly.
The trained model is saved under:
V2-models/model_random_forest.pkl
LightGBM is evaluated as a gradient boosting–based tree ensemble, which builds trees sequentially to correct the errors of previous models. Unlike Random Forest, which averages independent trees, LightGBM explicitly focuses on hard-to-predict regions of the feature space.
As a tree-based method, LightGBM does not require feature scaling and naturally handles the heterogeneous lag, rolling, and trend features used in this project.
Hyperparameter tuning is performed using TimeSeriesSplit (3 folds) to preserve temporal ordering. A compact but expressive grid is explored:
- learning_rate: boosting step size
- num_leaves: controls tree complexity
- max_depth: depth constraint
- colsample_bytree: feature subsampling ratio
Grid search summary:
- 36 parameter combinations
- 3 time-series folds → 108 fits
- Total search time: 58.09s
During training, early stopping is applied based on validation RMSE, automatically selecting the optimal number of boosting rounds and preventing overfitting.
Best configuration:
- learning_rate: 0.05
- num_leaves: 15
- max_depth: 10
- colsample_bytree: 0.8
- Best iteration: 123
- RMSE: 23.3378
- R²: 0.8881
Although LightGBM successfully captures nonlinear feature interactions and long-range dependencies (notably the solar-cycle-scale lag), its performance remains slightly below Random Forest and SVR in this setup. This suggests that, for this dataset, boosting-based models may overemphasize local corrections without consistently improving long-horizon generalization.
Feature importance scores reveal a balanced reliance on both short-term memory and long-cycle structure:
Top contributing features:
- Lag_1
- Lag_132 (≈ one full solar cycle)
- Rolling_Mean_12
- Expanding_Mean
- Lag_3
This confirms that gradient-boosted trees leverage information from both recent observations and long-term solar-cycle dynamics.
The trained model is saved under:
V2-models/model_lightgbm.pkl
A baseline feed-forward neural network is evaluated using MLPRegressor to test whether a simple nonlinear neural model can improve upon tree-based and kernel-based approaches on the engineered feature set.
Because MLPs are sensitive to feature scales, the same StandardScaler fitted on the training split is reused:
Xᵗʳᵃⁱⁿˢ = scaler(Xᵗʳᵃⁱⁿ)
Xᵗᵉˢᵗˢ = scaler(Xᵗᵉˢᵗ)
Hyperparameters are tuned with TimeSeriesSplit (3 folds) to preserve temporal ordering:
- 24 parameter combinations
- 3 folds → 72 fits
- Training time: 369.98s
To prevent overfitting, the MLP uses built-in early stopping with:
- validation_fraction = 0.1
- n_iter_no_change = 20
- tol = 1e-4
Best configuration:
- hidden_layer_sizes: (100, 100)
- activation: relu
- solver: adam
- alpha: 0.01
- learning_rate_init: 0.005
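The configuration above can be reproduced as a sketch; the data is synthetic and `max_iter` is an illustrative cap:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = np.tanh(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=400)

X_s = StandardScaler().fit_transform(X)  # MLPs need scaled inputs

mlp = MLPRegressor(
    hidden_layer_sizes=(100, 100),
    activation="relu",
    solver="adam",
    alpha=0.01,
    learning_rate_init=0.005,
    early_stopping=True,       # monitors an internal validation split
    validation_fraction=0.1,
    n_iter_no_change=20,
    tol=1e-4,
    max_iter=500,
    random_state=0,
)
mlp.fit(X_s, y)
print(mlp.n_iter_)             # epochs actually run before early stopping
```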
The training loss decreases smoothly across epochs, indicating stable optimization under the selected architecture and learning rate. Early stopping prevents unnecessary iterations once improvement stalls.
- RMSE: 23.6061
- R²: 0.8855
While the MLP captures some nonlinear relationships, its performance is weaker than the best classical baselines in this setup. This suggests that a shallow fully-connected network may struggle to represent the long-horizon temporal structure of sunspot cycles when trained on tabular engineered features.
The trained model is saved under:
V2-models/model_mlp.pkl
To overcome the limitations of the baseline MLPRegressor, an advanced feed-forward neural network is implemented in PyTorch. The goal is to increase representational capacity while keeping training stable and preventing overfitting on engineered tabular features.
The network is a deep MLP with progressive dimensional reduction:
- Linear(d → 128) + BatchNorm + ReLU + Dropout(0.3)
- Linear(128 → 64) + BatchNorm + ReLU + Dropout(0.2)
- Linear(64 → 32) + ReLU
- Linear(32 → 1)
This design improves stability (BatchNorm), reduces overfitting (Dropout), and allows richer nonlinear transformations than the shallow baseline MLP.
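A sketch of this architecture and its training setup in PyTorch; the input dimension 14 is a placeholder, not the project's exact feature count:

```python
import torch
import torch.nn as nn

class AdvancedMLP(nn.Module):
    """Deep MLP matching the layer plan above (128 -> 64 -> 32 -> 1)."""

    def __init__(self, d_in: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in, 128), nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(128, 64), nn.BatchNorm1d(64), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = AdvancedMLP(d_in=14)  # placeholder feature count
opt = torch.optim.AdamW(model.parameters(), lr=0.01, weight_decay=1e-4)
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, factor=0.5, patience=10)

out = model(torch.randn(8, 14))  # one forward pass on a dummy batch
print(out.shape)
```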
- Scaler: same StandardScaler fitted on the train split (reused here)
- Optimizer: AdamW (lr=0.01, weight_decay=1e-4)
- Loss: MSE
- LR Scheduler: ReduceLROnPlateau (factor=0.5, patience=10)
- Early stopping: patience = 40 (training stopped at epoch 82)
Note: The run was executed on CPU (device=cpu), but the code supports CUDA when available.
- RMSE: 22.4704
- R²: 0.8962
Compared to the baseline scikit-learn MLP, the PyTorch DNN achieves a clear improvement, indicating that deeper architectures with proper regularization and scheduling can better model the nonlinear structure of sunspot cycles even in a tabular feature setting.
Saved model:
V2-models/model_pytorch_mlp.pth
This model implements a custom K-Nearest Neighbors regressor in PyTorch, designed to run efficiently on GPU. Instead of learning parameters, the model performs prediction through distance-based retrieval and weighted aggregation.
The inference pipeline follows these steps:
1. Input tensor (X_test): test samples are transferred to GPU in batch form.
2. Parallel distance computation: pairwise distances between test samples and all training samples are computed using Manhattan distance via torch.cdist(p=1), producing a full N_test × N_train distance matrix.
3. Top-K neighbor selection: the indices of the K nearest neighbors are selected using torch.topk, enabling fast nearest-neighbor retrieval on GPU.
4. Inverse distance weighting: neighbor contributions are weighted as wᵢ = 1 / (dᵢ + ε), followed by normalization to ensure numerical stability.
5. Weighted aggregation: final predictions are obtained via a weighted sum of neighbor target values.
This architecture allows brute-force KNN to scale efficiently by leveraging GPU parallelism.
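The pipeline above condenses into a short PyTorch function. It runs on CPU here; moving the tensors to CUDA gives the GPU-accelerated version:

```python
import torch

def knn_predict(X_train, y_train, X_test, k=27, eps=1e-8):
    """Brute-force KNN regression with inverse-distance weighting.

    Runs on whatever device the tensors live on (CPU here, CUDA if moved).
    """
    d = torch.cdist(X_test, X_train, p=1)        # Manhattan distance matrix
    dist, idx = torch.topk(d, k, largest=False)  # K nearest neighbors
    w = 1.0 / (dist + eps)                       # inverse-distance weights
    w = w / w.sum(dim=1, keepdim=True)           # normalize for stability
    return (w * y_train[idx]).sum(dim=1)         # weighted aggregation

torch.manual_seed(0)
X_tr, y_tr = torch.randn(500, 6), torch.randn(500)
X_te = torch.randn(20, 6)
pred = knn_predict(X_tr, y_tr, X_te, k=5)
print(pred.shape)
```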
The number of neighbors (K) is selected empirically using a two-stage search strategy:
- Coarse search: K = 15 → 59
- Fine search: refinement around the best coarse value
Optimal configuration:
- Best K: 27
- RMSE: 24.39
- R²: 0.8778
The prediction plot shows that KNN can roughly follow the overall shape of sunspot cycles but tends to oversmooth peaks and troughs, limiting its ability to model long-horizon nonlinear dynamics.
Overall, despite GPU acceleration and weighted aggregation, the instance-based nature of KNN makes it less effective than tree-based and neural approaches for this dataset.
Among all models tested in this project, the PyTorch Stacked LSTM achieved the best overall forecasting performance on the held-out test period.
A 2-layer stacked LSTM is used to model temporal dynamics, followed by a small regression head:
- Input shape: (batch, seq_len=1, features)
- LSTM: 2 layers, hidden_dim = 128, internal dropout = 0.2
- Head: Linear(128→64) → ReLU → Dropout(0.1) → Linear(64→1)
Note: In this implementation, seq_len = 1 because the LSTM consumes the engineered feature vector as a single-step sequence tensor.
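A sketch of the described architecture; the input dimension 14 is a placeholder, not the project's exact feature count:

```python
import torch
import torch.nn as nn

class StackedLSTM(nn.Module):
    """2-layer LSTM (hidden 128) with the small regression head described above."""

    def __init__(self, d_in: int):
        super().__init__()
        self.lstm = nn.LSTM(
            d_in, 128, num_layers=2, dropout=0.2, batch_first=True
        )
        self.head = nn.Sequential(
            nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.1), nn.Linear(64, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)         # (batch, seq_len, 128)
        return self.head(out[:, -1])  # last time step feeds the regression head

model = StackedLSTM(d_in=14)          # placeholder feature count
x = torch.randn(8, 1, 14)             # seq_len = 1, as in the project
print(model(x).shape)
```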
- Device: CUDA (GPU)
- Loss: MSE
- Optimizer: AdamW (lr=0.001, weight_decay=1e-5)
- LR Scheduler: ReduceLROnPlateau (factor=0.5, patience=10)
- Early stopping: patience = 20 (stopped at epoch 39)
The combined figure reports both the learning dynamics (train/validation loss) and the final prediction vs. ground truth on the test period. Training converges rapidly in the first epochs, then stabilizes; early stopping prevents overfitting once validation loss stops improving.
- RMSE: 22.3363
- R²: 0.8975
The LSTM tracks both the long-term solar-cycle structure and short-term fluctuations more reliably than the tabular baselines, resulting in the strongest generalization among the evaluated models.
Saved model:
V2-models/model_pytorch_lstm.pth
To provide a clear and fair evaluation, all trained models are compared using RMSE on the same held-out test set. Lower RMSE indicates better forecasting accuracy.
| Model | RMSE |
|---|---|
| PyTorch LSTM | 22.3363 |
| PyTorch MLP (DNN) | 22.4704 |
| Linear Regression | 22.7456 |
| Lasso Regression | 22.7500 |
| Ridge Regression | 22.7579 |
| SVR (Optimized) | 22.7760 |
| Random Forest | 22.8724 |
| LightGBM | 23.3378 |
| MLP (sklearn) | 23.6061 |
| PyTorch KNN | 24.3876 |
- PyTorch Stacked LSTM achieves the best overall performance, confirming the advantage of sequence-aware neural architectures for modeling long-term solar cycles.
- Advanced PyTorch MLP performs competitively, outperforming most classical models despite operating on tabular features.
- Linear models (OLS, Ridge, Lasso) show surprisingly strong baselines due to effective feature engineering.
- Tree-based ensembles (Random Forest, LightGBM) capture nonlinearities but slightly underperform sequence models.
- KNN, even with GPU acceleration, struggles to generalize long-horizon temporal dynamics.
The results demonstrate that while strong feature engineering enables classical models to perform well, deep learning models with temporal inductive bias—especially LSTM—provide the most accurate and robust forecasts for sunspot time series data.
All experiments in this repository are conducted using a fixed train/test split and consistent preprocessing pipelines to ensure fair comparison across models. Trained models, feature-engineered datasets, and visualization assets are preserved to support reproducibility and transparent evaluation.
The project is structured to allow easy extension with alternative feature sets, forecasting horizons, or sequence-based architectures.
This repository is intended as a complete and self-contained time series forecasting study, suitable for academic review, portfolio demonstration, and further experimentation.