The foreign exchange (FX) market is one of the most liquid and data-rich environments in finance. Among currency pairs, EUR/CHF is particularly relevant due to its economic and geopolitical importance in Europe and Switzerland.
This project investigates whether short-term forecasting at horizon T+1 (one day ahead) is feasible using both classical statistical approaches and modern machine learning techniques.
The work follows a full end-to-end pipeline: from data collection and cleaning, to baseline models, statistical benchmarks, and advanced ML implementations, with a focus on financial interpretation.
The project is designed around two central goals:
- Forecasting objective: evaluate whether the daily EUR/CHF exchange rate returns can be predicted one day ahead.
- Methodological comparison: benchmark classical statistical models (ARIMA, GARCH, Holt-Winters) against machine learning models (Ridge Regression, Random Forest, XGBoost, Stacking).
The ultimate aim is not only to test predictive power but also to draw lessons on the limits of predictability in FX markets and the conditions under which models provide business value (e.g., risk sizing, volatility awareness).
The project follows a structured 9-part pipeline to ensure clarity and reproducibility:
- Data collection
  - Source: Yahoo Finance (`EURCHF=X`).
  - Period: 2015–2025 (≈ 2,800 business days).
- Data preparation
  - Compute log-returns for stationarity.
  - Perform chronological train/test split (2015–2022 for training, 2023–2025 for testing).
  - Create engineered features: lagged returns, rolling means, rolling volatilities.
- Modeling
  - Baselines: Naïve, Moving Average, Simple Exponential Smoothing (SES).
  - Statistical models: ARIMA, SARIMA, GARCH, Holt-Winters.
  - Machine Learning (ML) baselines: Ridge, Lasso.
  - Advanced ML: Random Forest, Gradient Boosting, XGBoost, LightGBM, Stacking & Blending.
- Evaluation
  - Metrics: RMSE, MAE (MAPE discarded due to near-zero returns).
  - Validation: expanding/rolling splits and walk-forward backtesting.
  - Diagnostics: residual analysis, volatility regime breakdown, feature importance.
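The preparation and validation steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/`: it uses only the standard library and a synthetic price path instead of the Yahoo Finance download. The function names, the lag count, the window length, the fractional split (the project splits by calendar year), and the fold sizes are all assumptions chosen for the example.

```python
import math
import random
import statistics

def log_returns(prices):
    """r_t = ln(P_t / P_{t-1}); the result has one fewer element than prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

def build_rows(returns, n_lags=3, window=5):
    """Pair features known at the close of day t with the T+1 return."""
    rows = []
    first = max(n_lags, window) - 1            # first day with a full history
    for t in range(first, len(returns) - 1):   # the last day has no T+1 target
        recent = returns[t - window + 1 : t + 1]
        rows.append({
            "t": t,
            "lags": [returns[t - k] for k in range(n_lags)],  # r_t, r_{t-1}, ...
            "roll_mean": statistics.fmean(recent),
            "roll_vol": statistics.stdev(recent),
            "target": returns[t + 1],
        })
    return rows

def chronological_split(rows, train_frac=0.8):
    """Split without shuffling so the test set lies strictly in the future."""
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]

def expanding_splits(n_rows, initial, step):
    """Yield (train_end, test_end): fit on [0, train_end), score on [train_end, test_end)."""
    train_end = initial
    while train_end < n_rows:
        test_end = min(train_end + step, n_rows)
        yield train_end, test_end
        train_end = test_end

# Synthetic price path standing in for EURCHF=X closes (not real data).
random.seed(0)
prices = [1.05]
for _ in range(300):
    prices.append(prices[-1] * math.exp(random.gauss(0, 0.003)))

returns = log_returns(prices)
rows = build_rows(returns)
train, test = chronological_split(rows)
folds = list(expanding_splits(len(train), initial=100, step=50))
print(len(returns), len(rows), len(train), len(test), folds)
```

Every feature in a row is computed from returns up to and including day `t`, while the target is the return at `t + 1`, so neither the features nor the split leak future information into training.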
The repository is organized to ensure clarity, modularity, and reproducibility:
eur_chf_forecasting/
│
├── data/
│ ├── raw/ # original downloaded data (Yahoo Finance, untouched)
│ ├── processed/ # cleaned datasets, train/test splits, engineered features
│
├── src/ # Python scripts (Part1.py … Part8.py)
│
│
│
├── results/
│ ├── baseline/ # outputs from Naïve, MA, SES
│ ├── stats/ # ARIMA, SARIMA, Holt-Winters results
│ ├── ml_baseline/ # Linear Regression, SES vs ML comparisons
│ ├── ml_advanced/ # RF, XGB, LGBM, Stacking, Blending
│ ├── final/ # consolidated evaluation (Part 8)
│
├── reports/
│ ├── Report.pdf # final report (exported from Word/LaTeX)
│ ├── figs/ # main figures (leaderboard, forecasts, residuals)
│
├── environment.yml # Conda environment for reproducibility
├── requirements.txt # alternative (pip) dependencies
├── LICENSE # license file (e.g., MIT)
└── README.md # project documentation (this file)
Key principles
- Clear separation between raw and processed data → ensures traceability.
- Scripts (`src/`) generate results reproducibly and save outputs in `results/`.
- Figures and final report are centralized in `reports/`.
- Environment files (`environment.yml`, `requirements.txt`) guarantee that anyone can recreate the same Python environment.
The analysis compared baselines, classical statistical models, and machine learning models on the EUR/CHF daily log-returns (2015–2025).
- Best overall models:
  - SES (Simple Exponential Smoothing) and ARIMA(2,0,1) achieved the lowest RMSE and MAE.
  - Both are simple, robust, and highly competitive benchmarks.
- Machine Learning (ML):
  - Linear Ridge performed on par with SES/ARIMA.
  - Random Forest and XGBoost captured non-linearities but did not significantly improve over statistical models.
  - Stacking (RF + GBDT with Ridge meta-model) matched SES/ARIMA but did not outperform them.
- Conclusion: In noisy FX markets, simple and robust models match or beat complex ones unless the latter are enriched with exogenous features (macro variables, volatility indices, calendar effects).
| Model | RMSE | MAE | Δ vs SES |
|---|---|---|---|
| SES | 0.003321 | 0.002506 | — |
| ARIMA (2,0,1) | 0.003322 | 0.002509 | ≈0% |
| Ridge | 0.003342 | 0.002515 | +0.5% |
| RandomForest | 0.003362 | 0.002528 | +1.1% |
| XGBoost | 0.003410 | 0.002556 | +2.5% |
| LightGBM | 0.003569 | 0.002676 | +7.3% |
| Stacking | 0.003324 | 0.002506 | ≈0% |
SES and ARIMA remain the best, with Ridge and Stacking as close challengers.
Complex models add little value without exogenous variables.
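To make the table's metrics concrete, here is a hedged sketch of how a one-step-ahead SES forecast could be scored with RMSE and MAE, and why MAPE is unsuitable for returns close to zero. It is not the project's implementation: the hand-written SES recursion, the smoothing factor `alpha=0.1`, and the synthetic returns are assumptions for illustration only, so the printed numbers will not match the table.

```python
import math
import random

def ses_one_step(series, alpha):
    """forecasts[t] predicts series[t] from series[:t] only (forecasts[0] is the seed level)."""
    level = series[0]
    forecasts = [level]
    for y in series[:-1]:
        level = alpha * y + (1 - alpha) * level   # update after observing y
        forecasts.append(level)                   # forecast for the next day
    return forecasts

def rmse(actual, pred):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mae(actual, pred):
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def mape(actual, pred):
    """Mean absolute percentage error; exact zeros in `actual` are skipped."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, pred) if a != 0]
    return 100 * sum(ratios) / len(ratios)

# Synthetic zero-mean returns (not the project's data); alpha is arbitrary.
random.seed(42)
returns = [random.gauss(0, 0.003) for _ in range(500)]
forecasts = ses_one_step(returns, alpha=0.1)
actual, pred = returns[1:], forecasts[1:]   # drop the seed forecast
print(f"RMSE={rmse(actual, pred):.6f}  MAE={mae(actual, pred):.6f}")

# Two days with almost the same absolute error but very different percentage errors:
tiny_actual, tiny_pred = [0.00001, 0.004], [0.001, 0.003]
print(f"MAE={mae(tiny_actual, tiny_pred):.6f}  MAPE={mape(tiny_actual, tiny_pred):.1f}%")
```

In the second example both days miss by about 0.001, yet the day whose actual return is nearly zero contributes a percentage error of roughly 9,900%, which dominates the average. RMSE and MAE stay on the scale of the returns themselves.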
The following figures are suggested to showcase the project results.
They are stored in /reports/figs/ and can be included in presentations or dashboards.
This figure shows the historical EUR/CHF daily exchange rate from 2015 to 2025.
It highlights periods of volatility and long-term stability, providing the context for forecasting challenges.
The histogram of log-returns reveals a sharp peak around zero and heavy tails.
This confirms that FX returns are non-normal and subject to extreme movements, making predictions difficult.
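One simple way to quantify the heavy tails seen in such a histogram is excess kurtosis, which is close to zero for a normal sample and positive for fat-tailed data. The sketch below is illustrative only: it compares a synthetic normal sample with a hypothetical two-regime mixture (calm days plus occasional volatile days) rather than the actual EUR/CHF returns, and the regime probabilities and volatilities are made up.

```python
import random

def excess_kurtosis(xs):
    """Fourth standardized moment minus 3; about 0 for a normal sample."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3

random.seed(1)
normal = [random.gauss(0, 0.003) for _ in range(5000)]
# Hypothetical heavy-tailed stand-in: 90% calm days, 10% volatile days.
mixed = [random.gauss(0, 0.008 if random.random() < 0.1 else 0.002)
         for _ in range(5000)]

print(f"normal sample: {excess_kurtosis(normal):.2f}")
print(f"mixed sample:  {excess_kurtosis(mixed):.2f}")
```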
This bar chart compares the out-of-sample RMSE of different models.
It illustrates that advanced ML models (Random Forest, XGBoost, Stacking) do not outperform classical baselines like SES and ARIMA: Stacking roughly matches them, while the tree-based models trail slightly.
This overlay shows predictions from the top-3 models compared to the actual EUR/CHF returns.
While short-term noise remains high, the models capture directional shifts and reduce overall forecast error.
These histograms display the residuals of the best models after forecasting.
They are centered around zero, but still show fat tails, underlining the inherent unpredictability of FX returns.
This repository is designed to ensure full reproducibility of results.
Two options are available depending on whether you want an exact or lightweight setup.
git clone https://github.com/EMen11/EUR_CHF_Forecasting.git
cd EUR_CHF_Forecasting
Option A: Exact reproduction (Conda). Recreates the full environment with all dependencies and versions.
conda env create -f environment.yml
conda activate forecast
Option B: Lightweight setup (Pip). Installs only the core libraries needed to run the pipeline.
pip install -r requirements.txt
Execute the scripts sequentially to reproduce all datasets, metrics, and figures:
python src/Part1.py
python src/Part2.py
python src/Part3.py
python src/Part4.py
python src/Part5.py
python src/Part6.py
python src/Part7.py
python src/Part8.py
Outputs
- Processed datasets → `/data/processed/`
- Model predictions & metrics → `/results/` (each chapter has its own folder + `figs/`)
- Curated figures for presentation → `/reports/figs/`
- Final consolidated report → `/reports/Report.pdf`
Note on reproducibility
- Use `environment.yml` for a guaranteed identical setup.
- Use `requirements.txt` for a clean minimal install.
- If you update dependencies, don’t forget to regenerate these files:
conda env export > environment.yml
pip freeze > requirements.txt



