This is the official implementation of "MMPD: Diverse Time Series Forecasting via Multi-Mode Patch Diffusion Loss", published at ICLR 2026.
In this work, we point out that conventional MSE-centric training implicitly assumes the future follows an independent Gaussian distribution, an assumption that is oversimplified for real-world data, especially when similar historical patterns can lead to multiple diverse outcomes. The loss function, rather than the model architecture, thus becomes the bottleneck of deep time series forecasting. To bridge this gap, we propose the MMPD loss, which uses a diffusion process to model complex distributions and generates multiple diverse predictions with associated probabilities through a multi-mode inference algorithm.
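Concretely, minimizing MSE is equivalent to maximum-likelihood training under an independent (isotropic) Gaussian assumption on the future:

```math
-\log p(\mathbf{y} \mid \hat{\mathbf{y}}) = \frac{1}{2\sigma^2}\,\lVert \mathbf{y} - \hat{\mathbf{y}} \rVert_2^2 + \text{const}, \qquad \mathbf{y} \sim \mathcal{N}(\hat{\mathbf{y}},\, \sigma^2 \mathbf{I})
```

so the optimal MSE prediction is the conditional mean, which averages over distinct plausible futures rather than representing any single one of them.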

MSE (left) vs. MMPD (right) using the same backbone. MSE produces a single ambiguous prediction, while MMPD generates multiple sharp predictions with associated probabilities.
Conventional approach (left) vs. loss from a broader view (right). The projector is part of the loss, allowing for more flexible loss design.
MMPD loss is based on a patch-based diffusion process, with a patch-consistent MLP as the denoiser. Unlike conventional MLPs that denoise each patch independently, conditioned only on its corresponding token, the patch-consistent MLP also takes adjacent noisy patches as conditions, ensuring consistency across patches.

Patch-Consistent MLP. To predict the noise in patch j, adjacent noisy patches centered around j (colored in red) are also included as conditions, ensuring consistency across denoised patches.
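As a minimal sketch (not the authors' implementation; all names and shapes here are hypothetical), the conditioning input of a patch-consistent denoiser for patch `j` can be assembled from its backbone token plus a window of adjacent noisy patches:

```python
import numpy as np

def build_denoiser_input(tokens, noisy_patches, j, window=1):
    """Conditioning vector for patch j: its backbone token concatenated
    with the noisy patches in a window centered around j.
    Out-of-range neighbors are zero-padded.

    tokens:        (num_patches, d_token) tokens from the backbone
    noisy_patches: (num_patches, patch_len) current noisy patches
    """
    num_patches, patch_len = noisy_patches.shape
    neighbors = [
        noisy_patches[k] if 0 <= k < num_patches else np.zeros(patch_len)
        for k in range(j - window, j + window + 1)
    ]
    return np.concatenate([tokens[j], *neighbors])

# A conventional per-patch denoiser would instead use only
# np.concatenate([tokens[j], noisy_patches[j]]) as input, so each
# patch would be denoised independently of its neighbors.
```

The resulting vector, of length `d_token + (2 * window + 1) * patch_len`, is then fed to a shared MLP that predicts the noise in patch `j`.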
Observing the multi-mode pattern in diffusion samples, we devise a multi-mode inference algorithm to extract multiple predictions with probabilities. It fits a variational Gaussian Mixture Model (GMM) with evolving priors alongside the reverse process, so that at the end of the reverse process, multi-mode predictions are obtained simultaneously with the diffusion samples.

The evolution of our multi-mode inference algorithm.
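As a simplified illustration of the idea (plain EM on the final samples with isotropic components, not the variational GMM with evolving priors used in the paper; all names are hypothetical), modes and their probabilities can be extracted from a set of diffusion samples like this:

```python
import numpy as np

def extract_modes(samples, k=2, iters=50):
    """Fit a k-component isotropic GMM to flattened forecast samples via
    EM, returning the mode means and their mixture weights (interpreted
    as the probability of each prediction).

    samples: (N, D) array, N diffusion samples of a length-D forecast
    """
    n, d = samples.shape
    # Farthest-point initialization of the k means
    chosen = [0]
    for _ in range(k - 1):
        dists = np.min(
            ((samples[:, None, :] - samples[chosen][None]) ** 2).sum(-1),
            axis=1,
        )
        chosen.append(int(np.argmax(dists)))
    means = samples[chosen].copy()
    weights = np.full(k, 1.0 / k)
    var = np.full(k, samples.var() + 1e-6)
    for _ in range(iters):
        # E-step: responsibilities under isotropic Gaussians
        d2 = ((samples[:, None, :] - means[None]) ** 2).sum(-1)  # (n, k)
        log_r = np.log(weights) - 0.5 * (d2 / var + d * np.log(var))
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and per-component variances
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ samples) / nk[:, None]
        d2 = ((samples[:, None, :] - means[None]) ** 2).sum(-1)
        var = (resp * d2).sum(axis=0) / (nk * d) + 1e-6
    return means, weights
```

In the paper's algorithm, the fit is instead updated alongside the reverse diffusion process, so the modes are available as soon as sampling finishes.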
- Install dependencies with `pip install -r requirements.txt`.
- Download datasets from Google Drive and place them in the `datasets` folder. Most datasets are widely used benchmarks in time series forecasting; the new `dynamic` dataset is the first 500K rows of the Dynamical System Dataset on Kaggle.
- To get the results for ETTh1 with lookback 336 and horizon 96, run `python main_mmpd.py --data ETTh1 --in_len 336 --out_len 96`. Results and checkpoints will be saved in `./out/results/` and `./out/checkpoints/`, respectively.
- To reproduce the results for all datasets and settings, run the bash scripts in `./scripts/`. For example, to reproduce the ETTh1 results, run `bash scripts/etth1.sh`.
- When running on your own datasets, inference may be time-consuming due to the diffusion process. You can set:
  - `--testing False` to skip testing and only train the model (it can be tested later);
  - `--testing True --prob_pred False` to get only the deterministic metrics MSE and MAE, as in conventional methods.
- To get multi-mode predictions, set `--testing True --prob_pred True`, and adjust `--sample_num`, `--num_sampling_steps`, and `--gmm_iterations` to balance inference time against performance.
We encourage the community to apply MMPD Loss to their own backbones or develop new losses based on the backbone–loss decoupling framework. A step-by-step tutorial is provided in `tutorials/` to show how to do this.
If you find our work useful, please consider citing:
```bibtex
@inproceedings{zhang2026mmpd,
  title={{MMPD}: Diverse Time Series Forecasting via Multi-Mode Patch Diffusion Loss},
  author={Yunhao Zhang and Wenyao Hu and Jiale Zheng and Lujia Pan and Junchi Yan},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=NEUgHT8dvH}
}
```