This is the official implementation of "MMPD: Diverse Time Series Forecasting via Multi-Mode Patch Diffusion Loss", published at ICLR 2026.
In this work, we point out that conventional MSE-centric training implicitly assumes the future follows an independent Gaussian distribution, an assumption that is oversimplified for real-world data, especially when similar historical patterns can lead to multiple diverse outcomes. The loss function, rather than the model architecture, thus becomes the bottleneck of deep time series forecasting. To bridge this gap, we propose the MMPD loss, which uses a diffusion process to model complex distributions and generates multiple diverse predictions with associated probabilities through a multi-mode inference algorithm.
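Concretely, minimizing MSE is equivalent to maximum-likelihood training under an independent (isotropic) Gaussian assumption on the future:

```math
-\log p(\mathbf{y} \mid \hat{\mathbf{y}}) = \frac{1}{2\sigma^2}\,\lVert \mathbf{y} - \hat{\mathbf{y}} \rVert_2^2 + \text{const}, \qquad \mathbf{y} \sim \mathcal{N}(\hat{\mathbf{y}},\, \sigma^2 \mathbf{I})
```

so the optimal MSE prediction is the conditional mean, which averages over distinct plausible futures rather than representing any single one of them.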

MSE (left) vs. MMPD (right) using the same backbone. MSE produces a single ambiguous prediction, while MMPD generates multiple sharp predictions with associated probabilities.
Conventional approach (left) vs. loss from a broader view (right). The projector is part of the loss, allowing for more flexible loss design.
MMPD loss is based on a patch-based diffusion process, with a patch-consistent MLP as the denoiser. Unlike conventional MLPs that denoise each patch independently, conditioned only on its corresponding token, the patch-consistent MLP also takes adjacent noisy patches as conditions, ensuring consistency across patches.

Patch-Consistent MLP. To predict the noise in patch j, adjacent noisy patches centered around j (colored in red) are also included as conditions, ensuring consistency across denoised patches.
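As a minimal sketch (not the authors' implementation; all names and shapes here are hypothetical), the conditioning input of a patch-consistent denoiser for patch `j` can be assembled from its backbone token plus a window of adjacent noisy patches:

```python
import numpy as np

def build_denoiser_input(tokens, noisy_patches, j, window=1):
    """Conditioning vector for patch j: its backbone token concatenated
    with the noisy patches in a window centered around j.
    Out-of-range neighbors are zero-padded.

    tokens:        (num_patches, d_token) tokens from the backbone
    noisy_patches: (num_patches, patch_len) current noisy patches
    """
    num_patches, patch_len = noisy_patches.shape
    neighbors = [
        noisy_patches[k] if 0 <= k < num_patches else np.zeros(patch_len)
        for k in range(j - window, j + window + 1)
    ]
    return np.concatenate([tokens[j], *neighbors])

# A conventional per-patch denoiser would instead use only
# np.concatenate([tokens[j], noisy_patches[j]]) as input, so each
# patch would be denoised independently of its neighbors.
```

The resulting vector, of length `d_token + (2 * window + 1) * patch_len`, is then fed to a shared MLP that predicts the noise in patch `j`.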
Observing the multi-mode pattern in diffusion samples, we devise a multi-mode inference algorithm to extract multiple predictions with probabilities. It fits a variational Gaussian Mixture Model (GMM) with evolving priors alongside the reverse process, so that at the end of the reverse process, multi-mode predictions are obtained simultaneously with the diffusion samples.

The evolution of our multi-mode inference algorithm.
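As a simplified illustration of the idea (plain EM on the final samples with isotropic components, not the variational GMM with evolving priors used in the paper; all names are hypothetical), modes and their probabilities can be extracted from a set of diffusion samples like this:

```python
import numpy as np

def extract_modes(samples, k=2, iters=50):
    """Fit a k-component isotropic GMM to flattened forecast samples via
    EM, returning the mode means and their mixture weights (interpreted
    as the probability of each prediction).

    samples: (N, D) array, N diffusion samples of a length-D forecast
    """
    n, d = samples.shape
    # Farthest-point initialization of the k means
    chosen = [0]
    for _ in range(k - 1):
        dists = np.min(
            ((samples[:, None, :] - samples[chosen][None]) ** 2).sum(-1),
            axis=1,
        )
        chosen.append(int(np.argmax(dists)))
    means = samples[chosen].copy()
    weights = np.full(k, 1.0 / k)
    var = np.full(k, samples.var() + 1e-6)
    for _ in range(iters):
        # E-step: responsibilities under isotropic Gaussians
        d2 = ((samples[:, None, :] - means[None]) ** 2).sum(-1)  # (n, k)
        log_r = np.log(weights) - 0.5 * (d2 / var + d * np.log(var))
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and per-component variances
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ samples) / nk[:, None]
        d2 = ((samples[:, None, :] - means[None]) ** 2).sum(-1)
        var = (resp * d2).sum(axis=0) / (nk * d) + 1e-6
    return means, weights
```

In the paper's algorithm, the fit is instead updated alongside the reverse diffusion process, so the modes are available as soon as sampling finishes.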
- Install dependencies with `pip install -r requirements.txt`.
- Download datasets from Google Drive and place them in the `datasets` folder. Most datasets are widely used benchmarks in time series forecasting; the new `dynamic` dataset is the first 500K rows of the Dynamical System Dataset on Kaggle.
- To get the results for ETTh1 with lookback 336 and horizon 96, run `python main_mmpd.py --data ETTh1 --in_len 336 --out_len 96`. Results and checkpoints will be saved in `./out/results/` and `./out/checkpoints/`, respectively.
- To reproduce the results for all datasets and settings, run the bash scripts in `./scripts/`. For example, to reproduce the ETTh1 results, run `bash scripts/etth1.sh`.
- When running on your own datasets, inference may be time-consuming due to the diffusion process. You can set:
  - `--testing False` to skip testing and only train the model (it can be tested later);
  - `--testing True --prob_pred False` to get only the deterministic metrics MSE and MAE, as in conventional methods.
- To get multi-mode predictions, set `--testing True --prob_pred True`, and adjust `--sample_num`, `--num_sampling_steps`, and `--gmm_iterations` to balance inference time against performance.
We encourage the community to apply MMPD Loss to their own backbones or develop new losses based on the backbone–loss decoupling framework. A step-by-step tutorial is provided in `tutorials/` to show how to do this.
If you find our work useful, please consider citing:
```bibtex
@inproceedings{zhang2026mmpd,
  title={{MMPD}: Diverse Time Series Forecasting via Multi-Mode Patch Diffusion Loss},
  author={Yunhao Zhang and Wenyao Hu and Jiale Zheng and Lujia Pan and Junchi Yan},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=NEUgHT8dvH}
}
```