MMPD: Diverse Time Series Forecasting via Multi-Mode Patch Diffusion Loss (ICLR 2026)

This is the official implementation of "MMPD: Diverse Time Series Forecasting via Multi-Mode Patch Diffusion Loss", published at ICLR 2026.

🔥 Overview

In this work, we point out that conventional MSE-centric training implicitly assumes the future follows an independent Gaussian distribution, which is an oversimplification for real-world data, especially when similar historical patterns can lead to multiple diverse outcomes. As a result, the loss function, rather than the model architecture, becomes the bottleneck of deep time series forecasting. To fill this gap, we propose the MMPD loss, which uses a diffusion process to model complex distributions and generates multiple diverse predictions with associated probabilities through a multi-mode inference algorithm.


MSE (left) vs. MMPD (right) using the same backbone. MSE produces a single ambiguous prediction, while MMPD generates multiple sharp predictions with associated probabilities.
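The Gaussian claim above can be made precise with a standard identity (not specific to this repo). If the horizon $y \in \mathbb{R}^T$ is assumed to follow $y \sim \mathcal{N}(f_\theta(x), \sigma^2 I)$ with a fixed $\sigma$, the negative log-likelihood is

$$-\log p(y \mid x) = \frac{1}{2\sigma^2}\,\lVert y - f_\theta(x)\rVert_2^2 + \frac{T}{2}\log\left(2\pi\sigma^2\right),$$

so minimizing MSE is exactly maximum-likelihood estimation under an independent Gaussian. Its single mean averages over all plausible futures, which is why MSE-trained models blur multi-modal outcomes into one ambiguous prediction.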

⚙️ Methodology

1. Loss from a Broader View

The conventional approach maps $x$ to a point estimate $\hat{y}=f_\theta(x)$ and computes $L(\hat{y}, y)$. To model complex distributions, we decouple the network into a backbone $h_\psi$ and a projector $g_\phi$, and treat the projector as part of the loss. This yields a composite loss $Loss^\phi(H, y)$ between the backbone output $H$ and the target $y$, enabling flexible modeling of complex distributions.


Loss from a broader view. The projector is part of the loss, allowing for more flexible loss design.
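A minimal sketch of this decoupling, with all names and shapes hypothetical (this is not the repo's actual API): the backbone only produces a hidden representation $H$, while the projector's parameters $\phi$ live inside the loss function and are trained jointly with it.

```python
import numpy as np

rng = np.random.default_rng(0)

def backbone(x, W_h):
    # h_psi: maps the lookback window to a hidden representation H
    return np.tanh(x @ W_h)

def composite_loss(H, y, W_g):
    # Loss^phi(H, y): the projector g_phi is part of the loss, not the model.
    # Here it is a single linear map followed by MSE purely for illustration;
    # MMPD replaces this with a patch diffusion objective.
    y_hat = H @ W_g
    return np.mean((y_hat - y) ** 2)

x = rng.normal(size=(8, 96))            # batch of 8 lookback windows, length 96
y = rng.normal(size=(8, 96))            # targets, horizon 96
W_h = rng.normal(size=(96, 64)) * 0.1   # backbone parameters psi
W_g = rng.normal(size=(64, 96)) * 0.1   # projector parameters phi (owned by the loss)

H = backbone(x, W_h)                    # the network's interface is just H
loss = composite_loss(H, y, W_g)        # everything after H belongs to the loss
```

Swapping `composite_loss` then changes the training objective without touching the backbone, which is the point of the decoupling.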

2. Patch Consistent MLP

The MMPD loss is built on a patch-based diffusion process, with a patch consistent MLP as the denoiser. Unlike conventional MLPs, which denoise each patch independently conditioned only on its corresponding token, the patch consistent MLP also includes adjacent noisy patches in the condition, ensuring consistency across patches.


Patch Consistent MLP. To predict the noise in patch j, adjacent noisy patches centered around j (colored in red) are also included as condition, ensuring consistency across denoised patches.
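The conditioning scheme can be sketched as follows (a toy illustration with hypothetical names and shapes, not the repo's implementation): to predict the noise in patch $j$, the denoiser receives both the backbone token for $j$ and a flattened window of noisy patches centered at $j$, zero-padded at the sequence borders.

```python
import numpy as np

rng = np.random.default_rng(0)
P, d_tok, d_patch, half = 6, 32, 16, 1   # num patches, token dim, patch dim, window radius

def denoise_patch(tokens, noisy, j, W, half=1):
    """Predict the noise in patch j, conditioned on token j AND on the
    window of noisy patches centered at j (this is the 'consistency' link)."""
    P, d = noisy.shape
    lo, hi = max(0, j - half), min(P, j + half + 1)
    window = noisy[lo:hi]
    pad = (2 * half + 1) - (hi - lo)                 # zero-pad at the borders
    window = np.vstack([window, np.zeros((pad, d))])
    cond = np.concatenate([tokens[j], window.ravel()])
    return cond @ W                                  # predicted noise for patch j

tokens = rng.normal(size=(P, d_tok))                 # backbone tokens, one per patch
noisy = rng.normal(size=(P, d_patch))                # current noisy target patches
W = rng.normal(size=(d_tok + (2 * half + 1) * d_patch, d_patch)) * 0.1
eps_hat = denoise_patch(tokens, noisy, 2, W, half)   # noise estimate, shape (d_patch,)
```

Because neighboring calls share overlapping windows of noisy patches, adjacent denoised patches cannot drift apart independently, which is the consistency property the section describes.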

3. Multi-Mode Inference Algorithm

Observing that diffusion samples exhibit a multi-mode pattern, we devise a multi-mode inference algorithm to extract multiple predictions with probabilities. It fits a variational Gaussian Mixture Model (GMM) with evolving priors alongside the reverse process, so that at the end of the reverse process, multi-mode predictions are obtained simultaneously with the diffusion samples.


The evolution of our multi-mode inference algorithm.
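As a toy stand-in for the idea (a plain EM fit on 1-D samples; the paper's algorithm is variational, with priors that evolve alongside the reverse process rather than a one-shot fit), fitting a GMM to a set of final-step diffusion samples yields mode centers together with their probabilities:

```python
import numpy as np

def fit_gmm_1d(samples, k=2, iters=50):
    """Plain EM for a k-component 1-D GMM; returns mode centers and weights."""
    mu = np.linspace(samples.min(), samples.max(), k)   # spread-out init
    var = np.full(k, samples.var() + 1e-6)
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample
        d = samples[:, None] - mu[None, :]
        logp = -0.5 * d**2 / var - 0.5 * np.log(2 * np.pi * var) + np.log(pi)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update mixture weights, means, and variances
        nk = r.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        mu = (r * samples[:, None]).sum(axis=0) / nk
        var = (r * (samples[:, None] - mu[None, :]) ** 2).sum(axis=0) / nk + 1e-6
    return mu, pi

# Synthetic stand-in for final-step diffusion samples with two modes
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-2.0, 0.3, 300), rng.normal(3.0, 0.3, 200)])
modes, probs = fit_gmm_1d(samples)   # two sharp modes, probabilities ~0.6 / 0.4
```

The mixture weights play the role of the per-mode probabilities reported by MMPD; in the actual algorithm the fit is refined at every reverse-diffusion step instead of once at the end.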

📊 Experiments

  1. Install dependencies with pip install -r requirements.txt.
  2. Download datasets from Google Drive and place them in the datasets folder. Most datasets are widely used benchmarks in time series forecasting; the new dynamic dataset is the first 500K rows of the Dynamical System Dataset on Kaggle.
  3. To get results on ETTh1 with lookback=336 and horizon=96, run
    python main_mmpd.py --data ETTh1 --in_len 336 --out_len 96
    
    Results and checkpoints will be saved in ./out/results/ and ./out/checkpoints/ respectively.
  4. To reproduce the results for all datasets and settings, run the bash scripts in ./scripts/. For example, to reproduce the results for ETTh1, run
    bash scripts/etth1.sh
    
  5. When running on your own datasets, inference may be time-consuming due to the diffusion process. You can set:
    • --testing False to skip testing, so the model is only trained and can be tested later.
    • --testing True --prob_pred False to obtain only the deterministic metrics MSE and MAE, as in conventional methods.
    • --testing True --prob_pred True to obtain multi-mode predictions; adjust --sample_num, --num_sampling_steps and --gmm_iterations to balance inference time against performance.

🧩 Custom Backbones or Losses

We encourage the community to apply MMPD Loss to their own backbones or develop new losses based on the backbone–loss decoupling framework. A step-by-step tutorial is provided in tutorials/ to show how to do this.

📜 Citation

If you find our work useful, please consider citing:

@inproceedings{zhang2026mmpd,
  title={{MMPD}: Diverse Time Series Forecasting via Multi-Mode Patch Diffusion Loss},
  author={Yunhao Zhang and Wenyao Hu and Jiale Zheng and Lujia Pan and Junchi Yan},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=NEUgHT8dvH}
}
