Tianrui Feng1, Zhi Li2, Shuo Yang2, Haocheng Xi2, Muyang Li3, Xiuyu Li1, Lvmin Zhang4, Keting Yang5, Kelly Peng6, Song Han7, Maneesh Agrawala4, Kurt Keutzer2, Akio Kodaira8, Chenfeng Xu†,1
1UT Austin, 2UC Berkeley, 3Nunchaku AI, 4Stanford University, 5Independent Researcher, 6First Intelligence, 7MIT, 8Shizhuku AI
† Project lead; correspondence to xuchenfeng@utexas.edu
StreamDiffusionV2 is an open-source interactive diffusion pipeline for real-time streaming applications. It scales across diverse GPU setups, supports flexible denoising steps, and delivers high FPS for creators and platforms. Further details are available on our project homepage.
- [2026-01-26] 🎉 StreamDiffusionV2 has been accepted to MLSys 2026!
- [2025-11-10] 🚀 Our paper is now available on arXiv. See it for more details!
- [2025-10-18] Released our model checkpoints on Hugging Face.
- [2025-10-06] 🔥 Our StreamDiffusionV2 is publicly released! Check our project homepage for more details.
- OS: Linux with NVIDIA GPU
- CUDA-compatible GPU and drivers
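The installation steps in this README require CUDA 12.4 or above, checked via `nvcc -V`. As a rough illustration, that check can be scripted. The sketch below is not part of the repository: `parse_nvcc_release` and `cuda_is_new_enough` are hypothetical helper names, and the code assumes the usual `release X.Y` phrase appears in the `nvcc -V` output.

```python
import re
import shutil
import subprocess

# Minimum CUDA version stated in this README's install note.
MIN_CUDA = (12, 4)


def parse_nvcc_release(nvcc_output):
    """Return (major, minor) parsed from `nvcc -V` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def cuda_is_new_enough(nvcc_output, minimum=MIN_CUDA):
    """True when the reported CUDA release is at least `minimum`."""
    version = parse_nvcc_release(nvcc_output)
    return version is not None and version >= minimum


if __name__ == "__main__":
    if shutil.which("nvcc") is None:
        print("nvcc not found on PATH")
    else:
        result = subprocess.run(["nvcc", "-V"], capture_output=True, text=True)
        ok = cuda_is_new_enough(result.stdout)
        print("CUDA version OK" if ok else "CUDA too old or not detected")
```

Comparing versions as integer tuples avoids the string-comparison pitfall where "12.10" would sort below "12.4".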
conda create -n stream python=3.10.0
conda activate stream
# Requires CUDA 12.4 or above; check your version with `nvcc -V`
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
python setup.py develop

# 1.3B Model
huggingface-cli download --resume-download Wan-AI/Wan2.1-T2V-1.3B --local-dir wan_models/Wan2.1-T2V-1.3B
huggingface-cli download --resume-download jerryfeng/StreamDiffusionV2 --local-dir ./ckpts --include "wan_causal_dmd_v2v/*"
# 14B Model
huggingface-cli download --resume-download Wan-AI/Wan2.1-T2V-14B --local-dir wan_models/Wan2.1-T2V-14B
huggingface-cli download --resume-download jerryfeng/StreamDiffusionV2 --local-dir ./ckpts --include "wan_causal_dmd_v2v_14b/*"

We use the 14B model from CausVid-Plus for the offline inference demo.
python streamv2v/inference.py \
--config_path configs/wan_causal_dmd_v2v.yaml \
--checkpoint_folder ckpts/wan_causal_dmd_v2v \
--output_folder outputs/ \
--prompt_file_path examples/original.mp4 \
--video_path examples/original.mp4 \
--height 480 \
--width 832 \
--fps 16 \
--step 2

Note: --step sets how many denoising steps are used during inference.
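The download commands in this README place the 1.3B base model under `wan_models/Wan2.1-T2V-1.3B` and the checkpoint under `ckpts/wan_causal_dmd_v2v`, the folder passed to `--checkpoint_folder`. Before launching inference, it may help to confirm those folders exist and are non-empty. The following is a minimal pre-flight sketch, not part of the repository; `missing_dirs` and `EXPECTED_DIRS` are hypothetical names, and the check only looks for non-empty folders rather than validating file contents.

```python
from pathlib import Path

# Folders produced by the 1.3B download commands in this README.
EXPECTED_DIRS = (
    "wan_models/Wan2.1-T2V-1.3B",
    "ckpts/wan_causal_dmd_v2v",
)


def missing_dirs(repo_root=".", expected=EXPECTED_DIRS):
    """Return the expected folders that are absent or empty under repo_root."""
    root = Path(repo_root)
    missing = []
    for rel in expected:
        folder = root / rel
        # An empty folder usually means the download did not complete.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    gaps = missing_dirs()
    if gaps:
        print("Missing or empty:", ", ".join(gaps))
    else:
        print("All expected model folders are present.")
```

Run it from the repository root so the relative paths resolve the same way they do for the inference commands.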
torchrun --nproc_per_node=2 --master_port=29501 streamv2v/inference_pipe.py \
--config_path configs/wan_causal_dmd_v2v.yaml \
--checkpoint_folder ckpts/wan_causal_dmd_v2v \
--output_folder outputs/ \
--prompt_file_path examples/original.mp4 \
--video_path examples/original.mp4 \
--height 480 \
--width 832 \
--fps 16 \
--step 2
# --schedule_block # optional: enable block scheduling

Note: --step sets how many denoising steps are used during inference. Enabling --schedule_block can provide optimal throughput.
Adjust --nproc_per_node to your GPU count. For different resolutions or FPS, change --height, --width, and --fps accordingly.
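To make the flag adjustments concrete, the sketch below assembles the single-GPU or multi-GPU command from a GPU count and the resolution, FPS, and step settings. It is an illustrative helper, not part of the repository: `build_inference_command` is a hypothetical name, every flag and path is copied from the commands in this README, and it assumes `--schedule_block` applies only to `streamv2v/inference_pipe.py`, since that is the only command where the flag is shown.

```python
import shlex


def build_inference_command(num_gpus=1, height=480, width=832, fps=16,
                            step=2, schedule_block=False, master_port=29501):
    """Return the inference command as an argument list."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")

    if num_gpus == 1:
        launcher = ["python", "streamv2v/inference.py"]
    else:
        launcher = [
            "torchrun",
            f"--nproc_per_node={num_gpus}",
            f"--master_port={master_port}",
            "streamv2v/inference_pipe.py",
        ]

    args = [
        "--config_path", "configs/wan_causal_dmd_v2v.yaml",
        "--checkpoint_folder", "ckpts/wan_causal_dmd_v2v",
        "--output_folder", "outputs/",
        "--prompt_file_path", "examples/original.mp4",
        "--video_path", "examples/original.mp4",
        "--height", str(height),
        "--width", str(width),
        "--fps", str(fps),
        "--step", str(step),
    ]
    # Assumption: block scheduling is only meaningful for the multi-GPU script.
    if schedule_block and num_gpus > 1:
        args.append("--schedule_block")
    return launcher + args


if __name__ == "__main__":
    # Print a copy-pasteable command for a two-GPU run with block scheduling.
    print(shlex.join(build_inference_command(num_gpus=2, schedule_block=True)))
```

Returning a list rather than a string keeps the result usable with `subprocess.run` without shell quoting concerns; `shlex.join` is only used for display.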
A minimal web demo is available under demo/. For setup and startup instructions, please refer to the demo/ directory.
- Access in a browser after startup:
http://0.0.0.0:7860 or http://localhost:7860
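If the page does not load right away, the server may still be starting. As a small convenience, the sketch below polls the demo port until it accepts TCP connections. It is not part of the repository: `wait_for_port` is a hypothetical helper, the default port 7860 comes from the URLs above, and a successful connection only shows that something is listening, not that the demo is fully initialized.

```python
import socket
import time


def wait_for_port(host="localhost", port=7860, timeout=30.0, interval=0.5):
    """Return True once host:port accepts a TCP connection, else False."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    if wait_for_port(timeout=5.0):
        print("Demo port is accepting connections at http://localhost:7860")
    else:
        print("Nothing is listening on port 7860 yet")
```

The default host is `localhost` rather than `0.0.0.0`, since `0.0.0.0` is a bind address and not every platform lets clients connect to it.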
- Demo and inference pipeline.
- Dynamic scheduler for various workloads.
- Training code.
- FP8 support.
- TensorRT support.
StreamDiffusionV2 is inspired by the prior works StreamDiffusion and StreamV2V. Our Causal DiT builds upon CausVid, and the rolling KV cache design is inspired by Self-Forcing.
We are grateful to the StreamDiffusion team members for their support. We also thank First Intelligence and the Daydream team for their valuable feedback.
We especially thank the Daydream team for the great collaboration and for incorporating our StreamDiffusionV2 pipeline into their cool Demo UI.
If you find this repository useful in your research, please consider giving a star ⭐ or a citation.
@article{feng2025streamdiffusionv2,
title={StreamDiffusionV2: A Streaming System for Dynamic and Interactive Video Generation},
author={Feng, Tianrui and Li, Zhi and Yang, Shuo and Xi, Haocheng and Li, Muyang and Li, Xiuyu and Zhang, Lvmin and Yang, Keting and Peng, Kelly and Han, Song and others},
journal={arXiv preprint arXiv:2511.07399},
year={2025}
}

