| 🌍Webpage | 📄Full Paper | 🎯ICLR 2026 |
Astraea is an open-source framework designed to optimize the inference efficiency of Video Diffusion Models. Our framework achieves up to a 2.4× inference speedup on a single GPU and scales well across devices (up to a 13.2× speedup on 8 GPUs), while delivering up to over 10 dB higher video quality than state-of-the-art methods (<0.5% loss on VBench compared to baselines).
There is a short demo video of our algorithm running on an A100:
wan.mp4
Low resolution due to GitHub size limits. For higher resolution videos, please visit our website.
This repository provides a comprehensive implementation of Astraea applied to state-of-the-art text-to-video (T2V) models: Wan2.1 and HunyuanVideo. For Wan2.1, we use the 1.3B variant.
Note
This project is a proprietary work developed in collaboration with Huawei. Due to the proprietary nature of certain components:
- We provide the full codebase and scripts for training the Evolutionary Timestep Searcher. This allows users to find optimal global scheduling for various video models.
- The fine-grained Token Selection (token-level optimization) mentioned in the original paper is part of a proprietary internal module and is not included in this public release.
git clone https://github.com/YourUsername/Astraea-code.git
cd Astraea-code
Download the required model weights and place them in the ./models directory:
| Model | Source | Download Command |
|---|---|---|
| Wan2.1-T2V-1.3B | HuggingFace | huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir ./models/Wan2.1-T2V-1.3B |
| Hunyuan-Video | HuggingFace | huggingface-cli download Tencent/HunyuanVideo --local-dir ./models/HunyuanVideo |
We recommend using separate Conda environments for different model backends to avoid dependency conflicts.
- For Wan2.1:
conda create -n astraea-wan python=3.10 -y
conda activate astraea-wan
pip install torch==2.6.0 torchvision --index-url https://download.pytorch.org/whl/cu126
pip install flash-attn==2.7.4.post1 --no-build-isolation
pip install -r requirements_wan.txt
- For HunyuanVideo:
conda create -n astraea-hunyuan python=3.10 -y
conda activate astraea-hunyuan
# Follow specific requirements for Hunyuan-Video
pip install -r requirements_hunyuan.txt
Note: PyTorch installation varies by system. Please ensure you install the appropriate version for your hardware.
Astraea uses an evolutionary algorithm to search for the most computation-worthy timesteps under a given computation budget. This section details how to train the evolutionary searcher for a video generation model and how to use the searched timesteps to guide inference.
The searcher evaluates timestep candidates by comparing the videos they generate against reference ("ground truth") videos generated with the original full timestep schedule, scoring each candidate by how closely its output matches the reference.
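As a concrete example of such a score, PSNR between an accelerated sample and its reference can serve as the candidate fitness. This is an illustrative sketch, not the exact metric implemented in `search_ea.py`:

```python
import numpy as np

def psnr(video_a: np.ndarray, video_b: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio between two uint8 video tensors of
    identical shape (frames, height, width, channels). Higher is better."""
    a = video_a.astype(np.float64)
    b = video_b.astype(np.float64)
    mse = np.mean((a - b) ** 2)
    if mse == 0:
        return float("inf")  # identical videos
    return 10.0 * np.log10(max_val ** 2 / mse)
```

A candidate schedule whose generated video scores higher PSNR against the 50-step reference is considered fitter by the search.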
To sample reference videos with the original full timestep schedule, use:
# for wan2.1
python generate.py --save_ref \
--task 't2v-1.3B' \
--size '832*480' \
--frame_num 65 \
--base_seed 1024 \
--prompt_file 'prompts.txt' \
--ckpt_dir './models/Wan2.1-T2V-1.3B' \
--save_dir './outputs/ref/wan2.1/65x480p'
# for hunyuan-video
python3 sample_ea_hunyuan.py \
--save_ref True \
--video-size 544 960 \
--video-length 129 \
--seed 1024 \
--infer-steps 50 \
--prompt "A cat walks on the grass, realistic style." \
--flow-reverse \
--use-cpu-offload \
--save-path ./outputs/ref
To train the evolutionary searcher for the optimal timestep sequence under a given computation budget, use:
# for wan2.1
bash search_wan_scheduler.sh
The --time_step argument specifies the computation budget. The --ref_videos argument specifies the path to the reference videos.
# search_wan_scheduler.sh
python search_ea.py \
--outdir 'outputs/65x480p' \ # Directory for search logs and results
--time_step 25 \ # The target inference budget (searching for the best 25 steps)
--max_epochs 30 \ # Total iterations of the Evolutionary Algorithm
--population_num 50 \ # Number of timestep candidates in each generation
--mutation_num 20 \ # Number of offspring generated by random mutation
--crossover_num 15 \ # Number of offspring generated by merging top candidates
--base_seed 1024 \ # Seed used by the generative model during sampling
--prompt_file 'prompts.txt' \ # Path to the text file containing training prompts
--frame_num 65 \ # Video length (65 frames, following the 4n+1 rule)
--ref_videos './outputs/ref/wan2.1/65x480p' \ # Directory containing the 50-step ground truth videos
--task 't2v-1.3B' \ # Model variant (Wan2.1-T2V-1.3B)
--size '832*480' \ # Resolution (width*height)
--ckpt_dir './models/Wan2.1-T2V-1.3B' \ # Local path to pretrained weights
--sample_shift 8 \ # Flow-matching shift factor (optimized for Wan2.1)
--sample_guide_scale 6 \ # CFG (Classifier-Free Guidance) scale
--sample_steps 50 # Steps used to generate the reference videos (GT)
# for hunyuan-video
bash search_hunyuan_scheduler.sh
Note: Make sure to use the same seed and the same sampling parameters (e.g., sample_shift) as during the search to obtain the best generation quality.
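The search parameters above (population, mutation, crossover, epochs) interact as in a standard evolutionary loop. The following is a minimal illustrative sketch of that loop over subsets of timesteps, not the actual `search_ea.py` implementation; the `fitness` callable (e.g., PSNR against reference videos) is an assumed hook:

```python
import random

def search_timesteps(total_steps=50, budget=25, max_epochs=30,
                     population_num=50, mutation_num=20, crossover_num=15,
                     fitness=None, seed=1024):
    """Illustrative evolutionary search for `budget` timesteps out of
    `total_steps`. `fitness` maps a timestep tuple to a score (higher is better)."""
    rng = random.Random(seed)

    def random_candidate():
        return tuple(sorted(rng.sample(range(total_steps), budget)))

    def mutate(cand):
        kept = set(cand)
        kept.remove(rng.choice(cand))                 # drop one timestep
        pool = [t for t in range(total_steps) if t not in kept]
        kept.add(rng.choice(pool))                    # add a new one
        return tuple(sorted(kept))

    def crossover(a, b):
        pool = sorted(set(a) | set(b))                # union of both parents
        return tuple(sorted(rng.sample(pool, budget)))

    population = [random_candidate() for _ in range(population_num)]
    elite_num = population_num - mutation_num - crossover_num
    for _ in range(max_epochs):
        population.sort(key=fitness, reverse=True)    # best candidates first
        elite = population[:elite_num]                # survivors of this generation
        children = [mutate(rng.choice(elite)) for _ in range(mutation_num)]
        children += [crossover(*rng.sample(elite, 2)) for _ in range(crossover_num)]
        population = elite + children
    return max(population, key=fitness)
```

Because the elite candidates are carried over unchanged each epoch, the best schedule found never degrades as the search progresses.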
To obtain the best timestep candidate, use:
python extract_best_candidate.py --log_path <path_to_search_log> --output_path <path_to_output_candidate>
The best timestep candidate is stored and loaded as a .npy file.
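The candidate is just a 1-D array of timestep indices, so it round-trips through NumPy. A small sketch (the schedule values and file name here are illustrative, not an actual search result):

```python
import numpy as np

# An illustrative 25-step schedule selected out of 50 full steps.
ea_timesteps = np.array([0, 1, 2, 3, 4, 5, 6, 8, 10, 12, 14, 16, 18,
                         20, 23, 26, 29, 32, 35, 38, 41, 44, 46, 48, 49])

np.save("best_candidate.npy", ea_timesteps)  # the .npy artifact to store
loaded = np.load("best_candidate.npy")       # what --ea_timesteps reads back
```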
To sample videos with the timestep candidates, use:
# for wan2.1
python generate.py \
--task 't2v-1.3B' \
--size '832*480' \
--frame_num 65 \
--ckpt_dir './models/Wan2.1-T2V-1.3B' \
--save_dir './outputs/65x480p/videos' \
--prompt_file 'prompts.txt' \
--base_seed 1024 \
--sample_shift 8 \
--sample_guide_scale 6 \
--ea_timesteps <path_to_timestep_npy_file>
# for hunyuan-video
python sample_ea_hunyuan.py \
--video-size 544 960 \
--video-length 129 \
--seed 1024 \
--infer-steps 50 \
--prompt "A cat walks on the grass, realistic style." \
--flow-reverse \
--use-cpu-offload \
--save-path ./results \
--token_timesteps 0 1 2 3 4 5 6 7 8 11 14 17 19 25 31 36 42 47 49 # alternatively, use --token_timesteps_npy <path_to_timestep_npy_file>
We would like to express our sincere gratitude to the open-source community.
Special thanks to the following projects and libraries that this work builds upon: Wan2.1, HunyuanVideo, and AutoDiffusion.
If you find this work useful for your research, please cite:
@inproceedings{liu2026astraea,
  title={Astraea: A Token-wise Acceleration Framework for Video Diffusion Transformers},
  author={Haosong Liu and Yuge Cheng and Wenxuan Miao and Zihan Liu and Aiyue Chen and Jing Lin and Yiwu Yao and Chen Chen and Jingwen Leng and Minyi Guo and Yu Feng},
  year={2026},
  booktitle={The Fourteenth International Conference on Learning Representations (ICLR 2026)},
}