| 🌍Webpage | 📄Full Paper | 🎯ICLR 2026 |
Astraea is an open-source framework designed to optimize the inference efficiency of Video Diffusion Models. Our framework achieves up to a 2.4× inference speedup on a single GPU and scales well across devices (up to a 13.2× speedup on 8 GPUs), while delivering up to over 10 dB higher video quality than state-of-the-art methods (<0.5% loss on VBench compared to baselines).
There is a short demo video of our algorithm running on an A100:
wan.mp4
Low resolution due to GitHub size limits. For higher resolution videos, please visit our website.
This repository provides a comprehensive implementation of Astraea applied to state-of-the-art text-to-video (T2V) models: Wan2.1 and HunyuanVideo. For Wan2.1, we use the 1.3B variant.
Note
This project is a proprietary work developed in collaboration with Huawei. Due to the proprietary nature of certain components:
- We provide the full codebase and scripts for training the Evolutionary Timestep Searcher. This allows users to find optimal global scheduling for various video models.
- The fine-grained Token Selection (token-level optimization) mentioned in the original paper is part of a proprietary internal module and is not included in this public release.
git clone https://github.com/YourUsername/Astraea-code.git
cd Astraea-code
Download the required model weights and place them in the ./models directory:
| Model | Source | Download Command |
|---|---|---|
| Wan2.1-T2V-1.3B | HuggingFace | huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir ./models/Wan2.1-T2V-1.3B |
| Hunyuan-Video | HuggingFace | huggingface-cli download Tencent/HunyuanVideo --local-dir ./models/HunyuanVideo |
We recommend using separate Conda environments for different model backends to avoid dependency conflicts.
- For Wan2.1:
conda create -n astraea-wan python=3.10 -y
conda activate astraea-wan
pip install torch==2.6.0 torchvision --index-url https://download.pytorch.org/whl/cu126
pip install flash-attn==2.7.4.post1 --no-build-isolation
pip install -r requirements_wan.txt
- For HunyuanVideo:
conda create -n astraea-hunyuan python=3.10 -y
conda activate astraea-hunyuan
# Follow specific requirements for Hunyuan-Video
pip install -r requirements_hunyuan.txt
Note: PyTorch installation varies by system. Please ensure you install the appropriate version for your hardware.
Astraea uses an evolutionary algorithm to search for the most computation-worthy timesteps under a given computation budget. This section details how to train the evolutionary searcher for a video generation model and how to use the searched timesteps to guide inference.
The searcher evaluates timestep candidates by comparing the videos they generate against reference ("ground truth") videos generated with the original full timestep schedule, scoring each candidate by how closely its output matches the reference.
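As a concrete example of such a score, PSNR between an accelerated sample and its reference can serve as the candidate fitness. This is an illustrative sketch, not the exact metric implemented in `search_ea.py`:

```python
import numpy as np

def psnr(video_a: np.ndarray, video_b: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio between two uint8 video tensors of
    identical shape (frames, height, width, channels). Higher is better."""
    a = video_a.astype(np.float64)
    b = video_b.astype(np.float64)
    mse = np.mean((a - b) ** 2)
    if mse == 0:
        return float("inf")  # identical videos
    return 10.0 * np.log10(max_val ** 2 / mse)
```

A candidate schedule whose generated video scores higher PSNR against the 50-step reference is considered fitter by the search.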
To sample reference videos with the original full timestep schedule, use:
# for wan2.1
python generate.py --save_ref \
--task 't2v-1.3B' \
--size '832*480' \
--frame_num 65 \
--base_seed 1024 \
--prompt_file 'prompts.txt' \
--ckpt_dir './models/Wan2.1-T2V-1.3B' \
--save_dir './outputs/ref/wan2.1/65x480p'
# for hunyuan-video
python3 sample_ea_hunyuan.py \
--save_ref True \
--video-size 544 960 \
--video-length 129 \
--seed 1024 \
--infer-steps 50 \
--prompt "A cat walks on the grass, realistic style." \
--flow-reverse \
--use-cpu-offload \
--save-path ./outputs/ref
To train the evolutionary searcher for the optimal timestep sequence under a given computation budget, use:
# for wan2.1
bash search_wan_scheduler.sh
The --time_step argument specifies the computation budget. The --ref_videos argument specifies the path to the reference videos.
# search_wan_scheduler.sh
python search_ea.py \
--outdir 'outputs/65x480p' \ # Directory for search logs and results
--time_step 25 \ # The target inference budget (searching for the best 25 steps)
--max_epochs 30 \ # Total iterations of the Evolutionary Algorithm
--population_num 50 \ # Number of timestep candidates in each generation
--mutation_num 20 \ # Number of offspring generated by random mutation
--crossover_num 15 \ # Number of offspring generated by merging top candidates
--base_seed 1024 \ # Seed used by the generative model during sampling
--prompt_file 'prompts.txt' \ # Path to the text file containing training prompts
--frame_num 65 \ # Video length (65 frames, following the 4n+1 rule)
--ref_videos './outputs/ref/wan2.1/65x480p' \ # Directory containing the 50-step ground truth videos
--task 't2v-1.3B' \ # Model variant (Wan2.1-T2V-1.3B)
--size '832*480' \ # Resolution (width*height)
--ckpt_dir './models/Wan2.1-T2V-1.3B' \ # Local path to pretrained weights
--sample_shift 8 \ # Flow-matching shift factor (optimized for Wan2.1)
--sample_guide_scale 6 \ # CFG (Classifier-Free Guidance) scale
--sample_steps 50 # Steps used to generate the reference videos (GT)
# for hunyuan-video
bash search_hunyuan_scheduler.sh
Note: Make sure to use the same seed and the same sampling parameters (e.g., sample_shift) as during the search to obtain the best generation quality.
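The search parameters above (population, mutation, crossover, epochs) interact as in a standard evolutionary loop. The following is a minimal illustrative sketch of that loop over subsets of timesteps, not the actual `search_ea.py` implementation; the `fitness` callable (e.g., PSNR against reference videos) is an assumed hook:

```python
import random

def search_timesteps(total_steps=50, budget=25, max_epochs=30,
                     population_num=50, mutation_num=20, crossover_num=15,
                     fitness=None, seed=1024):
    """Illustrative evolutionary search for `budget` timesteps out of
    `total_steps`. `fitness` maps a timestep tuple to a score (higher is better)."""
    rng = random.Random(seed)

    def random_candidate():
        return tuple(sorted(rng.sample(range(total_steps), budget)))

    def mutate(cand):
        kept = set(cand)
        kept.remove(rng.choice(cand))                 # drop one timestep
        pool = [t for t in range(total_steps) if t not in kept]
        kept.add(rng.choice(pool))                    # add a new one
        return tuple(sorted(kept))

    def crossover(a, b):
        pool = sorted(set(a) | set(b))                # union of both parents
        return tuple(sorted(rng.sample(pool, budget)))

    population = [random_candidate() for _ in range(population_num)]
    elite_num = population_num - mutation_num - crossover_num
    for _ in range(max_epochs):
        population.sort(key=fitness, reverse=True)    # best candidates first
        elite = population[:elite_num]                # survivors of this generation
        children = [mutate(rng.choice(elite)) for _ in range(mutation_num)]
        children += [crossover(*rng.sample(elite, 2)) for _ in range(crossover_num)]
        population = elite + children
    return max(population, key=fitness)
```

Because the elite candidates are carried over unchanged each epoch, the best schedule found never degrades as the search progresses.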
To obtain the best timestep candidate, use:
python extract_best_candidate.py --log_path <path_to_search_log> --output_path <path_to_output_candidate>
The best timestep candidate is stored and loaded as a .npy file.
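The candidate is just a 1-D array of timestep indices, so it round-trips through NumPy. A small sketch (the schedule values and file name here are illustrative, not an actual search result):

```python
import numpy as np

# An illustrative 25-step schedule selected out of 50 full steps.
ea_timesteps = np.array([0, 1, 2, 3, 4, 5, 6, 8, 10, 12, 14, 16, 18,
                         20, 23, 26, 29, 32, 35, 38, 41, 44, 46, 48, 49])

np.save("best_candidate.npy", ea_timesteps)  # the .npy artifact to store
loaded = np.load("best_candidate.npy")       # what --ea_timesteps reads back
```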
To sample videos with the timestep candidates, use:
# for wan2.1
python generate.py \
--task 't2v-1.3B' \
--size '832*480' \
--frame_num 65 \
--ckpt_dir './models/Wan2.1-T2V-1.3B' \
--save_dir './outputs/65x480p/videos' \
--prompt_file 'prompts.txt' \
--base_seed 1024 \
--sample_shift 8 \
--sample_guide_scale 6 \
--ea_timesteps <path_to_timestep_npy_file>
# for hunyuan-video
python sample_ea_hunyuan.py \
--video-size 544 960 \
--video-length 129 \
--seed 1024 \
--infer-steps 50 \
--prompt "A cat walks on the grass, realistic style." \
--flow-reverse \
--use-cpu-offload \
--save-path ./results \
--token_timesteps 0 1 2 3 4 5 6 7 8 11 14 17 19 25 31 36 42 47 49 # alternatively, use --token_timesteps_npy <path_to_timestep_npy_file>
We would like to express our sincere gratitude to the open-source community.
Special thanks to the following projects and libraries that this work builds upon: Wan2.1, HunyuanVideo, and AutoDiffusion.
If you find this work useful for your research, please cite:
@inproceedings{liu2026astraea,
  title={Astraea: A Token-wise Acceleration Framework for Video Diffusion Transformers},
  author={Haosong Liu and Yuge Cheng and Wenxuan Miao and Zihan Liu and Aiyue Chen and Jing Lin and Yiwu Yao and Chen Chen and Jingwen Leng and Minyi Guo and Yu Feng},
  year={2026},
  booktitle={The Fourteenth International Conference on Learning Representations (ICLR 2026)},
}