
WAM-Diff: A Masked Diffusion VLA Framework with MoE and Online Reinforcement Learning for Autonomous Driving

¹Fudan University   ²Yinwang Intelligent Technology Co., Ltd.


📰 News

  • 2025/12/06: 🎉🎉🎉 Paper submitted to arXiv.

📅️ Roadmap

Status | Milestone                                  | ETA
-------|--------------------------------------------|-----------
🚀     | Release the inference source code          | 2025.12.21
🚀     | Release the SFT and inference code         | 2025.12.21
🚀     | Release pretrained models on Hugging Face  | TBD
🚀     | Release NAVSIM evaluation code             | TBD
🚀     | Release the RL code                        | TBD

🔧️ Framework

[Framework figure]

🏆 Qualitative Results on NAVSIM

NAVSIM-v1 benchmark results

[NAVSIM-v1 results figure]

NAVSIM-v2 benchmark results

[NAVSIM-v2 results figure]

Quick Inference Demo

The WAM-Diff model will be available on the Hugging Face Hub soon. To quickly test it, follow these steps:

  1. Clone the repository

    git clone https://github.com/fudan-generative-vision/WAM-Diff
    cd WAM-Diff
  2. Initialize the environment
    If you prefer conda, run the environment setup script to install necessary dependencies:

    bash init_env.sh

    Or you can use uv to create the environment:

    uv venv && uv sync
  3. Prepare the Model
    Download the pretrained WAM-Diff model from Hugging Face (pending release) to the ./model/WAM-Diff directory; a scripted download sketch follows this list:

    https://huggingface.co/fudan-generative-ai/WAM-Diff
    

    Download the pretrained Siglip2 model from Hugging Face to the ./model/siglip2-so400m-patch14-384 directory:

    https://huggingface.co/google/siglip2-so400m-patch14-384
    
  4. Run the demo script
    Execute the demo script to test WAM-Diff on an example image:

    bash inf.sh
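
As an alternative to downloading the checkpoints by hand, here is a minimal sketch that fetches both models into the ./model directory with the huggingface_hub Python package (an extra dependency assumed to be installed in the environment). The WAM-Diff repo id is taken from the URL in step 3 and is pending release, so that part will only work once the weights are published.

    # download_models.py -- hypothetical helper script, not part of the repository.
    from huggingface_hub import snapshot_download

    # Vision encoder (publicly available).
    snapshot_download(
        repo_id="google/siglip2-so400m-patch14-384",
        local_dir="./model/siglip2-so400m-patch14-384",
    )

    # WAM-Diff checkpoint (repo id from step 3; pending release on Hugging Face).
    snapshot_download(
        repo_id="fudan-generative-ai/WAM-Diff",
        local_dir="./model/WAM-Diff",
    )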

Training

To fine-tune WAM-Diff, please follow these steps:

  1. Set Up the Environment
    Follow the same environment setup steps as in the Quick Inference Demo section.
  2. Prepare the Data
    Prepare your training dataset in JSON format like the example below (a script sketch for generating such entries follows this list):
    [
        {
            "image": ["path/to/image1.png"],
            "conversations": [
                {
                    "from": "human",
                    "value": "Here is front views of a driving vehicle:\n<image>\nThe navigation information is: straight\nThe current position is (0.00,0.00)\nCurrent velocity is: (13.48,-0.29)  and current accelerate is: (0.19,0.05)\nPredict the optimal driving action for the next 4 seconds with 8 new waypoints."
                },
                {
                    "from": "gpt",
                    "value": "6.60,-0.01,13.12,-0.03,19.58,-0.04,25.95,-0.03,32.27,-0.03,38.56,-0.05,44.88,-0.06,51.16,-0.09"
                }
            ]
        },
        ...
    ]
  3. Run the Training Script
    Execute the training script with the following commands:

    cd train
    bash ./scripts/llada_v_finetune.sh
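
As a complement to step 2, the following is a minimal sketch (the script name and helper function are hypothetical, not part of the repository) that assembles one training sample in the JSON format shown above and writes a single-entry dataset to disk. It assumes the future trajectory is given as 8 (x, y) waypoints in the ego frame, mirroring the example values.

    # make_sft_entry.py -- illustrative sketch; the file name and helper are hypothetical.
    import json

    def make_entry(image_path, navigation, position, velocity, accel, waypoints):
        """Build one SFT sample in the JSON format shown in step 2.

        waypoints: list of 8 (x, y) tuples covering the next 4 seconds.
        """
        prompt = (
            "Here is front views of a driving vehicle:\n<image>\n"
            f"The navigation information is: {navigation}\n"
            f"The current position is ({position[0]:.2f},{position[1]:.2f})\n"
            f"Current velocity is: ({velocity[0]:.2f},{velocity[1]:.2f})  "
            f"and current accelerate is: ({accel[0]:.2f},{accel[1]:.2f})\n"
            "Predict the optimal driving action for the next 4 seconds with 8 new waypoints."
        )
        answer = ",".join(f"{x:.2f},{y:.2f}" for x, y in waypoints)
        return {
            "image": [image_path],
            "conversations": [
                {"from": "human", "value": prompt},
                {"from": "gpt", "value": answer},
            ],
        }

    # Example: serialize a one-sample dataset with placeholder values.
    entry = make_entry(
        image_path="path/to/image1.png",
        navigation="straight",
        position=(0.0, 0.0),
        velocity=(13.48, -0.29),
        accel=(0.19, 0.05),
        waypoints=[(6.60, -0.01), (13.12, -0.03), (19.58, -0.04), (25.95, -0.03),
                   (32.27, -0.03), (38.56, -0.05), (44.88, -0.06), (51.16, -0.09)],
    )
    with open("train_data.json", "w") as f:
        json.dump([entry], f, indent=2)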

📝 Citation

If you find our work useful for your research, please consider citing the paper:

@article{xu2025wam,
  title={WAM-Diff: A Masked Diffusion VLA Framework with MoE and Online Reinforcement Learning for Autonomous Driving},
  author={Xu, Mingwang and Cui, Jiahao and Cai, Feipeng and Shang, Hanlin and Zhu, Zhihao and Luan, Shan and Xu, Yifang and Zhang, Neng and Li, Yaoyi and Cai, Jia and others},
  journal={arXiv preprint arXiv:2512.11872},
  year={2025}
}

🤗 Acknowledgements

We gratefully acknowledge the contributors to the LLaDA-V repository, whose commitment to open source has provided us with excellent codebases and pretrained models.
