[ICCV2025] Official code repository of "CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction"


CARP: Visuomotor Policy Learning
via Coarse-to-Fine Autoregressive Prediction

¹Westlake University, ²Zhejiang University,
³Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing
Project lead. Corresponding author.

🔥 News

  • 2025.06.18 We have released the code for CARP on single-task settings, supporting both state-based and image-based tasks.
  • 2025.03.05 We've released the code for CARP on the multi-task benchmark (image-based). Stay tuned for the upcoming release of the single-task code for both state-based and image-based tasks.
  • 2024.12.09 CARP has been released on arXiv, along with a dedicated homepage where you can explore its performance and architecture directly.

🧾 TBD

  • Release paper on arXiv
  • Release multi-task code
  • Release single-task code, including both state-based and image-based tasks

👀 Overview

TL;DR: We introduce Coarse-to-Fine AutoRegressive Policy (CARP), a novel paradigm for visuomotor policy learning that redefines autoregressive action generation as a coarse-to-fine, next-scale prediction process.

The left panel shows the final predicted trajectories for each task, with CARP producing smoother and more consistent paths than Diffusion Policy (DP). The right panel visualizes intermediate trajectories during the refinement process for CARP (top-right) and DP (bottom-right). DP displays considerable redundancy, resulting in slower processing and unstable training, as illustrated by 6 selected steps among 100 denoising steps. In contrast, CARP achieves efficient trajectory refinement across all 4 scales, with each step contributing meaningful updates.
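The refinement process described above can be illustrated with a small sketch: a trajectory is first predicted at the coarsest scale, then upsampled and refined at each finer scale, conditioned on the previous scale's result. This is a conceptual toy, not the repository's actual code; `predict_fn`, the scale schedule, and the 2D action space are all illustrative assumptions.

```python
import numpy as np

def coarse_to_fine_refine(predict_fn, scales=(2, 4, 8, 16)):
    """Toy coarse-to-fine refinement (illustrative only): start from the
    coarsest trajectory, then at each finer scale upsample the previous
    result and add a predicted residual."""
    traj = np.zeros((scales[0], 2))  # coarsest trajectory of 2D actions
    for k in scales:
        # upsample the previous scale's trajectory to k steps (linear interp)
        idx = np.linspace(0.0, len(traj) - 1, k)
        up = np.stack(
            [np.interp(idx, np.arange(len(traj)), traj[:, d])
             for d in range(traj.shape[1])],
            axis=1,
        )
        # each scale contributes a meaningful residual update
        traj = up + predict_fn(up, k)
    return traj  # final trajectory at the finest scale

# stand-in predictor: pulls the trajectory halfway toward a straight
# line ending at a goal point (a real model would be a learned network)
goal = np.array([1.0, 1.0])

def toy_predict(up, k):
    target = np.linspace(0.0, 1.0, k)[:, None] * goal
    return 0.5 * (target - up)

traj = coarse_to_fine_refine(toy_predict, scales=(2, 4, 8, 16))
print(traj.shape)  # (16, 2)
```

With this toy predictor, each of the 4 scales halves the remaining gap to the straight-line target, mirroring the figure's point that every refinement step makes a meaningful update, in contrast to the redundancy of 100 denoising steps.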

💻 Code

Here we provide the two categories of code mentioned in the paper:

  • Multi-Task: the image-based MimicGen benchmark, following the same settings as SDP.
  • Single-Task: both Robomimic and Kitchen tasks, aligned with DP.

Just a heads-up: we've centralized parameter management in arg_util.py. Before diving into any changes, take a look at this file — it'll make life a lot easier!
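To show what "centralized parameter management" means in practice, here is a minimal hypothetical sketch in the spirit of a file like arg_util.py; the actual field names and defaults in the repository will differ.

```python
# Hypothetical sketch of a single, centralized configuration object.
# All field names and defaults here are illustrative, not the repo's.
from dataclasses import dataclass

@dataclass
class Args:
    task: str = "square"     # e.g. a Robomimic task name
    obs_type: str = "state"  # "state" or "image" observations
    horizon: int = 16        # action prediction horizon
    num_scales: int = 4      # number of coarse-to-fine scales
    lr: float = 1e-4         # learning rate

# every script reads from one place, so a change propagates everywhere
args = Args(obs_type="image")
print(args.obs_type)  # image
```

Keeping every hyperparameter in one typed object makes experiments reproducible and avoids flags scattered across scripts.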

🙏 Acknowledgment

We sincerely thank the creators of the excellent repositories, including Visual Autoregressive Model, Diffusion Policy, and Sparse Diffusion Policy, which have provided invaluable inspiration.

🏷️ License

This repository is released under the MIT license. See LICENSE for additional details.

📌 Citation

If our findings contribute to your research, please consider citing our paper in your publications:

@misc{gong2024carpvisuomotorpolicylearning,
      title={CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction}, 
      author={Zhefei Gong and Pengxiang Ding and Shangke Lyu and Siteng Huang and Mingyang Sun and Wei Zhao and Zhaoxin Fan and Donglin Wang},
      year={2024},
      eprint={2412.06782},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2412.06782}, 
}
