[ICCV2025] Official code repository of "CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction"


CARP: Visuomotor Policy Learning
via Coarse-to-Fine Autoregressive Prediction

¹Westlake University, ²Zhejiang University,
³Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing
Project lead. Corresponding author.

🔥 News

  • 2025.06.18 We have released the code for CARP on single-task settings, supporting both state-based and image-based tasks.
  • 2025.03.05 We've released the code for CARP on the multi-task benchmark (image-based). Stay tuned for the upcoming release of the single-task code for both state-based and image-based tasks.
  • 2024.12.09 CARP has been released on arXiv, along with a dedicated homepage where you can explore its performance and architecture directly.

🧾 TBD

  • Release paper on arXiv
  • Release multi-task code
  • Release single-task code, including both state-based and image-based tasks

👀 Overview

TL;DR: We introduce Coarse-to-Fine AutoRegressive Policy (CARP), a novel paradigm for visuomotor policy learning that redefines autoregressive action generation as a coarse-to-fine, next-scale prediction process.

The left panel shows the final predicted trajectories for each task, with CARP producing smoother and more consistent paths than Diffusion Policy (DP). The right panel visualizes intermediate trajectories during the refinement process for CARP (top-right) and DP (bottom-right). DP displays considerable redundancy, resulting in slower processing and unstable training, as illustrated by 6 selected steps among 100 denoising steps. In contrast, CARP achieves efficient trajectory refinement across all 4 scales, with each step contributing meaningful updates.
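The refinement process described above can be illustrated with a small sketch: a trajectory is first predicted at the coarsest scale, then upsampled and refined at each finer scale, conditioned on the previous scale's result. This is a conceptual toy, not the repository's actual code; `predict_fn`, the scale schedule, and the 2D action space are all illustrative assumptions.

```python
import numpy as np

def coarse_to_fine_refine(predict_fn, scales=(2, 4, 8, 16)):
    """Toy coarse-to-fine refinement (illustrative only): start from the
    coarsest trajectory, then at each finer scale upsample the previous
    result and add a predicted residual."""
    traj = np.zeros((scales[0], 2))  # coarsest trajectory of 2D actions
    for k in scales:
        # upsample the previous scale's trajectory to k steps (linear interp)
        idx = np.linspace(0.0, len(traj) - 1, k)
        up = np.stack(
            [np.interp(idx, np.arange(len(traj)), traj[:, d])
             for d in range(traj.shape[1])],
            axis=1,
        )
        # each scale contributes a meaningful residual update
        traj = up + predict_fn(up, k)
    return traj  # final trajectory at the finest scale

# stand-in predictor: pulls the trajectory halfway toward a straight
# line ending at a goal point (a real model would be a learned network)
goal = np.array([1.0, 1.0])

def toy_predict(up, k):
    target = np.linspace(0.0, 1.0, k)[:, None] * goal
    return 0.5 * (target - up)

traj = coarse_to_fine_refine(toy_predict, scales=(2, 4, 8, 16))
print(traj.shape)  # (16, 2)
```

With this toy predictor, each of the 4 scales halves the remaining gap to the straight-line target, mirroring the figure's point that every refinement step makes a meaningful update, in contrast to the redundancy of 100 denoising steps.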

💻 Code

Here we provide the two categories of code mentioned in the paper:

  • Multi-Task: the image-based MimicGen benchmark, following the same settings as SDP.
  • Single-Task: both Robomimic and Kitchen tasks, aligned with DP.

Just a heads-up: we've centralized parameter management in arg_util.py. Before diving into any changes, take a look at this file — it'll make life a lot easier!
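To show what "centralized parameter management" means in practice, here is a minimal hypothetical sketch in the spirit of a file like arg_util.py; the actual field names and defaults in the repository will differ.

```python
# Hypothetical sketch of a single, centralized configuration object.
# All field names and defaults here are illustrative, not the repo's.
from dataclasses import dataclass

@dataclass
class Args:
    task: str = "square"     # e.g. a Robomimic task name
    obs_type: str = "state"  # "state" or "image" observations
    horizon: int = 16        # action prediction horizon
    num_scales: int = 4      # number of coarse-to-fine scales
    lr: float = 1e-4         # learning rate

# every script reads from one place, so a change propagates everywhere
args = Args(obs_type="image")
print(args.obs_type)  # image
```

Keeping every hyperparameter in one typed object makes experiments reproducible and avoids flags scattered across scripts.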

🙏 Acknowledgment

We sincerely thank the creators of the excellent repositories, including Visual Autoregressive Model, Diffusion Policy, and Sparse Diffusion Policy, which have provided invaluable inspiration.

🏷️ License

This repository is released under the MIT license. See LICENSE for additional details.

📌 Citation

If our findings contribute to your research, please consider citing our paper in your publications:

@misc{gong2024carpvisuomotorpolicylearning,
      title={CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction}, 
      author={Zhefei Gong and Pengxiang Ding and Shangke Lyu and Siteng Huang and Mingyang Sun and Wei Zhao and Zhaoxin Fan and Donglin Wang},
      year={2024},
      eprint={2412.06782},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2412.06782}, 
}
