* Equal contribution. ✉ Corresponding author.
News | Abstract | Dataset | Model | Statement
- [2026/2/21] V²-SAM has been accepted to CVPR 2026. Thanks to all contributors.
- [2025/5/20] Our paper, "V²-SAM: Marrying SAM2 with Multi-Prompt Experts for Cross-View Object Correspondence", is available on arXiv.
Cross-view object correspondence, exemplified by the representative task of ego-exo object correspondence, aims to establish consistent associations of the same object across different viewpoints (e.g., ego-centric and exo-centric). This task poses significant challenges due to drastic viewpoint and appearance variations, making existing segmentation models, such as SAM2, non-trivial to apply directly. To address this, we present V2-SAM, a unified cross-view object correspondence framework that adapts SAM2 from single-view segmentation to cross-view correspondence through two complementary prompt generators. Specifically, the Cross-View Anchor Prompt Generator (V2-Anchor), built upon DINOv3 features, establishes geometry-aware correspondences and, for the first time, unlocks coordinate-based prompting for SAM2 in cross-view scenarios, while the Cross-View Visual Prompt Generator (V2-Visual) enhances appearance-guided cues via a novel visual prompt matcher that aligns ego-exo representations from both feature and structural perspectives. To effectively exploit the strengths of both prompts, we further adopt a multi-expert design and introduce a Post-hoc Cyclic Consistency Selector (PCCS) that adaptively selects the most reliable expert based on cyclic consistency. Extensive experiments validate the effectiveness of V2-SAM, achieving new state-of-the-art performance on Ego-Exo4D (ego-exo object correspondence), DAVIS-2017 (video object tracking), and HANDAL-X (robotic-ready cross-view correspondence).
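The abstract says that PCCS picks the most reliable expert by cyclic consistency but does not spell out the scoring. The following is a minimal sketch of one plausible reading, not the released implementation. It assumes each expert maps a query mask to the other view, that a single hypothetical `backward` mapper brings the prediction back to the source view, and that consistency is scored as IoU against the original query mask. The names `Expert`, `select_by_cycle_consistency`, and `shift`, and the toy experts, are all illustrative.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Mask = FrozenSet[Tuple[int, int]]  # set of (row, col) foreground pixels
# Hypothetical expert interface: maps a mask in one view to a mask in the other view.
Expert = Callable[[Mask], Mask]


def iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two pixel sets (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def select_by_cycle_consistency(
    query_mask: Mask,
    forward: Dict[str, Expert],
    backward: Expert,
) -> Tuple[str, Mask, float]:
    """Return (expert name, its target-view mask, cycle score) for the best expert."""
    best = None
    for name, expert in forward.items():
        target_mask = expert(query_mask)      # source view -> target view
        cycled_mask = backward(target_mask)   # target view -> back to source view
        score = iou(query_mask, cycled_mask)  # agreement with the original query (assumed metric)
        if best is None or score > best[2]:
            best = (name, target_mask, score)
    if best is None:
        raise ValueError("at least one expert is required")
    return best


def shift(mask: Mask, dc: int) -> Mask:
    """Toy cross-view mapping: translate the mask by dc columns."""
    return frozenset((r, c + dc) for r, c in mask)


# Toy setup: the true view change is a 2-column shift, so "anchor" is exact
# and "visual" is off by one column.
query = frozenset({(0, 0), (0, 1), (1, 0), (1, 1)})
experts = {"anchor": lambda m: shift(m, 2), "visual": lambda m: shift(m, 3)}
name, target_mask, score = select_by_cycle_consistency(
    query, experts, lambda m: shift(m, -2)
)
print(name, score)  # prints "anchor 1.0"
```

In this toy case the off-by-one expert cycles back to a mask that overlaps the query on only two of six pixels (IoU 1/3), so the exact expert is selected.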
Our method is evaluated on Ego-Exo4D (ego-exo object correspondence), DAVIS-2017 (video object tracking), and HANDAL-X (robotic-ready cross-view correspondence).
Our processed data is available on Hugging Face:
Ego-Exo4D: https://huggingface.co/datasets/jaychempan/Ego-Exo4D-Relation-Train and https://huggingface.co/datasets/jaychempan/Ego-Exo4D-Relation-Test
DAVIS-2017: https://huggingface.co/datasets/jaychempan/DAVIS
HANDAL-X: https://huggingface.co/datasets/jaychempan/HANDAL
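The dataset repositories listed above can also be fetched programmatically. The sketch below uses `snapshot_download` from the `huggingface_hub` package with `repo_type="dataset"`. The `data/` root and the per-repository folder names are assumptions, not paths the project is known to expect, so adjust them to match the configs.

```python
from pathlib import Path
from typing import Dict, List

# Dataset repositories taken from the links above.
DATASET_REPOS: Dict[str, List[str]] = {
    "Ego-Exo4D": [
        "jaychempan/Ego-Exo4D-Relation-Train",
        "jaychempan/Ego-Exo4D-Relation-Test",
    ],
    "DAVIS-2017": ["jaychempan/DAVIS"],
    "HANDAL-X": ["jaychempan/HANDAL"],
}


def local_dir_for(repo_id: str, root: str = "data") -> Path:
    """Assumed layout: <root>/<repository name without the user prefix>."""
    return Path(root) / repo_id.split("/", 1)[1]


def download_all(root: str = "data") -> List[Path]:
    """Download every listed dataset repository and return the local folders."""
    # Imported lazily so the mapping above can be inspected without the package.
    from huggingface_hub import snapshot_download

    paths = []
    for repos in DATASET_REPOS.values():
        for repo_id in repos:
            target = local_dir_for(repo_id, root)
            snapshot_download(
                repo_id=repo_id, repo_type="dataset", local_dir=str(target)
            )
            paths.append(target)
    return paths
```

Calling `download_all()` fetches all four repositories. Nothing is downloaded until the function is called.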
conda create -n v2sam python=3.10 -y
conda activate v2sam
cd ~/projects/V2-SAM
export LD_LIBRARY_PATH=/opt/modules/nvidia-cuda-12.1.0/lib64:$LD_LIBRARY_PATH
export PATH=/opt/modules/nvidia-cuda-12.1.0/bin:$PATH
# conda install pytorch==2.3.1 torchvision==0.18.1 pytorch-cuda=12.1 cuda -c pytorch -c "nvidia/label/cuda-12.1.0" -c "nvidia/label/cuda-12.1.1"
pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121
# pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.3/index.html
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.1.0"
pip install -r requirements.txt
pip install prettytable
# install the local mmengine copy so the third-party tools are available
cd mmengine
pip install -e .
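After the steps above, a quick check that the expected packages are present can save a failed training launch. This is a small sketch using only the standard library. The package list is read off the install commands above and is an assumption about what matters most; `requirements.txt` may pull in more.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Distribution names taken from the install commands above.
REQUIRED = ["torch", "torchvision", "torchaudio", "mmengine", "mmcv", "prettytable"]


def installed_versions(names: Iterable[str] = REQUIRED) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def report(versions: Dict[str, Optional[str]]) -> str:
    """Format one line per package, flagging the missing ones."""
    lines = []
    for name, version in versions.items():
        lines.append(f"{name:<12} {version if version else 'MISSING'}")
    return "\n".join(lines)


print(report(installed_versions()))
```

With the pinned commands above, `torch` would be expected to report a 2.3.1 build. Any line marked `MISSING` points to an install step that did not complete.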
Choose the base model weights to use.
huggingface-cli download jaychempan/sam3 --local-dir weights/sam2 --include sam2_hiera_large.pt
huggingface-cli download jaychempan/dinov2 --local-dir weights/dinov2 --include dinov2_vitg14_reg4_pretrain.pth
huggingface-cli download jaychempan/dinov3 --local-dir weights/dinov3 --include dinov3_vitl16_pretrain_lvd1689m-8aa4cbdd.pth
bash tools/dist.sh train projects/v2sam/configs/v2sam.py 4
To train V²-Visual, first rename the project directory projects/v2sam_visual --> projects/v2sam.
To train V²-Fusion, first rename the project directory projects/v2sam_fusion --> projects/v2sam.
Note:
V²-Anchor does not need to be trained (it uses the official SAM2 decoder checkpoint).
bash tools/test.sh test projects/v2sam/configs/v2sam.py 4 /path/to/checkpoint
bash tools/test_all.sh test projects/v2sam/configs/v2sam.py 4 /path/to/checkpoint/dir
This project references and uses the following open source models and datasets.
If you find this work useful or use our dataset, please cite the following paper.
@article{pan2025v,
title={V$^{2}$-SAM: Marrying SAM2 with Multi-Prompt Experts for Cross-View Object Correspondence},
author={Pan, Jiancheng and Wang, Runze and Qian, Tianwen and Mahdi, Mohammad and Fu, Yanwei and Xue, Xiangyang and Huang, Xiaomeng and Van Gool, Luc and Paudel, Danda Pani and Fu, Yuqian},
journal={arXiv preprint arXiv:2511.20886},
year={2025}
}


