Refer-Agent: A Collaborative Multi-Agent System with Reasoning and Reflection for Referring Video Object Segmentation

Haichao Jiang   Tianming Liang   Wei-Shi Zheng   Jian-Fang Hu*  

Sun Yat-sen University  

CVPR 2026

🎯 Framework

🚀 Environment Setup

# Clone the repo
git clone https://github.com/iSEE-Laboratory/Refer-Agent.git
cd Refer-Agent

# [Optional] Create a clean Conda environment
conda create -n refer-agent python=3.10 -y
conda activate refer-agent

# PyTorch 
conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.4 -c pytorch -c nvidia

# Other dependencies
pip install -r requirements.txt
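
After installation, it can help to confirm that the versions pinned above are the ones that actually ended up in the environment. The following is a minimal sketch and not part of the repository: `check_pins` is a hypothetical helper, and the distribution names `torch`, `torchvision` and `torchaudio` are assumed to be the names the installed packages register in the environment's metadata.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins taken from the install command above; the distribution names are an assumption.
EXPECTED = {"torch": "2.5.1", "torchvision": "0.20.1", "torchaudio": "2.5.1"}


def check_pins(pins):
    """Return {name: installed_version_or_None} for every pin that does not match."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            mismatches[name] = None  # not installed in this environment
            continue
        # Ignore local build tags such as "+cu124" when comparing.
        if found.split("+")[0] != wanted:
            mismatches[name] = found
    return mismatches


if __name__ == "__main__":
    for name, found in check_pins(EXPECTED).items():
        print(f"{name}: expected {EXPECTED[name]}, found {found}")
```

An empty result means every pin matched; a value of `None` means the package was not found at all.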

Install SAM2

Refer-Agent uses SAM2 for mask propagation. Please install SAM2 following the official instructions:

cd sam2
pip install -e .
cd ..

Download SAM2 checkpoints and put them in sam2/checkpoints/:

cd sam2/checkpoints
bash download_ckpts.sh
cd ../..

Download Ovis2.5-9B

Download the Ovis2.5-9B weights and install FlashAttention:

huggingface-cli download --repo-type model AIDC-AI/Ovis2.5-9B --local-dir ./AIDC-AI
pip install flash-attn==2.7.0.post2 --no-build-isolation

📦 Data Preparation

Please refer to DATA.md for data preparation.

The directory structure is organized as follows.

Refer-Agent/
├── data
│   ├── ref_youtube_vos
│   ├── mevis
│   ├── ReVOS
│   ├── ReasonVOS
│   └── GroundMoRe
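
Before running any preprocessing, the layout above can be checked with a few lines of Python. This is a minimal sketch under the assumption that the script is run from the repository root; `missing_datasets` is a hypothetical helper, not something the repository provides, and it only checks that the top-level folders exist, not their contents.

```python
from pathlib import Path

# Dataset folder names copied from the directory tree above.
DATASETS = ["ref_youtube_vos", "mevis", "ReVOS", "ReasonVOS", "GroundMoRe"]


def missing_datasets(repo_root="."):
    """Return the dataset folders that are absent under <repo_root>/data."""
    data_dir = Path(repo_root) / "data"
    return [name for name in DATASETS if not (data_dir / name).is_dir()]


if __name__ == "__main__":
    missing = missing_datasets()
    print("All dataset folders found." if not missing else f"Missing: {missing}")
```

Only the datasets you intend to evaluate need to be present, so a non-empty result is not necessarily an error.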

🌟 Evaluation

Ref-YouTube-VOS

# Step1: Preprocess
PYTHONPATH=. python utils/preprocess_refer_youtube.py
PYTHONPATH=. python utils/Ovis_preprocess_query_refer_youtube.py -ng 8 --split valid

# Step2: Inference
PYTHONPATH=. python eval/Ovis_infer_refytb.py -ng 8 --split valid
PYTHONPATH=. python eval/post_processing.py --all_results_path path/to/all_results.json

Submit your result to the online evaluation server.

MeViS

# Step1: Preprocess
PYTHONPATH=. python utils/preprocess_mevis.py
PYTHONPATH=. python utils/Ovis_preprocess_query_mevis.py -ng 8 --split valid

# Step2: Inference
PYTHONPATH=. python eval/Ovis_infer_mevis.py -ng 8 --split valid
PYTHONPATH=. python eval/post_processing.py --all_results_path path/to/all_results.json

Submit your result to the online evaluation server.

ReVOS

# Step1: Preprocess
PYTHONPATH=. python utils/preprocess_revos.py
PYTHONPATH=. python utils/Ovis_preprocess_query_revos.py -ng 8 --split valid

# Step2: Inference
PYTHONPATH=. python eval/Ovis_infer_revos.py -ng 8 --split valid
PYTHONPATH=. python eval/post_processing.py --all_results_path path/to/all_results.json

# Step3: Evaluation
PYTHONPATH=. python eval/eval_revos.py --pred_path "path/to/output"

ReasonVOS

# Step1: Preprocess
PYTHONPATH=. python utils/reorder_reasonvos_meta.py
PYTHONPATH=. python utils/preprocess_reasonvos.py
PYTHONPATH=. python utils/Ovis_preprocess_query_reasonvos.py -ng 8 --split valid

# Step2: Inference
PYTHONPATH=. python eval/Ovis_infer_reasonvos.py -ng 8 --split valid
PYTHONPATH=. python eval/post_processing.py --all_results_path "path/to/all_results.json"

# Step3: Evaluation
PYTHONPATH=. python eval/eval_reasonvos.py --pred_path "path/to/output"

GroundMoRe

# Step1: Preprocess
PYTHONPATH=. python utils/preprocess_groundmore.py
PYTHONPATH=. python utils/Ovis_preprocess_query_groundmore.py -ng 8 --split valid

# Step2: Inference
PYTHONPATH=. python eval/Ovis_infer_groundmore.py -ng 8 --split valid
PYTHONPATH=. python eval/post_processing_groundmore.py --all_results_path path/to/all_results.json

# Step3: Evaluation
PYTHONPATH=. python GroundMoRe/evaluate_groundmore.py
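
Each recipe in this section is a fixed sequence of commands run with `PYTHONPATH=.`, so the steps can be chained in a small driver that stops at the first failure. The sketch below uses the ReVOS commands as an example; `run_steps` is a hypothetical helper, not part of the repository, and the `path/to/...` arguments are the README's placeholders, which must be replaced with real paths before anything is executed.

```python
import os
import shlex
import subprocess

# ReVOS commands copied from the section above; the result paths are the
# README's placeholders and must be replaced with real paths before running.
REVOS_STEPS = [
    "python utils/preprocess_revos.py",
    "python utils/Ovis_preprocess_query_revos.py -ng 8 --split valid",
    "python eval/Ovis_infer_revos.py -ng 8 --split valid",
    "python eval/post_processing.py --all_results_path path/to/all_results.json",
    "python eval/eval_revos.py --pred_path path/to/output",
]


def run_steps(steps, dry_run=True):
    """Run each command with PYTHONPATH=. and stop at the first failure.

    With dry_run=True (the default) nothing is executed; the parsed
    argument lists are returned so they can be inspected first.
    """
    env = dict(os.environ, PYTHONPATH=".")
    parsed = [shlex.split(step) for step in steps]
    if not dry_run:
        for argv in parsed:
            # check=True raises CalledProcessError if a step exits non-zero.
            subprocess.run(argv, env=env, check=True)
    return parsed


if __name__ == "__main__":
    for argv in run_steps(REVOS_STEPS):
        print("PYTHONPATH=.", " ".join(argv))
```

Calling `run_steps(REVOS_STEPS, dry_run=False)` from the repository root would execute the steps in order. The same pattern applies to the other datasets by swapping in their command lists.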

🙏 Acknowledgements

Our code is built upon Ovis2.5-9B, AL-Ref-SAM2 and SAM2. We sincerely appreciate these efforts.

📝 Citation

If you find our work helpful for your research, please consider citing our paper:

@article{jiang2026refer,
  title={Refer-Agent: A Collaborative Multi-Agent System with Reasoning and Reflection for Referring Video Object Segmentation},
  author={Jiang, Haichao and Liang, Tianming and Zheng, Wei-Shi and Hu, Jian-Fang},
  journal={arXiv preprint arXiv:2602.03595},
  year={2026}
}
