Refer-Agent: A Collaborative Multi-Agent System with Reasoning and Reflection for Referring Video Object Segmentation
Haichao Jiang† Tianming Liang† Wei-Shi Zheng Jian-Fang Hu*
Sun Yat-sen University
# Clone the repo
git clone https://github.com/iSEE-Laboratory/Refer-Agent.git
cd Refer-Agent
# [Optional] Create a clean Conda environment
conda create -n refer-agent python=3.10 -y
conda activate refer-agent
# PyTorch
conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.4 -c pytorch -c nvidia
# Other dependencies
pip install -r requirements.txt
Refer-Agent uses SAM2 for mask propagation. Please install SAM2 following the official instructions:
cd sam2
pip install -e .
cd ..
Download the SAM2 checkpoints and put them in sam2/checkpoints/:
cd sam2/checkpoints
bash download_ckpts.sh
cd ../..
Download the Ovis2.5-9B weights:
huggingface-cli download --repo-type model AIDC-AI/Ovis2.5-9B --local-dir ./AIDC-AI
pip install flash-attn==2.7.0.post2 --no-build-isolation
Please refer to DATA.md for data preparation.
The directory structure is organized as follows:
Refer-Agent/
├── data
│ ├── ref_youtube_vos
│ ├── mevis
│ ├── ReVOS
│ ├── ReasonVOS
│ └── GroundMoRe
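Before preprocessing, it can help to confirm that the folders in the tree above are in place. The following is a minimal sketch, not part of the repository: `check_data_layout` is a hypothetical helper name, and it only checks that the five top-level dataset folders exist, not their contents (see DATA.md for those).

```shell
# Hypothetical helper (not shipped with Refer-Agent): report which of the
# dataset folders from the tree above exist under a given data root.
check_data_layout() {
    local root="${1:-data}"   # data root, defaults to ./data
    local missing=0
    local name
    for name in ref_youtube_vos mevis ReVOS ReasonVOS GroundMoRe; do
        if [ -d "$root/$name" ]; then
            echo "found    $root/$name"
        else
            echo "missing  $root/$name"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every folder is present.
    [ "$missing" -eq 0 ]
}
```

Run `check_data_layout data` from the Refer-Agent/ root; a non-zero exit status means at least one folder is absent. Only the datasets you intend to evaluate need to be present, so a "missing" line is not necessarily an error.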
# Step1: Preprocess
PYTHONPATH=. python utils/preprocess_refer_youtube.py
PYTHONPATH=. python utils/Ovis_preprocess_query_refer_youtube.py -ng 8 --split valid
# Step2: Inference
PYTHONPATH=. python eval/Ovis_infer_refytb.py -ng 8 --split valid
PYTHONPATH=. python eval/post_processing.py --all_results_path path/to/all_results.json
Submit your result to the online evaluation server.
# Step1: Preprocess
PYTHONPATH=. python utils/preprocess_mevis.py
PYTHONPATH=. python utils/Ovis_preprocess_query_mevis.py -ng 8 --split valid
# Step2: Inference
PYTHONPATH=. python eval/Ovis_infer_mevis.py -ng 8 --split valid
PYTHONPATH=. python eval/post_processing.py --all_results_path path/to/all_results.json
Submit your result to the online evaluation server.
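The four `*_mevis` commands directly above can be generated for a different `-ng` value or results path with a small dry-run helper. This is only a sketch: `mevis_commands` is a hypothetical name, not a script in the repository, and it assumes that `-ng` sets the number of GPUs (or worker processes), which the README does not state. It prints the commands rather than running them, so they can be reviewed first.

```shell
# Hypothetical dry-run helper (not shipped with Refer-Agent): print the
# preprocessing and inference commands that use the *_mevis scripts.
# Assumption: -ng is the number of GPUs / worker processes.
mevis_commands() {
    local ng="${1:-8}"                               # value passed to -ng
    local split="${2:-valid}"                        # value passed to --split
    local results="${3:-path/to/all_results.json}"   # placeholder path, as in the README
    echo "PYTHONPATH=. python utils/preprocess_mevis.py"
    echo "PYTHONPATH=. python utils/Ovis_preprocess_query_mevis.py -ng $ng --split $split"
    echo "PYTHONPATH=. python eval/Ovis_infer_mevis.py -ng $ng --split $split"
    echo "PYTHONPATH=. python eval/post_processing.py --all_results_path $results"
}
```

For example, `mevis_commands 4` prints the same sequence with `-ng 4`; the printed lines can then be pasted into the shell one at a time.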
# Step1: Preprocess
PYTHONPATH=. python utils/preprocess_revos.py
PYTHONPATH=. python utils/Ovis_preprocess_query_revos.py -ng 8 --split valid
# Step2: Inference
PYTHONPATH=. python eval/Ovis_infer_revos.py -ng 8 --split valid
PYTHONPATH=. python eval/post_processing.py --all_results_path path/to/all_results.json
# Step3: Evaluation
PYTHONPATH=. python eval/eval_revos.py --pred_path "path/to/output"

# Step1: Preprocess
PYTHONPATH=. python utils/reorder_reasonvos_meta.py
PYTHONPATH=. python utils/preprocess_reasonvos.py
PYTHONPATH=. python utils/Ovis_preprocess_query_reasonvos.py -ng 8 --split valid
# Step2: Inference
PYTHONPATH=. python eval/Ovis_infer_reasonvos.py -ng 8 --split valid
PYTHONPATH=. python eval/post_processing.py --all_results_path "path/to/all_results.json"
# Step3: Evaluation
PYTHONPATH=. python eval/eval_reasonvos.py --pred_path "path/to/output"

# Step1: Preprocess
PYTHONPATH=. python utils/preprocess_groundmore.py
PYTHONPATH=. python utils/Ovis_preprocess_query_groundmore.py -ng 8 --split valid
# Step2: Inference
PYTHONPATH=. python eval/Ovis_infer_groundmore.py -ng 8 --split valid
PYTHONPATH=. python eval/post_processing_groundmore.py --all_results_path path/to/all_results.json
# Step3: Evaluation
PYTHONPATH=. python GroundMoRe/evaluate_groundmore.py

Our code is built upon Ovis2.5-9B, AL-Ref-SAM2 and SAM2. We sincerely appreciate these efforts.
If you find our work helpful for your research, please consider citing our paper:
@article{jiang2026refer,
title={Refer-Agent: A Collaborative Multi-Agent System with Reasoning and Reflection for Referring Video Object Segmentation},
author={Jiang, Haichao and Liang, Tianming and Zheng, Wei-Shi and Hu, Jian-Fang},
journal={arXiv preprint arXiv:2602.03595},
year={2026}
}