This repo is the official implementation of the Semi-Auto Multi-Level Annotation Tool used in M-cube-VOS. It lets users annotate the masks of target objects efficiently. In M-cube-VOS, the data collection pipeline is as follows:
- We release the Annotation Tool.
- M-cube-VOS is accepted to CVPR 2025.
- We release the M-cube-VOS dataset on Baidu Disk.
Our test environment is:
- Ubuntu 20.04.6 LTS
- Python 3.8.19
- torch 2.3.1+cu118, torchaudio 2.3.1+cu118, torchvision 0.18.1+cu118
Tip: The machine running this tool is expected to need an NVIDIA GeForce GTX or RTX GPU.
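To check the GPU requirement before installing, a small script along these lines may help. It is only a sketch: it assumes the NVIDIA driver is installed so that `nvidia-smi` is on the PATH, and the helper names `list_nvidia_gpus` and `is_geforce` are hypothetical, not part of this repository. The name match is a rough heuristic.

```python
import shutil
import subprocess
from typing import List


def list_nvidia_gpus() -> List[str]:
    """Return the GPU names reported by nvidia-smi, or [] if none can be found."""
    # nvidia-smi ships with the NVIDIA driver; if it is absent, report no GPUs.
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
            timeout=10,
        )
    except (subprocess.SubprocessError, OSError):
        # Driver problems, timeouts, and similar failures are treated as "no GPU".
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def is_geforce(name: str) -> bool:
    """Heuristic (assumption): the name mentions GeForce together with GTX or RTX."""
    upper = name.upper()
    return "GEFORCE" in upper and ("GTX" in upper or "RTX" in upper)


if __name__ == "__main__":
    gpus = list_nvidia_gpus()
    if not gpus:
        print("No NVIDIA GPU detected (nvidia-smi missing or failed).")
    for gpu_name in gpus:
        status = "looks suitable" if is_geforce(gpu_name) else "not a GeForce GTX/RTX card"
        print(f"{gpu_name}: {status}")
```

Other NVIDIA cards may also work; this check only mirrors the tip above.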
Clone our repository:
git clone https://github.com/Lijiaxin0111/SemiAuto-Multi-Level-Annotation-Tool.git
Create Environment:
conda create -n SemiAuto_AnnotateTool python=3.8
conda activate SemiAuto_AnnotateTool
Install with pip:
cd SemiAuto-Multi-Level-Annotation-Tool
pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
(If you encounter the File "setup.py" not found error, upgrade your pip with pip install --upgrade pip)
(If you encounter "error: Microsoft Visual C++ 14.0 or greater is required.", get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/)
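After installation, you may want to confirm that the installed packages match the tested environment. The following is a minimal sketch using only the standard library (`importlib.metadata`, available from Python 3.8). The function names and the `TESTED_VERSIONS` table are illustrative assumptions; the version numbers are taken from the test environment listed above, with the `+cu118` build tag ignored.

```python
from importlib import metadata
from typing import Dict, Optional

# Versions from the tested environment; local build tags such as +cu118 are ignored.
TESTED_VERSIONS: Dict[str, str] = {
    "torch": "2.3.1",
    "torchvision": "0.18.1",
    "torchaudio": "2.3.1",
}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def strip_local_tag(version: str) -> str:
    """Drop a local build tag, e.g. '2.3.1+cu118' becomes '2.3.1'."""
    return version.split("+", 1)[0]


def check_environment(expected: Dict[str, str]) -> Dict[str, str]:
    """Compare installed versions with the expected ones and return a report."""
    report: Dict[str, str] = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        if found is None:
            report[package] = "missing"
        elif strip_local_tag(found) == wanted:
            report[package] = "ok"
        else:
            report[package] = f"mismatch (found {found}, tested with {wanted})"
    return report


if __name__ == "__main__":
    for package, status in check_environment(TESTED_VERSIONS).items():
        print(f"{package}: {status}")
```

A mismatch does not necessarily mean the tool will fail; it only means the setup differs from the one we tested.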
Tips: If you are running this on a remote server, you can use VNC or X11 forwarding.
python interactive_demo.py --video ./demo_data/make_glass.mp4 --workspace ./workspace/make_glass --num_objects 1 --gpu 0
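If you have several videos to annotate, the command above can be wrapped in a small launcher. This is a sketch, not part of the tool: `build_command` and `annotate_all` are hypothetical helpers, the one-workspace-per-video layout is an assumption modeled on the example command, and only the flags shown above (`--video`, `--workspace`, `--num_objects`, `--gpu`) are used. Run it from the repository root so that `interactive_demo.py` is found.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_command(video: Path, workspace_root: Path,
                  num_objects: int = 1, gpu: int = 0) -> List[str]:
    """Build the interactive_demo.py command line for a single video."""
    # Assumption: each video gets its own workspace folder named after the file stem.
    workspace = workspace_root / video.stem
    return [
        sys.executable, "interactive_demo.py",
        "--video", str(video),
        "--workspace", str(workspace),
        "--num_objects", str(num_objects),
        "--gpu", str(gpu),
    ]


def annotate_all(video_dir: Path, workspace_root: Path,
                 num_objects: int = 1, gpu: int = 0) -> None:
    """Open the annotation GUI for every .mp4 in video_dir, one after another."""
    for video in sorted(video_dir.glob("*.mp4")):
        # Each call blocks until the GUI window for that video is closed.
        subprocess.run(build_command(video, workspace_root, num_objects, gpu), check=True)
```

For example, `annotate_all(Path("./demo_data"), Path("./workspace"))` would open `make_glass.mp4` with the workspace `workspace/make_glass`, matching the single-video command above.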
@InProceedings{chen2024m3vos_2025_CVPR,
author = {Zixuan Chen and Jiaxin Li and Liming Tan and Yejie Guo and Junxuan Liang and Cewu Lu and Yong-Lu Li},
title = {M$^3$-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2025}
}
- This Semi-Auto-Multi-Level-Annotation-Tool is based on the Cutie GUI tool, IVS, MiVOS, and XMem.
- The Cutie GUI tool uses RITM for interactive image segmentation. This repository also contains a redistribution of their code in gui/ritm. That part of the code follows RITM's license.
- For automatic video segmentation/integration with external detectors, see DEVA.
- The Cutie GUI tool uses ProPainter in the video inpainting demo.
- Thanks to Cutie, RITM, XMem++, IVS, MiVOS, and XMem for making this possible.
This project is licensed under the MIT License. You are free to use, modify, and distribute the code, provided that the original copyright notice and license are included.

