eval/main.py is the unified entry point. The current code supports:
- Models: qwen2.5vl, qwen3vl, llava-st-qwen2, videomolmo
- Datasets: hcstvg, vidstg, doro-stvg
The default script is eval/run_eval.sh. You can edit it directly to change model paths, annotation paths, video paths, and output paths.
For llava-st-qwen2, make sure PYTHONPATH includes your local LLaVA-ST repository:
export PYTHONPATH="/path/to/LLaVA-ST:${PYTHONPATH:-}"

For videomolmo, set the external VideoMolmo runtime before evaluation:
export VIDEOMOLMO_REPO=/path/to/VideoMolmo
export VIDEOMOLMO_PYTHON=/path/to/videomolmo/bin/python
export VIDEOMOLMO_COMPACT_QUERY=1

Then run evaluation with --model_name videomolmo --model_path videomolmo.
Typical outputs:
- results.json: per-sample predictions, parsed outputs, GT, and metrics
- status.json: overall summary and averaged metrics
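As a quick sanity check after a run, you can pull the averaged metrics out of status.json. A minimal sketch, assuming the summary stores averaged metrics under keys prefixed with `avg_` (the actual key names written by eval/main.py may differ):

```python
import json

def load_summary(path):
    """Return only the averaged-metric entries from status.json.

    The "avg_" prefix is an assumption for illustration; check the
    actual keys produced by eval/main.py.
    """
    with open(path) as f:
        summary = json.load(f)
    return {k: v for k, v in summary.items() if k.startswith("avg_")}
```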
graph_generator/ generates structured scene-graph data from raw videos. Based on the current code, the main pipeline includes:
- Scene splitting
- Object detection and tracking
- Attribute generation
- Action detection
- Relation generation
- Cross-shot reference edge generation (optional)
- STVG query generation from scene graphs
- Formatting query outputs into training-friendly JSONL
Relevant entry points:
- graph_generator/main.py: main scene graph generation entry
- graph_generator/modules/query_generator_cpsat.py: generate queries from scene graphs
- graph_generator/utils/format_train.py: convert query outputs into training format
- graph_generator/scripts/run_generator.sh: current command collection used in practice
This repository does not currently use a single root-level setup script. The actual setup should follow the module-specific pyproject.toml files under envs/.
Install uv:
curl -LsSf https://astral.sh/uv/install.sh | sh
source ~/.bashrc
cd /path/to/DORO-STVG/envs/eval
uv sync

If uv sync times out on files.pythonhosted.org in this environment, refresh the lock and sync against the configured mirror:
cd /path/to/DORO-STVG/envs/eval
uv lock --refresh
uv sync --refresh

cd /path/to/DORO-STVG/envs/graph_generator/main
uv sync

This environment is used for:
- graph_generator/main.py
- the main pipeline modules for attributes, relations, reference edges, and query generation
cd /path/to/DORO-STVG/envs/graph_generator/action_detector
uv sync

This separate environment is mainly used by the action detection module to avoid dependency conflicts with the main environment.
The evaluation script currently defaults to decord:
export FORCE_QWENVL_VIDEO_READER=decord

You can switch to torchvision or torchcodec if needed.
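The backend-selection logic can be pictured as a simple environment lookup with a decord fallback. This is an illustrative sketch, not the actual code in the evaluation script; the valid values are the three backends named above:

```python
import os

# The three decode backends documented for the evaluation script.
VALID_READERS = {"decord", "torchvision", "torchcodec"}

def pick_video_reader(env=None):
    """Return the requested video reader, defaulting to decord."""
    env = os.environ if env is None else env
    reader = env.get("FORCE_QWENVL_VIDEO_READER", "decord")
    if reader not in VALID_READERS:
        raise ValueError(f"unsupported video reader: {reader!r}")
    return reader
```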
graph_generator depends on both model checkpoints and API-related environment variables. The repository already contains graph_generator/.env, and the scripts load it automatically.
The most important variables are:
API_KEYS=your_key_1,your_key_2
MM_API_BASE_URL=https://your-compatible-endpoint

You also need to prepare:
- YOLO weights
- SAM2 / Grounded-SAM2 checkpoints
- VideoMAE action detection checkpoints
- DAM or other attribute-description models
For those details, refer to graph_generator/README.md.
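Since API_KEYS is a comma-separated list, a common pattern is to rotate through the keys round-robin to spread rate limits. The helper below is purely illustrative; the actual key-rotation logic lives inside graph_generator and may differ:

```python
import itertools
import os

def key_cycle(env=None):
    """Yield API keys round-robin from the comma-separated API_KEYS variable.

    Illustrative only: graph_generator's real rotation strategy may differ.
    """
    env = os.environ if env is None else env
    keys = [k.strip() for k in env.get("API_KEYS", "").split(",") if k.strip()]
    if not keys:
        raise RuntimeError("API_KEYS is empty; set it in graph_generator/.env")
    return itertools.cycle(keys)
```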
cd /path/to/DORO-STVG/eval
bash run_eval.sh

For llava-st-qwen2, the evaluation environment also expects:
- a local LLaVA-ST source checkout
- local LLaVA-ST-Qwen2-7B model weights
The default runner reads these environment variables:
- LLAVA_ST_SOURCE_DIR
- MODEL_PATH
- ANNOTATION_PATH
- VIDEO_DIR
- OUTPUT_DIR
- CUDA_VISIBLE_DEVICES
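Before kicking off a long run, it can help to verify that all of these variables are set. A small sketch (a convenience check, not part of the repository):

```python
import os

# Variables read by eval/run_eval.sh, per the list above.
REQUIRED_VARS = [
    "LLAVA_ST_SOURCE_DIR", "MODEL_PATH", "ANNOTATION_PATH",
    "VIDEO_DIR", "OUTPUT_DIR", "CUDA_VISIBLE_DEVICES",
]

def missing_vars(env=None):
    """Return the runner variables that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [v for v in REQUIRED_VARS if not env.get(v)]
```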
A typical smoke-test command is:
cd /path/to/DORO-STVG
CUDA_VISIBLE_DEVICES=3 \
LLAVA_ST_SOURCE_DIR=/path/to/LLaVA-ST \
MODEL_PATH=/path/to/LLaVA-ST-Qwen2-7B \
ANNOTATION_PATH=/path/to/query_train_for_eval_smoke1.jsonl \
VIDEO_DIR=/path/to/video_test1_smoke \
OUTPUT_DIR=eval/res_llava_st_smoke \
bash eval/run_eval.sh

If you prefer not to use the shell script, you can call the entry point directly:
cd /path/to/DORO-STVG/eval
python main.py run \
--model_name=llava-st-qwen2 \
--model_path=/path/to/model \
--data_name=doro-stvg \
--annotation_path=/path/to/test.json \
--video_dir=/path/to/videos \
--output_dir=./eval/res

The current run_generator.sh contains the full pipeline command examples, and the bottom part of the script keeps the active query-generation example.
A typical workflow is:
- Generate scene_graphs.jsonl
- Generate query.jsonl
- Convert it into query_train.jsonl
This is the training-friendly formatted output generated from query.jsonl by utils/format_train.py. The main fields include:
- video
- path
- query_id
- query
- Difficulty
- Width / Height
- box
box is a trajectory string in the following format:
target description: <frame_idx, time_sec, x1, y1, x2, y2; ... />
Here the coordinates are already normalized to [0, 1] using the video width and height, which makes this format easier to use for training and annotation consumption.
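The trajectory string above can be parsed and de-normalized back to pixel coordinates with a few lines of string handling. A minimal sketch, assuming the exact separators shown in the format description (`: <` after the description, `;` between frames, `,` within a frame, `/>` at the end); the real consumer code may differ:

```python
def parse_box(box_str, width, height):
    """Parse a `box` trajectory string into per-frame pixel boxes.

    Expects "description: <frame_idx, time_sec, x1, y1, x2, y2; ... />"
    with coordinates normalized to [0, 1].
    """
    desc, _, traj = box_str.partition(": <")
    traj = traj.rstrip(" />")
    frames = []
    for entry in traj.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        f, t, x1, y1, x2, y2 = (float(v) for v in entry.split(","))
        frames.append({
            "frame_idx": int(f),
            "time_sec": t,
            # de-normalize using the video Width/Height fields
            "box": [x1 * width, y1 * height, x2 * width, y2 * height],
        })
    return desc, frames
```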