Inconsistent subtitle interleaving mode

Thanks for your great work! 

While comparing the subtitle interleaving logic in the two evaluation scripts, I noticed the following:

* The two evaluation scripts implement subtitle interleaving differently.

| | `vlmeval/dataset/videommev2.py` (VLMEvalKit) | `test_video_mme_v2.py` (standalone) |
|---|---|---|
| **Matching granularity** | Word-level (`subtitle_between_timestamps`) | Sentence-level (`group_subtitle_segments` → `segments_between_timestamps`) |
| **Timestamp label** | Frame window timestamps | Segment's own timestamps |
| **Output per frame** | One merged text chunk | Multiple independent segment entries |

* The Duplication Problem in `test_video_mme_v2.py`

`segments_between_timestamps` uses overlap matching:

```python
if seg['end_time'] >= start_time and seg['start_time'] < end_time:
```

A sentence-level segment that spans multiple frame windows is matched, and repeated in full, under **every** overlapping frame. For example, a 2.5s subtitle at fps=2 appears **5 times**:

```
[Subtitle 179.74s - 182.22s]: doing actual practical time traveling.
Frame-309: <image>
[Subtitle 179.74s - 182.22s]: doing actual practical time traveling.
Frame-310: <image>
[Subtitle 179.74s - 182.22s]: doing actual practical time traveling.
Frame-311: <image>
[Subtitle 179.74s - 182.22s]: doing actual practical time traveling.
Frame-312: <image>
[Subtitle 179.74s - 182.22s]: doing actual practical time traveling.
```
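
To make the effect easy to reproduce, here is a minimal sketch built around the overlap condition quoted above. The frame windows (`[i / fps, (i + 1) / fps)`), the helper names `segments_overlapping` and `count_emissions`, and the toy segment are assumptions for illustration, not code from either script; the exact repeat count depends on how the real script aligns its windows.

```python
def segments_overlapping(segments, start_time, end_time):
    """Return every segment that touches the window at all (the quoted overlap test)."""
    return [
        seg for seg in segments
        if seg["end_time"] >= start_time and seg["start_time"] < end_time
    ]


def count_emissions(segments, fps, num_frames):
    """Count how many frame windows each segment text would be emitted under."""
    counts = {}
    for i in range(num_frames):
        # Assumed windowing: frame i covers [i / fps, (i + 1) / fps).
        start, end = i / fps, (i + 1) / fps
        for seg in segments_overlapping(segments, start, end):
            counts[seg["text"]] = counts.get(seg["text"], 0) + 1
    return counts


# Toy segment (hypothetical values): 2.2 s long, sampled at fps=2.
segments = [{"start_time": 10.2, "end_time": 12.4, "text": "hello world"}]
print(count_emissions(segments, fps=2, num_frames=40))  # {'hello world': 5}
```

With 0.5 s windows, the single segment overlaps the windows of frames 20 through 24, so it is emitted five times.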

## Questions

1. Which script's interleaving behavior is the intended one: word-level (`vlmeval/dataset/videommev2.py`) or sentence-level (`test_video_mme_v2.py`)?
2. Is the sentence-level duplication in `test_video_mme_v2.py` expected behavior, or should it be deduplicated (e.g., by emitting a subtitle only in the frame window where its `start_time` falls)?
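
For reference, the deduplication suggested in question 2 could look roughly like the sketch below. It is only an illustration under assumptions: the windowing (`[i / fps, (i + 1) / fps)`) and the helper names `segments_starting_in` and `interleave` are hypothetical and do not come from either script; the output format mirrors the example shown earlier in this issue.

```python
def segments_starting_in(segments, start_time, end_time):
    """Keep only segments whose start_time falls inside the frame window."""
    return [seg for seg in segments if start_time <= seg["start_time"] < end_time]


def interleave(segments, fps, num_frames):
    """Build the interleaved prompt lines, emitting each segment exactly once."""
    lines = []
    for i in range(num_frames):
        # Assumed windowing: frame i covers [i / fps, (i + 1) / fps).
        start, end = i / fps, (i + 1) / fps
        for seg in segments_starting_in(segments, start, end):
            lines.append(
                f"[Subtitle {seg['start_time']:.2f}s - {seg['end_time']:.2f}s]: {seg['text']}"
            )
        lines.append(f"Frame-{i}: <image>")
    return lines


# Toy segment (hypothetical values) that spans five 0.5 s windows.
toy_segments = [{"start_time": 10.2, "end_time": 12.4, "text": "hello world"}]
prompt_lines = interleave(toy_segments, fps=2, num_frames=40)
```

One trade-off of this variant: a segment that starts before the first sampled window, or inside a window whose frame is not sampled, would never be emitted, so a real fix would probably need a fallback for those cases.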
