Inconsistent subtitle interleaving mode #3

Description

@hhhhzp

Thanks for your great work!

While comparing the subtitle interleaving logic in the VLMEvalKit implementation (vlmeval/dataset/videommev2.py) and the standalone script (test_video_mme_v2.py), I noticed that they behave differently:

  • The two evaluation scripts implement subtitle interleaving differently.

| | vlmeval/dataset/videommev2.py (VLMEvalKit) | test_video_mme_v2.py (standalone) |
|---|---|---|
| Matching granularity | Word-level (subtitle_between_timestamps) | Sentence-level (group_subtitle_segments, segments_between_timestamps) |
| Timestamp label | Frame window timestamps | Segment's own timestamps |
| Output per frame | One merged text chunk | Multiple independent segment entries |

  • The Duplication Problem in test_video_mme_v2.py

segments_between_timestamps uses overlap matching:

if seg['end_time'] >= start_time and seg['start_time'] < end_time:

A sentence-level segment that spans multiple frame windows is matched, and repeated in full, under every overlapping frame. For example, a roughly 2.5s subtitle at fps=2 appears 5 times:

[Subtitle 179.74s - 182.22s]: doing actual practical time traveling.
Frame-309: <image>
[Subtitle 179.74s - 182.22s]: doing actual practical time traveling.
Frame-310: <image>
[Subtitle 179.74s - 182.22s]: doing actual practical time traveling.
Frame-311: <image>
[Subtitle 179.74s - 182.22s]: doing actual practical time traveling.
Frame-312: <image>
[Subtitle 179.74s - 182.22s]: doing actual practical time traveling.
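
To make the repetition easy to reproduce, here is a minimal sketch of the overlap test applied to consecutive frame windows. It is not the actual code from test_video_mme_v2.py: the helper name overlapping_segments, the 0.5s window alignment, and the segment dict layout are assumptions for illustration. Only the condition itself is copied from the line quoted above.

```python
def overlapping_segments(segments, start_time, end_time):
    """Return segments that overlap the window, using the overlap test quoted above."""
    return [
        seg for seg in segments
        if seg['end_time'] >= start_time and seg['start_time'] < end_time
    ]


# One sentence-level segment, taken from the example output above.
segments = [{
    'start_time': 179.74,
    'end_time': 182.22,
    'text': 'doing actual practical time traveling.',
}]

fps = 2
window = 1.0 / fps  # assumption: each frame covers a 0.5s window

# Hypothetical frame windows [t, t + 0.5) from 179.0s to 183.0s.
window_starts = [179.0 + i * window for i in range(8)]

# Collect every window under which the segment would be emitted.
matched_windows = [
    (t, t + window)
    for t in window_starts
    if overlapping_segments(segments, t, t + window)
]

for start, end in matched_windows:
    print(f"[{start:.1f}s, {end:.1f}s) -> {segments[0]['text']}")
```

With this alignment the same segment is emitted for every window from [179.5, 180.0) through [182.0, 182.5). The exact repeat count depends on how the sampled frame windows line up with the segment boundaries.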

Questions

  1. Which script's interleaving behavior is the intended one: word-level (videomme_v2.py) or sentence-level (test_video_mme_v2.py)?
  2. Is the sentence-level duplication in test_video_mme_v2.py expected behavior, or should it be deduplicated (e.g., only emit a subtitle in the frame where its start_time falls)?
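
For reference, the deduplication suggested in question 2 could look roughly like the sketch below. The helper name segments_starting_in and the window alignment are hypothetical; this is one possible variant under those assumptions, not a patch against the actual script.

```python
def segments_starting_in(segments, start_time, end_time):
    """Return only the segments whose start_time falls inside [start_time, end_time)."""
    return [
        seg for seg in segments
        if start_time <= seg['start_time'] < end_time
    ]


segments = [{
    'start_time': 179.74,
    'end_time': 182.22,
    'text': 'doing actual practical time traveling.',
}]

window = 0.5  # assumption: fps=2, so each frame covers a 0.5s window

# Hypothetical frame windows [t, t + 0.5) from 179.0s to 183.0s.
window_starts = [179.0 + i * window for i in range(8)]

# Each segment is now emitted under at most one window.
emitted = [
    (t, seg['text'])
    for t in window_starts
    for seg in segments_starting_in(segments, t, t + window)
]

for t, text in emitted:
    print(f"[{t:.1f}s, {t + window:.1f}s) -> {text}")
```

One caveat with this variant: a segment that starts before the first sampled window, or between windows if they do not tile the timeline, would never be emitted, so the first window may need to fall back to the overlap test.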
