Looking at run_detect_segment, it appears to require an annotation file for each video, consisting of a start time, an end time, and a text prompt (although the text prompt is not actually used in the code).
Are these annotations created manually, or can they be generated automatically? For context, a sketch of the layout I'm assuming is below.
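Here is a minimal sketch of the annotation format I have in mind; the JSON layout and field names are my assumption, not taken from the repo, so please correct me if run_detect_segment expects something different:

```python
import json
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # segment start time in seconds (assumed unit)
    end: float     # segment end time in seconds (assumed unit)
    prompt: str    # text prompt, apparently unused by run_detect_segment

def load_annotations(path: str) -> list[Segment]:
    """Load per-video segment annotations from a JSON file.

    Assumes a list of {"start": ..., "end": ..., "prompt": ...} entries;
    the actual schema expected by run_detect_segment may differ.
    """
    with open(path) as f:
        entries = json.load(f)
    return [Segment(e["start"], e["end"], e["prompt"]) for e in entries]
```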
Also, when extracting features with the CLIP encoder in run_clip_filtering, what text input is required?
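To make that question concrete, this is how I would expect text features to be computed with the OpenAI CLIP package; whether run_clip_filtering expects per-segment prompts like these, class names, or something else entirely is exactly what I'm unsure about (the prompt list is purely illustrative):

```python
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Illustrative text inputs -- unclear whether run_clip_filtering
# expects per-segment prompts like these or a fixed set of class names.
prompts = ["a person opening a door", "a person cooking in a kitchen"]
tokens = clip.tokenize(prompts).to(device)

with torch.no_grad():
    text_features = model.encode_text(tokens)                   # (N, 512) for ViT-B/32
    text_features /= text_features.norm(dim=-1, keepdim=True)   # normalize for cosine similarity
```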
Finally, when will the pre-training dataset be released?
Thank you