TL;DR (1) - We introduce Vid-SME, the first dedicated method for video membership inference attacks (MIAs) against large video understanding models.
TL;DR (2) - We benchmark MIA performance by training three VULLMs, each on a distinct dataset, using different representative training strategies.
Figure 1. Vid-SME against Video Understanding Large Language Models (VULLMs). Left: An example of the video instruction context used in our experiments. Middle: The overall pipeline of Vid-SME. Right: A detailed illustration of the membership score calculation in Vid-SME.
- Follow the instructions provided in LongVA to build the environment.
- Download the models and move them into ./checkpoints. For the datasets, the JSON files are provided in the ./video_json folder; download the corresponding videos and move them into ./video_json/videos.
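Before launching a run, it can help to confirm that the directories named in the setup steps are in place. The following is a minimal sketch, not part of the released code: the helper name missing_dirs is hypothetical, and it only assumes the three paths mentioned above (./checkpoints, ./video_json, ./video_json/videos) relative to the repository root.

```python
from pathlib import Path

# Directories named in the setup steps, relative to the repository root.
REQUIRED_DIRS = ("checkpoints", "video_json", "video_json/videos")


def missing_dirs(root="."):
    """Return the required directories that do not exist under root.

    Hypothetical helper for illustration; it is not shipped with Vid-SME.
    """
    base = Path(root)
    return [d for d in REQUIRED_DIRS if not (base / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Layout looks complete.")
```

Running the file from the repository root prints any directory that still needs to be created or populated. It does not verify that the checkpoints or videos themselves are complete.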
Run Vid-SME on each model via the corresponding script:
python Vid_SME_main_CinePile.py
If you find our work interesting or helpful, please cite it as follows:
@article{li2025vid,
title={Vid-SME: Membership inference attacks against large video understanding models},
author={Li, Qi and Yu, Runpeng and Wang, Xinchao},
journal={arXiv preprint arXiv:2506.03179},
year={2025}
}
