Yiyu Wang1*, Xuyang Liu1,2*†, Xiyan Gui1,3, Xinying Lin4, Boxue Yang1,
Chenfei Liao1,5, Tailai Chen1, Linfeng Zhang1✉
1 EPIC Lab, Shanghai Jiao Tong University 2 Sichuan University
3 Huazhong University of Science and Technology 4 Sun Yat-sen University
5 Hong Kong University of Science and Technology (Guangzhou)
⚡ The first plug-and-play token compression framework for streaming video understanding.
News:
- 2025.12.02 🤗 We release our latest work STC, the first plug-and-play inference acceleration framework for streaming video understanding! Code is available!
- 2025.08.21 🎉 Our VidCom2 has been accepted to the EMNLP 2025 main conference!
- 2025.05.21 🤗 We release VidCom2, a plug-and-play inference acceleration method for VideoLLMs. Code is available!
STC is the first plug-and-play token compression framework for accelerating streaming video understanding:
- ⚡ Streaming-First Design: Optimized for latency-sensitive applications (e.g., live sports, AR glasses) where frames arrive continuously.
- 🧩 STC-Cacher: Exploits temporal redundancy by caching visual features for similar frames (cosine similarity > 0.85), significantly reducing ViT encoding overhead.
- ✂️ STC-Pruner: Compresses visual tokens after encoding to shorten the LLM prefill sequence while preserving spatiotemporal saliency.
- 🔌 Plug-and-Play: Seamlessly integrates with SOTA VideoLLMs such as ReKV, Dispider, StreamForest, and LiveCC.
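To make the STC-Cacher idea concrete, here is a minimal sketch of similarity-gated feature caching: a frame that is nearly identical to the last encoded one reuses cached ViT features instead of being re-encoded. All names, the flattening, and the single-slot cache are illustrative assumptions; the actual logic lives in `model/cache.py` (class `STC_CACHE`).

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of similarity-gated caching (NOT the repo's STC_CACHE):
# skip ViT encoding when the incoming frame is nearly identical to the
# previously encoded one, as measured by cosine similarity > threshold.
class FeatureCache:
    def __init__(self, encoder, threshold: float = 0.85):
        self.encoder = encoder        # callable: frame tensor -> visual features
        self.threshold = threshold    # cosine-similarity reuse threshold
        self.last_frame = None        # flattened reference frame
        self.last_features = None     # cached encoder output

    def encode(self, frame: torch.Tensor) -> torch.Tensor:
        flat = frame.flatten().float()
        if self.last_frame is not None:
            sim = F.cosine_similarity(flat, self.last_frame, dim=0)
            if sim > self.threshold:  # redundant frame: reuse cached features
                return self.last_features
        feats = self.encoder(frame)   # dissimilar frame: pay the ViT cost
        self.last_frame, self.last_features = flat, feats
        return feats
```

In a streaming loop, consecutive frames from a static camera would mostly hit the cache, so the expensive encoder runs only on frames that actually change.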
Core Implementation:
- Cache logic: `model/cache.py` (class `STC_CACHE`)
- Prune logic: `model/prune.py` (class `STC_Pruner`)
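The pruning side can likewise be sketched in a few lines: after encoding, keep only the top-k most salient visual tokens before LLM prefill. The scoring below (L2 norm) and the function name are stand-in assumptions; the actual criterion, which preserves spatiotemporal saliency, is implemented in `model/prune.py` (class `STC_Pruner`).

```python
import torch

# Illustrative top-k token pruning (NOT the repo's STC_Pruner): rank tokens
# by a proxy saliency score and keep a fixed fraction, preserving their
# original order so positional structure survives.
def prune_tokens(tokens: torch.Tensor, keep_ratio: float = 0.25) -> torch.Tensor:
    """tokens: (num_tokens, dim) visual tokens for one frame or clip."""
    num_keep = max(1, int(tokens.shape[0] * keep_ratio))
    saliency = tokens.norm(dim=-1)                       # proxy saliency score
    idx = saliency.topk(num_keep).indices.sort().values  # keep original order
    return tokens[idx]
```

With `keep_ratio=0.25`, the LLM prefill sequence shrinks to a quarter of its visual-token length, which is where most of the latency savings come from.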
We support the following models enhanced with STC; code for the remaining models is coming soon.
| Model Base | Status | Code Path |
|---|---|---|
| ReKV (LLaVA-OV) | ✅ Supported | model/llava_onevision_rekv.py |
| StreamForest | 🚧 Coming Soon | - |
| Dispider | 🚧 Coming Soon | - |
| LiveCC | 🚧 Coming Soon | - |
We evaluate STC under the same environments as the original models, so you can set up your environment by following the requirements of each original model.
Links:
| Original Models | urls |
|---|---|
| ReKV | https://github.com/Becomebright/ReKV |
| StreamForest | https://github.com/MCG-NJU/StreamForest |
| Dispider | https://github.com/Mark12Ding/Dispider |
| LiveCC | https://github.com/showlab/livecc |
Besides, we provide a replica of our environment here:

Use our environment:

```bash
# ReKV
cd ReKV
pip install -e .
cd model/longva
pip install -e .
```

```bash
# StreamForest
cd StreamForest
conda env create -f environment-StreamForest.yml
```

```bash
# Dispider
cd Dispider
conda env create -f environment-Dispider.yml
pip install -v .  # for development mode: `pip install -v -e .`
```

```bash
# LiveCC
cd LiveCC
conda env create -f environment-LiveCC.yml
pip install -v .  # for development mode: `pip install -v -e .`
```

We evaluate STC on both Online (Streaming) benchmarks to demonstrate real-time capabilities and Offline benchmarks to ensure robust general video understanding.
These benchmarks evaluate the model's ability to understand videos in a streaming fashion, where frames are received sequentially.
Download the dataset from mjuicem/StreamingBench.
- Required files: `Real_Time_Visual_Understanding.csv` and `Real-Time Visual Understanding_*.zip`.
- Videos: Download `src_videos.tar.parta[a-e]` from JoeLeelyf/OVO-Bench (HF).
- Metadata: Download `ovo_bench_new.json` from JoeLeelyf/OVO-Bench (GitHub).
Supported datasets: MLVU, EgoSchema, Video-MME
We use standard benchmarks to verify that STC maintains high performance on general video understanding tasks.
```bash
# Example: Evaluating on MLVU
bash scripts/eval_offline_benchs.sh
```

To evaluate EgoSchema or Video-MME, simply change the `DATASET` argument to the respective dataset name.
- Configuration: Update `eval/scripts/eval_ovobench.sh`:
  - Set `TASK_JSON` to the path of `ovo_bench_new.json`.
  - Set `VIDEO_DIR` to the unzipped video directory.
- Run:

```bash
bash scripts/ovobench_scipts/eval_rekv.sh
```
Then use the generated result file to compute the metrics:

```bash
bash scripts/ovobench_scipts/score_rekv.sh
```
- Configuration: Update `eval/scripts/eval_streamingbench.sh`:
  - Set `TASK_CSV` to the path of the CSV file.
  - Set `VIDEO_DIR` to the unzipped video directory.
- Run:

```bash
bash scripts/streamingbench_scripts/eval_rekv.sh
```
Then use the generated result file to compute the metrics:

```bash
bash scripts/streamingbench_scripts/score_rekv.sh
```
- Thanks to ReKV for their great work and codebase.
- Thanks to StreamForest for their great work and codebase.
- Thanks to Dispider for their great work and codebase.
- Thanks to LiveCC for their great work and codebase.
If our findings help your research, please consider citing our paper:
```bibtex
@misc{wang2025acceleratingstreamingvideolarge,
      title={Accelerating Streaming Video Large Language Models via Hierarchical Token Compression},
      author={Yiyu Wang and Xuyang Liu and Xiyan Gui and Xinying Lin and Boxue Yang and Chenfei Liao and Tailai Chen and Linfeng Zhang},
      year={2025},
      eprint={2512.00891},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.00891},
}
```

For any questions about our paper or code, please email liuxuyang@stu.scu.edu.cn or ustywan8@ljmu.ac.uk.