Yiyu Wang1*, Xuyang Liu1,2*†, Xiyan Gui1,3, Xinying Lin4, Boxue Yang1,
Chenfei Liao1,5, Tailai Chen1, Linfeng Zhang1✉
1 EPIC Lab, Shanghai Jiao Tong University 2 Sichuan University
3 Huazhong University of Science and Technology 4 Sun Yat-sen University
5 Hong Kong University of Science and Technology (Guangzhou)
⚡ The first plug-and-play token compression framework for streaming video understanding.
News:
- 2025.12.02 🤗 We release our latest work STC, the first plug-and-play inference acceleration framework for streaming video understanding! Code is available!
- 2025.08.21 🎉 Our VidCom2 has been accepted to the EMNLP 2025 main conference!
- 2025.05.21 🤗 We release VidCom2, a plug-and-play inference acceleration method for VideoLLMs. Code is available!
STC is the first plug-and-play token compression framework for accelerating streaming video understanding:
- ⚡ Streaming-First Design: Optimized for latency-sensitive applications (e.g., live sports, AR glasses) where frames arrive continuously.
- 🧩 STC-Cacher: Exploits temporal redundancy by caching visual features for similar frames (cosine similarity > 0.85), significantly reducing ViT encoding overhead.
- ✂️ STC-Pruner: Compresses visual tokens after encoding to shorten the LLM prefill sequence while preserving spatiotemporal saliency.
- 🔌 Plug-and-Play: Seamlessly integrates with SOTA VideoLLMs such as ReKV, Dispider, StreamForest, and LiveCC.
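To make the STC-Cacher idea concrete, here is a minimal sketch of similarity-gated feature caching: a frame that is nearly identical to the last encoded one reuses cached ViT features instead of being re-encoded. All names, the flattening, and the single-slot cache are illustrative assumptions; the actual logic lives in `model/cache.py` (class `STC_CACHE`).

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of similarity-gated caching (NOT the repo's STC_CACHE):
# skip ViT encoding when the incoming frame is nearly identical to the
# previously encoded one, as measured by cosine similarity > threshold.
class FeatureCache:
    def __init__(self, encoder, threshold: float = 0.85):
        self.encoder = encoder        # callable: frame tensor -> visual features
        self.threshold = threshold    # cosine-similarity reuse threshold
        self.last_frame = None        # flattened reference frame
        self.last_features = None     # cached encoder output

    def encode(self, frame: torch.Tensor) -> torch.Tensor:
        flat = frame.flatten().float()
        if self.last_frame is not None:
            sim = F.cosine_similarity(flat, self.last_frame, dim=0)
            if sim > self.threshold:  # redundant frame: reuse cached features
                return self.last_features
        feats = self.encoder(frame)   # dissimilar frame: pay the ViT cost
        self.last_frame, self.last_features = flat, feats
        return feats
```

In a streaming loop, consecutive frames from a static camera would mostly hit the cache, so the expensive encoder runs only on frames that actually change.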
Core Implementation:
- Cache logic: `model/cache.py` (class `STC_CACHE`)
- Prune logic: `model/prune.py` (class `STC_Pruner`)
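The pruning side can likewise be sketched in a few lines: after encoding, keep only the top-k most salient visual tokens before LLM prefill. The scoring below (L2 norm) and the function name are stand-in assumptions; the actual criterion, which preserves spatiotemporal saliency, is implemented in `model/prune.py` (class `STC_Pruner`).

```python
import torch

# Illustrative top-k token pruning (NOT the repo's STC_Pruner): rank tokens
# by a proxy saliency score and keep a fixed fraction, preserving their
# original order so positional structure survives.
def prune_tokens(tokens: torch.Tensor, keep_ratio: float = 0.25) -> torch.Tensor:
    """tokens: (num_tokens, dim) visual tokens for one frame or clip."""
    num_keep = max(1, int(tokens.shape[0] * keep_ratio))
    saliency = tokens.norm(dim=-1)                       # proxy saliency score
    idx = saliency.topk(num_keep).indices.sort().values  # keep original order
    return tokens[idx]
```

With `keep_ratio=0.25`, the LLM prefill sequence shrinks to a quarter of its visual-token length, which is where most of the latency savings come from.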
We support the following models enhanced with STC; code for the remaining models is coming soon.
| Model Base | Status | Code Path |
|---|---|---|
| ReKV (LLaVA-OV) | ✅ Supported | model/llava_onevision_rekv.py |
| StreamForest | 🚧 Coming Soon | - |
| Dispider | 🚧 Coming Soon | - |
| LiveCC | 🚧 Coming Soon | - |
We evaluate STC under the same environments as the original models, so you can set up your environment by following the requirements of each original model.
Links:
| Original Models | urls |
|---|---|
| ReKV | https://github.com/Becomebright/ReKV |
| StreamForest | https://github.com/MCG-NJU/StreamForest |
| Dispider | https://github.com/Mark12Ding/Dispider |
| LiveCC | https://github.com/showlab/livecc |
Besides, we provide a replica of our environment here:

Use our environment:

```bash
# ReKV
cd ReKV
pip install -e .
cd model/longva
pip install -e .
```

```bash
# StreamForest
cd StreamForest
conda env create -f environment-StreamForest.yml
```

```bash
# Dispider
cd Dispider
conda env create -f environment-Dispider.yml
pip install -v .  # for development mode: `pip install -v -e .`
```

```bash
# LiveCC
cd LiveCC
conda env create -f environment-LiveCC.yml
pip install -v .  # for development mode: `pip install -v -e .`
```

We evaluate STC on both Online (Streaming) benchmarks to demonstrate real-time capabilities and Offline benchmarks to ensure robust general video understanding.
These benchmarks evaluate the model's ability to understand videos in a streaming fashion, where frames are received sequentially.
Download the dataset from mjuicem/StreamingBench.
- Required files: `Real_Time_Visual_Understanding.csv` and `Real-Time Visual Understanding_*.zip`.
- Videos: Download `src_videos.tar.parta[a-e]` from JoeLeelyf/OVO-Bench (HF).
- Metadata: Download `ovo_bench_new.json` from JoeLeelyf/OVO-Bench (GitHub).
Supported datasets: MLVU, EgoSchema, Video-MME
We use standard benchmarks to verify that STC maintains high performance on general video understanding tasks.
```bash
# Example: Evaluating on MLVU
bash scripts/eval_offline_benchs.sh
```

To evaluate EgoSchema or Video-MME, simply change the `DATASET` argument to the respective dataset name.
- Configuration: Update `eval/scripts/eval_ovobench.sh`:
  - Set `TASK_JSON` to the path of `ovo_bench_new.json`.
  - Set `VIDEO_DIR` to the unzipped video directory.
- Run:

```bash
bash scripts/ovobench_scipts/eval_rekv.sh
```
Then use the generated result file to compute the metrics:

```bash
bash scripts/ovobench_scipts/score_rekv.sh
```
- Configuration: Update `eval/scripts/eval_streamingbench.sh`:
  - Set `TASK_CSV` to the path of the CSV file.
  - Set `VIDEO_DIR` to the unzipped video directory.
- Run:

```bash
bash scripts/streamingbench_scripts/eval_rekv.sh
```
Then use the generated result file to compute the metrics:

```bash
bash scripts/streamingbench_scripts/score_rekv.sh
```
- Thanks to ReKV for their great work and codebase.
- Thanks to StreamForest for their great work and codebase.
- Thanks to Dispider for their great work and codebase.
- Thanks to LiveCC for their great work and codebase.
If our findings help your research, please consider citing our paper:
```bibtex
@misc{wang2025acceleratingstreamingvideolarge,
      title={Accelerating Streaming Video Large Language Models via Hierarchical Token Compression},
      author={Yiyu Wang and Xuyang Liu and Xiyan Gui and Xinying Lin and Boxue Yang and Chenfei Liao and Tailai Chen and Linfeng Zhang},
      year={2025},
      eprint={2512.00891},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.00891},
}
```

For any questions about our paper or code, please email liuxuyang@stu.scu.edu.cn or ustywan8@ljmu.ac.uk.