🌊 Accelerating Streaming Video Large Language Models via Hierarchical Token Compression 🚀

Yiyu Wang1*, Xuyang Liu1,2*†, Xiyan Gui1,3, Xinying Lin4, Boxue Yang1,
Chenfei Liao1,5, Tailai Chen1, Linfeng Zhang1✉

1 EPIC Lab, Shanghai Jiao Tong University   2 Sichuan University
3 Huazhong University of Science and Technology   4 Sun Yat-sen University
5 Hong Kong University of Science and Technology (Guangzhou)

⚡ The first plug-and-play token compression framework for streaming video understanding.

🔥 News

  • 2025.12.02 🤗🤗 We release our latest work STC, the first plug-and-play inference acceleration framework for streaming video understanding! Code is available!
  • 2025.08.21 🎉🎉 Our VidCom2 has been accepted by EMNLP 2025 main conference!
  • 2025.05.21 🤗🤗 We release VidCom2, a plug-and-play inference acceleration method for VideoLLMs. Code is available!

📌 Highlights

STC is the first plug-and-play token compression framework for accelerating streaming video understanding:

  • ⚡ Streaming-First Design: Optimized for latency-sensitive applications (e.g., live sports, AR glasses) where frames arrive continuously.
  • 🧩 STC-Cacher: Exploits temporal redundancy by caching visual features for similar frames (cosine similarity $> 0.85$), significantly reducing ViT encoding overhead; see the sketch after this list.
  • ✂️ STC-Pruner: Compresses visual tokens after encoding to shorten the LLM prefill sequence while preserving spatiotemporal saliency.
  • 🔌 Plug-and-Play: Seamlessly integrates with SOTA VideoLLMs like ReKV, Dispider, StreamForest, and Livecc.
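
To make the two components concrete, here is a minimal Python sketch of the STC-Cacher idea. It is illustrative only, not the repository's implementation: the class name, the use of raw pixels for the similarity test, and the single-frame cache are all simplifying assumptions.

```python
import torch
import torch.nn.functional as F

class FeatureCache:
    """Illustrative STC-Cacher-style cache (hypothetical, simplified).

    If the incoming frame is sufficiently similar to the last encoded
    frame (cosine similarity > threshold), the cached ViT features are
    reused instead of re-encoding the frame.
    """

    def __init__(self, encoder, threshold: float = 0.85):
        self.encoder = encoder        # any callable: frame -> features
        self.threshold = threshold
        self.last_frame = None        # flattened pixels of last encoded frame
        self.last_features = None     # cached ViT features

    def encode(self, frame: torch.Tensor) -> torch.Tensor:
        flat = frame.flatten().float()
        if self.last_frame is not None:
            sim = F.cosine_similarity(flat, self.last_frame, dim=0)
            if sim > self.threshold:
                return self.last_features   # cache hit: ViT encoding skipped
        self.last_frame = flat              # cache miss: encode and refresh
        self.last_features = self.encoder(frame)
        return self.last_features
```

And a comparable sketch of the STC-Pruner idea, again with an assumed saliency criterion (distance from the frame's mean token); the paper's actual spatiotemporal saliency scoring may differ:

```python
import torch

def prune_tokens(features: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Illustrative STC-Pruner-style step (hypothetical scoring).

    features: (num_tokens, dim) visual tokens for one frame. Keeps the
    top keep_ratio fraction of tokens by a stand-in saliency score,
    preserving the original token order, so the LLM prefill sequence
    is shortened.
    """
    num_keep = max(1, int(features.size(0) * keep_ratio))
    saliency = (features - features.mean(dim=0)).norm(dim=-1)  # (num_tokens,)
    kept = saliency.topk(num_keep).indices.sort().values       # keep order
    return features[kept]
```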

🦁 Core Codes

Core implementation: see model/llava_onevision_rekv.py (the STC-enhanced ReKV model, per the table below).

🛠 Preparation

We support the following models enhanced with STC. Code for the remaining models is coming soon.

| Model Base | Status | Code Path |
| --- | --- | --- |
| ReKV (LLaVA-OV) | ✅ Supported | `model/llava_onevision_rekv.py` |
| StreamForest | 🚧 Coming Soon | - |
| Dispider | 🚧 Coming Soon | - |
| LiveCC | 🚧 Coming Soon | - |

Environment Settings

Original Models (recommended)

We evaluated STC under the same environments as the original models, so you can set up your environment by following the requirements of each original model.

Links:

| Original Model | URL |
| --- | --- |
| ReKV | https://github.com/Becomebright/ReKV |
| StreamForest | https://github.com/MCG-NJU/StreamForest |
| Dispider | https://github.com/Mark12Ding/Dispider |
| LiveCC | https://github.com/showlab/livecc |

Alternatively, we provide replicas of our environments below.

Use our environment

ReKV:

```bash
cd ReKV
pip install -e .
cd model/longva
pip install -e .
```

StreamForest:

```bash
cd StreamForest
conda env create -f environment-StreamForest.yml
```

Dispider:

```bash
cd Dispider
conda env create -f environment-Dispider.yml
pip install -v .  # for development mode: `pip install -v -e .`
```

LiveCC:

```bash
cd LiveCC
conda env create -f environment-LiveCC.yml
pip install -v .  # for development mode: `pip install -v -e .`
```

🚀 Performance Evaluation

We evaluate STC on both Online (Streaming) benchmarks to demonstrate real-time capabilities and Offline benchmarks to ensure robust general video understanding.

🌊 Online Benchmarks (Streaming)

These benchmarks evaluate the model's ability to understand videos in a streaming fashion, where frames are received sequentially.
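
For intuition, the loop below is a hypothetical sketch of this streaming protocol (`model.ingest` and `model.answer` are illustrative names, not the repository's API): each query is answered using only the frames observed up to its timestamp, with no access to future frames.

```python
def stream_eval(model, frames, queries):
    """frames: iterable of (timestamp, frame) pairs, in arrival order;
    queries: list of {'time': ..., 'question': ...} dicts sorted by 'time'."""
    answers, pending = [], list(queries)
    for timestamp, frame in frames:
        model.ingest(frame)  # encode (and compress) the new frame incrementally
        # Answer every query whose timestamp has now been reached,
        # using only the frames seen so far.
        while pending and pending[0]["time"] <= timestamp:
            query = pending.pop(0)
            answers.append(model.answer(query["question"]))
    return answers
```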

1. StreamingBench

Download the dataset from mjuicem/StreamingBench.

  • Required files: Real_Time_Visual_Understanding.csv and Real-Time Visual Understanding_*.zip.

2. OVO-Bench


💾 Offline Benchmarks (Standard)

Supported datasets: MLVU, EgoSchema, Video-MME

We use standard benchmarks to verify that STC maintains high performance on general video understanding tasks.

Run ReKV

MLVU, EgoSchema, Video-MME

```bash
# Example: Evaluating on MLVU
bash scripts/eval_offline_benchs.sh
```

To evaluate EgoSchema or Video-MME, simply change the DATASET argument to the corresponding dataset name (egoschema or videomme).

OVO-Bench

  • Configuration: Update eval/scripts/eval_ovobench.sh:
    • Set TASK_JSON to the path of ovo_bench_new.json.
    • Set VIDEO_DIR to the unzipped video directory.
Then run the evaluation:

```bash
bash scripts/ovobench_scipts/eval_rekv.sh
```

Use the generated result file to compute the evaluation metrics:

```bash
bash scripts/ovobench_scipts/score_rekv.sh
```

StreamingBench

  • Configuration: Update eval/scripts/eval_streamingbench.sh:
    • Set TASK_CSV to the path of the CSV file.
    • Set VIDEO_DIR to the unzipped video directory.
Then run the evaluation:

```bash
bash scripts/streamingbench_scripts/eval_rekv.sh
```

Use the generated result file to compute the evaluation metrics:

```bash
bash scripts/streamingbench_scripts/score_rekv.sh
```

Run StreamForest

MLVU, EgoSchema, Video-MME, OVO-Bench, StreamingBench

TODO

Run Dispider

OVO-Bench

TODO

Run LiveCC

OVO-Bench

TODO

👍 Acknowledgment

  • Thanks to ReKV for their great work and codebase.
  • Thanks to StreamForest for their great work and codebase.
  • Thanks to Dispider for their great work and codebase.
  • Thanks to LiveCC for their great work and codebase.

✏️ Citation

If our findings help your research, please consider citing our paper:

```bibtex
@misc{wang2025acceleratingstreamingvideolarge,
      title={Accelerating Streaming Video Large Language Models via Hierarchical Token Compression},
      author={Yiyu Wang and Xuyang Liu and Xiyan Gui and Xinying Lin and Boxue Yang and Chenfei Liao and Tailai Chen and Linfeng Zhang},
      year={2025},
      eprint={2512.00891},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.00891},
}
```

📩 Contact

For any questions about our paper or code, please email liuxuyang@stu.scu.edu.cn or ustywan8@ljmu.ac.uk.
