
Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs

[ICLR 2026]
Minji Kim*, Taekyung Kim*, Bohyung Han
(* Equal Contribution)


Official PyTorch implementation of the ICLR 2026 paper "Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs"

Updates

  • 2026/03/03: Code and models released.
  • 2026/01/26: Our paper is accepted to ICLR 2026 with strong reviews! 🎉

Overview

[Teaser figure]

TL;DR: This paper presents a systematic analysis of where and how information flows in VideoLLMs for temporal reasoning in VideoQA, revealing key patterns and effective pathways.

πŸ“ Summary of our findings on VideoLLMs' information flow:

(a) Temporal reasoning begins with cross-frame interactions within video tokens at early-middle layers (green), followed by video-language integration into temporal keywords in the question (purple). This information is conveyed to the last token at middle-late layers (orange), where answer generation occurs (yellow).

(b) These effective pathways are identified via Attention Knockout, which disconnects attention pairs and tracks the drop in probability of the final answer to quantify their impact.

(c) Layer-wise answer probability rises immediately after video-language integration, indicating that the model is ready to predict correct answers after the middle layers.

Based on our analysis, we show that VideoLLMs can retain their VideoQA performance by selecting effective information pathways while suppressing a substantial amount of attention edges, e.g., 58% in LLaVA-NeXT-7B-Video-FT.
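As a rough illustration of the Attention Knockout idea, the single-head sketch below disconnects selected query→key attention edges by setting their scores to -inf before the softmax, then compares the affected token's output with and without the knockout. The toy tensors and token indices are illustrative assumptions only; the paper's tool operates inside the full attention layers of a VideoLLM and measures the drop in answer probability.

```python
import torch

def attention_with_knockout(q, k, v, blocked_pairs=None):
    """Single-head causal attention; `blocked_pairs` lists
    (query_idx, key_idx) edges whose attention is knocked out."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d**0.5            # (T, T)
    T = scores.size(-1)
    causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(causal, float("-inf"))   # causal mask
    if blocked_pairs:
        for qi, ki in blocked_pairs:
            scores[qi, ki] = float("-inf")               # disconnect the edge
    weights = scores.softmax(dim=-1)
    return weights @ v, weights

torch.manual_seed(0)
T, d = 6, 8
q, k, v = torch.randn(T, d), torch.randn(T, d), torch.randn(T, d)

# Knock out attention from the last token (index 5) to hypothetical
# "video" tokens at positions 0-2.
blocked = [(5, 0), (5, 1), (5, 2)]
out_full, w_full = attention_with_knockout(q, k, v)
out_ko, w_ko = attention_with_knockout(q, k, v, blocked)

print(w_ko[5, :3])                             # zeros: edges removed
print((out_full[5] - out_ko[5]).abs().max())   # change at the last token
```

In the real analysis, the size of the change is quantified as the drop in the probability of the final answer rather than a raw output difference.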

πŸ“ This repository supports:

  • Causal intervention tools for VideoLLMs (e.g., Attention Knockout, Logit Lens, Attention Map Visualization)
  • Reproducible experiments from our paper, including figure plotting code
  • Training and evaluation across various model series and video benchmarks

Models

You can download all model checkpoints from the Hugging Face links below. We fine-tuned LLaVA-NeXT and Mini-InternVL on VideoChat2-IT to analyze the impact of video instruction tuning on model behavior. We also adopted VideoLLaMA3 without additional fine-tuning.

| Model | Link | Initialized From |
| --- | --- | --- |
| LLaVA-NeXT-7B-Video-FT | Model on HF | llava-hf/llava-v1.6-vicuna-7b-hf |
| LLaVA-NeXT-13B-Video-FT | Model on HF | llava-hf/llava-v1.6-vicuna-13b-hf |
| Mini-InternVL-4B-Video-FT | Model on HF | OpenGVLab/Mini-InternVL-Chat-4B-V1-5 |
| VideoLLaMA3-7B | - | DAMO-NLP-SG/VideoLLaMA3-7B |

Environments

Installation

Tested with Python 3.10, PyTorch 2.2.1, CUDA 11.8. Other versions may be compatible.

Step 1: Create a virtual environment

  • Option 1: PyTorch Docker image with torch==2.2.1, torchaudio==2.2.1, torchvision==0.17.1

    docker run -it --gpus all --ipc=host --rm --name=map_the_flow \
    pytorch/pytorch:2.2.1-cuda11.8-cudnn8-devel
  • Option 2: Conda environment

    conda create -n map_the_flow python=3.10 -y
    conda activate map_the_flow
    conda install pytorch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 \
    pytorch-cuda=11.8 -c pytorch -c nvidia -y

Step 2: Clone the repository and install dependencies

git clone https://github.com/byminji/map-the-flow.git
cd map-the-flow

pip install -r requirements.txt
pip install mmcv-full==1.7.2 --no-build-isolation # mmcv-full must be built from source

Data preparation

You can download all evaluation data from the Hugging Face links below. After downloading, set the paths in tasks/eval/config_dataset.py.

  • TVBench: Our primary benchmark for the analysis.
  • TOMATO: Used for effective pathway analysis.
  • LongVideoBench: Used for long video understanding analysis.
  • Video-MME: Used for spatial understanding analysis.
  • VCGBench: Used for open-ended analysis; we followed the original repo to prepare the evaluation data.

Analysis

All implementations are in the analysis folder, and run scripts are in scripts/analysis. Results, including graph plots and raw data, are saved under ${output_path}/${dataset_name}/${target}/${model_name}. To reproduce the plot style used in our paper, run analysis/visualize_graph_plots.py on the saved JSONs.

Common variables

Modify these variables at the top of each script before running.

| Variable | Description | Example |
| --- | --- | --- |
| dataset_name | Evaluation dataset | tvbench |
| output_path | Root directory for saving results | workspace/outputs/information_flow_analysis |
| video_model_path | Path to the fine-tuned model | workspace/models/LLaVA-NeXT-7B-Video-FT |
| base_model_path | Path to the base model | workspace/models/llava-v1.6-vicuna-7b-hf |
| conv_mode | Conversation template | eval_mvbench |
| pooling_shape | Token pooling shape (T-H-W) | 8-12-12 |
| task_id | Task index (-1 = full dataset) | 0 |

Task IDs for TVBench: 0=Action Antonym, 3=Action Sequence, 5=Moving Direction, 6=Object Count, 8=Scene Transition.

Information flow analysis

| --target | Description |
| --- | --- |
| cross-frame | Block cross-frame interactions among video tokens |
| vql-to-ql | Block video/question/last → question/last flows |
| question-and-options-to-last | Block question-only, true, and false options → last token |
| vq-to-true-opt | Block video/question → true option token |

Generation probability analysis

Effective pathway analysis

Logit Lens analysis
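The Logit Lens decodes an intermediate hidden state directly through the model's final normalization and unembedding matrix, yielding a per-layer probability for the answer token. The sketch below uses toy stand-in weights and a hypothetical `answer_token_id`; the actual analysis applies the VideoLLM's own final norm and LM head to the last token's hidden state at every layer.

```python
import torch

torch.manual_seed(0)
vocab, d, n_layers = 100, 16, 4

# Toy stand-ins for a transformer's final norm and unembedding matrix.
final_norm = torch.nn.LayerNorm(d)
unembed = torch.nn.Linear(d, vocab, bias=False)

# Pretend hidden states of the last token at each layer.
hidden_per_layer = [torch.randn(d) for _ in range(n_layers)]

answer_token_id = 42  # hypothetical id of the correct answer token
for layer, h in enumerate(hidden_per_layer):
    probs = unembed(final_norm(h)).softmax(dim=-1)
    print(f"layer {layer}: p(answer) = {probs[answer_token_id]:.4f}")
```

Plotting p(answer) against layer index is what reveals the rise in answer probability right after video-language integration.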

Attention map visualization
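A common way to visualize attention maps is to average the softmaxed attention weights over heads and render the resulting query-by-key matrix as a heatmap. A minimal sketch, with random weights standing in for attention recorded from a model:

```python
import torch
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

torch.manual_seed(0)
n_heads, T = 4, 10
scores = torch.randn(n_heads, T, T)
attn = scores.softmax(dim=-1)   # per-head attention weights (H, T, T)
attn_mean = attn.mean(dim=0)    # average over heads -> (T, T)

plt.imshow(attn_mean, cmap="viridis")
plt.xlabel("key position")
plt.ylabel("query position")
plt.colorbar(label="attention weight")
plt.savefig("attention_map.png", dpi=150)
```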

Training

If you want to reproduce our training process, please refer to docs/TRAIN.md.

Acknowledgement

This project builds upon open-source works including LLaVA-NeXT, Mini-InternVL, VideoLLaMA3, and VideoChat2. We thank all authors who contributed to these foundational projects.

Citation

If you find our paper useful in your research, please consider citing:

@inproceedings{kim2026map,
  author    = {Kim, Minji and Kim, Taekyung and Han, Bohyung},
  title     = {Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2026},
}

@article{kim2025map,
  author    = {Kim, Minji and Kim, Taekyung and Han, Bohyung},
  title     = {Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs},
  journal   = {arXiv preprint arXiv:2510.13251},
  year      = {2025},
}

Contact

If you have any questions, please create an issue or contact minji@snu.ac.kr and taekyung.k@navercorp.com.
