Counting-Stars (★)
Updated Nov 24, 2025 - Jupyter Notebook
[NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models
Top-K sparse attention has no critical key budget: a 4× swing of k_eff barely moves long-context retrieval accuracy across 3 models (Llama-1B/3B, Qwen2.5-3B). The limit is the base model's disambiguation, not the compressor. Paper + raw per-prompt logs + pre-registrations. Selection is exact; kernel port validated bitwise.
Semantically hard multi-needle long-context data generator. Stop testing LLMs with random-password needles.