Exact Query-Axis Ranking and Entropy-Concentration Scaling for Training-Free Top-K Sparse Attention

Evgenii Vyaltsev (ORCID 0009-0004-3712-6798), Daniil Vyaltsev — June 2026.

A training-free, exact-ranking query-axis selector for top-K sparse attention, with an explicit measured entropy-concentration scaling exponent.

TL;DR

Theorem 1 (exact ranking). Ranking keys by the projection onto the unit query direction u_Q = Q/‖Q‖ is order-identical to ranking by the raw attention score, so top-k_eff selection by ⟨u_Q, K_i⟩ returns the true top-k_eff — no learned parameters, no ranking approximation. (Exact where DSA/NSA are learned and Quest/InfLLM/Loki are approximate.)
Theorem 2 (entropy-concentration scaling, descriptive). Attention entropy grows logarithmically, H_N = α·log N + β with a measured α ≈ 0.31 (Llama-3.2-1B, WikiText-2), so the effective support K_eff = e^{H_N} ∼ N^α is sublinear and the coverage-floor perplexity gap decays as ΔPPL(N) = O(N^{-(1-α)}): the relative advantage of sparsity strengthens with context length. Reported as a descriptive law; the causal test (matched-magnitude intervention) is future work.
Verified system (training-free). A Triton selection kernel validated bitwise-identical to a pure-PyTorch reference; bit-identical flag-off behaviour; an INT8 KV cache at 0.508× footprint, +0.714% PPL.

Repository layout

paper2_qaxis_scaling.tex   LaTeX source (arXiv-ready)
paper2_qaxis_scaling.pdf   compiled preprint (7 pp)
make_figures.py            regenerates the theory figures (Thm 1 schematic, Thm 3 curves)
figures/                   fig1 (selector), fig2 (scaling) — .pdf + .png

Status of the numbers (read before citing)

Verified and stated without qualification: Theorem 1 (algebra), the exact-kernel bitwise validation (max|ΔO| = 0, index agreement 1.0), bit-identical flag-off (max|Δlogit| = 0), and the INT8 KV cache (cache ratio 0.5078, +0.714% PPL), reproduced by the sealed self-check (567 tests pass).
Descriptive: Theorem 2. α ≈ 0.31 is a measured exponent for one model/corpus, not a universal constant; its causal status is untested by design.
Deferred to a companion empirical report (future release): the full multi-seed ΔPPL(N) sweep and the latency decomposition. Point estimates are seed- and engine-version sensitive; the robust, version-independent claim is the shape (advantage monotone-increasing in N for α<1, Fig. 2). The engine, exact kernel, and sealed self-check live in the project's dcr-attention repository.

Figures

python3 -m pip install matplotlib numpy
python3 make_figures.py

Figures are theory/illustrative (the Theorem 1 selector schematic and the Theorem 2 predicted curves for α=0.31); no measured PPL points are fabricated.

Rebuild the PDF

pdflatex paper2_qaxis_scaling.tex && pdflatex paper2_qaxis_scaling.tex

Relation to the project

Companion to No Critical Key Budget in Query-Axis Top-K Sparse Attention (retention; DOI 10.5281/zenodo.20514229), which uses the same exact selector and shows the long-context limit is the base model, not the key budget. This paper supplies the selector's exactness (Thm 1) and the cost-scaling framing (Thm 2).

License

Text/figures: CC-BY-4.0. Code: MIT.

Citation

See CITATION.cff; full citation/BibTeX added once the Zenodo DOI is minted.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
figures		figures
CITATION.cff		CITATION.cff
LICENSE-CODE.txt		LICENSE-CODE.txt
LICENSE-TEXT-DATA.txt		LICENSE-TEXT-DATA.txt
README.md		README.md
make_figures.py		make_figures.py
paper2_qaxis_scaling.pdf		paper2_qaxis_scaling.pdf
paper2_qaxis_scaling.tex		paper2_qaxis_scaling.tex

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Exact Query-Axis Ranking and Entropy-Concentration Scaling for Training-Free Top-K Sparse Attention

TL;DR

Repository layout

Status of the numbers (read before citing)

Figures

Rebuild the PDF

Relation to the project

License

Citation

About

Licenses found

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Exact Query-Axis Ranking and Entropy-Concentration Scaling for Training-Free Top-K Sparse Attention

TL;DR

Repository layout

Status of the numbers (read before citing)

Figures

Rebuild the PDF

Relation to the project

License

Citation

About

Topics

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages