
top-k-attention

Here is 1 public repository matching this topic...

Training-free, exact-ranking top-K sparse attention. Ranking keys by ⟨u_Q, K⟩, where u_Q = Q/‖Q‖, gives the same order as the true attention scores. Entropy H_N = α·log N + β (α ≈ 0.31) gives sublinear support N^α and a PPL gap that decays as N^(-(1-α)). Verified system: exact Triton kernel, INT8 KV ×0.508, +0.714% PPL delta.
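The order-identity claim in the description can be checked with a small sketch. The code below is not taken from the repository: it is a pure-Python illustration, and every name in it (`rank_by_attention`, `rank_by_unit_query`, `top_k_indices`) is hypothetical. It assumes standard scaled dot-product attention with a single query. Dividing the query by its norm, scaling by 1/√d, and applying softmax are all monotone in the raw dot product, so the top-K key indices should agree.

```python
import math
import random


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_indices(scores, k):
    """Indices of the k largest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def rank_by_attention(q, keys, k):
    """Top-k keys by softmax(q.k / sqrt(d)); assumes standard scaled attention."""
    d = len(q)
    logits = [dot(q, key) / math.sqrt(d) for key in keys]
    return top_k_indices(softmax(logits), k)


def rank_by_unit_query(q, keys, k):
    """Top-k keys by <u_Q, K> with u_Q = Q / ||Q||, skipping the softmax."""
    norm = math.sqrt(dot(q, q))
    u_q = [x / norm for x in q]
    return top_k_indices([dot(u_q, key) for key in keys], k)


# Illustrative sizes only; they are not values from the repository.
random.seed(0)
d, n, k = 8, 64, 5
q = [random.gauss(0, 1) for _ in range(d)]
keys = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]

same = rank_by_attention(q, keys, k) == rank_by_unit_query(q, keys, k)
print("top-k indices agree:", same)
```

The agreement holds up to floating-point ties: two keys whose dot products differ by only a rounding error could swap places, which is why the sketch compares rankings on well-separated random data.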

  • Updated Jun 2, 2026
  • TeX
