top-k-attention

Here is 1 public repository matching this topic...

Seqev / qaxis-exact-scaling

Training-free EXACT-ranking top-K sparse attention. Ranking keys by ⟨u_Q,K⟩ (u_Q=Q/‖Q‖) is order-identical to true attention score. Entropy H_N=α·log N+β (α≈0.31) gives sublinear support N^α and PPL gap decaying as N^-(1-α). Verified system: exact Triton kernel, INT8 KV ×0.508, +0.714% PPL delta..

mips reproducible-research transformers triton llama inference-optimization kv-cache scaling-laws sparse-attention llm long-context training-free attention-entropy top-k-attention

Updated Jun 2, 2026
TeX

Improve this page

Add a description, image, and links to the top-k-attention topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the top-k-attention topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

top-k-attention

Here is 1 public repository matching this topic...

Seqev / qaxis-exact-scaling

Improve this page

Add this topic to your repo