Seqev / qaxis-exact-scaling Star 0 Code Issues Pull requests Training-free EXACT-ranking top-K sparse attention. Ranking keys by ⟨u_Q,K⟩ (u_Q=Q/‖Q‖) is order-identical to true attention score. Entropy H_N=α·log N+β (α≈0.31) gives sublinear support N^α and PPL gap decaying as N^-(1-α). Verified system: exact Triton kernel, INT8 KV ×0.508, +0.714% PPL delta.. mips reproducible-research transformers triton llama inference-optimization kv-cache scaling-laws sparse-attention llm long-context training-free attention-entropy top-k-attention Updated Jun 2, 2026 TeX