Commit 03b1ca6

Reuven and ruvnet committed
research(kv-cache): TriAttention + TurboQuant stacked compression analysis
Add deep research into three-axis KV cache compression:
- TriAttention (arXiv:2604.04921): trigonometric RoPE-based token sparsity, 10.7x
- Stacked compression: TriAttention × TurboQuant for ~50x KV reduction
- ADR-147: formal architecture decision with GOAP implementation plan

No published work combines these orthogonal methods. First-mover opportunity for ruvLLM edge inference (128K context in 175MB on Pi 5).

Co-Authored-By: claude-flow <ruv@ruv.net>
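The ratios quoted in the commit message can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it assumes the two stages multiply (as the "TriAttention × TurboQuant" notation suggests) and, as a further assumption not stated in the message, that the 175MB figure corresponds to the ~50x stacked ratio. All function and constant names are hypothetical.

```python
# Figures quoted in the commit message.
TRIATTENTION_RATIO = 10.7    # token-sparsity reduction ("10.7x")
STACKED_RATIO = 50.0         # approximate combined reduction ("~50x")
COMPRESSED_CACHE_MB = 175.0  # quoted size for a 128K context on Pi 5


def implied_quant_ratio(stacked: float, sparsity: float) -> float:
    """Ratio the quantization stage must supply if the stages multiply."""
    return stacked / sparsity


def implied_uncompressed_mb(compressed_mb: float, stacked: float) -> float:
    """Baseline cache size, ASSUMING the quoted size reflects the stacked ratio."""
    return compressed_mb * stacked


quant_ratio = implied_quant_ratio(STACKED_RATIO, TRIATTENTION_RATIO)
baseline_mb = implied_uncompressed_mb(COMPRESSED_CACHE_MB, STACKED_RATIO)

print(f"implied TurboQuant ratio: {quant_ratio:.2f}x")      # 4.67x
print(f"implied uncompressed cache: {baseline_mb:.0f} MB")  # 8750 MB
```

Under these assumptions the quantization stage would need to contribute roughly 4.7x, and the uncompressed 128K-context cache would be on the order of 8.75 GB. Both numbers are derived here, not taken from the commit.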
1 parent 5c2d469 commit 03b1ca6

3 files changed

Lines changed: 1126 additions & 0 deletions
