Commit 03b1ca6
research(kv-cache): TriAttention + TurboQuant stacked compression analysis
Add deep research into three-axis KV cache compression:
- TriAttention (arXiv:2604.04921): trigonometric RoPE-based token sparsity (10.7x reduction)
- Stacked compression: TriAttention × TurboQuant for ~50x total KV reduction
- ADR-147: formal architecture decision with GOAP implementation plan
No published work combines these orthogonal methods. First-mover opportunity
for ruvLLM edge inference (128K context in 175MB on Pi 5).
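A back-of-envelope sketch of where the ~50x figure comes from, stacking the two orthogonal axes. This is illustrative only: the 10.7x token-sparsity factor is taken from this commit, while the fp16-to-3-bit quantization width is an assumption, not something the commit specifies.

```python
# Hedged estimate of the stacked KV-cache compression ratio.
# Assumption: TurboQuant compresses fp16 (16-bit) KV entries to ~3 bits;
# the 3-bit width is hypothetical, chosen only to illustrate the stacking.
token_sparsity = 10.7            # TriAttention token reduction (from the commit)
bits_fp16, bits_quant = 16, 3    # assumed quantization: fp16 -> 3-bit
quant_ratio = bits_fp16 / bits_quant

# The two methods act on independent axes (which tokens are kept vs. how
# each kept entry is stored), so their ratios multiply.
stacked = token_sparsity * quant_ratio
print(f"stacked compression ≈ {stacked:.1f}x")  # → stacked compression ≈ 57.1x
```

Metadata and indexing overhead would pull the ideal ~57x down toward the ~50x effective figure cited above.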
Co-Authored-By: claude-flow <ruv@ruv.net>
Parent: 5c2d469
3 files changed: 1,126 additions, 0 deletions
File tree
- docs
- adr
- research/quantization-edge