Benchmark and decision framework for KV cache transfer compression in disaggregated LLM serving.
4
6
5
7
KVShuttle evaluates 14+ compression strategies across multiple models and sequence lengths, providing GPU-calibrated timing data and analytical transfer simulation to help practitioners choose the right compression scheme for their bandwidth regime.
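As a rough illustration of what an analytical transfer model for choosing a scheme by bandwidth regime could look like, the sketch below assumes a simple sequential cost: compress, send the compressed bytes, then decompress. The `Strategy` class, the `transfer_time_s` and `best_strategy` functions, and all numbers are hypothetical placeholders for illustration; they are not KVShuttle's API or its measured data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    """Hypothetical description of one compression scheme."""
    name: str
    ratio: float              # raw_bytes / compressed_bytes
    overhead_s_per_gb: float  # compress + decompress time per GB of raw KV cache


def transfer_time_s(raw_gb: float, link_gb_per_s: float, s: Strategy) -> float:
    """Sequential (non-pipelined) model: codec overhead plus wire time."""
    codec_time = raw_gb * s.overhead_s_per_gb
    wire_time = (raw_gb / s.ratio) / link_gb_per_s
    return codec_time + wire_time


def best_strategy(raw_gb: float, link_gb_per_s: float, strategies) -> Strategy:
    """Pick the strategy with the lowest modeled end-to-end time."""
    return min(strategies, key=lambda s: transfer_time_s(raw_gb, link_gb_per_s, s))


# Illustrative values only, not calibrated measurements.
STRATEGIES = [
    Strategy("identity", ratio=1.0, overhead_s_per_gb=0.0),
    Strategy("int8", ratio=2.0, overhead_s_per_gb=0.04),
]

if __name__ == "__main__":
    for bw in (1.0, 50.0):
        winner = best_strategy(1.0, bw, STRATEGIES)
        print(f"{bw} GB/s link: {winner.name}")
```

Under these placeholder numbers the break-even link speed is 12.5 GB/s: below it the compressed transfer is faster, above it the codec overhead outweighs the bytes saved. A pipelined model that overlaps compression with transfer would shift that threshold.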
@@ -31,7 +33,7 @@ KVShuttle evaluates 14+ compression strategies across multiple models and sequen