Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
Updated Apr 29, 2026 - Python
Boosting GPU utilization for LLM serving via dynamic spatial-temporal prefill & decode orchestration
TensorFusion landing page and product docs
Tricks for sharing GPUs across pods on Kubernetes without middleware such as HAMi or DRA
SLURM-native software GPU slicing for NVIDIA clusters using memory limits and compute time-slicing.
GPUs unite using secure and private crypto transactions to distribute compute to decentralized nodes.
Distributed peer-to-peer LLM inference network. Volunteer your GPU, earn AI credits, run any open-source model for free. Anonymous, encrypted, unstoppable.