Focus areas: KV Cache Transfer · Scheduler Optimization · HMA

Core repos: vllm-project/vllm

| PR | Title | Status | Impact |
|---|---|---|---|
| vllm#42086 | [Core][KV Connector] Bounded early prefetch for waiting requests | ❌ Closed | ~102ms TTFT reduction (benchmark on A10) |
| vllm#41847 | [KV Transfer] Enable HMA by default for connectors that support it | ✅ Merged | Reduces user config burden; fixes MultiConnector gap vs PR #42045 |

| PR | Title | Status | Impact |
|---|---|---|---|
| vllm#42073 | [Docs] Fix RLHF example links | ✅ Merged | — |
| vllm#42066 | [Docs] Fix OpenAI batch model argument examples | ✅ Merged | — |

| PR | Title | Status | Impact |
|---|---|---|---|
| vllm#44101 | [LMCache] fix lookup lock leak when request is aborted before alloc | 🔄 Open | — |
| vllm#44097 | [LMCache] fix missing cache_salt in free_lookup_locks call | 🔄 Open | — |
| vllm#42872 | [Bugfix][Model Runner v2] Fix MRV2 KV cache kernel block sizing. | ❌ Closed | — |
| vllm#42321 | [KV Connector] Implement on_new_request for LMCacheMPConnector | 🔄 Open | — |
| vllm#42214 | [Test][Bugfix] Fix mypy error: missing enable_prompt_embeds arg in test_tp_sp_nvfp4_generation | ❌ Closed | — |
| vllm#42206 | [Metrics] Add group-aware KV cache capacity Prometheus gauges | 🔄 Open | — |
| vllm#42160 | [Docs] Fix broken local links | ✅ Merged | — |
| vllm#42077 | [Docs] Update server entrypoint examples | ✅ Merged | — |

Last synced: 2026-06-02 06:37 UTC

Context: prefill-decode disaggregation requires efficient KV cache transfer between nodes. The PRs above target two parts of that path: bounded early prefetch for waiting requests at the scheduler level, which reduces latency (TTFT), and enabling the hybrid KV cache manager (HMA) by default for connectors that support it, which simplifies configuration.
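
To make the "bounded early prefetch" idea concrete, here is a minimal sketch. It is not the code from vllm#42086 and does not use any vLLM API: `Request`, `BoundedPrefetcher`, `max_inflight`, and `on_transfer_done` are hypothetical names chosen for illustration. The assumption is that the scheduler may start KV transfers for requests that are still waiting, but only up to a fixed cap, so that prefetching cannot starve running requests of bandwidth or cache blocks.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    request_id: str
    num_remote_tokens: int  # tokens whose KV blocks live on another node
    prefetch_started: bool = False


@dataclass
class BoundedPrefetcher:
    """Hypothetical helper: caps the number of concurrent early prefetches."""

    max_inflight: int  # assumed tuning knob, not a real vLLM option
    inflight: set[str] = field(default_factory=set)

    def schedule(self, waiting: deque[Request]) -> list[str]:
        """Start prefetches for waiting requests, in queue order, up to the cap."""
        started = []
        for req in waiting:
            if len(self.inflight) >= self.max_inflight:
                break  # bound reached; remaining requests wait for a free slot
            if req.prefetch_started or req.num_remote_tokens == 0:
                continue  # already prefetching, or nothing to fetch
            req.prefetch_started = True
            self.inflight.add(req.request_id)
            started.append(req.request_id)
        return started

    def on_transfer_done(self, request_id: str) -> None:
        """Free a prefetch slot once a transfer has completed."""
        self.inflight.discard(request_id)


waiting = deque([
    Request("a", 512),
    Request("b", 0),  # no remote KV, so it is skipped
    Request("c", 256),
    Request("d", 128),
])
prefetcher = BoundedPrefetcher(max_inflight=2)

first = prefetcher.schedule(waiting)   # starts "a" and "c", then hits the cap
prefetcher.on_transfer_done("a")       # one slot frees up
second = prefetcher.schedule(waiting)  # starts "d"
```

The cap is the point of the design: an unbounded prefetch would hide transfer latency for queued requests but could compete with decode traffic, while a small bound keeps most of the TTFT benefit with predictable resource use.
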
Related design notes are in notes/.