chfeng-cs/vllm-contributions

vLLM Contributions — Ethan Feng (chfeng-cs)

Focus areas: KV Cache Transfer · Scheduler Optimization · HMA

Core repos: vllm-project/vllm


Contributions

Core Feature

| PR | Title | Status | Impact |
|---|---|---|---|
| vllm#42086 | [Core][KV Connector] Bounded early prefetch for waiting requests | ❌ Closed | ~102 ms TTFT reduction (benchmark on A10) |
| vllm#41847 | [KV Transfer] Enable HMA by default for connectors that support it | ✅ Merged | Reduces user config burden; fixes MultiConnector gap vs PR #42045 |

Docs

| PR | Title | Status | Impact |
|---|---|---|---|
| vllm#42073 | [Docs] Fix RLHF example links | ✅ Merged | |
| vllm#42066 | [Docs] Fix OpenAI batch model argument examples | ✅ Merged | |

Other

| PR | Title | Status | Impact |
|---|---|---|---|
| vllm#44101 | [LMCache] fix lookup lock leak when request is aborted before alloc | 🔄 Open | |
| vllm#44097 | [LMCache] fix missing cache_salt in free_lookup_locks call | 🔄 Open | |
| vllm#42872 | [Bugfix][Model Runner v2] Fix MRV2 KV cache kernel block sizing. | ❌ Closed | |
| vllm#42321 | [KV Connector] Implement on_new_request for LMCacheMPConnector | 🔄 Open | |
| vllm#42214 | [Test][Bugfix] Fix mypy error: missing enable_prompt_embeds arg in test_tp_sp_nvfp4_generation | ❌ Closed | |
| vllm#42206 | [Metrics] Add group-aware KV cache capacity Prometheus gauges | 🔄 Open | |
| vllm#42160 | [Docs] Fix broken local links | ✅ Merged | |
| vllm#42077 | [Docs] Update server entrypoint examples | ✅ Merged | |

Last synced: 2026-06-02 06:37 UTC


Background

Prefill-decode disaggregation requires efficient KV cache transfer between nodes. The PRs above work on two fronts: scheduler-level prefetching of KV cache for requests still in the waiting queue, and enabling the hybrid KV cache manager (HMA) by default for connectors that support it. The goals are lower latency and simpler configuration.
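
To make the scheduler-level prefetch idea concrete, here is a minimal sketch of how a bounded prefetch pass over a waiting queue might look. It is an illustration only, not the code from vllm#42086: the `Request` class, the `select_prefetch_candidates` function, and the `max_inflight` parameter are hypothetical names invented for this example, and the real scheduler and KV connector interfaces in vLLM differ.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical stand-in for a scheduler request (not a vLLM class)."""
    request_id: str
    num_remote_tokens: int          # tokens whose KV cache lives on another node
    prefetch_started: bool = False  # set once a prefetch has been issued


def select_prefetch_candidates(waiting, max_inflight, inflight):
    """Pick waiting requests to prefetch without exceeding max_inflight.

    `inflight` is the number of prefetches already running. The bound keeps
    early prefetch from competing with running requests for bandwidth.
    """
    budget = max(0, max_inflight - inflight)
    picked = []
    for req in waiting:  # walk the queue in FIFO order
        if budget == 0:
            break
        # Skip requests with nothing to fetch or with a prefetch already issued.
        if req.prefetch_started or req.num_remote_tokens == 0:
            continue
        req.prefetch_started = True
        picked.append(req)
        budget -= 1
    return picked


waiting = deque([
    Request("a", 128),
    Request("b", 0),
    Request("c", 64),
    Request("d", 32),
])
first = select_prefetch_candidates(waiting, max_inflight=2, inflight=0)
print([r.request_id for r in first])
```

The cap on in-flight prefetches is the "bounded" part: without it, a long waiting queue could trigger an unbounded number of transfers at once.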

Related design notes are in notes/.
