Infrastructure engineer focused on LLM inference systems.
- M.S. Computer Science — Shanghai Jiao Tong University
- B.S. Computer Science — Harbin Institute of Technology
- 2 years at Alibaba
Currently contributing to vllm-project/vllm — KV cache transfer, scheduler optimization, and hybrid KV cache management (HMA).
- LLM Inference: vLLM internals, KV cache transfer, prefill-decode disaggregation, PagedAttention
- Kernel Development: CUDA, Triton (fused kernels, memory hierarchy optimization)
- Distributed Systems: background in distributed databases (PolarDB/MySQL), now applying that experience to inference clusters
| Project | Area | Highlights |
|---|---|---|
| vllm-project/vllm | Scheduler / KV Cache | Bounded prefetch scheduling, HMA default behavior, metrics fixes |
→ Full contribution list: vllm-contributions
Python, CUDA, Triton, C++, PyTorch, Linux

