Closed
1443 commits
dba9537
Report error log after vllm bench serve (#31808)
elvircrn Jan 6, 2026
d498997
[Spec Decode][UX] Add acceptance stats to `vllm bench serve` report (…
MatthewBonanni Jan 6, 2026
2a42ae7
[ROCm][CI] Fix ModernBERT token classification test numerical accurac…
AndreasKaratzas Jan 6, 2026
e5d427e
[ROCm][CI] Pinning timm lib version to fix ImportError in Multi-Modal…
AndreasKaratzas Jan 6, 2026
309a8f6
[Bugfix] Handle mistral tokenizer in get_hf_processor (#31817)
DarkLight1337 Jan 6, 2026
9a1d20a
[CI] Add warmup run in test_fusion_attn (#31183)
angelayi Jan 7, 2026
364a8bc
[ROCm][CI] Fix plugin tests (2 GPUs) failures on ROCm and removing `V…
AndreasKaratzas Jan 7, 2026
6f35154
[Frontend] Implement robust video frame recovery for corrupted videos…
vSeamar Jan 7, 2026
873480d
[Misc][BE] Type coverage for vllm/compilation [1/3] (#31554)
Lucaskabela Jan 7, 2026
5b833be
[1/2][lmcache connector] clean up lmcache multi-process adapter (#31…
ApostaC Jan 7, 2026
a051525
[Model] Enable LoRA support for PaliGemma (#31656)
A1c0r-Z Jan 7, 2026
1b8af95
[Doc] Update release docs (#31799)
DarkLight1337 Jan 7, 2026
f09c5fe
Change warning in get_current_vllm_config to report caller's line num…
tlrmchlsmth Jan 7, 2026
0a2c2dc
fixed mypy warnings for files vllm/v1/attention with TEMPORARY workar…
MrIceCreamMan Jan 7, 2026
aafd4d2
[Chore] Try remove `init_cached_hf_modules` (#31786)
DarkLight1337 Jan 7, 2026
6409004
[ROCm][AITER] bugfix accuracy regression in ROCM_AITER_TRITON_MLA bac…
vllmellm Jan 7, 2026
c7a79d4
[Attention][3/n] Remove usage of deprecated `seq_lens_cpu` and `num_c…
LucasWilkinson Jan 7, 2026
55caa60
refactor: find_loaded_library (#31866)
tom-zju Jan 7, 2026
efeaac9
[Bugfix] Fix race condition in async-scheduling for vlm model (#31841)
tianshu-Michael-yu Jan 7, 2026
4829148
[BugFix] LoRA: Support loading base_layer of experts (#31104)
HollowMan6 Jan 7, 2026
4614c5a
[Bugfix][Hardware][AMD] Consolidate FP8 min/max values helper functio…
c0de128 Jan 7, 2026
0dd5dee
[Bugfix][Kernel] fix bias adding in triton kernel implemented fused m…
xuebwang-amd Jan 7, 2026
e759637
[Refactor][TPU] Remove torch_xla path and use tpu-inference (#30808)
weiyu0824 Jan 7, 2026
59fe6f2
[XPU]fallback to TRITON_ATTN on xpu when use float32 dtype (#31762)
1643661061leo Jan 7, 2026
1f33e38
[Model] Cleanup: Remove redundant manual definition of `make_empty_in…
maang-h Jan 7, 2026
0790f07
[Misc] Improve error messages for unsupported types and parameters (#…
BlankRH Jan 7, 2026
d111bc5
[Bugfix][MTP] Fix GLM4 MoE fp8 loading with MTP on (#31757)
andyl98 Jan 7, 2026
41cfa50
[ROCm][AITER] fix wrong argument passed to AITER `flash_attn_varlen_…
vllmellm Jan 7, 2026
9741387
[Refactor] GLM-ASR Modeling (#31779)
JaredforReal Jan 7, 2026
b665bbc
[Chore] Migrate V0 attention utils (#31891)
DarkLight1337 Jan 7, 2026
1ab055e
[OpenAI] Extend VLLMValidationError to additional validation paramete…
R3hankhan123 Jan 7, 2026
cc6dafa
[Perf][Kernels] Enable FlashInfer DeepGEMM swapAB on SM90 (for W8A8 L…
katec846 Jan 7, 2026
b7036c8
[Refactor] Clean up pooler modules (#31897)
DarkLight1337 Jan 7, 2026
1d9e9ae
[Bugfix]: prevent leaking tokens in crash log (#30751)
dr75 Jan 7, 2026
b89443b
[KVConnector]: Enable Cross-layers KV cache layout for MultiConnector…
kfirtoledo Jan 7, 2026
30399cc
UX: add vLLM env info in '/server_info' (#31899)
jeejeelee Jan 7, 2026
bf184a6
Enable quantized attention in NemotronH models (#31898)
roikoren755 Jan 7, 2026
05f47bd
[Doc] Fix: Correct vLLM announcing blog post link in docs (#31868)
Ayobami-00 Jan 7, 2026
f347ac6
[Perf] Fuse stride preparation for NVFP4 cutlass_moe (#31837)
mgoin Jan 7, 2026
c907d22
[refactor] refactor memory constants usage (#31865)
andyxning Jan 7, 2026
0ada960
[Kernel] Support bias type in grouped_topk kernel (#31781)
xyang16 Jan 7, 2026
6170d47
[EPLB] Optimize EPLB with numpy (#29499)
ilmarkov Jan 7, 2026
10ef65e
[BugFix] Fix bad words with speculative decoding (#31908)
njhill Jan 7, 2026
ffc0a27
Add back missing DeepEP LL params (#31911)
elvircrn Jan 7, 2026
5dcd7ef
[MoE Refactor][15/N] Apply Refactor to Fp8 (#31415)
robertgshaw2-redhat Jan 8, 2026
0d76674
[0/N][Attention] Fix miscellaneous pre-commit issues (#31924)
MatthewBonanni Jan 8, 2026
25eef3d
feat(moe): Add is_act_and_mul=False support for Triton MoE kernels (#…
rabi Jan 8, 2026
39d8200
fix(rocm): add early return in get_flash_attn_version for ROCm (#31286)
rabi Jan 8, 2026
8dd2419
[CI] Skip Qwen-VL in multimodal processing tests due to flaky externa…
AndreasKaratzas Jan 8, 2026
9f6dcb7
[MoE Refactor][16/N] Apply Refactor to NVFP4 (#31692)
robertgshaw2-redhat Jan 8, 2026
a79079f
[BugFix] Fix flakiness in test_eagle_dp for PyTorch 2.10 (#31915)
zou3519 Jan 8, 2026
c4041f3
[ROCm][LoRA] Fix MoE accuracy regression by preserving float32 router…
AndreasKaratzas Jan 8, 2026
087a138
[ROCm][CI] Fix attention backend test flakiness from uninitialized KV…
AndreasKaratzas Jan 8, 2026
cddbc2b
[ROCm][CI] Add rocm support for run-multi-node-test.sh (#31922)
charlifu Jan 8, 2026
f1b1bea
[CI][BugFix][AMD] Actually skip tests marked @pytest.mark.skip_v1 (#3…
rasmith Jan 8, 2026
6b2a672
[Doc] Add Claude code usage example (#31188)
mgoin Jan 8, 2026
5f2a473
[ROCm][CI] v1 cpu offloading attention backend fix (#31833)
AndreasKaratzas Jan 8, 2026
9572f74
[Model] Enable LoRA support for tower and connector in DotsOCR (#31825)
ShaanveerS Jan 8, 2026
2ab441b
[platform] add dp_metadata arg to set_additional_forward_context (#31…
Ronald1995 Jan 8, 2026
be6a81f
[chore] Update FA commit (#30460)
LucasWilkinson Jan 8, 2026
791b2fc
[grpc] Support gRPC server entrypoint (#30190)
CatherineSue Jan 8, 2026
287b37c
[BugFix] Fix spec decoding edge case bugs (#31944)
njhill Jan 8, 2026
d3235cb
[Fix] Enable mm_processor_cache with vision LoRA (#31927)
prashanth058 Jan 8, 2026
e5173d3
[Bugfix] Remove the num_hidden_layers override for glm4_moe (#31745)
andyl98 Jan 8, 2026
63baa28
[Model] Enable LoRA support for tower and connector in GLM4-V (#31652)
Zyyeric Jan 8, 2026
107cf8e
fix(rocm): Add get_supported_kernel_block_sizes() to ROCM_ATTN (#31712)
rabi Jan 8, 2026
33156f5
[docker] A follow-up patch to fix #30913: `[docker] install cuda13 ve…
wangshangsam Jan 8, 2026
573a1d1
[ROCm]Skip test_torchao.py::test_pre_quantized_model on CDNA3 arch (#…
ZhiweiYan-96 Jan 8, 2026
eac3b96
[Models] Allow converting Qwen3-VL into Reranker model (#31890)
Isotr0py Jan 8, 2026
b634e61
Decouple page_size_bytes calculation in AttentionSpec for TPU/RPA Com…
Lumosis Jan 8, 2026
8cbdc7e
[CI/Build] Enable test_kv_cache_events_dp for AMD (#31834)
rjrock Jan 8, 2026
1f21429
fix(compile): apply partition wrapper when loading AOT cached functio…
devbyteai Jan 8, 2026
96fcd3c
[Misc] Support qwen3-next lora (#31719)
BJWang-ant Jan 8, 2026
04a4966
RayLLM Bugfix - Preserve obj store URL for multi engine_config creati…
omer-dayan Jan 8, 2026
d1b6fe0
[Chore] Further cleanup pooler (#31951)
DarkLight1337 Jan 8, 2026
5576227
[Model] Standardize common vision encoders (#31947)
DarkLight1337 Jan 8, 2026
2972a05
[MM Encoder]: Make MMEncoderAttention's `scale` takes effect properly…
Isotr0py Jan 8, 2026
18d4e48
[Voxtral] Fix speech transcription api (#31388)
patrickvonplaten Jan 8, 2026
59d260f
[Model] Add Grok-2 (#31847)
dangoldbj Jan 8, 2026
03fd76c
[Model] Add LFM2-VL model support (#31758)
tianshu-Michael-yu Jan 8, 2026
1123a87
[Model] Enable LoRA support for Pixtral (#31724)
A1c0r-Z Jan 8, 2026
7645bc5
[OpenAI] Fix tool_choice=required streaming when output has trailing …
maylikenoother Jan 8, 2026
72c068b
[CI] [Bugfix] Fix unbounded variable in `run-multi-node-test.sh` (#31…
tjtanaa Jan 8, 2026
1da3a54
[Docs]: update claude code url (#31971)
chaunceyjiang Jan 8, 2026
fe86be6
[Model] Support IQuestCoder model (#31575)
yxing-bj Jan 8, 2026
eaba8ec
[Bugfix]: Fix Step3ReasoningParser missing is_reasoning_end_streaming…
chaunceyjiang Jan 8, 2026
b8112c1
[Bugfix] Fix vllm serve failure with Nemotron Nano V3 FP8 (#31960)
danisereb Jan 8, 2026
49568d5
[Doc] Improve MM models LoRA notes (#31979)
jeejeelee Jan 8, 2026
a3d909a
[Misc] Tidy up some spec decode logic in GPUModelRunner (#31591)
njhill Jan 8, 2026
a563866
Fix ijson build for Power. (#31702)
npanpaliya Jan 8, 2026
83e1c76
[CI][ROCm] Fix NIXL tests on ROCm (#31728)
NickLucche Jan 8, 2026
7508243
[Model Runner V2] Simplify BlockTables with UVA (#31965)
WoosukKwon Jan 8, 2026
87e07a6
Revert "feat(moe): Add is_act_and_mul=False support for Triton MoE ke…
mgoin Jan 8, 2026
f16bfbe
[Documentation][torch.compile] Add documentation for torch.compile + …
Lucaskabela Jan 8, 2026
aa125ec
[Frontend] Improve error message (#31987)
DarkLight1337 Jan 8, 2026
e74698c
[Misc][Refactor] Add FusedMoERouter object (#30519)
bnellnm Jan 8, 2026
5d3b609
[Compressed-Tensors] Simplify NVFP4 Conditions, enable marlin support…
dsikka Jan 8, 2026
6cdf015
[Misc] Fix `Current vLLM config is not set.` warnings, assert to avoi…
LucasWilkinson Jan 8, 2026
d62cfe5
[MoE Refactoring][Bugfix]Wrap WNA16 Triton kernel into mk and change …
zyongye Jan 9, 2026
5825bbc
[Quantization] Deprecate Long Tail of Schemes (#31688)
robertgshaw2-redhat Jan 9, 2026
11cec29
[BugFix] Add spec-decode-incompatible request param validation (#31982)
njhill Jan 9, 2026
6ebe34d
[Feature] Add iteration level logging and enhance nvtx marker (#31193)
maxyanghu Jan 9, 2026
0fa8dd2
[Bugfix] Fix Typo from NVFP4 Refactor (#31977)
robertgshaw2-redhat Jan 9, 2026
a4ec0c5
[Frontend] Add MCP tool streaming support to Responses API (#31761)
daniel-salib Jan 9, 2026
8ff4a99
[Async][Feat] support apply penalty or bad_words for async + spec (#3…
izhuhaoran Jan 9, 2026
8413868
[Bugfix] Fix typo in FusedMoE LoRA reshape comment (#31992)
xyang16 Jan 9, 2026
e2d49ec
[Bugfix] missing tokens occur in harmony streaming (#30437)
Ri0S Jan 9, 2026
a1648c4
[ROCm][CI] Fix test_token_classification.py::test_bert_models (#31993)
divakar-amd Jan 9, 2026
7a05d2d
[CI] [ROCm] Fix `tests/entrypoints/test_grpc_server.py` on ROCm (#31970)
tjtanaa Jan 9, 2026
29ce482
[Cleanup] Remove obsolete spec decoding compatibility logic (#32003)
njhill Jan 9, 2026
707b240
[Bugfix] Fix FusedMoE LoRA w2_output_size (#31949)
xyang16 Jan 9, 2026
bde38c1
fix lora moe sharding when rank < max_lora_rank (#31994)
gnovack Jan 9, 2026
dc77cb7
[Bugfix] Fix Var Length Batched Padding in Granite Speech (#31906)
alex-jw-brooks Jan 9, 2026
0207328
[Bugfix] Fix OpenAPI schema test failures (#31921)
AndreasKaratzas Jan 9, 2026
c8ed39b
[Model] Reorganize pooling layers (#31973)
DarkLight1337 Jan 9, 2026
1a19e9c
[Bugfix][ROCm]Fix Qwen3-Next-80B-A3B-Thinking inference and optimize …
vllmellm Jan 9, 2026
e7b68f4
[Bugfix] Fix Triton FusedMoE LoRA (#30585)
xyang16 Jan 9, 2026
55212c1
fix: remove duplicate engine_id check in nixl_connector (#31948)
xbfs Jan 9, 2026
b474782
[Feature][Benchmarks] Custom dataset: read output length from dataset…
sducouedic Jan 9, 2026
e02706d
[ROCm][CI][V1] Fix `nixl_connector` test failure and achieve CUDA par…
AndreasKaratzas Jan 9, 2026
db07433
[Misc] Skip hashing kwargs if value is `None` (#32025)
ywang96 Jan 9, 2026
4505849
[ROCm][PD] add moriio kv connector. (#29304)
inkcherry Jan 9, 2026
bbf80ed
Fix type error (#31999)
Adolfo-Karim Jan 9, 2026
7cdf7e2
[Model] Remove redundant None check in DeepSeekOCR image input proces…
maang-h Jan 9, 2026
8e27663
[CPU] Add head sizes 80 and 112 with vec16 fallback (#31968)
R3hankhan123 Jan 9, 2026
34cd32f
[Perf][Kernel] Fused SiLU+Mul+Quant kernel for NVFP4 cutlass_moe (#31…
mgoin Jan 9, 2026
2d0c5b6
[Doc] Remove hardcoded Whisper in example openai translation client (…
Isotr0py Jan 9, 2026
ac9f933
Rename --exclude-log-deltas to --enable-log-deltas (#32020)
Catacomba Jan 9, 2026
08d954f
[Doc] Add developer guide for CustomOp (#30886)
shen-shanshan Jan 9, 2026
d5ec6c0
[UX] Add vLLM model inspection view (#29450)
mgoin Jan 9, 2026
cd4a95e
[Feat][Core] Support multiple KV cache groups in Hybrid KV Coordinato…
ivanium Jan 9, 2026
f32c629
[Frontend][gpt-oss] Allow system message to overwrite model identity …
qandrew Jan 9, 2026
28ae32a
[Refactor] Remove numpy split in async scheduling (#32034)
yewentao256 Jan 9, 2026
308feab
[Perf] Optimize cutlass moe problem size calculation, 5.3% E2E Throug…
yewentao256 Jan 9, 2026
657e9c0
[Fix] Introduce audio channels spec (#31595)
jeremyteboul Jan 9, 2026
a4d5d66
Add unpermute-aware fused MoE path and small-batch fallback (#29354)
RunkaiTao Jan 9, 2026
f9e2a75
[fix] add cutedsl to global sf (#32001)
jiahanc Jan 9, 2026
0a0aa07
[Quant] Make static quant support all group shapes (#30833)
LucasWilkinson Jan 9, 2026
1f8b7c5
[responsesAPI] fix incomplete_messages for simple/parsable context (#…
qandrew Jan 9, 2026
2612ba9
[1/N][Attention] Restructure attention: move files (#31916)
MatthewBonanni Jan 9, 2026
9457812
[NIXL] refine decoder side post process for heterogeneous BlockSize a…
xuechendi Jan 9, 2026
97ba96f
[perf][async] support non cpu sync get logprob tensors for spec (#31336)
izhuhaoran Jan 9, 2026
3adffd5
[Misc] Enable async scheduling by default with spec decoding (#31998)
njhill Jan 9, 2026
aaf4b70
[Misc][BE] Type coverage for vllm/compilation [2/3] (#31744)
Lucaskabela Jan 9, 2026
0308901
[2/N][Attention] Fix pre-commit errors (#32052)
MatthewBonanni Jan 10, 2026
1963245
[Core] Use weights_only=True with torch.load (#32045)
russellb Jan 10, 2026
e18464a
[Perf] Optimize async scheduling placeholder using empty (#32056)
yewentao256 Jan 10, 2026
ac0675f
[CI] Allow Deprecated Quantization For LM Eval Tests (#32065)
micah-wil Jan 10, 2026
4dc0d60
[Bugfix] Narrow broad exceptions in compilation backends (#31616)
c0de128 Jan 10, 2026
abd9224
resolve pydantic error in startup benchmark (#31348)
andyxning Jan 10, 2026
ea6d067
[Misc][LLaMa4] Compile LLaMa Vision Encoder (#30709)
Lucaskabela Jan 10, 2026
e45946b
feature/issac 0.2 (#31550)
AkshatSh Jan 10, 2026
80fead8
Fuse RoPE and MLA KV-cache write (#25774)
PatrykSaffer Jan 10, 2026
c60578d
[Bugfix][Hardware][AMD] Use dynamic WARP_SIZE in sampler vectorized_p…
c0de128 Jan 10, 2026
52d4282
[Core] Refactor ColumnParallelLinear: remove unused parameter and opt…
maang-h Jan 10, 2026
583a90e
[Refactor] Separate sequence and token pooling types (#32026)
DarkLight1337 Jan 10, 2026
0c96148
Update modelopt KV cache quantization resolution to new scheme (#31895)
roikoren755 Jan 10, 2026
d83becd
[ROCm][CI] Fix flaky `test_function_calling_with_stream` and reduce s…
AndreasKaratzas Jan 10, 2026
da6709c
[Misc] Delay deprecation of CommonAttentionMetadata properties (#32074)
LucasWilkinson Jan 10, 2026
a01a1c0
[Bugfix] fix encoder cache leak of waiting requests in scheduler to s…
frelam Jan 10, 2026
5f2385a
[Benchmark][1/2] Generalize SLA criterion validation from binary flag…
DarkLight1337 Jan 10, 2026
14fc7a6
[Bugfix] fix offline chat output prompt (#32076)
andyxning Jan 10, 2026
07286ec
[Bugfix] Fix integer overflow in Gemma3n audio processing (#31657)
jeremyteboul Jan 10, 2026
e6c6f2c
[Quant] Support MXFP4 W4A16 for compressed-tensors dense models (#31926)
mgoin Jan 10, 2026
b8bf5c4
[Kernel] Optimize Sliding Window Attention in 3D Triton Kernel (#31984)
jvlunteren Jan 10, 2026
543c23b
[LoRA][Perf] Improve FusedMoE LoRA performance for small rank (#32019)
xyang16 Jan 10, 2026
d1fd802
fused_moe_kernel - cast accumulator after applying router weights (#3…
gnovack Jan 10, 2026
0285997
[BugFix] scheduler: Fix resuming of preempted requests after async lo…
orozery Jan 10, 2026
1c46dea
Revert "[Kernels][FI] Skip trtllm attention when num_kv_heads=1 (#308…
shyeh25 Jan 10, 2026
6ea001c
[Bugfix][Quantization] Ensure input contiguity in per_token_quant_int…
Flink-ddd Jan 10, 2026
e15a5ff
[MISC] Add strict contiguity check for FlashInfer attention tensors (…
vadiklyutiy Jan 10, 2026
8020a60
[Bugfix] Fix Qwen3-VL-Reranker model loading for sequence classificat…
ricky-chaoju Jan 10, 2026
2a4dbe2
[BugFix] Wait for compute before offloading KV to CPU (#31341)
orozery Jan 10, 2026
ef96fa3
[Benchmark][2/2] Use spline interpolation to tune SLA variables (#32095)
DarkLight1337 Jan 11, 2026
0dd6363
[MTP][GLM][Bugfix] Fixed .weight_scale loading logic that dropped MTP…
andyl98 Jan 11, 2026
46eb30f
make assume_32_bit_indexing configurable (#32044)
laithsakka Jan 11, 2026
9103ed1
[CPU][BugFix] Disable AOT Compile for CPU (#32037)
fadara01 Jan 11, 2026
bde57ab
[Hardware][AMD][CI][Bugfix] Fix AMD Quantization test group (#31713)
mawong-amd Jan 11, 2026
4c16ba6
[KVConnector] OffloadingConnector: Fix bug in handling of preemptions…
orozery Jan 11, 2026
cee7436
[Misc] Make `scipy` as optional audio/benchmark dependency (#32096)
Isotr0py Jan 11, 2026
a374532
[CI/Build] Separate out flaky responses API tests (#32110)
DarkLight1337 Jan 11, 2026
d70249e
[Misc] fix this log format not space (#32112)
lengrongfu Jan 11, 2026
a34abc4
[FixBug] Improve exception string in `tensorizer.py` (#31680)
maang-h Jan 11, 2026
d74132c
fix offline inference chat response prompt (#32088)
andyxning Jan 11, 2026
3df619a
[CI] fix `test_concat_and_cache_mla_rope_fused` (#32117)
ZJY0516 Jan 11, 2026
19504ac
[Model Runner V2] Skip building deprecated fields in attn metadata (#…
WoosukKwon Jan 11, 2026
025a32f
[Model Runner V2] Remove async barrier (#32083)
WoosukKwon Jan 12, 2026
9101dc7
[Model] Avoid hardcoding pooling type (#32119)
DarkLight1337 Jan 12, 2026
60446cd
[Model] Improve multimodal pooling examples (#32085)
noooop Jan 12, 2026
600aaab
[Model] Remove incorrect `SupportsPP` from MTP models (#32150)
DarkLight1337 Jan 12, 2026
22970c1
[Misc] Disable default `--ready-check-timeout-sec` extra call in vllm…
NickLucche Jan 12, 2026
5e034f2
[cpu][bench] Add Fused MoE Micro Benchmark for CPU Backend (#32092)
andikarachman Jan 12, 2026
d7b2e57
[Frontend] Fix Flaky MCP Streaming Test (#32153)
daniel-salib Jan 12, 2026
899541b
[doc] fix broken links (#32158)
minimAluminiumalism Jan 12, 2026
05e8981
[Doc] Improve LoRA docs (#32159)
jeejeelee Jan 12, 2026
a5f89ae
[Doc] Add documentation for offline API docs feature (#32134)
ricky-chaoju Jan 12, 2026
9dbe1fe
[Bugfix] Fix missing scale passing for encoder Triton Attention imple…
Isotr0py Jan 12, 2026
0565f1f
[P/D] Refactor mooncake connector sender thread using async coroutine…
dtcccc Jan 12, 2026
49e6b86
[Feature] Support recording expert indices for rollout router replay …
xhx1022 Jan 12, 2026
9cddbdb
OffloadingConnector: Add cpu_bytes_to_use configuration (#24498)
orozery Jan 12, 2026
e68b0da
doc: Update model name for Qwen3-Coder in documentation (#32185)
andyzhangx Jan 12, 2026
0346396
[ROCm] [Bugfix] Fix order of mori build in Dockerfile.rocm_base (#32179)
tjtanaa Jan 12, 2026
95e53d9
doc: Update model references in supported_models.md (#32188)
andyzhangx Jan 12, 2026
63ed240
Add K-EXAONE-236B-A23B (#31621)
lkm2835 Jan 12, 2026
6bc9c84
[MODEL] New model support for kakaocorp/kanana-1.5-v-3b-instruct (#29…
kakao-steve-ai Jan 12, 2026
3f72639
[FIX] Add NO_MUL activation support for modular kernel path (#31528)
danielafrimi Jan 12, 2026
8863c2b
[Model] Standardize pooling heads (#32148)
DarkLight1337 Jan 12, 2026
8fb2c13
[Bugfix] Fix stale SSM state for new Mamba requests scheduled as deco…
Josephasafg Jan 12, 2026
5b68107
[Misc][PD] Fix `get_attn_backend` usage in transfer connectors (#31988)
NickLucche Jan 12, 2026
7c0d3c5
[Benchmark] Share data between SLA runs (#32184)
DarkLight1337 Jan 12, 2026
20228cb
[3/N][Attention] Move AttentionMetadata-related code from utils.py to…
MatthewBonanni Jan 12, 2026
3d962d7
[BugFix] fix FusedMoE.make_expert_params_mapping in EXAONE-MoE (#32196)
lkm2835 Jan 12, 2026
1eb61ab
[Refactor] EPLB rebalance algo to NumPy (#30697)
ilmarkov Jan 12, 2026
16abe6b
[Misc] Set default torch num threads for input processing (#31879)
ywang96 Jan 12, 2026
2be765b
[BugFix] scheduler: Fix ordering preserving of skipped requests (#32173)
orozery Jan 12, 2026
08e8e99
[Misc] Change log level for batch queue log (#32192)
NickLucche Jan 12, 2026
ad8818b
[Misc][BE] Type coverage for vllm/compilation [3/3] (#31748)
Lucaskabela Jan 12, 2026
ca81811
[Model Runner V2] Support logit_bias, allowed_token_ids, min_tokens (…
WoosukKwon Jan 12, 2026
f8bd839
[NIXL][Bugfix] Failure logging overhaul + early metadata free on fail…
NickLucche Jan 12, 2026
9f430c9
[BUGFIX] Add missed remaping of the names of fp8 kv-scale (#32199)
vadiklyutiy Jan 12, 2026
dec2868
[Model Runner V2] Minor refactor for logit_bias (#32209)
WoosukKwon Jan 12, 2026
0a7dd23
[Model Runner V2] Add support for M-RoPE (#32143)
WoosukKwon Jan 12, 2026
629584b
[Kernel][MoE] fix computation order of MoE weight multiplication and …
xuebwang-amd Jan 12, 2026
a28d9f4
[ROCm][CI] Handle pytest status code 5 when a shard isn't allocated a…
divakar-amd Jan 12, 2026
a307ac0
[responsesAPI] add unit test for optional function tool call id (#32036)
qandrew Jan 13, 2026
78d13ea
[Model] Handle `trust_remote_code` for transformers backend (#32194)
DarkLight1337 Jan 13, 2026
9273a42
[Misc] Allow enabling NCCL for DP sync when async scheduling (#32197)
njhill Jan 13, 2026
c6bb5b5
[BugFix] Fix engine crash caused by chat tools + response_format (#32…
njhill Jan 13, 2026
15b33ff
[Misc] improve warning/assert messages (#32226)
cjackal Jan 13, 2026
60b77e1
[Frontend] Add `reasoning_effort` to `OpenAIServing._preprocess_chat(…
sanghoon-yn Jan 13, 2026
f243abc
Fix various typos found in `docs` (#32212)
potatosalad Jan 13, 2026
2a719e0
[Perf] Optimize requests abort (#32211)
yewentao256 Jan 13, 2026
11b6af5
[ROCm][Bugfix] Fix Mamba batched decode producing incorrect output (#…
AndreasKaratzas Jan 13, 2026
0aa8c40
[Bugfix] Replace `PoolingParams.normalize` with `use_activation` (#32…
DarkLight1337 Jan 13, 2026
2c24bc6
[BugFix] [KVConnector] Fix KV events for LMCache connector (#32169)
hickeyma Jan 13, 2026
1b57275
[Bugfix][ROCm][performance] Resolve the performance regression issue …
vllmellm Jan 14, 2026
b622497
[ROCM] Add ROCm image build to release pipeline (#31995)
dllehr-amd Jan 15, 2026
6ac0fcf
[ROCm][Bugfix] Disable hip sampler to fix deepseek's accuracy issue o…
ganyi1996ppo Jan 15, 2026
0e31fc7
[UX] Use kv_offloading_backend=native by default (#32421)
mgoin Jan 15, 2026
c2a37a3
Cherry pick [ROCm] [CI] [Release] Rocm wheel pipeline with sccache #3…
tjtanaa Jan 15, 2026
7f42dc2
[CI] Fix LM Eval Large Models (H100) (#32423)
MatthewBonanni Jan 16, 2026
09f4264
[Bugfix] Fix ROCm dockerfiles (#32447)
tjtanaa Jan 16, 2026
48b67ba
[Frontend] Standardize use of `create_error_response` (#32319)
DarkLight1337 Jan 14, 2026
b17039b
[CI] Implement uploading to PyPI and GitHub in the release pipeline, …
Harry-Chen Jan 17, 2026
d682094
[build] fix cu130 related release pipeline steps and publish as night…
Harry-Chen Jan 18, 2026
f46d576
[Misc] Replace urllib's `urlparse` with urllib3's `parse_url` (#32746)
Isotr0py Jan 22, 2026
2bd95d8
[Misc] Bump opencv-python dependecy version to 4.13 (#32668)
Isotr0py Jan 22, 2026
4dc11b0
[Bugfix] Fix Whisper/encoder-decoder GPU memory leak (#32789)
NickLucche Jan 22, 2026
d7de043
[CI] fix version comparsion and exclusion patterns in upload-release-…
Harry-Chen Jan 23, 2026
7db605b
feat: update to llm-d v0.5.0 spans patch
starpit Feb 19, 2026
24 changes: 24 additions & 0 deletions .buildkite/ci_config.yaml
@@ -0,0 +1,24 @@
name: vllm_ci
job_dirs:
  - ".buildkite/test_areas"
  - ".buildkite/image_build"
run_all_patterns:
  - "docker/Dockerfile"
  - "CMakeLists.txt"
  - "requirements/common.txt"
  - "requirements/cuda.txt"
  - "requirements/build.txt"
  - "requirements/test.txt"
  - "setup.py"
  - "csrc/"
  - "cmake/"
run_all_exclude_patterns:
  - "docker/Dockerfile."
  - "csrc/cpu/"
  - "csrc/rocm/"
  - "cmake/hipify.py"
  - "cmake/cpu_extension.cmake"
registries: public.ecr.aws/q9t5s3a7
repositories:
  main: "vllm-ci-postmerge-repo"
  premerge: "vllm-ci-test-repo"
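Reviewer note: the diff does not show how `run_all_patterns` and `run_all_exclude_patterns` are evaluated. The pairing of `docker/Dockerfile` with the exclude `docker/Dockerfile.` suggests plain path-prefix matching with excludes taking precedence, but that is an assumption. The sketch below illustrates that reading only; `classify_path` and the sample file paths are hypothetical and are not part of this PR, and only a subset of the patterns is copied in.

```shell
#!/bin/bash
# Hypothetical illustration, NOT code from this PR.
# Assumption: each pattern is a literal path prefix, and excludes win over includes.
run_all_patterns=("docker/Dockerfile" "CMakeLists.txt" "setup.py" "csrc/" "cmake/")
run_all_exclude_patterns=("docker/Dockerfile." "csrc/cpu/" "csrc/rocm/" "cmake/hipify.py" "cmake/cpu_extension.cmake")

# Prints "run_all" if a changed path would trigger the full suite, else "selective".
classify_path() {
  local path=$1 pattern
  # Check excludes first so that e.g. docker/Dockerfile.cpu does not match docker/Dockerfile.
  for pattern in "${run_all_exclude_patterns[@]}"; do
    if [[ "$path" == "$pattern"* ]]; then
      echo "selective"
      return 0
    fi
  done
  for pattern in "${run_all_patterns[@]}"; do
    if [[ "$path" == "$pattern"* ]]; then
      echo "run_all"
      return 0
    fi
  done
  echo "selective"
}

# Sample paths are made up for the demonstration.
classify_path "docker/Dockerfile"
classify_path "docker/Dockerfile.cpu"
classify_path "csrc/cpu/example.cpp"
classify_path "csrc/example/kernel.cu"
```

Under this reading, a change to `docker/Dockerfile` or to anything under `csrc/` outside `csrc/cpu/` and `csrc/rocm/` would run everything, while CPU-only and ROCm-only changes would not.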
46 changes: 0 additions & 46 deletions .buildkite/generate_index.py

This file was deleted.

56 changes: 56 additions & 0 deletions .buildkite/image_build/image_build.sh
@@ -0,0 +1,56 @@
#!/bin/bash
set -e

if [[ $# -lt 8 ]]; then
  echo "Usage: $0 <registry> <repo> <commit> <branch> <vllm_use_precompiled> <vllm_merge_base_commit> <cache_from> <cache_to>"
  exit 1
fi

REGISTRY=$1
REPO=$2
BUILDKITE_COMMIT=$3
BRANCH=$4
VLLM_USE_PRECOMPILED=$5
VLLM_MERGE_BASE_COMMIT=$6
CACHE_FROM=$7
CACHE_TO=$8

# authenticate with AWS ECR
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin $REGISTRY
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 936637512419.dkr.ecr.us-east-1.amazonaws.com

# docker buildx
docker buildx create --name vllm-builder --driver docker-container --use
docker buildx inspect --bootstrap
docker buildx ls

# skip build if image already exists
if [[ -z $(docker manifest inspect $REGISTRY/$REPO:$BUILDKITE_COMMIT) ]]; then
  echo "Image not found, proceeding with build..."
else
  echo "Image found"
  exit 0
fi

if [[ "${VLLM_USE_PRECOMPILED:-0}" == "1" ]]; then
  merge_base_commit_build_args="--build-arg VLLM_MERGE_BASE_COMMIT=${VLLM_MERGE_BASE_COMMIT}"
else
  merge_base_commit_build_args=""
fi

# build
docker buildx build --file docker/Dockerfile \
  --build-arg max_jobs=16 \
  --build-arg buildkite_commit=$BUILDKITE_COMMIT \
  --build-arg USE_SCCACHE=1 \
  --build-arg TORCH_CUDA_ARCH_LIST="8.0 8.9 9.0 10.0" \
  --build-arg FI_TORCH_CUDA_ARCH_LIST="8.0 8.9 9.0a 10.0a" \
  --build-arg VLLM_USE_PRECOMPILED="${VLLM_USE_PRECOMPILED:-0}" \
  ${merge_base_commit_build_args} \
  --cache-from type=registry,ref=${CACHE_FROM},mode=max \
  --cache-to type=registry,ref=${CACHE_TO},mode=max \
  --tag ${REGISTRY}/${REPO}:${BUILDKITE_COMMIT} \
  $( [[ "${BRANCH}" == "main" ]] && echo "--tag ${REGISTRY}/${REPO}:latest" ) \
  --push \
  --target test \
  --progress plain .
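Reviewer note: the script adds the `latest` tag through an unquoted command substitution inside the `docker buildx build` call, which relies on word splitting. A possible alternative, sketched here as a suggestion only, is to collect the tag arguments in a bash array. The function name `image_tag_args` and the sample values are hypothetical and do not appear in this PR.

```shell
#!/bin/bash
# Hypothetical helper, NOT code from this PR.
# Prints one docker argument per line: always the commit tag,
# plus the :latest tag when the branch is "main" (mirroring the script above).
image_tag_args() {
  local registry=$1 repo=$2 commit=$3 branch=$4
  local args=(--tag "${registry}/${repo}:${commit}")
  if [[ "$branch" == "main" ]]; then
    args+=(--tag "${registry}/${repo}:latest")
  fi
  printf '%s\n' "${args[@]}"
}

# Sample values are made up for the demonstration.
image_tag_args "registry.example" "demo-repo" "abc123" "main"
image_tag_args "registry.example" "demo-repo" "abc123" "feature-branch"
```

A caller could read the lines back with `mapfile -t tag_args < <(image_tag_args ...)` (bash 4 or later) and pass `"${tag_args[@]}"` to the build command, so each argument stays a single word regardless of its contents.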
57 changes: 57 additions & 0 deletions .buildkite/image_build/image_build.yaml
@@ -0,0 +1,57 @@
group: Abuild
steps:
  - label: ":docker: Build image"
    key: image-build
    depends_on: []
    commands:
      - .buildkite/image_build/image_build.sh $REGISTRY $REPO $BUILDKITE_COMMIT $BRANCH $VLLM_USE_PRECOMPILED $VLLM_MERGE_BASE_COMMIT $CACHE_FROM $CACHE_TO
    retry:
      automatic:
        - exit_status: -1 # Agent was lost
          limit: 2
        - exit_status: -10 # Agent was lost
          limit: 2

  - label: ":docker: Build CPU image"
    key: image-build-cpu
    depends_on: []
    commands:
      - .buildkite/image_build/image_build_cpu.sh $REGISTRY $REPO $BUILDKITE_COMMIT
    env:
      DOCKER_BUILDKIT: "1"
    retry:
      automatic:
        - exit_status: -1 # Agent was lost
          limit: 2
        - exit_status: -10 # Agent was lost
          limit: 2

  - label: ":docker: Build HPU image"
    soft_fail: true
    depends_on: []
    key: image-build-hpu
    commands:
      - .buildkite/image_build/image_build_hpu.sh $REGISTRY $REPO $BUILDKITE_COMMIT
    env:
      DOCKER_BUILDKIT: "1"
    retry:
      automatic:
        - exit_status: -1 # Agent was lost
          limit: 2
        - exit_status: -10 # Agent was lost
          limit: 2

  - label: ":docker: Build CPU arm64 image"
    key: cpu-arm64-image-build
    depends_on: []
    optional: true
    commands:
      - .buildkite/image_build/image_build_cpu_arm64.sh $REGISTRY $REPO $BUILDKITE_COMMIT
    env:
      DOCKER_BUILDKIT: "1"
    retry:
      automatic:
        - exit_status: -1 # Agent was lost
          limit: 2
        - exit_status: -10 # Agent was lost
          limit: 2
36 changes: 36 additions & 0 deletions .buildkite/image_build/image_build_cpu.sh
@@ -0,0 +1,36 @@
#!/bin/bash
set -e

if [[ $# -lt 3 ]]; then
  echo "Usage: $0 <registry> <repo> <commit>"
  exit 1
fi

REGISTRY=$1
REPO=$2
BUILDKITE_COMMIT=$3

# authenticate with AWS ECR
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin $REGISTRY

# skip build if image already exists
if [[ -z $(docker manifest inspect $REGISTRY/$REPO:$BUILDKITE_COMMIT-cpu) ]]; then
  echo "Image not found, proceeding with build..."
else
  echo "Image found"
  exit 0
fi

# build
docker build --file docker/Dockerfile.cpu \
  --build-arg max_jobs=16 \
  --build-arg buildkite_commit=$BUILDKITE_COMMIT \
  --build-arg VLLM_CPU_AVX512BF16=true \
  --build-arg VLLM_CPU_AVX512VNNI=true \
  --build-arg VLLM_CPU_AMXBF16=true \
  --tag $REGISTRY/$REPO:$BUILDKITE_COMMIT-cpu \
  --target vllm-test \
  --progress plain .

# push
docker push $REGISTRY/$REPO:$BUILDKITE_COMMIT-cpu
33 changes: 33 additions & 0 deletions .buildkite/image_build/image_build_cpu_arm64.sh
@@ -0,0 +1,33 @@
#!/bin/bash
set -e

if [[ $# -lt 3 ]]; then
  echo "Usage: $0 <registry> <repo> <commit>"
  exit 1
fi

REGISTRY=$1
REPO=$2
BUILDKITE_COMMIT=$3

# authenticate with AWS ECR
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin $REGISTRY

# skip build if image already exists
if [[ -z $(docker manifest inspect $REGISTRY/$REPO:$BUILDKITE_COMMIT-cpu) ]]; then
  echo "Image not found, proceeding with build..."
else
  echo "Image found"
  exit 0
fi

# build
docker build --file docker/Dockerfile.cpu \
  --build-arg max_jobs=16 \
  --build-arg buildkite_commit=$BUILDKITE_COMMIT \
  --tag $REGISTRY/$REPO:$BUILDKITE_COMMIT-cpu \
  --target vllm-test \
  --progress plain .

# push
docker push $REGISTRY/$REPO:$BUILDKITE_COMMIT-cpu
34 changes: 34 additions & 0 deletions .buildkite/image_build/image_build_hpu.sh
@@ -0,0 +1,34 @@
#!/bin/bash
set -e

if [[ $# -lt 3 ]]; then
  echo "Usage: $0 <registry> <repo> <commit>"
  exit 1
fi

REGISTRY=$1
REPO=$2
BUILDKITE_COMMIT=$3

# authenticate with AWS ECR
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin $REGISTRY

# skip build if image already exists
if [[ -z $(docker manifest inspect $REGISTRY/$REPO:$BUILDKITE_COMMIT-hpu) ]]; then
  echo "Image not found, proceeding with build..."
else
  echo "Image found"
  exit 0
fi

# build
docker build \
  --file tests/pytorch_ci_hud_benchmark/Dockerfile.hpu \
  --build-arg max_jobs=16 \
  --build-arg buildkite_commit=$BUILDKITE_COMMIT \
  --tag $REGISTRY/$REPO:$BUILDKITE_COMMIT-hpu \
  --progress plain \
  https://github.com/vllm-project/vllm-gaudi.git

# push
docker push $REGISTRY/$REPO:$BUILDKITE_COMMIT-hpu
@@ -8,3 +8,4 @@ tasks:
value: 0.80
limit: 250 # will run on 250 * 14 subjects = 3500 samples
num_fewshot: 5
rtol: 0.05
1 change: 1 addition & 0 deletions .buildkite/lm-eval-harness/configs/models-large-rocm.txt
@@ -0,0 +1 @@
Meta-Llama-4-Maverick-17B-128E-Instruct-FP8.yaml
@@ -2,7 +2,7 @@
 # We can use this script to compute baseline accuracy on chartqa for vllm.
 #
 # Make sure you have lm-eval-harness installed:
-# pip install lm-eval==0.4.9
+# pip install "lm-eval[api]>=0.4.9.2"

 usage() {
     echo``
2 changes: 1 addition & 1 deletion .buildkite/lm-eval-harness/run-lm-eval-gsm-hf-baseline.sh
@@ -2,7 +2,7 @@
 # We can use this script to compute baseline accuracy on GSM for transformers.
 #
 # Make sure you have lm-eval-harness installed:
-# pip install git+https://github.com/EleutherAI/lm-evaluation-harness.git@206b7722158f58c35b7ffcd53b035fdbdda5126d#egg=lm-eval[api]
+# pip install "lm-eval[api]>=0.4.9.2"

 usage() {
     echo``
@@ -3,7 +3,7 @@
 # We use this for fp8, which HF does not support.
 #
 # Make sure you have lm-eval-harness installed:
-# pip install git+https://github.com/EleutherAI/lm-evaluation-harness.git@206b7722158f58c35b7ffcd53b035fdbdda5126d#egg=lm-eval[api]
+# pip install "lm-eval[api]>=0.4.9.2"

 usage() {
     echo``
@@ -3,7 +3,7 @@
 # We use this for fp8, which HF does not support.
 #
 # Make sure you have lm-eval-harness installed:
-# pip install git+https://github.com/EleutherAI/lm-evaluation-harness.git@206b7722158f58c35b7ffcd53b035fdbdda5126d#egg=lm-eval[api]
+# pip install "lm-eval[api]>=0.4.9.2"

 usage() {
     echo``