155 commits
9511a3f
[Bugfix] Fix AttributeError in SMControlContextManager (#35338)
LucasWilkinson Feb 26, 2026
160424a
[Bugfix] Fix CUDA compatibility path setting for both datacenter and …
ehfd Feb 26, 2026
86c3b5a
[BugFix] Fix fp4 quant kernel on CUDA 12.8 (#35210)
LopezCastroRoberto Feb 26, 2026
2aa4140
openpangu-vl support video input (#34134)
hujiaxin0 Feb 26, 2026
71dfce6
[Kernel] Refactor FlashInfer allreduce for mnnvl backend (#34109)
hjjq Feb 26, 2026
13025e7
[Model Runner V2] Add coding style guide (#35325)
WoosukKwon Feb 26, 2026
4171ff6
[CPU][Feat] Enable KleidiAI INT8_W4A8 for all input dtypes (#34890)
fadara01 Feb 26, 2026
9d37941
[torch.compile] Sequence Parallelism threshold compile ranges (#28672)
jasonlizhengjian Feb 26, 2026
4a9c07a
[BugFix] anthropic/serving_messages: fix tool call arguments streamin…
dtrifiro Feb 26, 2026
186ea22
[Misc][Harmony] Move Responses API only harmony utils to responses/ha…
sfeng33 Feb 26, 2026
d3a51da
[Benchmark] Simplify SLA scan (#35306)
DarkLight1337 Feb 26, 2026
a07c4c5
[BugFix][XPU] Fix speculative decoding on Intel XPU due to bug with `…
ofirzaf Feb 26, 2026
9f9a675
[XPU][8/N] Fix kernel bugs in XPU LoRA and MOE LORA (#34115)
chaojun-zhang Feb 26, 2026
6042e66
[ROCm] Add extra step in config initialization to populate custom ops…
gshtras Feb 26, 2026
ade81f1
[Bugfix][Hardware][AMD] Gate FP4 ops on gfx950 to prevent MI300X cras…
c0de128 Feb 26, 2026
3827c8c
[Test] Add tests for n parameter in chat completions API (#35283)
KrxGu Feb 26, 2026
ab87f85
[Model] Ring 2.5 (#35102)
ZJY0516 Feb 26, 2026
02acd16
[Benchmarks] Plot benchmark timeline and requests statistics (#35220)
sducouedic Feb 26, 2026
e03ddcf
[Hardware][Powerpc]Enable prefix caching and chunked prefill for ppc6…
Akashcodes732 Feb 26, 2026
32693db
[Bugfix] [Qwen3.5]Fix Qwen3.5 FP8 quantization: tuple shard_id weight…
HZY-Wade Feb 26, 2026
5281713
[XPU] use fixed UMD version in dockerfile.xpu (#35392)
jikunshang Feb 26, 2026
0191444
Remove `bc-lint` (#35274)
hmellor Feb 26, 2026
c0615a2
[Bugfix] Fix Qwen2.5-Omni and Qwen3-Omni mixed-modality embed regress…
linyueqian Feb 26, 2026
c6ca515
[Bugfix] fix device_name for routing replay (#34336)
Li-Yongwen Feb 26, 2026
ec13e54
[Bugfix] Fix uint32 overflow in Mamba selective scan state pointer ar…
Josephasafg Feb 26, 2026
845ee34
[Misc] Standardize handling of `mm_processor_kwargs.size` (#35284)
DarkLight1337 Feb 26, 2026
7fea725
[Bug] Fix missing <think> tag after tool call in MiniMax 2.1 (#35352)
stingoChen Feb 26, 2026
111d869
[Model] Add nvidia/llama-nemotron-embed-vl-1b-v2 multimodal embedding…
jzakrzew Feb 26, 2026
05972ea
[Refactor] Remove dead or duplicate func utils or variables (#35318)
yewentao256 Feb 26, 2026
ec8ab9d
[ROCm] Add dynamic mxfp4 quantization for DeepSeek V2 projection laye…
dllehr-amd Feb 26, 2026
9e2cabd
[ROCm] Update the torch version in rocm_build.txt to use the official…
SageMoore Feb 26, 2026
f2ad952
[BugFix][kv_offload]: Fix kernel block size detection (#35125)
orozery Feb 26, 2026
ec8f943
Add GlmOcrConfig for GLM-OCR model type recognition (#34982)
hujia177 Feb 26, 2026
99c7892
[Perf] Optimize maxsim scores computation for pooling models, 13.9% E…
yewentao256 Feb 26, 2026
d940607
[Core] Support `min_tokens` with speculative decoding (#32642)
qianlihuang Feb 26, 2026
05970c7
[Refactor] Remove dead code for attention benchmark script (#35418)
yewentao256 Feb 26, 2026
a1f53ad
[BugFix] Align fused MoE-LoRA kernel config with actual weight shapes…
RunkaiTao Feb 26, 2026
5e58bdc
[Bugfix] Remove erroneous lower bound on LoRA vocab size constraint (…
LucasWilkinson Feb 26, 2026
b6d5a17
[Model Runner V2] Fix error-handling (#35063)
njhill Feb 26, 2026
c66aa48
[Model Runner V2] Add model states [1/N] (#35350)
WoosukKwon Feb 26, 2026
3d66502
[Model Runner V2] Prepare attn metadata in ModelState [2/N] (#35383)
WoosukKwon Feb 26, 2026
967572d
fix(reasoning): Qwen3ReasoningParser returns truncated output as reas…
stakeswky Feb 26, 2026
98217b0
[Performance] Extract KV cache update op from flashinfer forward (#35…
ElizaWszola Feb 26, 2026
832a780
Nemotron: use per-layer config in NemotronHMLPDecoderLayer for hetero…
danielafrimi Feb 26, 2026
d0105b8
add mixed precision support for modelopt (#35047)
sychen52 Feb 26, 2026
0f2f24c
[Bugfix] Fix MessageQueue connect_ip for cross-node data parallelism …
luccafong Feb 26, 2026
eb19955
[WideEP] Remove pplx all2all backend (#33724)
tlrmchlsmth Feb 26, 2026
31fb6f4
[Kernel][perf] optimize NCCL symm_mem vs custom_AR selection threshol…
pkousha Feb 26, 2026
01923ee
[ROCm][Quantization] GPT OSS Upstream MoE wmxfp4_afp8 with static sca…
maleksan85 Feb 26, 2026
6283021
[Bugfix] Fix KV Scale loading for MLA Models (#35430)
pavanimajety Feb 26, 2026
56a6371
[Update] Use FlashInfer fast_decode_plan directly instead of replicat…
askliar Feb 27, 2026
38c498b
[Performance] Cublas Bf16 Gate with Fp32 Output (#35121)
roikoren755 Feb 27, 2026
4fec53c
[CI] Actually run tests/kernels/quantization/test_block_fp8.py in CI …
mgoin Feb 27, 2026
d43048c
[Bugfix] Emit reasoning_part events in simple streaming path for Resp…
daniel-salib Feb 27, 2026
c29ee9c
[compile] Invalidate cache for cpu flags (#35119)
angelayi Feb 27, 2026
06be535
[Core]Extract is_last_rank in Ray for tpu to override (#33012)
Chenyaaang Feb 27, 2026
cabdaa7
[Misc] Move `GPUModelRunner.prepare_kernel_block_sizes` to utils (#35…
NickLucche Feb 27, 2026
1e5ad9b
[Bugfix] Fix Qwen3NextForCausalLM packed_modules_mapping (#35413)
jeejeelee Feb 27, 2026
a532c83
use 'max_active_experts' for moe lora input size (#33197)
gnovack Feb 27, 2026
062b789
[Bug] Fix outdated links in source code (#35314)
yewentao256 Feb 27, 2026
1a8c716
[BugFix] Repo utils debug print patch (#35434)
pi314ever Feb 27, 2026
487e5c5
[Bugfix] disable allreduce_rms_fusion by default when pp size > 1 (#3…
ZJY0516 Feb 27, 2026
516cf26
[Bug] correct out dtype of rms_norm_gated native path (#35369)
zufangzhu Feb 27, 2026
a572baf
[Model Performance] Add Qwen3MoE tuned MoE configs for H200 (#35457)
chengyinie Feb 27, 2026
07bdabe
[Bugfix] Use 'sum' reduction instead of 'avg' in Async TP reduce-scat…
wangxingran222 Feb 27, 2026
b66a746
[Bugfix] Replace assert with ValueError for response_format validatio…
umut-polat Feb 27, 2026
9c3fe99
Flashinfer cuDNN backend for Qwen3 VL ViT attention (#34580)
maxyanghu Feb 27, 2026
6467b63
[Bugfix] Add missing activation attr to RMSNormGated (#35423)
Tib-Gridello Feb 27, 2026
66c1751
[compile] Cleanup: Remove unnecessary +rms_norm forcing for sequence …
jasonlizhengjian Feb 27, 2026
fbe3f01
Revert "Add GlmOcrConfig for GLM-OCR model type recognition" (#35512)
hmellor Feb 27, 2026
6d4f9d3
[Bugfix] Fix DCP + FA3 crash due to missing num_splits in _forward_wi…
haosdent Feb 27, 2026
e824937
[Bugfix] Fix check_interleaved_audio_video false positive for batched…
linyueqian Feb 27, 2026
9251ed5
[Bugfix] Handle case when kimi ends reasoning with a tool call (#33646)
koush Feb 27, 2026
5de98ab
Add @BoyuanFeng to CODEOWNERS (#35317)
BoyuanFeng Feb 27, 2026
876312f
[Core] Fix `gpu_worker.py` pre-commit errors (#35312)
njhill Feb 27, 2026
9098ce6
[Kernel] [Helion] [7/N] Use HOP to represent Helion Kernel call to en…
gmagogsfm Feb 27, 2026
905d76b
[Model] Add huggingface skt/A.X-K1 model (#32407)
fort726 Feb 27, 2026
1d897ff
[Misc] Fill in some v1 CODEOWNERS gaps (#35524)
njhill Feb 27, 2026
157722d
[perf] Use pinned memory for async H2D transfer in do_mamba_copy_bloc…
hl475 Feb 27, 2026
b602e4f
[Doc] Fix link to Llama chat template for usability (#35525)
hickeyma Feb 27, 2026
c8aca0c
Support parakeet as audio encoder for nemotron-nano-vl (#35100)
netanel-haber Feb 27, 2026
fd6de37
[BugFix] Fix 3D rope in transformers backend (#35097)
zucchini-nlp Feb 27, 2026
b1d9f53
[Model Runner V2] Warmup kernels (#35172)
njhill Feb 27, 2026
29b3547
[compile] Fix caching error over pytree slice node. (#35308)
zhxchen17 Feb 27, 2026
2decec9
[Transformers backend] Ignore MTP weights when num_nextn_predict_laye…
SteadfastAsArt Feb 27, 2026
234a65b
[Bugfix] Add monkeypatch to prevent race condition from writing (#35420)
Lucaskabela Feb 27, 2026
1d532f9
[DP] Only use DP padding when cudagraphs are actually used (#34102)
LucasWilkinson Feb 27, 2026
1f3dbd9
[Bugfix][Model] Fix gpt-oss batch invariance (#35404)
jzakrzew Feb 27, 2026
2ce6f3c
[Feat][RL][2/2] Native Weight Syncing API: IPC (#34171)
hao-aaron Feb 27, 2026
9fa6c68
[ROCm] Enabling encoder and encoder-decoder on ROCm and AITER unified…
gshtras Feb 27, 2026
e369198
[ROCm]: fix aiter rope functionalization (#35533)
Rohan138 Feb 27, 2026
a201ad7
[Refactor][Kernel] Add global helper to deduplicate vectorized memory…
LopezCastroRoberto Feb 28, 2026
5323672
[misc] cleanup one level of error stack when nixl fails to initialize…
youkaichao Feb 28, 2026
405f28d
[Misc] Clean up ResponsesRequest model validators (#35531)
umut-polat Feb 28, 2026
86ac7bc
[Model Runner V2] Support pooling models (#35120)
WoosukKwon Feb 28, 2026
1a014a0
[Model Runner V2] Move MM encoder to Model States [3/N] (#35564)
WoosukKwon Feb 28, 2026
d5b6f3b
[ROCm][Quantization] Add Composable Kernel (CK) backend support for M…
dllehr-amd Feb 28, 2026
0edf101
[ROCm] Add `stablelm` Head Size 80 To Supported Head Sizes For ROCM_A…
micah-wil Feb 28, 2026
fd68cd1
[Bugfix] Fixes for SLA finder (#35537)
DarkLight1337 Feb 28, 2026
2562e02
[MTP] Validate that MTP weights are actually loaded (#35548)
MatthewBonanni Feb 28, 2026
90805ff
[CI/Build] CPU release supports both of AVX2 and AVX512 (#35466)
majian4work Feb 28, 2026
538b288
[XPU][NIXL] support GPUDirect RDMA
zhenwei-intel Feb 25, 2026
cd1fbba
make UCX_VERSION as parameter
zhenwei-intel Feb 25, 2026
f3aaa47
make with-ze as parameter
zhenwei-intel Feb 25, 2026
2185c10
remove unused comments
zhenwei-intel Feb 26, 2026
ff5fcd2
update of install nixl and ucx
zhenwei-intel Feb 28, 2026
23743b9
update nixl memory type
zhenwei-intel Feb 28, 2026
dea2683
[1/N] Elastic EP Milestone 2 (#34861)
itayalroy Feb 28, 2026
7b346ba
[Bugfix] Propagate compilation_time from workers to main process for …
huydhn Feb 28, 2026
1d5ab5d
[Bugfix] Move chat completion response_format validation to Pydantic …
umut-polat Feb 28, 2026
b2d8b42
[EPLB] Enforce sync eplb for NCCL-based all2all backend (#35212)
ilmarkov Feb 28, 2026
88e8525
[ROCm][CI] Adding infiniband mappings for moriio tests (#35170)
AndreasKaratzas Feb 28, 2026
94029ff
[ROCm] Derive device capability from GCN arch string without CUDA ini…
AndreasKaratzas Feb 28, 2026
f5d1281
[ROCm][CI] Expose tests to AMD production CI and fix amdsmi heap corr…
AndreasKaratzas Feb 28, 2026
06254d4
[CI] add trainer_send_weights for MockWeightTransferEngine (#35589)
chaunceyjiang Feb 28, 2026
57c86c0
[Misc] Change logging level from info to debug for tool parser import…
chaunceyjiang Feb 28, 2026
24d6ea8
[Benchmark] Rename SLA Finder to Workload Explorer (#35586)
DarkLight1337 Feb 28, 2026
06733ec
update dockerfile
zhenwei-intel Feb 28, 2026
4292e3b
[Benchmark] Improve UX of sweep scripts (#35600)
DarkLight1337 Feb 28, 2026
1e69c04
[ROCm][CI] Parametrize vision score tests across attention backends w…
AndreasKaratzas Feb 28, 2026
7600642
Add padding support to wvSplitK solution for skinny GEMMs (#33762)
amd-hhashemi Feb 28, 2026
0892d1a
[Feature]Supports Anthropic Thinking Block (#33671)
mariohong128 Feb 28, 2026
8e75d88
add io_process_plugin for sparse embedding (#34214)
staugust Feb 28, 2026
7e08c22
[Feat] Add CUDA torch fallbacks for fp8_mqa_logits/fp8_paged_mqa_logi…
chaunceyjiang Feb 28, 2026
c68e69f
custom dataset img support base64 (#35280)
flutist Feb 28, 2026
63d7972
Fix Qwen3_5MTP packed_modules_mapping for gate_up_proj (#35581)
cwazai Feb 28, 2026
49b9ae3
[Fix] Avoid sending image input to other PP ranks (#35405)
emricksini-h Feb 28, 2026
1dafb29
[Benchmark] Avoid unnecessary video download in MMVU (#35618)
DarkLight1337 Feb 28, 2026
e113a30
[Deprecation] Deprecate code in 0.17 as scheduled (#35441)
yewentao256 Feb 28, 2026
e94b263
[Chore] Cleanup BNB utilization dead code (#35620)
Isotr0py Feb 28, 2026
95a395d
[Bugfix] Fix Anthropic API base64 image handling in Messages endpoint…
voipmonitor Feb 28, 2026
e3eb146
[Model Runner V2] Add ModelStateInterface [4/N] (#35621)
WoosukKwon Feb 28, 2026
795da8e
update dockerfile
zhenwei-intel Mar 1, 2026
3ecd0bf
Add TMA support to fused_moe_lora kernel (#32195)
gnovack Mar 1, 2026
afd089f
[Bugfix][Model] Fix Qwen3.5/Qwen3Next ignoring --dtype flag on older …
lailoo Mar 1, 2026
a9ec392
Fix typo: implictly -> implicitly in isaac.py docstring (#35646)
lin-shh Mar 1, 2026
87d319c
[AMD][CI] Support Triton attention with ExampleConnector (#34931)
rjrock Mar 1, 2026
da543d1
[Model Runner V2] Minor refactoring for EncoderRunner (#35628)
WoosukKwon Mar 1, 2026
bbf81f9
[Mamba1] - Kernel Level Chunk Alignment for Prefix Caching (#34798)
Josephasafg Mar 1, 2026
59d7af9
[MISC] Fixing a null reference by removing parallel_utils from mypy E…
taneem-ibrahim Mar 1, 2026
5a43550
fix(mxfp4): return is_monolithic=False when LoRA is enabled for Trito…
yoonsnowdev Mar 1, 2026
72f4d16
[Model Runner V2] Use block table apis for capture inputs (#35671)
WoosukKwon Mar 1, 2026
6290470
[Bugfix] Fix dtype mismatch in RMSNormGated.forward_native() during t…
haosdent Mar 1, 2026
e82fbee
[torch.compile] Undo the fast_moe_cold_start hack in torch>=2.11 (#35…
zou3519 Mar 1, 2026
57a96e2
Revert "[Bugfix] Disable TRTLLM attention with KV transfer enabled (#…
ZhanqiuHu Mar 1, 2026
8b5014d
[Attention] FA4 integration (#32974)
LucasWilkinson Mar 1, 2026
a60985b
Fix deprecated v1 config tests (#35327)
jcaip Mar 2, 2026
92f5d0f
[XPU] fix mxfp4 activation type (#35691)
jikunshang Mar 2, 2026
f26650d
[ROCm] add amd-quark package in requirements for rocm to use quantize…
hongxiayang Mar 2, 2026
c34963f
[ROCm][CI] Disable skinny GEMMs in language model standard tests to f…
AndreasKaratzas Mar 2, 2026
cb21972
[Kernel] Integrate SM100 MXFP8 blockscaled grouped MM and quant kerne…
EdalatiAli Mar 2, 2026
3fd1d4e
[Rocm][CI] Fix LM Eval Large Models (H100) test group (#34750)
charlifu Mar 2, 2026
ec27b36
[CI] Defining extended V1 e2e + engine tests (#35580)
AndreasKaratzas Mar 2, 2026
c212202
[Misc] Bound NIXL upper bound version (#35495)
NickLucche Mar 2, 2026
6079f20
Merge branch 'main' into xpu_pd_2026
jikunshang Mar 2, 2026
1 change: 1 addition & 0 deletions .buildkite/lm-eval-harness/configs/models-large-rocm.txt
@@ -1 +1,2 @@
Meta-Llama-4-Maverick-17B-128E-Instruct-FP8.yaml
Qwen3-235B-A22B-Instruct-2507-FP8.yaml
194 changes: 183 additions & 11 deletions .buildkite/scripts/hardware_ci/run-amd-test.sh
@@ -6,6 +6,26 @@
# Multi-node detection: Instead of matching on fragile group names, we detect
# multi-node jobs structurally by looking for the bracket command syntax
# "[node0_cmds] && [node1_cmds]" or via the NUM_NODES environment variable.
#
###############################################################################
# QUOTING / COMMAND PASSING
#
# Passing commands as positional arguments ($*) is fragile when the command
# string itself contains double quotes, e.g.:
#
# bash run-amd-test.sh "export FLAGS="value" && pytest -m "not slow""
#
# The outer shell resolves the nested quotes *before* this script runs, so
# the script receives mangled input it cannot fully recover.
#
# Preferred: pass commands via the VLLM_TEST_COMMANDS environment variable:
#
# export VLLM_TEST_COMMANDS='export FLAGS="value" && pytest -m "not slow"'
# bash run-amd-test.sh
#
# Single-quoted assignment preserves all inner double quotes verbatim.
# The $* path is kept for backward compatibility but callers should migrate.
###############################################################################
set -o pipefail

# Export Python path
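The quoting problem described in the header comment can be reproduced in isolation. The sketch below is illustrative only: `show_received` is a hypothetical stand-in for `run-amd-test.sh` that applies the same precedence (environment variable first, `$*` as the fallback) and prints the string the script would end up with.

```shell
#!/usr/bin/env bash
# show_received is a hypothetical stand-in for run-amd-test.sh. It prints the
# command string the script would end up with, using the same precedence:
# VLLM_TEST_COMMANDS first, positional arguments ("$*") as the fallback.
show_received() {
    if [[ -n "${VLLM_TEST_COMMANDS:-}" ]]; then
        printf '%s' "${VLLM_TEST_COMMANDS}"
    else
        printf '%s' "$*"
    fi
}

unset VLLM_TEST_COMMANDS

# Legacy path: the calling shell consumes the nested double quotes and even
# splits the argument in two at the space inside "not slow".
legacy=$(show_received "export FLAGS="value" && pytest -m "not slow"")

# Preferred path: single quotes keep the inner double quotes verbatim.
preserved=$(VLLM_TEST_COMMANDS='export FLAGS="value" && pytest -m "not slow"' show_received)

printf 'legacy:    %s\n' "$legacy"
printf 'preserved: %s\n' "$preserved"
```

In the legacy case the string arrives as `export FLAGS=value && pytest -m not slow`, with every inner double quote gone. In the environment-variable case it arrives byte for byte as written.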
@@ -80,25 +100,140 @@ is_multi_node() {
}

###############################################################################
# Pytest marker re-quoting
# Pytest marker/keyword re-quoting
#
# When commands are passed through Buildkite -> shell -> $* -> bash -c,
# quotes around pytest -m marker expressions get stripped:
# quotes around multi-word pytest -m/-k expressions get stripped:
# pytest -v -s -m 'not cpu_test' v1/core
# becomes:
# pytest -v -s -m not cpu_test v1/core
#
# pytest then interprets "cpu_test" as a file path, not part of the marker.
# This function detects unquoted multi-word marker expressions and re-quotes
# them so they survive the final bash -c expansion.
#
# This function detects unquoted expressions after -m/-k and re-quotes them
# by collecting tokens until a recognizable boundary is reached:
# - test path (contains '/')
# - test file (ends with '.py')
# - another pytest flag (--xxx or -x single-char flags)
# - command separator (&& || ; |)
# - environment variable assignment (FOO=bar)
#
# Single-word markers (e.g. -m cpu_test, -m hybrid_model) pass through
# unquoted since they have no spaces and work fine.
#
# Already-quoted expressions (containing literal single quotes) are passed
# through untouched to avoid double-quoting values injected by
# apply_rocm_test_overrides.
#
# NOTE: This ONLY fixes -m/-k flags. It cannot recover arbitrary inner
# double-quotes stripped by the calling shell (see header comment).
# Use VLLM_TEST_COMMANDS to avoid the problem entirely.
###############################################################################

re_quote_pytest_markers() {
local cmds="$1"
# Pattern: -m not <identifier> -> -m 'not <identifier>'
# Handles the common cases: 'not cpu_test', 'not slow_test', etc.
cmds=$(echo "$cmds" | sed -E "s/-m not ([a-zA-Z_][a-zA-Z0-9_]*)/-m 'not \1'/g")
echo "$cmds"
local input="$1"
local output=""
local collecting=false
local marker_buf=""

# Flatten newlines for consistent tokenization
local flat="${input//$'\n'/ }"

# Disable globbing to prevent *.py etc. from expanding during read -ra
local restore_glob
restore_glob="$(shopt -p -o noglob 2>/dev/null || true)"
set -o noglob
local -a words
read -ra words <<< "$flat"
eval "$restore_glob"

for word in "${words[@]}"; do
if $collecting; then
# If the token we're about to collect already contains a literal
# single quote, the expression was already quoted upstream.
# Flush and stop collecting.
if [[ "$word" == *"'"* ]]; then
if [[ -n "$marker_buf" ]]; then
# Should not normally happen (partial buf + quote), flush raw
output+="${marker_buf} "
marker_buf=""
fi
output+="${word} "
collecting=false
continue
fi

local is_boundary=false
case "$word" in
# Command separators
"&&"|"||"|";"|"|")
is_boundary=true ;;
# Long flags (--ignore, --shard-id, etc.)
--*)
is_boundary=true ;;
# Short flags (-v, -s, -x, etc.). Marker words such as "not" never
# match because they do not start with "-". -m/-k also match here;
# the flush logic below restarts collection for them.
-[a-zA-Z])
is_boundary=true ;;
# Test path (contains /)
*/*)
is_boundary=true ;;
# Test file (ends with .py, possibly with ::method)
*.py|*.py::*)
is_boundary=true ;;
# Environment variable assignment preceding a command (FOO=bar)
*=*)
# Only treat as boundary if it looks like VAR=value, not
# pytest filter expressions like num_gpus=2 inside markers
if [[ "$word" =~ ^[A-Z_][A-Z0-9_]*= ]]; then
is_boundary=true
fi
;;
esac

if $is_boundary; then
# Flush the collected marker expression
if [[ "$marker_buf" == *" "* || "$marker_buf" == *"("* ]]; then
output+="'${marker_buf}' "
else
output+="${marker_buf} "
fi
collecting=false
marker_buf=""
# Check if this boundary word itself starts a new -m/-k
if [[ "$word" == "-m" || "$word" == "-k" ]]; then
output+="${word} "
collecting=true
else
output+="${word} "
fi
else
# Accumulate into marker buffer
if [[ -n "$marker_buf" ]]; then
marker_buf+=" ${word}"
else
marker_buf="${word}"
fi
fi
elif [[ "$word" == "-m" || "$word" == "-k" ]]; then
output+="${word} "
collecting=true
marker_buf=""
else
output+="${word} "
fi
done

# Flush any trailing marker expression (marker at end of command)
if $collecting && [[ -n "$marker_buf" ]]; then
if [[ "$marker_buf" == *" "* || "$marker_buf" == *"("* ]]; then
output+="'${marker_buf}'"
else
output+="${marker_buf}"
fi
fi

echo "${output% }"
}

###############################################################################
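For reviewers who want to experiment with the boundary idea outside the CI script, here is a reduced sketch. It is not the function from this diff: `requote_sketch` is a hypothetical name, it treats any token starting with `-` as a boundary, and it omits the environment-variable, parenthesis, and already-quoted cases that `re_quote_pytest_markers` handles.

```shell
#!/usr/bin/env bash
# Reduced sketch of boundary-based re-quoting (hypothetical helper, not the
# function in this diff). Tokens after -m/-k are buffered until a boundary
# token appears; a buffer holding more than one word is wrapped in quotes.
requote_sketch() {
    local -a words out=()
    local buf="" collecting=false word
    read -ra words <<< "$1"

    for word in "${words[@]}"; do
        if $collecting; then
            case "$word" in
                "&&"|"||"|";"|"|"|-*|*/*|*.py|*.py::*)
                    # Boundary reached: emit the buffered expression first.
                    if [[ "$buf" == *" "* ]]; then
                        out+=("'${buf}'")
                    elif [[ -n "$buf" ]]; then
                        out+=("$buf")
                    fi
                    buf=""
                    collecting=false
                    ;;
                *)
                    # Still inside the expression: keep accumulating.
                    buf+="${buf:+ }${word}"
                    continue
                    ;;
            esac
        fi
        out+=("$word")
        if [[ "$word" == "-m" || "$word" == "-k" ]]; then
            collecting=true
        fi
    done

    # Expression at the very end of the command line.
    if $collecting && [[ -n "$buf" ]]; then
        if [[ "$buf" == *" "* ]]; then out+=("'${buf}'"); else out+=("$buf"); fi
    fi
    printf '%s\n' "${out[*]}"
}

requote_sketch "pytest -v -s -m not cpu_test v1/core"
```

The call at the end prints `pytest -v -s -m 'not cpu_test' v1/core`, which is the repaired form described in the comment block above. A single-word marker such as `-m cpu_test` passes through unquoted.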
@@ -231,11 +366,35 @@ HF_CACHE="$(realpath ~)/huggingface"
mkdir -p "${HF_CACHE}"
HF_MOUNT="/root/.cache/huggingface"

commands="$*"
# ---- Command source selection ----
# Prefer VLLM_TEST_COMMANDS (preserves all inner quoting intact).
# Fall back to $* for backward compatibility, but warn that inner
# double-quotes will have been stripped by the calling shell.
if [[ -n "${VLLM_TEST_COMMANDS:-}" ]]; then
commands="${VLLM_TEST_COMMANDS}"
echo "Commands sourced from VLLM_TEST_COMMANDS (quoting preserved)"
else
commands="$*"
if [[ -z "$commands" ]]; then
echo "Error: No test commands provided." >&2
echo "Usage:" >&2
echo " Preferred: VLLM_TEST_COMMANDS='...' bash $0" >&2
echo " Legacy: bash $0 \"commands here\"" >&2
exit 1
fi
echo "Commands sourced from positional args (legacy mode)"
echo "WARNING: Inner double-quotes in the command string may have been"
echo " stripped by the calling shell. If you see syntax errors, switch to:"
echo " export VLLM_TEST_COMMANDS='your commands here'"
echo " bash $0"
fi

echo "Raw commands: $commands"

# Fix quoting before ROCm overrides (so overrides see correct structure)
commands=$(re_quote_pytest_markers "$commands")
echo "After re-quoting: $commands"

commands=$(apply_rocm_test_overrides "$commands")
echo "Final commands: $commands"

@@ -248,6 +407,18 @@ if [[ -z "$render_gid" ]]; then
exit 1
fi

# --- RDMA device passthrough (conditional) ---
# If the host has RDMA devices, pass them through so tests like
# test_moriio_connector can access ibverbs. On hosts without RDMA
# hardware the tests will gracefully skip via _rdma_available().
RDMA_FLAGS=""
if [ -d /dev/infiniband ]; then
echo "RDMA devices detected on host, enabling passthrough"
RDMA_FLAGS="--device /dev/infiniband --cap-add=IPC_LOCK"
else
echo "No RDMA devices found on host, RDMA tests will be skipped"
fi

# --- Route: multi-node vs single-node ---
if is_multi_node "$commands"; then
echo "--- Multi-node job detected"
@@ -295,6 +466,7 @@ else
echo "Render devices: $BUILDKITE_AGENT_META_DATA_RENDER_DEVICES"
docker run \
--device /dev/kfd $BUILDKITE_AGENT_META_DATA_RENDER_DEVICES \
$RDMA_FLAGS \
--network=host \
--shm-size=16gb \
--group-add "$render_gid" \
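The unquoted `$RDMA_FLAGS` expansion in this hunk relies on word splitting, which works here because neither flag contains whitespace. A possible alternative, sketched below under stated assumptions, collects the optional flags in a bash array. `build_rdma_flags` and the `RDMA_DEV_DIR` override are hypothetical names introduced only so the check can be exercised on a machine without `/dev/infiniband`. The sketch prints the argument list and does not invoke docker.

```shell
#!/usr/bin/env bash
# Sketch: hold optional docker flags in an array instead of a flat string.
# RDMA_DEV_DIR is a hypothetical override for testing; the diff itself
# checks /dev/infiniband directly.
build_rdma_flags() {
    local dev_dir="${RDMA_DEV_DIR:-/dev/infiniband}"
    RDMA_FLAGS=()
    if [[ -d "$dev_dir" ]]; then
        RDMA_FLAGS=(--device "$dev_dir" --cap-add=IPC_LOCK)
    fi
}

build_rdma_flags

# An empty array expands to zero words, so no stray empty argument is passed.
docker_args=(run --device /dev/kfd "${RDMA_FLAGS[@]}" --network=host)
printf 'docker %s\n' "${docker_args[*]}"
```

With an array the expansion can stay quoted, so a device path containing spaces would still arrive as a single argument.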