1 change: 0 additions & 1 deletion .buildkite/scripts/hardware_ci/run-xpu-test.sh
@@ -44,6 +44,5 @@ docker run \
pytest -v -s v1/structured_output
pytest -v -s v1/spec_decode --ignore=v1/spec_decode/test_max_len.py --ignore=v1/spec_decode/test_tree_attention.py
pytest -v -s v1/kv_connector/unit --ignore=v1/kv_connector/unit/test_multi_connector.py --ignore=v1/kv_connector/unit/test_nixl_connector.py --ignore=v1/kv_connector/unit/test_shared_storage_connector.py
pytest -v -s v1/test_metrics

test review 1

pytest -v -s v1/test_serial_utils.py
'
5 changes: 5 additions & 0 deletions docker/Dockerfile.xpu
@@ -69,4 +69,9 @@ RUN --mount=type=cache,target=/root/.cache/pip \

# install development dependencies (for testing)
RUN python3 -m pip install -e tests/vllm_test_utils

# install nixl from source code

Suggested change
# install nixl from source code
# Install NIXL from source code

test2

RUN python3 /workspace/vllm/tools/install_nixl_from_source_ubuntu.py
ENV LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib/python3.12/dist-packages/.nixl.mesonpy.libs/plugins/"

high

The hardcoded Python version (python3.12) in LD_LIBRARY_PATH makes this Dockerfile brittle. If the Python version is changed elsewhere in this file (e.g., on line 20), this path must also be updated by hand. If it is not, the NIXL plugins will not be found and the container will fail at runtime.

To make this more robust, I recommend determining the path dynamically. A good way to achieve this is by using a shell-based ENTRYPOINT that computes the path and sets the environment variable before executing vllm serve.

For example, you could replace this ENV line and the existing ENTRYPOINT on line 77 with a dynamic entrypoint:

# (remove the ENV line for LD_LIBRARY_PATH)
...
ENTRYPOINT ["/bin/bash", "-c", "LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(python3 -c 'import site; print(site.getsitepackages()[0])')/.nixl.mesonpy.libs/plugins/ exec vllm serve \"$@\"", "vllm-serve"]

This approach avoids hardcoding the Python version and makes the Docker image more maintainable.
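
As an illustration of the path computation described above, here is a minimal Python sketch. It assumes the plugins live under `.nixl.mesonpy.libs/plugins` inside the first entry returned by `site.getsitepackages()`, as in the ENV line under review. The helper names `nixl_plugin_dir` and `extend_ld_library_path` are hypothetical and are not part of vLLM or NIXL.

```python
import os
import site


def nixl_plugin_dir(site_packages: str) -> str:
    """Return the assumed NIXL plugin directory under a site-packages root."""
    # Layout assumption taken from the ENV line in this Dockerfile.
    return f"{site_packages.rstrip('/')}/.nixl.mesonpy.libs/plugins"


def extend_ld_library_path(current: str, extra: str) -> str:
    """Append `extra` to a colon-separated path list, skipping duplicates."""
    # Empty entries are dropped so an unset variable does not yield a leading ':'.
    parts = [p for p in current.split(":") if p]
    if extra not in parts:
        parts.append(extra)
    return ":".join(parts)


if __name__ == "__main__":
    # Resolve the path for whichever interpreter runs this script,
    # so no Python version is hardcoded.
    root = site.getsitepackages()[0]
    print(
        extend_ld_library_path(
            os.environ.get("LD_LIBRARY_PATH", ""), nixl_plugin_dir(root)
        )
    )
```

An entrypoint script could capture the printed value and export it as LD_LIBRARY_PATH before running vllm serve. Whether that is preferable to the inline bash one-liner is a judgment call for the PR author.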


ENTRYPOINT ["vllm", "serve"]
1 change: 0 additions & 1 deletion requirements/xpu.txt
@@ -10,7 +10,6 @@ wheel
jinja2>=3.1.6
datasets # for benchmark scripts
numba == 0.61.2 # Required for N-gram speculative decoding
nixl==0.3.0 # for PD disaggregation
torch==2.8.0+xpu
torchaudio
torchvision
1 change: 1 addition & 0 deletions tools/install_nixl_from_source_ubuntu.py
@@ -135,6 +135,7 @@ def build_and_install_prerequisites(args):
"--enable-devel-headers",
"--with-verbs",
"--enable-mt",
"--with-ze=no",
]
run_command(configure_command, cwd=ucx_source_path)
run_command(["make", "-j", str(os.cpu_count() or 1)], cwd=ucx_source_path)
15 changes: 8 additions & 7 deletions vllm/platforms/xpu.py
@@ -54,6 +54,14 @@ def get_attn_backend_cls(
has_sink: bool,
use_sparse,
) -> str:
from vllm.v1.attention.backends.utils import set_kv_cache_layout

set_kv_cache_layout("NHD")
logger.info(
"Setting VLLM_KV_CACHE_LAYOUT to 'NHD' for XPU; "
"only NHD layout is supported by XPU attention kernels."
)

from vllm.attention.backends.registry import _Backend

if use_sparse:
@@ -190,13 +198,6 @@ def check_and_update_config(cls, vllm_config: VllmConfig) -> None:
vllm_config.scheduler_config.max_model_len,
DEFAULT_MAX_NUM_BATCHED_TOKENS,
)
from vllm.v1.attention.backends.utils import set_kv_cache_layout

set_kv_cache_layout("NHD")
logger.info(
"Setting VLLM_KV_CACHE_LAYOUT to 'NHD' for XPU; "
"only NHD layout is supported by XPU attention kernels."
)

@classmethod
def support_hybrid_kv_cache(cls) -> bool: