Skip to content

[XPU] Upgrade NIXL to remove CUDA dependency#1

Open
zhenwei-intel wants to merge 3 commits into
mainfrom
add_nixl_install
Open

[XPU] Upgrade NIXL to remove CUDA dependency#1
zhenwei-intel wants to merge 3 commits into
mainfrom
add_nixl_install

Conversation

@zhenwei-intel
Copy link
Copy Markdown
Owner

@zhenwei-intel zhenwei-intel commented Oct 10, 2025

vllm-project#25959 With nixl>=0.5.1, libcuda.so will be missing in UCX package while UCX will still link to libcuda.so. => Which leads to non-cuda platform will not be able to use pre-built NIXL wheel for UCX backend.

Update Dockerfile.xpu to build and install NIXL from source code, eliminating the libcuda dependency for better compatibility with XPU environments.

@github-actions
Copy link
Copy Markdown

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors.

You ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds the installation of nixl from source to the XPU Dockerfile. The changes are mostly correct, but I've identified a high-severity issue in docker/Dockerfile.xpu where a hardcoded path is used for LD_LIBRARY_PATH. This will make the Docker image difficult to maintain and prone to breaking when the Python version is updated. My review includes a detailed comment on how to address this.

Comment thread docker/Dockerfile.xpu

# install nixl from source code
RUN python3 /workspace/vllm/tools/install_nixl_from_source_ubuntu.py
ENV LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib/python3.12/dist-packages/.nixl.mesonpy.libs/plugins/"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The hardcoded Python version (python3.12) in the LD_LIBRARY_PATH makes this Dockerfile brittle. If the Python version is updated in this file (e.g., on line 20), this path will also need to be manually updated. Forgetting to do so will cause runtime errors because the nixl plugin will not be found.

To make this more robust, I recommend determining the path dynamically. A good way to achieve this is by using a shell-based ENTRYPOINT that computes the path and sets the environment variable before executing vllm serve.

For example, you could replace this ENV line and the existing ENTRYPOINT on line 77 with a dynamic entrypoint:

# (remove the ENV line for LD_LIBRARY_PATH)
...
ENTRYPOINT ["/bin/bash", "-c", "LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(python3 -c 'import site; print(site.getsitepackages()[0])')/.nixl.mesonpy.libs/plugins/ exec vllm serve \"$@\"", "vllm-serve"]

This approach avoids hardcoding the Python version and makes the Docker image more maintainable.

@zhenwei-intel zhenwei-intel changed the title add nixl install to dockerfile.xpu [XPU] Upgrade NIXL to remove CUDA dependency Oct 10, 2025
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Copy link
Copy Markdown
Owner Author

@zhenwei-intel zhenwei-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added two comments, pls take a look

pytest -v -s v1/structured_output
pytest -v -s v1/spec_decode --ignore=v1/spec_decode/test_max_len.py --ignore=v1/spec_decode/test_tree_attention.py
pytest -v -s v1/kv_connector/unit --ignore=v1/kv_connector/unit/test_multi_connector.py --ignore=v1/kv_connector/unit/test_nixl_connector.py --ignore=v1/kv_connector/unit/test_shared_storage_connector.py
pytest -v -s v1/test_metrics
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test review 1

Comment thread docker/Dockerfile.xpu
# install development dependencies (for testing)
RUN python3 -m pip install -e tests/vllm_test_utils

# install nixl from source code
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# install nixl from source code
# Install NIXL from source code

test2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant