
[XPU][NIXL] Add GPUDirect RDMA support for XPU#2

Open
zhenwei-intel wants to merge 155 commits into main from xpu_pd_2026


@zhenwei-intel
Owner

@zhenwei-intel zhenwei-intel commented Feb 25, 2026

Purpose

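Add GPUDirect RDMA support for XPU in the NIXL connector, so that the KV cache can be registered and transferred directly from XPU device memory (via UCX built with Level Zero support) instead of only from host memory.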

Requirements

Limitations:

Test Plan

Performance data for Llama-3.3-70B (INT4 AWQ) with FP8 KV cache on 8x B60, ISL=1500, OSL=150.
2P1D (two prefill instances, one decode instance) vs. Non-PD, under an SLO of TTFT < 5 s and ITL < 100 ms.

  • Serve more requests: under the SLO, 2P1D achieved a request throughput of 1.06 compared to 0.64 for Non-PD, a ~1.65x improvement.
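The SLO in the test plan is a pair of latency bounds. The Python sketch below is one hedged way to express it: the `RequestMetrics` record, its field names, and the per-request pass/fail rule are illustrative assumptions, because the description does not say whether the reported throughput is judged on per-request, mean, or percentile latencies. It also reproduces the speedup arithmetic. Since 1.06 and 0.64 are themselves rounded, the ratio lands between 1.65x and 1.66x.

```python
from dataclasses import dataclass
from typing import List

# SLO bounds from the test plan: TTFT < 5 s, ITL < 100 ms.
TTFT_SLO_S = 5.0
ITL_SLO_S = 0.100


@dataclass
class RequestMetrics:
    # Hypothetical per-request record; not vLLM's benchmark output format.
    ttft_s: float      # time to first token, in seconds
    mean_itl_s: float  # mean inter-token latency, in seconds


def meets_slo(m: RequestMetrics) -> bool:
    """True if one request satisfies both latency bounds (assumed rule)."""
    return m.ttft_s < TTFT_SLO_S and m.mean_itl_s < ITL_SLO_S


def slo_attainment(reqs: List[RequestMetrics]) -> float:
    """Fraction of requests that meet the SLO; 0.0 for an empty run."""
    if not reqs:
        return 0.0
    return sum(meets_slo(r) for r in reqs) / len(reqs)


def speedup(new: float, baseline: float) -> float:
    """Ratio of a new throughput to a baseline throughput."""
    return new / baseline


# Reported request throughput under the SLO: 2P1D = 1.06, Non-PD = 0.64.
print(f"{speedup(1.06, 0.64):.2f}x")  # 1.66x
```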

PD commands

prefill

export UCX_TLS=ib,rc,ze_copy

export ZE_AFFINITY_MASK=2,3
export model_name=ibnzterrell/Meta-Llama-3.3-70B-Instruct-AWQ-INT4
export tp_size=2


VLLM_USE_V1=1 VLLM_NIXL_SIDE_CHANNEL_HOST=localhost VLLM_NIXL_SIDE_CHANNEL_PORT=5577 VLLM_WORKER_MULTIPROC_METHOD=spawn VLLM_ENABLE_V1_MULTIPROCESSING=1 vllm serve $model_name -tp $tp_size --host localhost --port 7101 --seed 42 --enforce-eager --dtype float16 --gpu-memory-utilization 0.9 --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both","kv_buffer_device":"xpu"}' --max-model-len 8192 --block-size 64 --no-enable-prefix-caching --kv-cache-dtype fp8

prefill2

export UCX_TLS=ib,rc,ze_copy

export ZE_AFFINITY_MASK=4,5
export model_name=ibnzterrell/Meta-Llama-3.3-70B-Instruct-AWQ-INT4
export tp_size=2


VLLM_USE_V1=1 VLLM_NIXL_SIDE_CHANNEL_HOST=localhost VLLM_NIXL_SIDE_CHANNEL_PORT=5377 VLLM_WORKER_MULTIPROC_METHOD=spawn VLLM_ENABLE_V1_MULTIPROCESSING=1 vllm serve $model_name -tp $tp_size --host localhost --port 7102 --seed 42 --enforce-eager --dtype float16 --gpu-memory-utilization 0.9 --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both","kv_buffer_device":"xpu"}' --max-model-len 8192 --block-size 64 --no-enable-prefix-caching --kv-cache-dtype fp8

decode

export UCX_TLS=ib,rc,ze_copy

export ZE_AFFINITY_MASK=0,1
export model_name=ibnzterrell/Meta-Llama-3.3-70B-Instruct-AWQ-INT4
export tp_size=2


VLLM_USE_V1=1 VLLM_NIXL_SIDE_CHANNEL_HOST=localhost VLLM_NIXL_SIDE_CHANNEL_PORT=5177 VLLM_WORKER_MULTIPROC_METHOD=spawn VLLM_ENABLE_V1_MULTIPROCESSING=1 vllm serve $model_name -tp $tp_size --host localhost --port 7201 --seed 42 --enforce-eager --dtype float16 --gpu-memory-utilization 0.9 --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both","kv_buffer_device":"xpu"}' --max-model-len 8192 --block-size 64 --no-enable-prefix-caching --kv-cache-dtype fp8

proxy

python3 tests/v1/kv_connector/nixl_integration/toy_proxy_server.py --prefiller-hosts localhost localhost --prefiller-ports 7101 7102 --decoder-host localhost --decoder-port 7201 --host localhost --port 7300
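The three `vllm serve` invocations differ only in `ZE_AFFINITY_MASK`, the NIXL side-channel port, and the HTTP port, and each instance needs its own devices and ports. The Python sketch below generates all four commands from one table and checks for overlaps. Building each command as an argument list joined with `shlex.join` also avoids hand-editing slips such as a missing space between two flags. The `Instance` type and helper names are hypothetical. The flag values are copied from the commands above, `-tp` is derived from the device count (2, matching `tp_size`), and environment variables are emitted as an inline prefix rather than `export` lines, which has the same effect for the launched process.

```python
import shlex
from dataclasses import dataclass
from typing import List, Tuple

MODEL = "ibnzterrell/Meta-Llama-3.3-70B-Instruct-AWQ-INT4"
KV_CONFIG = ('{"kv_connector":"NixlConnector","kv_role":"kv_both",'
             '"kv_buffer_device":"xpu"}')


@dataclass(frozen=True)
class Instance:
    # Hypothetical description of one vLLM server in the 2P1D setup.
    name: str
    role: str                 # "prefill" or "decode"
    devices: Tuple[int, ...]  # becomes ZE_AFFINITY_MASK
    side_channel_port: int    # VLLM_NIXL_SIDE_CHANNEL_PORT
    http_port: int            # vllm serve --port


INSTANCES = [
    Instance("prefill", "prefill", (2, 3), 5577, 7101),
    Instance("prefill2", "prefill", (4, 5), 5377, 7102),
    Instance("decode", "decode", (0, 1), 5177, 7201),
]


def validate(instances: List[Instance]) -> None:
    """Raise ValueError if two instances share an XPU device or any port."""
    devices, ports = set(), set()
    for inst in instances:
        if devices & set(inst.devices):
            raise ValueError(f"{inst.name} reuses devices {inst.devices}")
        devices |= set(inst.devices)
        for port in (inst.side_channel_port, inst.http_port):
            if port in ports:
                raise ValueError(f"{inst.name} reuses port {port}")
            ports.add(port)


def serve_command(inst: Instance) -> str:
    env = {
        "UCX_TLS": "ib,rc,ze_copy",
        "ZE_AFFINITY_MASK": ",".join(str(d) for d in inst.devices),
        "VLLM_USE_V1": "1",
        "VLLM_NIXL_SIDE_CHANNEL_HOST": "localhost",
        "VLLM_NIXL_SIDE_CHANNEL_PORT": str(inst.side_channel_port),
        "VLLM_WORKER_MULTIPROC_METHOD": "spawn",
        "VLLM_ENABLE_V1_MULTIPROCESSING": "1",
    }
    args = [
        "vllm", "serve", MODEL, "-tp", str(len(inst.devices)),
        "--host", "localhost", "--port", str(inst.http_port),
        "--seed", "42", "--enforce-eager", "--dtype", "float16",
        "--gpu-memory-utilization", "0.9",
        "--kv-transfer-config", KV_CONFIG,
        "--max-model-len", "8192", "--block-size", "64",
        "--no-enable-prefix-caching", "--kv-cache-dtype", "fp8",
    ]
    prefix = " ".join(f"{k}={shlex.quote(v)}" for k, v in env.items())
    return f"{prefix} {shlex.join(args)}"


def proxy_command(instances: List[Instance], port: int = 7300) -> str:
    prefill = [i for i in instances if i.role == "prefill"]
    decode = next(i for i in instances if i.role == "decode")
    args = [
        "python3", "tests/v1/kv_connector/nixl_integration/toy_proxy_server.py",
        "--prefiller-hosts", *["localhost"] * len(prefill),
        "--prefiller-ports", *[str(i.http_port) for i in prefill],
        "--decoder-host", "localhost", "--decoder-port", str(decode.http_port),
        "--host", "localhost", "--port", str(port),
    ]
    return shlex.join(args)


validate(INSTANCES)
for inst in INSTANCES:
    print(f"# {inst.name}\n{serve_command(inst)}\n")
print(f"# proxy\n{proxy_command(INSTANCES)}")
```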

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces support for GPUDirect RDMA on XPU devices by integrating NIXL and UCX. Key changes include updating the UCX build process to enable Intel's oneAPI Level Zero (ZE) API, extending NIXL connector support for XPU KV buffer types, and implementing a critical environment variable setting to prevent potential UCX memory misdetection issues on XPU. These changes are essential for enabling efficient KV cache transfers on XPU platforms.

Comment thread vllm/platforms/xpu.py
Comment on lines +198 to +202
# In some cases, the internal memory type cache can misdetect GPU
# memory as host memory, also leading to invalid memory access.
# This cache can be disabled by setting UCX_MEMTYPE_CACHE=n.
# ref. https://openucx.readthedocs.io/en/master/faq.html
os.environ["UCX_MEMTYPE_CACHE"] = "n"


critical

Setting UCX_MEMTYPE_CACHE=n is a critical change to prevent potential memory misdetection issues with UCX on XPU. This directly addresses a potential cause of invalid memory access, which could lead to crashes or data corruption. It's good that this is explicitly documented with a reference to the UCX FAQ.


Copilot AI left a comment


Pull request overview

This pull request adds GPUDirect RDMA support for Intel XPU devices in the NIXL KV transfer connector. The changes enable direct memory access between XPU devices for more efficient KV cache transfers during distributed inference.

Changes:

  • Enable XPU device memory for NIXL KV transfers (previously only CPU memory was supported)
  • Configure UCX to disable memory type caching to avoid GPU memory misdetection
  • Update UCX build configuration to enable Intel Level Zero (ZE) support for XPU devices

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File: Description
  • vllm/platforms/xpu.py: Adds the UCX_MEMTYPE_CACHE environment variable setting to prevent memory type misdetection
  • vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py: Adds "xpu" to the supported KV buffer devices for the XPU platform and simplifies the memory type selection logic
  • tools/install_nixl_from_source_ubuntu.py: Updates the UCX build to use a specific commit and enable ZE support for XPU compatibility
Comments suppressed due to low confidence (1)

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py:956

  • There is a logic error in this code. After line 951, nixl_memory_type will always have a value (either from current_platform.get_nixl_memory_type() or from the ternary expression), so the condition on line 952 checking if nixl_memory_type is None will never be true. This makes the error handling at lines 952-956 unreachable code.

The check on line 952 appears to be a leftover from the previous code structure and should be removed.

        if nixl_memory_type is None:
            raise RuntimeError(
                f"{self.device_type} with {self.kv_buffer_device} kv_buffer "
                "is not supported."
            )
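Copilot's suggestion is to delete the dead `None` check. An alternative that keeps the error path meaningful is to have the selection return `None` for unsupported combinations, for example through a lookup table. The Python sketch below is hypothetical: the function name, the table contents, and the pairing of NIXL's `"VRAM"`/`"DRAM"` segment types with device and host buffers are assumptions for illustration, not the connector's actual code.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table: (platform device type, kv_buffer_device) -> NIXL memory type.
# Only XPU pairs are listed; other platforms would add their own entries.
_NIXL_MEMORY_TYPES: Dict[Tuple[str, str], str] = {
    ("xpu", "xpu"): "VRAM",  # KV cache registered directly from device memory
    ("xpu", "cpu"): "DRAM",  # KV cache staged in host memory
}


def select_nixl_memory_type(device_type: str, kv_buffer_device: str) -> str:
    memory_type: Optional[str] = _NIXL_MEMORY_TYPES.get(
        (device_type, kv_buffer_device)
    )
    # Reachable: dict.get returns None for any pair not in the table.
    if memory_type is None:
        raise RuntimeError(
            f"{device_type} with {kv_buffer_device} kv_buffer is not supported."
        )
    return memory_type
```

Keeping the supported pairs in one table means the unsupported case is a lookup miss rather than a branch that can silently become unreachable after a refactor.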


Comment thread vllm/platforms/xpu.py
Comment on lines +198 to +202
# In some cases, the internal memory type cache can misdetect GPU
# memory as host memory, also leading to invalid memory access.
# This cache can be disabled by setting UCX_MEMTYPE_CACHE=n.
# ref. https://openucx.readthedocs.io/en/master/faq.html
os.environ["UCX_MEMTYPE_CACHE"] = "n"

Copilot AI Feb 25, 2026


The UCX_MEMTYPE_CACHE environment variable is being set unconditionally for all XPU configurations, but this setting is specifically for GPUDirect RDMA support which is only relevant when KV transfer is enabled. This could have unintended side effects on configurations that don't use KV transfer.

Consider making this setting conditional on kv_transfer_config being enabled, similar to how other KV transfer related configurations are handled in this method (see line 184).
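Copilot's suggestion can be sketched as a small helper that writes `UCX_MEMTYPE_CACHE=n` only when a KV transfer config is present. The helper name and signature are hypothetical, and so is one design choice: unlike the unconditional assignment in the PR, it leaves an existing user-provided `UCX_MEMTYPE_CACHE` untouched. UCX reads its settings from the environment, so a helper like this would presumably need to run before UCX is loaded or initialized (for example, before the NIXL agent is created).

```python
import os
from typing import Any, MutableMapping, Optional


def maybe_disable_ucx_memtype_cache(
    kv_transfer_config: Optional[Any],
    env: MutableMapping[str, str] = os.environ,
) -> bool:
    """Set UCX_MEMTYPE_CACHE=n only when KV transfer is configured.

    Returns True if this call wrote the variable. An existing value is kept
    so that users can still override the setting explicitly.
    """
    if kv_transfer_config is None:
        return False  # no KV transfer: leave UCX defaults alone
    if "UCX_MEMTYPE_CACHE" in env:
        return False  # respect an explicit user setting
    # The memtype cache can misdetect GPU memory as host memory.
    # ref. https://openucx.readthedocs.io/en/master/faq.html
    env["UCX_MEMTYPE_CACHE"] = "n"
    return True
```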

Copilot uses AI. Check for mistakes.
Comment thread tools/install_nixl_from_source_ubuntu.py
@@ -139,7 +139,7 @@ def build_and_install_prerequisites(args):
if not os.path.exists(UCX_DIR):
run_command(["git", "clone", UCX_REPO_URL, UCX_DIR])
ucx_source_path = os.path.abspath(UCX_DIR)

Copilot AI Feb 25, 2026


The change from checking out a branch (v1.19.x) to a specific commit hash (e5d9887) lacks documentation explaining why this specific commit is needed. This makes it difficult for maintainers to understand:

  1. What version or features this commit corresponds to
  2. Why this specific commit was chosen over the v1.19.x branch
  3. Whether this commit includes important fixes or features for GPUDirect RDMA support

Consider adding a comment explaining what this commit represents and why it was chosen, or document this in the PR description.

Suggested change
ucx_source_path = os.path.abspath(UCX_DIR)
ucx_source_path = os.path.abspath(UCX_DIR)
# NOTE: We pin UCX to a specific, known-good commit instead of tracking
# a moving branch (e.g., v1.19.x). Commit e5d9887 corresponds to a UCX
# revision that has been validated with nixl for our GPUDirect RDMA
# use case. If you update this hash, please ensure the new commit has
# been tested with nixl and GPUDirect RDMA and update this comment.
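The pinning described in the suggested comment can also be made explicit in the build steps: clone, check out the fixed revision, then configure. In the Python sketch below, `UCX_COMMIT` is the hash quoted in this review (`e5d9887`). The repository URL, the helper names, and the use of UCX's `--with-ze` configure option to enable Level Zero are assumptions to verify against the actual script and `./configure --help`.

```python
import os
import subprocess
from typing import List, Tuple

UCX_REPO_URL = "https://github.com/openucx/ucx.git"  # assumed upstream URL
UCX_COMMIT = "e5d9887"  # pinned revision quoted in the review comment

Step = Tuple[List[str], str]  # (command, working directory)


def ucx_build_steps(ucx_dir: str, prefix: str) -> List[Step]:
    """Return the steps for a pinned UCX build (hypothetical helper)."""
    src = os.path.abspath(ucx_dir)
    steps: List[Step] = []
    if not os.path.exists(src):  # clone only once, like the original script
        steps.append((["git", "clone", UCX_REPO_URL, src], os.path.dirname(src)))
    steps += [
        # Check out a fixed commit instead of tracking a moving branch.
        (["git", "checkout", UCX_COMMIT], src),
        (["./autogen.sh"], src),
        # --with-ze: assumed flag enabling UCX's Level Zero (ze_copy) support.
        (["./configure", f"--prefix={prefix}", "--with-ze"], src),
        (["make", f"-j{os.cpu_count() or 1}"], src),
        (["make", "install"], src),
    ]
    return steps


def run_steps(steps: List[Step]) -> None:
    """Execute each step in order, stopping at the first failure."""
    for cmd, cwd in steps:
        subprocess.run(cmd, cwd=cwd, check=True)
```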

Comment thread vllm/platforms/xpu.py
Comment on lines +198 to +202
# In some cases, the internal memory type cache can misdetect GPU
# memory as host memory, also leading to invalid memory access.
# This cache can be disabled by setting UCX_MEMTYPE_CACHE=n.
# ref. https://openucx.readthedocs.io/en/master/faq.html
os.environ["UCX_MEMTYPE_CACHE"] = "n"

Copilot AI Feb 25, 2026


The PR description is empty except for the template. This makes it difficult to understand:

  1. What problem this PR is solving
  2. How GPUDirect RDMA support was tested
  3. What the expected performance or functional improvements are
  4. What the changes to UCX configuration (ZE support, specific commit) are intended to accomplish

Please complete the PR description with the purpose, test plan, and test results as indicated by the template checklist.

@zhenwei-intel
Owner Author

@copilot help review again


Copilot AI commented Feb 25, 2026

@zhenwei-intel I've opened a new pull request, #3, to work on those changes. Once the pull request is ready, I'll request review from you.

@zhenwei-intel zhenwei-intel changed the title [XPU][NIXL] support GPUDirect RDMA [XPU][NIXL] Add GPUDirect RDMA support for XPU in NIXL connector Feb 25, 2026
@zhenwei-intel zhenwei-intel changed the title [XPU][NIXL] Add GPUDirect RDMA support for XPU in NIXL connector [XPU][NIXL] Add GPUDirect RDMA support for XPU Feb 25, 2026
LucasWilkinson and others added 20 commits February 25, 2026 18:01
…35338)

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
…consumer NVIDIA GPUs (vllm-project#33992)

Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com>
Signed-off-by: Andrew Mello <19512127+88plug@users.noreply.github.com>
Co-authored-by: 88plug <19512127+88plug@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Signed-off-by: hujiaxin <524446785@qq.com>
Signed-off-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com>
Co-authored-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
…t#34109)

Signed-off-by: hjjq <50634613+hjjq@users.noreply.github.com>
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
…ect#34890)

Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
…roject#28672)

Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com>
Signed-off-by: Jason Li <jasonlizhengjian@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
vllm-project#34887)

Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
…rmony.py (vllm-project#35339)

Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
…IGC_ForceOCLSIMDWidth=16` (vllm-project#35298)

Signed-off-by: Ofir Zafrir <ofir.zafrir@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: chzhang <chaojun.zhang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
… before compilation config init (vllm-project#34848)

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
…oject#35220)

Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
…4le (vllm-project#35081)

Signed-off-by: Akash kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: Akash kaothalkar <akash.kaothalkar@ibm.com>
… loading (vllm-project#35289)

Signed-off-by: daowu.hzy <daowu.hzy@alibaba-inc.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
cwazai and others added 30 commits February 28, 2026 14:50
Signed-off-by: emricksini-h <emrick.birivoutin@hcompany.ai>
Co-authored-by: Roger Wang <hey@rogerw.io>
)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Signed-off-by: gnovack <gnovack@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
…XCLUDE (vllm-project#35630)

Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
…n backend (vllm-project#35382)

Signed-off-by: Seungho Yoon <yoonsnowdev@gmail.com>
…orch.compile (vllm-project#35256)

Signed-off-by: haosdent <haosdent@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Jesse Cai <jessecai@fb.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
…d models (vllm-project#35658)

Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com>
Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com>
…ix non-determinism (vllm-project#35152)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
…ls (vllm-project#34448)

Signed-off-by: EdalatiAli <aliedalati@cohere.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: NickLucche <nlucches@redhat.com>

Labels

None yet

Projects

None yet
