From f52d1f8f44655aa221dbc4ca16c04b2f624993c8 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Sat, 14 Mar 2026 09:12:14 +0000
Subject: [PATCH 1/2] Initial plan

From 67b43e1040a53f13e77fd8b6d8672e7c1f5d29dd Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Sat, 14 Mar 2026 09:17:53 +0000
Subject: [PATCH 2/2] docs: clarify Intel XPU feature and model support gaps

Co-authored-by: zhenwei-intel <109187816+zhenwei-intel@users.noreply.github.com>
---
 docs/models/hardware_supported_models/xpu.md | 35 ++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/docs/models/hardware_supported_models/xpu.md b/docs/models/hardware_supported_models/xpu.md
index 2857d80a7698..40c56c874e18 100644
--- a/docs/models/hardware_supported_models/xpu.md
+++ b/docs/models/hardware_supported_models/xpu.md
@@ -6,6 +6,41 @@
 | -------- |
 | [Intel® Arc™ Pro B-Series Graphics](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/arc/workstations/b-series/overview.html) |
 
+## Current Gaps on Intel XPU
+
+The following items are currently limited or unsupported on Intel XPU:
+
+### Feature Gaps
+
+- **CUDA graph mode** is not supported on Intel XPU yet ([tracking issue](https://github.com/vllm-project/vllm/issues/26970)).
+- **Flash Attention with `float32`** falls back to Triton Attention on XPU.
+- **`bfloat16` on Intel Arc A770** is blocked due to known accuracy issues (use `float16` instead).
+- **XPU graph capture** has additional limits in multi-GPU communication scenarios.
+
+### Quantization Gaps on Intel GPU
+
+From the quantization hardware matrix in
+[`docs/features/quantization/README.md`](../../features/quantization/README.md),
+the following are not supported on Intel GPU:
+
+- Marlin (GPTQ/AWQ/FP8/FP4)
+- INT8 (W8A8)
+- FP8 (W8A8)
+- bitsandbytes
+- DeepSpeedFP
+- GGUF
+
+## Model Support Scope
+
+vLLM currently publishes a **validated model list** for Intel XPU (below), but
+does not maintain an exhaustive "unsupported model" deny list.
+
+For Intel XPU, treat a model as **not supported / not yet validated** when any of the following is true:
+
+- The model architecture or checkpoint is **not listed** in the validated tables below.
+- The model depends on a quantization method listed above as unsupported on Intel GPU.
+- The model only works with unsupported XPU feature combinations.
+
 ## Recommended Models
 
 ### Text-only Language Models
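
Note on the "Model Support Scope" rule added by this patch: the three conditions can be read as a simple any-of check. The Python sketch below is only an illustration of that reading and is not vLLM code. `ModelSpec`, `xpu_support_gaps`, and the lowercase quantization labels are hypothetical names invented for this example; they are not vLLM identifiers or `--quantization` values, and the validated set would have to be filled in from the tables in `xpu.md`.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical labels for the methods the patch lists as unsupported on Intel GPU:
# Marlin, INT8 (W8A8), FP8 (W8A8), bitsandbytes, DeepSpeedFP, GGUF.
UNSUPPORTED_XPU_QUANT = frozenset(
    {"marlin", "int8_w8a8", "fp8_w8a8", "bitsandbytes", "deepspeedfp", "gguf"}
)


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical description of a model someone wants to run on Intel XPU."""
    architecture: str
    quantization: Optional[str] = None      # None means unquantized
    needs_unsupported_feature: bool = False  # e.g. relies on CUDA graph mode


def xpu_support_gaps(spec: ModelSpec, validated: Set[str]) -> List[str]:
    """Return the reasons a model counts as not supported / not yet validated.

    An empty list means none of the three conditions from the patch applies.
    """
    reasons = []
    if spec.architecture not in validated:
        reasons.append("architecture not in the validated tables")
    if spec.quantization is not None and spec.quantization.lower() in UNSUPPORTED_XPU_QUANT:
        reasons.append("quantization method unsupported on Intel GPU")
    if spec.needs_unsupported_feature:
        reasons.append("depends on an unsupported XPU feature combination")
    return reasons


if __name__ == "__main__":
    validated = {"ExampleForCausalLM"}  # placeholder, not a real table entry
    print(xpu_support_gaps(ModelSpec("ExampleForCausalLM", "gguf"), validated))
```

Returning the list of reasons rather than a single boolean keeps the "any of the following" semantics (non-empty means not validated) while still showing which condition triggered.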