35 changes: 35 additions & 0 deletions docs/models/hardware_supported_models/xpu.md
@@ -6,6 +6,41 @@
| -------- |
| [Intel® Arc™ Pro B-Series Graphics](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/arc/workstations/b-series/overview.html) |

## Current Gaps on Intel XPU

The following items are currently limited or unsupported on Intel XPU:

### Feature Gaps

- **CUDA graph mode** is not supported on Intel XPU yet ([tracking issue](https://github.com/vllm-project/vllm/issues/26970)).
- **Flash Attention with `float32`** falls back to Triton Attention on XPU.
- **`bfloat16` on Intel Arc A770** is blocked because of known accuracy issues; use `float16` instead.
- **XPU graph capture** has additional limits in multi-GPU communication scenarios.
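
As an illustration of the `bfloat16` item above, the following minimal sketch picks a dtype string before a model is loaded. The helper `pick_xpu_dtype` and the device-name strings are hypothetical and not part of vLLM; the only assumption about vLLM itself is that the resulting string is passed as the `dtype` argument when the engine is constructed.

```python
def pick_xpu_dtype(device_name: str, requested: str = "bfloat16") -> str:
    """Return a dtype string that avoids the bfloat16 gap on Arc A770.

    Hypothetical helper for illustration only; the device_name format
    is an assumption, not something vLLM reports in this exact form.
    """
    # bfloat16 is blocked on Arc A770, so fall back to float16 there.
    if "A770" in device_name and requested == "bfloat16":
        return "float16"
    # Any other device or dtype request is passed through unchanged.
    return requested


if __name__ == "__main__":
    dtype = pick_xpu_dtype("Intel(R) Arc(TM) A770 Graphics")
    print(dtype)
    # The value could then be supplied as dtype=... when creating the engine.
```

Requesting `float32` is left untouched by this sketch; as noted above, Flash Attention falls back to Triton Attention in that case.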

### Quantization Gaps on Intel GPU

According to the quantization hardware matrix in
[`docs/features/quantization/README.md`](../../features/quantization/README.md),
the following quantization methods are not supported on Intel GPU:

- Marlin (GPTQ/AWQ/FP8/FP4)
- INT8 (W8A8)
- FP8 (W8A8)
- bitsandbytes
- DeepSpeedFP
- GGUF

## Model Support Scope

vLLM currently publishes a **validated model list** for Intel XPU (below), but
does not maintain an exhaustive deny list of unsupported models.

For Intel XPU, treat a model as **not supported / not yet validated** when any of the following is true:

- The model architecture or checkpoint is **not listed** in the validated tables below.
- The model depends on a quantization method listed above as unsupported on Intel GPU.
- The model only works with unsupported XPU feature combinations.
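
The three criteria above can be sketched as a small check. Everything here is illustrative: `xpu_support_reasons`, the quantization labels in `UNSUPPORTED_QUANT`, and the placeholder model ID are assumptions made for this example, not vLLM APIs or entries from the validated tables.

```python
from typing import Iterable, List, Optional

# Illustrative labels for the quantization methods listed as unsupported
# on Intel GPU; these strings are assumptions, not vLLM option names.
UNSUPPORTED_QUANT = {
    "marlin", "int8_w8a8", "fp8_w8a8", "bitsandbytes", "deepspeedfp", "gguf",
}


def xpu_support_reasons(
    model_id: str,
    validated_models: Iterable[str],
    quant_method: Optional[str] = None,
    needs_unsupported_feature: bool = False,
) -> List[str]:
    """Return the reasons a model counts as not supported / not yet validated.

    An empty list means none of the three criteria applies.
    """
    reasons = []
    if model_id not in set(validated_models):
        reasons.append("not listed in the validated tables")
    if quant_method is not None and quant_method.lower() in UNSUPPORTED_QUANT:
        reasons.append("depends on an unsupported quantization method")
    if needs_unsupported_feature:
        reasons.append("requires an unsupported XPU feature combination")
    return reasons


if __name__ == "__main__":
    validated = ["example-org/example-model"]  # placeholder, not a real entry
    print(xpu_support_reasons("example-org/example-model", validated))
    print(xpu_support_reasons("example-org/other-model", validated, "GGUF"))
```

Returning the list of reasons rather than a single boolean keeps the "any of the following" rule visible: one matching criterion is enough to treat the model as not yet validated.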

## Recommended Models

### Text-only Language Models