
Add ROCm / AMD Instinct MI300x support #17

Open: subhajitdchow wants to merge 20 commits into tanishqkumar:main from AMD-AGI:rocm-upstream



@subhajitdchow


Summary

Adds AMD Instinct MI300x support via ROCm 7.2, PyTorch 2.9.1, FlashInfer v0.5.3+amd.2, and CK-backend flash-attn (pinned to 0f82fea).

NVIDIA path is unchanged — all new logic is gated on torch.version.hip.
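For reviewers unfamiliar with the gate, here is a minimal sketch of what branching on torch.version.hip can look like. The helper names (`is_rocm_build`, `pick_attention_backend`) and the returned labels are hypothetical and are not taken from this PR's diff. The only PyTorch fact relied on is that `torch.version.hip` is a version string on ROCm builds and `None` otherwise.

```python
def is_rocm_build() -> bool:
    """Return True when the installed PyTorch was built against ROCm/HIP."""
    try:
        import torch
    except ImportError:
        # No PyTorch at all: treat as "not ROCm" so the sketch still runs.
        return False
    # torch.version.hip is a string on ROCm builds and None on CUDA/CPU builds.
    return getattr(torch.version, "hip", None) is not None


def pick_attention_backend(on_rocm: bool) -> str:
    """Hypothetical selector; the labels only mirror the PR's backend list."""
    return "flash_attn_or_sdpa" if on_rocm else "sgl_kernel"


backend = pick_attention_backend(is_rocm_build())
print(backend)
```

Keeping the check in one helper means the NVIDIA branch is reached exactly as before whenever the gate is false.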

What's included

  • setup_rocm.sh — one-command ROCm setup inside FlashInfer Docker container
  • ssd/layers/flash_attn_compat.py — SDPA fallback when native flash-attn unavailable
  • Dual attention backend (sgl_kernel on NVIDIA, flash_attn/SDPA on AMD)
  • Dual tree-decode backend via SSD_TREE_DECODE_BACKEND={flashinfer,sdpa}
  • Runtime-conditional FlashInfer plan() args and default CUDA arch
  • Qwen3 rope_scaling fix, return_dict=False for newer transformers
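The backend switches listed above could be resolved roughly as follows. This is a sketch under assumptions, not the code in this PR: the function names are hypothetical, the default of "flashinfer" is a guess, and the flash-attn check only tests whether the package is installed, not whether its kernels load.

```python
import importlib.util
import os

# The two values named in the PR description.
VALID_TREE_DECODE_BACKENDS = ("flashinfer", "sdpa")


def tree_decode_backend(env=None, default="flashinfer"):
    """Read SSD_TREE_DECODE_BACKEND; the default here is an assumption."""
    env = os.environ if env is None else env
    value = env.get("SSD_TREE_DECODE_BACKEND", default).strip().lower()
    if value not in VALID_TREE_DECODE_BACKENDS:
        raise ValueError(
            f"SSD_TREE_DECODE_BACKEND must be one of "
            f"{VALID_TREE_DECODE_BACKENDS}, got {value!r}"
        )
    return value


def attention_impl():
    """Prefer flash_attn when the package is importable, else fall back to SDPA."""
    # find_spec returns None for a missing top-level package without importing it.
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attn"
    return "sdpa"


print(tree_decode_backend({"SSD_TREE_DECODE_BACKEND": "sdpa"}))
print(attention_impl())
```

Rejecting unknown values early avoids silently running the wrong tree-decode path when the variable is misspelled.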

Validation

70B target + 1B draft, 5× MI300x: ~225.86 tok/s (4.32× AR, 1.64× SD) across alpaca / c4 / ultrafeedback / humaneval.
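As a sanity check on the ratios, the baselines they imply can be back-computed. This assumes 4.32× and 1.64× are throughput speedups over autoregressive decoding (AR) and standard speculative decoding (SD) on the same hardware. The derived numbers are arithmetic from the figures above, not separate measurements.

```python
ssd_tok_s = 225.86      # reported throughput
speedup_vs_ar = 4.32    # reported ratio, assumed to be relative to AR decoding
speedup_vs_sd = 1.64    # reported ratio, assumed to be relative to standard SD

# Implied baselines: reported throughput divided by each speedup.
implied_ar = ssd_tok_s / speedup_vs_ar
implied_sd = ssd_tok_s / speedup_vs_sd

print(f"implied AR baseline: {implied_ar:.2f} tok/s")  # 52.28
print(f"implied SD baseline: {implied_sd:.2f} tok/s")  # 137.72
```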
