Add ROCm / AMD Instinct MI300x support by subhajitdchow · Pull Request #17 · tanishqkumar/ssd

subhajitdchow · 2026-05-12T06:21:04Z

Summary

Adds AMD Instinct MI300x support via ROCm 7.2, PyTorch 2.9.1, FlashInfer v0.5.3+amd.2, and CK-backend flash-attn (pinned to 0f82fea).

The NVIDIA path is unchanged: all new logic is gated on torch.version.hip.
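A minimal sketch of what gating on torch.version.hip can look like. torch.version.hip is None on CUDA builds of PyTorch and a version string on ROCm builds. The helper names (is_rocm_build, attention_backend) and the returned labels are hypothetical and are not taken from this PR's diff; SimpleNamespace objects stand in for torch.version so the snippet runs without a GPU or PyTorch installed.

```python
from types import SimpleNamespace


def is_rocm_build(version_module) -> bool:
    """Return True when a torch.version-like object reports a HIP (ROCm) build."""
    # On CUDA builds torch.version.hip is None; on ROCm builds it is a string.
    return getattr(version_module, "hip", None) is not None


def attention_backend(version_module) -> str:
    """Pick a backend label; the labels are illustrative, not the repo's identifiers."""
    if is_rocm_build(version_module):
        return "flash_attn_or_sdpa"  # AMD path described in the summary
    return "sgl_kernel"  # NVIDIA path, left unchanged


# Stand-ins for torch.version (the version strings are placeholders).
rocm_like = SimpleNamespace(hip="7.2", cuda=None)
cuda_like = SimpleNamespace(hip=None, cuda="12.8")

rocm_choice = attention_backend(rocm_like)
cuda_choice = attention_backend(cuda_like)
```

In real code the call would be is_rocm_build(torch.version) after import torch.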

What's included

setup_rocm.sh: one-command ROCm setup inside the FlashInfer Docker container
ssd/layers/flash_attn_compat.py: SDPA fallback when native flash-attn is unavailable
Dual attention backend (sgl_kernel on NVIDIA, flash_attn/SDPA on AMD)
Dual tree-decode backend via SSD_TREE_DECODE_BACKEND={flashinfer,sdpa}
Runtime-conditional FlashInfer plan() args and default CUDA arch
Qwen3 rope_scaling fix; return_dict=False for newer transformers versions
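The backend selection and import fallback listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: pick_tree_decode_backend and first_importable are hypothetical helpers, the default of "flashinfer" is assumed (the summary does not state a default), and the fallback tiers in the actual flash_attn_compat.py are not spelled out here.

```python
import importlib.util
import os

# The two values named in the PR summary.
VALID_TREE_DECODE_BACKENDS = ("flashinfer", "sdpa")


def pick_tree_decode_backend(env=None, default="flashinfer"):
    """Read and validate SSD_TREE_DECODE_BACKEND (the default is an assumption)."""
    env = os.environ if env is None else env
    value = env.get("SSD_TREE_DECODE_BACKEND", default).strip().lower()
    if value not in VALID_TREE_DECODE_BACKENDS:
        raise ValueError(
            f"SSD_TREE_DECODE_BACKEND must be one of "
            f"{VALID_TREE_DECODE_BACKENDS}, got {value!r}"
        )
    return value


def first_importable(candidates, fallback="sdpa"):
    """Return the first top-level module name that can be located, else the fallback."""
    for name in candidates:
        try:
            # find_spec only checks that the module can be found; a compiled
            # extension may still fail when it is actually imported.
            if importlib.util.find_spec(name) is not None:
                return name
        except (ImportError, ValueError):
            continue
    return fallback


backend = pick_tree_decode_backend({"SSD_TREE_DECODE_BACKEND": "sdpa"})
```

Because locating a module does not prove it loads, a production version would wrap the real import in try/except ImportError instead of relying on find_spec alone.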

Validation

70B target + 1B draft on 5× MI300x: ~225.86 tok/s, a 4.32× speedup over autoregressive decoding (AR) and 1.64× over speculative decoding (SD), across alpaca / c4 / ultrafeedback / humaneval.
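As a sanity check on the figures above, the baseline throughputs implied by the reported speedups can be back-computed. This assumes AR means autoregressive decoding, SD means standard speculative decoding, and each speedup is the ratio of the reported tok/s to that baseline on the same hardware. The results are derived estimates, not measurements reported in this PR.

```python
# Figures quoted in the Validation section.
measured_tok_s = 225.86
speedup_vs_ar = 4.32
speedup_vs_sd = 1.64

# Implied baselines: measured throughput divided by each speedup factor.
implied_ar_tok_s = measured_tok_s / speedup_vs_ar
implied_sd_tok_s = measured_tok_s / speedup_vs_sd

print(f"implied AR baseline: {implied_ar_tok_s:.2f} tok/s")
print(f"implied SD baseline: {implied_sd_tok_s:.2f} tok/s")
```

This works out to roughly 52.28 tok/s for AR and 137.72 tok/s for SD.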


subhajitdchow and others added 20 commits May 12, 2026 06:15

Add flash-attn compatibility layer with 3-tier import fallback

45cde12

Add dual tree-decode backend selectable via SSD_TREE_DECODE_BACKEND

0e0329f

Add ROCm setup script and README section

4d03075

Update README.md

fea4224

Update README.md

4aaba04

Update README with placeholder for username

d32f940

Import torch in paths.py for GPU compatibility

5bdea65

Add dual attention backend support for NVIDIA (sgl_kernel) and AMD (flash_attn/SDPA)

1fe3030

Streamline README with unified CUDA and ROCm setup paths

4ca5819

Add dual attention backend for NVIDIA/AMD, streamline README, fix ROCm CUDA_HOME

3fc3567

Fix dual backend: skip sgl_kernel import on ROCm to avoid CUDA init failure

a085d7c

Fix setup and README

dbeda34

Fix setup and README for extra package download

d74bd40

Fix README for package installation and update qwen rope

2509067

chat format update

0970f0b

update readme

12a7b6b

update readme to suppress warnings during chat

1c03bff

Add repository contribution guidelines

b35c2dc

Co-authored-by: Cursor <cursoragent@cursor.com>

Add security reporting policy

76fc573

Co-authored-by: Cursor <cursoragent@cursor.com>

Merge pull request #1 from AMD-AGI/subhajitdchow/add-governance-docs

b4312e8

Subhajitdchow/add governance docs

xiaoshingshing2 mentioned this pull request Jun 9, 2026

AMD GPU Support Contribution for SSD #21

Open


subhajitdchow wants to merge 20 commits into tanishqkumar:main from AMD-AGI:rocm-upstream


3 participants
