[DRAFT] [ROCM][AMD] Support qwen35-9B on MI300 gpus by benenzhu · Pull Request #28 · redai-infra/Relax

benenzhu · 2026-05-08T07:18:01Z

What

Why

How

Testing

pre-commit run --all-files passes
Tests pass (pytest tests/)
New tests added (if applicable)
Documentation updated (if applicable)

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation update
Refactoring (no functional changes)
Performance improvement
CI/CD or build changes

Screenshots / Logs

* add first qwen3

* zz

* Update amd/run_qwen3-4b.sh

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update amd/run_qwen3-4b.sh

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update amd/run-qwen3-4b-dapo-math-direct.sh

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update amd/run-qwen35-9b-dapo-math-direct.sh

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

# ⭐ Feature

## Add Qwen3.5-9B ROCm smoke path

- Add one-command Qwen3.5-9B DAPO-Math smoke runner that restarts Ray from a clean state.
- Align Qwen3.5 runtime flags with the validated Qwen3-4B ROCm path: TE auto attention, BSHD, and disabled Dynamo/JIT fuser.
- Reduce the default Qwen3.5 run to a smaller smoke profile to avoid filling SGLang KV/logits memory during initial validation.

Made-with: Cursor

---

# 🐛 Bug Fix

## Install ROCm training dependencies in Dockerfile

- Install ROCm TransformerEngine 2.4.0 wheels for Megatron TE specs.
- Install flash-linear-attention for Qwen3.5 GatedDeltaNet initialization.

---

# 📝 Documentation

## Record Qwen3.5 ROCm bring-up results

- Document FLA import resolution, SGLang rollout device ordinal fix, and the full-profile rollout OOM finding.
- Record that the smoke reached all service registration and step-0 training/logprob flow.

---

# 🎨 Style

## Apply pre-commit formatting

- Apply import ordering, doc formatting, script copyright, and formatting fixes from pre-commit.
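The smoke runner in the commit message above restarts Ray from a clean state before launching. A minimal Python sketch of that step follows, assuming a single-node head brought up through the standard `ray stop` and `ray start` CLI commands. The function names, the injectable `run` hook, and the port default are illustrative and are not taken from the PR's scripts.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

Command = List[str]


def clean_restart_commands(port: int = 6379) -> List[Command]:
    """Return the CLI calls for a from-scratch single-node Ray head."""
    return [
        # Kill any Ray processes left over from a previous run.
        ["ray", "stop", "--force"],
        # Start a fresh head node on the requested port.
        ["ray", "start", "--head", f"--port={port}"],
    ]


def restart_ray(
    run: Optional[Callable[[Sequence[str]], object]] = None,
    port: int = 6379,
) -> List[Command]:
    """Run the restart commands in order and return them.

    `run` is a hypothetical hook so the sequence can be dry-run or tested
    without a Ray installation; by default each command is executed with
    subprocess and a non-zero exit code raises CalledProcessError.
    """
    if run is None:
        run = lambda cmd: subprocess.run(cmd, check=True)
    commands = clean_restart_commands(port)
    for cmd in commands:
        run(cmd)
    return commands
```

Passing `run=print` gives a dry run that only lists the commands, which can be useful when checking a wrapper script on a shared machine.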


# 🐛 Bug Fix

## Avoid debug logging failures during tiny smoke runs

- Skip rollout metric logging when the transient training debug buffer does not include response length metadata yet.
- Keep training from failing before the rollout/logprob path has completed.

Made-with: Cursor

---

# ⭐ Feature

## Tighten Qwen3.5-9B smoke defaults

- Use a smaller global batch for the reduced smoke profile while preserving GRPO grouping constraints.
- Record the latest validation findings in the AMD runbook.
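The logging fix above amounts to a guard: compute rollout length metrics only when the debug buffer already carries response lengths, and otherwise skip logging instead of raising. A minimal sketch of such a guard follows. The buffer key `response_lengths`, the metric names, and the function name are assumptions made for illustration; the PR's actual identifiers are not shown in this page.

```python
from typing import Any, Dict, List, Optional


def rollout_length_metrics(debug_buffer: Dict[str, Any]) -> Optional[Dict[str, float]]:
    """Return response-length metrics, or None when they cannot be computed.

    Returning None lets the caller skip logging for this step rather than
    fail on a missing key or an empty list during a tiny smoke run.
    """
    # Hypothetical key: the real buffer layout is not given in the source.
    lengths: Optional[List[int]] = debug_buffer.get("response_lengths")
    if not lengths:
        return None
    return {
        "rollout/response_len_mean": sum(lengths) / len(lengths),
        "rollout/response_len_max": float(max(lengths)),
    }
```

A caller would then log only when the result is not `None`, so training continues even if the rollout/logprob path has not yet filled in the metadata.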

* add first qwen3

* zz

* feat(amd): add qwen35 rocm smoke

# ⭐ Feature

## Add Qwen3.5-9B ROCm smoke path

- Add one-command Qwen3.5-9B DAPO-Math smoke runner that restarts Ray from a clean state.
- Align Qwen3.5 runtime flags with the validated Qwen3-4B ROCm path: TE auto attention, BSHD, and disabled Dynamo/JIT fuser.
- Reduce the default Qwen3.5 run to a smaller smoke profile to avoid filling SGLang KV/logits memory during initial validation.

Made-with: Cursor

---

# 🐛 Bug Fix

## Install ROCm training dependencies in Dockerfile

- Install ROCm TransformerEngine 2.4.0 wheels for Megatron TE specs.
- Install flash-linear-attention for Qwen3.5 GatedDeltaNet initialization.

---

# 📝 Documentation

## Record Qwen3.5 ROCm bring-up results

- Document FLA import resolution, SGLang rollout device ordinal fix, and the full-profile rollout OOM finding.
- Record that the smoke reached all service registration and step-0 training/logprob flow.

---

# 🎨 Style

## Apply pre-commit formatting

- Apply import ordering, doc formatting, script copyright, and formatting fixes from pre-commit.

* Update amd/run-qwen35-9b-dapo-math-direct.sh

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update amd/run_qwen35-9b.sh

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* fix(amd): harden qwen35 smoke validation

# 🐛 Bug Fix

## Avoid debug logging failures during tiny smoke runs

- Skip rollout metric logging when the transient training debug buffer does not include response length metadata yet.
- Keep training from failing before the rollout/logprob path has completed.

Made-with: Cursor

---

# ⭐ Feature

## Tighten Qwen3.5-9B smoke defaults

- Use a smaller global batch for the reduced smoke profile while preserving GRPO grouping constraints.
- Record the latest validation findings in the AMD runbook.

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
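The smoke defaults above shrink the global batch "while preserving GRPO grouping constraints", but the page does not spell out what the constraint is. One common reading is that every training step must contain whole groups of samples drawn from the same prompt, so the global batch size has to be a multiple of the number of samples per prompt. The sketch below checks only that assumed constraint; the function and parameter names are hypothetical.

```python
def prompt_groups_per_step(global_batch_size: int, n_samples_per_prompt: int) -> int:
    """Return how many complete prompt groups fit in one training step.

    Assumed constraint: the batch is counted in samples and must split
    evenly into groups of n_samples_per_prompt, so that group-relative
    advantages are computed over complete groups.
    """
    if global_batch_size <= 0 or n_samples_per_prompt <= 0:
        raise ValueError("batch size and samples per prompt must be positive")
    if global_batch_size % n_samples_per_prompt != 0:
        raise ValueError(
            f"global_batch_size={global_batch_size} is not a multiple of "
            f"n_samples_per_prompt={n_samples_per_prompt}"
        )
    return global_batch_size // n_samples_per_prompt
```

Under this reading, reducing the batch for a smoke profile means stepping down in multiples of the group size, not picking an arbitrary smaller number.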


vivienfanghuagood and others added 27 commits April 24, 2026 04:18

Add experimental MI355 ROCm support

1495852

Trim ROCm bring-up changes

37cd38f

add first qwen3

95f2698

zz

4b6d22b

merge: sync qwen35 branch with main

f7102af

Resolve AMD runner conflicts after the Qwen3-4B ROCm PR landed on main. Keep the Qwen3.5 smoke runner changes while preserving the main branch's Qwen3-4B baseline, and avoid hardcoded private IP defaults in local smoke wrappers. Made-with: Cursor

Update amd/run-qwen35-9b-dapo-math-direct.sh

d909fef

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update amd/run_qwen35-9b.sh

b5e2a03

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

zz

87221b6

zz

5633f9a

zz

699dd95

add

0a8e421

zz

d22a539

add

05146b3

zz

236c666

Merge branch 'zty_dev_qwen35' of https://github.com/AMD-AIM/Relax into zty_dev_qwen35

58e6758

zz

a12a9e2

add

18c1b5e

add

7ee9b58

add log then restart machine

08ccda9

zz

50f3639

add all

e7f70db

Merge remote-tracking branch 'origin/main' into zty_dev_qwen35

b674a4c

add

71d9a5e

benenzhu requested review from Aurelius84, NINGBENZHE and Yangruipis as code owners May 8, 2026 07:18

benenzhu marked this pull request as draft May 8, 2026 07:18

benenzhu added 4 commits May 13, 2026 17:10

Merge remote-tracking branch 'up/main' into zty_dev_qwen35

f18d9b7

zz

8c8b450

zz

ee5c4cb

zz

71a949e


benenzhu wants to merge 31 commits into redai-infra:main from AMD-AIM:zty_dev_qwen35
