[DRAFT] [ROCM][AMD] Support qwen35-9B on MI300 gpus #28

Draft
benenzhu wants to merge 31 commits into redai-infra:main from AMD-AIM:zty_dev_qwen35

benenzhu commented May 8, 2026

What

Why

How

Testing

  • pre-commit run --all-files passes
  • Tests pass (pytest tests/)
  • New tests added (if applicable)
  • Documentation updated (if applicable)

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • CI/CD or build changes

Screenshots / Logs

vivienfanghuagood and others added 27 commits April 24, 2026 04:18
* add first qwen3

* zz

* Update amd/run_qwen3-4b.sh

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update amd/run_qwen3-4b.sh

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update amd/run-qwen3-4b-dapo-math-direct.sh

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update amd/run-qwen35-9b-dapo-math-direct.sh

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
# ⭐ Feature

## Add Qwen3.5-9B ROCm smoke path

- Add one-command Qwen3.5-9B DAPO-Math smoke runner that restarts Ray from a clean state.
- Align Qwen3.5 runtime flags with the validated Qwen3-4B ROCm path: TE auto attention, BSHD, and disabled Dynamo/JIT fuser.
- Reduce the default Qwen3.5 run to a smaller smoke profile to avoid filling SGLang KV/logits memory during initial validation.
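The "restarts Ray from a clean state" step can be pictured roughly as follows. This is only a sketch, not the runner added in this PR: the function names, the default port, and the `dry_run` switch are assumptions made for illustration. The only external commands used are the standard `ray stop --force` and `ray start --head` CLI invocations.

```python
import subprocess


def clean_ray_restart_commands(num_gpus: int, head_port: int = 6379) -> list[list[str]]:
    """Build the CLI calls for a clean single-node Ray restart (hypothetical helper)."""
    return [
        # Kill any Ray processes left over from a previous run.
        ["ray", "stop", "--force"],
        # Start a fresh head node that advertises the requested GPU count.
        ["ray", "start", "--head", f"--port={head_port}", f"--num-gpus={num_gpus}"],
    ]


def restart_ray(num_gpus: int, dry_run: bool = True) -> list[list[str]]:
    """Print the commands (dry run) or execute them in order."""
    commands = clean_ray_restart_commands(num_gpus)
    for index, cmd in enumerate(commands):
        if dry_run:
            print(" ".join(cmd))
        else:
            # Tolerate a failing "stop" (index 0); require "start" to succeed.
            subprocess.run(cmd, check=(index > 0))
    return commands
```

Calling `restart_ray(8)` only prints the two commands; pass `dry_run=False` on a machine where Ray is installed to actually run them.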

Made-with: Cursor

---

# 🐛 Bug Fix

## Install ROCm training dependencies in Dockerfile

- Install ROCm TransformerEngine 2.4.0 wheels for Megatron TE specs.
- Install flash-linear-attention for Qwen3.5 GatedDeltaNet initialization.
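A quick way to confirm that an image built from such a Dockerfile exposes both dependencies is to probe for their import names before launching training. The sketch below assumes the usual top-level module names (`transformer_engine` for TransformerEngine and `fla` for flash-linear-attention); the helper name `missing_modules` is hypothetical and not part of this PR.

```python
import importlib.util

# Import names assumed for the two packages named in the commit message.
REQUIRED_MODULES = ["transformer_engine", "fla"]


def missing_modules(modules=REQUIRED_MODULES) -> list[str]:
    """Return the modules that cannot be located, without importing them."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        raise SystemExit(f"missing training dependencies: {', '.join(missing)}")
    print("all training dependencies found")
```

`find_spec` only locates the package, so this check does not load ROCm libraries and stays cheap enough to run at container start.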

---

# 📝 Documentation

## Record Qwen3.5 ROCm bring-up results

- Document FLA import resolution, SGLang rollout device ordinal fix, and the full-profile rollout OOM finding.
- Record that the smoke run reached full service registration and the step-0 training/logprob flow.

---

# 🎨 Style

## Apply pre-commit formatting

- Apply import ordering, doc formatting, script copyright, and formatting fixes from pre-commit.
Resolve AMD runner conflicts after the Qwen3-4B ROCm PR landed on main.

Keep the Qwen3.5 smoke runner changes while preserving the main branch's Qwen3-4B baseline, and avoid hardcoded private IP defaults in local smoke wrappers.

Made-with: Cursor
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
# 🐛 Bug Fix

## Avoid debug logging failures during tiny smoke runs

- Skip rollout metric logging when the transient training debug buffer does not include response length metadata yet.
- Keep training from failing before the rollout/logprob path has completed.
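The guard described above might look roughly like this. It is a minimal sketch under stated assumptions: the key `response_lengths`, the metric names, and the function `log_rollout_metrics` are illustrative placeholders, not the identifiers used in the actual training code.

```python
from typing import Callable


def log_rollout_metrics(batch: dict, log_fn: Callable[[dict], None] = print) -> bool:
    """Log response-length metrics if present; return whether anything was logged."""
    # "response_lengths" is a hypothetical key for the response length metadata.
    lengths = batch.get("response_lengths")
    if not lengths:
        # Metadata not populated yet (or empty): skip logging instead of raising.
        return False
    log_fn(
        {
            "rollout/response_len_mean": sum(lengths) / len(lengths),
            "rollout/response_len_max": max(lengths),
        }
    )
    return True
```

Returning a flag rather than raising keeps the debug logging path from aborting a step whose buffer is only partially filled.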

Made-with: Cursor

---

# ⭐ Feature

## Tighten Qwen3.5-9B smoke defaults

- Use a smaller global batch for the reduced smoke profile while preserving GRPO grouping constraints.
- Record the latest validation findings in the AMD runbook.
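The commit does not spell out the GRPO grouping constraints, so the following is only an illustrative check under one common assumption: every global batch must hold a whole number of prompt groups, and the rollout must split into a whole number of global batches. All parameter names are hypothetical.

```python
def check_grpo_batch(num_prompts: int, samples_per_prompt: int, global_batch_size: int) -> int:
    """Validate a batch configuration and return the number of training steps per rollout."""
    total_samples = num_prompts * samples_per_prompt
    # Assumed constraint 1: a global batch never splits a prompt's sample group.
    if global_batch_size % samples_per_prompt != 0:
        raise ValueError("global batch size must be a multiple of samples per prompt")
    # Assumed constraint 2: the rollout divides evenly into global batches.
    if total_samples % global_batch_size != 0:
        raise ValueError("rollout samples must divide evenly into global batches")
    return total_samples // global_batch_size
```

For example, 4 prompts with 8 samples each and a global batch of 16 yields 2 steps per rollout, while a global batch of 12 is rejected because it would cut a group of 8 in half.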
* add first qwen3

* zz

* feat(amd): add qwen35 rocm smoke

* Update amd/run-qwen35-9b-dapo-math-direct.sh

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update amd/run_qwen35-9b.sh

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* fix(amd): harden qwen35 smoke validation

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
benenzhu marked this pull request as draft May 8, 2026 07:18
Labels

None yet

Projects

None yet

2 participants