[DRAFT] [ROCM][AMD] Support qwen35-9B on MI300 GPUs #28
Draft
benenzhu wants to merge 31 commits into
* add first qwen3
* zz
* Update amd/run_qwen3-4b.sh
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Update amd/run_qwen3-4b.sh
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Update amd/run-qwen3-4b-dapo-math-direct.sh
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Update amd/run-qwen35-9b-dapo-math-direct.sh
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
# ⭐ Feature

## Add Qwen3.5-9B ROCm smoke path

- Add one-command Qwen3.5-9B DAPO-Math smoke runner that restarts Ray from a clean state.
- Align Qwen3.5 runtime flags with the validated Qwen3-4B ROCm path: TE auto attention, BSHD, and disabled Dynamo/JIT fuser.
- Reduce the default Qwen3.5 run to a smaller smoke profile to avoid filling SGLang KV/logits memory during initial validation.

Made-with: Cursor

---

# 🐛 Bug Fix

## Install ROCm training dependencies in Dockerfile

- Install ROCm TransformerEngine 2.4.0 wheels for Megatron TE specs.
- Install flash-linear-attention for Qwen3.5 GatedDeltaNet initialization.

---

# 📝 Documentation

## Record Qwen3.5 ROCm bring-up results

- Document FLA import resolution, SGLang rollout device ordinal fix, and the full-profile rollout OOM finding.
- Record that the smoke reached all service registration and step-0 training/logprob flow.

---

# 🎨 Style

## Apply pre-commit formatting

- Apply import ordering, doc formatting, script copyright, and formatting fixes from pre-commit.
Resolve AMD runner conflicts after the Qwen3-4B ROCm PR landed on main. Keep the Qwen3.5 smoke runner changes while preserving the main branch's Qwen3-4B baseline, and avoid hardcoded private IP defaults in local smoke wrappers. Made-with: Cursor
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
# 🐛 Bug Fix

## Avoid debug logging failures during tiny smoke runs

- Skip rollout metric logging when the transient training debug buffer does not include response length metadata yet.
- Keep training from failing before the rollout/logprob path has completed.

Made-with: Cursor

---

# ⭐ Feature

## Tighten Qwen3.5-9B smoke defaults

- Use a smaller global batch for the reduced smoke profile while preserving GRPO grouping constraints.
- Record the latest validation findings in the AMD runbook.
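The logging guard described in this commit could look roughly like the sketch below. It is not the code from the PR: the function name `rollout_length_metrics`, the buffer key `response_lengths`, and the metric names are hypothetical, chosen only to illustrate skipping the metric computation when the metadata is missing instead of raising.

```python
from typing import Any, Mapping, Optional


def rollout_length_metrics(debug_buffer: Mapping[str, Any]) -> Optional[dict]:
    """Return response-length metrics, or None when the metadata is absent.

    The key "response_lengths" is a hypothetical name used for illustration.
    """
    lengths = debug_buffer.get("response_lengths")
    if not lengths:
        # Metadata missing or empty (for example in a tiny smoke run):
        # signal "nothing to log" instead of raising and aborting training.
        return None
    return {
        "rollout/response_len_mean": sum(lengths) / len(lengths),
        "rollout/response_len_max": max(lengths),
    }


if __name__ == "__main__":
    for buffer in ({}, {"response_lengths": [2, 4]}):
        metrics = rollout_length_metrics(buffer)
        if metrics is None:
            print("skipping rollout metric logging: no response length metadata")
        else:
            print(metrics)
```

Returning `None` leaves the decision to skip with the caller, so the training step continues whether or not the metrics are available.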
* add first qwen3
* zz
* feat(amd): add qwen35 rocm smoke

# ⭐ Feature

## Add Qwen3.5-9B ROCm smoke path

- Add one-command Qwen3.5-9B DAPO-Math smoke runner that restarts Ray from a clean state.
- Align Qwen3.5 runtime flags with the validated Qwen3-4B ROCm path: TE auto attention, BSHD, and disabled Dynamo/JIT fuser.
- Reduce the default Qwen3.5 run to a smaller smoke profile to avoid filling SGLang KV/logits memory during initial validation.

Made-with: Cursor

---

# 🐛 Bug Fix

## Install ROCm training dependencies in Dockerfile

- Install ROCm TransformerEngine 2.4.0 wheels for Megatron TE specs.
- Install flash-linear-attention for Qwen3.5 GatedDeltaNet initialization.

---

# 📝 Documentation

## Record Qwen3.5 ROCm bring-up results

- Document FLA import resolution, SGLang rollout device ordinal fix, and the full-profile rollout OOM finding.
- Record that the smoke reached all service registration and step-0 training/logprob flow.

---

# 🎨 Style

## Apply pre-commit formatting

- Apply import ordering, doc formatting, script copyright, and formatting fixes from pre-commit.

* Update amd/run-qwen35-9b-dapo-math-direct.sh
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Update amd/run_qwen35-9b.sh
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* fix(amd): harden qwen35 smoke validation

# 🐛 Bug Fix

## Avoid debug logging failures during tiny smoke runs

- Skip rollout metric logging when the transient training debug buffer does not include response length metadata yet.
- Keep training from failing before the rollout/logprob path has completed.

Made-with: Cursor

---

# ⭐ Feature

## Tighten Qwen3.5-9B smoke defaults

- Use a smaller global batch for the reduced smoke profile while preserving GRPO grouping constraints.
- Record the latest validation findings in the AMD runbook.

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
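The commit messages do not spell out the "GRPO grouping constraints". A common reading, assumed here, is that every prompt's group of sampled responses must stay together, so the global batch size has to be a whole multiple of the number of samples per prompt. The sketch below checks that assumption when shrinking the batch for a smoke profile; `validate_grpo_batch` and its parameter names are hypothetical and not taken from the PR.

```python
def validate_grpo_batch(global_batch_size: int, n_samples_per_prompt: int) -> int:
    """Return how many complete prompt groups fit in one global batch.

    Assumption (not confirmed by the PR): the batch must hold whole groups,
    i.e. global_batch_size % n_samples_per_prompt == 0.
    """
    if global_batch_size <= 0 or n_samples_per_prompt <= 0:
        raise ValueError("batch size and samples per prompt must be positive")
    if global_batch_size % n_samples_per_prompt != 0:
        raise ValueError(
            f"global batch {global_batch_size} would split a group of "
            f"{n_samples_per_prompt} samples"
        )
    return global_batch_size // n_samples_per_prompt


if __name__ == "__main__":
    # Example values only; the PR's actual smoke defaults are not shown here.
    print(validate_grpo_batch(16, 8))
```

Failing fast on a batch size that splits a group surfaces a misconfigured smoke profile before any GPU time is spent.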
…o zty_dev_qwen35
What
Why
How
Testing
- `pre-commit run --all-files` passes
- `pytest tests/`
Type of Change
Screenshots / Logs