[megatron] feat: support DeepSeek V4 GRPO #6473
Adds DeepSeek V4 Flash GRPO support with Megatron-Bridge actor/ref workers, vLLM rollout, FP8/MXFP4 weight transfer handling, and checkpoint save/export verification.
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Code Review
This pull request introduces support for the DeepSeek-V4-Flash model in the GRPO trainer, adding a new Megatron training script and extensive utilities for FP8 and MXFP4 quantization weight loading. It also enhances custom chat template resolution, aligns offsets in bucketed weight transfers to respect tensor element sizes, and adds comprehensive unit tests. The review feedback highlights two critical issues: a potential AttributeError in transformer_impl.py when accessing csa_compress_ratios on standard models, and another potential AttributeError in vllm_fp8_utils.py when copying parameter attributes.
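For readers unfamiliar with MXFP4, the format the new weight-loading utilities handle, here is a minimal pure-Python sketch of the OCP Microscaling layout: 32-element blocks, one shared power-of-two scale stored as an E8M0 byte (bias 127), and FP4 E2M1 elements. This is not the code from vllm_fp8_utils.py, which works on tensors. The helper names, the low-nibble-first packing, and the tie-breaking (ties go to the smaller magnitude instead of round-to-nearest-even) are assumptions made for illustration.

```python
import math

# FP4 E2M1 magnitudes, indexed by the 3 low bits of a code (bit 3 is the sign).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2  # exponent of the largest normal value: 6.0 = 1.5 * 2**2
BLOCK_SIZE = 32


def _encode_e2m1(x: float) -> int:
    """Round x to the nearest E2M1 value (ties to smaller magnitude) and return its 4-bit code."""
    sign = 0x8 if x < 0 else 0x0
    mag = min(abs(x), 6.0)  # saturate to the largest representable magnitude
    idx = min(range(len(E2M1_MAGNITUDES)), key=lambda i: abs(E2M1_MAGNITUDES[i] - mag))
    return sign | idx


def quantize_mxfp4_block(values):
    """Quantize up to 32 floats into (E8M0 scale byte, packed FP4 bytes)."""
    assert 0 < len(values) <= BLOCK_SIZE
    amax = max(abs(v) for v in values)
    # Shared exponent so the block maximum lands in E2M1's top binade; all-zero blocks use the minimum.
    shared_exp = math.floor(math.log2(amax)) - E2M1_EMAX if amax > 0 else -127
    shared_exp = max(-127, min(127, shared_exp))  # E8M0 byte 255 is reserved for NaN
    scale = 2.0 ** shared_exp
    codes = [_encode_e2m1(v / scale) for v in values]
    if len(codes) % 2:
        codes.append(0)  # pad to an even count so codes pack into whole bytes
    # Two codes per byte, first element in the low nibble (assumed layout).
    packed = bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))
    return shared_exp + 127, packed


def dequantize_mxfp4_block(scale_byte: int, packed: bytes, count: int):
    """Invert quantize_mxfp4_block, returning the first `count` decoded values."""
    scale = 2.0 ** (scale_byte - 127)
    out = []
    for byte in packed:
        for code in (byte & 0xF, byte >> 4):
            mag = E2M1_MAGNITUDES[code & 0x7] * scale
            out.append(-mag if code & 0x8 else mag)
    return out[:count]
```

Values that already sit on the E2M1 grid after scaling round-trip exactly; anything else is rounded per element, with the block's largest magnitude deciding the shared scale.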
Pull request overview
Adds DeepSeek V4 Flash GRPO support for Megatron-Bridge actor/ref workers with vLLM rollout, including FP8/MXFP4 weight reload handling, config overrides, and example + unit coverage.
Changes:
- Normalize vLLM HF overrides (MTP disablement + Yarn RoPE scaling) and align Megatron transformer/provider overrides for disabled MTP layers.
- Improve vLLM rollout weight sync for quantized models (FP8/MXFP4), including bucketed transfer alignment and reload-safe parameter handling.
- Add custom_chat_template resolution (file/env), plus tests and a DeepSeek-V4-Flash Megatron GRPO example script / README entry.
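The HF override normalization in the first bullet, together with the None guard and the RoPEScalingConfig-to-dict conversion that the later reviews ask for, could look roughly like the sketch below. It is hypothetical: normalize_hf_overrides and _to_dict are illustrative names, not the functions in vllm_async_server.py, and mirroring the legacy "type" key into "rope_type" is an assumption about what downstream config readers expect.

```python
from dataclasses import asdict, is_dataclass
from typing import Any, Optional


def _to_dict(cfg: Any) -> dict:
    """Convert a rope-scaling config (dict, dataclass, or config object) into a plain dict copy."""
    if isinstance(cfg, dict):
        return dict(cfg)
    if is_dataclass(cfg) and not isinstance(cfg, type):
        return asdict(cfg)
    if hasattr(cfg, "to_dict"):
        return dict(cfg.to_dict())
    return dict(vars(cfg))


def normalize_hf_overrides(hf_overrides: Optional[dict], mtp_enabled: bool) -> dict:
    """Return a cleaned copy of hf_overrides; the caller's dict is not mutated."""
    overrides = dict(hf_overrides or {})
    if not mtp_enabled:
        # DeepSeek-style HF configs size the MTP stack with num_nextn_predict_layers.
        overrides["num_nextn_predict_layers"] = 0
    rope = overrides.get("rope_scaling")
    if rope is not None:
        rope = _to_dict(rope)
        if "type" in rope:
            # Assumption: readers may look for "rope_type" rather than the legacy "type" key.
            rope.setdefault("rope_type", rope["type"])
        if rope.get("factor") is not None:
            # Guard None before casting; float(None) would raise TypeError.
            rope["factor"] = float(rope["factor"])
        overrides["rope_scaling"] = rope
    return overrides
```

Returning a copy keeps the user's config object untouched, so the same overrides can be normalized again on every rollout-server restart without compounding changes.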
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| verl/workers/rollout/vllm_rollout/vllm_async_server.py | Adds HF override normalization for MTP disablement and Yarn RoPE scaling. |
| verl/workers/rollout/vllm_rollout/utils.py | Updates vLLM colocated worker to better handle FP8 reload preparation/post-processing across bucketed weight sync. |
| verl/workers/rollout/vllm_rollout/bucketed_weight_transfer.py | Aligns bucket offsets to tensor element size to support safe dtype views on the receiver side. |
| verl/workers/engine/megatron/transformer_impl.py | Adjusts Megatron provider overrides when MTP is disabled (and trims CSA ratios). |
| verl/workers/config/model.py | Adds custom_chat_template support and resolution from file/env at config materialization time. |
| verl/utils/vllm/vllm_fp8_utils.py | Adds MXFP4 quantization + reload-safe param restoration and DeepSeek V4 naming/mapping tweaks. |
| tests/workers/config/test_model_config_on_cpu.py | Adds unit tests for chat template resolution and mutability. |
| tests/utils/test_vllm_fp8_utils.py | Adds targeted unit tests for MXFP4 packing, prequantized detection, and scale-name conventions. |
| tests/utils/test_bucketed_weight_transfer.py | Adds unit test for new bucket offset alignment helper. |
| examples/grpo_trainer/run_deepseek_v4_flash_megatron.sh | Adds runnable DeepSeek-V4-Flash GRPO example (Megatron + vLLM rollout). |
| examples/grpo_trainer/README.md | Documents the new DeepSeek-V4-Flash example entry. |
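The bucket offset alignment in bucketed_weight_transfer.py comes down to rounding each tensor's start offset up to a multiple of its element size, so that the receiver can reinterpret a byte slice of the bucket as the tensor's dtype (PyTorch dtype views of a uint8 buffer generally require the storage offset to be divisible by the element-size ratio). A minimal sketch; align_offset and plan_bucket are hypothetical names, not the helper added by the PR:

```python
def align_offset(offset: int, element_size: int) -> int:
    """Round offset up to the next multiple of element_size."""
    if element_size <= 0:
        raise ValueError("element_size must be positive")
    return (offset + element_size - 1) // element_size * element_size


def plan_bucket(tensors):
    """Lay out (name, numel, element_size) entries in one byte bucket.

    Returns a list of (name, start, end) byte ranges and the total bucket size.
    """
    layout = []
    offset = 0
    for name, numel, element_size in tensors:
        start = align_offset(offset, element_size)  # padding bytes are left unused
        end = start + numel * element_size
        layout.append((name, start, end))
        offset = end
    return layout, offset
```

Without alignment, a 1-byte scale tensor followed by an FP32 weight would place the weight at offset 3, and a float32 view of that slice would be rejected; with alignment it starts at 4 at the cost of one padding byte.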
Comments suppressed due to low confidence (1)
verl/workers/rollout/vllm_rollout/utils.py:311
- The drafter FP8 reload path still calls load_quanted_weights(...) with default prepare_model=True / process_model=True for every received bucket. With bucketed transfer this can repeat non-idempotent post-processing and adds significant overhead. It should mirror the main-model behavior by skipping per-bucket prepare/process when quant_prepared is true, and rely on the once-per-reload processing at the end of update_weights_from_ipc().
```python
# Keep the draft model in sync when present.
if self._use_mtp_drafter_weight_sync():
    load_quanted_weights(weights, self.model_runner, is_drafter=True)
```
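The suggested fix amounts to preparing the quantized model once before the first bucket, loading each bucket with prepare_model and process_model disabled while quant_prepared is set, and post-processing once at the end of the reload. The sketch below only simulates that control flow with counters; ReloadState, load_bucket, and update_weights are hypothetical stand-ins, not vLLM or verl APIs.

```python
from dataclasses import dataclass, field


@dataclass
class ReloadState:
    """Hypothetical counters standing in for the real model-runner hooks."""
    quant_prepared: bool = False
    prepare_calls: int = 0
    process_calls: int = 0
    loaded: list = field(default_factory=list)


def load_bucket(state: ReloadState, bucket, prepare_model=True, process_model=True):
    # Stand-in for a quantized loader: optional prepare, load, optional post-process.
    if prepare_model:
        state.prepare_calls += 1
    state.loaded.extend(bucket)
    if process_model:
        state.process_calls += 1


def update_weights(state: ReloadState, buckets):
    state.prepare_calls += 1  # prepare once, before any bucket is loaded
    state.quant_prepared = True
    for bucket in buckets:
        skip = state.quant_prepared
        # Per-bucket prepare/process is skipped while the reload is prepared.
        load_bucket(state, bucket, prepare_model=not skip, process_model=not skip)
    state.process_calls += 1  # post-process once, after the last bucket
    state.quant_prepared = False
```

With N buckets the default-argument path runs prepare and post-processing N times each; the guarded path runs them once, which is what keeps non-idempotent post-processing safe.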
Code Review
This pull request introduces support for the DeepSeek-V4-Flash model, adding training scripts, custom chat template resolution, and extensive FP8/MXFP4 quantization and weight-loading utilities. It also updates the bucketed weight transfer mechanism to align offsets based on tensor element size and fixes Megatron transformer configuration building when MTP is disabled. The review feedback highlights critical stability improvements: guarding against None values when casting rope_scaling factors to float in vllm_async_server.py, handling cases where packed_modules_mapping is explicitly set to None in vllm_fp8_utils.py, and broadening exception handling during parameter attribute copying to prevent unexpected crashes.
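The last two hardening items in this review follow small, common patterns. getattr(obj, name, default) returns the default only when the attribute is missing, so an attribute explicitly set to None needs an extra "or {}". When copying per-parameter attributes (vLLM attaches extras such as weight_loader to parameters), a broad except lets a single read-only attribute be skipped instead of aborting the reload. A hedged sketch with hypothetical helper names:

```python
from typing import Any


def get_packed_modules_mapping(model: Any) -> dict:
    # getattr(..., {}) alone is not enough: the attribute may exist but be None.
    return getattr(model, "packed_modules_mapping", None) or {}


def copy_param_attrs(src: Any, dst: Any, skip=("data", "grad")) -> list:
    """Copy instance attributes from src to dst; return the names that could not be set."""
    skipped = []
    for name, value in vars(src).items():
        if name in skip:
            continue
        try:
            setattr(dst, name, value)
        except Exception:  # broad on purpose: read-only properties, validating setters, etc.
            skipped.append(name)
    return skipped
```

Returning the skipped names (instead of silently swallowing them) leaves room for the caller to log which attributes did not survive the restoration.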
Code Review
This pull request introduces support for the DeepSeek-V4-Flash model, featuring a Megatron training script, custom chat template resolution from files or environment variables, and comprehensive FP8/MXFP4 quantization utilities for vLLM rollout. It also addresses alignment issues in bucketed weight transfer by respecting tensor element size and disables MTP layers when MTP is not enabled. A critical feedback item was identified in the _model_type helper function, where an AttributeError could occur if the model's configuration is None; a safe traversal suggestion has been provided to prevent potential crashes.
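A safe traversal for the _model_type case only has to stop at the first missing or None link instead of dereferencing it. The sketch below is illustrative: model_type is a hypothetical name, the real helper's signature is not shown here, and accepting a plain dict config is an extra assumption.

```python
from typing import Any, Optional


def model_type(model: Any) -> Optional[str]:
    """Return model.config.model_type, or None if the config is missing or None."""
    config = getattr(model, "config", None)
    if config is None:
        return None
    if isinstance(config, dict):  # assumption: configs may also arrive as plain dicts
        return config.get("model_type")
    return getattr(config, "model_type", None)
```

Callers can then treat None as "unknown model" and fall back to the generic weight-name mapping rather than crashing with an AttributeError.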
Code Review
This pull request introduces support for the DeepSeek-V4-Flash model in the GRPO trainer, including a new Megatron training launch script. Key changes include implementing MXFP4 weight quantization and loading utilities, aligning offsets during weight transfer based on tensor element sizes, resolving custom chat templates from files or environment variables, and disabling Multi-Token Prediction (MTP) layers when not enabled. Feedback focuses on avoiding blocking ZMQ socket operations within an asynchronous method by offloading them to a separate thread, and ensuring that rope_scaling configurations are properly converted to dictionaries before processing to handle RoPEScalingConfig objects.
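Offloading a blocking socket call from an async method is typically done with asyncio.to_thread (Python 3.9+), which runs the call in the default executor so the event loop keeps serving other coroutines; pyzmq's zmq.asyncio sockets are the other option, and the later review notes the PR moved to asynchronous ZMQ. The sketch avoids a pyzmq dependency: BlockingChannel is a hypothetical stand-in for a blocking zmq socket, and send_bucket is an illustrative name.

```python
import asyncio
import threading
import time


class BlockingChannel:
    """Stand-in for a blocking zmq socket: send_and_wait blocks the calling thread."""

    def __init__(self):
        self.thread_names = []

    def send_and_wait(self, payload: bytes) -> bytes:
        self.thread_names.append(threading.current_thread().name)
        time.sleep(0.05)  # simulate waiting for the peer's acknowledgement
        return b"ack:" + payload


async def send_bucket(channel: BlockingChannel, payload: bytes) -> bytes:
    # Run the blocking call in a worker thread instead of on the event loop.
    return await asyncio.to_thread(channel.send_and_wait, payload)


async def main():
    channel = BlockingChannel()
    ticks = 0

    async def heartbeat():
        # Keeps running while the blocking send waits in its worker thread.
        nonlocal ticks
        for _ in range(5):
            ticks += 1
            await asyncio.sleep(0.005)

    reply, _ = await asyncio.gather(send_bucket(channel, b"bucket-0"), heartbeat())
    return reply, ticks, channel.thread_names
```

Calling channel.send_and_wait directly inside send_bucket would run on the main thread and stall the heartbeat for the full 50 ms.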
Code Review
This pull request introduces support for the DeepSeek-V4-Flash model within the GRPO trainer, including a new Megatron training script and updated documentation. It adds robust FP8 and MXFP4 quantization utilities, improves weight loading and reloading mechanisms for vLLM, and enhances custom chat template resolution from files or environment variables. Additionally, it updates the bucketed weight transfer logic to align offsets by tensor element size and use asynchronous ZMQ. There are no review comments, so no further feedback is provided.
Hey @HollowMan6, opened #6515 as the symmetric fix for the … End-to-end verified on GB200 single GPU through ISEEKYAN/mbridge + DSv4 hybrid attention (forward+backward+optimizer.step), pairing well with your … Also kicked an upstream bridge-side default at NVIDIA-NeMo/Megatron-Bridge#4003 for …
…on vanilla_mbridge=True path

PR verl-project#6473 added the same fix to the vanilla_mbridge=False (NeMo MB) path of MegatronEngine._build_tf_config. The vanilla_mbridge=True (ISEEKYAN/mbridge) path needs the symmetric treatment: when self.model_config.mtp.enable is False, force mtp_num_layers=0 so the bridge does not build MTP blocks, and trim the per-layer csa_compress_ratios list (DSv4-Flash HF configs pad it for the MTP layer when num_nextn_predict_layers > 0). mtp_num_layers uses direct assignment (not setdefault) so a disabled-MTP run always forces 0 even if override_transformer_config carried a stale value.

Why not duplicate: verl-project#6473 only modifies the vanilla_mbridge=False branch. This PR modifies the vanilla_mbridge=True branch: a different code path and a complementary fix.

Test plan: validated end-to-end on GB200 (1 GPU) through ISEEKYAN/mbridge + DSv4 hybrid attention. Forward + backward + optimizer.step() with the vanilla=True path produces finite loss, finite grad_norm, and update_successful=True.

AI assistance disclosure: developed with AI-assisted coding (Claude); author reviewed every changed line.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Lingrui Mei <lmei@nvidia.com>
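The disabled-MTP override described in this commit can be sketched as a pure function over the override dict. This is a hypothetical illustration, not MegatronEngine._build_tf_config: apply_mtp_disable is an invented name, and trimming csa_compress_ratios down to num_hidden_layers entries is an assumption about how the MTP padding is removed.

```python
from typing import Any, Dict, List, Optional


def apply_mtp_disable(
    overrides: Dict[str, Any],
    mtp_enabled: bool,
    num_hidden_layers: int,
    csa_compress_ratios: Optional[List[int]] = None,
) -> Dict[str, Any]:
    """Return a copy of the transformer overrides adjusted for a disabled-MTP run."""
    out = dict(overrides)
    if mtp_enabled:
        return out
    # Direct assignment, not setdefault: a stale mtp_num_layers from user overrides is replaced.
    out["mtp_num_layers"] = 0
    if csa_compress_ratios is not None and len(csa_compress_ratios) > num_hidden_layers:
        # Assumed padding scheme: entries past the decoder layers belong to MTP layers.
        out["csa_compress_ratios"] = list(csa_compress_ratios[:num_hidden_layers])
    return out
```

The setdefault variant would keep a stale mtp_num_layers=1 from override_transformer_config and still build MTP blocks, which is exactly the case the commit message calls out.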
What does this PR do?
Adds DeepSeek V4 Flash GRPO support with Megatron-Bridge actor/ref workers, vLLM rollout, FP8/MXFP4 weight transfer handling, and checkpoint save/export verification.
Need
Checklist Before Starting
- Format the PR title as [{modules}] {type}: {description} (this will be checked by the CI)
  - {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, fully_async, one_step_off; list multiple modules together, like [megatron, fsdp, doc]
  - {type} is in feat, fix, refactor, chore, test
  - For API-breaking changes, add [BREAKING] to the beginning of the title.
  - Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test
End-to-end verification:
Key metrics:
API and Usage Example
This PR adds an example script:

```bash
bash examples/grpo_trainer/run_deepseek_v4_flash_megatron.sh
```
Example Slurm usage:
Design & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
- pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
- … ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
- … recipe submodule, please also update the reference to the submodule commit via git submodule update --remote or cd recipe && git pull origin main.