sync: gitlab/main -> github/main#29
Merged
# 🔩 Chore

## Remove dead helpers from data module

- Delete the unreferenced `filter_long_prompt` helper from `relax/utils/data/data.py`
- Delete the dead `_build_messages` helper that was shadowed by `relax/utils/data/data_utils.py`
- Delete the unused `process_rollout_data` helper and the imports it required
# 🐛 Bug Fix

## Keep multimodal prompt building non-destructive

- Build multimodal message content without mutating cached prompt rows
- Prevent reused raw samples from carrying expanded message content into later reads

## Support sliced eager dataset paths

- Parse per-file generalized slice syntax in eager file readers
- Keep multi-file eager path behavior aligned with streaming path semantics
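The per-file slice syntax can be handled with a small parser. This is a minimal sketch assuming a `path@[start:end]` suffix form (the function name and exact syntax are illustrative, inferred from the slice notation `@[0:1000]` mentioned elsewhere in this sync, not the actual reader implementation):

```python
import re

def parse_sliced_path(spec):
    """Split 'data.jsonl@[100:200]' into (path, start, end).

    Returns (path, None, None) when no slice suffix is present, and
    leaves either bound as None when it is omitted ('@[:200]').
    """
    m = re.fullmatch(r"(.+?)@\[(\d*):(\d*)\]", spec)
    if not m:
        return spec, None, None
    path, start, end = m.groups()
    return (path,
            int(start) if start else None,
            int(end) if end else None)

# each file in a multi-file eager path can carry its own slice
print(parse_sliced_path("data.jsonl@[100:200]"))  # ('data.jsonl', 100, 200)
print(parse_sliced_path("data.jsonl"))            # ('data.jsonl', None, None)
```

Parsing the slice once per file and applying it with `itertools.islice` keeps the eager readers aligned with streaming-path semantics, since neither needs the whole file in memory.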
The rollout component exits its main loop on the final training step, leaving the eval handler un-awaited. This caused a race condition where the controller's atexit shutdown tore down SGLang engines mid-flight. This fix blocks until the evaluation finishes at the end of training.
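The shape of the fix can be sketched with plain asyncio (names are hypothetical; the real rollout loop and eval handler are more involved):

```python
import asyncio

results = []

async def run_eval(step):
    await asyncio.sleep(0.01)   # simulate an in-flight SGLang evaluation
    results.append(step)

async def rollout_loop(num_steps):
    eval_task = None
    for step in range(num_steps):
        # ... rollout / training work for this step ...
        if step == num_steps - 1:
            # final-step evaluation runs as a background task
            eval_task = asyncio.create_task(run_eval(step))
    # The fix: without this await, the loop returns immediately and the
    # controller's atexit shutdown races the still-running evaluation.
    if eval_task is not None:
        await eval_task

asyncio.run(rollout_loop(3))
```

Awaiting the task handle before the coroutine returns guarantees the engines are idle by the time atexit teardown runs.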
# ⭐ Feature

## Migrate from Megatron-LM to Megatron-Bridge

- Replace direct Megatron-LM checkout with Megatron-Bridge (commit 2faedbf6) in Dockerfile
- Upgrade transformer_engine from 2.10.0 to 2.14.1
- Archive old megatron patch (3714d81d) and add new patch for 20260506-85bced0ae

## Adapt Relax backend to Megatron-Bridge API changes

- Update `vocab_size_with_padding` import with fallback for new module path
- Rename `enable_gloo_process_groups` to `use_gloo_process_groups`
- Rename `norm_epsilon` to `layernorm_epsilon` in HF config validation
- Accept `**kwargs` in `wrapped_provider` for new `model_provider` signature
- Relax `partition_stride` assertion for GLU/SwiGLU `linear_fc1` layers (stride=2)
- Guard `checkpoint_write_patch` against removed `write_preloaded_data_multiproc`
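The import-with-fallback item follows a standard pattern. A sketch under loud assumptions: the module path below is an illustrative stand-in, not the real Megatron-Bridge path, and the fallback body only mirrors what the helper conventionally does (round the vocab size up to a hardware-friendly multiple):

```python
try:
    # hypothetical new module path after the Megatron-Bridge migration
    from megatron_bridge.utils import vocab_size_with_padding
except ImportError:
    def vocab_size_with_padding(orig_vocab_size, divisor=128):
        """Fallback: pad vocab size up to the next multiple of `divisor`
        so embedding shards divide evenly across tensor-parallel ranks."""
        return ((orig_vocab_size + divisor - 1) // divisor) * divisor
```

Keeping the fallback local means the backend works against both the old and new module layouts without pinning to one.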
# ⭐ Feature

## Add Qwen3.6 model support with automatic expert format detection

- Add Qwen3.6-35B-A3B model configuration script with MoE parameters (256 experts, 8-way routing)
- Implement MTP MoE expert weight format detection in Qwen35VL bridge
  - Qwen3.5: per-expert storage (`gate_proj`/`up_proj`/`down_proj` per expert)
  - Qwen3.6: packed format (`gate_up_proj`/`down_proj` shared tensor)
- Add training script for Qwen3.6-35B-A3B 8xGPU colocate mode with multimodal support
- Extend Megatron bridge patch with format-aware weight mappings

---

# 🐛 Bug Fix

## Fix multimodal data counting and training script paths

- Fix `remain_data` counter for pre-structured multimodal content (it was skipping already-processed items)
- Remove invalid dataset slice notation (`@[0:1000]`) from `PROMPT_SET` path in training script
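The two expert storage layouts are distinguishable from checkpoint key names alone. A minimal sketch of the detection idea (the key patterns come from the bullet list above; the function itself is illustrative, not the bridge's actual code):

```python
def detect_expert_format(state_dict_keys):
    """Classify an MoE checkpoint by its expert weight keys.

    Qwen3.5-style checkpoints carry separate gate_proj/up_proj tensors
    per expert; the packed Qwen3.6 format fuses them into one shared
    gate_up_proj tensor.
    """
    if any("gate_up_proj" in k for k in state_dict_keys):
        return "packed"
    if any("gate_proj" in k for k in state_dict_keys):
        return "per_expert"
    raise ValueError("no recognizable MoE expert weight keys found")

print(detect_expert_format(["layers.0.mlp.experts.gate_up_proj"]))    # packed
print(detect_expert_format(["layers.0.mlp.experts.0.gate_proj.weight"]))  # per_expert
```

Checking `gate_up_proj` first matters: the packed substring check would otherwise be shadowed, since `gate_up_proj` also contains `up_proj`.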
# ⭐ Feature

## Add rollout reward field metrics

- Aggregate numeric fields from reward dictionaries during rollout logging
- Skip the primary reward key and `raw_reward` to preserve existing reward metrics
- Reuse the shared helper from the SGLang rollout metrics path
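The aggregation described above can be sketched as follows (function and field names are illustrative; only the skip rules — the primary key and `raw_reward` — come from the commit message):

```python
def aggregate_reward_fields(reward_dicts, primary_key="reward"):
    """Average the auxiliary numeric fields across a batch of reward dicts.

    The primary reward key and 'raw_reward' are skipped so existing
    reward metrics are left untouched; non-numeric fields are ignored.
    """
    totals, counts = {}, {}
    for rd in reward_dicts:
        for key, value in rd.items():
            if key in (primary_key, "raw_reward"):
                continue
            # bool is an int subclass in Python, so exclude it explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                continue
            totals[key] = totals.get(key, 0.0) + value
            counts[key] = counts.get(key, 0) + 1
    return {k: totals[k] / counts[k] for k in totals}

batch = [
    {"reward": 1.0, "raw_reward": 2.0, "format_score": 0.5, "tag": "a"},
    {"reward": 0.0, "raw_reward": 1.0, "format_score": 1.0, "tag": "b"},
]
print(aggregate_reward_fields(batch))  # {'format_score': 0.75}
```

Counting per field (rather than dividing by the batch size) keeps the means correct when a field is present in only some reward dicts.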
# 🐛 Bug Fix

## Strip consecutive `<|image_pad|>` tokens in pre-tokenized prompts

- Add a `QwenVLImageProcessor._strip_image_token` static helper that collapses runs of `<|image_pad|>` (token id 151655) into a single placeholder while leaving the surrounding `<|vision_start|>`/`<|vision_end|>` markers intact.
- Apply the helper to `prompt` before calling `load_mm_data` in `process_mm_data_async`, so pre-tokenized `input_ids` (where each image is already expanded to N image-pad tokens, one per visual patch) no longer collide with `load_mm_data` re-expanding the placeholder itself. Without the collapse, the pipeline saw N x M image-pad tokens and miscounted positions, breaking mrope bookkeeping.
- Raw text (`str`) prompts are passed through unchanged.
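The collapse described above amounts to run-length deduplication of one token id. A self-contained sketch (the token id 151655 is from the commit message; the standalone function stands in for the `_strip_image_token` static helper):

```python
from itertools import groupby

IMAGE_PAD_ID = 151655  # token id of <|image_pad|>

def strip_image_tokens(input_ids):
    """Collapse each run of consecutive image-pad ids to one placeholder.

    All other tokens, including the <|vision_start|>/<|vision_end|>
    markers around each run, pass through untouched, so load_mm_data can
    re-expand a single placeholder per image without double-counting.
    """
    out = []
    for token_id, run in groupby(input_ids):
        if token_id == IMAGE_PAD_ID:
            out.append(token_id)   # keep exactly one pad per run
        else:
            out.extend(run)
    return out

# two images, each already expanded to 3 pad tokens by pre-tokenization:
ids = [10, 151655, 151655, 151655, 11, 151655, 151655, 151655, 12]
print(strip_image_tokens(ids))  # [10, 151655, 11, 151655, 12]
```

Because the helper only merges *consecutive* pads, separate images stay separate, which is what keeps the per-image position bookkeeping intact.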
Aurelius84 approved these changes on May 9, 2026.
Routine internal -> external sync.