sync: gitlab/main -> github/main#29
Merged
# 🔩 Chore

## Remove dead helpers from data module

- Delete the unreferenced `filter_long_prompt` helper from `relax/utils/data/data.py`
- Delete the dead `_build_messages` helper that was shadowed by `relax/utils/data/data_utils.py`
- Delete the unused `process_rollout_data` helper and the imports it required
# 🐛 Bug Fix

## Keep multimodal prompt building non-destructive

- Build multimodal message content without mutating cached prompt rows
- Prevent reused raw samples from carrying expanded message content into later reads

## Support sliced eager dataset paths

- Parse per-file generalized slice syntax in eager file readers
- Keep multi-file eager path behavior aligned with streaming path semantics
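The per-file slice syntax can be handled with a small parser. This is a minimal sketch assuming a `path@[start:end]` suffix form (the function name and exact syntax are illustrative, inferred from the slice notation `@[0:1000]` mentioned elsewhere in this sync, not the actual reader implementation):

```python
import re

def parse_sliced_path(spec):
    """Split 'data.jsonl@[100:200]' into (path, start, end).

    Returns (path, None, None) when no slice suffix is present, and
    leaves either bound as None when it is omitted ('@[:200]').
    """
    m = re.fullmatch(r"(.+?)@\[(\d*):(\d*)\]", spec)
    if not m:
        return spec, None, None
    path, start, end = m.groups()
    return (path,
            int(start) if start else None,
            int(end) if end else None)

# each file in a multi-file eager path can carry its own slice
print(parse_sliced_path("data.jsonl@[100:200]"))  # ('data.jsonl', 100, 200)
print(parse_sliced_path("data.jsonl"))            # ('data.jsonl', None, None)
```

Parsing the slice once per file and applying it with `itertools.islice` keeps the eager readers aligned with streaming-path semantics, since neither needs the whole file in memory.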
The rollout component exits its main loop on the final training step, leaving the eval handler un-awaited. This caused a race condition where the controller's atexit shutdown tore down SGLang engines mid-flight. This fix blocks until the evaluation finishes at the end of training.
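The shape of the fix can be sketched with plain asyncio (names are hypothetical; the real rollout loop and eval handler are more involved):

```python
import asyncio

results = []

async def run_eval(step):
    await asyncio.sleep(0.01)   # simulate an in-flight SGLang evaluation
    results.append(step)

async def rollout_loop(num_steps):
    eval_task = None
    for step in range(num_steps):
        # ... rollout / training work for this step ...
        if step == num_steps - 1:
            # final-step evaluation runs as a background task
            eval_task = asyncio.create_task(run_eval(step))
    # The fix: without this await, the loop returns immediately and the
    # controller's atexit shutdown races the still-running evaluation.
    if eval_task is not None:
        await eval_task

asyncio.run(rollout_loop(3))
```

Awaiting the task handle before the coroutine returns guarantees the engines are idle by the time atexit teardown runs.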
# ⭐ Feature

## Migrate from Megatron-LM to Megatron-Bridge

- Replace direct Megatron-LM checkout with Megatron-Bridge (commit 2faedbf6) in Dockerfile
- Upgrade transformer_engine from 2.10.0 to 2.14.1
- Archive old megatron patch (3714d81d) and add new patch for 20260506-85bced0ae

## Adapt Relax backend to Megatron-Bridge API changes

- Update `vocab_size_with_padding` import with fallback for new module path
- Rename `enable_gloo_process_groups` to `use_gloo_process_groups`
- Rename `norm_epsilon` to `layernorm_epsilon` in HF config validation
- Accept `**kwargs` in `wrapped_provider` for new `model_provider` signature
- Relax `partition_stride` assertion for GLU/SwiGLU `linear_fc1` layers (stride=2)
- Guard `checkpoint_write_patch` against removed `write_preloaded_data_multiproc`
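The import-with-fallback item follows a standard pattern. A sketch under loud assumptions: the module path below is an illustrative stand-in, not the real Megatron-Bridge path, and the fallback body only mirrors what the helper conventionally does (round the vocab size up to a hardware-friendly multiple):

```python
try:
    # hypothetical new module path after the Megatron-Bridge migration
    from megatron_bridge.utils import vocab_size_with_padding
except ImportError:
    def vocab_size_with_padding(orig_vocab_size, divisor=128):
        """Fallback: pad vocab size up to the next multiple of `divisor`
        so embedding shards divide evenly across tensor-parallel ranks."""
        return ((orig_vocab_size + divisor - 1) // divisor) * divisor
```

Keeping the fallback local means the backend works against both the old and new module layouts without pinning to one.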
# ⭐ Feature

## Add Qwen3.6 model support with automatic expert format detection

- Add Qwen3.6-35B-A3B model configuration script with MoE parameters (256 experts, 8-way routing)
- Implement MTP MoE expert weight format detection in Qwen35VL bridge
  - Qwen3.5: per-expert storage (`gate_proj`/`up_proj`/`down_proj` per expert)
  - Qwen3.6: packed format (`gate_up_proj`/`down_proj` shared tensor)
- Add training script for Qwen3.6-35B-A3B 8xGPU colocate mode with multimodal support
- Extend Megatron bridge patch with format-aware weight mappings

---

# 🐛 Bug Fix

## Fix multimodal data counting and training script paths

- Fix `remain_data` counter for pre-structured multimodal content (it was skipping already-processed items)
- Remove invalid dataset slice notation (`@[0:1000]`) from `PROMPT_SET` path in training script
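The two expert storage layouts are distinguishable from checkpoint key names alone. A minimal sketch of the detection idea (the key patterns come from the bullet list above; the function itself is illustrative, not the bridge's actual code):

```python
def detect_expert_format(state_dict_keys):
    """Classify an MoE checkpoint by its expert weight keys.

    Qwen3.5-style checkpoints carry separate gate_proj/up_proj tensors
    per expert; the packed Qwen3.6 format fuses them into one shared
    gate_up_proj tensor.
    """
    if any("gate_up_proj" in k for k in state_dict_keys):
        return "packed"
    if any("gate_proj" in k for k in state_dict_keys):
        return "per_expert"
    raise ValueError("no recognizable MoE expert weight keys found")

print(detect_expert_format(["layers.0.mlp.experts.gate_up_proj"]))    # packed
print(detect_expert_format(["layers.0.mlp.experts.0.gate_proj.weight"]))  # per_expert
```

Checking `gate_up_proj` first matters: the packed substring check would otherwise be shadowed, since `gate_up_proj` also contains `up_proj`.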
# ⭐ Feature

## Add rollout reward field metrics

- Aggregate numeric fields from reward dictionaries during rollout logging
- Skip the primary reward key and `raw_reward` to preserve existing reward metrics
- Reuse the shared helper from the SGLang rollout metrics path
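The aggregation described above can be sketched as follows (function and field names are illustrative; only the skip rules — the primary key and `raw_reward` — come from the commit message):

```python
def aggregate_reward_fields(reward_dicts, primary_key="reward"):
    """Average the auxiliary numeric fields across a batch of reward dicts.

    The primary reward key and 'raw_reward' are skipped so existing
    reward metrics are left untouched; non-numeric fields are ignored.
    """
    totals, counts = {}, {}
    for rd in reward_dicts:
        for key, value in rd.items():
            if key in (primary_key, "raw_reward"):
                continue
            # bool is an int subclass in Python, so exclude it explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                continue
            totals[key] = totals.get(key, 0.0) + value
            counts[key] = counts.get(key, 0) + 1
    return {k: totals[k] / counts[k] for k in totals}

batch = [
    {"reward": 1.0, "raw_reward": 2.0, "format_score": 0.5, "tag": "a"},
    {"reward": 0.0, "raw_reward": 1.0, "format_score": 1.0, "tag": "b"},
]
print(aggregate_reward_fields(batch))  # {'format_score': 0.75}
```

Counting per field (rather than dividing by the batch size) keeps the means correct when a field is present in only some reward dicts.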
# 🐛 Bug Fix

## Strip consecutive `<|image_pad|>` tokens in pre-tokenized prompts

- Add a `QwenVLImageProcessor._strip_image_token` static helper that collapses runs of `<|image_pad|>` (token id 151655) into a single placeholder while leaving the surrounding `<|vision_start|>`/`<|vision_end|>` markers intact.
- Apply the helper to `prompt` before calling `load_mm_data` in `process_mm_data_async`, so pre-tokenized `input_ids` (where each image is already expanded to N image-pad tokens, one per visual patch) no longer collide with `load_mm_data` re-expanding the placeholder itself. Without the collapse, the pipeline saw N x M image-pad tokens and miscounted positions, breaking mrope bookkeeping.
- Raw text (`str`) prompts are passed through unchanged.
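The collapse described above amounts to run-length deduplication of one token id. A self-contained sketch (the token id 151655 is from the commit message; the standalone function stands in for the `_strip_image_token` static helper):

```python
from itertools import groupby

IMAGE_PAD_ID = 151655  # token id of <|image_pad|>

def strip_image_tokens(input_ids):
    """Collapse each run of consecutive image-pad ids to one placeholder.

    All other tokens, including the <|vision_start|>/<|vision_end|>
    markers around each run, pass through untouched, so load_mm_data can
    re-expand a single placeholder per image without double-counting.
    """
    out = []
    for token_id, run in groupby(input_ids):
        if token_id == IMAGE_PAD_ID:
            out.append(token_id)   # keep exactly one pad per run
        else:
            out.extend(run)
    return out

# two images, each already expanded to 3 pad tokens by pre-tokenization:
ids = [10, 151655, 151655, 151655, 11, 151655, 151655, 151655, 12]
print(strip_image_tokens(ids))  # [10, 151655, 11, 151655, 12]
```

Because the helper only merges *consecutive* pads, separate images stay separate, which is what keeps the per-image position bookkeeping intact.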
Aurelius84 approved these changes on May 9, 2026.
Routine internal -> external sync.