feat(engine): add Qwen3-VL dense support to Megatron path by Adiactive · Pull Request #1299 · areal-project/AReaL

Adiactive · 2026-05-04T18:28:49Z

Description

Adds Qwen3-VL dense support to AReaL's Megatron engine via mbridge, so GRPO/PPO of any Qwen3-VL model on the Megatron backend is unblocked.

Key changes:

areal/engine/megatron_utils/megatron.py: new convert_qwen3_vl_to_hf registered before "qwen3" in _CONVERSION_FN_REGISTRY (mapping anchored on mbridge.models.qwen3_vl.Qwen3VLBridge). Carries an early-raise on Qwen3-VL-MoE param names so qwen3_vl_moe cannot silently dispatch to the dense converter via substring match. _vision_qkv_mcore_to_hf gains a no-GQA assertion that guards future vision-GQA VLMs.
areal/engine/core/model.py: new lang_config(hf_config) helper — getattr(hf_config, "text_config", hf_config) — so callers can read language-side attrs (vocab_size, num_attention_heads, num_key_value_heads, hidden_size, head_dim) uniformly across Qwen3-VL (nested) and Qwen2.5-VL / pure text (flat).
areal/models/mcore/hf_load.py: _merge_qkv_weights, _load_fused_qkv_weight, and the GQA branch of _weight_to_mcore_tp use the shared lang_config helper.
areal/engine/megatron_engine.py: _collect_param uses lang_config(self.hf_config).vocab_size for remove_padding.
tests/test_megatron_engine_vlm.py: add TestConvertQwen3VLToHF and parametrize the VLM integration tests across qwen25_vl and qwen3_vl via _VLM_MODELS + env_overrides.
tests/torchrun/run_megatron_engine_vlm.py: mock_vlm_input reads patch geometry from engine.hf_config so it works for both VLMs without code-side branching.

Tests added:

TestConvertQwen3VLToHF (CPU unit tests for convert_qwen3_vl_to_hf):
- Registry dispatch: qwen3_vl resolves before qwen3 substring fallback.
- Language model: embedding / final norm / output_layer prefixes; QKV weight + bias split with GQA; q_norm / k_norm / o_proj / input_layernorm / post_attention_layernorm; gated MLP fc1 split + fc2.
- Vision direct mappings: patch_embed weight + bias, pos_embed, merger (patch_norm, linear_fc1/2).
- Vision per-block: QKV per-head→grouped reorder for weight + bias; requires hf_config.vision_config.num_heads; attn proj; norm1 / norm2 weight + bias; non-gated MLP regression guard (linear_fc1 must NOT chunk); fc2.
- Deepstack mergers: parametrized over indices [0, 1, 2] for norm and linear_fc{1,2}.
- Error handling: unknown language-model and unknown vision parameter names.
test_qwen3_vl_detected added to TestVisionModelDetection.
The 4 existing test_vlm_* integration tests (test_engine_initializes, test_simple_forward, test_hf_save_load_weights, test_train_tensor_parallel) become parametrized over _VLM_MODELS so they run once per VLM (qwen25_vl, qwen3_vl). Adding a new VLM is a one-line addition to _VLM_MODELS.

Scope:

bridge_type: mbridge only. The bridge_type: megatron-bridge path with Qwen3-VL + gradient_checkpointing: true crashes inside Qwen3VLTransformerBlock._checkpointed_forward with TypeError: save_for_backward can only save variables, but argument 6 is of type list — deepstack_visual_embeds is handed verbatim to tensor_parallel.checkpoint. Fixed upstream in megatron-bridge v0.4.0 (variadic-flatten of deepstack_visual_embeds); this PR intentionally does NOT vendor-patch the megatron-bridge path so it lights up automatically when the dependency upgrade lands as chore(deps): upgrade runtime dependencies and CI workflow #1206 plans to do.
Context parallelism for VLM (context_parallel_size > 1) continues to raise NotImplementedError — matches Qwen2.5-VL state.
Out of scope: Qwen3-VL MoE (qwen3_vl_moe), pixel_values_videos plumbing for streaming video inputs.

Related Issue

N/A — net-new feature support.

Type of Change

Checklist

I have read the Contributing Guide
Pre-commit hooks pass (pre-commit run --all-files)
Relevant tests pass; new tests added for new functionality
Documentation updated (if applicable; built with ./docs/build_all.sh)
Branch is up to date with main
Self-reviewed via /review-pr command
This PR was created by a coding agent via /create-pr
This PR is a breaking change

Breaking Change Details (if applicable):

N/A.

Additional Context

Fix for #1298 is required to run integration tests or actual training. This PR was tested with a local patch which is not committed.

Training Reward Example

Image: ghcr.io/inclusionai/areal-runtime:v1.0.3-vllm with mbridge upgraded according to #1258
Dataset: Geometry3k
Model: Qwen3-VL-3B-Instruct / Qwen3-VL-32B-Instruct
Scheduler: Slurm

Extend the Megatron engine to train Qwen3-VL dense models end-to-end: mcore→HF weight conversion for update_weights and HF→mcore loading that handles Qwen3-VL's nested HF config layout. Without this, GRPO/PPO of any Qwen3-VL model on the Megatron backend is blocked. Key changes: - megatron_utils/megatron.py: convert_qwen3_vl_to_hf (anchored on mbridge.models.qwen3_vl.Qwen3VLBridge), registered before "qwen3" in _CONVERSION_FN_REGISTRY. - mcore/hf_load.py: _lang_config() helper for the HF→mcore loader routes language-side config reads through getattr(hf_config, "text_config", hf_config) so Qwen3-VL's nested text_config works alongside Qwen2.5-VL and pure text models. - megatron_engine.py: _collect_param reads text_config.vocab_size with the same getattr fallback. - test_megatron_engine_vlm.py: add TestConvertQwen3VLToHF and parametrize the VLM integration tests across qwen25_vl and qwen3_vl. - run_megatron_engine_vlm.py: mock_vlm_input reads patch geometry from engine.hf_config so it works for both VLMs.

gemini-code-assist

Code Review

This pull request adds support for the Qwen3-VL model, featuring a new parameter conversion utility and updates to the weight loading logic to handle nested text configurations. The testing suite was also updated to include Qwen3-VL and support parameterized integration tests. Feedback was provided to simplify the head dimension calculation in the conversion logic for improved readability.

gemini-code-assist · 2026-05-04T18:31:57Z

+            try:
+                head_dim = (
+                    tf_config.kv_channels
+                    if tf_config.kv_channels is not None
+                    else tf_config.hidden_size // tf_config.num_attention_heads
+                )
+            except (AttributeError, TypeError):
+                head_dim = tf_config.hidden_size // tf_config.num_attention_heads


The try-except block for calculating head_dim can be simplified for better readability and to more clearly express the intent of falling back if kv_channels is not available.

kv_channels = getattr(tf_config, "kv_channels", None) if kv_channels is not None: head_dim = kv_channels else: head_dim = tf_config.hidden_size // tf_config.num_attention_heads

Addressed in the follow-up commit

- Consolidate `lang_config` helper into areal/engine/core/model.py so hf_load.py and megatron_engine.py:_collect_param share a single `getattr(hf_config, "text_config", hf_config)` accessor instead of inlining it twice. - convert_qwen3_vl_to_hf: early-raise on Qwen3-VL-MoE expert / router param names. _CONVERSION_FN_REGISTRY uses substring matching, so `qwen3_vl_moe` would silently fall through to the dense converter unless registered before `qwen3_vl`. Make the registry-order requirement explicit and actionable. - _vision_qkv_mcore_to_hf: assert vision has no GQA (num_kv_heads == num_heads). Both Qwen2.5-VL and Qwen3-VL satisfy this; guard catches future vision-GQA VLMs that would otherwise silently miscompile QKV. - convert_qwen3_vl_to_hf: replace try/except for tf_config.kv_channels with getattr (per gemini-code-assist bot suggestion on PR areal-project#1299). - TestConvertQwen3VLToHF: pin fixture dims to real Qwen/Qwen3-VL-2B-Instruct values (vision hidden=1024 instead of synthetic 1152) and update dependent shape literals.

Adiactive · 2026-05-05T21:53:58Z

Will submit another PR which includes Qwen3-VL-MOE support along with current changes therefore closing this PR

Adiactive requested review from garrett4wade, nuzant and rchardx as code owners May 4, 2026 18:28

gemini-code-assist Bot reviewed May 4, 2026

View reviewed changes

Adiactive closed this May 5, 2026

Adiactive mentioned this pull request May 5, 2026

feat(engine): add Qwen3-VL dense and MoE support to Megatron path #1301

Merged

15 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(engine): add Qwen3-VL dense support to Megatron path#1299

feat(engine): add Qwen3-VL dense support to Megatron path#1299
Adiactive wants to merge 2 commits into
areal-project:mainfrom
Adiactive:pr/megatron-qwen3-vl

Adiactive commented May 4, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 4, 2026

Uh oh!

Adiactive May 4, 2026

Uh oh!

Adiactive commented May 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Adiactive commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related Issue

Type of Change

Checklist

Additional Context

Training Reward Example

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 4, 2026

Choose a reason for hiding this comment

Uh oh!

Adiactive May 4, 2026

Choose a reason for hiding this comment

Uh oh!

Adiactive commented May 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Adiactive commented May 4, 2026 •

edited

Loading