
support qwen3vl-235b#525

Open
liusy58 wants to merge 3 commits into sgl-project:main from liusy58:qwen3-235b
Conversation


liusy58 commented Apr 8, 2026

Motivation

This PR adds support for training Eagle3 on Qwen3-VL-235B by introducing:

  • a new draft model config for Qwen3-VL-235B
  • an offline training example script for hidden state preparation and Eagle3 training

Why

Qwen3-VL-235B needs a dedicated Eagle3 configuration and a reproducible offline example for training.

The example script was also updated to avoid machine-specific absolute paths in command arguments, which makes it easier to reuse across different environments and reduces configuration mistakes.
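The path-parameterization idea can be sketched as below (variable names are illustrative, not necessarily those used in the PR's actual script): derive locations from the script's own directory and allow environment overrides instead of hard-coding absolute paths.

```shell
# Sketch only: hypothetical variable names, not the PR's actual script.
# Resolve the root directory from the script location rather than hard-coding it.
ROOT_DIR="$(cd "$(dirname "$0")" && pwd)"

# Let callers override data locations via the environment, with portable defaults.
CACHE_DIR="${CACHE_DIR:-$ROOT_DIR/cache}"
OUTPUT_DIR="${OUTPUT_DIR:-$ROOT_DIR/outputs}"

echo "cache:  $CACHE_DIR"
echo "output: $OUTPUT_DIR"
```

With this pattern, a command argument like `--cache-dir "$CACHE_DIR"` works unchanged on any machine, and a user can still point it elsewhere with `CACHE_DIR=/data/cache ./run.sh`.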

Modifications

Related Issues

Accuracy Test

Benchmark & Profiling

Checklist

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds an Eagle3 draft-model configuration file for Qwen3-VL-235B and an offline training script. The feedback identifies that rope_type should be set to mrope for multimodal rotary embeddings, and that the training script should use the correct model-path variable to load weights.

20,
20
],
"rope_type": "default"
Contributor


high

For Qwen3-VL models, the rope_type should be set to mrope to correctly utilize multimodal rotary embeddings. The current default setting will cause the model to use standard Llama rotary embeddings, which are incompatible with the 3D rope indices used in VL models. The mrope_section and mrope_interleaved parameters are only active when rope_type is mrope in the LlamaAttention implementation.

Suggested change
"rope_type": "default"
"rope_type": "mrope"
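In the HF-style rope_scaling layout that Qwen-VL configs use, the corrected fragment would sit roughly as sketched below. Only the last two mrope_section entries are visible in the diff excerpt above, so the array contents and the interleaving flag here are placeholders, not values verified against the PR's file:

```json
"rope_scaling": {
  "mrope_section": [20, 20],
  "mrope_interleaved": true,
  "rope_type": "mrope"
}
```

With rope_type set to mrope, the attention implementation takes the multimodal branch and the sibling mrope_* keys actually take effect; under "default" they are silently ignored.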

--standalone \
--nproc_per_node 1 \
$ROOT_DIR/scripts/train_eagle3.py \
--target-model-path "$MODEL_NAME" \
Contributor


medium

The --target-model-path in the training step should use $MODEL_PATH instead of "$MODEL_NAME". $MODEL_NAME is a descriptive string (e.g., Qwen3-VL-235B-A22B-Instruct-FP8), whereas $MODEL_PATH contains the actual path to the model weights required for loading embeddings and the LM head during offline training. This is consistent with the first torchrun command in this script.

Suggested change
--target-model-path "$MODEL_NAME" \
--target-model-path "$MODEL_PATH" \
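Applied to the script, the corrected training step would look roughly like the sketch below. The concrete paths are assumptions for illustration, and `echo` is used as a dry run so the command is printed rather than executed:

```shell
# Illustrative values; the real script sets these elsewhere.
MODEL_NAME="Qwen3-VL-235B-A22B-Instruct-FP8"   # descriptive name only
MODEL_PATH="/path/to/$MODEL_NAME"              # actual weight directory

# Dry-run print of the corrected command: pass $MODEL_PATH, not $MODEL_NAME,
# so embeddings and the LM head can be loaded from real weights.
echo torchrun \
    --standalone \
    --nproc_per_node 1 \
    scripts/train_eagle3.py \
    --target-model-path "$MODEL_PATH"
```

This mirrors the first torchrun invocation in the script, which already passes `$MODEL_PATH`; the fix simply makes the second step consistent with it.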
