Conversation
Code Review
This pull request adds a configuration file for the Qwen3-VL-235B model with Eagle3 and an offline training script. The feedback identifies that the rope_type should be set to mrope for multimodal rotary embeddings and that the training script should use the correct model path variable to load weights.
```json
        20,
        20
    ],
    "rope_type": "default"
```
For Qwen3-VL models, the rope_type should be set to mrope to correctly utilize multimodal rotary embeddings. The current default setting will cause the model to use standard Llama rotary embeddings, which are incompatible with the 3D rope indices used in VL models. The mrope_section and mrope_interleaved parameters are only active when rope_type is mrope in the LlamaAttention implementation.
```diff
-    "rope_type": "default"
+    "rope_type": "mrope"
```
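As a rough illustration of the reviewer's point, the `rope_scaling` block of such a config might look like the following once `rope_type` is switched to `mrope`. The `mrope_section` and `mrope_interleaved` values below are placeholders for illustration only, not the actual values from this PR:

```json
{
    "rope_scaling": {
        "rope_type": "mrope",
        "mrope_section": [24, 20, 20],
        "mrope_interleaved": true
    }
}
```

With `rope_type` left at `default`, these two multimodal parameters are simply ignored by the attention implementation, which is why the config silently falls back to standard Llama rotary embeddings.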
```shell
    --standalone \
    --nproc_per_node 1 \
    $ROOT_DIR/scripts/train_eagle3.py \
    --target-model-path "$MODEL_NAME" \
```
The --target-model-path in the training step should use $MODEL_PATH instead of "$MODEL_NAME". $MODEL_NAME is a descriptive string (e.g., Qwen3-VL-235B-A22B-Instruct-FP8), whereas $MODEL_PATH contains the actual path to the model weights required for loading embeddings and the LM head during offline training. This is consistent with the first torchrun command in this script.
```diff
-    --target-model-path "$MODEL_NAME" \
+    --target-model-path "$MODEL_PATH" \
```
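The distinction the reviewer draws can be sketched as follows. The variable values here are hypothetical stand-ins, not the actual paths from the PR:

```shell
# Sketch of the two variables the review distinguishes (values are hypothetical).
# MODEL_NAME is a human-readable label; MODEL_PATH is the on-disk weights location.
MODEL_NAME="Qwen3-VL-235B-A22B-Instruct-FP8"
MODEL_PATH="/data/models/${MODEL_NAME}"

# Offline training must load embeddings and the LM head from the weights,
# so the path variable, not the label, is what the script should pass:
TRAIN_ARG="--target-model-path ${MODEL_PATH}"
echo "$TRAIN_ARG"
```

Passing the label instead of the path would make the loader try to resolve a bare model name, which only works if the weights are discoverable under that name; using the explicit path keeps the second torchrun invocation consistent with the first one in the script.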
Motivation
This PR adds support assets for training Eagle3 on Qwen3-VL-235B by introducing:
Why
Qwen3-VL-235B needs a dedicated Eagle3 configuration and a reproducible offline example for training.
The example script was also updated to avoid machine-specific absolute paths in command arguments, which makes it easier to reuse across different environments and reduces configuration mistakes.
Modifications
Related Issues
Accuracy Test
Benchmark & Profiling
Checklist