Conversation
Thank you for the contribution. Can you please rebase?
Thanks for bringing up new models! We usually run this script to test correctness against the HF version. Please feel free to let us know if you run into any issues. cc @parambole, who is working on the Qwen3 bringup and could help review this PR.
hengtaoguo
left a comment
Generally LGTM! Have you run forward_pass_logit_check for this 1.7B model?
No, not yet, unfortunately. I'm out of town and haven't had a chance to look at it yet, but I will once I get back. And of course, happy to contribute!
Hi, thanks for pointing me to the script. Here are the logs I got when I ran it. I think the outputs are similar enough, but I'll let y'all be the judge of that.
This looks great. The max KL divergence is below the 0.015 threshold, which verifies correctness.
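For context on what the check above measures, here is a minimal sketch (not the actual forward_pass_logit_check script, whose internals aren't shown here) of comparing per-position next-token distributions from two models and reporting the maximum KL divergence. The function name and shapes are hypothetical.

```python
import numpy as np

def max_kl_divergence(logits_a: np.ndarray, logits_b: np.ndarray) -> float:
    """Max over sequence positions of KL(softmax(a) || softmax(b)).

    logits_*: arrays of shape [seq_len, vocab_size].
    """
    def log_softmax(x):
        # Subtract the row max for numerical stability before normalizing.
        x = x - x.max(axis=-1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

    log_p = log_softmax(logits_a)
    log_q = log_softmax(logits_b)
    # Per-position KL: sum_v p(v) * (log p(v) - log q(v)).
    kl_per_pos = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
    return float(kl_per_pos.max())

rng = np.random.default_rng(0)
ref = rng.normal(size=(8, 32))
# Identical logits give zero divergence.
assert max_kl_divergence(ref, ref) == 0.0
# A small perturbation stays well under the 0.015 threshold mentioned above.
perturbed = ref + rng.normal(scale=1e-3, size=ref.shape)
assert max_kl_divergence(ref, perturbed) < 0.015
```

A check like this passing means the ported model's output distribution is numerically very close to the HF reference at every position, not just on the argmax token.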
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Description
Adds Qwen3 1.7B model configs. The config is very similar to the other Qwen model configs, with slight changes to base_emb_dim and base_mlp_dim relative to the 0.6B. I've also made a small documentation change to list 1.7B as a supported model.
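As a rough illustration of the dimension changes described above (the key names follow the base_emb_dim / base_mlp_dim convention mentioned in this PR; the values are taken from the published Qwen3 HF configs and should be double-checked against the actual YAML in the repo):

```python
# Assumed values from the public Qwen3-0.6B / Qwen3-1.7B HF configs
# (hidden_size / intermediate_size); verify against the repo's config files.
QWEN3_0_6B = {"base_emb_dim": 1024, "base_mlp_dim": 3072}
QWEN3_1_7B = {"base_emb_dim": 2048, "base_mlp_dim": 6144}

# The 1.7B doubles both dimensions relative to the 0.6B.
assert QWEN3_1_7B["base_emb_dim"] == 2 * QWEN3_0_6B["base_emb_dim"]
assert QWEN3_1_7B["base_mlp_dim"] == 2 * QWEN3_0_6B["base_mlp_dim"]
```

Everything else (layer count, attention head layout) is shared with the other Qwen3 configs, which is why the diff is small.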
Tests
I tested this on a Google Colab v6e-1 TPU instance via the provided Qwen SFT demo notebook. The training cell reporting TFLOPs and loss succeeded, so I'm fairly confident the architecture mapping and parameter conversion are correct. To reproduce, the only change needed is pointing the notebook at the 1.7B model instead of the 0.6B. However, I wasn't able to run the vLLM cells due to an IPython issue.
Checklist
Before submitting this PR, please make sure (put X in square brackets):
gemini-review label.