Fix convert_hf_to_dcp #48
base: main
Conversation
Code Review

This pull request fixes an issue in convert_hf_to_dcp.py where Hugging Face model weights were being saved with an extra level of nesting (model.model.*), causing problems when loading checkpoints for continued training. The change removes this nesting by saving the state dictionary directly, which aligns with the training loop's expectation of model.* keys.

While this change correctly addresses the issue for training, it appears to introduce an inconsistency with the convert_dcp_to_hf.py script. That script seems to expect the nested structure that is being removed, which could cause it to fail when processing checkpoints created with this updated converter. I've added a specific comment with more details.
```diff
 checkpoint.mkdir(parents=True, exist_ok=True)
 storage_writer = DCP.filesystem.FileSystemWriter(checkpoint, thread_count=8)
-DCP.save({"model": state_dict}, storage_writer=storage_writer)
+DCP.save(state_dict, storage_writer=storage_writer)
```
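The double prefix arises because checkpoint planners flatten nested dictionaries into dot-joined keys, so wrapping a state dict whose keys already start with model. under another "model" entry yields model.model.* entries. A minimal stdlib sketch of that flattening (the helper is hypothetical, not the actual DCP internals, and the key names are illustrative):

```python
def flatten_keys(obj, prefix=""):
    """Recursively join nested dict keys with dots, DCP-planner style."""
    flat = {}
    for key, value in obj.items():
        full = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten_keys(value, prefix=full + "."))
        else:
            flat[full] = value
    return flat

# A HF-style state dict whose keys already carry a "model." prefix:
state_dict = {"model.embed_tokens.weight": "W", "lm_head.weight": "H"}

# Old behavior: wrapping under {"model": state_dict} double-prefixes the keys.
nested = flatten_keys({"model": state_dict})  # keys become model.model.*, model.lm_head.*

# New behavior: saving the state dict directly keeps the keys flat.
flat = flatten_keys(state_dict)
```

Under this model of the flattening, the old call produces keys like model.model.embed_tokens.weight, while the fixed call leaves them as model.embed_tokens.weight, which is what the training loop loads.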
This change correctly flattens the checkpoint structure to align with the training loop's expectations. However, this is likely to break the flame/utils/convert_dcp_to_hf.py script.
That script expects a checkpoint with a top-level 'model' key, as seen on line 51 of that file:

```python
model.load_state_dict(torch.load(checkpoint_path, map_location='cpu')['model'])
```

After this change, checkpoints created by convert_hf_to_dcp.py will no longer have this 'model' key, and torch.load(...)['model'] will likely raise a KeyError.
If convert_dcp_to_hf.py is intended to work with checkpoints converted from Hugging Face, it may need to be updated to handle the new flat structure. Could you please clarify if this is an intended side-effect or if the other script should be updated as well?
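If convert_dcp_to_hf.py needs to accept both old (nested) and new (flat) checkpoints during a transition, one option is a small compatibility shim that unwraps the 'model' key only when present. A hedged sketch (the helper name is hypothetical; it is not part of the repository):

```python
def extract_model_state_dict(checkpoint):
    """Return the model state dict whether or not it is wrapped under a 'model' key."""
    inner = checkpoint.get("model")
    if isinstance(inner, dict):
        # Old nested layout: {"model": {...}}
        return inner
    # New flat layout: keys are already model.*
    return checkpoint

# Illustrative checkpoints (values stand in for tensors):
old_style = {"model": {"model.embed_tokens.weight": 1}}
new_style = {"model.embed_tokens.weight": 1}
```

The isinstance check matters: in the flat layout a key literally named "model" (if one existed) would map to a tensor rather than a dict, so the shim would correctly fall through to returning the whole checkpoint.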
Need to change this line in
I also need to check whether the 'model' key is used elsewhere in the codebase.
Summary
- convert_hf_to_dcp previously saved weights under a nested {"model": state_dict} key, producing model.model.* entries.
- flame/torchtitan expects a flat model.* state dict for seed checkpoints; the nested layout broke dcp.load when resuming training.
- The converter now saves the state dict directly (flat model.* keys), matching the loader's expectations.

Testing
- Ran python -m flame.utils.convert_hf_to_dcp --model Qwen/Qwen3-1.7B --checkpoint <path>/checkpoint/step-0 and verified keys are model.* (no model.model.*).
- Ran train.sh for continual training with that checkpoint; training now proceeds past checkpoint load.
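The key verification from the testing steps can be scripted. A sketch of such a check (the function is hypothetical, and in practice the key list would come from the saved checkpoint's metadata rather than being hard-coded):

```python
def check_seed_checkpoint_keys(keys):
    """Verify a flat model.* layout with no double-nested model.model.* entries."""
    double_nested = [k for k in keys if k.startswith("model.model.")]
    prefixed = [k for k in keys if k.startswith("model.")]
    return not double_nested and len(prefixed) == len(keys)

# Illustrative key names in the expected flat layout:
sample_keys = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
]
```

A checkpoint produced by the old converter would fail this check, since every key would begin with model.model. after the extra wrapping.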