Conversation

@CarlOwOs CarlOwOs commented Jan 1, 2026

Summary

  • convert_hf_to_dcp previously saved weights under a nested {"model": state_dict} key, producing model.model.* entries.
  • flame/torchtitan expects a flat model.* state dict for seed checkpoints; the nested layout broke dcp.load when resuming training.
  • Updated the converter to save the flat state dict directly (model.* keys), matching the loader’s expectations; see the sketch below.
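
A minimal sketch of the change, assuming the converter builds its state dict from AutoModelForCausalLM.state_dict() (whose keys already carry the model. prefix); the exact surrounding code in flame/utils/convert_hf_to_dcp.py may differ, and the output path is hypothetical:

```python
from pathlib import Path

import torch.distributed.checkpoint as DCP
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B")
state_dict = model.state_dict()  # keys like "model.embed_tokens.weight", "lm_head.weight"

checkpoint = Path("checkpoint/step-0")  # hypothetical output path
checkpoint.mkdir(parents=True, exist_ok=True)
storage_writer = DCP.filesystem.FileSystemWriter(checkpoint, thread_count=8)

# Before: DCP flattens the nested dict into dotted keys, so every entry
# gained an extra prefix, e.g. "model.model.embed_tokens.weight".
# DCP.save({"model": state_dict}, storage_writer=storage_writer)

# After: save the flat state dict directly, keeping the "model.*" keys.
DCP.save(state_dict, storage_writer=storage_writer)
```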

Testing

  • Converted Qwen3-1.7B from Hugging Face using python -m flame.utils.convert_hf_to_dcp --model Qwen/Qwen3-1.7B --checkpoint <path>/checkpoint/step-0 and verified the keys are model.* (no model.model.*); one way to run that check is sketched below.
  • Ran train.sh for continual training with that checkpoint; training now proceeds past checkpoint load.
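
One way to verify the saved keys without loading any tensor data is to read the DCP metadata; a minimal sketch, assuming the checkpoint path from the command above:

```python
from torch.distributed.checkpoint import FileSystemReader

# Reads only the checkpoint metadata; no tensor data is materialized.
reader = FileSystemReader("checkpoint/step-0")  # hypothetical path
metadata = reader.read_metadata()

keys = sorted(metadata.state_dict_metadata.keys())
print(keys[:5])  # expect entries like "model.embed_tokens.weight"
assert not any(k.startswith("model.model.") for k in keys)
```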

Summary by CodeRabbit

  • Changes
    • Updated the model weight storage format when converting from HuggingFace models. The saved checkpoint structure has been simplified, which may affect dependent workflows that rely on the previous model structure.


@coderabbitai coderabbitai bot commented Jan 1, 2026

📝 Walkthrough

The convert_hf_weights function in the DCP conversion utility was modified to save the raw state dictionary directly to DCP instead of wrapping it in a {"model": state_dict} structure, altering the saved payload format without changing traversal logic.

Changes

  • DCP Payload Structure (flame/utils/convert_hf_to_dcp.py): Modified the DCP.save() call to persist the raw state_dict instead of the {"model": state_dict} wrapper, changing the saved checkpoint structure.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes


Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
  • Title Check ❓ Inconclusive: The title "Fix convert_hf_to_dcp" is vague and generic and doesn't convey what was actually fixed in the changeset. Resolution: consider a more descriptive title, such as "Fix convert_hf_to_dcp to save flat state dict" or "Remove nested model key from convert_hf_to_dcp output".

✅ Passed checks (1 passed)

  • Description Check ✅ Passed: Check skipped because CodeRabbit’s high-level summary is enabled.

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e11e7be and c56bc29.

📒 Files selected for processing (1)
  • flame/utils/convert_hf_to_dcp.py
🔇 Additional comments (1)
flame/utils/convert_hf_to_dcp.py (1)

24-24: Change is correct and fixes the double "model." prefix issue.

Removing the {"model": state_dict} wrapper is the right fix. AutoModelForCausalLM.state_dict() already produces keys with model. prefix; wrapping it in another dict created the unintended model.model.* nesting. Saving the state_dict directly is the correct approach that aligns with DCP.save() expectations.



@gemini-code-assist

Summary of Changes

Hello @CarlOwOs, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical issue in the convert_hf_to_dcp utility that prevented flame/torchtitan from correctly loading converted Hugging Face checkpoints. By adjusting how the state dictionary is saved, the utility now produces checkpoints with the expected flat key structure, thereby resolving the incompatibility and enabling seamless resumption of training with these converted models.

Highlights

  • Checkpoint Key Structure Fix: The convert_hf_to_dcp utility now saves model weights with a flat model.* key structure, resolving an issue where previously weights were nested under model.model.*.
  • Compatibility with flame/torchtitan: This change ensures that converted Hugging Face checkpoints are compatible with flame/torchtitan's dcp.load mechanism, which expects a flat model.* state dictionary.
  • Unblocking Continual Training: The fix allows for successful loading of converted checkpoints, enabling continual training to proceed without errors related to incorrect state dictionary keys.



@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request fixes an issue in convert_hf_to_dcp.py where Hugging Face model weights were being saved with an extra level of nesting (model.model.*), causing problems when loading checkpoints for continued training. The change removes this nesting by saving the state dictionary directly, which aligns with the training loop's expectations for model.* keys. While this change correctly addresses the issue for training, it appears to introduce an inconsistency with the convert_dcp_to_hf.py script. This other script seems to expect the nested structure that is being removed, which could cause it to fail when processing checkpoints created with this updated converter. I've added a specific comment with more details.

```diff
 checkpoint.mkdir(parents=True, exist_ok=True)
 storage_writer = DCP.filesystem.FileSystemWriter(checkpoint, thread_count=8)
-DCP.save({"model": state_dict}, storage_writer=storage_writer)
+DCP.save(state_dict, storage_writer=storage_writer)
```


Severity: high

This change correctly flattens the checkpoint structure to align with the training loop's expectations. However, this is likely to break the flame/utils/convert_dcp_to_hf.py script.

That script expects a checkpoint with a top-level 'model' key, as seen on line 51 of that file:

model.load_state_dict(torch.load(checkpoint_path, map_location='cpu')['model'])

After this change, checkpoints created by convert_hf_to_dcp.py will no longer have this 'model' key, and torch.load(...)['model'] will likely raise a KeyError.

If convert_dcp_to_hf.py is intended to work with checkpoints converted from Hugging Face, it may need to be updated to handle the new flat structure. Could you please clarify if this is an intended side-effect or if the other script should be updated as well?

@CarlOwOs CarlOwOs marked this pull request as draft January 2, 2026 11:01
@CarlOwOs CarlOwOs (Author) commented Jan 2, 2026

Need to change this line in convert_dcp_to_hf:

  • model.load_state_dict(torch.load(checkpoint_path, map_location='cpu')['model'])

I also need to check whether the 'model' key is used elsewhere throughout the code. A sketch of the companion fix is below.
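
A minimal sketch of that companion change, assuming the line as quoted from convert_dcp_to_hf.py (model and checkpoint_path come from the surrounding script, which may differ):

```python
# Before: expects the nested {"model": ...} layout and would raise
# KeyError on checkpoints written by the updated converter.
# model.load_state_dict(torch.load(checkpoint_path, map_location='cpu')['model'])

# After: the loaded payload is already the flat model.* state dict.
model.load_state_dict(torch.load(checkpoint_path, map_location='cpu'))
```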
