Conversation

@CarlOwOs CarlOwOs commented Jan 1, 2026

Summary

  • convert_hf_to_dcp previously saved weights under a nested {"model": state_dict} key, producing model.model.* entries.
  • flame/torchtitan expects a flat model.* state dict for seed checkpoints; the nested layout broke dcp.load when resuming training.
  • Updated the converter to save the flat state dict directly (model.* keys), matching the loader’s expectations; see the sketch below.
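
A minimal sketch of the change, assuming the converter builds its state dict from AutoModelForCausalLM.state_dict() (whose keys already carry the model. prefix); the exact surrounding code in flame/utils/convert_hf_to_dcp.py may differ, and the output path is hypothetical:

```python
from pathlib import Path

import torch.distributed.checkpoint as DCP
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B")
state_dict = model.state_dict()  # keys like "model.embed_tokens.weight", "lm_head.weight"

checkpoint = Path("checkpoint/step-0")  # hypothetical output path
checkpoint.mkdir(parents=True, exist_ok=True)
storage_writer = DCP.filesystem.FileSystemWriter(checkpoint, thread_count=8)

# Before: DCP flattens the nested dict into dotted keys, so every entry
# gained an extra prefix, e.g. "model.model.embed_tokens.weight".
# DCP.save({"model": state_dict}, storage_writer=storage_writer)

# After: save the flat state dict directly, keeping the "model.*" keys.
DCP.save(state_dict, storage_writer=storage_writer)
```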

Testing

  • Converted Qwen3-1.7B from Hugging Face using python -m flame.utils.convert_hf_to_dcp --model Qwen/Qwen3-1.7B --checkpoint <path>/checkpoint/step-0 and verified the keys are model.* (no model.model.*); one way to run that check is sketched below.
  • Ran train.sh for continual training with that checkpoint; training now proceeds past checkpoint load.
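
One way to verify the saved keys without loading any tensor data is to read the DCP metadata; a minimal sketch, assuming the checkpoint path from the command above:

```python
from torch.distributed.checkpoint import FileSystemReader

# Reads only the checkpoint metadata; no tensor data is materialized.
reader = FileSystemReader("checkpoint/step-0")  # hypothetical path
metadata = reader.read_metadata()

keys = sorted(metadata.state_dict_metadata.keys())
print(keys[:5])  # expect entries like "model.embed_tokens.weight"
assert not any(k.startswith("model.model.") for k in keys)
```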

Summary by CodeRabbit

  • Changes
    • Updated the model weight storage format when converting from HuggingFace models. The saved checkpoint structure has been simplified, which may affect dependent workflows that rely on the previous model structure.


@coderabbitai coderabbitai bot commented Jan 1, 2026

📝 Walkthrough

The convert_hf_weights function in the DCP conversion utility was modified to save the raw state dictionary directly to DCP instead of wrapping it in a {"model": state_dict} structure, altering the saved payload format without changing traversal logic.

Changes

  • DCP Payload Structure (flame/utils/convert_hf_to_dcp.py): Modified the DCP.save() call to persist the raw state_dict instead of the {"model": state_dict} wrapper, changing the saved checkpoint structure.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes


Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
  • Title Check ❓ Inconclusive: The title "Fix convert_hf_to_dcp" is vague and generic and doesn't convey what was actually fixed in the changeset. Resolution: consider a more descriptive title, such as "Fix convert_hf_to_dcp to save flat state dict" or "Remove nested model key from convert_hf_to_dcp output".

✅ Passed checks (1 passed)

  • Description Check ✅ Passed: Check skipped because CodeRabbit’s high-level summary is enabled.

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e11e7be and c56bc29.

📒 Files selected for processing (1)
  • flame/utils/convert_hf_to_dcp.py
🔇 Additional comments (1)
flame/utils/convert_hf_to_dcp.py (1)

24-24: Change is correct and fixes the double "model." prefix issue.

Removing the {"model": state_dict} wrapper is the right fix. AutoModelForCausalLM.state_dict() already produces keys with model. prefix; wrapping it in another dict created the unintended model.model.* nesting. Saving the state_dict directly is the correct approach that aligns with DCP.save() expectations.



@gemini-code-assist

Summary of Changes

Hello @CarlOwOs, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical issue in the convert_hf_to_dcp utility that prevented flame/torchtitan from correctly loading converted Hugging Face checkpoints. By adjusting how the state dictionary is saved, the utility now produces checkpoints with the expected flat key structure, thereby resolving the incompatibility and enabling seamless resumption of training with these converted models.

Highlights

  • Checkpoint Key Structure Fix: The convert_hf_to_dcp utility now saves model weights with a flat model.* key structure, resolving an issue where previously weights were nested under model.model.*.
  • Compatibility with flame/torchtitan: This change ensures that converted Hugging Face checkpoints are compatible with flame/torchtitan's dcp.load mechanism, which expects a flat model.* state dictionary.
  • Unblocking Continual Training: The fix allows for successful loading of converted checkpoints, enabling continual training to proceed without errors related to incorrect state dictionary keys.



@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request fixes an issue in convert_hf_to_dcp.py where Hugging Face model weights were being saved with an extra level of nesting (model.model.*), causing problems when loading checkpoints for continued training. The change removes this nesting by saving the state dictionary directly, which aligns with the training loop's expectations for model.* keys. While this change correctly addresses the issue for training, it appears to introduce an inconsistency with the convert_dcp_to_hf.py script. This other script seems to expect the nested structure that is being removed, which could cause it to fail when processing checkpoints created with this updated converter. I've added a specific comment with more details.

```diff
 checkpoint.mkdir(parents=True, exist_ok=True)
 storage_writer = DCP.filesystem.FileSystemWriter(checkpoint, thread_count=8)
-DCP.save({"model": state_dict}, storage_writer=storage_writer)
+DCP.save(state_dict, storage_writer=storage_writer)
```


Severity: high

This change correctly flattens the checkpoint structure to align with the training loop's expectations. However, this is likely to break the flame/utils/convert_dcp_to_hf.py script.

That script expects a checkpoint with a top-level 'model' key, as seen on line 51 of that file:

model.load_state_dict(torch.load(checkpoint_path, map_location='cpu')['model'])

After this change, checkpoints created by convert_hf_to_dcp.py will no longer have this 'model' key, and torch.load(...)['model'] will likely raise a KeyError.

If convert_dcp_to_hf.py is intended to work with checkpoints converted from Hugging Face, it may need to be updated to handle the new flat structure. Could you please clarify if this is an intended side-effect or if the other script should be updated as well?

@CarlOwOs CarlOwOs marked this pull request as draft January 2, 2026 11:01
@CarlOwOs CarlOwOs (Author) commented Jan 2, 2026

Need to change this line in convert_dcp_to_hf:

  • model.load_state_dict(torch.load(checkpoint_path, map_location='cpu')['model'])

I also need to check whether the 'model' key is used elsewhere throughout the code. A sketch of the companion fix is below.
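
A minimal sketch of that companion change, assuming the line as quoted from convert_dcp_to_hf.py (model and checkpoint_path come from the surrounding script, which may differ):

```python
# Before: expects the nested {"model": ...} layout and would raise
# KeyError on checkpoints written by the updated converter.
# model.load_state_dict(torch.load(checkpoint_path, map_location='cpu')['model'])

# After: the loaded payload is already the flat model.* state dict.
model.load_state_dict(torch.load(checkpoint_path, map_location='cpu'))
```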
