
Checkpoint conversion tool: Optimize to_maxtext & Onboard deepseek2/3/3.2 #3184

Draft

shuningjin wants to merge 15 commits into main from shuningjin-ckpt-opt3

Conversation


@shuningjin commented Feb 18, 2026

Description

Onboard the DeepSeek family

  • deepseek2-16b, deepseek3-671b, deepseek3.2-671b

Optimize to_maxtext

  • Use bfloat16 for both loading and saving (see the sketch after this list).
  • Reduces peak memory by half in all cases.
  • Speeds up conversion of large models: e.g., deepseek3-671b was previously impractical (loading alone took 11 hr); total conversion now takes 9 hr, with loading down to 4 min.
  • Broader support: models can be converted without HuggingFace modeling code, as long as safetensors weights are available (e.g., deepseek3.2).
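For illustration, here is a minimal sketch of what a streaming bfloat16 load can look like. It assumes HuggingFace-style sharded `*.safetensors` files and uses the standard `safetensors` API; `stream_checkpoint` and `convert_weight` are hypothetical names for illustration, not the actual API of this PR.

```python
# Minimal sketch, assuming HF-style sharded *.safetensors checkpoints.
import glob

import torch
from safetensors import safe_open


def stream_checkpoint(ckpt_dir: str):
    """Yield (name, tensor) pairs one at a time, cast to bfloat16."""
    for shard in sorted(glob.glob(f"{ckpt_dir}/*.safetensors")):
        with safe_open(shard, framework="pt") as f:
            for name in f.keys():
                # get_tensor reads a single tensor from disk, so the full
                # checkpoint is never materialized at once; bfloat16 halves
                # each tensor's footprint relative to float32.
                yield name, f.get_tensor(name).to(torch.bfloat16)


# Hypothetical usage (convert_weight is illustrative, not this PR's API):
# for name, tensor in stream_checkpoint("/path/to/deepseek3-671b"):
#     maxtext_params[name] = convert_weight(name, tensor)
```

Because this path only needs the safetensors files, it works even for models whose HuggingFace modeling code is unavailable, which is what enables the deepseek3.2 conversion noted above.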

Tests

Please describe how you tested this change, and include any instructions and/or commands to reproduce.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

