Skip to content

Fall back to a built-in chat template by model_type#2

Open
lixiangnlp wants to merge 1 commit into
Thump604:deepseek-v4-support-fixesfrom
lixiangnlp:chat-template-fallback-by-model-type
Open

Fall back to a built-in chat template by model_type#2
lixiangnlp wants to merge 1 commit into
Thump604:deepseek-v4-support-fixesfrom
lixiangnlp:chat-template-fallback-by-model-type

Conversation

@lixiangnlp
Copy link
Copy Markdown

Summary

When a checkpoint's tokenizer ships without a chat template (no
`chat_template` in `tokenizer_config.json`, no `chat_template.json` /
`chat_template.jinja` file, and no `chat_template_type` set), look up
`model_type` from `config.json` and use a built-in template module if
registered.

Currently registered:

`model_type` template module
`deepseek_v32` `deepseek_v32`
`deepseek_v4` `deepseek_v32`

Behavior is unchanged when either `tokenizer.chat_template` or
`chat_template_type` is set, or when `model_type` is not in the registry.

Motivation

`Thump604/DeepSeek-V4-Flash-MLX-Q2-mixed-gs128-affine`
ships a `tokenizer_config.json` without any `chat_template` and no
external `chat_template.jinja`, so `mlx_lm.chat` raises before generation:

```
ValueError: Cannot use chat template functions because tokenizer.chat_template
is not set and no template argument was passed!
```

Adding the entry to the upload would also fix it on the data side, but the
mlx_lm-side fallback is useful for any future V4 / V3.2 quant that's
distributed without one — and the registry is the natural place for the
maintainer's own internal `deepseek_v32` Python template to be reused
beyond `chat_template_type` opt-in.

Changes

  • New `_DEFAULT_CHAT_TEMPLATE_TYPE_BY_MODEL_TYPE` registry in
    `mlx_lm/tokenizer_utils.py`.
  • New `_infer_chat_template_type(model_path)` helper that reads
    `config.json` and looks up `model_type`.
  • The `chat_template_type` resolution in `load()` now falls through to
    the helper only when `tokenizer.chat_template is None` and the tokenizer
    config has no explicit `chat_template_type`.

Tests

Adds two cases to `tests/test_tokenizers.py`:

  • `test_chat_template_falls_back_for_known_model_type` — `model_type=deepseek_v4`
    picks up the `deepseek_v32` template; `tokenizer.has_chat_template` is True.
  • `test_chat_template_no_fallback_for_unknown_model_type` — unrecognized
    `model_type` leaves the tokenizer with no chat template (no behavior change).

```
$ python -m unittest tests.test_tokenizers.TestTokenizers.test_chat_template_falls_back_for_known_model_type \
tests.test_tokenizers.TestTokenizers.test_chat_template_no_fallback_for_unknown_model_type \
tests.test_tokenizers.TestTokenizers.test_unknown_model_config_tokenizer_fallback
...
Ran 3 tests in 0.004s
OK
```

Dependency

Pairs with #1 — without that, the wrapper's auto-injected `enable_thinking`
kwarg makes the registered template raise `TypeError` once it actually fires.
The tests in this PR avoid that path by stopping at `tokenizer._chat_template`
identity, so this PR is mergeable independently, but real chat use needs both.

When a tokenizer has no chat_template (no `chat_template` field in
tokenizer_config.json, no chat_template.json/.jinja, and no
`chat_template_type` set), look up the model's `model_type` from
config.json and use a built-in template module if registered.

Currently maps `deepseek_v4` and `deepseek_v32` -> `deepseek_v32`.
Behavior is unchanged when either `chat_template` or
`chat_template_type` is set, or when `model_type` is not in the
registry.

This unblocks chat against quantized DeepSeek-V4 checkpoints whose
upload omitted the chat template, e.g.
huggingface.co/Thump604/DeepSeek-V4-Flash-MLX-Q2-mixed-gs128-affine.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant