deepseek_v32 chat template: accept wrapper-injected kwargs by lixiangnlp · Pull Request #1 · Thump604/mlx-lm

lixiangnlp · 2026-04-30T01:06:02Z

Summary

`TokenizerWrapper.apply_chat_template` (in `mlx_lm/tokenizer_utils.py`) sets
`kwargs["enable_thinking"] = self.has_thinking` on every call and may also
forward `tokenize=` and similar kwargs through to a registered chat-template
module. `mlx_lm/chat_templates/deepseek_v32.py:apply_chat_template` currently
hands `**kwargs` straight to `encode_messages`, which only accepts a fixed
set of named arguments, so the very first call from the wrapper raises:

```
TypeError: encode_messages() got an unexpected keyword argument 'enable_thinking'
```

This means anyone setting `chat_template_type: "deepseek_v32"` in
`tokenizer_config.json` today is dead in the water before generating a token.
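
The failure can be reproduced in isolation. The sketch below is not the mlx-lm code: `encode_messages`, `apply_chat_template_unfixed`, and `wrapper_call` are hypothetical stand-ins that only mimic the shapes described above (a fixed-signature encoder, a pass-through template function, and a wrapper that always injects `enable_thinking`).

```python
def encode_messages(messages, thinking_mode="chat", context=None,
                    drop_thinking=True, add_default_bos_token=True, tools=None):
    """Stand-in encoder: accepts only a fixed set of named arguments."""
    return f"[{thinking_mode}] " + " ".join(m["content"] for m in messages)


def apply_chat_template_unfixed(messages, **kwargs):
    # Pre-fix behaviour as described: every kwarg is forwarded untouched.
    return encode_messages(messages, **kwargs)


def wrapper_call(template_fn, messages, has_thinking=True, **kwargs):
    # Mimics the wrapper setting enable_thinking on every call.
    kwargs["enable_thinking"] = has_thinking
    return template_fn(messages, **kwargs)


try:
    wrapper_call(apply_chat_template_unfixed, [{"role": "user", "content": "hi"}])
    error = None
except TypeError as exc:
    # The encoder rejects the injected keyword before doing any work.
    error = str(exc)

print(error)
```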

Changes

- Accept `enable_thinking` explicitly and translate it to `thinking_mode`,
  with `thinking_mode` winning if both are set.
- Filter `**kwargs` down to the set `encode_messages` accepts
  (`thinking_mode`, `context`, `drop_thinking`, `add_default_bos_token`,
  `tools`).
- Strip the trailing `<｜Assistant｜></think>` marker when
  `add_generation_prompt=False` and `enable_thinking=False`, mirroring the
  existing `<｜Assistant｜><think>` strip for the thinking case.
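
A minimal sketch of the three changes, under stated assumptions: `encode_messages` here is a simplified stand-in rather than the real encoder, the mode strings `"thinking"` and `"chat"` are assumed values for `thinking_mode`, and `_ACCEPTED_KWARGS` is a hypothetical name. It illustrates the approach and is not the code in this PR.

```python
_ACCEPTED_KWARGS = {
    "thinking_mode", "context", "drop_thinking", "add_default_bos_token", "tools",
}
_THINK_OPEN = "<｜Assistant｜><think>"
_THINK_CLOSED = "<｜Assistant｜></think>"


def encode_messages(messages, thinking_mode="chat", context=None,
                    drop_thinking=True, add_default_bos_token=True, tools=None):
    """Stand-in encoder that always appends a generation prompt."""
    body = "".join(f"<｜User｜>{m['content']}" for m in messages)
    suffix = _THINK_OPEN if thinking_mode == "thinking" else _THINK_CLOSED
    return body + suffix


def apply_chat_template(messages, add_generation_prompt=True,
                        enable_thinking=None, **kwargs):
    # Drop anything the encoder does not understand (tokenize=, return_tensors=, ...).
    filtered = {k: v for k, v in kwargs.items() if k in _ACCEPTED_KWARGS}

    # Map enable_thinking -> thinking_mode; an explicit thinking_mode wins.
    if "thinking_mode" not in filtered and enable_thinking is not None:
        filtered["thinking_mode"] = "thinking" if enable_thinking else "chat"

    prompt = encode_messages(messages, **filtered)

    # Without a generation prompt, strip the assistant marker in either mode.
    if not add_generation_prompt:
        for suffix in (_THINK_OPEN, _THINK_CLOSED):
            if prompt.endswith(suffix):
                prompt = prompt[: -len(suffix)]
                break
    return prompt


msgs = [{"role": "user", "content": "hi"}]
thinking_prompt = apply_chat_template(msgs, enable_thinking=True, tokenize=False)
override_prompt = apply_chat_template(msgs, enable_thinking=True, thinking_mode="chat")
bare_prompt = apply_chat_template(msgs, add_generation_prompt=False, enable_thinking=False)
```

Filtering against an allow-list, rather than popping known-bad keys, keeps the function from breaking again if the wrapper starts forwarding a new keyword.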

Tests

New `tests/test_chat_templates.py` covering:

- `enable_thinking=True` → ends with `<｜Assistant｜><think>`
- `enable_thinking=False` → ends with `<｜Assistant｜></think>`
- explicit `thinking_mode` overrides `enable_thinking`
- unknown kwargs (`tokenize`, `return_tensors`, …) are silently filtered
- `add_generation_prompt=False` strips the assistant marker in both modes

```
$ python -m unittest tests.test_chat_templates
.....

Ran 5 tests in 0.000s
OK
```

Notes

This is independent of the V4 forward-pass work happening on this branch —
it's a pre-existing latent bug in the `chat_template_type` integration that
becomes load-bearing once anything actually wires `deepseek_v32` as a
fallback (see follow-up PR for fallback-by-model-type).

`TokenizerWrapper.apply_chat_template` injects `enable_thinking=...` into every call (set from `has_thinking`) and may also forward `tokenize=` / other kwargs. The current `apply_chat_template` hands these through to `encode_messages`, which only accepts a fixed set of arguments and raises `TypeError` on anything else.

Map `enable_thinking` -> `thinking_mode` (with `thinking_mode` winning if explicitly set), and filter kwargs down to the set `encode_messages` understands.

Also strip the trailing `<｜Assistant｜></think>` marker when `add_generation_prompt=False` and `enable_thinking=False`, mirroring the existing `<｜Assistant｜><think>` strip for the thinking case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

lixiangnlp mentioned this pull request Apr 30, 2026

Fall back to a built-in chat template by model_type #2

Open

lixiangnlp wants to merge 1 commit into Thump604:deepseek-v4-support-fixes from lixiangnlp:deepseek-v32-template-enable-thinking
