Fix ByteLevel streaming detokenizer detection for Step 3.7 Flash #1246

Merged
Blaizzy merged 5 commits into Blaizzy:main from ivanfioravanti:fix-step37-streaming-decode on May 30, 2026
Conversation

@ivanfioravanti
Contributor

Fix ByteLevel streaming detokenizer detection for Step 3.7 Flash

@ivanfioravanti
Contributor Author

Here are the final touches, @Blaizzy.
We did it 🙌🏻

@Blaizzy
Owner

Blaizzy commented May 30, 2026

The root cause is a tokenizer configuration issue in that model repo: its vocab is byte-level BPE-style, but its decoder metadata says to decode like SentencePiece/metaspace.

mlx_vlm exposed it because it trusts tokenizer.json to choose the streaming detokenizer. That is reasonable for normal models, but this model’s metadata is inconsistent.

Fixed in fd05cde
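
The mismatch described above can be illustrated with a small check against a parsed `tokenizer.json`. This is only a sketch under stated assumptions, not the change made in fd05cde (which is not shown in this thread): the function names and the `min_ratio` threshold are hypothetical, and the heuristic assumes the usual GPT-2 byte-level convention, where tokens that start with a space or newline begin with `Ġ` or `Ċ`, while SentencePiece/Metaspace vocabularies use `▁`.

```python
# Hypothetical helpers for illustration; not part of mlx_vlm.

# GPT-2 style byte-level BPE renders the space byte as U+0120 and the
# newline byte as U+010A. Metaspace vocabularies mark word starts with U+2581.
BYTE_LEVEL_MARKERS = ("\u0120", "\u010a")
METASPACE_MARKER = "\u2581"


def declared_decoder_types(tokenizer_json: dict) -> set:
    """Collect the decoder type names declared in a tokenizer.json dict."""
    decoder = tokenizer_json.get("decoder") or {}
    types = {decoder.get("type")}
    # A "Sequence" decoder lists its steps under "decoders".
    for step in decoder.get("decoders", []):
        types.add(step.get("type"))
    types.discard(None)
    return types


def vocab_looks_byte_level(tokenizer_json: dict, min_ratio: float = 0.05) -> bool:
    """Guess from the vocabulary itself whether it is byte-level BPE."""
    vocab = (tokenizer_json.get("model") or {}).get("vocab") or {}
    # BPE stores {token: id}; Unigram stores [[token, score], ...].
    if isinstance(vocab, dict):
        tokens = list(vocab)
    else:
        tokens = [entry[0] for entry in vocab]
    if not tokens:
        return False
    byte_level = sum(1 for tok in tokens if tok.startswith(BYTE_LEVEL_MARKERS))
    metaspace = sum(1 for tok in tokens if tok.startswith(METASPACE_MARKER))
    # min_ratio is an assumed threshold, chosen only for this sketch.
    return byte_level > metaspace and byte_level / len(tokens) >= min_ratio


def metadata_is_inconsistent(tokenizer_json: dict) -> bool:
    """True when the vocab looks byte-level but no ByteLevel decoder is declared."""
    return (
        vocab_looks_byte_level(tokenizer_json)
        and "ByteLevel" not in declared_decoder_types(tokenizer_json)
    )
```

A caller could load the file with `json.load` and, when `metadata_is_inconsistent` returns `True`, prefer a byte-level streaming detokenizer over the one the decoder metadata suggests.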

Owner

@Blaizzy Blaizzy left a comment

LGTM 🚀

@Blaizzy
Owner

Blaizzy commented May 30, 2026

uv run mlx_vlm.generate \
  --model ivanfioravanti/Step-3.7-Flash-4bit \
  --prompt "Create a script to calculate 100 decimals of pi" \
  --max-tokens 10
Fetching 34 files: 100%|██████████████████████████████████████████████████| 34/34 [00:00<00:00, 22061.62it/s]
Download complete: : 0.00B [00:03, ?B/s]                                              | 0/34 [00:00<?, ?it/s]
==========
Files: [] 

Prompt: <|begin▁of▁sentence|><|im_start|>user
Create a script to calculate 100 decimals of pi<|im_end|>
<|im_start|>assistant
<think>

Got it, the user wants a script to calculate
==========
Prompt: 25 tokens, 31.374 tokens-per-sec
Generation: 10 tokens, 57.211 tokens-per-sec
Peak memory: 114.830 GB

@Blaizzy Blaizzy merged commit 4ee7e66 into Blaizzy:main May 30, 2026
1 check passed
@ivanfioravanti
Contributor Author

LGTM! 🚀
