Thank you for your excellent work!
I'm very interested in whether you have explored the impact of different tokenizers on model performance.
Relatedly, the drawbacks of the BPE tokenizer have been a long-standing problem. Some recent work has begun to address this by processing byte sequences directly, either with attention mechanisms over raw bytes or with prediction-based dynamic grouping:
Byte Latent Transformer: Patches Scale Better Than Tokens
https://arxiv.org/abs/2412.09871
From Bytes to Ideas: Language Modeling with Autoregressive U-Nets
https://arxiv.org/abs/2506.14761
H-Net++: Hierarchical Dynamic Chunking for Tokenizer-Free Language Modelling in Morphologically-Rich Languages
https://arxiv.org/abs/2508.05628
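To illustrate roughly what I mean by "prediction-based dynamic grouping" (in the spirit of BLT's entropy-based patching), here is a toy sketch; the small byte LM, the threshold value, and the function name are placeholders of my own, not taken from these papers or from yours:

```python
# Toy illustration: start a new byte patch wherever a small byte-level model
# is "surprised" (high next-byte entropy). The threshold is a placeholder.
import torch
import torch.nn.functional as F

def entropy_patch_boundaries(byte_logits: torch.Tensor, threshold: float = 3.0) -> torch.Tensor:
    """byte_logits: (seq_len, 256) next-byte logits from any small byte LM.
    Returns a boolean mask marking positions where a new patch should start."""
    probs = F.softmax(byte_logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-9)).sum(dim=-1)  # (seq_len,)
    return entropy > threshold                                # high surprise => patch boundary

# Random logits stand in for a real byte LM's output here.
logits = torch.randn(32, 256)
boundaries = entropy_patch_boundaries(logits)
print(boundaries.nonzero().flatten())
```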
I would like to know whether you think it is possible to build a byte-level continuous autoregressive language model, or to apply an autoencoder directly to byte sequences.
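To make the second part of the question concrete, here is a minimal sketch of the kind of byte-patch autoencoder I have in mind, assuming a PyTorch-style setup; the module names, patch size, and dimensions are all hypothetical and not from your paper:

```python
# Hypothetical sketch: compress a fixed-size patch of K raw bytes into one
# continuous latent vector, and reconstruct the bytes from it.
import torch
import torch.nn as nn

class BytePatchAutoencoder(nn.Module):
    def __init__(self, patch_size: int = 8, d_model: int = 256, d_latent: int = 64):
        super().__init__()
        self.patch_size = patch_size
        self.byte_embed = nn.Embedding(256, d_model)           # one embedding per byte value
        self.encoder = nn.Sequential(
            nn.Linear(patch_size * d_model, d_model),
            nn.GELU(),
            nn.Linear(d_model, d_latent),                      # continuous patch latent z
        )
        self.decoder = nn.Sequential(
            nn.Linear(d_latent, d_model),
            nn.GELU(),
            nn.Linear(d_model, patch_size * 256),              # logits for each byte in the patch
        )

    def forward(self, bytes_in: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # bytes_in: (batch, patch_size) integer byte values in [0, 255]
        b = bytes_in.shape[0]
        x = self.byte_embed(bytes_in).reshape(b, -1)           # (batch, patch_size * d_model)
        z = self.encoder(x)                                    # (batch, d_latent)
        logits = self.decoder(z).reshape(b, self.patch_size, 256)
        return z, logits

# Reconstruction objective on random byte patches.
model = BytePatchAutoencoder()
patch = torch.randint(0, 256, (4, 8))
z, logits = model(patch)
loss = nn.functional.cross_entropy(logits.reshape(-1, 256), patch.reshape(-1))
```

The continuous autoregressive model would then predict the next patch latent z instead of the next discrete token, which is why I am curious whether you see byte patches as a viable replacement for BPE tokens in your framework.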