# streaming-tokenizer
Here are 3 public repositories matching this topic...
A streaming tokenizer.
Updated Apr 28, 2021 - Python
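The listing above describes this project only as "a streaming tokenizer." The generator below is a minimal sketch of the general idea, not that repository's code (its internals are not shown here). It consumes text in arbitrary chunks and yields regex-based tokens, holding back a possibly incomplete final token until the next chunk arrives. The token pattern `TOKEN_RE` and the function name `stream_tokens` are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Hypothetical token pattern (an assumption for illustration):
# runs of word characters, or any single non-space, non-word character.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def stream_tokens(chunks: Iterable[str]) -> Iterator[str]:
    """Yield tokens from text that arrives in arbitrary chunks.

    Only one chunk plus the unfinished tail of the previous one is held
    in memory at a time.
    """
    carry = ""
    for chunk in chunks:
        buf = carry + chunk
        matches = list(TOKEN_RE.finditer(buf))
        # A match touching the end of the buffer may be cut off by the chunk
        # boundary (e.g. "tok" + "en"), so keep it back for the next round.
        if matches and matches[-1].end() == len(buf):
            carry = buf[matches[-1].start():]
            matches = matches[:-1]
        else:
            carry = ""
        for m in matches:
            yield m.group()
    # The stream is exhausted: whatever was held back is now complete.
    yield from TOKEN_RE.findall(carry)
```

For a file, `iter(lambda: f.read(65536), "")` produces suitable chunks. Because a token is emitted only once a following character (or the end of the stream) shows it is complete, the output is the same as tokenizing the whole text at once.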
HTGM.2 is a Hindi-first BPE tokenizer trained on a ~41 GB corpus using a streaming architecture, aimed at scalable Hindi LLMs, Devanagari NLP, and low-memory tokenizer engineering.
nlp machine-learning research artificial-intelligence devanagari streaming-tokenizer huggingface-tokenizers llm bpe-tokenizer hindi-ai hindi-llm hindi-tokenizer tokenizer-engineering
Updated May 17, 2026 - Python
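The HTGM.2 listing says the tokenizer was trained with a streaming architecture but does not describe it. The sketch below is a toy, character-level BPE trainer in pure Python, not HTGM.2's or Hugging Face's implementation: it reads the corpus batch by batch and keeps only word frequencies in memory, then repeatedly merges the most frequent adjacent symbol pair. The names `count_words`, `merge_word`, and `train_bpe` are hypothetical, and splitting words on whitespace is a simplifying assumption.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def count_words(batches: Iterable[List[str]]) -> Counter:
    """Accumulate whitespace-separated word frequencies one batch at a time.

    Only the running counts are kept, never the raw corpus.
    """
    counts: Counter = Counter()
    for batch in batches:
        for line in batch:
            counts.update(line.split())
    return counts


def merge_word(symbols: Tuple[str, ...], pair: Pair) -> Tuple[str, ...]:
    """Replace non-overlapping occurrences of `pair`, scanning left to right."""
    out: List[str] = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(batches: Iterable[List[str]], num_merges: int) -> List[Pair]:
    """Learn up to `num_merges` BPE merges from a stream of text batches."""
    # Each word starts as a sequence of single characters (code points).
    vocab: Dict[Tuple[str, ...], int] = {
        tuple(word): freq for word, freq in count_words(batches).items()
    }
    merges: List[Pair] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Highest count wins; ties go to the lexicographically smallest pair.
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        new_vocab: Dict[Tuple[str, ...], int] = {}
        for symbols, freq in vocab.items():
            merged = merge_word(symbols, best)
            new_vocab[merged] = new_vocab.get(merged, 0) + freq
        vocab = new_vocab
    return merges
```

On the classic toy corpus (`low` ×5, `lower` ×2, `newest` ×6, `widest` ×3) split across two batches, the first two merges are `("e", "s")` and `("es", "t")`. For real corpora, the Hugging Face `tokenizers` library provides `Tokenizer.train_from_iterator`, which accepts an iterator of text in the same batch-by-batch style.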