Releases: neluca/tinybpe
TinyBPE 0.1.1 Release
🌟 Features
- The core is designed and implemented in C, using an AVL tree as the index for fast and efficient performance.
- Used as a Python module with a simple and elegant API.
- Supports training BPE models and continuing training on imported models to expand the vocabulary.
- Implements a general byte-level tokenizer that supports fast encoding and decoding, as well as streaming decoding.
- Supports regular expression pre-tokenization and adding special tokens.
- Supports converting model parameters from tiktoken.
- Highly customizable and easy to integrate and extend; the core has zero dependencies.
- Refined the documentation.
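
To make the training and encoding features above concrete, here is a minimal pure-Python sketch of byte-level BPE. It is an illustration of the general technique only: it does not use TinyBPE's API, and the function names (`train_bpe`, `merge`, `encode`, `decode`) are hypothetical. The real core is written in C and indexed with an AVL tree, neither of which this sketch attempts to reproduce.

```python
from collections import Counter


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(data: bytes, vocab_size: int):
    """Learn merges on raw bytes until the vocabulary reaches `vocab_size`.

    IDs 0..255 are the single bytes; each merge adds one new ID.
    Returns a dict mapping (left_id, right_id) -> new_id in training order.
    """
    ids = list(data)
    merges = {}
    for new_id in range(256, vocab_size):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:  # nothing left that is worth merging
            break
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges


def encode(text: str, merges):
    """Encode text by repeatedly applying the earliest-learned applicable merge."""
    ids = list(text.encode("utf-8"))
    while len(ids) >= 2:
        pair = min(zip(ids, ids[1:]), key=lambda p: merges.get(p, float("inf")))
        if pair not in merges:
            break
        ids = merge(ids, pair, merges[pair])
    return ids


def decode(ids, merges):
    """Rebuild the byte string for each ID, then decode as UTF-8."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():  # dicts keep training order
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


merges = train_bpe(b"aaabdaaabac", vocab_size=259)
tokens = encode("aaabdaaabac", merges)
text = decode(tokens, merges)
```

Continuing training on an imported model would, in this sketch, amount to starting from an existing `merges` table and allocating new IDs above the largest one already in use.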
TinyBPE 0.1.0 Release
🌟 Features
- The core is designed and implemented in C, using an AVL tree as the index for fast and efficient performance.
- Used as a Python module with a simple and elegant API.
- Supports training BPE models and continuing training on imported models to expand the vocabulary.
- Implements a general byte-level tokenizer that supports fast encoding and decoding, as well as streaming decoding.
- Supports regular expression pre-tokenization and adding special tokens.
- Supports converting model parameters from tiktoken.
- Highly customizable and easy to integrate and extend; the core has zero dependencies.
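
Streaming decoding matters for a byte-level tokenizer because a single token can end in the middle of a multi-byte UTF-8 character. The sketch below shows one common way to handle that in Python, using the standard library's incremental decoder. It is not TinyBPE's API; `stream_decode` is a hypothetical helper, and the input is assumed to be an iterable of the raw byte strings that each token maps to.

```python
import codecs


def stream_decode(token_bytes_iter):
    """Yield text as soon as complete UTF-8 characters are available.

    Incomplete trailing bytes are buffered inside the decoder until the
    next chunk completes them.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    for chunk in token_bytes_iter:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush whatever is left; a dangling partial character becomes U+FFFD.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# "é" is the two bytes C3 A9, split here across two tokens.
pieces = list(stream_decode([b"h", b"\xc3", b"\xa9", b"llo"]))
```

Decoding each token on its own with `bytes.decode` would either raise an error or emit replacement characters at the split point, which is why the buffered approach is used here.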