
Releases: neluca/tinybpe

TinyBPE 0.1.1 Release

18 Apr 18:58


🌟 Features

  • The core is meticulously designed and implemented in C, using an AVL tree as its index for fast, efficient performance.
  • Usable as a Python module with a simple, elegant API.
  • Supports training BPE models and continuing training on imported models to expand the vocabulary.
  • Implements a general byte-level tokenizer that supports fast encoding and decoding, as well as streaming decoding.
  • Supports regular-expression pre-tokenization and adding special tokens.
  • Supports converting model parameters from tiktoken.
  • Highly customizable and easy to integrate and extend; the core has zero dependencies.
  • Refined the documentation.
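For readers new to byte-level BPE, the sketch below illustrates the general technique behind the training, encoding, decoding, and streaming-decoding features listed above. It is not TinyBPE's API: the function names (`train_bpe`, `merge_pair`, `build_vocab`, `encode`, `decode`, `stream_decode`) are hypothetical. It also uses a plain `dict` and `collections.Counter` in place of the C core's AVL-tree index, and it omits regex pre-tokenization and special tokens.

```python
import codecs
from collections import Counter


def merge_pair(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` (left to right) with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges over the UTF-8 bytes of `text`.

    Ids 0-255 are raw bytes; each learned merge is assigned the next free id.
    """
    ids = list(text.encode("utf-8"))
    merges = {}
    for new_id in range(256, 256 + num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break  # fewer than two tokens left, nothing to merge
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges[best] = new_id
        ids = merge_pair(ids, best, new_id)
    return merges


def build_vocab(merges):
    """Map each id to its byte string; dicts keep insertion (merge) order."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():
        vocab[new_id] = vocab[a] + vocab[b]
    return vocab


def encode(text, merges):
    """Greedily apply the earliest-learned merge present until none applies."""
    ids = list(text.encode("utf-8"))
    while len(ids) >= 2:
        candidates = [p for p in zip(ids, ids[1:]) if p in merges]
        if not candidates:
            break
        pair = min(candidates, key=merges.get)
        ids = merge_pair(ids, pair, merges[pair])
    return ids


def decode(ids, vocab):
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


def stream_decode(ids, vocab):
    """Yield text as tokens arrive.

    A multi-byte UTF-8 character split across tokens is buffered by the
    incremental decoder until its final byte arrives.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    for i in ids:
        piece = decoder.decode(vocab[i])
        if piece:
            yield piece
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

For example, `merges = train_bpe("hello hello world, héllo", 10)` followed by `decode(encode("hello héllo", merges), build_vocab(merges))` round-trips the input. The streaming decoder matters because a byte-level vocabulary can split a character such as `é` across two tokens. Decoding each token independently would emit replacement characters, whereas the incremental decoder waits for the complete sequence.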

TinyBPE 0.1.0 Release

17 Apr 22:31


🌟 Features

  • The core is meticulously designed and implemented in C, using an AVL tree as its index for fast, efficient performance.
  • Usable as a Python module with a simple, elegant API.
  • Supports training BPE models and continuing training on imported models to expand the vocabulary.
  • Implements a general byte-level tokenizer that supports fast encoding and decoding, as well as streaming decoding.
  • Supports regular-expression pre-tokenization and adding special tokens.
  • Supports converting model parameters from tiktoken.
  • Highly customizable and easy to integrate and extend; the core has zero dependencies.