This repository contains the scripts and code associated with the paper:
"AdaptBPE: From General Purpose to Specialized Tokenizers"
The scripts implement the post-training BPE merge refinement algorithms and the experimental setups described in the paper. You can use them to reproduce the paper's tokenization, merge selection, and evaluation experiments across several datasets and languages.
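
To give a rough idea of what "merge selection" on an already trained BPE tokenizer can look like, here is a minimal, self-contained sketch. It is not the algorithm from the paper and does not use this repository's scripts: the function names `apply_merges` and `select_merges`, the toy merge list, and the usage-count heuristic are all assumptions made for illustration only.

```python
from collections import Counter


def apply_merges(word, merges):
    """Tokenize one word with standard BPE inference.

    `merges` is an ordered list of symbol pairs; earlier pairs have
    higher priority. Returns the final symbols and the merges that fired.
    """
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)
    used = []
    while len(symbols) > 1:
        # Collect every adjacent pair that is a known merge, with its rank.
        candidates = [
            (ranks[pair], i)
            for i, pair in enumerate(zip(symbols, symbols[1:]))
            if pair in ranks
        ]
        if not candidates:
            break
        # Apply the highest-priority (lowest-rank), leftmost merge.
        rank, i = min(candidates)
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
        used.append(merges[rank])
    return symbols, used


def select_merges(corpus_words, merges, k):
    """Hypothetical heuristic: keep the k merges used most often on a corpus.

    This is an illustrative stand-in, not the paper's selection criterion.
    The original merge order is preserved among the kept merges.
    """
    usage = Counter()
    for word in corpus_words:
        _, used = apply_merges(word, merges)
        usage.update(used)
    kept = {merge for merge, _ in usage.most_common(k)}
    return [merge for merge in merges if merge in kept]


# Toy data, invented for this example.
base_merges = [("l", "o"), ("lo", "w"), ("e", "r"), ("n", "e")]
domain_words = ["low", "low", "lower", "newer"]

specialized = select_merges(domain_words, base_merges, k=3)
tokens, _ = apply_merges("lower", specialized)
```

In this toy run the least-used merge is dropped, and the remaining merges are then applied exactly as in ordinary BPE inference. See the paper and the scripts in this repository for the actual selection procedures.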