Did you manage to resolve this, or find out that it is not actually the case? I'm running into the same question: Stage 1 seems pointless in the current code, since Stage 2 does not load the learned tokens.
Originally posted by @narutatsuri in #14
When playing around with the code, I noticed that Stage 2 does not honor the subtokens learned in Stage 1. Instead, it simply runs tokenization from scratch, the sole difference being that merges across spaces are allowed. Is this intended? Based on the description in the paper, I assume it is not: the paper states that Stage 2 is supposed to extend Stage 1, which I understood to mean that all tokens from Stage 1 are preserved and new supertokens are learned until the total number of tokens reaches the provided max token count.
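To make the expected behavior concrete, here is a minimal toy sketch of a two-stage BPE trainer in which Stage 2 starts from the Stage 1 merges instead of starting over. It is not the repository's code: the function names (`apply_merges`, `learn_merges`, `train_two_stage`), the character-level alphabet, and the greedy tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def apply_merges(seq, merges):
    """Apply each learned merge, in order, to a list of symbols."""
    for a, b in merges:
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq


def learn_merges(seqs, num_merges):
    """Greedy BPE: repeatedly merge the most frequent adjacent pair."""
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        # Ties are broken lexicographically (an arbitrary, deterministic choice).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        seqs = [apply_merges(seq, [best]) for seq in seqs]
    return merges


def train_two_stage(corpus, n_stage1, n_total):
    """Hypothetical two-stage training where Stage 2 extends Stage 1."""
    # Stage 1: every word (with its leading space) is a separate sequence,
    # so no merge can cross a space boundary.
    words = [list(" " + w) for line in corpus for w in line.split()]
    stage1 = learn_merges(words, n_stage1)

    # Stage 2: whole lines are single sequences. They are first segmented
    # with the Stage 1 merges, so every Stage 1 token is preserved, and
    # training then continues until the total merge budget is used up.
    lines = [apply_merges(list(" " + " ".join(line.split())), stage1)
             for line in corpus]
    stage2 = learn_merges(lines, n_total - len(stage1))
    return stage1, stage2
```

With this structure, the final merge list is `stage1 + stage2`, so the Stage 1 vocabulary is a strict prefix of the final one, and only the Stage 2 merges can produce tokens that span a space. That is the behavior I expected from "extend", as opposed to relearning everything with cross-space merges enabled.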
Thanks!