Stage 2 Ignores Outputs from Stage 1?

> Did you happen to resolve this/figure out that this was not the case? I'm running into the same question because it seems like Stage 1 is pointless in the current code as Stage 2 does not load the learned tokens. 

 _Originally posted by @narutatsuri in [#14](https://github.com/PythonNut/superbpe/issues/14#issuecomment-3471011949)_


While experimenting with the code, I noticed that Stage 2 does not seem to honor the tokens learned in Stage 1. Instead, it appears to train the tokenizer from scratch, with the sole difference that merges across whitespace are allowed. Is this intended? I assume not, based on the description in the paper, which states that Stage 2 is supposed to extend Stage 1. I understood that to mean Stage 2 preserves all tokens from Stage 1 and learns new supertokens until the total number of tokens reaches the provided maximum token count.
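To make the expectation concrete, here is a toy sketch of what I understand "extend" to mean. It is not taken from the repository: `apply_merge`, `learn_merges`, and `train_two_stage` are hypothetical names, and keeping a leading space on each word during pretokenization is an assumption made for illustration only.

```python
from collections import Counter


def apply_merge(seq, pair):
    """Replace each left-to-right occurrence of `pair` in `seq` with one merged token."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_merges(seqs, num_merges):
    """Greedy BPE: repeatedly merge the most frequent adjacent pair within each sequence."""
    merges = []
    for _ in range(num_merges):
        counts = Counter()
        for seq in seqs:
            counts.update(zip(seq, seq[1:]))
        if not counts:
            break
        # Deterministic tie-break: highest count, then lexicographically smallest pair.
        best = min(counts, key=lambda p: (-counts[p], p))
        merges.append(best)
        seqs = [apply_merge(s, best) for s in seqs]
    return merges, seqs


def train_two_stage(text, stage1_merges, total_merges):
    """Hypothetical two-stage trainer: Stage 2 continues from Stage 1's merged state."""
    # Stage 1: split on whitespace so that no merge can cross a word boundary.
    # Each word keeps a leading space (an assumption, not the repo's pretokenizer).
    chunks = [list(" " + w) for w in text.split()]
    merges1, chunks = learn_merges(chunks, stage1_merges)

    # Stage 2: join the *already merged* chunks into a single stream. The Stage 1
    # tokens are kept as-is, and new merges are now free to span whitespace.
    stream = [tok for chunk in chunks for tok in chunk]
    merges2, _ = learn_merges([stream], total_merges - len(merges1))
    return merges1, merges2


merges1, merges2 = train_two_stage("by the way by the way by the way", 8, 10)
print(merges2)
```

In this sketch the final merge list is `merges1 + merges2`, so every Stage 1 token survives and Stage 2 only adds merges on top of it. Starting Stage 2 from raw characters with cross-space merging enabled would instead produce an independent merge list, which is the behavior I seem to be observing.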

Thanks!

Stage 2 Ignores Outputs from Stage 1? #19
