
Automatically batch texts when too long #6

@dennlinger

Description

For samples that exceed the 512 subword token limit, we currently have no strategy in place to handle the overflow.
This is both unwanted and relatively easy to improve. There are a few considerations regarding the exact strategy, but a good starting point would be to approximate sentence boundaries with something like a lightweight spaCy model, and then chunk sentences greedily up to an approximate maximum length.
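A minimal sketch of the greedy sentence-chunking idea described above. The function name `chunk_text` and the `count_tokens` callable are hypothetical; `count_tokens` stands in for a real subword counter (e.g. the length of a tokenizer's encoding), and a naive regex splitter stands in here for the lightweight spaCy sentencizer so the sketch stays dependency-free:

```python
import re
from typing import Callable, List

def chunk_text(text: str,
               count_tokens: Callable[[str], int],
               max_tokens: int = 512) -> List[str]:
    """Greedily group sentences into chunks below an approximate token limit.

    Hypothetical sketch: a regex splitter approximates sentence boundaries
    (a spaCy sentencizer would be the real choice), and `count_tokens`
    returns the subword token count of a string.
    """
    # Approximate sentence boundaries at ., !, ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sent in sentences:
        n = count_tokens(sent)
        # Close the current chunk before it would exceed the limit.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sent)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Note that a single sentence longer than `max_tokens` still ends up in its own over-long chunk; a real implementation would need a fallback (e.g. hard-splitting on subword boundaries) for that case.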

Metadata

Labels

bug (Something isn't working), enhancement (New feature or request)
