ListOps brackets are not tokenized

Hi, 

A ListOps input of "[MAX 4 3 [ MIN 2 3 ] 1 0 ]" gets encoded as "MAX 4 3 MIN 2 3 1 0". All brackets are removed, so the nesting structure is lost, which makes the task unsolvable.
This is also described here: https://github.com/google-research/long-range-arena/issues/20
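
To illustrate the effect, here is a minimal sketch. It is not the repository's actual tokenizer; both function names are hypothetical and the regex-based splitting is only an assumption about how brackets could get dropped. It uses a bracket-balanced version of the example and shows that two differently nested expressions collapse to the same token sequence once brackets are stripped:

```python
import re

def encode_without_brackets(text):
    # Hypothetical lossy tokenizer: replaces '[' and ']' with spaces,
    # then splits on whitespace, so no bracket token survives.
    return re.sub(r"[\[\]]", " ", text).split()

def encode_with_brackets(text):
    # Hypothetical lossless variant: pads each bracket with spaces
    # so that it becomes a token of its own.
    return re.sub(r"([\[\]])", r" \1 ", text).split()

example = "[MAX 4 3 [ MIN 2 3 ] 1 0 ]"
# Same symbols in the same order, but the inner MIN spans more operands.
other_nesting = "[MAX 4 3 [ MIN 2 3 1 0 ] ]"

lossy_a = encode_without_brackets(example)
lossy_b = encode_without_brackets(other_nesting)
kept_a = encode_with_brackets(example)
kept_b = encode_with_brackets(other_nesting)

print(lossy_a)             # brackets are gone
print(lossy_a == lossy_b)  # both nestings look identical to the model
print(kept_a == kept_b)    # with bracket tokens they stay distinguishable
```

With bracket tokens kept, the two inputs remain distinct; without them, the model cannot tell which operands belong to which operation.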

How I became aware of this: in the paper, page 3 under ListOps, you write "models are fed 512 tokens of dimension 15".
However, there are 4 operations, 2 brackets and 10 numbers, which would require dimension 16.
Checking the dataset code, there is one unused UNK token, 10 numbers and 4 operations, which adds up to a vocabulary size of 15.
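
The counting argument can be written out as a quick sketch. The token spellings below (operator names, "UNK") are assumptions for illustration and may not match the dataset code exactly; only the counts come from the observations above:

```python
# Assumed token spellings; only the counts matter for the argument.
digits = [str(d) for d in range(10)]     # 10 numbers
operators = ["MAX", "MIN", "MED", "SM"]  # 4 operations
brackets = ["[", "]"]                    # 2 brackets
unk = ["UNK"]                            # 1 unused unknown token

# What a bracket-aware vocabulary would need (without UNK).
expected_vocab = digits + operators + brackets
# What the dataset code appears to contain: no brackets, plus UNK.
observed_vocab = unk + digits + operators

print(len(expected_vocab))  # 16
print(len(observed_vocab))  # 15
```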

Your code reproduces the ~38% accuracy of ListOps described in the paper correctly.

Best
