[kaz] Kazakh "small" data is mis-sized

Hello from the future @jkodner05 and team. I noticed that the file [`part1/suprise_languages/kaz_small.train`](part1/suprise_languages/kaz_small.train) is the same size (7,000 exemplars) as `kaz_large.train` in [that directory](part1/suprise_languages).
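
For anyone who wants to reproduce the count, here is a minimal sketch. It assumes each non-empty line of a `.train` file is one example; that is my assumption about the format, not something confirmed by the maintainers. The helper names `count_examples` and `report_sizes` are hypothetical.

```python
from pathlib import Path


def count_examples(path):
    """Count non-empty lines, assuming one training example per line."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def report_sizes(directory, lang="kaz"):
    """Map each <lang>_*.train filename in `directory` to its example count."""
    files = sorted(Path(directory).glob(f"{lang}_*.train"))
    return {p.name: count_examples(p) for p in files}


if __name__ == "__main__":
    # Directory path as it appears in this repository (run from the repo root).
    for name, n in report_sizes("part1/suprise_languages").items():
        print(f"{name}\t{n}")
```

If the two Kazakh files really are the same size, both rows should show the same number. Passing `lang="hye"` gives the corresponding counts for Armenian for comparison.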

1. I assume this is an error. Can you confirm?
2. Given this finding, what is the best way to evaluate and compare against prior results in the lower-resource ("small") setting? Should I just use `kaz_small.train` as the training data for that setting, even though it is not actually "small" (700 examples) like, say, `hye_small.train`?
