Hello from the future @jkodner05 and team. I noticed that the file part1/suprise_languages/kaz_small.train is the same size (7,000 examples) as kaz_large.train in that directory.
- I assume this is an error. Can you confirm?
- What is the best way to evaluate/compare to prior results in the lower-resource ("small") setting given this finding? Should I just use kaz_small.train as training data in the lower-resource setting, even though it's not "small" (700 examples) like, say, hye_small.train?
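
A minimal sketch for reproducing the size comparison, in case it helps. It assumes each non-empty line of a .train file is one example and that the script runs from the repository root; the helper names `count_examples` and `compare_sizes` are made up for illustration and are not part of the repository.

```python
from pathlib import Path


def count_examples(path):
    """Count non-empty lines, assuming one training example per line."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def compare_sizes(small_path, large_path):
    """Return (small_count, large_count, counts_are_equal) for two files."""
    small = count_examples(small_path)
    large = count_examples(large_path)
    return small, large, small == large


if __name__ == "__main__":
    # Directory and file names as given in this issue; adjust if your checkout differs.
    base = Path("part1/suprise_languages")
    small_file = base / "kaz_small.train"
    large_file = base / "kaz_large.train"
    if small_file.exists() and large_file.exists():
        small, large, same = compare_sizes(small_file, large_file)
        print(f"kaz_small.train: {small} examples")
        print(f"kaz_large.train: {large} examples")
        print("Same number of examples." if same else "Different number of examples.")
    else:
        print("Could not find the .train files; run this from the repository root.")
```

This only compares example counts; it does not check whether the two files have identical contents.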