This is the GitHub repo of the OpenAinu-Saru project. It is still under construction.
This repo contains our training and evaluation code files.
The repo does not currently work out of the box, because we ran these files through our internal workflow. We plan to restructure the code into an out-of-the-box repo later.
Currently, all the components needed to reproduce our results are included in the repo. These include:
- Functions we used to train/evaluate our models.
- The conda environment file.
- Tokenizers.
- Weights.
An example set of weights can be accessed at https://huggingface.co/JacobZh/OpenAinu-Saru-25-Sep.
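One way to fetch the example weights linked above is with the `huggingface_hub` client library. This is a minimal sketch, not part of this repo's own tooling. It assumes `huggingface_hub` is installed (e.g. `pip install huggingface_hub`). The helper name `download_weights` and its default `local_dir` are hypothetical, and the file layout inside the model repo is not documented here.

```python
"""Sketch: download the example OpenAinu-Saru weights from the Hugging Face Hub."""
from pathlib import Path

# Model repository ID, taken from the link in this README.
REPO_ID = "JacobZh/OpenAinu-Saru-25-Sep"


def download_weights(local_dir: str = "weights/OpenAinu-Saru-25-Sep") -> Path:
    """Download every file in the model repo and return the local folder.

    Hypothetical helper: assumes `huggingface_hub` is installed, and the
    default `local_dir` is only an example location.
    """
    # Imported lazily so this module can be loaded without the dependency.
    from huggingface_hub import snapshot_download

    # snapshot_download fetches the whole repo snapshot and returns its path.
    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)
```

Calling `download_weights()` downloads the snapshot into `weights/OpenAinu-Saru-25-Sep` and returns that folder as a `Path`. How those files are loaded into a model depends on the training and evaluation code in this repo.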
While the datasets we use are all public, we will not redistribute them here. Instead, we encourage you to obtain them from their official webpages:
- Ainu Folklore
- Ainu Textbooks (Japanese)
We provide tool classes for these datasets; however, you are encouraged to organize them with your own scheme.