This repository contains the model responses and evaluations from the paper "Judging Instruction Responses in a Low-Resource Language: A Case Study on Basque".
The inferences were generated for the just-eval-eus dataset, available at: Vicomtech/just-eval-instruct-eus
.
├── multi
│   ├── justeval_scores_multi.csv
│   ├── sampled_instructions_common.json
│   ├── sampled_instructions_specific-A.json
│   ├── sampled_instructions_specific-B.json
│   ├── sampled_instructions_specific-D.json
│   ├── sampled_instructions_specific-E.json
│   └── sampled_instructions_specific-G.json
└── safety
    ├── sampled_instructions_safety.json
    └── scores_safety.csv
The multi directory contains the general-purpose instruction-response samples and their judge evaluations.
- sampled_instructions_common.json: sampled instructions evaluated by all human annotators.
- sampled_instructions_specific-[A/B/D/E/G].json: sampled instructions evaluated by a single human annotator.
- justeval_scores_multi.csv: judge scores for the multi subset.
The safety directory contains the safety-oriented instruction-response samples and their judge evaluations.
- sampled_instructions_safety.json: sampled safety instructions and generated responses.
- scores_safety.csv: judge scores for the safety subset.
The sampled_instructions JSON files include the generated responses from the inference models described in the paper.
The score CSV files include the corresponding judge responses and scores.
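The files can be inspected with the Python standard library alone. The following is a minimal sketch, not part of the repository: the helper names (load_samples, load_scores, describe) are hypothetical, and because the JSON fields and CSV column names are not documented here, the code only reports what it finds rather than assuming a schema. It does assume that each JSON file holds a list or an object at the top level and that the CSV files are comma-separated with a header row.

```python
import csv
import json
from pathlib import Path


def load_samples(path):
    """Parse one sampled_instructions_*.json file and return its content."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def load_scores(path):
    """Read a score CSV into a list of dicts keyed by the file's own header."""
    with Path(path).open(encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))


def describe(root="."):
    """Summarize every sample and score file found one level below `root`.

    Assumption: `root` is a local checkout laid out like the tree above
    (multi/ and safety/ subdirectories).
    """
    root = Path(root)
    summary = {}
    for json_path in sorted(root.glob("*/sampled_instructions*.json")):
        data = load_samples(json_path)
        key = json_path.relative_to(root).as_posix()
        # Assumes a list or dict at the top level, so len() is defined.
        summary[key] = f"{type(data).__name__} with {len(data)} entries"
    for csv_path in sorted(root.glob("*/*scores*.csv")):
        rows = load_scores(csv_path)
        columns = list(rows[0].keys()) if rows else []
        key = csv_path.relative_to(root).as_posix()
        summary[key] = f"{len(rows)} rows, columns: {columns}"
    return summary
```

Calling describe() from the repository root returns a dictionary mapping each file path to a one-line summary, which is a quick way to check the actual field and column names before writing any analysis code.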
If you use this repository, please cite the paper:
@inproceedings{ponce2026judging,
  title = {Judging Instruction Responses in a Low-Resource Language: A Case Study on Basque},
  author = {Ponce, David and Gete, Harritxu and Etchegoyhen, Thierry and Zubiaga, Irune and Soroa, Aitor},
  booktitle = {Proceedings of the 15th edition of the Language Resources and Evaluation Conference (LREC 2026)},
  note = {to appear},
  year = {2026}
}