Can Large Language Models Judge in Basque?

This repository contains the model responses and judge evaluations reported in the paper "Judging Instruction Responses in a Low-Resource Language: A Case Study on Basque".

The responses were generated for instructions from the just-eval-eus dataset, available at: Vicomtech/just-eval-instruct-eus

.
├── multi
│   ├── justeval_scores_multi.csv
│   ├── sampled_instructions_common.json
│   ├── sampled_instructions_specific-A.json
│   ├── sampled_instructions_specific-B.json
│   ├── sampled_instructions_specific-D.json
│   ├── sampled_instructions_specific-E.json
│   └── sampled_instructions_specific-G.json
└── safety
    ├── sampled_instructions_safety.json
    └── scores_safety.csv

Files

`multi/`

Contains the general-purpose instruction-response samples and their judge evaluations.

- `sampled_instructions_common.json`: sampled instructions and generated responses, evaluated by all human annotators.
- `sampled_instructions_specific-[A/B/D/E/G].json`: sampled instructions and generated responses; each file was evaluated by a single human annotator.
- `justeval_scores_multi.csv`: judge scores for the multi subset.
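Since every annotator evaluated the common sample plus one specific sample, the set of files behind each annotator's judgments can be assembled from the file names. The sketch below assumes a local checkout of this repository and treats the letters A, B, D, E and G in the file names as annotator IDs; `annotator_files` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Assumption: the letters in the specific-* file names identify the annotators.
ANNOTATORS = ["A", "B", "D", "E", "G"]


def annotator_files(root: str = ".") -> dict[str, list[Path]]:
    """Hypothetical helper: map each annotator ID to the sample files they evaluated.

    Every annotator saw the common sample; each one also saw their own specific sample.
    """
    multi = Path(root) / "multi"
    common = multi / "sampled_instructions_common.json"
    return {
        annotator: [common, multi / f"sampled_instructions_specific-{annotator}.json"]
        for annotator in ANNOTATORS
    }
```

For example, `annotator_files()["A"]` returns the paths to `multi/sampled_instructions_common.json` and `multi/sampled_instructions_specific-A.json`. The function only builds paths and does not check that the files exist.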

`safety/`

Contains the safety-oriented instruction-response samples and their judge evaluations.

- `sampled_instructions_safety.json`: sampled safety instructions and generated responses.
- `scores_safety.csv`: judge scores for the safety subset.

Data

The `sampled_instructions_*.json` files contain the responses generated by the inference models described in the paper.

The score CSV files (`justeval_scores_multi.csv` and `scores_safety.csv`) contain the corresponding judge responses and scores.
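A minimal loading sketch using only the Python standard library. It assumes the JSON files are valid UTF-8 JSON (their internal schema is not described here, so they are returned as-is) and that the score files are comma-separated UTF-8 CSVs with a header row. `load_samples` and `load_scores` are hypothetical helper names.

```python
import csv
import json
from pathlib import Path


def load_samples(path: str) -> object:
    """Hypothetical helper: load a sampled_instructions JSON file without assuming its schema."""
    with open(Path(path), encoding="utf-8") as f:
        return json.load(f)


def load_scores(path: str) -> list[dict[str, str]]:
    """Hypothetical helper: load a score CSV into row dicts keyed by the header row.

    Assumes a comma delimiter and a header line; all values are returned as strings.
    """
    with open(Path(path), encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))
```

For instance, `load_scores("safety/scores_safety.csv")[0].keys()` shows the actual column names, which should be checked before converting any score values to numbers.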

Citation

If you use this repository, please cite the paper:

@inproceedings{ponce2026judging,
  title = {Judging Instruction Responses in a Low-Resource Language: A Case Study on Basque},
  author = {Ponce, David and Gete, Harritxu and Etchegoyhen, Thierry and Zubiaga, Irune and Soroa, Aitor},
  booktitle = {Proceedings of the 15th edition of the Language Resources and Evaluation Conference (LREC 2026)},
  note = {to appear},
  year = {2026}
}
