pl-llm-errors

This repository contains the source code and datasets for the experiments in the Master's thesis "Analysis of the impact of errors in Polish-language queries on the quality of responses from large language models". The project investigates how robust various models are to Polish-specific errors.

How to run

Build prompts

python -m src.prompt_preparation.prompt_builder

Run inference (collect LLM responses)

python -m src.inference.inference_runner

Run judgement (LLM-as-a-judge)

python -m src.judgement.judgement_runner <answers_file.json>
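The three steps above can be chained from a single script. The following is a minimal sketch, not part of the repository: the helper names `build_commands` and `run_pipeline` are hypothetical, and it assumes the commands are run from the repository root. The README does not say where the inference step writes its answers file, so the path is left to the caller.

```python
import subprocess
import sys

# Module paths are taken from the commands listed above.
STEPS = [
    "src.prompt_preparation.prompt_builder",
    "src.inference.inference_runner",
    "src.judgement.judgement_runner",
]


def build_commands(answers_file):
    """Return the three commands in order (hypothetical helper).

    Only the judgement step receives the answers file as an argument.
    """
    commands = [[sys.executable, "-m", module] for module in STEPS]
    commands[-1].append(answers_file)
    return commands


def run_pipeline(answers_file):
    """Run each step in order; check=True stops at the first failing step."""
    for command in build_commands(answers_file):
        subprocess.run(command, check=True)
```

Calling `run_pipeline` with the path of the answers file produced by the inference step runs all three stages in sequence. Using `sys.executable` keeps every step in the same Python environment as the calling script.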

bielik-4.5b-v3.0-instruct

Error type	Correct	Incorrect	Accuracy
identity	304	196	60.80%
diacritic	301	199	60.20%
punctuation	281	219	56.20%
spelling	259	241	51.80%
typo 30%	277	223	55.40%
typo 70%	248	252	49.60%
typo 100%	240	260	48.00%

gemma3-4b

Error type	Correct	Incorrect	Error	Accuracy
identity	279	221	0	55.80%
diacritic	275	225	0	55.00%
punctuation	260	240	0	52.00%
spelling	231	269	0	46.20%
typo 30%	241	259	0	48.20%
typo 70%	224	276	0	44.80%
typo 100%	203	296	1	40.60%
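
The Accuracy column in both tables appears to be the share of correct answers among all 500 prompts per error type, with the Error count included in the denominator. This is inferred from the numbers (203 / 500 = 40.60% in the last gemma3-4b row), not stated in the README. A short sketch of that calculation, with a hypothetical `accuracy` helper:

```python
def accuracy(correct, incorrect, errors=0):
    """Percentage of correct answers over all judged prompts.

    Assumption: responses counted under Error stay in the denominator,
    which matches the figures reported in the tables.
    """
    total = correct + incorrect + errors
    return 100.0 * correct / total


# gemma3-4b, typo 100%: 203 correct, 296 incorrect, 1 error
print(f"{accuracy(203, 296, 1):.2f}%")  # 40.60%

# bielik-4.5b-v3.0-instruct, identity: 304 correct, 196 incorrect
print(f"{accuracy(304, 196):.2f}%")  # 60.80%
```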

