NorEval Stats

Evaluation results and interactive visualizations for Norwegian language models, benchmarked with lm-eval-harness on the NorEval benchmark suite.

View the interactive results

Models

25 base models across two categories:

Norwegian models: NorOLMo 13B, NorMistral 7B/11B/11B Long, NorBERT4 1B, NorwAI NorGPT 3B, NorwAI NorLlama 8B, NorwAI Mistral 7B, NorwAI Mixtral 8x7B, NB-GPT-J 6B

Multilingual baselines: Apertus 8B, EuroLLM 9B/22B, Gemma3 12B/27B, Llama3.1 8B, Mistral 7B/12B, OLMo2 13B, OLMo2 13B (stage 1), OLMo3 7B/32B, Qwen3 8B/14B/32B

The training progress of NorOLMo is tracked across 33 checkpoints (steps 1,000–33,000).

Benchmarks

34 benchmarks across 5 categories, evaluated at 0-shot, 1-shot, and 5-shot settings:

Category	Benchmarks
World Knowledge & Reasoning	CommonsenseQA, OpenBookQA (no fact), TruthfulQA (MC & gen), NRK Quiz
Language Understanding	BeleBele, OpenBookQA, NoReC (sentence & document), NorQuAD
Linguistic Knowledge	NCB, NoCOLA, Idiom Completion, SLIDE, Grammar Correction (ASK-GEC)
Generation & Summarization	Summarization, Instruction-following
Translation	ENG↔NOB, ENG↔NNO, NOB↔SME, NOB↔NNO

Many benchmarks include both Bokmål (NOB) and Nynorsk (NNO) variants. The best score across prompt variants (typically 4–6 per benchmark) is reported.
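Reporting the best score across prompt variants amounts to taking a maximum over the per-prompt scores for each benchmark and shot setting. The sketch below illustrates that selection step only. The function name best_prompt_score and the prompt identifiers are hypothetical, and it assumes higher scores are better; it is not taken from build_data.py.

```python
from typing import Mapping


def best_prompt_score(scores_by_prompt: Mapping[str, float]) -> tuple[str, float]:
    """Return the (prompt_id, score) pair with the highest score.

    Hypothetical helper: assumes one score per prompt variant for a single
    benchmark/shot setting, and that higher is better.
    """
    if not scores_by_prompt:
        raise ValueError("no prompt variants to choose from")
    # max() over the keys, ranked by their associated score
    best_prompt = max(scores_by_prompt, key=scores_by_prompt.__getitem__)
    return best_prompt, scores_by_prompt[best_prompt]


# Illustrative placeholder scores, not real NorEval results
print(best_prompt_score({"p0": 0.41, "p1": 0.47, "p2": 0.44}))  # ('p1', 0.47)
```

For metrics where lower is better (for example an error rate), the selection would need min() instead.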

Interactive Website

The website provides two views:

Model Comparison — Bar charts comparing all models on selected benchmarks
Training Progress — Line charts showing NorOLMo's performance over training steps

Features include normalized aggregate scores, per-task views, category/language filters, 0/1/5-shot toggle, and high-resolution PNG/SVG chart export.
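The README does not state how the aggregate scores are normalized. One common approach, shown here purely as an assumption, rescales each task score so that its random-guess baseline maps to 0 and a perfect score maps to 1, then averages the rescaled scores. The names normalize and aggregate and the baseline values are hypothetical. The site's actual method may differ.

```python
def normalize(score: float, random_baseline: float, max_score: float = 1.0) -> float:
    """Rescale so random_baseline maps to 0 and max_score maps to 1.

    Assumed normalization scheme (not confirmed by the repository).
    Scores below the baseline become negative; no clipping is applied.
    """
    if max_score <= random_baseline:
        raise ValueError("max_score must exceed random_baseline")
    return (score - random_baseline) / (max_score - random_baseline)


def aggregate(scores: dict[str, float], baselines: dict[str, float]) -> float:
    """Average of normalized task scores; tasks without a baseline use 0.0."""
    if not scores:
        raise ValueError("no task scores to aggregate")
    normalized = [normalize(s, baselines.get(task, 0.0)) for task, s in scores.items()]
    return sum(normalized) / len(normalized)


# Hypothetical tasks: a 4-way multiple-choice task (baseline 0.25) and a generative task
print(aggregate({"mc_task": 0.625, "gen_task": 0.5}, {"mc_task": 0.25}))  # 0.5
```

Normalizing against a chance baseline keeps multiple-choice tasks with high random-guess accuracy from dominating an unweighted average.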

Adding a New Model

Add evaluation results under results/<model-name>/ (same structure as existing models)
Add a display name in MODEL_DISPLAY_NAMES in build_data.py
Run python3 build_data.py to regenerate docs/data.json
Commit and push — GitHub Actions will rebuild automatically
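The sketch below shows what a new MODEL_DISPLAY_NAMES entry and a quick consistency check might look like. It assumes the dictionary maps each directory name under results/ to the label shown in the charts. That key format, the placeholder entry "my-new-model-7b", and the helpers model_dirs and missing_display_names are all hypothetical and are not part of build_data.py.

```python
from pathlib import Path
from typing import Iterable

# Hypothetical excerpt: assumes keys are directory names under results/
MODEL_DISPLAY_NAMES = {
    "my-new-model-7b": "My New Model 7B",  # placeholder for a newly added model
}


def model_dirs(results_dir: Path) -> list[str]:
    """Names of the subdirectories of results_dir (assumed one per model)."""
    return sorted(p.name for p in results_dir.iterdir() if p.is_dir())


def missing_display_names(names: Iterable[str], display_names: dict[str, str]) -> list[str]:
    """Model names that have no entry in display_names."""
    return sorted(n for n in names if n not in display_names)


if __name__ == "__main__":
    results = Path("results")
    if results.is_dir():
        print(missing_display_names(model_dirs(results), MODEL_DISPLAY_NAMES))
```

Running a check like this before build_data.py can catch a results folder that was added without a matching display name.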

Building Locally

pip install pyyaml
python3 build_data.py
python3 -m http.server 8000 -d docs   # Preview at http://localhost:8000

License

MIT
