HIPE-OCRepair-2026 Data Repository

HIPE-OCRepair-2026 is an ICDAR 2026 Competition focused on LLM-assisted OCR post-correction of historical documents, with a particular emphasis on historical newspapers.

With renewed interest driven by large language models (LLMs), OCR post-correction has (re)gained momentum, resulting in a growing number of models and experimental approaches. However, these efforts often rely on heterogeneous legacy datasets that come with important limitations, making systematic evaluation and meaningful comparison across approaches difficult.

A central question motivating this competition is:

To what extent can modern large language models address the OCR debt accumulated in large-scale digitized historical collections?

The competition addresses this by providing HIPE-OCRepair-Bench, a unified multilingual benchmark for OCR post-correction, comprising curated datasets, an evaluation protocol, baseline systems, and an open leaderboard.

📋 Participation Guidelines

All information about the task, datasets, evaluation protocol, and submission instructions is available in the Participation Guidelines.

🔗 Important Links


🌐 Competition website	https://hipe-eval.github.io/HIPE-OCRepair-2026/
📋 Participation Guidelines	README-Participation-Guidelines.md
📈 Scorer	https://github.com/hipe-eval/HIPE-OCRepair-scorer
📊 Evaluation repository (after competition)	https://github.com/hipe-eval/HIPE-OCRepair-2026-eval
🏆 Leaderboard (to come)	https://huggingface.co/spaces/hipe-ocrepair-2026-eval
📝 Registration & contact	see competition website

📦 Data

Data is available:

in the data/ folder of this repository and in the Git releases
on Zenodo (planned for a later date)

Release History

20.04.2026: Final competition release with unmasked test files, including post-submission ground truth corrections | Release tag v0.9.5
06.04.2026: Release of masked test files for the competition | Release tag v0.9.3
20.03.2026: Release of train and dev sets for dta19 dataset | Release tag v0.9.2
11.03.2026: Hotfix for impresso-snippets dataset | Release tag v0.9.1
02.03.2026: First data release with overproof, icdar17, impresso-nzz and impresso-snippets | Release tag v0.9
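
Because every release carries a tag, experiments can record which data version they were run against. The following is a minimal sketch, assuming the release history above is transcribed by hand into a Python list. The names `RELEASES` and `latest_release_on` are hypothetical and not part of this repository.

```python
from datetime import date
from typing import Optional

# Release dates and tags transcribed by hand from the Release History above.
# This list is an illustration; it is not read from the repository itself.
RELEASES = [
    (date(2026, 3, 2), "v0.9"),
    (date(2026, 3, 11), "v0.9.1"),
    (date(2026, 3, 20), "v0.9.2"),
    (date(2026, 4, 6), "v0.9.3"),
    (date(2026, 4, 20), "v0.9.5"),
]


def latest_release_on(day: date) -> Optional[str]:
    """Return the tag of the most recent release published on or before `day`.

    Returns None if no release had been published by that day.
    """
    published = [(released, tag) for released, tag in RELEASES if released <= day]
    if not published:
        return None
    # Tuples compare by date first, so max() picks the latest release.
    return max(published)[1]


if __name__ == "__main__":
    print(latest_release_on(date(2026, 4, 10)))
```

For example, a run dated 10 April 2026 would resolve to the tag of the masked test file release, which helps when comparing results produced before and after the final unmasked release.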

🎓 Citation

Competition Report

The HIPE-OCRepair-2026 shared task will be described in a competition report paper currently under review for the proceedings of ICDAR 2026.

Citation: (BibTeX entry will be provided upon publication)

Participant Publications

Participating teams may publish their own system description papers. We will maintain a list of these publications here:

Team Name - Paper Title - [Venue/Preprint] - [Link] (coming soon)

🤝 Acknowledgments

The HIPE-OCRepair-2026 organising team expresses its sincere appreciation to the ICDAR 2026 Competition Committee for its overall coordination and support.

👥 Participating Teams

We thank all teams who participated in the HIPE-OCRepair-2026 shared task:

BnF-Mistral

Mistral AI: Maxime Kunsch, Jacques Sun, Yassine El Ouahidi
Bibliothèque nationale de France (BnF): Sébastien Crétin, Marcel Bautista, Jean-Philippe Moreux

BLOCR

British Library: Valentina Vavassori, Harry Lloyds
Code: github.com/harrylloyd-bl/hipe-ocrepair

L3i

L3i Laboratory, La Rochelle Université: Tien Nam Nguyen, Wenjun Sun, Ahmed Hamdi, Carlos-Emiliano Gonzalez-Gallardo, Mickaël Coustaty, Antoine Doucet

zakaria-ENSIAS

ENSIAS, Rabat, Morocco: Zakaria Bouaouda

📚 HIPE-eval Series

HIPE-OCRepair-2026 is part of the HIPE-eval series of shared tasks on historical document and information processing and evaluation.

HIPE-eval editions are organised within the framework of the Impresso – Media Monitoring of the Past project, funded by the Swiss National Science Foundation under grant No. CRSII5_213585 and by the Luxembourg National Research Fund under grant No. 17498891.
