HIPE-OCRepair-2026 is an ICDAR 2026 Competition focused on LLM-assisted OCR post-correction of historical documents, with a particular emphasis on historical newspapers.
With renewed interest driven by large language models (LLMs), OCR post-correction has (re)gained momentum, resulting in a growing number of models and experimental approaches. However, these efforts often rely on heterogeneous legacy datasets that come with important limitations, making systematic evaluation and meaningful comparison across approaches difficult.
A central question motivating this competition is:
To what extent can modern large language models address the OCR debt accumulated in large-scale digitized historical collections?
The competition addresses this by providing HIPE-OCRepair-Bench, a unified multilingual benchmark for OCR post-correction, comprising curated datasets, an evaluation protocol, baseline systems, and an open leaderboard.
All information about the task, datasets, evaluation protocol, and submission instructions is available in the Participation Guidelines.
| Resource | Link |
|---|---|
| 🌐 Competition website | https://hipe-eval.github.io/HIPE-OCRepair-2026/ |
| 📋 Participation Guidelines | README-Participation-Guidelines.md |
| 📈 Scorer | https://github.com/hipe-eval/HIPE-OCRepair-scorer |
| 📊 Evaluation repository (after competition) | https://github.com/hipe-eval/HIPE-OCRepair-2026-eval |
| 🏆 Leaderboard (to come) | https://huggingface.co/spaces/hipe-ocrepair-2026-eval |
| 📝 Registration & contact | see competition website |
Data is available:
- 20.04.2026: Final competition release: unmasked test files with post-submission ground truth corrections | Release tag v0.9.5
- 06.04.2026: Release of masked test files for the competition | Release tag v0.9.3
- 20.03.2026: Release of train and dev sets for `dta19` dataset | Release tag v0.9.2
- 11.03.2026: Hot fix for `impresso-snippets` dataset | Release tag v0.9.1
- 02.03.2026: First data release with `overproof`, `icdar17`, `impresso-nzz` and `impresso-snippets` | Release tag v0.9
The HIPE-OCRepair-2026 shared task will be described in a competition report paper currently under review for the proceedings of ICDAR 2026.
Citation: (BibTeX entry will be provided upon publication)
Participating teams may publish their own system description papers. We will maintain a list of these publications here:
- Team Name - Paper Title - [Venue/Preprint] - [Link] (coming soon)
The HIPE-OCRepair-2026 organising team expresses its sincere appreciation to the ICDAR-2026 Competition Committee for the overall coordination and support.
We thank all teams who participated in the HIPE-OCRepair-2026 shared task:
- Mistral AI: Maxime Kunsch, Jacques Sun, Yassine El Ouahidi
- Bibliothèque nationale de France (BnF): Sébastien Crétin, Marcel Bautista, Jean-Philippe Moreux
- British Library: Valentina Vavassori, Harry Lloyds
  - Code: github.com/harrylloyd-bl/hipe-ocrepair
- L3i Laboratory, La Rochelle Université: Tien Nam Nguyen, Wenjun Sun, Ahmed Hamdi, Carlos-Emiliano Gonzalez-Gallardo, Mickaël Coustaty, Antoine Doucet
- ENSIAS, Rabat, Morocco: Zakaria Bouaouda
HIPE-OCRepair-2026 is part of the HIPE-eval series of shared tasks on historical document and information processing and evaluation.
HIPE-eval editions are organised within the framework of the Impresso – Media Monitoring of the Past project, funded by the Swiss National Science Foundation under grant No. CRSII5_213585 and by the Luxembourg National Research Fund under grant No. 17498891.