ProofNet-Verified (ProofNet-V) is an audited, corrected, and verified version of the ProofNet dataset that is upgraded to be compatible with Lean v4.28.0.
ProofNet is a benchmark for theorem proving and autoformalization of undergraduate-level mathematics. It originally consists of 371 data points, each consisting of a formal theorem statement in Lean 3, a natural language theorem statement, and a natural language proof. The problems are primarily drawn from popular undergraduate pure mathematics textbooks and cover topics such as real and complex analysis, linear algebra, abstract algebra, and topology. We intend for ProofNet to be a challenging benchmark that will drive progress in autoformalization and automatic theorem proving
ProofNet# is a cleaner version of ProofNet compatible with Lean v4.7.0 to Lean v4.16.0-rc2, correcting 118 errors in previous Lean 4 implementations of ProofNet like ProofNet-lean4.
We gratefully acknowledge the authors of these projects for their effort and making the data publicly available.
Based on ProofNet and ProofNet#, we made the following modifications:
- Filter out 4 non-proof problems that have inconsistent natural language statement and formal statements, resulting in 367 problems in total.
- Axler|exercise_1_6
- Axler|exercise_1_7
- Rudin|exercise_1_14
- Rudin|exercise_1_19
-
Upgrade the formal statements to be compatible with
Lean v4.28.0, so that more automated proof search tools like Grind, Canonical, and Lean-SMT are available. -
Audit the benchmark for both errors in the ground-truth formalizations and ambiguity in the informal statements, identifying 204/367 (55.6%) errors in ProofNet and 97/367 (26.4%) errors in ProofNet#.
-
Confirm each error by proving the negation of incorrect formalizations, providing trivialized proofs of formalization that are provably true but unfaithful to the informal statement, or giving detailed reasoning for why a formalization is unfaithful or an informal statement is ambiguous.
-
Correct these errors and verify all 367 formal statements by providing formal proofs for each. Additionally, the faithfulness of these corrections is assessed by both human reviewers and an AI agent until they reach a consensus.
If you find the ProofNet-Verified benchmark useful for your research, please kindly cite:
@inproceedings{
min2026divide,
title={Divide and Abstract: Autoformalization via Decomposition and Abstraction Learning},
author={Marcus J. Min and Yeqi Gao and Wilson Sy and Zhaoyu Li and Xujie Si and Osbert Bastani},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=NjgaeXNit3}
}| ProofNet | ProofNet# | |
|---|---|---|
| Faithful | 163 (44.4%) | 270 (73.6%) |
| Unfaithful | 194 (52.9%) | 88 (24.0%) |
| Ambiguous NL | 9 (2.5%) | 8 (2.2%) |
| Wrong NL | 1 (0.3%) | 1 (0.3%) |
| Total Errors | 204 (55.6%) | 97 (26.4%) |
| Total | 367 | 367 |
| Unfaithful Error Type | ProofNet | ProofNet# |
|---|---|---|
| Misunderstanding of informal statement | 94 (25.6%) | 27 (7.4%) |
| Missing premise | 30 (8.2%) | 3 (0.8%) |
| Misunderstanding of Mathlib | 26 (7.1%) | 31 (8.4%) |
| Missing implicit premise | 18 (4.9%) | 18 (4.9%) |
| Misunderstanding of Lean | 10 (2.7%) | 8 (2.2%) |
| Adding extra case | 9 (2.5%) | 0 (0.0%) |
| Adding extra conclusion | 7 (1.9%) | 1 (0.3%) |
| Total Unfaithful | 194 (52.9%) | 88 (24.0%) |
Each entry in proofnet-verified.jsonl contains:
| Field | Description |
|---|---|
index |
Problem number (1-367) |
name |
Identifier, e.g. Artin_exercise_2_2_9 |
textbook |
Source textbook |
header |
Lean imports and open declarations |
helper |
Auxiliary definitions needed by some problems |
informal_stmt |
Natural language theorem statement |
formal_stmt |
Lean 4 formal statement (ends with sorry) |
informal_proof |
Natural language proof |
formal_proof |
Zipped to avoid data contamination |
import Mathlib
open Function Fintype Subgroup Ideal Polynomial Submodule Zsqrtd
open scoped BigOperators
/-Informal Statement
Let $H$ be the subgroup generated by two elements $a, b$ of a group $G$.
Prove that if $a b=b a$, then $H$ is an abelian group.
-/
theorem Artin_exercise_2_2_9 {G : Type*} [Group G] {a b : G}
(h : a * b = b * a) :
∀ x y : closure {x | x = a ∨ x = b}, x * y = y * x := by
sorry# Clone the repository
git clone https://github.com/marcusm117/ProofNet-Verified
cd ProofNet-Verified
# Fetch dependencies and pre-built Mathlib artifacts
lake updateTo type-check all 367 formal statements in parallel:
uv sync
uv run python scripts/verify_compilation.pyThis runs lake env lean on each file with 48 parallel workers (configurable via --jobs).
Python scripts (e.g. scripts/verify_compilation.py) are managed with uv:
# Create .venv and install dependencies (joblib, tqdm)
uv sync
# Run a script through the managed environment
uv run python scripts/verify_compilation.pyThe proofnet-verified/ directory contains 367 Lean files, each with a single theorem statement ending in sorry. Replace sorry with a valid proof to solve the problem. We recommend using Lean Comparator as a rigorous proof checker.
Each entry in proofnet-verified.jsonl pairs an informal_stmt with a formal_stmt. We recommend using symbolic checkers like BEq+ over LLM-as-a-Judge to compare the ground truth formal_stmt with the generated candidate formalizations.
The error_taxonomy/ directory provides detailed per-problem error classifications, useful for studying common failure modes in mathematical formalization (Lean syntax misunderstandings, Mathlib API misuse, missing premises, etc.).
This project is licensed under the Apache License 2.0. See the LICENSE file for details.