Note: For the Chinese version of this README, please refer to README_zh.md.
- 📊 [2024-06] Evaluation results released for the 2024 Gaokao Mathematics papers.
Newly released papers from China's National College Entrance Examination (Gaokao) are original and kept strictly confidential until the exam, which makes them an excellent benchmark for evaluating large language models. The FDU-NLP LLMEval team presents a series of evaluations on the 2024 Gaokao Mathematics papers.
The leaderboard is continuously updated. Results are for reference only.
- Zero contamination: brand-new exam questions that models released before the exam could not have seen during pre-training
- Dual prompt formats: LaTeX and escape-character versions of each paper, to test prompt sensitivity
- Two exam papers: New Paper I (新I卷) and New Paper II (新II卷)
- Standardized rubrics: answers are scored with the official Gaokao scoring criteria
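As an illustration of rubric-based scoring, the sketch below handles a single 6-point multiple-answer question. It assumes a partial-credit rule (verify against the official rubric): selecting any wrong option scores 0, and otherwise credit is proportional to the correct options chosen, i.e. 3 points per option when two options are correct and 2 points per option when three are correct. The helper `score_multi_select` is hypothetical and is not the LLMEval team's grading code.

```python
def score_multi_select(selected: set[str], answer: set[str], full_marks: int = 6) -> int:
    """Score one multiple-answer question under an ASSUMED partial-credit rule.

    Hypothetical helper, not the LLMEval grader:
    - an empty answer or any option outside the key scores 0;
    - otherwise credit is proportional to the number of correct options chosen.
    """
    if not selected or not selected <= answer:
        return 0  # no answer, or at least one wrong option selected
    # With 2 or 3 correct options, 6 divides evenly, so integer division is exact.
    return full_marks * len(selected) // len(answer)


# Example: choosing only "A" when the key is {"A", "C"} earns half credit.
print(score_multi_select({"A"}, {"A", "C"}))
```

Proportional credit reproduces the assumed 3-point and 2/4-point steps without a lookup table; a question whose key has a different number of options would need its own rule.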
Each model is evaluated on both papers using two different prompt formats:
| Paper | Format | Description |
|---|---|---|
| New Paper I | LaTeX | Mathematical expressions in standard LaTeX |
| New Paper I | Escape | Mathematical expressions in text-based escape characters |
| New Paper II | LaTeX | Mathematical expressions in standard LaTeX |
| New Paper II | Escape | Mathematical expressions in text-based escape characters |
This dual-format design reveals how sensitive models are to prompt formatting in mathematical contexts.
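One simple way to quantify prompt sensitivity from the four runs is the per-paper score gap between the LaTeX and escape-character versions. The sketch below assumes each model has one total score per (paper, format) pair, with formats labeled `"latex"` and `"escape"`; the function name and data layout are hypothetical, and no actual leaderboard numbers are included.

```python
from typing import Dict, Tuple

Paper = str
Fmt = str  # assumed labels for this sketch: "latex" or "escape"


def format_gap(scores: Dict[Tuple[Paper, Fmt], float]) -> Dict[Paper, float]:
    """Return (LaTeX score - escape score) for each paper.

    A positive gap means the model scored higher with LaTeX prompts;
    a gap near zero suggests low sensitivity to prompt formatting.
    """
    papers = sorted({paper for paper, _ in scores})
    return {paper: scores[(paper, "latex")] - scores[(paper, "escape")] for paper in papers}
```

Reporting the signed gap (rather than its absolute value) keeps the direction of the preference visible, which matters when comparing models that favor different formats.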
Results are available as ranking images in the repository:
- New Paper I, LaTeX format: 新I卷/latex测试/
- New Paper I, Escape format: 新I卷/转义符测试/
- New Paper II, LaTeX format: 新II卷/latex测试/
- New Paper II, Escape format: 新II卷/转义符测试/
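To browse the ranking images locally, a short script can walk the four result folders listed above. This sketch assumes a local clone of the repository and that rankings are stored as PNG or JPEG files; the file extensions and the helper name `list_ranking_images` are assumptions, not documented by the project.

```python
from __future__ import annotations

from pathlib import Path

# Folder names as listed in this README: (paper, format) -> sub-directory.
RESULT_DIRS = {
    ("New Paper I", "LaTeX"): Path("新I卷") / "latex测试",
    ("New Paper I", "Escape"): Path("新I卷") / "转义符测试",
    ("New Paper II", "LaTeX"): Path("新II卷") / "latex测试",
    ("New Paper II", "Escape"): Path("新II卷") / "转义符测试",
}

# Assumption: ranking images use one of these file extensions.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def list_ranking_images(repo_root: str | Path = ".") -> dict[tuple[str, str], list[Path]]:
    """Map each (paper, format) pair to the sorted image files in its folder.

    Missing folders yield an empty list instead of raising an error.
    """
    root = Path(repo_root)
    found: dict[tuple[str, str], list[Path]] = {}
    for key, rel_dir in RESULT_DIRS.items():
        folder = root / rel_dir
        entries = folder.iterdir() if folder.is_dir() else []
        found[key] = sorted(
            p for p in entries if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
    return found


if __name__ == "__main__":
    # Run from the repository root to print how many images each folder holds.
    for (paper, fmt), images in list_ranking_images().items():
        print(f"{paper} / {fmt}: {len(images)} image(s)")
```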
| Project | Description | Link |
|---|---|---|
| LLMEval (AAAI 2024) | Foundational evaluation methodology paper | arXiv |
| LLMEval-Fair (ACL 2026) | Robust & fair evaluation, 200K+ questions | GitHub |
| LLMEval-Med (EMNLP 2025) | Medical LLM benchmark | GitHub |
| LLMEval-1 | Phase I: General capability evaluation | GitHub |
| LLMEval-2 | Phase II: Professional domain evaluation | GitHub |
| Official Website | All projects & leaderboard | llmeval.com |
@misc{llmeval-gaokao2024-math,
author = {LLMEval Team},
title = {LLMEval-Gaokao2024-Math},
year = {2024},
url = {https://github.com/llmeval/Llmeval-Gaokao2024-Math}
}
- Website: http://llmeval.com/
- Email: mingzhang23@m.fudan.edu.cn
- WeChat: zanyingluan
LLMEval | Fudan University NLP Lab
