FDU-NLP LLMEval Team

LLMEval-1 Public

[AAAI 2024] LLMEval Phase I dataset — 17 categories, 453 questions, 2186 annotators for Chinese LLM evaluation

LLMEval-2 Public

[AAAI 2024] LLMEval Phase II dataset — professional domain evaluation across 12 academic disciplines

LLMEval-Fair Public

[ACL 2026] A large-scale longitudinal study on robust and fair evaluation of LLMs — 200K+ generative questions across 13 disciplines

LLMEval-Med Public

[EMNLP 2025] A real-world clinical benchmark for medical LLMs with physician validation — 2,996 questions from EHRs

Python · 25 stars · 1 fork

Llmeval-Gaokao2024-Math Public

LLM evaluation on 2024 Chinese Gaokao Mathematics — zero-contamination benchmark with dual prompt formats

llmeval.github.io Public

Official website for the LLMEval research series — Fudan NLP Lab

TypeScript
