Context
The AI Models Catalog tracks pricing, context windows, and capabilities. One frequently requested feature is benchmark scores such as MMLU, HumanEval, MATH, and GPQA.
What to do
- Add an optional benchmarks field to the model schema in types/model.ts
- Add the Zod validation in types/schemas.ts
- Add a few benchmark entries for well-known models as examples
- Update the interactive catalog to show benchmark scores
- Create a docs/benchmarks.md page
Suggested schema
benchmarks:
  mmlu: 88.7
  humaneval: 92.0
  math: 78.3
  gpqa: 65.2
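As a rough illustration, the suggested schema could map to a TypeScript type along these lines. This is only a sketch under assumptions: the Model interface and its id field are hypothetical stand-ins for whatever types/model.ts actually contains, the 0 to 100 score range and the fixed set of four keys are guesses based on the example values, and the hand-written validateBenchmarks helper only mirrors the kind of check the Zod schema in types/schemas.ts would express.

```typescript
// Hypothetical shape; the real type in types/model.ts has other fields.
interface Benchmarks {
  mmlu?: number;
  humaneval?: number;
  math?: number;
  gpqa?: number;
}

interface Model {
  id: string; // assumed identifier field, for illustration only
  benchmarks?: Benchmarks; // optional: many models publish no scores
}

// Assumption: only the four benchmarks named in this issue are accepted.
const BENCHMARK_KEYS: readonly string[] = ["mmlu", "humaneval", "math", "gpqa"];

// Returns a list of problems; an empty list means the value is acceptable.
function validateBenchmarks(value: unknown): string[] {
  if (value === undefined) return []; // the field may be omitted entirely
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return ["benchmarks must be an object"];
  }
  const record = value as Record<string, unknown>;
  const errors: string[] = [];
  for (const key of Object.keys(record)) {
    const score = record[key];
    if (BENCHMARK_KEYS.indexOf(key) === -1) {
      errors.push(`unknown benchmark: ${key}`);
    } else if (typeof score !== "number" || !isFinite(score) || score < 0 || score > 100) {
      // Assumption: scores are percentages in the range 0 to 100.
      errors.push(`${key} must be a number between 0 and 100`);
    }
  }
  return errors;
}

// Usage: validate the example values from the suggested schema.
const example: Model = {
  id: "example-model",
  benchmarks: { mmlu: 88.7, humaneval: 92.0, math: 78.3, gpqa: 65.2 },
};
console.log(validateBenchmarks(example.benchmarks));
```

Keeping every key optional lets a model report only the benchmarks its vendor has published.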
Notes
- Benchmark data should come from official model cards/reports (first-party)
- Not all models have benchmarks — this is an optional field
- Follow the pattern of existing optional fields
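Because the field is optional, any display code has to handle models without scores. The sketch below shows one way the catalog might render a benchmark summary. The CatalogModel type, the formatBenchmarks helper, and the "n/a" placeholder are all hypothetical and are not taken from the existing codebase.

```typescript
// Hypothetical subset of the catalog's model type.
interface CatalogModel {
  name: string;
  benchmarks?: { mmlu?: number; humaneval?: number; math?: number; gpqa?: number };
}

// Display labels, using the benchmark names mentioned in this issue.
const LABELS = { mmlu: "MMLU", humaneval: "HumanEval", math: "MATH", gpqa: "GPQA" } as const;

// Builds a one-line summary, skipping any benchmark the model does not report.
function formatBenchmarks(model: CatalogModel): string {
  const scores = model.benchmarks;
  if (!scores) return "n/a"; // assumed placeholder for models without scores
  const parts: string[] = [];
  for (const key of Object.keys(LABELS) as (keyof typeof LABELS)[]) {
    const score = scores[key];
    if (score !== undefined) parts.push(`${LABELS[key]} ${score.toFixed(1)}`);
  }
  return parts.length > 0 ? parts.join(", ") : "n/a";
}

// Usage: one model with partial scores, one with none.
console.log(formatBenchmarks({ name: "model-a", benchmarks: { mmlu: 88.7, gpqa: 65.2 } }));
console.log(formatBenchmarks({ name: "model-b" }));
```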