
Add model benchmark scores (MMLU, HumanEval, etc.) #19

@i-need-token

Context

The AI Models Catalog tracks pricing, context windows, and capabilities. One frequently requested feature is benchmark scores such as MMLU, HumanEval, MATH, and GPQA.

What to do

  1. Add an optional benchmarks field to the model schema in types/model.ts
  2. Add the Zod validation in types/schemas.ts
  3. Add a few benchmark entries for well-known models as examples
  4. Update the interactive catalog to show benchmark scores
  5. Create a docs/benchmarks.md page

Suggested schema

benchmarks:
  mmlu: 88.7
  humaneval: 92.0
  math: 78.3
  gpqa: 65.2
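
As a rough sketch of how the suggested schema could map onto a type, the block below defines an optional benchmarks field and a dependency-free validator. The names Benchmarks, ModelEntry, and isValidBenchmarks are hypothetical, the 0 to 100 range assumes scores are percentages, and the real validation would presumably be written as a Zod schema in types/schemas.ts instead.

```typescript
// Hypothetical shape for the optional field; keys follow the suggested schema.
interface Benchmarks {
  mmlu?: number;
  humaneval?: number;
  math?: number;
  gpqa?: number;
}

// Placeholder model type: `id` stands in for the existing model fields.
interface ModelEntry {
  id: string;
  benchmarks?: Benchmarks; // optional, since not every model reports scores
}

const BENCHMARK_KEYS: string[] = ["mmlu", "humaneval", "math", "gpqa"];

// Plain stand-in for the Zod validation: accepts only known keys whose
// values are finite numbers in [0, 100] (assumption: scores are percentages).
function isValidBenchmarks(value: unknown): value is Benchmarks {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  const record = value as { [key: string]: unknown };
  return Object.keys(record).every((key) => {
    const score = record[key];
    return (
      BENCHMARK_KEYS.indexOf(key) !== -1 &&
      typeof score === "number" &&
      isFinite(score) &&
      score >= 0 &&
      score <= 100
    );
  });
}

// Example entry using the values from the suggested schema.
const example: ModelEntry = {
  id: "example-model",
  benchmarks: { mmlu: 88.7, humaneval: 92.0, math: 78.3, gpqa: 65.2 },
};
```

Rejecting unknown keys is a design choice in this sketch. A looser schema that allows arbitrary benchmark names would make it easier to add new benchmarks later, at the cost of letting typos through.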

Notes

  • Benchmark data should come from official model cards/reports (first-party)
  • Not all models have benchmarks, so the field must be optional
  • Follow the pattern of existing optional fields
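
Because the field is optional, the catalog view has to cope with models that report no scores or only some of them. The helper below is a minimal sketch of that handling. BenchmarkScores, LABELS, and formatBenchmarks are hypothetical names, and the "n/a" fallback and one-decimal formatting are assumptions rather than existing catalog conventions.

```typescript
// Hypothetical score shape, mirroring the keys in the suggested schema.
type BenchmarkScores = {
  mmlu?: number;
  humaneval?: number;
  math?: number;
  gpqa?: number;
};

// Display order and labels (assumed, not taken from the catalog).
const LABELS: Array<[keyof BenchmarkScores, string]> = [
  ["mmlu", "MMLU"],
  ["humaneval", "HumanEval"],
  ["math", "MATH"],
  ["gpqa", "GPQA"],
];

// Render the available scores as one line, falling back to "n/a" when the
// field is missing or empty.
function formatBenchmarks(scores?: BenchmarkScores): string {
  if (!scores) {
    return "n/a";
  }
  const parts: string[] = [];
  for (const [key, label] of LABELS) {
    const score = scores[key];
    if (score !== undefined) {
      parts.push(`${label} ${score.toFixed(1)}`);
    }
  }
  return parts.length > 0 ? parts.join(", ") : "n/a";
}
```

For example, formatBenchmarks({ mmlu: 88.7, humaneval: 92.0 }) yields "MMLU 88.7, HumanEval 92.0", while a model with no benchmarks field renders as "n/a".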
