feat: add LabelModelGrader support for OpenAI Evals backend #137
Open
mesutoezdil wants to merge 1 commit into agentevals-dev:main from
Conversation
Closes #97
Adds label_model as a second grader type next to text_similarity.
Config validates the model, input, labels, and passing_labels fields. Items sent to the Evals API include only actual_response for label_model, since the expected behavior is encoded in the grader config. The expected_invocations check is now type-aware, so label_model runs work without a golden set. Result details include model and passing_labels instead of evaluation_metric.
Example config:
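A minimal sketch of what a label_model grader config might look like, based on the fields named above; the nesting, the template syntax, and the concrete label values are illustrative assumptions, not taken from the repo.

```python
# Illustrative sketch only: key nesting, the {{actual_response}} template
# syntax, and the label values are assumptions, not the repo's actual API.
grader_config = {
    "type": "label_model",
    "model": "gpt-4o-mini",  # model that assigns the label
    "input": [
        {
            "role": "user",
            "content": "Grade this response: {{actual_response}}",
        },
    ],
    "labels": ["pass", "borderline", "fail"],  # closed set the model picks from
    "passing_labels": ["pass"],                # labels counted as a passing result
}
```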
The diff shows 184 insertions, but 137 of those are in the new test file; the actual code change is about 33 lines across config.py and openai_eval_backend.py.
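For orientation, a sketch of the kind of type-aware branching that change amounts to; the function and field names here are hypothetical, not the actual code in the diff.

```python
# Hypothetical sketch of type-aware item construction; build_item and the
# field names are illustrative, not functions confirmed to exist in the PR.
def build_item(grader_type: str, actual: str, expected: str | None = None) -> dict:
    if grader_type == "label_model":
        # Expected behavior lives in the grader's labels/passing_labels,
        # so the item sent to the Evals API carries only the actual output.
        return {"actual_response": actual}
    # text_similarity still compares against a golden response.
    return {"actual_response": actual, "expected_response": expected}
```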
Tests cover config validation, criteria shape, jsonl item structure, schema selection, and the evaluate flow with and without expected invocations.
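A sketch of what a couple of those tests might look like, reusing the hypothetical build_item helper above; the test names and assertions are assumptions about the suite, not copied from it.

```python
# Hypothetical tests; build_item is the illustrative helper sketched above.
def test_label_model_item_contains_only_actual_response():
    item = build_item("label_model", actual="The answer is 4.")
    assert item == {"actual_response": "The answer is 4."}

def test_text_similarity_item_includes_expected_response():
    item = build_item("text_similarity", actual="4", expected="4")
    assert item["expected_response"] == "4"
```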