add-benchmark: check Harbor + Inspect Evals for prior art by elronbandel · Pull Request #19 · Exgentic/eval-containers

elronbandel · 2026-06-03T12:18:58Z

Adds a pre-start step to the add-benchmark skill: before porting a benchmark from the raw upstream, check Inspect Evals (UKGovernmentBEIS/inspect_evals) and Harbor (harbor-framework/harbor) for an existing reference implementation, and reuse their pinned dataset revision, scorer, and prompt format as references (not dependencies — the Dock benchmark stays self-contained per rule 2).

Motivated by checking SkillsBench: it's in neither harness, so that one would be a from-scratch port — exactly the thing this step makes explicit up front.
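The prior-art check described above could be sketched roughly as follows. This is only an illustration, not part of the skill itself: `normalize`, `find_prior_art`, and the `listings` data are hypothetical names and sample values, and the sketch assumes you have already collected the benchmark directory names from a local checkout of each harness.

```python
import re


def normalize(name: str) -> str:
    """Lowercase and drop non-alphanumerics so 'SWE-bench' matches 'swe_bench'."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def find_prior_art(benchmark: str, listings: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return, per harness, the entries whose normalized name contains the benchmark name.

    `listings` maps a harness label to the entry names found in its checkout.
    An empty result means neither harness has a reference implementation,
    so the port starts from the raw upstream.
    """
    key = normalize(benchmark)
    if not key:
        raise ValueError("benchmark name must contain at least one alphanumeric character")
    hits: dict[str, list[str]] = {}
    for harness, entries in listings.items():
        matched = [entry for entry in entries if key in normalize(entry)]
        if matched:
            hits[harness] = matched
    return hits


# Hypothetical sample listings; real names must come from the actual repositories.
listings = {
    "inspect_evals": ["gsm8k", "swe_bench", "humaneval"],
    "harbor": ["swe-bench", "terminal-bench"],
}

print(find_prior_art("SWE-bench", listings))
print(find_prior_art("SkillsBench", listings))
```

A non-empty result only points at where to look; the pinned dataset revision, scorer, and prompt format still have to be read from the matched implementation and copied in as references rather than imported.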

Add a pre-start step to the add-benchmark skill: before porting from the raw upstream, check Inspect Evals (UKGovernmentBEIS/inspect_evals) and Harbor (harbor-framework/harbor) for an existing reference implementation, and reuse their pinned dataset revision, scorer, and prompt format as references (not dependencies).

Signed-off-by: Elron Bandel <elron.bandel@ibm.com>

elronbandel force-pushed the main branch from 39be206 to 90d90b8 on June 15, 2026 10:35

elronbandel force-pushed the add-benchmark-prior-art branch from 3b71a71 to 4ddc059 on June 15, 2026 10:35

elronbandel force-pushed the main branch from cf2320e to 8bf6d75 on June 15, 2026 12:24

elronbandel force-pushed the add-benchmark-prior-art branch from 4ddc059 to 30d862f on June 15, 2026 12:24

elronbandel force-pushed the main branch from 3f4e4b8 to 9b46aee on June 15, 2026 13:47

elronbandel force-pushed the add-benchmark-prior-art branch from 30d862f to 7685ecd on June 15, 2026 13:47

elronbandel wants to merge 1 commit into main from add-benchmark-prior-art
