add-benchmark: check Harbor + Inspect Evals for prior art #19

Open
elronbandel wants to merge 1 commit into main from add-benchmark-prior-art

@elronbandel

Contributor

Adds a pre-start step to the add-benchmark skill: before porting a benchmark from the raw upstream, check Inspect Evals (UKGovernmentBEIS/inspect_evals) and Harbor (harbor-framework/harbor) for an existing reference implementation, and reuse their pinned dataset revision, scorer, and prompt format as references (not dependencies — the Dock benchmark stays self-contained per rule 2).

Motivated by checking SkillsBench: it's in neither harness, so that one would be a from-scratch port — exactly the thing this step makes explicit up front.
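
For illustration only, here is a minimal sketch of what the prior-art lookup could look like. It assumes you already have a list of candidate names per harness (for example, directory names copied from a local clone); the `find_prior_art` helper, the `listings` structure, and the `demo_bench` entries are hypothetical and do not reflect the actual layout or contents of either repository.

```python
import re


def normalize(name: str) -> str:
    """Lowercase and drop non-alphanumerics so 'Demo Bench' matches 'demo_bench'."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def find_prior_art(benchmark: str, listings: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return, per harness, the entries whose normalized name contains the benchmark's."""
    key = normalize(benchmark)
    hits: dict[str, list[str]] = {}
    for harness, entries in listings.items():
        matched = [e for e in entries if key and key in normalize(e)]
        if matched:
            hits[harness] = matched
    return hits


# Hypothetical placeholder listings, NOT the real contents of either repo.
listings = {
    "inspect_evals": ["demo_bench", "other_eval"],
    "harbor": ["demo-bench"],
}

found = find_prior_art("Demo Bench", listings)
missing = find_prior_art("SkillsBench", listings)
```

An empty result, as for `SkillsBench` here, corresponds to the from-scratch case described above; a non-empty result points at the implementations to read for the pinned dataset revision, scorer, and prompt format.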

Add a pre-start step to the add-benchmark skill: before porting from the raw
upstream, check Inspect Evals (UKGovernmentBEIS/inspect_evals) and Harbor
(harbor-framework/harbor) for an existing reference implementation, and reuse
their pinned dataset revision, scorer, and prompt format as references (not
dependencies).

Signed-off-by: Elron Bandel <elron.bandel@ibm.com>