TwinBench v1 accepts result submissions in artifact form. Each submission must include:
- benchmark version
- system name
- system version
- evaluation date
- scenario-level observations
- per-metric scores
- total score
- scenario coverage
- metric coverage
- evaluator notes
- caveats
Preferred submissions also include:
- links to result artifacts
- links to prompts or run manifests
- explanation of any scenario deviations
- notes on whether evaluation steps were automated or evaluator-mediated
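The required and preferred fields above can be sketched as a single artifact. This is a minimal illustration, assuming JSON serialization and these particular key names; the canonical structure is defined in LEADERBOARD.md, and all values here are hypothetical.

```python
import json

# Illustrative TwinBench v1 submission artifact. Key names and values
# are assumptions for this sketch; see LEADERBOARD.md for the canonical
# result structure.
submission = {
    "benchmark_version": "1.0",
    "system_name": "example-system",      # hypothetical system under test
    "system_version": "0.3.2",
    "evaluation_date": "2024-01-15",
    "scenario_observations": {"scenario_01": "completed without deviation"},
    "per_metric_scores": {"accuracy": 0.82, "robustness": 0.74},
    "total_score": 0.78,
    "scenario_coverage": "12/12 scenarios run",
    "metric_coverage": "all metrics reported",
    "evaluator_notes": "run was evaluator-mediated for scenario_07",
    "caveats": ["scenario_07 required a manual restart mid-run"],
    # Preferred extras:
    "artifact_links": ["https://example.com/results/run-42"],
    "run_manifest_links": ["https://example.com/manifests/run-42"],
    "scenario_deviations": [],
    "automation_notes": "all steps automated except scenario_07",
}

print(json.dumps(submission, indent=2))
```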
When preparing a submission:
- Use the canonical result structure shown in LEADERBOARD.md.
- Keep caveats explicit.
- Do not report benchmark totals without metric-level detail.
- If only part of the scenario set was run, disclose that clearly.
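The guidelines above lend themselves to a small pre-submission check. This is a hedged sketch, not part of TwinBench itself: the function name, field names, and the heuristic for detecting partial coverage are all assumptions.

```python
# Hypothetical pre-submission check enforcing two of the guidelines:
# (1) a benchmark total must come with metric-level detail, and
# (2) partial scenario coverage must be disclosed via caveats.
def check_submission(submission: dict) -> list[str]:
    problems = []
    # Guideline: do not report totals without metric-level detail.
    if "total_score" in submission and not submission.get("per_metric_scores"):
        problems.append("total score reported without per-metric scores")
    # Guideline: disclose partial scenario runs clearly. The string
    # match on "partial" is an illustrative heuristic only.
    coverage = submission.get("scenario_coverage", "")
    if "partial" in coverage.lower() and not submission.get("caveats"):
        problems.append("partial scenario coverage not disclosed in caveats")
    return problems

print(check_submission({"total_score": 0.8}))  # flags the missing metric detail
```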
TwinBench is intended to reward honest reporting over flattering reporting.