Revert #128: a benchmark baseline must be the bare model, no system prompt by DietrichGebert · Pull Request #175 · DietrichGebert/ponytail

DietrichGebert · 2026-06-18T23:44:11Z

Reverts #128.

A baseline arm is the control: the bare model with the task and nothing else. #128 gave it a system prompt ("Provide just one example for any given task, and no commentary or usage examples"), which:

makes the "baseline (no skill)" label false (it now carries a hand-written instruction),
desyncs the published numbers from the code: the single-shot table (baseline 518/693/256 LOC, "80-94% less code") was measured against the bare baseline and was never recomputed, so main shipped a baseline that would produce different numbers than the ones printed beside it,
was undisclosed.

Any prompt on the baseline tilts the comparison ("write the minimum amount of code" would erase the gap; the opposite would inflate it). The fair control is no prompt at all. The rambling critique in #126 is already answered by the agentic benchmark, so the single-shot baseline does not need de-rambling.

🤖 Generated with Claude Code

…amble (closes #126) (#128)" This reverts commit 37f46b8.

tosage05 · 2026-06-19T06:22:09Z

lol?

Revert "benchmarks: add system prompt to baseline arm so it doesn't r…

b21b979

…amble (closes #126) (#128)" This reverts commit 37f46b8.

DietrichGebert merged commit 48cdf05 into main Jun 18, 2026

DietrichGebert mentioned this pull request Jun 18, 2026

benchmarks: add system prompt to baseline arm so it doesn't ramble (closes #126) #128

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Revert #128: a benchmark baseline must be the bare model, no system prompt#175

Revert #128: a benchmark baseline must be the bare model, no system prompt#175
DietrichGebert merged 1 commit into
mainfrom
revert/128-baseline-system-prompt

DietrichGebert commented Jun 18, 2026

Uh oh!

tosage05 commented Jun 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

DietrichGebert commented Jun 18, 2026

Uh oh!

tosage05 commented Jun 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants