benchmark: exit-0 unparseable reviewer output still scores F1=1.0 on clean-baseline #14

**Files:** `scripts/benchmark.py:189` (zero-score gate), `scripts/benchmark.py:_dispatch` (~line 143), `scripts/aggregate_bench.py:_rescore_run`

The PR #12 zero-score gate only checks `exit_code != 0`. A reviewer whose CLI exits 0 but prints unparseable prose (`extract_json` returns `None`) yields `findings=[]`, `error=None`, `exit_code=0`. That result flows into `_score` and earns P=R=F1=1.0 on clean-baseline, where an empty findings list counts as a perfect answer. This is the exact "broken reviewer earns perfect F1" bug class the gate was added to close; it survives via the parse-failure path (the documented copilot-gpt5 "returns prose not JSON" failure mode). `_dispatch` doesn't propagate a `parse_error` flag, so neither `benchmark.py` nor `aggregate_bench.py` can see the failure.
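The path described above can be reproduced in isolation. The following is a minimal sketch, not the code in `scripts/benchmark.py`: every function name ending in `_sketch` is a hypothetical stand-in, and the scoring convention (nothing found against nothing expected gives 1.0) is an assumption inferred from the reported behavior.

```python
import json
from typing import Optional


def extract_json_sketch(text: str) -> Optional[dict]:
    """Hypothetical stand-in for extract_json: None when nothing parses."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def dispatch_sketch(stdout: str, exit_code: int) -> dict:
    """Mimics the reported behavior: a parse failure leaves no trace."""
    parsed = extract_json_sketch(stdout)
    findings = parsed.get("findings", []) if parsed else []
    return {"findings": findings, "error": None, "exit_code": exit_code}


def score_sketch(found: list, expected: list) -> float:
    """Assumed convention: nothing found and nothing expected scores 1.0."""
    if not found and not expected:
        return 1.0
    true_positives = len(set(found) & set(expected))
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(found)
    recall = true_positives / len(expected)
    return 2 * precision * recall / (precision + recall)


def gated_score_sketch(run: dict, expected: list) -> float:
    """Gate that only zeroes the score on a nonzero exit code."""
    if run["exit_code"] != 0:
        return 0.0
    return score_sketch(run["findings"], expected)


# A reviewer that exits 0 but answers in prose instead of JSON.
run = dispatch_sketch("Sure! I looked at the diff and it seems fine.", exit_code=0)
f1 = gated_score_sketch(run, expected=[])  # clean-baseline: no expected findings
```

Under these assumptions the run record is indistinguishable from a reviewer that correctly reported no findings, so the gate has nothing to act on.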

**Fix:** `_dispatch` records `parse_error`; the zero-score gate and `_rescore_run` treat `parse_error` like a failed call; run rows persist the flag.
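One possible shape for that fix, as a hedged sketch only: `dispatch_fixed_sketch` and `is_failed_run` are hypothetical names, the row layout is assumed, and defaulting a missing flag to "no parse error" for rows written before the change is a design assumption rather than something the issue specifies.

```python
import json


def dispatch_fixed_sketch(stdout: str, exit_code: int) -> dict:
    """Hypothetical _dispatch variant that records whether parsing failed."""
    try:
        parsed = json.loads(stdout)
    except json.JSONDecodeError:
        parsed = None
    # Anything that is not a JSON object counts as a parse failure here.
    parse_error = not isinstance(parsed, dict)
    findings = [] if parse_error else parsed.get("findings", [])
    return {
        "findings": findings,
        "error": None,
        "exit_code": exit_code,
        "parse_error": parse_error,  # stored on the run row
    }


def is_failed_run(run: dict) -> bool:
    """Single predicate that both the gate and the rescoring step could share."""
    # .get() keeps rows that predate the flag readable (assumed default: False).
    return run.get("exit_code", 0) != 0 or bool(run.get("parse_error", False))
```

Routing both the zero-score gate and `_rescore_run` through one predicate would keep the two scripts from drifting apart on what counts as a failed call.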

Found by /code-review round 2 (3 finder angles independently, CONFIRMED).


