Backlog: consolidate the five duplicated pipelines (dispatch, cost gate, adapters, scoring, leaderboard)

Tracking issue — deliberately NOT closed by the current fix PR (each item is a standalone refactor):

1. `benchmark.py:_dispatch` duplicates `dispatch.py:_dispatch_one` (primary→fallback pipeline). The two have already drifted twice (exception isolation went to dispatch only; scoring semantics to benchmark only). → shared `dispatch_with_fallback()` in _common.py.
2. Cost model exists 3×: `estimate_cost.py`, `benchmark.py` inline gate (which additionally honors ARGUS_YES_COST and skips warn exit semantics), `bench_cost.py`. → `estimate_roster_cost()` in _common.py.
3. Five near-identical stdin-CLI adapters (gemini/codex/claude/opencode/copilot) — the ARG_MAX fix needed two separate commits to port between them. → one factory `make_stdin_cli_adapter(route_key, extra_env, template_vars)`.
4. Failed-call zero-scoring lives in 3 hand-synced copies (benchmark wall-cap stub, benchmark exit-code branch, aggregate_bench._rescore_run). → shared `score_run()`.
5. Leaderboard markdown rendered by both `benchmark.py:_write_outputs` and `aggregate_bench.py:_leaderboard_md` with already-divergent columns. → shared renderer.

Also: per-reviewer benchmark calls are fully serialized (single-reviewer bench wastes the max_parallel budget; interacts with --max-wall-sec semantics) and detect_host signals could move to config beside host_rules.

From /code-review rounds 1+2 (reuse + altitude angles, all verified against HEAD).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Backlog: consolidate the five duplicated pipelines (dispatch, cost gate, adapters, scoring, leaderboard) #22

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Backlog: consolidate the five duplicated pipelines (dispatch, cost gate, adapters, scoring, leaderboard) #22

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions