Backlog: consolidate the five duplicated pipelines (dispatch, cost gate, adapters, scoring, leaderboard) #22

@jimstratus


Tracking issue — deliberately NOT closed by the current fix PR (each item is a standalone refactor):

  1. benchmark.py:_dispatch duplicates dispatch.py:_dispatch_one (primary→fallback pipeline). The two have already drifted twice (exception isolation went to dispatch only; scoring semantics to benchmark only). → shared dispatch_with_fallback() in _common.py.
  2. Cost model exists 3×: estimate_cost.py, benchmark.py inline gate (which additionally honors ARGUS_YES_COST and skips warn exit semantics), bench_cost.py. → estimate_roster_cost() in _common.py.
  3. Five near-identical stdin-CLI adapters (gemini/codex/claude/opencode/copilot) — the ARG_MAX fix needed two separate commits to port between them. → one factory make_stdin_cli_adapter(route_key, extra_env, template_vars).
  4. Failed-call zero-scoring lives in 3 hand-synced copies (benchmark wall-cap stub, benchmark exit-code branch, aggregate_bench._rescore_run). → shared score_run().
  5. Leaderboard markdown rendered by both benchmark.py:_write_outputs and aggregate_bench.py:_leaderboard_md with already-divergent columns. → shared renderer.
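Items 1 and 4 could be sketched together as follows. This is only an illustration of the shape such helpers might take, not the repo's actual code: the names `dispatch_with_fallback()` and `score_run()` come from the list above, but their signatures, the `DispatchResult` type, and the `Adapter` callable are assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical adapter shape: takes a prompt, returns the CLI output.
# The real adapters in the repo may look different.
Adapter = Callable[[str], str]


@dataclass
class DispatchResult:
    route: Optional[str]   # name of the route that succeeded, or None if all failed
    output: Optional[str]  # output of the successful route, or None
    errors: list[str]      # one entry per failed route, in attempt order

    @property
    def ok(self) -> bool:
        return self.output is not None


def dispatch_with_fallback(
    prompt: str, routes: list[tuple[str, Adapter]]
) -> DispatchResult:
    """Try each (name, adapter) pair in order (primary first, then fallbacks).

    Exceptions are isolated per route, so a crashing primary cannot abort
    the caller; the failure is recorded and the next route is tried.
    """
    errors: list[str] = []
    for name, adapter in routes:
        try:
            return DispatchResult(route=name, output=adapter(prompt), errors=errors)
        except Exception as exc:  # deliberately broad: exception isolation
            errors.append(f"{name}: {exc!r}")
    return DispatchResult(route=None, output=None, errors=errors)


def score_run(result: DispatchResult, scorer: Callable[[str], float]) -> float:
    """Score a run; a failed call scores zero, decided in exactly one place."""
    if result.output is None:
        return 0.0
    return scorer(result.output)
```

With both callers (benchmark and dispatch) going through the same two functions, exception isolation and zero-scoring would no longer need to be ported by hand between copies.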

Also: per-reviewer benchmark calls are fully serialized, so a single-reviewer bench wastes the max_parallel budget (parallelizing them interacts with --max-wall-sec semantics). Separately, detect_host signals could move to config beside host_rules.
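For the serialization point, a minimal sketch of running per-reviewer calls concurrently under a shared wall-clock cap. The function name `run_reviewers`, its parameters, and the choice to treat the wall cap as a limit on the whole batch are assumptions for illustration; the issue does not specify how --max-wall-sec should apply once calls overlap.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Optional


def run_reviewers(
    calls: dict[str, Callable[[], str]],
    max_parallel: int,
    max_wall_sec: float,
) -> dict[str, Optional[str]]:
    """Run each reviewer call on a thread pool, at most max_parallel at a time.

    Assumption: max_wall_sec caps the whole batch. A call that raises, or
    that has not finished when the cap expires, is reported as None.
    """
    pool = ThreadPoolExecutor(max_workers=max_parallel)
    futures = {pool.submit(fn): name for name, fn in calls.items()}
    done, not_done = wait(futures, timeout=max_wall_sec)

    results: dict[str, Optional[str]] = {}
    for fut in done:
        # A call that raised is isolated and recorded as a failure.
        results[futures[fut]] = fut.result() if fut.exception() is None else None
    for fut in not_done:
        fut.cancel()  # only prevents queued calls from starting
        results[futures[fut]] = None

    # Threads already running cannot be killed; do not block on them here.
    pool.shutdown(wait=False, cancel_futures=True)
    return results
```

Note that a thread pool cannot interrupt a call that is already running, so a real implementation that shells out to CLIs would likely also need a per-process timeout.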

From /code-review rounds 1+2 (reuse + altitude angles, all verified against HEAD).

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
