run: scope non-canonical --mode job overlay patches to the eval Job by elronbandel · Pull Request #5 · Exgentic/eval-containers

elronbandel · 2026-05-31T13:01:51Z

Problem

eval-containers run <bench> --mode job with non-canonical axes (--agent / --task-id / --model) synthesizes a Kustomize overlay. Its rename patch is scoped by Job name, but the runner-env, pod-label, and gateway-env patches used an unscoped target: { kind: Job }.

Every benchmark renders exactly one Job except tau-bench, which ships a bespoke second Job (a user-simulation harness) in the same manifest. The unscoped patches strategic-merged a runner and gateway container into the harness Job — imageless, so the Job spec is invalid and kubectl apply is rejected.

harness Job containers
before	`['gateway', 'runner', 'harness']` ← 2 imageless → invalid
after	`['harness']`

Canonical --mode job and all 100 single-Job benchmarks were unaffected; only non-canonical runs of tau-bench broke (silently — kustomize skips a non-matching target, so there was no error, just a bad manifest).

Fix

Scope all three patches to the canonical Job by name (the way the rename patch already was), and move the rename last so the name-scoped patches still match <bench>-task-0 before the rename changes the name.

Verification — rendered the actual generated overlay (not a mock)

tau-bench --task-id 5 --agent codex --model gpt-x: harness Job → ['harness'] (leak gone); tau-bench-task-5 → [otelcol, gateway, runner], zero imageless containers.
mmlu-pro (single Job): one mmlu-pro-task-5 Job, runner env AGENT=codex TASK_ID=5, all containers have images — a current-vs-fixed render diff is empty (no behavior change for the other 100 benchmarks).
cargo build + cargo test --no-run pass; cargo fmt --check clean for run.rs.

Follow-up (separate)

No automated test covers overlay generation today; this multi-Job class of bug is exactly what the render/drift lint from the gen-kustomization discussion would catch. Tracking separately.

🤖 Generated with Claude Code

The synthesized Kustomize overlay for non-canonical `--mode job` runs patched the runner/gateway env and pod-template labels with an unscoped `target: { kind: Job }`. On a benchmark that ships a bespoke second Job in the same manifest — only tau-bench today (a user-sim `harness` Job beside the eval `tau-bench-task-0` Job) — those strategic-merge patches leaked into the harness Job too, injecting imageless `runner` and `gateway` containers. The result is an invalid Job spec, so `run tau-bench --mode job --task-id N --agent X` produced a manifest that fails admission. Scope all three patches (runner env, pod labels, gateway env) to the canonical Job by name, and move the rename patch LAST so the name-scoped patches still match `<bench>-task-0` before the rename changes it. Verified by rendering the actual generated overlay: - tau-bench: harness Job now has only [harness] (was [gateway, runner, harness], two of them imageless); the task-N Job is unchanged. - single-Job benchmarks (mmlu-pro, ...): render identically — no regression. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

elronbandel mentioned this pull request Jun 11, 2026

ci(release): publish the crate to crates.io from release.yml (rule 2) #109

Merged

elronbandel force-pushed the main branch from 39be206 to 90d90b8 Compare June 15, 2026 10:35

elronbandel force-pushed the elron/fix-overlay-job-scope branch from f1ca5a8 to 6058ed2 Compare June 15, 2026 10:35

elronbandel force-pushed the main branch from cf2320e to 8bf6d75 Compare June 15, 2026 12:24

elronbandel force-pushed the elron/fix-overlay-job-scope branch from 6058ed2 to 4bddc01 Compare June 15, 2026 12:24

elronbandel force-pushed the main branch from 3f4e4b8 to 9b46aee Compare June 15, 2026 13:47

elronbandel force-pushed the elron/fix-overlay-job-scope branch from 4bddc01 to 9e3e5b6 Compare June 15, 2026 13:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

run: scope non-canonical --mode job overlay patches to the eval Job#5

run: scope non-canonical --mode job overlay patches to the eval Job#5
elronbandel wants to merge 1 commit into
mainfrom
elron/fix-overlay-job-scope

elronbandel commented May 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

elronbandel commented May 31, 2026

Problem

Fix

Verification — rendered the actual generated overlay (not a mock)

Follow-up (separate)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant