
refactor(chart): bundle benchmark presets in the chart (self-contained, OCI-ready)#26

Merged
elronbandel merged 3 commits into main from elron/chart-native-presets on Jun 3, 2026



elronbandel commented on Jun 3, 2026


Why

In job mode, the chart was rendered with helm template benchmarks/_chart -f benchmarks/<x>/values.yaml. Both arguments are local paths, so deploying required cloning the repo. This PR makes the chart self-contained: the benchmark is selected with --set benchmark=<x> alone, and each benchmark's bespoke topology is bundled inside the chart. That is the prerequisite for packaging the chart and publishing it to an OCI registry (follow-up PR).

What

  • Benchmark selection moves from -f benchmarks/<x>/values.yaml to --set benchmark=<x>.
  • Bespoke topology moves into the chart at benchmarks/_chart/presets/<x>.yaml, loaded with Helm's .Files.Get and overlaid over the chart defaults by a new eval.values helper. Standard benchmarks have no preset: .Files.Get returns an empty string and the chart defaults apply unchanged.
  • The 4 non-trivial benchmarks (osworld, tau-bench, visualwebarena, webarena) became presets (git-detected as renames); the 98 trivial one-line values.yaml files are deleted — their identity now comes from --set benchmark=<x>.
  • Merge semantics: presets set only structural keys (sidecars, resources, extraManifests); the per-run axes (agent/task/model) come from --set and always win, because no preset sets them. Verified that no preset touches an axis.
  • CLI (run --mode job): drops the -f values, adds --set benchmark=<x>.
  • Tests: tests/helm.rs renders via --set benchmark=<x>; tests/sanity/check.rs no longer requires a per-benchmark values.yaml (the k8s surface works for every benchmark with no file).
  • Doctrine + docs retargeted at the preset model: RULES.md rules 24/24b/24c/24e/25/29 + changelog, the add-benchmark skill & template, src/RULES.md, delivery/build, and the user docs (helm-chart, triple-mode, deploy guides, chart-values, README).
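To make the overlay order concrete, here is a minimal Rust model of what the preset merge does. It assumes eval.values behaves like a recursive deep merge in which the preset wins on conflicting keys (in the style of Sprig's mergeOverwrite); the PR does not spell out the helper's internals. Value, deep_merge, effective_values and all sample values are hypothetical, and a missing map entry stands in for .Files.Get returning an empty string.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// Simplified stand-in for a Helm values tree (hypothetical): nested maps
/// with string leaves. Real values also hold lists, numbers and booleans.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Map(BTreeMap<String, Value>),
    Leaf(String),
}

fn leaf(s: &str) -> Value {
    Value::Leaf(s.to_string())
}

fn map(entries: &[(&str, Value)]) -> Value {
    Value::Map(entries.iter().map(|(k, v)| (k.to_string(), v.clone())).collect())
}

/// Overlay `overlay` onto `base`: maps merge key by key, and any other
/// value in the overlay replaces the base value outright.
fn deep_merge(base: &mut Value, overlay: &Value) {
    match (base, overlay) {
        (Value::Map(b), Value::Map(o)) => {
            for (k, ov) in o {
                match b.entry(k.clone()) {
                    Entry::Occupied(mut e) => deep_merge(e.get_mut(), ov),
                    Entry::Vacant(e) => {
                        e.insert(ov.clone());
                    }
                }
            }
        }
        (b, o) => *b = o.clone(),
    }
}

/// Hypothetical model of the eval.values helper. `values` plays the role of
/// Helm's .Values (chart defaults combined with --set); the preset for
/// `benchmark`, if the chart bundles one, is overlaid on a copy of it.
fn effective_values(values: &Value, presets: &BTreeMap<String, Value>, benchmark: &str) -> Value {
    let mut merged = values.clone();
    // No entry models .Files.Get returning "" for a standard benchmark.
    if let Some(preset) = presets.get(&format!("presets/{benchmark}.yaml")) {
        deep_merge(&mut merged, preset);
    }
    merged
}

fn main() {
    // Chart defaults plus a per-run axis from --set (made-up values).
    let values = map(&[
        ("agent", leaf("my-agent")),
        ("resources", map(&[("cpu", leaf("1")), ("memory", leaf("2Gi"))])),
    ]);
    // A rich benchmark's preset touching only a structural key (made-up).
    let mut presets = BTreeMap::new();
    presets.insert(
        "presets/osworld.yaml".to_string(),
        map(&[("resources", map(&[("memory", leaf("8Gi"))]))]),
    );
    println!("{:?}", effective_values(&values, &presets, "osworld"));
    println!("{:?}", effective_values(&values, &presets, "some-standard-benchmark"));
}
```

For osworld only resources.memory changes and the agent axis passes through; for a benchmark with no preset the values come back unchanged.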

Verification

  • Renders are byte-identical to the prior -f values.yaml form for all 102 benchmarks (the 98 trivial plus the 4 rich), confirmed by diff.
  • cargo test --test helm (renders + kubeconform-validates all 102) — green.
  • cargo test --test check: the same 4 pre-existing failures as main (agents-smoke Dockerfile, hle fixture, README count); no new failures, and the values.yaml existence/pin checks are correctly gone.
  • cargo build + clippy clean (pre-existing warnings only).

Live-cluster validation (server-side dry-run)

Validated against a real Kubernetes API server (kind). A server-side dry run exercises schema validation, defaulting, and admission, i.e. everything short of starting containers:

  • Whole fleet: 101/101 benchmarks pass helm template … --set benchmark=<x> | kubectl apply --dry-run=server (the API server accepts the rendered chart for every benchmark, including the 4 rich presets).
  • Rich topology (osworld): the desktop Deployment + Service + wait-for-desktop init container all render and validate alongside the Job.
  • Full CLI path: eval-containers run <b> --mode job --dry-run → helm → kubectl apply --dry-run=server is accepted for both trivial and rich benchmarks.
  • OpenShift overlay: --overlay deploy/values-openshift.yaml injects serviceAccountName: anyuid-sa and validates.
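The fleet-wide check can be scripted. Below is a minimal Rust sketch using only std::process: for each benchmark it pipes helm template into kubectl apply --dry-run=server -f -. The chart path and flags come from the PR, except -f - (kubectl reading manifests from stdin), which is an assumption about how the pipe was wired; the function names are hypothetical. It expects helm and kubectl on PATH with the kube context pointing at the kind cluster, and it reports rather than panics when either tool is missing.

```rust
use std::io;
use std::process::{Command, Stdio};

/// Chart directory as named in the PR.
const CHART: &str = "benchmarks/_chart";

/// helm template arguments: the benchmark is selected with
/// --set benchmark=<x> alone, with no per-benchmark -f values.yaml.
fn helm_args(benchmark: &str) -> Vec<String> {
    vec![
        "template".to_string(),
        CHART.to_string(),
        "--set".to_string(),
        format!("benchmark={benchmark}"),
    ]
}

/// kubectl apply arguments for a server-side dry run reading stdin.
fn kubectl_args() -> Vec<String> {
    ["apply", "--dry-run=server", "-f", "-"]
        .iter()
        .map(|s| s.to_string())
        .collect()
}

/// Render one benchmark and pipe it into a server-side dry run.
/// Ok(true) means both helm and kubectl exited successfully.
fn dry_run(benchmark: &str) -> io::Result<bool> {
    let mut helm = Command::new("helm")
        .args(helm_args(benchmark))
        .stdout(Stdio::piped())
        .spawn()?;
    let manifests = helm.stdout.take().expect("stdout was piped");
    let kubectl = Command::new("kubectl")
        .args(kubectl_args())
        .stdin(Stdio::from(manifests))
        .status();
    // Reap helm before reporting, even if kubectl failed to start.
    let helm_ok = helm.wait()?.success();
    Ok(helm_ok && kubectl?.success())
}

fn main() {
    // The 4 rich presets; the PR ran the whole fleet.
    for b in ["osworld", "tau-bench", "visualwebarena", "webarena"] {
        match dry_run(b) {
            Ok(true) => println!("{b}: accepted"),
            Ok(false) => println!("{b}: rejected"),
            Err(e) => println!("{b}: could not run ({e})"),
        }
    }
}
```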

Not covered (needs pullable images + the eval-secrets Secret): a container actually running a task to produce result.json.

Follow-up

Chart publishing (helm package + helm push to quay) will be a separate small PR, as scoped. This PR is independent of #25 (registrySuffix), which also touches job.yaml; whichever merges second resolves a trivial conflict on the three image lines.

…hmark values.yaml)

Make the Helm chart self-contained so it renders — and can be published to an
OCI registry — without the repo. The benchmark is now selected with
`--set benchmark=<x>` instead of `-f benchmarks/<x>/values.yaml`, and a
benchmark's bespoke topology lives in the chart at `presets/<x>.yaml`, loaded
via Helm's `.Files.Get` and overlaid over the chart defaults.

- The 4 non-trivial benchmarks (osworld, tau-bench, visualwebarena, webarena)
  move to benchmarks/_chart/presets/<name>.yaml; the 98 trivial one-line
  values.yaml files are deleted (their identity now comes from --set benchmark).
- A new eval.values helper merges the selected preset over .Values; job.yaml
  reads the merged result. Presets set only structural keys, so per-run axes
  (agent/task/model via --set) always win.
- CLI run --mode job: drop the -f values, add --set benchmark=<x>.
- tests/helm.rs renders via --set benchmark; check.rs no longer requires a
  per-benchmark values.yaml (k8s works for every benchmark with no file).
- Doctrine (RULES.md 24/24b/24c/24e/25/29, add-benchmark skill+template,
  src/RULES.md, delivery/build) and docs retargeted at the preset model.
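Because the preset is merged over .Values, a preset that set an axis key would override the --set value for it; the "structural keys only" rule is what keeps the axes winning, and it is easy to check mechanically. A minimal Rust sketch, assuming the axes are top-level keys named agent, task and model (the PR names the axes but not their exact key paths) and that each preset's top-level keys have already been read; axis_violations and the sample key lists are hypothetical.

```rust
use std::collections::BTreeMap;

/// Per-run axes supplied with --set (assumed top-level key names).
const AXES: [&str; 3] = ["agent", "task", "model"];

/// Return every (preset, key) pair where a preset sets a per-run axis.
/// Since the preset is overlaid on .Values, such a key would silently
/// override the user's --set choice.
fn axis_violations<'a>(presets: &BTreeMap<&'a str, Vec<&'a str>>) -> Vec<(&'a str, &'a str)> {
    let mut out = Vec::new();
    for (name, keys) in presets {
        for key in keys {
            if AXES.contains(key) {
                out.push((*name, *key));
            }
        }
    }
    out
}

fn main() {
    // Made-up top-level keys for two presets.
    let mut presets = BTreeMap::new();
    presets.insert("osworld", vec!["sidecars", "extraManifests"]);
    presets.insert("webarena", vec!["resources"]);
    println!("{:?}", axis_violations(&presets));
}
```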

Renders byte-identical to the prior `-f values.yaml` form for all 102
benchmarks (trivial + the 4 rich); `cargo test --test helm` is green.
Chart publishing (helm package/push) lands in a follow-up.
elronbandel merged commit a811811 into main on Jun 3, 2026
1 check failed
elronbandel added 3 commits that referenced this pull request on Jun 15, 2026, each titled refactor(chart): bundle benchmark presets in the chart (self-contained, OCI-ready)