Merged
13 changes: 7 additions & 6 deletions README.md
@@ -81,19 +81,20 @@ eval-containers run aime --task-id 0 --agent codex --mode job

### Kubernetes (`--mode job`)

-Every benchmark renders from one shared [Helm](https://helm.sh/) chart (`benchmarks/_chart`) — select it with `--set benchmark=<x>` and apply, no CLI needed. A benchmark with bespoke topology (extra Deployments/sidecars) adds a `presets/<x>.yaml` inside the chart; standard ones need nothing:
+Every benchmark renders from one shared [Helm](https://helm.sh/) chart — select it with `--set benchmark=<x>` and apply. The chart is self-contained: each benchmark's bespoke topology ships inside it as a `presets/<x>.yaml` (standard ones need nothing), so it pulls straight from the registry — **no clone**:

```bash
-helm template aime benchmarks/_chart --set benchmark=aime \
-  --set agent=claude-code,task=0 | kubectl apply -f -
+# From the published chart (see Pre-release note above) — no clone needed:
+helm template aime oci://quay.io/eval-containers/charts/eval --version 0.1.0 \
+  --set benchmark=aime --set agent=claude-code --set task=0 | kubectl apply -f -
```

-The CLI does exactly that, mapping every axis to a `--set`:
+Working in a clone, render the local chart instead — which is what the CLI builds today, mapping every axis to a `--set`:

```bash
eval-containers run aime --agent codex --task-id 42 --mode job
-# → helm template aime-codex-task-42 benchmarks/_chart --set benchmark=aime \
-# --set registry=…,agent=codex,task=42 | kubectl apply -f -
+# → helm template aime-codex-task-42 ./benchmarks/_chart --set benchmark=aime \
+# --set registry=… --set agent=codex --set task=42 | kubectl apply -f -
```
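
The axis-to-`--set` mapping described above can be sketched in plain bash. This is an illustration only, not the CLI's actual code: `build_render_cmd` is a hypothetical helper, the release-name pattern `<benchmark>-<agent>-task-<task>` is inferred from the single example above, and the `registry` axis is left out because its value is elided in the source.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of eval-containers): print the helm command
# for one run, giving each axis its own --set flag.
build_render_cmd() {
  local chart=$1 benchmark=$2 agent=$3 task=$4
  # Assumed naming pattern, inferred from "aime-codex-task-42".
  local release="${benchmark}-${agent}-task-${task}"
  local -a cmd=(
    helm template "$release" "$chart"
    --set "benchmark=${benchmark}"
    --set "agent=${agent}"
    --set "task=${task}"
  )
  # Join the argument list with single spaces for display.
  printf '%s\n' "${cmd[*]}"
}

build_render_cmd ./benchmarks/_chart aime codex 42
# prints: helm template aime-codex-task-42 ./benchmarks/_chart --set benchmark=aime --set agent=codex --set task=42
```

Piping the printed command's output into `kubectl apply -f -`, as in the README example, would complete the run; the sketch stops at printing so it can be inspected without a cluster.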

Platform specifics (corp registry, NodeAffinity, NetworkPolicies, a different service account, ...) are a Helm **values file you own**, layered on with `--overlay` (an extra `helm -f`), so the eval axes and your platform settings merge. A ready-to-adapt OpenShift overlay (sets the `anyuid` service account) ships as [`deploy/values-openshift.yaml`](deploy/values-openshift.yaml):
2 changes: 1 addition & 1 deletion deploy/openshift-service-account.yaml
@@ -9,7 +9,7 @@
# Then deploy any benchmark with the OpenShift values layered on:
#
# helm template <bench> benchmarks/_chart --set benchmark=<bench> \
-# -f deploy/values-openshift.yaml --set agent=<a>,task=<t> | oc apply -f -
+# -f deploy/values-openshift.yaml --set agent=<a> --set task=<t> | oc apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
2 changes: 1 addition & 1 deletion docs/concepts/the-helm-chart.md
@@ -45,7 +45,7 @@ Full field list: [Chart values reference](../reference/chart-values.md).

```bash
helm template aime benchmarks/_chart --set benchmark=aime \
-  --set agent=claude-code,task=0 | kubectl apply -f -
+  --set agent=claude-code --set task=0 | kubectl apply -f -
```

The `eval-containers run … --mode job` command builds exactly this, mapping each
2 changes: 1 addition & 1 deletion docs/guides/deploy-on-kubernetes.md
@@ -22,7 +22,7 @@ Plain Helm — no CLI required:

```bash
helm template aime benchmarks/_chart --set benchmark=aime \
-  --set agent=claude-code,task=0 | kubectl apply -f -
+  --set agent=claude-code --set task=0 | kubectl apply -f -
```
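
Helm's documented `--set` syntax treats a comma as a separator between assignments, so `--set agent=claude-code,task=0` and `--set agent=claude-code --set task=0` should render the same values. The bash sketch below performs that split to make the equivalence concrete. `split_set_arg` is a hypothetical helper, not something the repo ships, and it is deliberately naive: it does not handle escaped commas (`\,`) or `{a,b}` list values.

```bash
#!/usr/bin/env bash
# Hypothetical helper: expand one comma-joined --set argument into
# one --set flag per key=value pair.
split_set_arg() {
  local joined=$1 pair
  local -a pairs flags=()
  # Split on commas; IFS is changed only for this read.
  IFS=',' read -r -a pairs <<< "$joined"
  for pair in "${pairs[@]}"; do
    flags+=(--set "$pair")
  done
  printf '%s\n' "${flags[*]}"
}

split_set_arg 'agent=claude-code,task=0'
# prints: --set agent=claude-code --set task=0
```

One flag per key keeps each axis visible on its own and avoids surprises if a value ever needs a literal comma.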

Or with the CLI, which builds the same command: