Per-task benchmarks at scale on OC: fan out N Jobs (an Indexed Job can't vary the image per index) #83

## Context

Per-task benchmarks (compilebench, cybench, mle-bench, swe-bench, swe-bench-pro, swe-lancer, terminal-bench) bake **one eval image per task** — `evals/<benchmark>-<task>--<agent>` (see #79, rule 24f).

The chart's scale model for *shared-env* benchmarks is ONE **Indexed Job** (`datasetSize` → `completionMode: Indexed`, task id = `$JOB_COMPLETION_INDEX`): one image, N task indices.
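For reference, a minimal sketch of the shape such a shared-env Job takes. The chart's actual template is not shown in this issue, so the function name `indexed_job`, the container name, and the image used in the usage note are hypothetical; only the `batch/v1` field names are standard Kubernetes.

```python
def indexed_job(name: str, image: str, dataset_size: int, parallelism: int) -> dict:
    """Build a batch/v1 Indexed Job manifest: one pod template, N indices.

    Hypothetical helper for illustration; not the chart's real template.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "completionMode": "Indexed",
            "completions": dataset_size,  # one completion per task index
            "parallelism": parallelism,   # how many indices run at once
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    # A single container spec, hence a single image for every
                    # index. Kubernetes injects JOB_COMPLETION_INDEX into each
                    # pod, which the eval reads as its task id.
                    "containers": [{"name": "eval", "image": image}],
                }
            },
        },
    }
```

Calling `indexed_job("shared-env-run", "example/shared-env:latest", dataset_size=5, parallelism=2)` (placeholder name and image) yields one manifest covering indices 0 through 4, all running the same image.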

## Problem

That **cannot work for per-task benchmarks**: each task needs a *different image*, but a Kubernetes Job has a single pod template (and therefore one image) shared by all indices. #79 added a guard rejecting `datasetSize` + `perTask`, so a per-task run is currently **one Job per task**. There is no mechanism yet to run a *full* per-task benchmark (e.g. all 500 SWE-bench Verified tasks) at scale on OC.
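To make the one-Job-per-task shape concrete, here is a rough sketch of a per-task manifest builder. The helper names (`job_name`, `per_task_job`), the container name, and the name-sanitizing rule are assumptions for illustration; only the image pattern `evals/<benchmark>-<task>--<agent>` comes from this issue. The optional `queue` argument attaches the standard Kueue queue label, on the assumption that Kueue admission is used as the concurrency cap.

```python
import re
from typing import Optional


def job_name(benchmark: str, task: str, agent: str) -> str:
    """Derive a DNS-1123-safe Job name (lowercase alphanumerics and '-', max 63 chars)."""
    raw = f"{benchmark}-{task}-{agent}".lower()
    name = re.sub(r"[^a-z0-9-]+", "-", raw)     # e.g. "__" becomes "-"
    name = re.sub(r"-+", "-", name).strip("-")  # collapse repeated dashes
    return name[:63].rstrip("-")


def per_task_job(benchmark: str, task: str, agent: str,
                 queue: Optional[str] = None) -> dict:
    """Build one batch/v1 Job pinned to the per-task eval image."""
    metadata = {"name": job_name(benchmark, task, agent)}
    spec = {
        "template": {
            "spec": {
                "restartPolicy": "Never",
                "containers": [{
                    "name": "eval",
                    # Image pattern from the issue: evals/<benchmark>-<task>--<agent>
                    "image": f"evals/{benchmark}-{task}--{agent}",
                }],
            }
        }
    }
    if queue is not None:
        # Assumption: Kueue admits the Job. Queued Jobs are created suspended
        # and Kueue unsuspends them when quota is available.
        metadata["labels"] = {"kueue.x-k8s.io/queue-name": queue}
        spec["suspend"] = True
    return {"apiVersion": "batch/v1", "kind": "Job", "metadata": metadata, "spec": spec}
```

Mapping `per_task_job` over N task ids produces N manifests with N distinct images, which is exactly what a single Indexed Job cannot express. Note that the sanitizing step could map two different task ids to the same Job name, so a real implementation would need a collision check.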

## Options to investigate

- **CLI fan-out:** `eval-containers run <bench> --mode job --all-tasks` renders one Job per id from `tasks.txt`, each pinning `evals/<b>-<task>--<a>`, admitted by Kueue for global concurrency.
- **External sweep** (the `oc/` tooling): loop `tasks.txt` → `helm template … --set task=<id> --set perTask=true` → apply, Kueue as the concurrency cap.
- **Result aggregation** across the N Jobs (each writes its own `/output/<task>/result.json`); needed whichever fan-out approach is chosen.
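The aggregation step could be sketched roughly as follows. This assumes `tasks.txt` holds one task id per line and that every Job's output ends up under a single root directory (for example a shared volume); neither is stated in this issue. The schema of `result.json` is not specified either, so the sketch only collects the parsed files and reports which tasks have no usable result. The function names are hypothetical.

```python
import json
from pathlib import Path


def read_task_ids(tasks_file: Path) -> list[str]:
    """Read task ids, one per line, skipping blank lines (assumed format)."""
    return [line.strip() for line in tasks_file.read_text().splitlines() if line.strip()]


def aggregate(output_root: Path, task_ids: list[str]) -> dict:
    """Collect <output_root>/<task>/result.json for each expected task.

    Returns the parsed results keyed by task id, plus the list of tasks
    whose result file is absent or not valid JSON.
    """
    results: dict = {}
    missing: list[str] = []
    for task in task_ids:
        path = output_root / task / "result.json"
        try:
            results[task] = json.loads(path.read_text())
        except (FileNotFoundError, json.JSONDecodeError):
            missing.append(task)
    return {"results": results, "missing": missing}
```

Driving the aggregation from `tasks.txt` rather than from whatever directories exist means that Jobs which never produced a result show up in `missing` instead of silently dropping out of the totals.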

## Notes

- Surfaced in #79. Not urgent; needed before per-task benchmarks run at scale on the cluster.
- Relevant: rule 24f (eval naming), the chart's `datasetSize`/`perTask` model, the existing Kueue sweep concurrency.

