Per-task / built-from-source benchmarks can't produce replay fixtures

## Problem

Per-task / built-from-source benchmarks (terminal-bench, skills-bench, swe-bench-pro, swe-lancer) can't get a replay fixture, so they can't pass the replay sweep or earn `eval.benchmark.released="true"` (benchmarks/RULES.md rule 21a).

Recording a fixture requires a live run, and a live run requires the combined eval image. That image is built by `eval-containers build eval`, which runs `docker buildx bake` with `FROM ${BENCHMARK_IMAGE}`. The per-task base is built into the **local image store** (`build.sh` / per-task `docker build`), not pushed to a registry. On a `docker-container` buildx builder (e.g. a podman-backed Docker), bake resolves `FROM` from a registry, not the local store, so it fails with *"failed to resolve source metadata"*. Without the base, the eval image can't be stitched, so there is no live run and no fixture.
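One way to tell up front whether a machine is affected is to read the active builder's driver. The sketch below is illustrative only and not part of `eval-containers`: it assumes the plain-text output of `docker buildx inspect` contains a `Driver:` line, and the helper names `parse_driver` and `base_must_come_from_registry` are hypothetical.

```python
import re
import subprocess
from typing import Optional


def parse_driver(inspect_output: str) -> Optional[str]:
    """Return the value of the 'Driver:' line from `docker buildx inspect` text."""
    # Assumption: the driver is reported on a line of the form "Driver: <name>".
    match = re.search(r"^Driver:\s*(\S+)", inspect_output, flags=re.MULTILINE)
    return match.group(1) if match else None


def base_must_come_from_registry(inspect_output: str) -> bool:
    """True when the builder is a docker-container builder, the case described above."""
    return parse_driver(inspect_output) == "docker-container"


def current_builder_info() -> str:
    """Run `docker buildx inspect` for the active builder and return its stdout."""
    result = subprocess.run(
        ["docker", "buildx", "inspect"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `base_must_come_from_registry(current_builder_info())` before the bake step would let the tool warn early instead of surfacing the resolve error mid-build.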

## Impact

terminal-bench and skills-bench (both built-from-source after #125) have no replay fixtures and aren't `released`.

## Fix direction

Have `eval-containers build eval --task-id` fall back to `podman build` (which reads the local image store) when buildx can't resolve the local base. The fallback should be driven by the spec that `docker buildx bake --print` emits, so bake stays the source of truth. Then record fixtures and mark the benchmarks released.
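A minimal sketch of how the fallback could translate the bake spec into a `podman build` invocation. It assumes the JSON printed by `docker buildx bake --print` has a top-level `target` map whose entries carry `context`, `dockerfile`, `args`, `tags`, and optionally `target`. The function name `podman_build_argv` and the target, tag, and image names in the usage note are hypothetical, not existing `eval-containers` code.

```python
import json
import posixpath
from typing import Any, Dict, List


def podman_build_argv(bake_spec: Dict[str, Any], target: str) -> List[str]:
    """Build a `podman build` argv from one target of a bake --print spec."""
    spec = bake_spec["target"][target]
    context = spec.get("context", ".")
    dockerfile = spec.get("dockerfile", "Dockerfile")

    # Assumption: bake resolves a relative dockerfile against the context,
    # so join the two to get a path that is usable from the working directory.
    if not posixpath.isabs(dockerfile):
        dockerfile = posixpath.join(context, dockerfile)

    argv = ["podman", "build", "-f", dockerfile]

    # Forward build args (e.g. BENCHMARK_IMAGE) in a stable order.
    for key, value in sorted(spec.get("args", {}).items()):
        argv += ["--build-arg", f"{key}={value}"]

    # Optional multi-stage target, if the bake target names one.
    if spec.get("target"):
        argv += ["--target", spec["target"]]

    for tag in spec.get("tags", []):
        argv += ["-t", tag]

    argv.append(context)
    return argv


def load_bake_spec(printed_json: str) -> Dict[str, Any]:
    """Parse the JSON text captured from `docker buildx bake --print`."""
    return json.loads(printed_json)
```

For a spec with a target `eval`, context `ctx`, and `BENCHMARK_IMAGE=localhost/tb-task:latest`, this yields `podman build -f ctx/Dockerfile --build-arg BENCHMARK_IMAGE=localhost/tb-task:latest -t eval:dev ctx`. Because the arguments come from the printed spec, the bake file stays the single definition of the build.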

Per-task / built-from-source benchmarks can't produce replay fixtures #149
