Agent task-id isolation (#80) is bypassed by custom-runnerArgs benchmarks (tau-bench) #84

Description

@elronbandel

Context

#80 stops leaking the benchmark task identity to the agent by removing TASK_ID from the agent's env -i allow-list in core/process-compose/process-compose.yaml (rule 7 — a model can recall a memorized solution from an instance id). That covers every benchmark whose runner goes through /usr/local/bin/run → process-compose (all three modes).
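For illustration only, the effect of an `env -i` allow-list can be sketched as a filter over the parent environment. The allow-list entries and the helper name below are assumptions made for this example; the real list lives in core/process-compose/process-compose.yaml and is not reproduced here.

```python
# Hypothetical allow-list for this sketch; NOT the contents of process-compose.yaml.
AGENT_ENV_ALLOWLIST = ("PATH", "HOME", "LANG")


def allowlisted_env(parent_env, allowlist=AGENT_ENV_ALLOWLIST):
    """Keep only allow-listed variables, similar in spirit to `env -i VAR=... cmd`.

    Anything not named in the allow-list (TASK_ID, EVAL_TASK_ID, ...) is dropped,
    so the agent never sees it even though the runner container still has it.
    """
    return {name: parent_env[name] for name in allowlist if name in parent_env}


runner_env = {"PATH": "/usr/bin", "HOME": "/root", "TASK_ID": "example-instance"}
agent_env = allowlisted_env(runner_env)
```

The point of an allow-list over a deny-list is that a newly introduced variable stays hidden from the agent by default.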

Problem

A benchmark whose chart preset (or compose.yaml / container.Dockerfile) sets a custom runnerArgs/command runs its agent without going through process-compose's env -i — so it inherits the runner container env, which still carries EVAL_TASK_ID/TASK_ID. The agent (and model) can then see the task id, defeating #80 for that benchmark.

Known instance: tau-bench. benchmarks/_chart/presets/tau-bench.yaml sets runnerArgs: python3 /app/agent.py …, bypassing /usr/local/bin/run. (tau-bench is shared-env, so its task id is a dataset index, which carries lower memorization risk than SWE-bench's instance ids, but the isolation guarantee still has a hole.)
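The inheritance behind this hole can be reproduced in isolation. The sketch below is not the benchmark's code: it launches a throwaway child process and reports which task-id variables that child can read. The helper name `visible_task_vars` is hypothetical.

```python
import json
import os
import subprocess
import sys

TASK_ID_VARS = ("EVAL_TASK_ID", "TASK_ID")


def visible_task_vars(env=None):
    """Launch a child process and return the task-id variables it can read.

    env=None means "inherit the parent's environment", which is what a direct
    launch of the agent does when it does not go through `env -i`.
    """
    probe = (
        "import os, json; "
        f"print(json.dumps([v for v in {TASK_ID_VARS!r} if v in os.environ]))"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

With `EVAL_TASK_ID` and `TASK_ID` set in the parent, `visible_task_vars()` reports both; passing an environment with those two keys removed reports none.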

To do

  • Audit all surfaces for custom agent invocations that bypass the process-compose env -i allow-list (chart presets' runnerArgs, per-benchmark compose.yaml command overrides, container.Dockerfile).
  • Ensure each strips the task identity from the agent's env (run the custom agent under the shared task-id-free allow-list, or unset EVAL_TASK_ID TASK_ID before it).
  • Extend the rule-7 conformance test added in #80 ("fix(isolation): don't leak the task id to the agent process"), tests/sanity/check.rs::agent_env_excludes_the_task_id, to cover custom-runnerArgs benchmarks.
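The first two items could be prototyped along these lines. This is a sketch under assumptions: it takes already-extracted runnerArgs/command strings as input (the preset schema is not modeled), the "does not start with /usr/local/bin/run" test is only a heuristic for flagging candidates for manual review, and all function names are hypothetical.

```python
import shlex

TASK_ID_VARS = ("EVAL_TASK_ID", "TASK_ID")
SHARED_RUNNER = "/usr/local/bin/run"


def bypasses_shared_runner(runner_args):
    """True if a runnerArgs/command string does not start with the shared runner.

    Heuristic only: a wrapper such as `sh -c '...'` is flagged even if it
    eventually calls the shared runner, so flagged entries need a manual look.
    """
    argv = shlex.split(runner_args)
    return bool(argv) and argv[0] != SHARED_RUNNER


def audit(commands_by_benchmark):
    """Given {benchmark name: command string}, return the names to review."""
    return sorted(
        name
        for name, command in commands_by_benchmark.items()
        if bypasses_shared_runner(command)
    )


def strip_task_identity(env):
    """Return a copy of env without the task-id variables (deny-list variant)."""
    return {k: v for k, v in env.items() if k not in TASK_ID_VARS}
```

For example, `audit({"tau-bench": "python3 /app/agent.py", "other": "/usr/local/bin/run"})` returns `["tau-bench"]`, where `other` is a made-up entry. Running custom agents under the shared allow-list is the stricter of the two options in the second item; `strip_task_identity` corresponds to the `unset EVAL_TASK_ID TASK_ID` alternative.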
