feat(agents): EVAL_AGENT_REASONING_EFFORT for zerostack via extra_body by elronbandel · Pull Request #249 · Exgentic/eval-containers

elronbandel · 2026-06-29T17:18:35Z

What

Two things for zerostack, found and fixed together while making the reasoning-effort knob (#237) work:

Wire EVAL_AGENT_REASONING_EFFORT — the one agent feat(agents): EVAL_AGENT_REASONING_EFFORT knob (7 agents, loud on the rest) #237 left loud-rejected.
Fix a pre-existing glibc bug that prevented zerostack from running in any eval combo.

1. Reasoning effort

At pin time (v1.5.0) zerostack had no reasoning-effort surface: /thinking is an alias for /reasoning, which only swaps a system-prompt prefix — not an API param. v1.5.1 adds a global extra_body config field ("shallow-merges JSON into every completion request"); traced end-to-end it reaches the custom OpenAI/completions provider. So map the effort onto the OpenAI reasoning_effort body param in the config.json the entrypoint writes:

reasoning=
if [ -n "${EVAL_AGENT_REASONING_EFFORT:-}" ]; then
  reasoning=",\"extra_body\":{\"reasoning_effort\":\"$EVAL_AGENT_REASONING_EFFORT\"}"
fi

Built with an explicit if (not the flag-tier ${VAR:+…} idiom) because the JSON value nests braces — ${VAR:+} closes at the first inner } and emits a stray brace when unset (caught in review; the one-liner produced invalid JSON for the default case).

2. glibc → musl (pre-existing, unblocks zerostack entirely)

Verifying end-to-end surfaced this: zerostack's gnu binary needs glibc 2.39, but benchmark bases (where the combo runs the agent) are Debian 2.36 → GLIBC_2.39 not found. zerostack could not run in any combo. Not introduced here — v1.5.0 has the identical requirement; the agent image (Ubuntu 2.39) masks it, the combo (Debian 2.36) exposes it. Fix: the static musl release binary (no glibc dependency).

Verification

config.json jq-valid both set and unset (unset → no extra_body).
Full combo pipeline (run-agent → musl zerostack): EVAL_AGENT_REASONING_EFFORT=high → upstream request carries "reasoning_effort":"high"; unset → no key. run-agent's grep guard accepts zerostack (exit 0, not loud-reject).
musl binary verified static (not a dynamic executable) and runs on the glibc-2.36 base.
cargo test … check 15/15; pre-commit (hadolint, conftest, detect-secrets, trivy) green.

Note: gaia itself is a poor demonstrator here — its agent network has no web egress and zerostack has exa-mcp disabled, so it can't browse → ~0 score regardless of effort. The plumbing is benchmark-independent, so this aime-combo verification covers the gaia path.

Rules

Code-only PR (contributing/RULES.md 2). agents/RULES.md 12 (unset = reproducible default), 5 (base_url unchanged); models/RULES.md 3 (agent owns request params).

⚠️ Pre-release pin

extra_body first ships in 1.5.1; the only published release today is v1.5.1-rc2, so this pins that rc (ships arm64+amd64 musl binaries, verified). Recommend bumping to 1.5.1 stable when it lands.

zerostack was loud-rejected by the reasoning-effort knob (#237): its pinned v1.5.0 had no documented reasoning-effort surface (/thinking is a prompt-prefix toggle, not an API param). v1.5.1 adds a global `extra_body` config field that shallow-merges JSON into every completion request, so map EVAL_AGENT_REASONING_EFFORT onto the OpenAI `reasoning_effort` body param (unset = no extra_body = byte-identical config = reproducible default, agents/RULES.md rule 12). Built with an explicit if, not the flag-tier agents' ${VAR:+...} idiom, because the JSON value nests braces. Also fixes a pre-existing bug found while verifying end-to-end: zerostack's gnu binary needs glibc 2.39, but benchmark bases (where the combo runs the agent) are Debian 2.36 -- so zerostack could not run in ANY combo (affects 1.5.0 too). Switch to the static musl release binary, which has no glibc dependency. - bump zerostack 1.5.0 -> 1.5.1-rc2 (first release shipping `extra_body`) - switch gnu -> static musl binary (runs on the glibc-2.36 benchmark base) - static test + supported-agents doc add zerostack Verified: config.json valid both set/unset (jq); full combo pipeline (run-agent -> musl zerostack) sends "reasoning_effort":"high" when set, nothing when unset; run-agent grep guard accepts zerostack (exit 0, not loud-reject). Signed-off-by: Elron Bandel <elron.bandel@ibm.com>

elronbandel force-pushed the elron/zerostack-reasoning-effort branch 3 times, most recently from c9df409 to 0911127 Compare June 29, 2026 17:38

elronbandel force-pushed the elron/zerostack-reasoning-effort branch from 0911127 to 66cd821 Compare June 29, 2026 18:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(agents): EVAL_AGENT_REASONING_EFFORT for zerostack via extra_body#249

feat(agents): EVAL_AGENT_REASONING_EFFORT for zerostack via extra_body#249
elronbandel wants to merge 1 commit into
mainfrom
elron/zerostack-reasoning-effort

elronbandel commented Jun 29, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

elronbandel commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

1. Reasoning effort

2. glibc → musl (pre-existing, unblocks zerostack entirely)

Verification

Rules

⚠️ Pre-release pin

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

elronbandel commented Jun 29, 2026 •

edited

Loading