Skip to content

feat(agents): EVAL_AGENT_REASONING_EFFORT for zerostack via extra_body#249

Open
elronbandel wants to merge 1 commit into
mainfrom
elron/zerostack-reasoning-effort
Open

feat(agents): EVAL_AGENT_REASONING_EFFORT for zerostack via extra_body#249
elronbandel wants to merge 1 commit into
mainfrom
elron/zerostack-reasoning-effort

Conversation

@elronbandel

@elronbandel elronbandel commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

What

Two things for zerostack, found and fixed together while making the reasoning-effort knob (#237) work:

  1. Wire EVAL_AGENT_REASONING_EFFORT — the one agent feat(agents): EVAL_AGENT_REASONING_EFFORT knob (7 agents, loud on the rest) #237 left loud-rejected.
  2. Fix a pre-existing glibc bug that prevented zerostack from running in any eval combo.

1. Reasoning effort

At pin time (v1.5.0) zerostack had no reasoning-effort surface: /thinking is an alias for /reasoning, which only swaps a system-prompt prefix — not an API param. v1.5.1 adds a global extra_body config field ("shallow-merges JSON into every completion request"); traced end-to-end it reaches the custom OpenAI/completions provider. So map the effort onto the OpenAI reasoning_effort body param in the config.json the entrypoint writes:

reasoning=
if [ -n "${EVAL_AGENT_REASONING_EFFORT:-}" ]; then
  reasoning=",\"extra_body\":{\"reasoning_effort\":\"$EVAL_AGENT_REASONING_EFFORT\"}"
fi

Built with an explicit if (not the flag-tier ${VAR:+…} idiom) because the JSON value nests braces${VAR:+} closes at the first inner } and emits a stray brace when unset (caught in review; the one-liner produced invalid JSON for the default case).

2. glibc → musl (pre-existing, unblocks zerostack entirely)

Verifying end-to-end surfaced this: zerostack's gnu binary needs glibc 2.39, but benchmark bases (where the combo runs the agent) are Debian 2.36GLIBC_2.39 not found. zerostack could not run in any combo. Not introduced here — v1.5.0 has the identical requirement; the agent image (Ubuntu 2.39) masks it, the combo (Debian 2.36) exposes it. Fix: the static musl release binary (no glibc dependency).

Verification

  • config.json jq-valid both set and unset (unset → no extra_body).
  • Full combo pipeline (run-agent → musl zerostack): EVAL_AGENT_REASONING_EFFORT=high → upstream request carries "reasoning_effort":"high"; unset → no key. run-agent's grep guard accepts zerostack (exit 0, not loud-reject).
  • musl binary verified static (not a dynamic executable) and runs on the glibc-2.36 base.
  • cargo test … check 15/15; pre-commit (hadolint, conftest, detect-secrets, trivy) green.

Note: gaia itself is a poor demonstrator here — its agent network has no web egress and zerostack has exa-mcp disabled, so it can't browse → ~0 score regardless of effort. The plumbing is benchmark-independent, so this aime-combo verification covers the gaia path.

Rules

Code-only PR (contributing/RULES.md 2). agents/RULES.md 12 (unset = reproducible default), 5 (base_url unchanged); models/RULES.md 3 (agent owns request params).

⚠️ Pre-release pin

extra_body first ships in 1.5.1; the only published release today is v1.5.1-rc2, so this pins that rc (ships arm64+amd64 musl binaries, verified). Recommend bumping to 1.5.1 stable when it lands.

@elronbandel elronbandel force-pushed the elron/zerostack-reasoning-effort branch 3 times, most recently from c9df409 to 0911127 Compare June 29, 2026 17:38
zerostack was loud-rejected by the reasoning-effort knob (#237): its pinned
v1.5.0 had no documented reasoning-effort surface (/thinking is a prompt-prefix
toggle, not an API param). v1.5.1 adds a global `extra_body` config field that
shallow-merges JSON into every completion request, so map EVAL_AGENT_REASONING_EFFORT
onto the OpenAI `reasoning_effort` body param (unset = no extra_body = byte-identical
config = reproducible default, agents/RULES.md rule 12). Built with an explicit if,
not the flag-tier agents' ${VAR:+...} idiom, because the JSON value nests braces.

Also fixes a pre-existing bug found while verifying end-to-end: zerostack's gnu
binary needs glibc 2.39, but benchmark bases (where the combo runs the agent) are
Debian 2.36 -- so zerostack could not run in ANY combo (affects 1.5.0 too). Switch
to the static musl release binary, which has no glibc dependency.

- bump zerostack 1.5.0 -> 1.5.1-rc2 (first release shipping `extra_body`)
- switch gnu -> static musl binary (runs on the glibc-2.36 benchmark base)
- static test + supported-agents doc add zerostack

Verified: config.json valid both set/unset (jq); full combo pipeline (run-agent ->
musl zerostack) sends "reasoning_effort":"high" when set, nothing when unset; run-agent
grep guard accepts zerostack (exit 0, not loud-reject).

Signed-off-by: Elron Bandel <elron.bandel@ibm.com>
@elronbandel elronbandel force-pushed the elron/zerostack-reasoning-effort branch from 0911127 to 66cd821 Compare June 29, 2026 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant