feat(agents): EVAL_AGENT_REASONING_EFFORT for zerostack via extra_body#249
Open
elronbandel wants to merge 1 commit into
Open
feat(agents): EVAL_AGENT_REASONING_EFFORT for zerostack via extra_body#249elronbandel wants to merge 1 commit into
elronbandel wants to merge 1 commit into
Conversation
c9df409 to
0911127
Compare
zerostack was loud-rejected by the reasoning-effort knob (#237): its pinned v1.5.0 had no documented reasoning-effort surface (/thinking is a prompt-prefix toggle, not an API param). v1.5.1 adds a global `extra_body` config field that shallow-merges JSON into every completion request, so map EVAL_AGENT_REASONING_EFFORT onto the OpenAI `reasoning_effort` body param (unset = no extra_body = byte-identical config = reproducible default, agents/RULES.md rule 12). Built with an explicit if, not the flag-tier agents' ${VAR:+...} idiom, because the JSON value nests braces. Also fixes a pre-existing bug found while verifying end-to-end: zerostack's gnu binary needs glibc 2.39, but benchmark bases (where the combo runs the agent) are Debian 2.36 -- so zerostack could not run in ANY combo (affects 1.5.0 too). Switch to the static musl release binary, which has no glibc dependency. - bump zerostack 1.5.0 -> 1.5.1-rc2 (first release shipping `extra_body`) - switch gnu -> static musl binary (runs on the glibc-2.36 benchmark base) - static test + supported-agents doc add zerostack Verified: config.json valid both set/unset (jq); full combo pipeline (run-agent -> musl zerostack) sends "reasoning_effort":"high" when set, nothing when unset; run-agent grep guard accepts zerostack (exit 0, not loud-reject). Signed-off-by: Elron Bandel <elron.bandel@ibm.com>
0911127 to
66cd821
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Two things for zerostack, found and fixed together while making the reasoning-effort knob (#237) work:
EVAL_AGENT_REASONING_EFFORT— the one agent feat(agents): EVAL_AGENT_REASONING_EFFORT knob (7 agents, loud on the rest) #237 left loud-rejected.1. Reasoning effort
At pin time (v1.5.0) zerostack had no reasoning-effort surface:
/thinkingis an alias for/reasoning, which only swaps a system-prompt prefix — not an API param. v1.5.1 adds a globalextra_bodyconfig field ("shallow-merges JSON into every completion request"); traced end-to-end it reaches the custom OpenAI/completions provider. So map the effort onto the OpenAIreasoning_effortbody param in theconfig.jsonthe entrypoint writes:Built with an explicit
if(not the flag-tier${VAR:+…}idiom) because the JSON value nests braces —${VAR:+}closes at the first inner}and emits a stray brace when unset (caught in review; the one-liner produced invalid JSON for the default case).2. glibc → musl (pre-existing, unblocks zerostack entirely)
Verifying end-to-end surfaced this: zerostack's gnu binary needs glibc 2.39, but benchmark bases (where the combo runs the agent) are Debian 2.36 →
GLIBC_2.39 not found. zerostack could not run in any combo. Not introduced here — v1.5.0 has the identical requirement; the agent image (Ubuntu 2.39) masks it, the combo (Debian 2.36) exposes it. Fix: the static musl release binary (no glibc dependency).Verification
config.jsonjq-valid both set and unset (unset → noextra_body).run-agent → musl zerostack):EVAL_AGENT_REASONING_EFFORT=high→ upstream request carries"reasoning_effort":"high"; unset → no key. run-agent's grep guard accepts zerostack (exit 0, not loud-reject).not a dynamic executable) and runs on the glibc-2.36 base.cargo test … check15/15; pre-commit (hadolint, conftest, detect-secrets, trivy) green.Rules
Code-only PR (contributing/RULES.md 2). agents/RULES.md 12 (unset = reproducible default), 5 (base_url unchanged); models/RULES.md 3 (agent owns request params).
extra_bodyfirst ships in 1.5.1; the only published release today isv1.5.1-rc2, so this pins that rc (ships arm64+amd64 musl binaries, verified). Recommend bumping to 1.5.1 stable when it lands.