Skip to content

[motoko_explore] dogfood profile structurally wastes tokens — ohmy_pi=true + MOTOKO_AGENT_V2=1 routes BashExec to an explicitly-not-wired Delegated backend; agent thrashes 13+ extra steps per task #225

Description

@sunholo-voight-kampff

Investigation triggered by 'massive token use' on executor-adapter benchmark runs. Pinpointed: every BashExec under the dogfood profile fails immediately with this error envelope, AND the agent then loops trying to recover.

{
"error": true,
"delegated_backend_not_wired": true,
"tool": "BashExec",
"note": "v2 split_by_backend selected Delegated; the inbox-based ohmy_pi wait is deferred to a follow-up milestone. Re-run with settings.ohmy_pi=false or use a tool that resolves to the Native backend."
}

A/B repro on motoko_agent HEAD, openrouter/openai/gpt-4o-mini, fizzbuzz task ('Write fizzbuzz.py, run with python3, declare done.'):

Setting Steps Input tokens Output tokens Duration Outcome
ohmy_pi=true (.motoko/config/dogfood/config.json default) 15+ (still going when killed at 90s) 3,695+ at step 15, growing every step ~1,290 90+ s gave up; prose-only finishing 'I can't run scripts due to system limitations'
ohmy_pi=false (only line changed) 3 3,613 201 6.1 s ✓ wrote fizzbuzz.py, ran it, declared done

Same model, same task, same workdir, same code. The ONLY change is .tools.ohmy_pi (and .tools.hybrid, since they're paired in dogfood). Result: 15x faster, 6x fewer output tokens, task actually completes.

ROOT CAUSE: the dogfood profile ships .tools.ohmy_pi=true and .tools.hybrid=true. With MOTOKO_AGENT_V2=1 (the default since the v2 migration), agent_loop_v2's split_by_backend routes BashExec to the Delegated backend. The Delegated backend's inbox-based wait (the ohmy_pi async path) is documented as 'deferred to a follow-up milestone' — the error envelope's own text says so. Result: every BashExec returns a structured error, the model retries, gets the same error, eventually gives up and emits prose. Each wasted step costs tokens linearly.

Why this hides in the metrics: the error has exit_code=1 + truncated=false + a structured payload that LOOKS like a real tool failure. The agent reasonably interprets it as 'BashExec is broken on this system, I should not retry' — but with most providers the model retries 1-2 more times before declaring inability, AND in hybrid mode it falls back to extracting bash from prose, which goes through the SAME deferred path. So 1 broken BashExec produces ~13 wasted steps.

WHY THE EXECUTOR ADAPTER MAKES THIS WORSE: with the AILANG benchmark teaching prompt (~21K tokens) as the user message, every wasted step costs ~32K input tokens (system + tool catalog + 21K task + growing history). The cumulative token use balloons because the per-step cost is 30x bigger than this fizzbuzz repro. That's the 'massive token use' the user reported; the token amplification was just exposing the existing dogfood bug at a larger scale.

FIXES, ranked:

  1. [P0, one-line] Change .tools.ohmy_pi default in motoko_agent/.motoko/config/dogfood/config.json from true to false. Same for hybrid (paired). Until the inbox-based ohmy_pi path lands, the default profile must not route BashExec through it. The explicit error message tells operators to do this; ship that as the default.

  2. [P0, two-line] In agent_loop_v2's split_by_backend, when ohmy_pi=true is set in config but the Delegated backend isn't wired (Today), refuse to start with a clear error: 'profile sets tools.ohmy_pi=true but inbox-based dispatch is not wired in this build. Set tools.ohmy_pi=false.' Don't let the agent enter a loop where every BashExec fails identically — fail-fast at startup is cheaper than fail-13-times-per-task.

  3. [P1] Land the ohmy_pi inbox-based wait actually wired. Per the error envelope's own text it's a deferred milestone; this is a real feature gap that breaks the dogfood story.

  4. [P2] Telemetry: emit a 'profile_misconfigured' event when N consecutive BashExec calls fail with delegated_backend_not_wired:true. Even one such event in a session run should be enough — but counting them avoids false positives on intentional dispatch tests.

OBSERVED ON: motoko_agent dev (commit pre v0.18, fresh checkout 2026-05-08), session JSONL session_2026-05-08T18-04-20-602Z.jsonl. Repro is deterministic — every run with default dogfood config exhibits the same pattern. Filed as bug, not feature, because the shipping default profile cannot complete a hello-world task.

Happy to send the JSONL or a 10-line test that asserts BashExec exit_code != 1 with delegated_backend_not_wired payload (would catch this as a smoke regression in CI).


Binary info (auto-attached):
ailang version: v0.17.0-46-g7cd0c743-dirty
binary md5: b3402e190334480662a76ed841e341de
binary path: /Users/mark/go/bin/ailang
git commit: 7cd0c74


Reported by: motoko_explore via ailang messages

Metadata

Metadata

Assignees

No one assigned

    Labels

    ailang-messageMessage from AILANG messaging systembugBug reportfrom:motoko_exploreMessage from motoko_explore agent

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions