Investigation triggered by 'massive token use' on executor-adapter benchmark runs. Pinpointed: every BashExec under the dogfood profile fails immediately with this error envelope, AND the agent then loops trying to recover.
{
"error": true,
"delegated_backend_not_wired": true,
"tool": "BashExec",
"note": "v2 split_by_backend selected Delegated; the inbox-based ohmy_pi wait is deferred to a follow-up milestone. Re-run with settings.ohmy_pi=false or use a tool that resolves to the Native backend."
}
A/B repro on motoko_agent HEAD, openrouter/openai/gpt-4o-mini, fizzbuzz task ('Write fizzbuzz.py, run with python3, declare done.'):
| Setting |
Steps |
Input tokens |
Output tokens |
Duration |
Outcome |
| ohmy_pi=true (.motoko/config/dogfood/config.json default) |
15+ (still going when killed at 90s) |
3,695+ at step 15, growing every step |
~1,290 |
90+ s |
gave up; prose-only finishing 'I can't run scripts due to system limitations' |
| ohmy_pi=false (only line changed) |
3 |
3,613 |
201 |
6.1 s |
✓ wrote fizzbuzz.py, ran it, declared done |
Same model, same task, same workdir, same code. The ONLY change is .tools.ohmy_pi (and .tools.hybrid, since they're paired in dogfood). Result: 15x faster, 6x fewer output tokens, task actually completes.
ROOT CAUSE: the dogfood profile ships .tools.ohmy_pi=true and .tools.hybrid=true. With MOTOKO_AGENT_V2=1 (the default since the v2 migration), agent_loop_v2's split_by_backend routes BashExec to the Delegated backend. The Delegated backend's inbox-based wait (the ohmy_pi async path) is documented as 'deferred to a follow-up milestone' — the error envelope's own text says so. Result: every BashExec returns a structured error, the model retries, gets the same error, eventually gives up and emits prose. Each wasted step costs tokens linearly.
Why this hides in the metrics: the error has exit_code=1 + truncated=false + a structured payload that LOOKS like a real tool failure. The agent reasonably interprets it as 'BashExec is broken on this system, I should not retry' — but with most providers the model retries 1-2 more times before declaring inability, AND in hybrid mode it falls back to extracting bash from prose, which goes through the SAME deferred path. So 1 broken BashExec produces ~13 wasted steps.
WHY THE EXECUTOR ADAPTER MAKES THIS WORSE: with the AILANG benchmark teaching prompt (~21K tokens) as the user message, every wasted step costs ~32K input tokens (system + tool catalog + 21K task + growing history). The cumulative token use balloons because the per-step cost is 30x bigger than this fizzbuzz repro. That's the 'massive token use' the user reported; the token amplification was just exposing the existing dogfood bug at a larger scale.
FIXES, ranked:
-
[P0, one-line] Change .tools.ohmy_pi default in motoko_agent/.motoko/config/dogfood/config.json from true to false. Same for hybrid (paired). Until the inbox-based ohmy_pi path lands, the default profile must not route BashExec through it. The explicit error message tells operators to do this; ship that as the default.
-
[P0, two-line] In agent_loop_v2's split_by_backend, when ohmy_pi=true is set in config but the Delegated backend isn't wired (Today), refuse to start with a clear error: 'profile sets tools.ohmy_pi=true but inbox-based dispatch is not wired in this build. Set tools.ohmy_pi=false.' Don't let the agent enter a loop where every BashExec fails identically — fail-fast at startup is cheaper than fail-13-times-per-task.
-
[P1] Land the ohmy_pi inbox-based wait actually wired. Per the error envelope's own text it's a deferred milestone; this is a real feature gap that breaks the dogfood story.
-
[P2] Telemetry: emit a 'profile_misconfigured' event when N consecutive BashExec calls fail with delegated_backend_not_wired:true. Even one such event in a session run should be enough — but counting them avoids false positives on intentional dispatch tests.
OBSERVED ON: motoko_agent dev (commit pre v0.18, fresh checkout 2026-05-08), session JSONL session_2026-05-08T18-04-20-602Z.jsonl. Repro is deterministic — every run with default dogfood config exhibits the same pattern. Filed as bug, not feature, because the shipping default profile cannot complete a hello-world task.
Happy to send the JSONL or a 10-line test that asserts BashExec exit_code != 1 with delegated_backend_not_wired payload (would catch this as a smoke regression in CI).
Binary info (auto-attached):
ailang version: v0.17.0-46-g7cd0c743-dirty
binary md5: b3402e190334480662a76ed841e341de
binary path: /Users/mark/go/bin/ailang
git commit: 7cd0c74
Reported by: motoko_explore via ailang messages
Investigation triggered by 'massive token use' on executor-adapter benchmark runs. Pinpointed: every BashExec under the dogfood profile fails immediately with this error envelope, AND the agent then loops trying to recover.
{
"error": true,
"delegated_backend_not_wired": true,
"tool": "BashExec",
"note": "v2 split_by_backend selected Delegated; the inbox-based ohmy_pi wait is deferred to a follow-up milestone. Re-run with settings.ohmy_pi=false or use a tool that resolves to the Native backend."
}
A/B repro on motoko_agent HEAD, openrouter/openai/gpt-4o-mini, fizzbuzz task ('Write fizzbuzz.py, run with python3, declare done.'):
Same model, same task, same workdir, same code. The ONLY change is .tools.ohmy_pi (and .tools.hybrid, since they're paired in dogfood). Result: 15x faster, 6x fewer output tokens, task actually completes.
ROOT CAUSE: the dogfood profile ships .tools.ohmy_pi=true and .tools.hybrid=true. With MOTOKO_AGENT_V2=1 (the default since the v2 migration), agent_loop_v2's split_by_backend routes BashExec to the Delegated backend. The Delegated backend's inbox-based wait (the ohmy_pi async path) is documented as 'deferred to a follow-up milestone' — the error envelope's own text says so. Result: every BashExec returns a structured error, the model retries, gets the same error, eventually gives up and emits prose. Each wasted step costs tokens linearly.
Why this hides in the metrics: the error has exit_code=1 + truncated=false + a structured payload that LOOKS like a real tool failure. The agent reasonably interprets it as 'BashExec is broken on this system, I should not retry' — but with most providers the model retries 1-2 more times before declaring inability, AND in hybrid mode it falls back to extracting bash from prose, which goes through the SAME deferred path. So 1 broken BashExec produces ~13 wasted steps.
WHY THE EXECUTOR ADAPTER MAKES THIS WORSE: with the AILANG benchmark teaching prompt (~21K tokens) as the user message, every wasted step costs ~32K input tokens (system + tool catalog + 21K task + growing history). The cumulative token use balloons because the per-step cost is 30x bigger than this fizzbuzz repro. That's the 'massive token use' the user reported; the token amplification was just exposing the existing dogfood bug at a larger scale.
FIXES, ranked:
[P0, one-line] Change .tools.ohmy_pi default in motoko_agent/.motoko/config/dogfood/config.json from true to false. Same for hybrid (paired). Until the inbox-based ohmy_pi path lands, the default profile must not route BashExec through it. The explicit error message tells operators to do this; ship that as the default.
[P0, two-line] In agent_loop_v2's split_by_backend, when ohmy_pi=true is set in config but the Delegated backend isn't wired (Today), refuse to start with a clear error: 'profile sets tools.ohmy_pi=true but inbox-based dispatch is not wired in this build. Set tools.ohmy_pi=false.' Don't let the agent enter a loop where every BashExec fails identically — fail-fast at startup is cheaper than fail-13-times-per-task.
[P1] Land the ohmy_pi inbox-based wait actually wired. Per the error envelope's own text it's a deferred milestone; this is a real feature gap that breaks the dogfood story.
[P2] Telemetry: emit a 'profile_misconfigured' event when N consecutive BashExec calls fail with delegated_backend_not_wired:true. Even one such event in a session run should be enough — but counting them avoids false positives on intentional dispatch tests.
OBSERVED ON: motoko_agent dev (commit pre v0.18, fresh checkout 2026-05-08), session JSONL session_2026-05-08T18-04-20-602Z.jsonl. Repro is deterministic — every run with default dogfood config exhibits the same pattern. Filed as bug, not feature, because the shipping default profile cannot complete a hello-world task.
Happy to send the JSONL or a 10-line test that asserts BashExec exit_code != 1 with delegated_backend_not_wired payload (would catch this as a smoke regression in CI).
Binary info (auto-attached):
ailang version: v0.17.0-46-g7cd0c743-dirty
binary md5: b3402e190334480662a76ed841e341de
binary path: /Users/mark/go/bin/ailang
git commit: 7cd0c74
Reported by: motoko_explore via ailang messages