[motoko_explore] dogfood profile structurally wastes tokens — ohmy_pi=true + MOTOKO_AGENT_V2=1 routes BashExec to an explicitly-not-wired Delegated backend; agent thrashes 13+ extra steps per task

Investigation triggered by 'massive token use' on executor-adapter benchmark runs. Pinpointed: every BashExec under the dogfood profile fails immediately with this error envelope, AND the agent then loops trying to recover.

  {
    "error": true,
    "delegated_backend_not_wired": true,
    "tool": "BashExec",
    "note": "v2 split_by_backend selected Delegated; the inbox-based ohmy_pi wait is deferred to a follow-up milestone. Re-run with settings.ohmy_pi=false or use a tool that resolves to the Native backend."
  }

A/B repro on motoko_agent HEAD, openrouter/openai/gpt-4o-mini, fizzbuzz task ('Write fizzbuzz.py, run with python3, declare done.'):

| Setting          | Steps | Input tokens | Output tokens | Duration | Outcome |
|------------------|-------|--------------|---------------|----------|---------|
| ohmy_pi=true (.motoko/config/dogfood/config.json default)  | 15+ (still going when killed at 90s) | 3,695+ at step 15, growing every step | ~1,290 | 90+ s | gave up; prose-only finishing 'I can't run scripts due to system limitations' |
| ohmy_pi=false (only line changed)                          | 3 | 3,613 | 201 | 6.1 s | ✓ wrote fizzbuzz.py, ran it, declared done |

Same model, same task, same workdir, same code. The only change is .tools.ohmy_pi (and .tools.hybrid, since the two are paired in dogfood). Result: at least 15x faster (the failing run was killed at 90 s), over 6x fewer output tokens, and the task actually completes.

ROOT CAUSE: the dogfood profile ships .tools.ohmy_pi=true and .tools.hybrid=true. With MOTOKO_AGENT_V2=1 (the default since the v2 migration), agent_loop_v2's split_by_backend routes BashExec to the Delegated backend. The Delegated backend's inbox-based wait (the ohmy_pi async path) is documented as 'deferred to a follow-up milestone'; the error envelope's own text says so. Result: every BashExec returns a structured error, the model retries, gets the same error, and eventually gives up and emits prose. Each wasted step re-sends the full context plus the growing history, so token cost rises at least linearly with the number of wasted steps.

Why this hides in the metrics: the error has exit_code=1 + truncated=false + a structured payload that LOOKS like a real tool failure. The agent reasonably interprets it as 'BashExec is broken on this system, I should not retry' — but with most providers the model retries 1-2 more times before declaring inability, AND in hybrid mode it falls back to extracting bash from prose, which goes through the SAME deferred path. So 1 broken BashExec produces ~13 wasted steps.

WHY THE EXECUTOR ADAPTER MAKES THIS WORSE: with the AILANG benchmark teaching prompt (~21K tokens) as the user message, every wasted step costs ~32K input tokens (system + tool catalog + 21K task + growing history). The cumulative token use balloons because the per-step cost is 30x bigger than this fizzbuzz repro. That's the 'massive token use' the user reported; the token amplification was just exposing the existing dogfood bug at a larger scale.
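To put a rough number on that amplification, here is a back-of-envelope sketch in Python. It uses only the approximate figures quoted in this report (~13 wasted steps, ~32K input tokens per step) and ignores history growth, so it is a lower bound rather than a measurement. The function and constant names are illustrative, not part of motoko_agent.

```python
# Lower-bound estimate of input tokens burned by wasted steps.
# Inputs are the approximate figures from this report; history growth is ignored.

def wasted_input_tokens(wasted_steps: int, tokens_per_step: int) -> int:
    """Each wasted step re-sends at least tokens_per_step of context."""
    return wasted_steps * tokens_per_step

WASTED_STEPS = 13           # ~13 wasted steps per broken BashExec (fizzbuzz repro)
BENCH_STEP_TOKENS = 32_000  # ~32K input tokens per step with the ~21K teaching prompt

print(wasted_input_tokens(WASTED_STEPS, BENCH_STEP_TOKENS))  # 416000
```

Under these assumptions a single benchmark task wastes on the order of 400K input tokens before the agent gives up.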

FIXES, ranked:

1. [P0, one-line] Change .tools.ohmy_pi default in motoko_agent/.motoko/config/dogfood/config.json from true to false. Same for hybrid (paired). Until the inbox-based ohmy_pi path lands, the default profile must not route BashExec through it. The explicit error message tells operators to do this; ship that as the default.

2. [P0, two-line] In agent_loop_v2's split_by_backend, when ohmy_pi=true is set in config but the Delegated backend isn't wired (as is the case today), refuse to start with a clear error: 'profile sets tools.ohmy_pi=true but inbox-based dispatch is not wired in this build. Set tools.ohmy_pi=false.' Don't let the agent enter a loop where every BashExec fails identically; failing fast at startup is cheaper than failing 13 times per task.

3. [P1] Land the inbox-based ohmy_pi wait so the Delegated backend is actually wired. Per the error envelope's own text it's a deferred milestone; this is a real feature gap that breaks the dogfood story.

4. [P2] Telemetry: emit a 'profile_misconfigured' event when N consecutive BashExec calls fail with delegated_backend_not_wired:true. Even one such failure in a session should be enough, but requiring N in a row avoids false positives on intentional dispatch tests.
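Fixes 2 and 4 above could look roughly like the Python sketch below. It is illustrative only: motoko_agent's real code is not shown in this report, so `check_profile`, `ProfileMisconfigured`, `NotWiredCounter`, the `delegated_wired` flag, and the default threshold of 2 are all hypothetical. Only the config keys, the envelope fields, and the error text come from the report.

```python
class ProfileMisconfigured(RuntimeError):
    """Raised at startup when the profile selects a backend this build lacks."""


def check_profile(config: dict, delegated_wired: bool) -> None:
    """Fix 2 sketch: refuse to start if ohmy_pi is on but Delegated is not wired."""
    tools = config.get("tools", {})
    if tools.get("ohmy_pi") and not delegated_wired:
        raise ProfileMisconfigured(
            "profile sets tools.ohmy_pi=true but inbox-based dispatch is not "
            "wired in this build. Set tools.ohmy_pi=false."
        )


class NotWiredCounter:
    """Fix 4 sketch: track consecutive not-wired BashExec failures."""

    def __init__(self, threshold: int = 2):  # threshold value is an assumption
        self.threshold = threshold
        self.streak = 0

    def observe(self, envelope: dict) -> bool:
        """Return True exactly when 'profile_misconfigured' should be emitted."""
        not_wired = (
            envelope.get("tool") == "BashExec"
            and envelope.get("delegated_backend_not_wired") is True
        )
        # Any other result resets the streak, so only consecutive failures count.
        self.streak = self.streak + 1 if not_wired else 0
        return self.streak == self.threshold


# Usage: the dogfood defaults from this report trip the startup check.
dogfood = {"tools": {"ohmy_pi": True, "hybrid": True}}
try:
    check_profile(dogfood, delegated_wired=False)
except ProfileMisconfigured as exc:
    print(f"startup refused: {exc}")
```

The counter returns True only on the step where the streak first reaches the threshold, so a long thrash produces one event rather than one per failed step.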

OBSERVED ON: motoko_agent dev (commit pre v0.18, fresh checkout 2026-05-08), session JSONL session_2026-05-08T18-04-20-602Z.jsonl. Repro is deterministic — every run with default dogfood config exhibits the same pattern. Filed as bug, not feature, because the shipping default profile cannot complete a hello-world task.

Happy to send the JSONL or a 10-line test that asserts BashExec exit_code != 1 with delegated_backend_not_wired payload (would catch this as a smoke regression in CI).
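A minimal sketch of that smoke test, in Python. The result shape (`exit_code`, `truncated`, and an `output` string carrying the JSON envelope) is an assumption based on the fields described above, and `run_bash_exec` is a hypothetical callable standing in for however the harness invokes BashExec.

```python
import json


def is_not_wired_failure(result: dict) -> bool:
    """True if a BashExec result is the delegated_backend_not_wired envelope."""
    if result.get("exit_code") != 1:
        return False
    try:
        payload = json.loads(result.get("output", ""))
    except (TypeError, ValueError):
        return False  # non-JSON output: an ordinary command failure
    return (
        isinstance(payload, dict)
        and payload.get("delegated_backend_not_wired") is True
    )


def check_bash_exec_is_wired(run_bash_exec) -> None:
    """Smoke check: a trivial command must not return the not-wired envelope."""
    result = run_bash_exec("echo ok")
    assert not is_not_wired_failure(result), (
        "BashExec was routed to the unwired Delegated backend"
    )
```

Run against the default dogfood profile, a check like this would fail on the first BashExec call instead of after a 90-second thrash.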

---
Binary info (auto-attached):
ailang version: v0.17.0-46-g7cd0c743-dirty
binary md5: b3402e190334480662a76ed841e341de
binary path: /Users/mark/go/bin/ailang
git commit: 7cd0c743bd30e3c92f28e7ffa946d1cbc5c4d59a

---
_Reported by: motoko_explore via ailang messages_
