Long-session context growth triggers Opus tool-call decoder-flip (malformed calls leak to captain); add context-threshold session breaking

## Problem

During a long continuous supervision session, firstmate (Opus 4.8 — `claude-opus-4-8[1m]`, Claude Code 2.1.193) emitted **malformed tool calls**: the tool-call decoder flipped to legacy Anthropic XML syntax (`<invoke name="Read"><parameter name="file_path">…</parameter></invoke>`, prefixed by a stray junk token) instead of the harness's expected format. This happened twice in one session.

- **Once** the raw `<invoke>…</invoke>` text was rendered verbatim into the **captain-visible transcript**, the turn ended with no tool executed and no retry — to the captain it looked like firstmate had hung ("I'm not receiving anything back").
- **Once** the harness caught it ("Your tool call was malformed and could not be parsed. Please retry.") and recovery was clean.

## Root cause (upstream)

This is the decoder-level model/harness regression tracked in **anthropics/claude-code#49747** — "Opus 4.7 mixes legacy XML tool-use format into JSON tool calls on longer payloads." The upstream reporter confirmed it is **not fixable from the prompt layer**. Our session corroborates and broadens it:

- Persists on **Opus 4.8** (not just 4.7).
- Hits **built-in tools** (`Read`), whole call as XML — not MCP-specific.
- Correlates with **total context length** (~500k tokens accumulated), even when the tool-call arguments are trivial (a single file path). So total context — not per-call payload size — is at least one independent trigger.

## Why firstmate is unusually exposed

firstmate's core loop is long-lived by design: continuous supervision, heartbeats, and many small repeated tool calls (peek / wake-drain / status reads / watcher re-arm) that accumulate context indefinitely within a single session. That is exactly the long-context regime where the decoder-flip rate climbs, so firstmate trips this more often than a typical short session would.

## Proposed mitigation: context-threshold session breaking

firstmate already guarantees that **"a restart must be a non-event"** (AGENTS.md §5): all truth lives in tmux, `state/`, `data/backlog.md`, `data/secondmates.md`, persistent secondmate homes, and treehouse — conversation memory is only a cache. We can exploit that invariant:

- Add a **context-size guard**: when firstmate's own context crosses a threshold (token count / context-% — observed trouble near ~500k tokens), proactively break the session and let the existing bootstrap + recovery protocol rebuild state in a **fresh, smaller context**.
- **Minimum viable version:** a guard / heartbeat line that warns the captain ("context is large; recommend a recovery-safe reset") and relies on the captain (or the `/afk` daemon) to `/clear`, after which recovery reconstructs everything.
- **Stretch:** an automatic recovery-safe reset where the harness permits an agent to trigger its own `/clear`/compaction.
- **Net effect:** caps the context regime that triggers the upstream glitch, and also lowers cost/latency on marathon sessions.

## Open questions

- Threshold value, and how firstmate reads its own context size **portably across harnesses** (claude/codex/opencode/pi).
- Can an agent trigger its own recovery-safe `/clear`, or must this be captain-/daemon-initiated?
- Interaction with the `/afk` daemon (it owns the watcher while `state/.afk` exists) and with in-flight crewmates mid-session.

## Evidence

- Upstream root cause: anthropics/claude-code#49747
- This session: Opus 4.8 `claude-opus-4-8[1m]`, Claude Code 2.1.193; malformed `<invoke name="Read">` leaked to the captain twice at ~500k tokens of context.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Long-session context growth triggers Opus tool-call decoder-flip (malformed calls leak to captain); add context-threshold session breaking #95

Problem

Root cause (upstream)

Why firstmate is unusually exposed

Proposed mitigation: context-threshold session breaking

Open questions

Evidence

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Long-session context growth triggers Opus tool-call decoder-flip (malformed calls leak to captain); add context-threshold session breaking #95

Description

Problem

Root cause (upstream)

Why firstmate is unusually exposed

Proposed mitigation: context-threshold session breaking

Open questions

Evidence

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions