Description
Problem
I've observed multiple times now the agent tool producing no output, wasting $ and forcing the parent LLM to redo work and use up precious context.
Here's an example:
Solution
I had two three models do independent investigations into this question (GLM-5.2, Opus 4.8, and GPT-5.5). GPT-5.5 and Opus found the same issue, and GLM found another that Opus confirmed.
GLM-5.2 thought the bug had to do with the error from a non-critical cost-bookkeeping step being propagated as a Go error from the tool, which the fantasy framework treats as a critical failure and so discards the sub-agent's already-produced output — specifically this code in coordinator.runSubAgent (internal/agent/coordinator.go:1271-1274):
// Update parent session cost
if err := c.updateParentSessionCost(ctx, session.ID, params.SessionID); err != nil {
return fantasy.ToolResponse{}, err
}
Opus 4.8 thought it had to do with result.Response.Content.Text() only reading the last step's first text block, so a sub-agent that stops on a StopWhen condition right after a tool-call step returns empty text — specifically this code in coordinator.runSubAgent (internal/agent/coordinator.go:1276):
return fantasy.NewTextResponse(result.Response.Content.Text()), nil
I asked Opus 4.8 to give me its thoughts on GLM-5.2's theory, and it said:
GLM-5.2's theory is plausible and is a real, valid bug. The sub-agent has already run successfully and produced result, then a pure-bookkeeping step (updateParentSessionCost, which gets the child session, gets the parent session, and saves the parent) returns its error and throws the output away. This path is genuinely reachable: parent-session-not-found, context cancellation during the DB call, or SQLite lock/WAL contention — and the repo itself notes WAL/header desync "under concurrent sub-agents" (internal/db/connect.go:140). Because it fires after a fully successful sub-agent run on a common failure mode, it matches the reported symptom ("the agent produced output but nothing came back") more directly than my theory does, and the log-and-continue fix is correct and necessary.
The one caveat is that GLM-5.2's description of the downstream mechanism is partly inaccurate: it describes the fantasy Generate/executeTools path returning (nil, error) and the parent loop "swallowing" the error by returning (agentResult, nil). Crush actually runs through Stream, where the tool error propagates via toolExecutionErr and causes the parent sessionAgent.Run to return (nil, err) and error out the whole turn — the user-visible effect is the same (no sub-agent output reaches the caller), but the error isn't silently swallowed the way the writeup states. My own theory (last-step/first-text-block extraction) is a separate, narrower edge case that only triggers on abnormal termination via a StopWhen stop condition, so the two are independent, complementary bugs rather than competing explanations.
As both are possible bugs that could affect the Agent tool, I'll send in a PR fixing both.
EDIT: I also had GPT-5.5 do its own independent investigation and it came to the same conclusion as Opus, so that's a good sign that a legitimate bug and possible real fix to this problem has been found.
Version
commit 4abf206
Environment
Fedora Linux
Description
Problem
I've observed multiple times now the
agenttool producing no output, wasting $ and forcing the parent LLM to redo work and use up precious context.Here's an example:
Solution
I had
twothree models do independent investigations into this question (GLM-5.2, Opus 4.8, and GPT-5.5). GPT-5.5 and Opus found the same issue, and GLM found another that Opus confirmed.GLM-5.2 thought the bug had to do with the error from a non-critical cost-bookkeeping step being propagated as a Go error from the tool, which the fantasy framework treats as a critical failure and so discards the sub-agent's already-produced output — specifically this code in
coordinator.runSubAgent(internal/agent/coordinator.go:1271-1274):Opus 4.8 thought it had to do with
result.Response.Content.Text()only reading the last step's first text block, so a sub-agent that stops on aStopWhencondition right after a tool-call step returns empty text — specifically this code incoordinator.runSubAgent(internal/agent/coordinator.go:1276):I asked Opus 4.8 to give me its thoughts on GLM-5.2's theory, and it said:
As both are possible bugs that could affect the Agent tool, I'll send in a PR fixing both.
EDIT: I also had GPT-5.5 do its own independent investigation and it came to the same conclusion as Opus, so that's a good sign that a legitimate bug and possible real fix to this problem has been found.
Version
commit 4abf206
Environment
Fedora Linux