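diff --git a/docs/CORRUPTED_THOUGHT_SIGNATURE_ANALYSIS.md b/docs/CORRUPTED_THOUGHT_SIGNATURE_ANALYSIS.md
new file mode 100644
index 0000000000..a55164e9a7
--- /dev/null
+++ b/docs/CORRUPTED_THOUGHT_SIGNATURE_ANALYSIS.md
@@ -0,0 +1,100 @@
+# Crush + Gemini 3 Pro `Corrupted thought signature` Analysis Report
+
+- Version: crush v0.77.0 (local source tree `../crush`)
+- Dependencies: `charm.land/fantasy v0.31.1`, built on `google.golang.org/genai v1.60.0`
+- Symptom: with Gemini 3 Pro, after running for a while, requests frequently fail with `bad request: Corrupted thought signature`; the failure keeps recurring within the same session and persists across restarts (logs `1.txt`/`2.txt`, the same `session_id e1d37cb9` fails repeatedly from 10:16 onward).
+
+---
+
+## 1. Conclusion (TL;DR)
+
+`Corrupted thought signature` is an error returned by the Gemini API. When Gemini 3 / 2.5 thinking models return a function call, they attach an encrypted `thoughtSignature` to the corresponding part, and **subsequent requests must send every signature back verbatim, matched one-to-one with its part**.
+
+crush's message model stores **only one `ReasoningContent`** per assistant message, so it cannot express "multiple signatures, each belonging to a different part". When Gemini 3 Pro issues **multiple/parallel function calls**:
+
+1. crush **concatenates the signatures into a single string** and sends that back → the signature is corrupted;
+2. the corrupted signature is **persisted into the SQLite session history**, and every later request replays the polluted history → the error keeps recurring and survives restarts.
+
+---
+
+## 2. The contract on the fantasy side (key constraints)
+
+File: `charm.land/fantasy@v0.31.1/providers/google/google.go`
+
+### 2.1 Parsing the response (response → fantasy events)
+- `Stream` (lines 728–826) and `mapResponse` (lines 1362–1404): **each function call's signature is emitted as its own `OnReasoningEnd` event**, carrying `ReasoningMetadata{Signature, ToolID: <that call's id>}`.
+- In other words, **one assistant turn fires `OnReasoningEnd` multiple times, once per tool-call signature**; the signature of a text-only turn has `ToolID == ""`.
+
+### 2.2 Replaying the request (fantasy messages → genai request)
+- `toGooglePrompt` (lines 414–477) walks the assistant parts in `Content` order:
+  - on a `ReasoningPart` carrying google metadata, it stashes the signature in `currentReasoningMetadata` (and emits no genai part for it);
+  - on the **very next text / toolCall part**, it sets `Part.ThoughtSignature` and then clears the stash.
+- **Conclusion: the signature-to-part mapping depends entirely on order.** Each signature must live in its own `ReasoningPart`, placed directly before the part it belongs to. Reasoning parts without google metadata are skipped outright (lines 425–428), so they cannot be attached to the wrong part.
+
+---
+
+## 3. Defects on the crush side (the actual bug)
+
+The getter `ReasoningContent()` in `internal/message/content.go` (line 153) returns only the **first** reasoning part, and every `Append*` method operates on that single reasoning part: at the model level there is only one `ReasoningContent`.
+
+### Defect 1: multiple signatures are concatenated into one (the core trigger)
+- `internal/agent/agent.go:884-888` calls `AppendThoughtSignature` on every `OnReasoningEnd`.
+- `AppendThoughtSignature` at `content.go:270-285` evaluates `c.ThoughtSignature + signature`, joining N distinct base64 signatures end to end into one string, and keeps only the last `ToolID`.
+- On replay, `ToAIMessage` (`content.go:520-525`) sends that concatenated string back as a **single** signature → `Corrupted thought signature`.
+
+### Defect 2: tool call parts do not carry their own signatures
+- The `ToolCall` struct (`content.go:101-107`) has no signature field.
+- `ToAIMessage` (`content.go:528-535`) rebuilds tool calls without any google metadata, so each function call's individual signature has nowhere to go.
+
+### Defect 3: signatures are dropped by the `if reasoning.Thinking != ""` gate
+- `ToAIMessage:510` emits the reasoning part (and the signature with it) only when the thinking text is non-empty.
+- Gemini tool turns often have a non-empty signature but empty thinking text, in which case the signature is not sent back at all.
+
+### Defect 4: mutators silently clear the signature
+These methods rebuild the struct field by field without copying the signature fields, which wipes an already stored signature in the middle of a turn:
+- `FinishThinking` (`content.go:316`): does not copy `ThoughtSignature`/`ToolID`/`ResponsesData`.
+- `AppendReasoningContent` (`content.go:249`): does not copy `ThoughtSignature`/`ToolID`/`ResponsesData` (it does copy `Signature`).
+- `SetReasoningResponsesData` (`content.go:302`): does not copy `ThoughtSignature`/`ToolID`/`Signature`.
+- If a signature field is added to `ToolCall`, `FinishToolCall` (346) and `AppendToolCallInput` (362) would wipe it the same way and must be fixed together.
+
+### Defect 5: part order is wrong
+`ToAIMessage` reorders parts as `text → reasoning → all tool calls`, which violates fantasy's requirement that each signature sit directly before its part.
+
+---
+
+## 4. Why it "always shows up after a while and then persists"
+
+- Early, simple turns (text only / a single tool call) degenerate to a single signature under concatenation, so most of them slip through;
+- once Gemini 3 Pro issues **multiple/parallel function calls**, the signatures are concatenated → corrupted;
+- the key point: **the corrupted concatenated signature is persisted into the SQLite session history**, and every later request in that session replays the polluted history → the error keeps recurring, even across restarts (which matches the repeated failures of the same `session_id` in the logs).
+
+> Ruled out: `prepared.Messages[i].ProviderOptions = nil` at `agent.go:800` clears the **message-level** `Message.ProviderOptions` (used for cache-control), which is a different field from the **part-level** `ReasoningPart.ProviderOptions[google]` where the signature lives; verified not to affect signatures.
+
+---
+
+## 5. Fix
+
+Core idea: let the crush model store signatures **per tool call**, and have `ToAIMessage` replay them in the order fantasy requires (one dedicated `ReasoningPart` per signature, directly before its part).
+
+### Change 1: `internal/message/content.go`
+- Add the field `ThoughtSignature string` `json:"thought_signature,omitempty"` to `ToolCall`.
+- `FinishToolCall` (346), `AppendToolCallInput` (362), and `AddToolCall` preserve `ThoughtSignature` when rebuilding.
+- Fix `FinishThinking` (316), `AppendReasoningContent` (249), and `SetReasoningResponsesData` (302): copy `ThoughtSignature`/`ToolID`/`ResponsesData`/`Signature` when rebuilding instead of clearing them.
+- Add `SetToolCallThoughtSignature(toolCallID, signature string)`.
+- **Rewrite the Assistant branch of `ToAIMessage`**:
+  - thinking/text signature (`ToolID == ""`): emit one `ReasoningPart`, writing `ProviderOptions[google]` **only when `ThoughtSignature != ""`**, then emit the text part;
+  - each tool call: if its `ThoughtSignature != ""`, **first emit a `ReasoningPart` containing only that signature** (`ReasoningMetadata{Signature, ToolID: call.ID}`), immediately followed by that `ToolCallPart`.
+
+### Change 2: `internal/agent/agent.go` (inside the Stream closure)
+- Add `pendingThoughtSigs := map[string]string{}`.
+- `OnReasoningEnd` (877), when handling google metadata: if `ToolID != ""`, store `pendingThoughtSigs[ToolID] = Signature` (no more concatenation); only if `ToolID == ""` call `AppendThoughtSignature(sig, "")`.
+- `OnToolCall` (923) (final state): when creating the `message.ToolCall`, set `ThoughtSignature = pendingThoughtSigs[tc.ToolCallID]`.
+
+### Change 3: tests
+Add `ToAIMessage` unit tests in `internal/message`: build an assistant message with "thinking + 2 parallel tool calls, each with a different signature" and assert that the output order is `reasoning(sig_text)?, text, reasoning(sig1)+toolcall1, reasoning(sig2)+toolcall2`, with each `ReasoningPart` holding exactly one signature and the correct `ToolID`.
+
+### Note on already corrupted sessions
+This fix only prevents **new turns** from polluting the history; concatenated signatures already written into old session history cannot be recovered, so for affected sessions the user must **start a new session**.
+
+### Verification
+`go build ./...` + `go test ./internal/message/... ./internal/agent/...`; then run a long session with many parallel tool calls against Gemini 3 Pro as a regression check to confirm the error no longer appears.
diff --git a/internal/agent/agent.go b/internal/agent/agent.go
index f4972b181a..cced02dd91 100644
--- a/internal/agent/agent.go
+++ b/internal/agent/agent.go
@@ -723,6 +723,14 @@ func (a *sessionAgent) Run(ctx context.Context, call SessionAgentCall) (result *
 	// message of the turn is the value reachable through this
 	// pointer when the defer runs.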
 	var currentAssistant *message.Message
+	// pendingThoughtSigs buffers Google Gemini per-tool-call thought
+	// signatures keyed by tool call ID. The provider emits a tool call's
+	// signature (via OnReasoningEnd with a ToolID) BEFORE the tool call
+	// itself arrives, so we stash it here and attach it once OnToolCall
+	// creates the tool call. Each signature must be replayed verbatim on its
+	// own tool call or Gemini rejects the request with "Corrupted thought
+	// signature".
+	pendingThoughtSigs := make(map[string]string)
 	// Drain any debounced message updates before returning. message.Service
 	// already flushes synchronously on terminal updates, but a defer here
 	// guarantees the contract at every Run exit (success, error, panic
@@ -864,6 +872,7 @@ func (a *sessionAgent) Run(ctx context.Context, call SessionAgentCall) (result *
 			callContext = context.WithValue(callContext, tools.SupportsImagesContextKey, largeModel.CatwalkCfg.SupportsImages)
 			callContext = context.WithValue(callContext, tools.ModelNameContextKey, largeModel.CatwalkCfg.Name)
 			currentAssistant = &assistantMsg
+			clear(pendingThoughtSigs)
 			return callContext, prepared, err
 		},
 		OnReasoningStart: func(id string, reasoning fantasy.ReasoningContent) error {
@@ -883,7 +892,16 @@ func (a *sessionAgent) Run(ctx context.Context, call SessionAgentCall) (result *
 			}
 			if googleData, ok := reasoning.ProviderMetadata[google.Name]; ok {
 				if reasoning, ok := googleData.(*google.ReasoningMetadata); ok {
-					currentAssistant.AppendThoughtSignature(reasoning.Signature, reasoning.ToolID)
+					// A signature bound to a tool call (ToolID set) must travel
+					// with that specific tool call, not be concatenated onto the
+					// shared reasoning block. Buffer it until OnToolCall creates
+					// the tool call. Signatures without a ToolID belong to the
+					// final text answer and stay on the reasoning content.
+					if reasoning.ToolID != "" {
+						pendingThoughtSigs[reasoning.ToolID] = reasoning.Signature
+					} else {
+						currentAssistant.AppendThoughtSignature(reasoning.Signature, reasoning.ToolID)
+					}
 				}
 			}
 			if openaiData, ok := reasoning.ProviderMetadata[openai.Name]; ok {
@@ -927,6 +945,9 @@ func (a *sessionAgent) Run(ctx context.Context, call SessionAgentCall) (result *
 				Input:            tc.Input,
 				ProviderExecuted: false,
 				Finished:         true,
+				// Attach the buffered Google thought signature (if any) so it
+				// is persisted and replayed verbatim with this tool call.
+				ThoughtSignature: pendingThoughtSigs[tc.ToolCallID],
 			}
 			currentAssistant.AddToolCall(toolCall)
 			// Use parent ctx instead of genCtx to ensure the update succeeds
diff --git a/internal/message/content.go b/internal/message/content.go
index c62cdf5161..485aa36602 100644
--- a/internal/message/content.go
+++ b/internal/message/content.go
@@ -104,6 +104,11 @@ type ToolCall struct {
 	Input            string `json:"input"`
 	ProviderExecuted bool   `json:"provider_executed"`
 	Finished         bool   `json:"finished"`
+	// ThoughtSignature is the per-tool-call thought signature returned by
+	// Google Gemini thinking models. It must be replayed verbatim, attached
+	// to this specific tool call, or Gemini rejects the request with
+	// "Corrupted thought signature".
+	ThoughtSignature string `json:"thought_signature,omitempty"`
 }
 
 func (ToolCall) isPart() {}
@@ -251,10 +256,13 @@ func (m *Message) AppendReasoningContent(delta string) {
 	for i, part := range m.Parts {
 		if c, ok := part.(ReasoningContent); ok {
 			m.Parts[i] = ReasoningContent{
-				Thinking:   c.Thinking + delta,
-				Signature:  c.Signature,
-				StartedAt:  c.StartedAt,
-				FinishedAt: c.FinishedAt,
+				Thinking:         c.Thinking + delta,
+				Signature:        c.Signature,
+				ThoughtSignature: c.ThoughtSignature,
+				ToolID:           c.ToolID,
+				ResponsesData:    c.ResponsesData,
+				StartedAt:        c.StartedAt,
+				FinishedAt:       c.FinishedAt,
 			}
 			found = true
 		}
@@ -303,10 +311,13 @@ func (m *Message) SetReasoningResponsesData(data *openai.ResponsesReasoningMetad
 	for i, part := range m.Parts {
 		if c, ok := part.(ReasoningContent); ok {
 			m.Parts[i] = ReasoningContent{
-				Thinking:      c.Thinking,
-				ResponsesData: data,
-				StartedAt:     c.StartedAt,
-				FinishedAt:    c.FinishedAt,
+				Thinking:         c.Thinking,
+				Signature:        c.Signature,
+				ThoughtSignature: c.ThoughtSignature,
+				ToolID:           c.ToolID,
+				ResponsesData:    data,
+				StartedAt:        c.StartedAt,
+				FinishedAt:       c.FinishedAt,
 			}
 			return
 		}
@@ -318,10 +329,13 @@ func (m *Message) FinishThinking() {
 		if c, ok := part.(ReasoningContent); ok {
 			if c.FinishedAt == 0 {
 				m.Parts[i] = ReasoningContent{
-					Thinking:   c.Thinking,
-					Signature:  c.Signature,
-					StartedAt:  c.StartedAt,
-					FinishedAt: time.Now().Unix(),
+					Thinking:         c.Thinking,
+					Signature:        c.Signature,
+					ThoughtSignature: c.ThoughtSignature,
+					ToolID:           c.ToolID,
+					ResponsesData:    c.ResponsesData,
+					StartedAt:        c.StartedAt,
+					FinishedAt:       time.Now().Unix(),
 				}
 			}
 			return
@@ -348,10 +362,12 @@ func (m *Message) FinishToolCall(toolCallID string) {
 		if c, ok := part.(ToolCall); ok {
 			if c.ID == toolCallID {
 				m.Parts[i] = ToolCall{
-					ID:       c.ID,
-					Name:     c.Name,
-					Input:    c.Input,
-					Finished: true,
+					ID:               c.ID,
+					Name:             c.Name,
+					Input:            c.Input,
+					ProviderExecuted: c.ProviderExecuted,
+					Finished:         true,
+					ThoughtSignature: c.ThoughtSignature,
 				}
 				return
 			}
@@ -364,10 +380,12 @@ func (m *Message) AppendToolCallInput(toolCallID string, inputDelta string) {
 		if c, ok := part.(ToolCall); ok {
 			if c.ID == toolCallID {
 				m.Parts[i] = ToolCall{
-					ID:       c.ID,
-					Name:     c.Name,
-					Input:    c.Input + inputDelta,
-					Finished: c.Finished,
+					ID:               c.ID,
+					Name:             c.Name,
+					Input:            c.Input + inputDelta,
+					ProviderExecuted: c.ProviderExecuted,
+					Finished:         c.Finished,
+					ThoughtSignature: c.ThoughtSignature,
 				}
 				return
 			}
@@ -387,6 +405,21 @@ func (m *Message) AddToolCall(tc ToolCall) {
 	m.Parts = append(m.Parts, tc)
 }
 
+// SetToolCallThoughtSignature attaches a Google thought signature to the tool
+// call with the given ID, preserving its other fields. No-op if not found.
+func (m *Message) SetToolCallThoughtSignature(toolCallID, signature string) {
+	if signature == "" {
+		return
+	}
+	for i, part := range m.Parts {
+		if c, ok := part.(ToolCall); ok && c.ID == toolCallID {
+			c.ThoughtSignature = signature
+			m.Parts[i] = c
+			return
+		}
+	}
+}
+
 func (m *Message) SetToolCalls(tc []ToolCall) {
 	// remove any existing tool call part it could have multiple
 	parts := make([]ContentPart, 0)
@@ -503,11 +536,17 @@ func (m *Message) ToAIMessage() []fantasy.Message {
 	case Assistant:
 		var parts []fantasy.MessagePart
 		text := strings.TrimSpace(m.Content().Text)
-		if text != "" {
-			parts = append(parts, fantasy.TextPart{Text: text})
-		}
 		reasoning := m.ReasoningContent()
-		if reasoning.Thinking != "" {
+
+		// Emit the reasoning block (if any) BEFORE the text part. The Google
+		// provider replays a thought signature by attaching it to the part
+		// immediately following its ReasoningPart, so the order matters. We
+		// only carry the Google signature here when it is NOT bound to a
+		// specific tool call (ToolID == ""), i.e. the signature of the final
+		// text answer; per-tool-call signatures are emitted next to their
+		// tool call below.
+		hasGoogleTextSig := reasoning.ThoughtSignature != "" && reasoning.ToolID == ""
+		if reasoning.Thinking != "" || hasGoogleTextSig {
 			reasoningPart := fantasy.ReasoningPart{Text: reasoning.Thinking, ProviderOptions: fantasy.ProviderOptions{}}
 			if reasoning.Signature != "" {
 				reasoningPart.ProviderOptions[anthropic.Name] = &anthropic.ReasoningOptionMetadata{
@@ -517,7 +556,7 @@ func (m *Message) ToAIMessage() []fantasy.Message {
 			if reasoning.ResponsesData != nil {
 				reasoningPart.ProviderOptions[openai.Name] = reasoning.ResponsesData
 			}
-			if reasoning.ThoughtSignature != "" {
+			if hasGoogleTextSig {
 				reasoningPart.ProviderOptions[google.Name] = &google.ReasoningMetadata{
 					Signature: reasoning.ThoughtSignature,
 					ToolID:    reasoning.ToolID,
@@ -525,7 +564,23 @@ func (m *Message) ToAIMessage() []fantasy.Message {
 			}
 			parts = append(parts, reasoningPart)
 		}
+		if text != "" {
+			parts = append(parts, fantasy.TextPart{Text: text})
+		}
 		for _, call := range m.ToolCalls() {
+			// Replay the per-tool-call thought signature in its own
+			// ReasoningPart placed immediately before the tool call, so the
+			// Google provider attaches it to exactly this function call.
+			if call.ThoughtSignature != "" {
+				parts = append(parts, fantasy.ReasoningPart{
+					ProviderOptions: fantasy.ProviderOptions{
+						google.Name: &google.ReasoningMetadata{
+							Signature: call.ThoughtSignature,
+							ToolID:    call.ID,
+						},
+					},
+				})
+			}
 			parts = append(parts, fantasy.ToolCallPart{
 				ToolCallID:       call.ID,
 				ToolName:         call.Name,
diff --git a/internal/message/content_test.go b/internal/message/content_test.go
index 04e601012a..c62ecb4751 100644
--- a/internal/message/content_test.go
+++ b/internal/message/content_test.go
@@ -7,6 +7,7 @@ import (
 	"testing"
 
 	"charm.land/fantasy"
+	"charm.land/fantasy/providers/google"
 	"github.com/stretchr/testify/require"
 )
 
@@ -116,6 +117,86 @@ func TestToAIMessage_ASCIIButInvalidBase64(t *testing.T) {
 	require.Equal(t, mediaLoadFailedPlaceholder, textContent.Text)
 }
 
+// TestToAIMessage_GoogleThoughtSignaturesPerToolCall verifies that each tool
+// call's Google thought signature is replayed in its own ReasoningPart placed
+// immediately before that tool call, never concatenated. Concatenation or
+// misplacement is what triggers Gemini's "Corrupted thought signature" error.
+func TestToAIMessage_GoogleThoughtSignaturesPerToolCall(t *testing.T) {
+	t.Parallel()
+
+	msg := &Message{
+		Role: Assistant,
+		Parts: []ContentPart{
+			ReasoningContent{Thinking: "let me think", FinishedAt: 1},
+			ToolCall{ID: "call_1", Name: "view", Input: "{}", Finished: true, ThoughtSignature: "SIG1"},
+			ToolCall{ID: "call_2", Name: "ls", Input: "{}", Finished: true, ThoughtSignature: "SIG2"},
+		},
+	}
+
+	messages := msg.ToAIMessage()
+	require.Len(t, messages, 1)
+	content := messages[0].Content
+	// reasoning(thinking), reasoning(SIG1), toolcall_1, reasoning(SIG2), toolcall_2
+	require.Len(t, content, 5)
+
+	// [0] thinking reasoning, no google signature attached.
+	r0, ok := content[0].(fantasy.ReasoningPart)
+	require.True(t, ok)
+	require.Equal(t, "let me think", r0.Text)
+	require.Nil(t, r0.ProviderOptions[google.Name])
+
+	assertGoogleSig := func(i int, sig, toolID string) {
+		t.Helper()
+		rp, ok := content[i].(fantasy.ReasoningPart)
+		require.True(t, ok, "part %d must be a ReasoningPart", i)
+		meta, ok := rp.ProviderOptions[google.Name].(*google.ReasoningMetadata)
+		require.True(t, ok, "part %d must carry google ReasoningMetadata", i)
+		require.Equal(t, sig, meta.Signature)
+		require.Equal(t, toolID, meta.ToolID)
+	}
+
+	assertGoogleSig(1, "SIG1", "call_1")
+	tc1, ok := content[2].(fantasy.ToolCallPart)
+	require.True(t, ok)
+	require.Equal(t, "call_1", tc1.ToolCallID)
+
+	assertGoogleSig(3, "SIG2", "call_2")
+	tc2, ok := content[4].(fantasy.ToolCallPart)
+	require.True(t, ok)
+	require.Equal(t, "call_2", tc2.ToolCallID)
+}
+
+// TestToAIMessage_GoogleTextAnswerSignature verifies the final-answer thought
+// signature (no tool ID) is replayed on a ReasoningPart immediately before the
+// text part.
+func TestToAIMessage_GoogleTextAnswerSignature(t *testing.T) {
+	t.Parallel()
+
+	msg := &Message{
+		Role: Assistant,
+		Parts: []ContentPart{
+			ReasoningContent{ThoughtSignature: "TEXTSIG", FinishedAt: 1},
+			TextContent{Text: "final answer"},
+		},
+	}
+
+	messages := msg.ToAIMessage()
+	require.Len(t, messages, 1)
+	content := messages[0].Content
+	require.Len(t, content, 2)
+
+	rp, ok := content[0].(fantasy.ReasoningPart)
+	require.True(t, ok)
+	meta, ok := rp.ProviderOptions[google.Name].(*google.ReasoningMetadata)
+	require.True(t, ok)
+	require.Equal(t, "TEXTSIG", meta.Signature)
+	require.Empty(t, meta.ToolID)
+
+	tp, ok := content[1].(fantasy.TextPart)
+	require.True(t, ok)
+	require.Equal(t, "final answer", tp.Text)
+}
+
 func BenchmarkPromptWithTextAttachments(b *testing.B) {
 	cases := []struct {
 		name string