Skip to content

Fix cross-turn assistant overwrite + separate reasoning bubbles#676

Open
RBrid wants to merge 1 commit into
openclaw:masterfrom
RBrid:user/rbrid/ReasoningReset2
Open

Fix cross-turn assistant overwrite + separate reasoning bubbles#676
RBrid wants to merge 1 commit into
openclaw:masterfrom
RBrid:user/rbrid/ReasoningReset2

Conversation

@RBrid
Copy link
Copy Markdown
Contributor

@RBrid RBrid commented Jun 3, 2026

Three related fixes for chat timeline rendering, validated by three rounds of adversarial dual-model code review.

  1. Cross-turn assistant overwrite (the original repro) In a user -> reply1 -> tool-call -> tool-output -> reply2 flow, reply2 was silently overwriting reply1's text in place instead of appending a new bubble. Cause: the chat.message final for reply2 carried reconcilePrevious=true, and UpsertAssistant's fast-path merged into the most recent assistant entry unconditionally - including a previously finalised one.

    Fix in ChatTimelineReducer.UpsertAssistant: narrow reconcile to only merge into entries that are still IsStreaming=true. A finalised assistant entry (IsStreaming=false) belongs to a completed turn and must not be overwritten. The byte-equal duplicate safety net remains unconditional so genuine duplicate emissions still collapse.

    Defense-in-depth: extract ClearStreamingAtTurnBoundary helper and call from both AddLocalUser (production typed-input path) and ApplyUserMessage (gateway-injected path), hoisted above the nonce early-return. This handles the dropped-ChatTurnEndEvent scenario where a stale ActiveAssistantId could otherwise leak across turns.

  2. Concatenated thinking prose (Option B) The gateway emits a stream:"item", kind:"reasoning", phase:"end" bracket marker between distinct thinking passes within a single turn. Map it to a new ChatReasoningEndEvent that nulls ActiveReasoningId so the next reasoning chunk starts a fresh bubble instead of appending to / replacing the previous one. Also tag delta-state ChatMessageEvents with IsStreaming=true so the reducer's new reconcile guard distinguishes them from finals.

  3. Trace logging plumbing Add IOpenClawLogger.Trace verbose channel with a default no-op so existing implementers (tests, console logger, setup logger) don't need updating. Logger.Trace in the tray is gated by OPENCLAW_TRAY_TRACE=1|true. Forward Trace through DiagnosticTeeLogger and AppLogger; SetupOpenClawLogger inherits the default no-op (no opt-in gate available in setup engine). OpenClawGatewayClient now surfaces item-event kind+phase metadata (no payload content) at trace level for shape diagnostics.

Test coverage added in ChatTimelineReducerTests:

  • StaleStreamingPreview_DoesNotMergeAcrossUserBoundary
  • UserMessage_AsTurnBoundary_PreventsCrossTurnOverwrite
  • AddLocalUser_AsTurnBoundary_PreventsCrossTurnOverwrite
  • AddLocalUser_ClearsStaleStreamingPreviewAcrossTurns

Validation:

  • ./build.ps1: succeeds
  • OpenClaw.Tray.Tests: 944 passed
  • OpenClaw.Shared.Tests: 2045 passed / 29 skipped

Three related fixes for chat timeline rendering, validated by three rounds
of adversarial dual-model code review.

1. Cross-turn assistant overwrite (the original repro)
   In a user -> reply1 -> tool-call -> tool-output -> reply2 flow, reply2
   was silently overwriting reply1's text in place instead of appending a
   new bubble. Cause: the chat.message final for reply2 carried
   reconcilePrevious=true, and UpsertAssistant's fast-path merged into the
   most recent assistant entry unconditionally - including a previously
   finalised one.

   Fix in ChatTimelineReducer.UpsertAssistant: narrow reconcile to only
   merge into entries that are still IsStreaming=true. A finalised
   assistant entry (IsStreaming=false) belongs to a completed turn and
   must not be overwritten. The byte-equal duplicate safety net remains
   unconditional so genuine duplicate emissions still collapse.

   Defense-in-depth: extract ClearStreamingAtTurnBoundary helper and
   call from both AddLocalUser (production typed-input path) and
   ApplyUserMessage (gateway-injected path), hoisted above the nonce
   early-return. This handles the dropped-ChatTurnEndEvent scenario
   where a stale ActiveAssistantId could otherwise leak across turns.

2. Concatenated thinking prose (Option B)
   The gateway emits a stream:"item", kind:"reasoning", phase:"end"
   bracket marker between distinct thinking passes within a single
   turn. Map it to a new ChatReasoningEndEvent that nulls
   ActiveReasoningId so the next reasoning chunk starts a fresh bubble
   instead of appending to / replacing the previous one. Also tag
   delta-state ChatMessageEvents with IsStreaming=true so the reducer's
   new reconcile guard distinguishes them from finals.

3. Trace logging plumbing
   Add IOpenClawLogger.Trace verbose channel with a default no-op so
   existing implementers (tests, console logger, setup logger) don't
   need updating. Logger.Trace in the tray is gated by
   OPENCLAW_TRAY_TRACE=1|true. Forward Trace through DiagnosticTeeLogger
   and AppLogger; SetupOpenClawLogger inherits the default no-op (no
   opt-in gate available in setup engine). OpenClawGatewayClient now
   surfaces item-event kind+phase metadata (no payload content) at
   trace level for shape diagnostics.

Test coverage added in ChatTimelineReducerTests:
- StaleStreamingPreview_DoesNotMergeAcrossUserBoundary
- UserMessage_AsTurnBoundary_PreventsCrossTurnOverwrite
- AddLocalUser_AsTurnBoundary_PreventsCrossTurnOverwrite
- AddLocalUser_ClearsStaleStreamingPreviewAcrossTurns

Validation:
- ./build.ps1: succeeds
- OpenClaw.Tray.Tests: 944 passed
- OpenClaw.Shared.Tests: 2045 passed / 29 skipped

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@clawsweeper
Copy link
Copy Markdown

clawsweeper Bot commented Jun 3, 2026

Codex review: needs real behavior proof before merge. Reviewed June 3, 2026, 4:26 PM ET / 20:26 UTC.

Summary
The PR changes chat timeline reducer/provider behavior to stop cross-turn assistant bubble overwrites, split reasoning passes on item-end markers, add gated trace logging, and add regression coverage.

Reproducibility: yes. Source inspection shows current master applies ReconcilePrevious from live assistant chat.message frames and merges into the latest assistant entry without checking IsStreaming, which matches the overwrite path described by the PR.

Review metrics: 2 noteworthy metrics.

  • Diff surface: 11 files changed, +370/-6. The patch touches shared chat models, reducer behavior, tray provider mapping, logging, and tests, so live behavior proof matters more than a small isolated unit change.
  • Regression tests added: 10 test cases added. The new tests cover reducer and provider-level cases for overwrite prevention and reasoning bubble boundaries.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🐚 platinum hermit
Result: blocked until real behavior proof is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P1] Add redacted screenshots, a recording, terminal/live output, or logs showing a real tray/gateway flow where the second assistant reply does not overwrite the first and reasoning passes appear as separate bubbles.
  • After adding proof, update the PR body so ClawSweeper can re-review automatically; if it does not, ask a maintainer to comment @clawsweeper re-review.

Proof guidance:

  • [P1] Needs real behavior proof before merge: No after-fix real tray/gateway chat proof is attached; tests and reported validation are supplemental but not sufficient for this external PR. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Mantis proof suggestion
A visible desktop proof would materially help verify the tray timeline rendering path that unit tests cannot fully demonstrate. A maintainer can ask Mantis to capture proof by posting a new PR comment that starts with the OpenClaw Mantis account mention, followed by:

visual task: verify a tray chat flow where a second assistant reply after a tool event appears as a new bubble and separate reasoning passes render as separate bubbles.

Risk before merge

  • [P1] No after-fix real tray/gateway proof shows the second assistant reply rendering as a new bubble or reasoning passes rendering as separate bubbles.
  • [P1] The diff changes live chat timeline reconciliation, so unit tests reduce risk but do not fully prove real gateway event ordering and tray rendering behavior.
  • [P1] This read-only review did not run AGENTS.md build/test validation; the PR body reports success, but CI or maintainer validation still needs to confirm it.

Maintainer options:

  1. Require live chat rendering proof (recommended)
    Ask for redacted proof from a real tray/gateway chat flow showing the overwrite fix and separate reasoning bubbles before merge.
  2. Accept tests-only evidence deliberately
    Maintainers could choose to merge on the regression tests and reported validation alone, but that accepts the unproven live gateway ordering risk.

Next step before merge

  • [P1] The remaining blocker is contributor-provided real behavior proof plus normal CI/maintainer validation, not a narrow automated repair.

Security
Cleared: The diff adds gated trace logging for event shape metadata only and does not introduce a concrete security or supply-chain concern.

Review details

Best possible solution:

Merge only after redacted live tray/gateway proof and required validation confirm finalized assistant bubbles are preserved and reasoning passes split correctly in the real UI.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection shows current master applies ReconcilePrevious from live assistant chat.message frames and merges into the latest assistant entry without checking IsStreaming, which matches the overwrite path described by the PR.

Is this the best way to solve the issue?

Yes for the code direction: limiting reconcile merges to still-streaming assistant entries and closing reasoning bubbles on item end is a narrow reducer/provider fix. Merge should still wait for live behavior proof because the changed path depends on real gateway event ordering.

AGENTS.md: found, but no applicable review policy affected this item.

Codex review notes: model gpt-5.5, reasoning high; reviewed against be64bea0bec8.

Label changes

Label changes:

  • add merge-risk: 🚨 message-delivery: Merging changes the reducer/provider path that decides whether chat messages appear as new bubbles or overwrite existing timeline entries.

Label justifications:

  • P1: The PR targets a broken chat timeline workflow where assistant output can be silently overwritten or reasoning output can be visually conflated.
  • merge-risk: 🚨 message-delivery: Merging changes the reducer/provider path that decides whether chat messages appear as new bubbles or overwrite existing timeline entries.
  • rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🐚 platinum hermit.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: No after-fix real tray/gateway chat proof is attached; tests and reported validation are supplemental but not sufficient for this external PR. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
Evidence reviewed

What I checked:

  • Current master still has unconditional reconcile merging: On current master, ChatMessageEvent is always applied as non-streaming while passing ReconcilePrevious into UpsertAssistant, and UpsertAssistant merges into the most recent assistant entry whenever reconcilePrevious is true without checking whether that entry is still streaming. (src/OpenClaw.Chat/ChatTimelineReducer.cs:16, be64bea0bec8)
  • Provider sends reconcilePrevious for assistant chat.message frames: Current master maps assistant chat.message frames to ChatMessageEvent(..., ReconcilePrevious: true), so the reducer path is reachable from the live tray provider. (src/OpenClaw.Tray.WinUI/Chat/OpenClawChatDataProvider.cs:1316, be64bea0bec8)
  • PR diff narrows the reducer behavior and adds coverage: The PR diff adds ChatReasoningEndEvent, tags streaming chat.message frames with IsStreaming, limits reconcilePrevious merges to streaming assistant entries, and adds regression tests for cross-turn overwrite and separate reasoning bubbles. (src/OpenClaw.Chat/ChatTimelineReducer.cs:344, e1cbba31dc0f)
  • Real behavior proof remains absent: The PR body reports ./build.ps1 and unit test results, but it does not attach a redacted screenshot, recording, live output, terminal output, or logs showing the fixed tray/gateway chat rendering after the patch. (e1cbba31dc0f)
  • Feature history points to recent chat timeline work: git log and blame show the affected reducer/provider code on current master came from commit 7d9152f, the v0.6.0 chat approval/timeline hardening change. (src/OpenClaw.Tray.WinUI/Chat/OpenClawChatDataProvider.cs:1316, 7d9152f427a3)

Likely related people:

  • RBrid: Blame and file history show the current reducer/provider behavior came from the recent chat approval/timeline hardening commit, and this PR is a follow-up in the same code path. (role: introduced current behavior and recent area contributor; confidence: high; commits: 7d9152f427a3; files: src/OpenClaw.Chat/ChatTimelineReducer.cs, src/OpenClaw.Tray.WinUI/Chat/OpenClawChatDataProvider.cs, tests/OpenClaw.Tray.Tests/ChatTimelineReducerTests.cs)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P1 Urgent regression or broken agent/channel workflow affecting real users now. labels Jun 3, 2026
@RBrid RBrid marked this pull request as ready for review June 3, 2026 20:21
@clawsweeper clawsweeper Bot added the merge-risk: 🚨 message-delivery 🚨 Merging this PR could drop, duplicate, misroute, suppress, or wrongly target messages. label Jun 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-risk: 🚨 message-delivery 🚨 Merging this PR could drop, duplicate, misroute, suppress, or wrongly target messages. P1 Urgent regression or broken agent/channel workflow affecting real users now. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant