Fix mobile streaming throttle and stale IsProcessing state#449
PureWeen
left a comment
Review: PR #449 — Fix mobile streaming throttle and stale IsProcessing state
This PR bundles two independent mobile fixes. PR #447 (streaming throttle bypass) was already merged, and this PR re-applies that commit on top of the stale-IsProcessing fix.
✅ Fix 1: Streaming render throttle bypass
Identical to PR #447 which already passed review and merged. The _contentDirty volatile flag + isStreaming parameter to ShouldRefresh() are correct and well-tested. Tests are the same 2 tests from #447. ✅
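As an illustration of the throttle bypass described above, here is a minimal sketch. It assumes a time-based throttle with a 500ms minimum interval; `RenderThrottleSketch`, `DashboardSketch`, the `nowUtc` parameter, and `RenderCount` are hypothetical names for this example and are not taken from the PR. Only `ShouldRefresh`, `isStreaming`, `_contentDirty`, `HandleContent`, and `RefreshState` come from the review text, and their real signatures may differ.

```csharp
using System;

// Hypothetical stand-in for the throttle discussed in the review.
public sealed class RenderThrottleSketch
{
    private readonly TimeSpan _minInterval;
    private DateTime _lastRefreshUtc = DateTime.MinValue;

    public RenderThrottleSketch(TimeSpan minInterval) => _minInterval = minInterval;

    // Returns true when a render should go ahead.
    // Streaming content skips the interval check entirely.
    public bool ShouldRefresh(DateTime nowUtc, bool isStreaming = false)
    {
        if (!isStreaming && nowUtc - _lastRefreshUtc < _minInterval)
            return false;

        _lastRefreshUtc = nowUtc;
        return true;
    }
}

// Hypothetical consumer: the content handler sets the dirty flag,
// and the refresh path reads and clears it.
public sealed class DashboardSketch
{
    private volatile bool _contentDirty;
    private readonly RenderThrottleSketch _throttle =
        new RenderThrottleSketch(TimeSpan.FromMilliseconds(500));

    public int RenderCount { get; private set; }

    public void HandleContent() => _contentDirty = true;

    public void RefreshState(DateTime nowUtc)
    {
        // Non-atomic read-then-clear: a lost set only delays one render.
        bool isStreaming = _contentDirty;
        _contentDirty = false;

        if (_throttle.ShouldRefresh(nowUtc, isStreaming))
            RenderCount++;
    }
}
```

In this sketch a refresh that arrives inside the 500ms window is dropped unless new content was flagged since the last refresh.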
✅ Fix 2: Stale IsProcessing guard (_recentTurnEndSessions)
Root cause diagnosis is correct. The debounced sessions_list event from the server is captured at snapshot time, which can be before the server's CompleteResponse runs. When it arrives on mobile after the authoritative TurnEnd has already set IsProcessing=false, it overwrites it with true, causing the session to appear stuck.
Implementation is sound:
- _recentTurnEndSessions is a ConcurrentDictionary<string, DateTime>: correct for thread-safe access from the TurnEnd handler (UI thread via InvokeOnUI) and SyncRemoteSessions (bridge background thread and triggered from task continuations).
- The guard only fires when rs.IsProcessing == true AND a recent TurnEnd exists, i.e., it only blocks false→true transitions, not true→false. The server saying "done" always wins. ✅
- 5-second window is reasonable: debounce on the server is typically 500ms, so 5s gives 10× headroom. ✅
- TurnStart clears the guard via _recentTurnEndSessions.TryRemove before its own InvokeOnUI fires, so the next turn can set IsProcessing=true from sessions_list. ✅
- OnSessionComplete also sets the guard as belt-and-suspenders for lost TurnEnd messages. ✅
- MessageCount is updated unconditionally (outside the guard block): correct, message count from the server is always reliable. ✅
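The guard lifecycle described in this list could look roughly like the following sketch. It assumes the dictionary maps a session name to the TurnEnd timestamp and that the window is exactly 5 seconds; `TurnEndGuardSketch`, `OnTurnEnd`, `OnTurnStart`, and `Resolve` are hypothetical names for illustration, not the PR's actual methods.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical model of the stale-IsProcessing guard.
public sealed class TurnEndGuardSketch
{
    private static readonly TimeSpan Window = TimeSpan.FromSeconds(5);

    // Thread-safe because it is written from the UI thread
    // and read from the bridge thread.
    private readonly ConcurrentDictionary<string, DateTime> _recentTurnEndSessions =
        new ConcurrentDictionary<string, DateTime>();

    // TurnEnd (or SessionComplete) records when the session finished.
    public void OnTurnEnd(string sessionName, DateTime nowUtc) =>
        _recentTurnEndSessions[sessionName] = nowUtc;

    // TurnStart removes the entry so the next turn can be marked busy.
    public void OnTurnStart(string sessionName) =>
        _recentTurnEndSessions.TryRemove(sessionName, out _);

    // Returns the IsProcessing value to keep after a sessions_list snapshot.
    public bool Resolve(string sessionName, bool current, bool snapshot, DateTime nowUtc)
    {
        // The server saying "done" always wins.
        if (!snapshot)
            return false;

        // A recent TurnEnd means this "busy" snapshot is probably stale.
        if (_recentTurnEndSessions.TryGetValue(sessionName, out var endedAt)
            && nowUtc - endedAt < Window)
            return current;

        return true;
    }
}
```

Expired entries stay in the dictionary in this sketch; they are inert once the window has passed, which matches the accumulation concern raised later in the review.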
Test coverage is appropriate:
- SyncRemoteSessions_DoesNotResetIsProcessing_AfterTurnEnd: core scenario ✅
- SyncRemoteSessions_AllowsIsProcessingTrue_OnInitialSync: no guard without prior TurnEnd ✅
- TurnStart_ClearsGuard_AllowsSessionsListToSetProcessing: guard lifecycle ✅
- SyncRemoteSessions_AllowsSessionsListToClearProcessing: false always passes ✅
⚠️ Minor: _recentTurnEndSessions not cleared on reconnect
ReconnectAsync clears _sessions, _closedSessionIds, _closedSessionNames, etc., but does NOT clear _recentTurnEndSessions. If a user reconnects with the same session names as a previous connection (common with persistent sessions), stale guard entries could theoretically block the first sessions_list update from setting IsProcessing=true for up to 5 seconds.
In practice, this is unlikely to cause a user-visible issue because:
- The guard entries auto-expire after 5s
- TurnStart clears them when a new turn begins
But for consistency with how other remote-state dicts are managed, consider adding _recentTurnEndSessions.Clear() alongside the other clears in ReconnectAsync.
The same applies to _remoteStreamingSessions and _requestedHistorySessions — these pre-exist this PR and are also not cleared on reconnect. Worth a cleanup pass.
✅ No test isolation concerns
SetTurnEndGuardForTesting and SyncRemoteSessions exposed as internal follow the established pattern (SetStreamingSessionForTesting etc.). ✅
Verdict
✅ Approve. Both fixes are correct and well-tested. The reconnect cleanup gap is minor (5s auto-expiry mitigates it) and pre-exists this PR for the analogous _remoteStreamingSessions dict.
🔍 Squad Review: PR #449
PR: Fix mobile streaming throttle and stale IsProcessing state
Fix 1: Render Throttle Bypass for Streaming Content (commits
🔍 Squad Review: PR #449 R1
Status: ✅ Approve
This PR consolidates PR #447 (streaming throttle + stale IsProcessing fixes) plus adds diagnostic logging.
Changes
Commit 1: Streaming throttle bypass: fixes mobile not rendering content_delta
Verified
✅ Thread safety: volatile, ConcurrentDictionary, InvokeOnUI
Verdict
✅ Approve. Ready to merge.
🔍 PR Review Round 1 (4-Model Consensus)
Models: claude-opus-4.6 ×2, claude-sonnet-4.6 (pending), gpt-5.3-codex
Context: Relationship to PR #447
PR #447 (already merged) added the same
🟡 Moderate (consensus: 2–3/4 models)
M1 -
| # | Issue | Models |
|---|---|---|
| m1 | _recentTurnEndSessions entries never removed on session delete/disconnect; only TryRemove in TurnStart. Entries accumulate (inert after 5s but not freed). Low risk for realistic session counts. | 3/4 |
| m2 | TurnStart_ClearsGuard_AllowsSessionsListToSetProcessing test uses SetTurnEndGuardForTesting to simulate TurnStart rather than _bridgeClient.FireTurnStart(). Doesn't exercise the real TurnStart handler path. | 2/4 |
| m3 | SessionComplete handler doesn't clear session.IsResumed = false (unlike TurnEnd handler). If TurnEnd is lost and only SessionComplete fires, IsResumed stays true → watchdog uses 600s timeout instead of 120s. | 1/4 |
| m4 | _recentTurnEndSessions not cleared in DisposeAsync; harmless since the service is being destroyed, but inconsistent with other CTS/dictionary cleanup patterns. | 1/4 |
✅ Verified Non-Issues
- Guard only blocks IsProcessing=true: false (done) updates always propagate correctly. ✅
- Guard window (5s) vs debounce (500ms): 10× safety margin is well-calibrated. ✅
- MessageCount updated outside guard: monotonically increasing, purely informational, no inconsistency. ✅
- _contentDirty non-atomic read-clear: a lost set just adds one 50ms render cycle; HandleContent always calls ScheduleRender(), which re-fires RefreshState. Acceptable. ✅
- SyncRemoteSessions made internal: test-only, InternalsVisibleTo pattern consistent with codebase. ✅
- SessionComplete extending guard timestamp: correct and intentional; provides a fresh 5s window from the most recent authoritative signal. ✅
- Guard set inside InvokeOnUI but checked on bridge thread: ConcurrentDictionary makes this safe. ✅
⚠️ Request Changes
The stale IsProcessing guard is architecturally sound and the streaming throttle fix is correct. Two items before merge:
- M1 (blocking): Add _recentTurnEndSessions.Clear() to ReconnectAsync. This prevents stale guards from the previous connection blocking IsProcessing updates after reconnect.
- M2 (recommended): Move _contentDirty = true inside the session-visibility guard to avoid unnecessary throttle bypasses under background streaming.
All other items are non-blocking.
🔍 Squad Review: PR #449 Round 1
5-model consensus review (claude-opus-4.6 ×2, claude-sonnet-4.6 ×2, gpt-5.3-codex)
Summary
Well-scoped PR fixing two real mobile/remote-mode bugs: (1) streaming content not rendering due to throttle, and (2) stale
CI Status
No CI checks configured.
Consensus Findings
🟡 M1 -
| # | Finding | Models |
|---|---|---|
| 🟢 m1 | volatile on _contentDirty is harmless but unnecessary (both paths are UI thread) | 3/5 |
| 🟢 m2 | TurnStart_ClearsGuard test uses SetTurnEndGuardForTesting instead of FireTurnStart; doesn't exercise actual event handler | 2/5 |
| 🟢 m3 | Missing [BRIDGE-SESSION-COMPLETE] diagnostic tag in docs/skill (already present in code) | 2/5 |
What's Good
- ✅ Correct architecture: TurnEnd guard only blocks IsProcessing=true, always allows false from server
- ✅ Streaming guard and TurnEnd guard interact correctly (streaming checked first)
- ✅ TurnStart properly clears the guard for new turns
- ✅ SyncRemoteSessions → internal follows existing test pattern (SetRemoteStreamingGuardForTesting)
- ✅ 6 well-structured tests covering the key scenarios
- ✅ FireTurnStart/FireTurnEnd/FireSessionComplete/FireStateChanged test helpers enable clean event simulation
Verdict: ⚠️ Request Changes
Two non-blocking but important fixes needed:
- M1: Add session.IsResumed = false to SessionComplete handler (+ lastAssistant completion). Simple one-liner that aligns with TurnEnd handler.
- M2: Clear _recentTurnEndSessions in ReconnectAsync. Simple one-liner.
M3 and M4 are improvement suggestions (shorter streaming throttle, longer/named TTL) that can be addressed in a follow-up.
🔄 R1 Addendum (sonnet results in)
After the R1 comment was posted, the 4th model (claude-sonnet-4.6) completed. It confirms M1 and M2, and upgrades one minor finding to moderate:
🟡 Upgraded:
Logs when IsProcessing changes, when the TurnEnd guard blocks a stale snapshot, and when the streaming guard skips a session. Helps diagnose future stale-state issues on mobile. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…s stuck
The streaming guard (_remoteStreamingSessions) blocks SyncRemoteSessions from updating IsProcessing. If TurnStart fires but TurnEnd is lost (connection drop), the guard stays active forever, causing permanent stale 'busy/sending' state that even the sync button can't fix.
ForceRefreshRemoteAsync now:
- Applies server's authoritative IsProcessing to ALL sessions (bypasses guards)
- Clears stuck streaming guards for sessions the server reports as idle
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
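A rough sketch of the force-refresh behavior in this commit message, under stated assumptions: local state is modeled as a plain name-to-bool map and the streaming guard as a set of session names. `ForceRefreshSketch` and `Apply` are hypothetical; the real ForceRefreshRemoteAsync is asynchronous and works on the app's own session types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of a force refresh that trusts the server.
public static class ForceRefreshSketch
{
    public static void Apply(
        IDictionary<string, bool> localIsProcessing,
        ISet<string> streamingGuards,
        IReadOnlyDictionary<string, bool> serverIsProcessing)
    {
        foreach (var entry in serverIsProcessing)
        {
            // The server state is authoritative here, so no guard is consulted.
            localIsProcessing[entry.Key] = entry.Value;

            // A guard left behind by a lost TurnEnd is dropped once
            // the server reports the session as idle.
            if (!entry.Value)
                streamingGuards.Remove(entry.Key);
        }
    }
}
```

Sessions the server still reports as busy keep their streaming guard in this sketch, so an active stream is not disturbed by a manual sync.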
887629c to f75e5c4
…start Previously, eager resume only triggered when LastPrompt was saved (debounced). If the app was killed before the debounce fired, actively-running sessions were only loaded as lazy placeholders with no SDK connection — appearing idle/stuck even though the headless server was still processing them. Now checks events.jsonl via IsSessionStillProcessing() to detect sessions that are genuinely still active, regardless of whether LastPrompt was persisted. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When resuming a session after app restart, the RESUME-ABORT logic detected unmatched tool.execution_start events and aborted the session to clear pending state. But in persistent mode, the headless CLI keeps running tools while PolyPilot is down — those tools WILL complete. Now checks IsSessionStillProcessing() before aborting. If the CLI is still active (events.jsonl fresh + last event is a tool/active event), we skip the abort and instead set IsProcessing=true with watchdog flags so the session correctly shows as working. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nd tasks Sessions that received IDLE-DEFER (background agents/shells active) were getting killed by the watchdog after 300s because they weren't flagged as multi-agent. The 300s freshness window was too short — subagents can run for 10+ minutes without producing events.jsonl writes. Added HasDeferredIdle flag on SessionState, set when IDLE-DEFER fires. The watchdog Case B now uses the 1800s multi-agent freshness window for any session with HasDeferredIdle=true, not just multi-agent groups. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ling When relaunch.sh kills the old instance and immediately starts a new one, port 4322 may still be in TIME_WAIT. Previously, Start() would try once and silently give up, leaving the bridge dead. Mobile clients could never reconnect because there was no server listening. Now Start() tries to bind immediately and, if the port is busy, starts the accept loop anyway — it retries via TryRestartListenerAsync with exponential backoff (2s, 4s, 8s... up to 30s). Also increased the retry delay from 500ms to 2s to better match macOS TIME_WAIT behavior. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
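The retry schedule mentioned in the last commit (2s, 4s, 8s, capped at 30s) can be written as a small delay function. This is only a sketch of that schedule; `BindBackoffSketch` and `DelayForAttempt` are hypothetical names, and how TryRestartListenerAsync actually computes its delays is not shown in this conversation.

```csharp
using System;

// Hypothetical helper for the bind-retry schedule.
public static class BindBackoffSketch
{
    private const double BaseSeconds = 2;
    private const double MaxSeconds = 30;

    // Delay before retry number `attempt` (1-based): 2s, 4s, 8s, 16s, then 30s.
    public static TimeSpan DelayForAttempt(int attempt)
    {
        if (attempt < 1)
            throw new ArgumentOutOfRangeException(nameof(attempt));

        // Clamp the exponent so very large attempt counts cannot overflow.
        int exponent = Math.Min(attempt - 1, 10);
        double seconds = Math.Min(MaxSeconds, BaseSeconds * Math.Pow(2, exponent));
        return TimeSpan.FromSeconds(seconds);
    }
}
```

A retry loop would await `Task.Delay(BindBackoffSketch.DelayForAttempt(n))` between bind attempts and stop as soon as the listener starts.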
🔍 Squad Re-Review: PR #449 Round 2
New commits since Round 1 (approved):
Tests: ✅ 2929/2929 pass (confirmed locally, same run as PR #446 R3)
Commit-by-Commit Analysis
🔍 Squad Review: PR #449 R2
Status: ✅ Approve
Summary
R2 adds 5 critical stability fixes for mobile/bridge reliability, session lifecycle, and watchdog accuracy.
New Commits Analysis
✅ f75e5c4: Fix force-sync not clearing stale IsProcessing when streaming guard stuck
Problem: Streaming guard prevented force-sync from clearing stale IsProcessing
✅ 70ae3e1: Eagerly resume sessions still active on headless server after restart
Problem: App restart lost active server sessions (weren't in active-sessions.json)
✅ 8225a6b: Skip abort for sessions where CLI is still actively processing
Problem: Abort attempt on active CLI session caused errors
✅ b16625b: Watchdog uses longer freshness window for sessions with background tasks
Problem: 120s watchdog timeout too short for multi-agent workers with sub-agents
✅ 6d4b1f0: WsBridge retries port binding on startup
Problem: Port 4322 in TIME_WAIT after relaunch → silent failure → mobile can't connect
Correctness Verified
✅ All fixes address real production edge cases
Verdict
✅ Approve. Excellent stability improvements. Ready to merge.
R2 Review: Fix mobile streaming throttle and stale IsProcessing state
Tests: (running; will update if any failures)
Previous Findings Status
M1: Still blocking
Fix (2 lines in
_recentTurnEndSessions.Clear();
_remoteStreamingSessions.Clear();
New Findings (Commits 4-6)
🟡 N1 -
If a deferred-idle session is aborted and then reconnected (watchdog or bridge reconnect path) before
// In AbortSessionAsync and CompleteResponse cleanup:
state.HasDeferredIdle = false;
🟡 N2 -
This is a narrow race window (must crash in that specific event window), but creates bad UX when hit. A comment documenting the known false-positive window and the 600s self-correction would be sufficient if a code fix is complex.
🟢 N3 -
🟢 N4 - WsBridge
The null-propagation guard
Summary
Verdict:
🔄 Squad Re-Review: PR #449 Round 2
2-model consensus review (claude-opus-4.6, claude-sonnet-4.6)
R1 Finding Status
New Findings
🟡 M5: RESUME-SKIP-ABORT sets IsProcessing=true without starting watchdog (1/2 models)
The new
Fix: Call
🟡 M6 -
M5: Start processing watchdog after RESUME-SKIP-ABORT sets IsProcessing=true.
Without this, a session marked as processing had no recovery if the CLI
finishes without emitting session.idle.
R1-M1: Add IsResumed=false to SessionComplete handler in Bridge.cs.
Missing from the belt-and-suspenders cleanup that clears 4 other fields.
M6: Clear HasDeferredIdle in CompleteResponse, AbortSessionAsync, error
handler, and all watchdog completion paths. Prevents stale flag from
granting an unwarranted 1800s freshness window on the next turn.
R1-M2: Clear _recentTurnEndSessions in ReconnectAsync and server restart.
Entries were only removed on TurnStart — after reconnect, stale entries
could block legitimate IsProcessing updates.
Minor: Deduplicate TryBindListener/TryRestartListenerAsync in WsBridgeServer.
Both had identical prefix iteration loops. TryRestartListenerAsync now
delegates to TryBindListener.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Changes
1. Mobile streaming content not rendering (render throttle bypass)
On mobile (remote mode), content_delta is the only event source. The 500ms render throttle in RenderThrottle.ShouldRefresh() blocked every render attempt, causing streaming content to not appear until the user hit refresh.
Fix: Added isStreaming parameter to ShouldRefresh() that bypasses the throttle. A _contentDirty volatile flag in Dashboard.razor is set by HandleContent and consumed by RefreshState.
2. Stale IsProcessing on mobile after multi-agent completion
The debounced sessions_list (500ms) could arrive with a stale IsProcessing=true snapshot captured before the server's CompleteResponse ran. This overwrote the authoritative TurnEnd's IsProcessing=false, causing sessions to show 'busy/sending' indefinitely on mobile.
Fix:
- _recentTurnEndSessions guard: when TurnEnd clears IsProcessing, it marks the session so SyncRemoteSessions won't re-set IsProcessing=true from stale snapshots (5s expiry)
- SessionComplete also clears IsProcessing as belt-and-suspenders fallback
- Diagnostic logging in SyncRemoteSessions for future debugging
Tests
- FireTurnStart, FireTurnEnd, FireSessionComplete, FireStateChanged on StubWsBridgeClient
- SetTurnEndGuardForTesting() and SyncRemoteSessions() exposed as internal for tests