Describe the bug
The async-job SSE stream emits an id: field on synthetic control events (stream.mode, job.status, job.shutdown, job.error), using a fabricated per-connection counter (sequence_id) rather than a real event-store id.
Browsers persist the last seen SSE id: as EventSource.lastEventId and replay it on reconnect, where the server uses it as the cursor for EventStore.get_events_async (WHERE id > :after_id). Because a synthetic id can be larger than the next real DB event's _id, on reconnect the cursor overshoots and the query silently skips real events — a client that disconnects mid-stream can lose tool-call updates, section completions, or even the final report artifact, with no error surfaced.
Per the WHATWG SSE spec, only events that carry an id field should advance the last-event-ID buffer; control frames that are not durable resume points should omit id:.
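To make the client-side behavior concrete, here is a simplified sketch of how a reader following the spec tracks the last-event-ID buffer. It is not a full SSE parser (it ignores dispatch timing and uses Python's `splitlines`), and the helper name `last_event_id` and the `example.real` event name are hypothetical, chosen only for illustration.

```python
def last_event_id(stream_text: str) -> str:
    """Return the last-event-ID buffer after reading a raw SSE stream.

    Simplified: only the `id` field is interpreted; all other fields,
    comments, and blank lines are skipped.
    """
    buffer = ""  # persists across events until another `id` field arrives
    for line in stream_text.splitlines():
        if line.startswith(":"):
            continue  # comment line, ignored by the spec
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the spec strips exactly one leading space
        if field == "id" and "\x00" not in value:
            buffer = value
    return buffer


# Hypothetical frames: one real event, then a control frame with and without id.
real = "id: 6\nevent: example.real\ndata: {}\n\n"
control_with_id = "id: 7\nevent: job.status\ndata: {}\n\n"
control_without_id = "event: job.status\ndata: {}\n\n"

cursor_buggy = last_event_id(real + control_with_id)
cursor_fixed = last_event_id(real + control_without_id)
```

With the `id:` line present on the control frame, the buffer moves to the fabricated value; without it, the buffer stays on the last real event.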
Steps to reproduce
- Start an async deep-research job and open its SSE stream.
- Let the server emit several real DB events (e.g. ids 1–6) followed by a synthetic stream.mode/job.status control frame; the control frame ships id: 7 (a fabricated value).
- Disconnect after the control frame; the browser stores lastEventId = 7.
- Reconnect. The cursor 7 is now >= the next real DB event id, so WHERE id > 7 skips real event 7 and any others ≤ 7.
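The overshoot in the steps above can be reproduced without a database. The sketch below uses an in-memory list as a stand-in for the event store; `Event` and `get_events_after` are hypothetical names that only mimic the `WHERE id > :after_id` query, not the real `EventStore.get_events_async`.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    id: int
    kind: str


def get_events_after(store: List[Event], after_id: int) -> List[Event]:
    """In-memory stand-in for `WHERE id > :after_id ORDER BY id`."""
    return sorted((e for e in store if e.id > after_id), key=lambda e: e.id)


# Real events 1-6 were delivered before the disconnect;
# real events 7 and 8 are written while the client is away.
store = [Event(i, "real") for i in range(1, 9)]

overshot = get_events_after(store, after_id=7)  # cursor taken from synthetic id: 7
correct = get_events_after(store, after_id=6)   # cursor taken from the last real event

print([e.id for e in overshot])  # [8]
print([e.id for e in correct])   # [7, 8]
```

Real event 7 is never delivered in the first case, and nothing in the query result signals the gap.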
Expected behavior
Reconnect should resume exactly after the last persisted event, never skipping real events. Synthetic control frames should not move the resume cursor.
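One way to get this behavior is a single formatter that writes `id:` only when it is given a real event-store id. This is a minimal sketch under that assumption, not the project's actual `format_sse` closure; the name `format_sse_frame` and its signature are hypothetical.

```python
import json
from typing import Any, Optional


def format_sse_frame(event: str, data: Any, event_id: Optional[int] = None) -> str:
    """Serialize one SSE frame.

    `event_id` should be passed only for DB-backed events. Control frames
    leave it as None, so no `id:` line is written and the client's
    last-event-ID buffer is left untouched.
    """
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # json.dumps without indent yields a single line, so one data: field suffices.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


control = format_sse_frame("job.status", {"state": "running"})
durable = format_sse_frame("job.status", {"state": "running"}, event_id=7)
```

Making the id an explicit optional argument means a caller has to opt in to moving the resume cursor, rather than every frame picking up a counter by default.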
Additional context
Two related observations while investigating:
- The route reads the resume cursor only from the URL path (/stream/{last_event_id}). A native browser EventSource cannot encode the cursor in the URL: on an automatic reconnect it sends the standard Last-Event-ID request header, which the route currently ignores. So a standard client's auto-reconnect starts from scratch.
- These are the only two SSE generators (_sse_generator_postgres, _sse_generator_polling) and both have the same defect via a duplicated inner format_sse closure.
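For the first observation, the route could resolve its cursor from both sources. The sketch below is framework-agnostic and purely illustrative: `resolve_cursor` is a hypothetical helper, and letting the header take precedence over the path value is an assumption (on an automatic reconnect the browser reuses the original URL, so the header is the fresher of the two).

```python
from typing import Mapping, Optional


def resolve_cursor(path_cursor: Optional[int], headers: Mapping[str, str]) -> int:
    """Pick the resume cursor for a stream request.

    Assumed precedence: a valid Last-Event-ID header wins, then the
    cursor from the URL path, then 0 (start of stream).
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    raw = normalized.get("last-event-id", "").strip()
    if raw.isascii() and raw.isdigit():
        return int(raw)
    return path_cursor if path_cursor is not None else 0


from_header = resolve_cursor(0, {"Last-Event-ID": "6"})
from_path = resolve_cursor(3, {})
from_default = resolve_cursor(None, {"last-event-id": "not-a-number"})
```

A malformed header falls back to the path value instead of raising, so a misbehaving client still gets a stream rather than an error.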
I have a fix ready (emit id: only for real DB-backed events; honor the Last-Event-ID header; consolidate the duplicated formatter) with tests, and will open a PR referencing this issue.