
SSE reconnect skips events: synthetic control frames pollute EventSource lastEventId #272

Description

@torkian

Describe the bug

The async-job SSE stream emits an id: field on synthetic control events (stream.mode, job.status, job.shutdown, job.error), using a fabricated per-connection counter (sequence_id) rather than a real event-store id.

Browsers persist the last seen SSE id: as EventSource.lastEventId and replay it on reconnect, where the server uses it as the cursor for EventStore.get_events_async (WHERE id > :after_id). Because a synthetic id can be larger than the next real DB event's _id, on reconnect the cursor overshoots and the query silently skips real events — a client that disconnects mid-stream can lose tool-call updates, section completions, or even the final report artifact, with no error surfaced.

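Per the WHATWG SSE spec, the last-event-ID buffer changes only when a frame carries an id field; a frame with no id line leaves EventSource.lastEventId untouched. Control frames that are not durable resume points should therefore omit id: entirely. An empty id: line is not a substitute, because it resets the cursor to the empty string.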
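To make the resume-cursor rule concrete, here is a simplified Python model of how a browser updates lastEventId while it parses a stream. It is a sketch, not a full WHATWG parser: only the id field and event dispatch are modelled, and the helper name last_event_id_after is hypothetical.

```python
import re


def last_event_id_after(stream: str, last_event_id: str = "") -> str:
    """Return EventSource.lastEventId after a browser consumes `stream`.

    Hypothetical helper. Simplified WHATWG parsing: only the `id` field and
    event dispatch (blank line) are modelled; `data`, `event` and `retry`
    lines are parsed but ignored.
    """
    id_buffer = last_event_id  # the spec's "last event ID buffer"
    # Lines end in CRLF, CR or LF; the final piece is an unterminated line
    # that has not been processed yet, so drop it.
    lines = re.split(r"\r\n|\r|\n", stream)[:-1]
    for line in lines:
        if line == "":
            # Dispatch: lastEventId takes the buffer's value. The buffer is
            # never reset, so frames without an `id:` line change nothing.
            last_event_id = id_buffer
            continue
        if line.startswith(":"):
            continue  # comment line
        field, sep, value = line.partition(":")
        if sep and value.startswith(" "):
            value = value[1:]  # a single leading space is stripped
        if field == "id" and "\0" not in value:
            id_buffer = value  # an empty `id:` resets the cursor to ""
    return last_event_id


real_event = "id: 6\ndata: {}\n\n"
control_with_id = "id: 7\nevent: stream.mode\ndata: {}\n\n"
control_without_id = "event: stream.mode\ndata: {}\n\n"

print(last_event_id_after(real_event + control_with_id))     # 7
print(last_event_id_after(real_event + control_without_id))  # 6
```

The second call shows the intended behavior: a control frame without id: is still delivered to the page, but the resume cursor stays on the last real event.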

Steps to reproduce

  1. Start an async deep-research job and open its SSE stream.
  2. Let the server emit several real DB events (e.g. ids 1–6) followed by a synthetic stream.mode/job.status control frame, which ships id: 7 (a fabricated counter value).
  3. Disconnect after the control frame; the browser stores lastEventId = 7.
  4. Reconnect. The cursor 7 is now >= the id of the next real DB event (here also 7), so WHERE id > 7 skips real event 7, along with any other undelivered real event whose id is ≤ the cursor.
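The reproduction can be simulated without a database. This sketch uses an in-memory list as a stand-in for the event store; StoredEvent, get_events_after and the event kinds are hypothetical names, and only the WHERE id > :after_id filter mirrors EventStore.get_events_async as described above.

```python
from dataclasses import dataclass


@dataclass
class StoredEvent:
    id: int    # real event-store primary key
    kind: str  # hypothetical event kind, for readability only


def get_events_after(store: list[StoredEvent], after_id: int) -> list[StoredEvent]:
    """In-memory stand-in for EventStore.get_events_async (WHERE id > :after_id)."""
    return [e for e in store if e.id > after_id]


# Steps 1-2: real events 1-6 are delivered, then a control frame ships a
# fabricated id taken from the per-connection sequence_id counter.
store = [StoredEvent(i, "progress") for i in range(1, 7)]
fabricated_id = 7

# Step 3: the client disconnects; the browser now holds lastEventId = 7.
last_event_id = fabricated_id

# Meanwhile the job persists its next real event, which the DB numbers 7.
store.append(StoredEvent(7, "report.artifact"))

# Step 4: reconnect with the polluted cursor vs. the correct one (6).
resumed_ids = [e.id for e in get_events_after(store, last_event_id)]
correct_ids = [e.id for e in get_events_after(store, 6)]

print(resumed_ids)  # [] -> real event 7 is silently lost
print(correct_ids)  # [7]
```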

Expected behavior

Reconnect should resume exactly after the last persisted event, never skipping real events. Synthetic control frames should not move the resume cursor.

Additional context

Two related observations while investigating:

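  • The route reads the resume cursor only from the URL path (/stream/{last_event_id}). A native browser EventSource reconnects to the same URL it was opened with, so it cannot put an updated cursor in the path; instead, on automatic reconnect it sends the standard Last-Event-ID request header, which the route currently ignores. As a result, a standard client's auto-reconnect does not resume from its lastEventId: it falls back to whatever cursor was in the original URL, and starts from scratch when none was given.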
  • These are the only two SSE generators (_sse_generator_postgres, _sse_generator_polling), and both share the same defect through a duplicated inner format_sse closure.
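One way the route could honor the header is to resolve the cursor from both sources before querying. This is a sketch under assumptions: resolve_resume_cursor is a hypothetical helper, and the precedence (header first, then path, then 0) is a design choice, picked because on auto-reconnect the path still holds the cursor from the original request.

```python
from typing import Mapping, Optional


def resolve_resume_cursor(path_cursor: Optional[str], headers: Mapping[str, str]) -> int:
    """Pick the event-store cursor for WHERE id > :after_id (hypothetical helper).

    `headers` must use lower-cased keys or be a case-insensitive mapping
    (for example, Starlette's Headers).
    """
    # Assumed precedence: Last-Event-ID header, then the path segment.
    for candidate in (headers.get("last-event-id"), path_cursor):
        if candidate is None:
            continue
        candidate = candidate.strip()
        # Accept only plain ASCII digits; anything else is not a DB id.
        if candidate.isascii() and candidate.isdigit():
            return int(candidate)
    return 0  # no usable cursor: replay from the beginning


print(resolve_resume_cursor("3", {"last-event-id": "6"}))  # 6
print(resolve_resume_cursor("3", {}))                      # 3
```

Ignoring a non-numeric header (rather than rejecting the request) keeps a stale or reset lastEventId from breaking reconnects.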

I have a fix ready (emit id: only for real DB-backed events; honor the Last-Event-ID header; consolidate the duplicated formatter) with tests, and will open a PR referencing this issue.
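For illustration, a consolidated module-level formatter along these lines would let both generators share one rule. This is a hypothetical sketch, not the code in the upcoming PR: the signature, the JSON payload encoding and the section.complete event name are assumptions.

```python
import json
from typing import Any, Optional


def format_sse(event: str, payload: Any, event_id: Optional[int] = None) -> str:
    """Serialize one SSE frame (hypothetical shared helper).

    `event_id` must be the event-store primary key of a persisted event, or
    None for synthetic control frames (stream.mode, job.status, ...).
    Omitting the `id:` line leaves the browser's lastEventId untouched, so
    control frames can never move the resume cursor.
    """
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # json.dumps escapes newlines, so the payload fits on one `data:` line.
    lines.append(f"data: {json.dumps(payload)}")
    return "\n".join(lines) + "\n\n"


control = format_sse("job.status", {"status": "running"})
real = format_sse("section.complete", {"section": 1}, event_id=42)
print(repr(control))  # 'event: job.status\ndata: {"status": "running"}\n\n'
```

Making event_id an explicit optional argument means a generator has to pass a DB id on purpose; there is no per-connection counter left to leak into id:.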

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
