fix(server): chunk-form SSE keepalive shares the stream's completion id #1479

Merged
jundot merged 1 commit into jundot:main from jcalvert:fix/sse-keepalive-chunk-shares-stream-id
May 28, 2026

@jcalvert
Contributor

Problem

Since v0.3.9 the default SSE keepalive mode (chunk) emits a chat.completion.chunk whose id is the sentinel chatcmpl-keepalive, sent as the first frame of every stream. The real completion chunks then arrive under a different, freshly-minted id.

Strict OpenAI stream accumulators assume all chunks in one streamed completion share a single id. The official OpenAI Go SDK (openai-go) ChatCompletionAccumulator latches the first chunk's id and silently rejects every later chunk whose id differs:

if len(cc.ID) == 0 {
    cc.ID = chunk.ID          // becomes "chatcmpl-keepalive"
} else if cc.ID != chunk.ID {
    return false              // every real chunk dropped
}

So the streamed tool_calls, finish_reason, and usage are all discarded, producing an empty assistant message on any tool-calling turn. (Plain content can survive if read from live deltas, which is why prose replies often look fine while tool-calling turns come back empty.)
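The failure mode can be reproduced in a few lines. This is a minimal Python sketch, not the openai-go code: `Accumulator`, `add_chunk`, and `run` are hypothetical names that mirror only the id-latching check quoted above, and the chunk dicts (including the empty-delta keepalive shape) are simplified assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Accumulator:
    """Hypothetical stand-in for a strict stream accumulator."""

    id: str = ""
    content: str = ""
    finish_reason: Optional[str] = None
    accepted: int = 0

    def add_chunk(self, chunk: dict) -> bool:
        # Latch the first chunk's id; reject any later chunk whose id differs.
        if not self.id:
            self.id = chunk["id"]
        elif self.id != chunk["id"]:
            return False
        for choice in chunk.get("choices", []):
            self.content += choice.get("delta", {}).get("content") or ""
            self.finish_reason = choice.get("finish_reason") or self.finish_reason
        self.accepted += 1
        return True


def run(keepalive_id: str, stream_id: str) -> Accumulator:
    """Feed one keepalive frame followed by two real chunks."""
    acc = Accumulator()
    chunks = [
        {"id": keepalive_id, "choices": [{"delta": {}, "finish_reason": None}]},
        {"id": stream_id, "choices": [{"delta": {"content": "hi"}, "finish_reason": None}]},
        {"id": stream_id, "choices": [{"delta": {}, "finish_reason": "stop"}]},
    ]
    for chunk in chunks:
        acc.add_chunk(chunk)
    return acc


# Sentinel keepalive id: only the keepalive frame is accepted.
broken = run("chatcmpl-keepalive", "chatcmpl-abc123")
# Shared id: the keepalive is a no-op and every real chunk is accepted.
fixed = run("chatcmpl-abc123", "chatcmpl-abc123")
```

With the sentinel id, `broken` ends up with no content and no `finish_reason`; with a shared id, `fixed` accumulates the full stream.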

Reported in #1478. This is a regression from the fix for #839, which switched the default keepalive from an SSE comment to this chunk form.

Fix

Pre-mint response_id in create_chat_completion and reuse it for both the keepalive frame and stream_chat_completion. The keepalive (_chat_keepalive_chunk) now carries the stream's real id, so it is a true no-op for spec-compliant accumulators while remaining a parseable data event for the clients that cannot tolerate SSE comments, which is why chunk mode was added (OpenClaw / WorkBuddy, #839/#1035). The comment and off modes are unchanged.
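The shape of the change can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual diff: `mint_response_id`, `keepalive_chunk`, and `stream_completion` are hypothetical helpers, the id format and payload fields are guesses, and the real `create_chat_completion` handler takes different arguments.

```python
import json
import uuid
from typing import Iterator, List


def mint_response_id() -> str:
    # Hypothetical id format; the real server may mint ids differently.
    return f"chatcmpl-{uuid.uuid4().hex}"


def sse_data(payload: dict) -> str:
    # Serialize one payload as an SSE data event.
    return f"data: {json.dumps(payload)}\n\n"


def keepalive_chunk(response_id: str, model: str) -> str:
    # Empty-delta chunk that carries the stream's own id instead of a sentinel.
    return sse_data({
        "id": response_id,
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{"index": 0, "delta": {}, "finish_reason": None}],
    })


def stream_completion(response_id: str, model: str, pieces: List[str]) -> Iterator[str]:
    # Stand-in for the real generation loop: one content chunk per piece.
    for piece in pieces:
        yield sse_data({
            "id": response_id,
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [{"index": 0, "delta": {"content": piece}, "finish_reason": None}],
        })
    yield "data: [DONE]\n\n"


def create_chat_completion(model: str, pieces: List[str]) -> Iterator[str]:
    # Mint the id once, before any frame is sent, and pass it to both producers.
    response_id = mint_response_id()
    yield keepalive_chunk(response_id, model)
    yield from stream_completion(response_id, model, pieces)
```

Because the id is minted in one place and passed down, the keepalive frame and the real chunks cannot drift apart.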

Verification

  • New unit tests in tests/test_sse_keepalive.py assert the chat keepalive frame reuses the given id and never emits the sentinel.
  • Validated end-to-end against a local build: with the patch, the first keepalive frame's id matches the completion id, and an unmodified openai-go client that previously received empty tool-calling turns now receives the tool call and usage correctly.
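The tests themselves are not reproduced in this description. As a rough idea of what such assertions might look like, here is a self-contained sketch: `chat_keepalive_chunk` is a hypothetical stand-in for the server's builder (its real signature is not shown here), and `parse_sse_data` is an illustrative helper.

```python
import json

SENTINEL_ID = "chatcmpl-keepalive"


def chat_keepalive_chunk(response_id: str) -> str:
    """Hypothetical stand-in for the server's keepalive builder."""
    payload = {
        "id": response_id,
        "object": "chat.completion.chunk",
        "choices": [{"index": 0, "delta": {}, "finish_reason": None}],
    }
    return f"data: {json.dumps(payload)}\n\n"


def parse_sse_data(frame: str) -> dict:
    # Strip the SSE "data: " prefix and decode the JSON payload.
    assert frame.startswith("data: ") and frame.endswith("\n\n")
    return json.loads(frame[len("data: "):])


def test_keepalive_reuses_given_id() -> None:
    frame = chat_keepalive_chunk("chatcmpl-abc123")
    chunk = parse_sse_data(frame)
    # The frame must carry the caller's id and still be a parseable chunk.
    assert chunk["id"] == "chatcmpl-abc123"
    assert chunk["object"] == "chat.completion.chunk"
    # The old sentinel must never appear anywhere in the frame.
    assert SENTINEL_ID not in frame
```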

Notes

Fixes #1478.

The default `chunk` SSE keepalive (`sse_keepalive_mode`, default since v0.3.9)
emits a `chat.completion.chunk` carrying the sentinel id `chatcmpl-keepalive`,
which differs from the real completion chunks' id. Strict OpenAI stream
accumulators (e.g. the official `openai-go` SDK) assume every chunk in one
streamed completion shares a single `id`: they latch the first chunk's id and
silently drop every later chunk whose id differs — discarding the real
`tool_calls`/`finish_reason`/`usage` and yielding empty assistant turns on
tool-calling requests.

Pre-mint `response_id` in the chat handler and reuse it for both the keepalive
frame (`_chat_keepalive_chunk`) and `stream_chat_completion`, so the keepalive
is a true no-op for those clients while remaining a parseable data event for
clients that can't handle SSE comment lines (the reason chunk mode was added,
jundot#839). `comment` and `off` modes are unaffected.

Fixes jundot#1478.
@jundot
Owner

jundot commented May 28, 2026

Thanks for catching this and the clean fix. Merging into the next release.
