fix(server): chunk-form SSE keepalive shares the stream's completion id #1479
Merged
jundot merged 1 commit on May 28, 2026
The default `chunk` SSE keepalive (`sse_keepalive_mode`, default since v0.3.9) emits a `chat.completion.chunk` carrying the sentinel id `chatcmpl-keepalive`, which differs from the real completion chunks' id. Strict OpenAI stream accumulators (e.g. the official `openai-go` SDK) assume every chunk in one streamed completion shares a single `id`: they latch the first chunk's id and silently drop every later chunk whose id differs — discarding the real `tool_calls`/`finish_reason`/`usage` and yielding empty assistant turns on tool-calling requests. Pre-mint `response_id` in the chat handler and reuse it for both the keepalive frame (`_chat_keepalive_chunk`) and `stream_chat_completion`, so the keepalive is a true no-op for those clients while remaining a parseable data event for clients that can't handle SSE comment lines (the reason chunk mode was added, jundot#839). `comment` and `off` modes are unaffected. Fixes jundot#1478.
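The pre-minting described above can be sketched roughly as follows. This is a minimal illustration, not the project's handler: `mint_response_id`, `keepalive_chunk`, `stream_completion`, and `handle_chat_request` are hypothetical names, and the id format and payload fields are assumptions chosen to resemble an OpenAI-style `chat.completion.chunk`.

```python
import json
import time
import uuid
from typing import Iterable, Iterator


def mint_response_id() -> str:
    """Hypothetical id format; the project's real format may differ."""
    return f"chatcmpl-{uuid.uuid4().hex}"


def keepalive_chunk(response_id: str, model: str) -> str:
    """Illustrative analogue of `_chat_keepalive_chunk`: an empty-delta data event."""
    payload = {
        "id": response_id,  # the stream's own id, not a sentinel
        "object": "chat.completion.chunk",
        "created": int(time.time()),
        "model": model,
        "choices": [{"index": 0, "delta": {}, "finish_reason": None}],
    }
    return f"data: {json.dumps(payload)}\n\n"


def stream_completion(response_id: str, model: str, pieces: Iterable[str]) -> Iterator[str]:
    """Stand-in for the real streaming generator; emits content deltas."""
    for piece in pieces:
        payload = {
            "id": response_id,
            "object": "chat.completion.chunk",
            "created": int(time.time()),
            "model": model,
            "choices": [{"index": 0, "delta": {"content": piece}, "finish_reason": None}],
        }
        yield f"data: {json.dumps(payload)}\n\n"
    yield "data: [DONE]\n\n"


def handle_chat_request(model: str, pieces: Iterable[str]) -> Iterator[str]:
    # Mint the id once, before any frame is sent, and pass it to both producers.
    response_id = mint_response_id()
    yield keepalive_chunk(response_id, model)
    yield from stream_completion(response_id, model, pieces)
```

Because the id is created before the first frame goes out, the keepalive and every later chunk agree on it by construction, and no consumer needs to special-case the keepalive.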
Owner
Thanks for catching this and the clean fix. Merging into the next release.
Problem
Since v0.3.9 the default SSE keepalive mode (`chunk`) emits a `chat.completion.chunk` whose `id` is the sentinel `chatcmpl-keepalive`, sent as the first frame of every stream. The real completion chunks then arrive under a different, freshly minted id.
Strict OpenAI stream accumulators assume all chunks in one streamed completion share a single `id`. The official OpenAI Go SDK (`openai-go`) `ChatCompletionAccumulator` latches the first chunk's id and silently rejects every later chunk whose id differs.
So the streamed `tool_calls`, `finish_reason`, and `usage` are all discarded, producing an empty assistant message on any tool-calling turn. (Plain `content` can survive if read from live deltas, which is why prose replies often look fine while tool-calling turns come back empty.)
Reported in #1478. This is a regression from the fix for #839, which switched the default keepalive from an SSE comment to this chunk form.
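To make the failure mode concrete, here is a simplified Python model of an id-latching accumulator. It is a sketch of the behaviour described above, not the `openai-go` source: `StrictAccumulator`, `make_stream`, and `accumulate` are hypothetical names, and the chunk payloads (including the `get_weather` tool call and the token counts) are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

SENTINEL_ID = "chatcmpl-keepalive"


@dataclass
class StrictAccumulator:
    """Simplified model of an id-latching accumulator (not the openai-go source)."""

    id: Optional[str] = None
    tool_calls: List[Dict[str, Any]] = field(default_factory=list)
    finish_reason: Optional[str] = None
    usage: Optional[Dict[str, int]] = None

    def add_chunk(self, chunk: Dict[str, Any]) -> bool:
        # Latch the id of the first chunk seen; reject any chunk with another id.
        if self.id is None:
            self.id = chunk["id"]
        elif chunk["id"] != self.id:
            return False
        for choice in chunk.get("choices", []):
            self.tool_calls.extend(choice.get("delta", {}).get("tool_calls") or [])
            if choice.get("finish_reason"):
                self.finish_reason = choice["finish_reason"]
        if chunk.get("usage"):
            self.usage = chunk["usage"]
        return True


def make_stream(keepalive_id: str, completion_id: str) -> List[Dict[str, Any]]:
    """Build a keepalive frame followed by two real chunks (illustrative payloads)."""
    call = {"index": 0, "id": "call_1", "type": "function",
            "function": {"name": "get_weather", "arguments": "{}"}}
    return [
        {"id": keepalive_id,
         "choices": [{"index": 0, "delta": {}, "finish_reason": None}]},
        {"id": completion_id,
         "choices": [{"index": 0, "delta": {"tool_calls": [call]}, "finish_reason": None}]},
        {"id": completion_id,
         "choices": [{"index": 0, "delta": {}, "finish_reason": "tool_calls"}],
         "usage": {"prompt_tokens": 12, "completion_tokens": 5, "total_tokens": 17}},
    ]


def accumulate(chunks: List[Dict[str, Any]]) -> StrictAccumulator:
    acc = StrictAccumulator()
    for chunk in chunks:
        acc.add_chunk(chunk)
    return acc


# Sentinel keepalive id: every real chunk is rejected after the first frame.
with_sentinel = accumulate(make_stream(SENTINEL_ID, "chatcmpl-abc123"))
# Keepalive sharing the completion id: the keepalive is a no-op.
with_shared_id = accumulate(make_stream("chatcmpl-abc123", "chatcmpl-abc123"))
```

In this model, `with_sentinel` ends up with no tool calls, no finish reason, and no usage, while `with_shared_id` keeps all three. The only difference between the two streams is the `id` on the keepalive frame.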
Fix
Pre-mint `response_id` in `create_chat_completion` and reuse it for both the keepalive frame and `stream_chat_completion`. The keepalive (`_chat_keepalive_chunk`) now carries the stream's real id, so it is a true no-op for spec accumulators while remaining a parseable data event for the comment-intolerant clients chunk mode was added for (OpenClaw / WorkBuddy, #839/#1035). `comment` and `off` modes are unchanged.
Verification
Tests in `tests/test_sse_keepalive.py` assert that the chat keepalive frame reuses the given id and never emits the sentinel.
The keepalive frame's `id` matches the completion id, and an unmodified `openai-go` client that previously received empty tool-calling turns now gets the tool call and usage correctly.
Notes
Touches the same `create_chat_completion` streaming `return` as open PR #1452, "feat(server): typed pre-StreamingResponse preflight + HTTP 413" (preflight guards): only trivial line-context proximity, no logic overlap; may need a one-line rebase if both land.
Fixes #1478.