fix: prevent silent content loss when splitting long messages #11
Solid and necessary fix — the content loss bug is real and well-diagnosed. The regression test is especially well done; documenting the original failure shape with the comment explaining the old truncation path is exactly the right approach. One edge case worth addressing before merge:
Also a pre-existing note: Great work overall!
splitMessageImpl was discarding the tail of any single paragraph longer than maxLength via `para.slice(0, maxLength)`; that tail just vanished. Visible symptom: long fenced blocks (ASCII tables, etc.) ended mid-row, with the rest of the response missing from Discord entirely.

Rewrite splitMessageImpl to never drop content:
- Paragraph fits in current chunk → append normally
- Paragraph alone too long → flush current, split paragraph by lines
- Line alone too long → hard-split by characters (last-resort fallback)

Also add a balanceCodeFences post-pass: when a chunk ends with an open ``` fence, close it there and re-open it with the same language tag at the start of the next chunk. Without this, a fenced block split across multiple Discord messages renders as one fenced message plus one raw-text message.

Discovered via instrumented debug logs that showed chunk sizes summing to less than the buffer length on every split (e.g. buffer=2706 but chunks=[63,1904], 739 chars lost).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
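The three-tier fallback above can be sketched roughly as follows. This is a minimal TypeScript sketch, not the actual `splitMessageImpl`: the function name `splitMessage`, splitting paragraphs on `"\n\n"` and lines on `"\n"`, and dropping those separators at chunk boundaries are all assumptions made for illustration.

```typescript
// Hypothetical sketch of a splitter that never drops content (not the real splitMessageImpl).
// Assumes paragraphs are separated by "\n\n" and lines by "\n".
function splitMessage(text: string, maxLength: number): string[] {
  if (maxLength <= 0) throw new RangeError("maxLength must be positive");
  const chunks: string[] = [];
  let current = "";

  // Push the current chunk if it holds anything, then start a new one.
  const flush = (): void => {
    if (current !== "") chunks.push(current);
    current = "";
  };

  // Append a piece (at most maxLength chars) joined by `sep`, flushing first if it would overflow.
  const append = (piece: string, sep: string): void => {
    if (current === "") {
      current = piece;
    } else if (current.length + sep.length + piece.length <= maxLength) {
      current += sep + piece;
    } else {
      flush();
      current = piece;
    }
  };

  for (const para of text.split("\n\n")) {
    if (para.length <= maxLength) {
      append(para, "\n\n"); // paragraph fits: append normally
      continue;
    }
    flush(); // paragraph alone too long: flush, then split it by lines
    for (const line of para.split("\n")) {
      if (line.length <= maxLength) {
        append(line, "\n");
        continue;
      }
      flush(); // line alone too long: hard-split by characters (last resort)
      for (let i = 0; i < line.length; i += maxLength) {
        append(line.slice(i, i + maxLength), "");
      }
    }
  }
  flush();
  return chunks;
}
```

With `maxLength` set to 10, a single 25-character paragraph comes back as chunks of 10, 10 and 5 characters instead of being cut down to its first 10.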
… coverage

Includes a regression test that fails on the pre-fix implementation (verified by temporarily reverting the file: the test caught it), plus general coverage for:
- single-chunk passthrough
- paragraph-boundary splits
- hard char-splitting when a single line exceeds maxLength
- code-fence balancing across chunk boundaries
- mixed-shape inputs (short + long + short paragraphs)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
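The regression test's job is to fail whenever characters go missing. A minimal sketch of that idea, assuming a hypothetical `oldSplit` that reconstructs only the `para.slice(0, maxLength)` truncation described in the first commit (it is not the real pre-fix code), and a hypothetical `losesContent` check that ignores newlines, since a splitter may legitimately drop separators at chunk boundaries:

```typescript
// Hypothetical reconstruction of the pre-fix truncation path, based only on
// the `para.slice(0, maxLength)` description; not the real old code.
function oldSplit(text: string, maxLength: number): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const para of text.split("\n\n")) {
    // A paragraph that is too long on its own is truncated: its tail is lost.
    const piece = para.length > maxLength ? para.slice(0, maxLength) : para;
    if (current === "") {
      current = piece;
    } else if (current.length + 2 + piece.length <= maxLength) {
      current += "\n\n" + piece;
    } else {
      chunks.push(current);
      current = piece;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

// Regression-style check: ignoring newlines, the chunks must reproduce the input exactly.
function losesContent(input: string, chunks: string[]): boolean {
  const strip = (s: string): string => s.replace(/\n/g, "");
  return strip(chunks.join("")) !== strip(input);
}

const sample = ["intro", "x".repeat(30), "outro"].join("\n\n");
console.log(losesContent(sample, oldSplit(sample, 20))); // true: 10 "x" characters vanish
```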
…ged closes

Addresses review feedback on Open-ACP#11. The previous balanceCodeFences treated fences as anonymous tokens: when the fence count came out odd, it set `pendingOpenFence = fences[fences.length - 1]`. For a chunk whose fences are [open, close, close] (e.g. a balanced fenced block followed by a stray closing ```), the 'last fence' is a closing ``` with no language tag. That bare ``` then got prepended to the next chunk, silently stripping the language tag from every subsequent chunk.

Walk fences in order and toggle the currently-open fence instead. At chunk end, whatever is open is what needs to be re-opened in the next chunk, and only if it's tagged. Naively trusting the last fence to be the opening one fails on [open, close, close].

Includes a regression test that exercises the [plain, balanced+orphan, tagged-block] arrangement and verifies the third chunk reopens with ```python rather than a bare ```.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
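The difference between the two rules shows up on exactly the [open, close, close] chunk described in this commit. A minimal sketch, assuming line-based fence detection (a fence is any line that starts with three backticks after trimming); `fenceLines`, `lastFenceCarry` and `toggleCarry` are hypothetical names, not the real `balanceCodeFences` internals:

```typescript
// Triple backtick, built at runtime so this snippet stays fence-safe when embedded.
const FENCE = "`".repeat(3);

// All fence lines in a chunk, trimmed (line-based detection is an assumption).
function fenceLines(chunk: string): string[] {
  return chunk.split("\n").map((l) => l.trim()).filter((l) => l.startsWith(FENCE));
}

// Pre-fix rule as described: with an odd fence count, trust the last fence.
function lastFenceCarry(chunk: string): string | null {
  const fences = fenceLines(chunk);
  return fences.length % 2 === 1 ? (fences[fences.length - 1] ?? null) : null;
}

// Fixed rule: walk fences in order, toggling which one is open,
// and re-open the survivor only if it carries a language tag.
function toggleCarry(chunk: string): string | null {
  let open: string | null = null;
  for (const fence of fenceLines(chunk)) {
    open = open === null ? fence : null;
  }
  return open !== null && open !== FENCE ? open : null;
}

// Balanced python block followed by a stray closer: fences are [open, close, close].
const chunk = [FENCE + "python", "print(1)", FENCE, FENCE].join("\n");
console.log(lastFenceCarry(chunk)); // a bare fence: the python tag is lost
console.log(toggleCarry(chunk)); // null: the only open fence is an untagged orphan
```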
48c254e to 49d672c
The prior balanceCodeFences fix dropped untagged carry-over entirely to avoid the "extra closer corrupts language tag" case. That suppressed a legitimate scenario: a long bare ```...``` block (e.g. an ASCII table with no language tag) split across chunks would leave the continuation chunks rendering as raw text.

Refine the rule: an untagged open at the end of a chunk carries forward only if the chunk has content after the trailing fence, i.e. we genuinely split mid-block, not "balanced block followed by a dangling orphan closer". Tagged opens still always carry.

Add a regression test covering the split-untagged-block path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
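Putting the refined rule together, a `balanceCodeFences`-style post-pass might look like the sketch below. It is a hedged reconstruction, not the PR's code: the helper `fenceToCarry`, line-based fence detection, and closing a block only when it is actually carried are assumptions. It also does not re-check `maxLength` after adding fences, so a chunk can grow by a few characters.

```typescript
// Triple backtick, built at runtime so this snippet stays fence-safe when embedded.
const FENCE = "`".repeat(3);

// Hypothetical helper: the fence line to re-open in the next chunk, or null.
function fenceToCarry(chunk: string): string | null {
  const lines = chunk.split("\n");
  let openTag: string | null = null; // tag of the open fence ("" if bare); null = none open
  let lastFenceIdx = -1;
  for (let i = 0; i < lines.length; i++) {
    const t = (lines[i] ?? "").trim();
    if (!t.startsWith(FENCE)) continue;
    lastFenceIdx = i;
    // Toggle: a fence opens a block if none is open, otherwise it closes the open one.
    openTag = openTag === null ? t.slice(FENCE.length).trim() : null;
  }
  if (openTag === null) return null; // every block is closed
  if (openTag !== "") return FENCE + openTag; // tagged open: always carry
  // Untagged open: carry only if content follows the trailing fence (a real mid-block
  // split), not when the chunk ends on a dangling orphan closer.
  const contentAfter = lines.slice(lastFenceIdx + 1).some((l) => l.trim() !== "");
  return contentAfter ? FENCE : null;
}

// Post-pass: close a carried block at the end of one chunk and re-open it
// (with the same tag, if any) at the start of the next.
function balanceCodeFences(chunks: string[]): string[] {
  const out: string[] = [];
  let carry: string | null = null;
  for (let i = 0; i < chunks.length; i++) {
    let chunk = (carry !== null ? carry + "\n" : "") + (chunks[i] ?? "");
    carry = i < chunks.length - 1 ? fenceToCarry(chunk) : null;
    if (carry !== null) chunk += "\n" + FENCE;
    out.push(chunk);
  }
  return out;
}
```

Fed three chunks of one long python block, this returns three self-contained python blocks; fed a balanced block plus an orphan closer, it leaves the next chunk untouched.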
Thanks for the review @0xmrpeter! Updated now with another bug fix I found while manually testing. TDD with AI is a lot of fun!
Why
`splitMessageImpl` was discarding the tail of any single paragraph longer than `maxLength` via `para.slice(0, maxLength)`; that tail vanished. Visible symptom: long fenced blocks (ASCII tables, etc.) ended mid-row with the rest of the response missing entirely. Caught by instrumenting the finalize path and noticing chunk sizes summing to less than the buffer length on every split (e.g. buffer=2706, chunks=[63,1904], 739 chars lost).

What changes
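Rewrite `splitMessageImpl` to never drop content:
- Paragraph fits in current chunk → append normally
- Paragraph alone too long → flush current, split paragraph by lines
- Line alone too long → hard-split by characters (last-resort fallback)

Add `balanceCodeFences` post-pass: when a chunk ends with an open ``` fence, close it there and re-open it with the same language tag at the start of the next chunk. Without this, a fenced block split across multiple Discord messages renders as one fenced message plus one raw-text message.

Test plan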
- `[MessageDraft.finalize] splitting` debug logs: chunk sizes now sum to (slightly more than) the buffer length, with the overflow accounted for by fence re-balancing
- /newsession prompt that elicits a long fenced response (~3000+ chars). Confirm all content arrives across multiple Discord messages with intact fences.
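The debug-log check in the test plan is plain arithmetic: compare the summed chunk sizes with the buffer length. A tiny sketch with a hypothetical `chunkDelta` helper, using the pre-fix numbers quoted in the description (2706 - (63 + 1904) = 739 characters missing):

```typescript
// Hypothetical helper mirroring the debug-log check: total chunk size minus buffer size.
function chunkDelta(bufferLength: number, chunkSizes: number[]): number {
  const total = chunkSizes.reduce((sum, n) => sum + n, 0);
  return total - bufferLength;
}

// Pre-fix log line: buffer=2706, chunks=[63,1904].
console.log(chunkDelta(2706, [63, 1904])); // -739: content was dropped
```

A large negative delta means content was dropped; per this PR, the fixed splitter should instead show a small positive delta coming from fence re-balancing.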