Skip to content

fix: prevent silent content loss when splitting long messages#11

Merged
0xmrpeter merged 4 commits into
Open-ACP:mainfrom
kevin-david:fix-message-split-content-loss
May 18, 2026
Merged

fix: prevent silent content loss when splitting long messages#11
0xmrpeter merged 4 commits into
Open-ACP:mainfrom
kevin-david:fix-message-split-content-loss

Conversation

@kevin-david

@kevin-david kevin-david commented May 11, 2026

Copy link
Copy Markdown
Contributor

Why

splitMessageImpl was discarding the tail of any single paragraph longer than maxLength via para.slice(0, maxLength) — that tail vanished. Visible symptom: long fenced blocks (ASCII tables, etc.) ended mid-row with the rest of the response missing entirely. Caught by instrumenting the finalize path and noticing chunk sizes summing to less than the buffer length on every split (e.g. buffer=2706, chunks=[63,1904], 739 chars lost).

What changes

Rewrite splitMessageImpl to never drop content:

  • Paragraph fits in current chunk → append normally
  • Paragraph alone too long → flush current, split the paragraph by lines
  • Single line too long → hard-split by characters (last-resort fallback)

Add balanceCodeFences post-pass: when a chunk ends with an open ``` fence, close it there and re-open with the same language tag at the start of the next chunk. Without this, a fenced block split across multiple Discord messages renders as one fenced + one raw-text message.

Test plan

  • Verified by tailing [MessageDraft.finalize] splitting debug logs — chunk sizes now sum to (slightly more than) the buffer length, with overflow accounted for by fence re-balancing
  • Reviewer: send a /new session prompt that elicits a long fenced response (~3000+ chars). Confirm all content arrives across multiple Discord messages with intact fences.
  • Verify works in Claude (mostly testing with Gemini)

Comment thread src/formatting.ts
@kevin-david kevin-david marked this pull request as ready for review May 11, 2026 04:46
@0xmrpeter

Copy link
Copy Markdown
Contributor

Solid and necessary fix — the content loss bug is real and well-diagnosed. The regression test is especially well done; documenting the original failure shape with the comment explaining the old truncation path is exactly the right approach.

One edge case worth addressing before merge:

balanceCodeFences can misidentify a closing fence as an opening one — when pushLines isolates a standalone ``` into its own chunk (e.g., after flushing the content lines before it), the fence count comes out odd and pendingOpenFence gets set. The continuation chunk then gets an untagged ``` block prepended instead of the correct language tag (e.g., ```python). A test that splits inside a named-language fence where the closing ``` ends up isolated would cover this — verify that the correct language tag propagates all the way through, not just an empty one.

Also a pre-existing note: splitMessage is marked @deprecated but is the live code path in streaming.ts:163 — worth tracking separately to avoid confusion.

Great work overall!

kevin-david and others added 3 commits May 11, 2026 08:26
splitMessageImpl was discarding the tail of any single paragraph longer
than maxLength via `para.slice(0, maxLength)` — that tail just vanished.
Visible symptom: long fenced blocks (ASCII tables, etc.) ended mid-row
with the rest of the response missing from Discord entirely.

Rewrite splitMessageImpl to never drop content:

- Paragraph fits in current chunk → append normally
- Paragraph alone too long → flush current, split paragraph by lines
- Line alone too long → hard-split by characters (last-resort fallback)

Also add balanceCodeFences post-pass: when a chunk ends with an open
``` fence, close it there and re-open with the same language tag at
the start of the next chunk. Without this, a fenced block split across
multiple Discord messages renders as one fenced + one raw-text message.

Discovered via instrumented debug logs that showed chunk sizes summing
to less than the buffer length on every split (e.g. buffer=2706 but
chunks=[63,1904], 739 chars lost).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… coverage

Includes a regression test that fails on the pre-fix implementation
(verified by temporarily reverting the file — the test caught it),
plus general coverage for:
  - single-chunk passthrough
  - paragraph-boundary splits
  - hard char-splitting when a single line exceeds maxLength
  - code-fence balancing across chunk boundaries
  - mixed-shape inputs (short + long + short paragraphs)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ged closes

Addresses review feedback on Open-ACP#11.

The previous balanceCodeFences treated fences as anonymous tokens — when
fence count came out odd, it set `pendingOpenFence = fences[fences.length - 1]`.
For a chunk whose fences are [open, close, close] (e.g. a balanced fenced
block followed by a stray closing ```), the 'last fence' is a closing
``` with no language tag. That bare ``` then got prepended to the next
chunk, silently stripping the language tag from every subsequent chunk.

Walk fences in order and toggle the currently-open fence instead. At
chunk end, whatever's open is what needs to be re-opened in the next
chunk — and only if it's tagged. Naively trusting the last fence to be
the opening one fails on [open, close, close].

Includes a regression test that exercises the [plain, balanced+orphan,
tagged-block] arrangement and verifies the third chunk reopens with
```python rather than a bare ```.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kevin-david kevin-david force-pushed the fix-message-split-content-loss branch from 48c254e to 49d672c Compare May 11, 2026 12:55
kevin-david added a commit to kevin-david/discord-adapter that referenced this pull request May 11, 2026
…ged closes

Addresses review feedback on Open-ACP#11.

The previous balanceCodeFences treated fences as anonymous tokens — when
fence count came out odd, it set `pendingOpenFence = fences[fences.length - 1]`.
For a chunk whose fences are [open, close, close] (e.g. a balanced fenced
block followed by a stray closing ```), the 'last fence' is a closing
``` with no language tag. That bare ``` then got prepended to the next
chunk, silently stripping the language tag from every subsequent chunk.

Walk fences in order and toggle the currently-open fence instead. At
chunk end, whatever's open is what needs to be re-opened in the next
chunk — and only if it's tagged. Naively trusting the last fence to be
the opening one fails on [open, close, close].

Includes a regression test that exercises the [plain, balanced+orphan,
tagged-block] arrangement and verifies the third chunk reopens with
```python rather than a bare ```.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kevin-david kevin-david marked this pull request as draft May 11, 2026 13:39
The prior balanceCodeFences fix dropped untagged carry-over entirely to
avoid the "extra closer corrupts language tag" case. That suppressed a
legitimate scenario: a long bare ```...``` block (e.g. an ASCII table
with no language tag) split across chunks would leave the continuation
chunks rendering as raw text.

Refine the rule: an untagged open at the end of a chunk carries forward
only if the chunk has content after the trailing fence — i.e., we
genuinely split mid-block, not "balanced block followed by a dangling
orphan closer". Tagged opens still always carry. Add a regression test
covering the split-untagged-block path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kevin-david added a commit to kevin-david/discord-adapter that referenced this pull request May 11, 2026
…ged closes

Addresses review feedback on Open-ACP#11.

The previous balanceCodeFences treated fences as anonymous tokens — when
fence count came out odd, it set `pendingOpenFence = fences[fences.length - 1]`.
For a chunk whose fences are [open, close, close] (e.g. a balanced fenced
block followed by a stray closing ```), the 'last fence' is a closing
``` with no language tag. That bare ``` then got prepended to the next
chunk, silently stripping the language tag from every subsequent chunk.

Walk fences in order and toggle the currently-open fence instead. At
chunk end, whatever's open is what needs to be re-opened in the next
chunk — and only if it's tagged. Naively trusting the last fence to be
the opening one fails on [open, close, close].

Includes a regression test that exercises the [plain, balanced+orphan,
tagged-block] arrangement and verifies the third chunk reopens with
```python rather than a bare ```.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kevin-david kevin-david marked this pull request as ready for review May 11, 2026 17:12
@kevin-david

kevin-david commented May 11, 2026

Copy link
Copy Markdown
Contributor Author

Solid and necessary fix — the content loss bug is real and well-diagnosed. The regression test is especially well done; documenting the original failure shape with the comment explaining the old truncation path is exactly the right approach.

One edge case worth addressing before merge:

balanceCodeFences can misidentify a closing fence as an opening one — when pushLines isolates a standalone ``` into its own chunk (e.g., after flushing the content lines before it), the fence count comes out odd and pendingOpenFence gets set. The continuation chunk then gets an untagged ``` block prepended instead of the correct language tag (e.g., ```python). A test that splits inside a named-language fence where the closing ``` ends up isolated would cover this — verify that the correct language tag propagates all the way through, not just an empty one.

Also a pre-existing note: splitMessage is marked @deprecated but is the live code path in streaming.ts:163 — worth tracking separately to avoid confusion.

Great work overall!

Thanks for the review @0xmrpeter - updated now with another bug fix I found while manually testing. TDD with AI is a lot of fun!

@0xmrpeter 0xmrpeter merged commit f6eb392 into Open-ACP:main May 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants