fix(context): prevent OOM crash on large llms-full.txt files #99
wbisschoff13 wants to merge 1 commit into
…s-full.txt files Large markdown files (>1MB) like Cloudflare's llms-full.txt previously caused Node.js heap OOM because remark-parse built a full AST of the entire document. Now they are pre-split by ## headings so each chunk is independently parseable with minimal memory. Fixes: neuledge#99
🦋 Changeset detected
Latest commit: 84864dd
The changes in this PR will be included in the next version bump. This PR includes changesets to release 2 packages.
moshest
left a comment
Nice fix. This solves a real crash and the tests are easy to follow.
A few things before merge:
- CI is red, but it's only a formatting nitpick in the test file. Run `pnpm fix` and it should go green.
- Left one inline note about splitting on `##` inside code blocks.
- Heads up: a single huge section, or a file with no `##` headings at all, would still crash. That's fine for now, but maybe add a quick comment in the code so it's clear this isn't a full fix.
Changeset is already included, so that's covered.
Generated by Claude Code
let current: string[] = [];

for (const line of file.content.split("\n")) {
  if (line.startsWith("## ")) {
One thing to watch here. If a line inside a code block starts with `## `, it gets treated as a heading and the file is split in the wrong place. That could break code samples in exactly the big files this targets. Might be worth skipping lines inside fenced ``` blocks.
Generated by Claude Code
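For illustration, the fence-aware split suggested in the inline note could look roughly like the sketch below. This is not the PR's code: the function name `splitByHeadingsOutsideFences` and its string-in, string-array-out signature are assumptions made for the example, and it only tracks top-level backtick or tilde fences (no indented code blocks or fences nested in lists or blockquotes).

```typescript
// Hypothetical sketch: name and signature are assumptions, not the PR's implementation.
// Splits markdown on lines starting with "## ", ignoring such lines inside fenced code blocks.
function splitByHeadingsOutsideFences(content: string): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  // The fence marker (e.g. "```" or "~~~~") that opened the current code block, or null.
  let fence: string | null = null;

  for (const line of content.split("\n")) {
    // A fence line starts with at least three backticks or tildes (up to 3 spaces of indent).
    const match = /^ {0,3}(`{3,}|~{3,})/.exec(line);
    if (match) {
      const marker = match[1];
      if (fence === null) {
        fence = marker; // opening fence
      } else if (
        marker[0] === fence[0] &&
        marker.length >= fence.length &&
        line.trim() === marker
      ) {
        fence = null; // closing fence: same character, at least as long, nothing after it
      }
    }

    // Start a new chunk only at a heading that is outside any fenced block.
    if (fence === null && line.startsWith("## ") && current.length > 0) {
      chunks.push(current.join("\n"));
      current = [];
    }
    current.push(line);
  }

  if (current.length > 0) {
    chunks.push(current.join("\n"));
  }
  return chunks;
}

// Usage: a "## " line inside a fenced block stays with its surrounding section.
const sample = "intro\n## A\ntext\n```sh\n## not a heading\n```\n## B\nmore";
const parts = splitByHeadingsOutsideFences(sample);
console.log(parts.length);
```

Joining the returned chunks with "\n" reproduces the input, so the split itself loses no content. As the review already notes, a single oversized section or a file with no `##` headings would still end up as one large chunk under this approach.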
Large markdown files (>1MB) like Cloudflare's llms-full.txt previously caused Node.js heap OOM because remark-parse built a full AST of the entire document.
Now they are pre-split by `##` headings before AST parsing, so each chunk stays small and is independently parseable.

Changes
- `packages/context/src/package-builder.ts`: Added `splitMarkdownByHeadings()` function and pre-processing in `buildPackage()` that splits large `.md`/`.mdx`/`.txt` files by `##` headings
- `packages/context/src/package-builder.test.ts`: Added 7 tests covering splitting behavior

Verification