Close #22 scaling walls: streamed emit + borrowed partitions + opt-in lazy engine #31
Merged
…itions, opt-in lazy engine

Wall C (streamed emit): write each sheet module to disk as generated and drop the string instead of collect-all-then-write; heavy sheets (>=200k formulas) emit one-at-a-time, light ones in parallel. Was materializing all ~800 MB of generated JS in memory before writing any module -> ~18 GB peak on the real models, sheets/ empty for the whole run.

Wall B (borrowed partitions): SheetPartition<'a> holds Vec<&CellData> instead of cloning ~6M cells while the workbook still holds the originals (peak-memory doubling). Consumers are read-only, so unchanged beyond the borrow.

Wall A (opt-in --lazy-engine): emit a chunked engine whose sheet modules load on demand via async load()/runScoped() with output-cone scoping (load only the requested sheets/cells' transitive dependency closure, whole clusters included). Sync run() preserved and guarded against pre-load calls. Default engine.js is unchanged (eager + synchronous) so the Mippy contract, ete eval, smoke, and the engine suite are untouched; eager and lazy share the run() body so they can't drift.

New `npm run test:lazy-engine` (19) + CI step.

Validated: cargo test 17/17, smoke 78/78, test:engine 21/21, test:runnable 20/20, test:depgraph 11/11, test:slimming 13/13, test:golden 20/20, full npm test, and an `ete init --lazy-engine` e2e build.

Residual (deferred): generate_sheet_module builds a Vec<String> then joins (~2x a monster module transiently); row-chunk the 3 monster sheets so even one is small to emit + import.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
What
Closes the three chunked-build scaling walls under #22 so the real PE models both build and run at scale. With the partition-hang fix landed, a clean build got past partitioning but then the module-emit step drove the parser past 18 GB (it materialized all ~800 MB of generated module strings before writing any), and even a complete engine was slow to run (eager imports).
Wall C: streamed emit (the active OOM) · `chunked_emitter.rs`
The emit did `partitions.par_iter().map(generate_sheet_module).collect()` then wrote in a second pass, holding all generated JS in memory at once, with `sheets/` empty until every module finished. Now each module is written to disk the instant it's generated and the string dropped; heavy sheets (≥200k formulas) emit one-at-a-time (peak ≈ one monster module) while the many light sheets stay parallel. Files land incrementally; a write failure is still fatal.

Wall B: borrowed partitions · `sheet_partition.rs`
`partition_sheets` cloned every cell into the partition while `workbook.sheets` still held the originals: a second full copy of ~6M `CellData`, which doubled peak memory. `SheetPartition<'a>` now holds `Vec<&'a CellData>` (the workbook outlives every partition). The four consumers are read-only, so unchanged beyond the borrow.

Wall A: opt-in `ete init --lazy-engine` · `chunked_emitter.rs`, `main.rs`, `cli/`
The default `engine.js` statically imports every sheet module, so `import('engine.js')` pulls ~800 MB into the heap before `run()` can be called. `--lazy-engine` emits an engine whose modules load on demand via `async load({ sheets | cells })` (output-cone scoping: loads only the requested sheets/cells' transitive dependency closure, whole clusters included), a synchronous `run()` guarded against pre-load calls, and `runScoped()`. The default engine is unchanged (eager + sync): the Mippy contract, `ete eval`, smoke, and the engine suite are untouched. Eager and lazy share the `run()` body via `emit_run_function`, so they can't drift.

Tests
New `npm run test:lazy-engine` (19) + CI step. All green: `cargo test` 17/17 · `smoke` 78/78 · `test:engine` 21/21 · `test:runnable` 20/20 · `test:depgraph` 11/11 · `test:slimming` 13/13 · `test:golden` 20/20 · full `npm test` · `ete init --lazy-engine` end-to-end.

Caveats / follow-ups
`generate_sheet_module` builds a `Vec<String>` then `.join()`s (~2× a monster transiently), and one ~200 MB monster module is still heavy to import → row-chunk the 3 monster sheets into smaller lazy modules. Plus the rest of the umbrella of #22 ("Guided `ete create` skill + `--output-profile`: scope generated artifacts to the consumer's actual need"): the `--output-profile` contract and guided `ete create`.

🤖 Generated with Claude Code
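For reviewers who want the shape of the Wall C change without opening `chunked_emitter.rs`, here is a minimal sketch of the streamed emit. It is not the PR's code: the `Sheet` struct, `emit_one`, `emit_streamed`, and the placeholder body of `generate_sheet_module` are hypothetical stand-ins, and std scoped threads are swapped in for rayon so the sketch has no dependencies. Only the ≥200k-formula threshold and the write-then-drop ordering come from the description above.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::thread;

/// Hypothetical stand-in for a partitioned sheet (not the PR's real type).
struct Sheet {
    name: String,
    formula_count: usize,
}

/// Threshold from the PR description: heavy means >= 200k formulas.
const HEAVY_FORMULA_THRESHOLD: usize = 200_000;

/// Placeholder generator; the real one emits the sheet's JS module.
fn generate_sheet_module(sheet: &Sheet) -> String {
    format!("// module for {} ({} formulas)\n", sheet.name, sheet.formula_count)
}

/// Generate one module, write it immediately, and drop the string on return.
fn emit_one(out_dir: &Path, sheet: &Sheet) -> io::Result<PathBuf> {
    let path = out_dir.join(format!("{}.js", sheet.name));
    let source = generate_sheet_module(sheet);
    fs::write(&path, source.as_bytes())?;
    Ok(path) // `source` goes out of scope here, so nothing accumulates
}

/// Heavy sheets are emitted sequentially, light sheets in parallel.
/// Any write failure is returned as an error (still fatal to the caller).
fn emit_streamed(out_dir: &Path, sheets: &[Sheet]) -> io::Result<Vec<PathBuf>> {
    fs::create_dir_all(out_dir)?;
    let (heavy, light): (Vec<&Sheet>, Vec<&Sheet>) = sheets
        .iter()
        .partition(|s| s.formula_count >= HEAVY_FORMULA_THRESHOLD);

    let mut written = Vec::new();

    // One at a time: at most one heavy module string is alive at once.
    for sheet in heavy {
        written.push(emit_one(out_dir, sheet)?);
    }

    // One scoped thread per light sheet, for brevity; each thread holds
    // only its own module string and drops it after writing.
    let results: Vec<io::Result<PathBuf>> = thread::scope(|scope| {
        let handles: Vec<_> = light
            .iter()
            .map(|sheet| scope.spawn(move || emit_one(out_dir, sheet)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("emit thread panicked"))
            .collect()
    });
    for result in results {
        written.push(result?);
    }
    Ok(written)
}
```

The contrast with the old path is that no `Vec<String>` of all modules ever exists: each string lives only for the duration of one `emit_one` call, so peak memory is bounded by the largest module plus whatever light modules are in flight.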
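The Wall B borrow can be illustrated the same way. The sketch below assumes a much simpler `CellData` and a flat `Workbook` that owns a `Vec<CellData>`; both are hypothetical, as is `formula_count`, which stands in for one of the read-only consumers. What it shares with the description is the type shape `SheetPartition<'a>` holding `Vec<&'a CellData>`.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified cell; the real `CellData` has more fields.
struct CellData {
    sheet: String,
    formula: String,
}

/// Hypothetical workbook that owns every cell exactly once.
struct Workbook {
    cells: Vec<CellData>,
}

/// A partition borrows its cells from the workbook instead of cloning them.
/// The lifetime `'a` guarantees the workbook outlives the partition.
struct SheetPartition<'a> {
    sheet: &'a str,
    cells: Vec<&'a CellData>,
}

/// Group cells by sheet name, storing references only (no `CellData` clone).
fn partition_sheets(workbook: &Workbook) -> Vec<SheetPartition<'_>> {
    let mut by_sheet: BTreeMap<&str, Vec<&CellData>> = BTreeMap::new();
    for cell in &workbook.cells {
        by_sheet.entry(cell.sheet.as_str()).or_default().push(cell);
    }
    by_sheet
        .into_iter()
        .map(|(sheet, cells)| SheetPartition { sheet, cells })
        .collect()
}

/// Example read-only consumer: works through the borrow unchanged.
fn formula_count(partition: &SheetPartition<'_>) -> usize {
    partition
        .cells
        .iter()
        .filter(|cell| !cell.formula.is_empty())
        .count()
}
```

Each partition entry costs one pointer rather than a full `CellData` copy, and the borrow checker rejects any attempt to drop or mutate the workbook while partitions are still in use.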
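The output-cone scoping under Wall A amounts to computing a transitive dependency closure over sheet modules. The sketch below shows one way to compute such a load set; it is written in Rust for consistency with the emitter, whereas the generated engine does this in JS. `SheetGraph`, its fields, and `load_set` are hypothetical names, and treating a "cluster" as a group of sheets that must always load together is an assumption about the PR's terminology.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical dependency data; not the emitter's real structures.
struct SheetGraph {
    /// sheet -> sheets whose cells it reads directly
    deps: BTreeMap<String, Vec<String>>,
    /// groups of sheets assumed to load as a unit ("clusters")
    clusters: Vec<Vec<String>>,
}

impl SheetGraph {
    /// Return every sheet that must be loaded to evaluate `requested`:
    /// the requested sheets, everything they transitively depend on,
    /// and all members of any cluster touched along the way.
    fn load_set(&self, requested: &[&str]) -> BTreeSet<String> {
        let mut needed = BTreeSet::new();
        let mut stack: Vec<String> = requested.iter().map(|s| s.to_string()).collect();

        while let Some(sheet) = stack.pop() {
            if !needed.insert(sheet.clone()) {
                continue; // already visited
            }
            // Follow direct dependencies.
            if let Some(direct) = self.deps.get(&sheet) {
                stack.extend(direct.iter().cloned());
            }
            // Pull in whole clusters that contain this sheet.
            for cluster in self.clusters.iter().filter(|c| c.contains(&sheet)) {
                stack.extend(cluster.iter().cloned());
            }
        }
        needed
    }
}
```

Sheets outside the returned set are never imported, which is where the saving over the eager engine comes from: a request for a leaf input sheet loads one module, while a request for a summary sheet loads only its cone.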