
Close #22 scaling walls: streamed emit + borrowed partitions + opt-in lazy engine #31

Merged · ebootheee merged 1 commit into main from claude/distracted-taussig-aad70e · May 29, 2026

@ebootheee (Owner)

What

Closes the three chunked-build scaling walls under #22 so the real PE models both build and run at scale. With the partition-hang fix landed, a clean build got past partitioning, but then the module-emit step pushed the parser past 18 GB (it materialized all ~800 MB of generated module strings before writing any of them), and even a complete engine was slow to run because it imported every module eagerly.

Wall C — streamed emit (the active OOM) · chunked_emitter.rs

The emit ran partitions.par_iter().map(generate_sheet_module).collect() and then wrote the files in a second pass, so it held all generated JS in memory at once and left sheets/ empty until every module had finished. Now each module is written to disk as soon as it is generated, and its string is dropped. Heavy sheets (≥200k formulas) emit one at a time (peak ≈ one monster module), while the many light sheets still emit in parallel. Files land incrementally; a write failure is still fatal.
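A minimal sketch of the streamed emit, using only the standard library. `Partition`, `emit_one`, `emit_streamed`, `HEAVY_FORMULAS`, and the `workers` parameter are hypothetical names for illustration; the PR's emitter uses rayon's par_iter for the light sheets, which scoped std threads stand in for here. Each module's String lives only inside `emit_one`, so it is dropped as soon as its file is written.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::thread;

/// Hypothetical stand-in for a sheet partition: a name plus its formula count.
pub struct Partition {
    pub sheet: String,
    pub formula_count: usize,
}

/// Threshold from the PR description: sheets with >= 200k formulas count as heavy.
pub const HEAVY_FORMULAS: usize = 200_000;

/// Generate one module and write it immediately; the String is dropped when this returns.
fn emit_one<F>(p: &Partition, out_dir: &Path, generate: &F) -> io::Result<()>
where
    F: Fn(&Partition) -> String + Sync,
{
    let js = generate(p);
    fs::write(out_dir.join(format!("{}.js", p.sheet)), js)
}

/// Streamed emit (sketch): heavy sheets one at a time, light sheets across worker threads.
/// The first io::Error is returned, so a write failure stays fatal.
pub fn emit_streamed<F>(
    parts: &[Partition],
    out_dir: &Path,
    generate: &F,
    workers: usize,
) -> io::Result<()>
where
    F: Fn(&Partition) -> String + Sync,
{
    fs::create_dir_all(out_dir)?;
    let (heavy, light): (Vec<&Partition>, Vec<&Partition>) =
        parts.iter().partition(|p| p.formula_count >= HEAVY_FORMULAS);

    // Heavy sheets: sequential, so peak memory is roughly one monster module.
    for p in &heavy {
        emit_one(p, out_dir, generate)?;
    }

    // Light sheets: split into roughly equal batches, one scoped thread per batch.
    let workers = workers.max(1);
    let batch_len = ((light.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = light
            .chunks(batch_len)
            .map(|batch| {
                s.spawn(move || batch.iter().try_for_each(|p| emit_one(p, out_dir, generate)))
            })
            .collect();
        handles
            .into_iter()
            .try_for_each(|h| h.join().expect("emit worker panicked"))
    })
}
```

Splitting heavy from light sheets is what bounds the peak: only one monster module's text is alive at a time, while the small modules still benefit from parallelism.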

Wall B — borrowed partitions · sheet_partition.rs

partition_sheets cloned every cell into the partition while workbook.sheets still held the originals: a second full copy of ~6M CellData, doubling peak memory. SheetPartition<'a> now holds Vec<&'a CellData>, which is sound because the workbook outlives every partition. The four consumers are read-only, so they needed no changes beyond taking the borrow.
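A minimal sketch of the borrowed-partition shape. The fields of `CellData`, `Sheet`, and `Workbook`, the formula-only filter inside `partition_sheets`, and the `count_formulas` consumer are all hypothetical; only the `SheetPartition<'a>` holding `Vec<&'a CellData>` idea comes from the PR. `CellData` deliberately does not derive `Clone`, so the sketch cannot copy cells by accident.

```rust
/// Hypothetical cell record; the real CellData carries more fields.
#[derive(Debug)]
pub struct CellData {
    pub addr: String,
    pub formula: Option<String>,
}

pub struct Sheet {
    pub name: String,
    pub cells: Vec<CellData>,
}

pub struct Workbook {
    pub sheets: Vec<Sheet>,
}

/// Borrowed partition: references into the workbook instead of cloned cells.
pub struct SheetPartition<'a> {
    pub sheet: &'a str,
    pub cells: Vec<&'a CellData>,
}

/// Hypothetical partitioner body: one partition per sheet, keeping formula cells.
/// The `'a` lifetime makes the compiler enforce that the workbook outlives every partition.
pub fn partition_sheets<'a>(wb: &'a Workbook) -> Vec<SheetPartition<'a>> {
    wb.sheets
        .iter()
        .map(|s| SheetPartition {
            sheet: s.name.as_str(),
            cells: s.cells.iter().filter(|c| c.formula.is_some()).collect(),
        })
        .collect()
}

/// A read-only consumer only needs to accept the borrowed partition.
pub fn count_formulas(p: &SheetPartition<'_>) -> usize {
    p.cells.len()
}
```

Each partition now costs one pointer per cell rather than a full CellData copy, which is where the peak-memory doubling goes away.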

Wall A — opt-in ete init --lazy-engine · chunked_emitter.rs, main.rs, cli/

The default engine.js statically imports every sheet module, so import('engine.js') pulls ~800 MB into the heap before run() can even be called. With --lazy-engine, ete init instead emits an engine whose sheet modules load on demand via an async load({ sheets | cells }). Loading uses output-cone scoping: it loads only the transitive dependency closure of the requested sheets or cells, with whole clusters included. The lazy engine also exposes a synchronous run(), guarded against calls made before loading, and runScoped(). The default engine is unchanged (eager + sync), so the Mippy contract, ete eval, smoke, and the engine suite are untouched. Eager and lazy share the run() body via emit_run_function, so they can't drift.
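A minimal sketch of output-cone scoping and the pre-load guard, written in Rust rather than the generated JS. `DepGraph`, `output_cone`, and `LazyEngine` are hypothetical, and the sketch is synchronous: the real load() is async and imports modules, whereas this version only records which sheet modules would be in scope. It walks dependencies breadth-first and, whenever it reaches a sheet that belongs to a cluster, enqueues the whole cluster.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// Hypothetical dependency metadata: sheets each sheet reads from, plus cluster membership.
pub struct DepGraph {
    pub deps: HashMap<String, Vec<String>>,
    pub cluster_of: HashMap<String, usize>,
    pub clusters: Vec<Vec<String>>,
}

impl DepGraph {
    /// Output cone: requested sheets plus everything they transitively depend on,
    /// widened so any cluster that is touched is included whole.
    pub fn output_cone(&self, requested: &[&str]) -> BTreeSet<String> {
        let mut seen = BTreeSet::new();
        let mut queue: VecDeque<String> = requested.iter().map(|s| s.to_string()).collect();
        while let Some(sheet) = queue.pop_front() {
            if !seen.insert(sheet.clone()) {
                continue; // already visited
            }
            // Pull in every member of this sheet's cluster.
            if let Some(&c) = self.cluster_of.get(&sheet) {
                queue.extend(self.clusters[c].iter().cloned());
            }
            // Follow direct dependencies.
            if let Some(ds) = self.deps.get(&sheet) {
                queue.extend(ds.iter().cloned());
            }
        }
        seen
    }
}

/// Hypothetical lazy-engine state: run() is synchronous but refuses to run before load().
pub struct LazyEngine<'g> {
    graph: &'g DepGraph,
    loaded: BTreeSet<String>,
}

impl<'g> LazyEngine<'g> {
    pub fn new(graph: &'g DepGraph) -> Self {
        LazyEngine { graph, loaded: BTreeSet::new() }
    }

    /// Stand-in for load({ sheets }): marks the output cone as loaded.
    pub fn load(&mut self, sheets: &[&str]) {
        let cone = self.graph.output_cone(sheets);
        self.loaded.extend(cone);
    }

    /// Guarded run: an error before any load(); otherwise reports the sheets in scope.
    pub fn run(&self) -> Result<Vec<&str>, String> {
        if self.loaded.is_empty() {
            return Err("run() called before load()".to_string());
        }
        // A real engine would evaluate formulas here.
        Ok(self.loaded.iter().map(String::as_str).collect())
    }
}
```

Including whole clusters trades a slightly larger load for never splitting a mutually dependent group across load calls.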

Tests

New npm run test:lazy-engine suite (19 tests) plus a CI step. All green: cargo test 17/17 · smoke 78/78 · test:engine 21/21 · test:runnable 20/20 · test:depgraph 11/11 · test:slimming 13/13 · test:golden 20/20 · full npm test · ete init --lazy-engine end-to-end.

Caveats / follow-ups

  • Not yet measured on the real (gitignored) models; a clean A1/A2 regen will confirm that the emit completes within memory (running this before merge).
  • Residual (deferred): generate_sheet_module builds a Vec<String> and then .join()s it, so a monster module transiently costs ~2× its size, and one ~200 MB monster module is still heavy to import. Planned follow-up: row-chunk the 3 monster sheets into smaller lazy modules. Also deferred: the rest of the #22 umbrella ("Guided ete create skill + --output-profile: scope generated artifacts to the consumer's actual need"), namely the --output-profile contract and guided ete create.
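A minimal sketch of both deferred fixes, under stated assumptions: `Formula`, `write_module`, and `row_chunks` are hypothetical, the generated JS shape is invented for illustration, and formulas are assumed sorted by row. Writing each line straight to an `io::Write` avoids holding the pieces and the joined string at the same time, and grouping by row range gives one smaller module per chunk.

```rust
use std::io::{self, Write};

/// Hypothetical formula record: its row, cell id, and already-translated JS expression.
pub struct Formula {
    pub row: u32,
    pub cell: String,
    pub js: String,
}

/// Stream one module body to `out` instead of building a Vec<String> and join()ing it.
/// Cell ids are quoted with Rust's Debug formatting, adequate for simple ASCII ids.
pub fn write_module<W: Write>(out: &mut W, formulas: &[Formula]) -> io::Result<()> {
    writeln!(out, "export function compute(ctx) {{")?;
    for f in formulas {
        writeln!(out, "  ctx[{:?}] = {};", f.cell, f.js)?;
    }
    writeln!(out, "}}")
}

/// Row-chunking: split formulas (assumed sorted by row) into groups that each span
/// fewer than `rows_per_chunk` rows from their first row; one lazy module per group.
pub fn row_chunks(formulas: &[Formula], rows_per_chunk: u32) -> Vec<&[Formula]> {
    assert!(rows_per_chunk > 0, "rows_per_chunk must be positive");
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < formulas.len() {
        let limit = formulas[start].row.saturating_add(rows_per_chunk);
        let mut end = start;
        while end < formulas.len() && formulas[end].row < limit {
            end += 1;
        }
        chunks.push(&formulas[start..end]);
        start = end;
    }
    chunks
}
```

Passing a buffered file writer to `write_module` for each chunk keeps at most one chunk's output buffered at a time.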

🤖 Generated with Claude Code

…itions, opt-in lazy engine

Wall C (streamed emit): write each sheet module to disk as generated and drop the
string instead of collect-all-then-write; heavy sheets (>=200k formulas) emit
one-at-a-time, light ones in parallel. Previously it materialized all ~800 MB of
generated JS in memory before writing any module (~18 GB peak on the real models),
leaving sheets/ empty for the whole run.

Wall B (borrowed partitions): SheetPartition<'a> holds Vec<&CellData> instead of
cloning ~6M cells while the workbook still holds the originals (peak-memory
doubling). Consumers are read-only, so unchanged beyond the borrow.

Wall A (opt-in --lazy-engine): emit a chunked engine whose sheet modules load on
demand via async load()/runScoped() with output-cone scoping (load only the
requested sheets/cells' transitive dependency closure, whole clusters included).
Sync run() preserved and guarded against pre-load calls. Default engine.js is
unchanged (eager + synchronous) so the Mippy contract, ete eval, smoke, and the
engine suite are untouched; eager and lazy share the run() body so they can't drift.

New `npm run test:lazy-engine` (19) + CI step. Validated: cargo test 17/17, smoke
78/78, test:engine 21/21, test:runnable 20/20, test:depgraph 11/11, test:slimming
13/13, test:golden 20/20, full npm test, and an `ete init --lazy-engine` e2e build.

Residual (deferred): generate_sheet_module builds a Vec<String> then joins (~2x a
monster module transiently); row-chunk the 3 monster sheets so even one is small to
emit + import.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>