Close #22 scaling walls: streamed emit + borrowed partitions + opt-in lazy engine by ebootheee · Pull Request #31 · ebootheee/excel-to-engine

ebootheee · 2026-05-29T17:46:52Z

What

Closes the three chunked-build scaling walls under #22 so the real PE models both build and run at scale. With the partition-hang fix landed, a clean build got past partitioning but then the module-emit step drove the parser past 18 GB (it materialized all ~800 MB of generated module strings before writing any), and even a complete engine was slow to run (eager imports).

Wall C — streamed emit (the active OOM) · `chunked_emitter.rs`

The emit ran `partitions.par_iter().map(generate_sheet_module).collect()` and then wrote the results in a second pass, so all generated JS was held in memory at once and `sheets/` stayed empty until every module had finished. Now each module is written to disk the instant it is generated and its string is dropped. Heavy sheets (≥200k formulas) emit one at a time (peak ≈ one monster module) while the many light sheets stay parallel. Files land incrementally; a write failure is still fatal.
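A minimal sketch of the write-as-you-generate shape described above, using only the standard library. The `Sheet` struct, `generate_module`, the `.mjs` file extension, and the one-thread-per-light-sheet strategy are assumptions for illustration; the actual code in `chunked_emitter.rs` uses `par_iter` and its own types.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::thread;

/// Hypothetical stand-in for a partition; only the fields this sketch needs.
struct Sheet {
    name: String,
    formula_count: usize,
}

/// Threshold quoted in the PR text: sheets at or above this emit one at a time.
const HEAVY_FORMULAS: usize = 200_000;

/// Placeholder for the real generator: returns the module source for one sheet.
fn generate_module(sheet: &Sheet) -> String {
    format!("// {}\nexport const formulas = {};\n", sheet.name, sheet.formula_count)
}

/// Generate one module, write it, and let the string drop on return.
fn emit_one(dir: &Path, sheet: &Sheet) -> io::Result<()> {
    let source = generate_module(sheet);
    fs::write(dir.join(format!("{}.mjs", sheet.name)), source)
}

/// Stream modules to disk; any write failure is returned as an error.
fn emit_streamed(dir: &Path, sheets: &[Sheet]) -> io::Result<()> {
    fs::create_dir_all(dir)?;
    let (heavy, light): (Vec<&Sheet>, Vec<&Sheet>) =
        sheets.iter().partition(|s| s.formula_count >= HEAVY_FORMULAS);

    // Heavy sheets: sequential, so peak memory is about one large module.
    for sheet in heavy {
        emit_one(dir, sheet)?;
    }

    // Light sheets: one scoped thread each (a real build would use a pool).
    thread::scope(|scope| {
        let handles: Vec<_> = light
            .into_iter()
            .map(|sheet| scope.spawn(move || emit_one(dir, sheet)))
            .collect();
        handles
            .into_iter()
            .try_for_each(|h| h.join().expect("emit thread panicked"))
    })
}
```

Because each module string lives only inside `emit_one`, nothing accumulates across sheets, which is the property the streamed emit relies on.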

Wall B — borrowed partitions · `sheet_partition.rs`

`partition_sheets` cloned every cell into the partition while `workbook.sheets` still held the originals: a second full copy of ~6M `CellData`, which doubled peak memory. `SheetPartition<'a>` now holds `Vec<&'a CellData>` (the workbook outlives every partition). The four consumers are read-only, so they are unchanged beyond the borrow.
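The borrow-instead-of-clone idea can be sketched as follows. The `CellData` fields, the flat `&[CellData]` input, and the grouping by sheet name are assumptions made to keep the example small; only the `SheetPartition<'a>` holding `Vec<&'a CellData>` comes from the PR.

```rust
use std::collections::BTreeMap;

/// Hypothetical cell record; the real CellData has its own fields.
#[derive(Debug)]
struct CellData {
    sheet: String,
    address: String,
    formula: String,
}

/// A partition that borrows cells from the workbook instead of cloning them.
struct SheetPartition<'a> {
    sheet: &'a str,
    cells: Vec<&'a CellData>,
}

/// Group cells by sheet. Only references are stored, so no CellData is copied.
fn partition_sheets<'a>(cells: &'a [CellData]) -> Vec<SheetPartition<'a>> {
    let mut by_sheet: BTreeMap<&'a str, Vec<&'a CellData>> = BTreeMap::new();
    for cell in cells {
        by_sheet.entry(cell.sheet.as_str()).or_default().push(cell);
    }
    by_sheet
        .into_iter()
        .map(|(sheet, cells)| SheetPartition { sheet, cells })
        .collect()
}
```

The lifetime `'a` ties every partition to the workbook data, so the compiler rejects any use of a partition after the workbook is dropped.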

Wall A — opt-in `ete init --lazy-engine` · `chunked_emitter.rs`, `main.rs`, `cli/`

The default `engine.js` statically imports every sheet module, so `import('engine.js')` pulls ~800 MB into the heap before `run()` can be called. `--lazy-engine` emits an engine whose modules load on demand via async `load({ sheets | cells })` (output-cone scoping: it loads only the transitive dependency closure of the requested sheets/cells, whole clusters included), plus a synchronous `run()` guarded against pre-load calls, and `runScoped()`. The default engine is unchanged (eager + sync), so the Mippy contract, `ete eval`, smoke, and the engine suite are untouched. Eager and lazy share the `run()` body via `emit_run_function`, so they can't drift.
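The output-cone scoping mentioned above amounts to a transitive closure over the sheet dependency graph. The sketch below shows that computation in Rust for illustration only: the generated lazy engine is JavaScript, and the `dependency_cone` name, the map-of-edges input, and the sheet names in the usage are hypothetical. Cluster handling is not modeled here.

```rust
use std::collections::{BTreeSet, HashMap};

/// Return every sheet reachable from `requested` by following dependency
/// edges, including the requested sheets themselves.
fn dependency_cone<'a>(
    deps: &HashMap<&'a str, Vec<&'a str>>,
    requested: &[&'a str],
) -> BTreeSet<String> {
    let mut cone = BTreeSet::new();
    let mut stack: Vec<&'a str> = requested.to_vec();
    while let Some(sheet) = stack.pop() {
        // `insert` returns false for an already-visited sheet, which also
        // stops the walk from looping on cyclic dependencies.
        if cone.insert(sheet.to_string()) {
            if let Some(children) = deps.get(sheet) {
                stack.extend(children.iter().copied());
            }
        }
    }
    cone
}
```

A loader built on this would import only the modules named in the returned set, leaving sheets outside the cone unloaded.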

Tests

New npm run test:lazy-engine (19) + CI step. All green: cargo test 17/17 · smoke 78/78 · test:engine 21/21 · test:runnable 20/20 · test:depgraph 11/11 · test:slimming 13/13 · test:golden 20/20 · full npm test · ete init --lazy-engine end-to-end.

Caveats / follow-ups

Not yet measured on the real (gitignored) models: a clean A1/A2 regen is still needed to confirm that the emit completes within memory. (Running this before merge.)
Residual (deferred): `generate_sheet_module` builds a `Vec<String>` then `.join()`s (~2× a monster transiently), and one ~200 MB monster module is still heavy to import → row-chunk the 3 monster sheets into smaller lazy modules. Plus the rest of #22's umbrella (`--output-profile` contract, guided `ete create`).
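One possible direction for the `Vec<String>` + `.join()` residual, sketched under assumptions: write each generated line directly to a writer so that the joined copy never exists. The `write_module` function and its iterator-of-lines input are hypothetical and not part of this PR.

```rust
use std::io::{self, Write};

/// Write each generated line straight to `out` instead of collecting a
/// Vec<String> and joining it, so only one line is held at a time.
fn write_module<W: Write>(
    out: &mut W,
    lines: impl Iterator<Item = String>,
) -> io::Result<()> {
    for line in lines {
        out.write_all(line.as_bytes())?;
        out.write_all(b"\n")?;
    }
    out.flush()
}
```

In practice `out` would likely be a `std::io::BufWriter` wrapping the module's `File`, which keeps the number of write syscalls low without buffering the whole module.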

🤖 Generated with Claude Code

…itions, opt-in lazy engine

Wall C (streamed emit): write each sheet module to disk as generated and drop the string instead of collect-all-then-write; heavy sheets (>=200k formulas) emit one-at-a-time, light ones in parallel. Was materializing all ~800 MB of generated JS in memory before writing any module -> ~18 GB peak on the real models, sheets/ empty for the whole run.

Wall B (borrowed partitions): SheetPartition<'a> holds Vec<&CellData> instead of cloning ~6M cells while the workbook still holds the originals (peak-memory doubling). Consumers are read-only, so unchanged beyond the borrow.

Wall A (opt-in --lazy-engine): emit a chunked engine whose sheet modules load on demand via async load()/runScoped() with output-cone scoping (load only the requested sheets/cells' transitive dependency closure, whole clusters included). Sync run() preserved and guarded against pre-load calls. Default engine.js is unchanged (eager + synchronous) so the Mippy contract, ete eval, smoke, and the engine suite are untouched; eager and lazy share the run() body so they can't drift.

New `npm run test:lazy-engine` (19) + CI step. Validated: cargo test 17/17, smoke 78/78, test:engine 21/21, test:runnable 20/20, test:depgraph 11/11, test:slimming 13/13, test:golden 20/20, full npm test, and an `ete init --lazy-engine` e2e build.

Residual (deferred): generate_sheet_module builds a Vec<String> then joins (~2x a monster module transiently); row-chunk the 3 monster sheets so even one is small to emit + import.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

This was referenced May 29, 2026

Chunked build: dependency-graph.json range-expansion is ~37 GB / ~7 min on the real models #32

Open

Row-chunk monster sheets: cluster-bound returns cone limits --lazy-engine pruning #33

Open

ebootheee merged commit fedcf0a into main May 29, 2026
2 checks passed

ebootheee merged 1 commit into main from claude/distracted-taussig-aad70e