perf(codegen): outline array-literal construction for oversized modules (#5391)#5412
Conversation
…es (#5391) The biggest remaining functions on the cli.js bundle are minified data-table builders: the 18MB `perry_closure...31856` is ~17K js_gc_note_slot_layout + 11K js_inline_arena_slow_alloc + 9K js_write_barrier_slot — thousands of inline-allocated array literals. Each literal expands to a bump-alloc diamond + N×(store + layout-note + barrier) inline, and clang -O0 time is superlinear in function size, so these are the residual wall after codegen-unit + string-init splitting. When full-outline is enabled (size-gated, PERRY_FULL_OUTLINE_IC), array literals now spill their already-evaluated element values to a per-site stack buffer and build the array in ONE `js_array_from_values(ptr, n)` call. The runtime helper mirrors the inline path exactly: alloc via js_array_alloc_literal, copy each value, then js_gc_note_slot_layout + js_write_barrier_slot per element (both no-op for non-pointers). Per-site IR drops from ~15 + 4N instructions to ~N + 3. Validated: numeric + string-pointer + nested-array-pointer elements all read back correctly with full-outline ON and OFF (51 == 51); GC store-site inventory + full codegen suite green. Default off = unchanged behavior.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds a new ChangesArray Literal Outline Construction Path
Sequence Diagram(s)sequenceDiagram
participant Codegen as lower_array_literal
participant IRBuilder as Entry Block
participant Runtime as js_array_from_values
participant GC as GC Subsystem
Codegen->>IRBuilder: alloca [N x DOUBLE] stack buffer
loop for each element SSA value
Codegen->>IRBuilder: store element into buffer[i]
end
Codegen->>Runtime: call js_array_from_values(buf_ptr, N)
Runtime->>Runtime: js_array_alloc_literal(N)
loop for each slot
Runtime->>GC: js_gc_note_slot_layout(slot)
Runtime->>GC: js_write_barrier_slot(slot)
end
Runtime-->>Codegen: ArrayHeader ptr (nanboxed)
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/perry-codegen/src/expr/array_literal.rs (1)
40-57: ⚡ Quick winSkip unused per-element analysis in full-outline mode.
all_numeric_elementsandlayout_notes_neededare computed even whenfull_outline_ic_enabled()immediately returns through the outlined builder path. Gating those computations to non-outline mode trims unnecessary work on the exact oversized-literal path this PR targets.♻️ Proposed refactor
- let all_numeric_elements = elements.iter().all(|e| is_numeric_expr(ctx, e)); + let outline_mode = crate::codegen::full_outline_ic_enabled(); + let all_numeric_elements = if outline_mode { + false + } else { + elements.iter().all(|e| is_numeric_expr(ctx, e)) + }; @@ - let mut layout_notes_needed = Vec::with_capacity(n); + let mut layout_notes_needed = if outline_mode { + Vec::new() + } else { + Vec::with_capacity(n) + }; for value_expr in elements { - layout_notes_needed.push(!expr_produces_non_pointer_bits_by_construction( - ctx, value_expr, - )); + if !outline_mode { + layout_notes_needed.push(!expr_produces_non_pointer_bits_by_construction( + ctx, value_expr, + )); + } vals.push(lower_expr(ctx, value_expr)?); } @@ - if crate::codegen::full_outline_ic_enabled() { + if outline_mode {Also applies to: 68-79
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/perry-codegen/src/expr/array_literal.rs` around lines 40 - 57, The computation of `all_numeric_elements` and the loop that populates `layout_notes_needed` are being performed unconditionally even when `full_outline_ic_enabled()` returns true and causes an early exit through the outlined builder path. Move or wrap these computations (the `all_numeric_elements` assignment and the loop that builds `layout_notes_needed`) to only execute when NOT in full outline mode, so unnecessary work is skipped on the oversized-literal path that this change targets.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@crates/perry-codegen/src/expr/array_literal.rs`:
- Around line 40-57: The computation of `all_numeric_elements` and the loop that
populates `layout_notes_needed` are being performed unconditionally even when
`full_outline_ic_enabled()` returns true and causes an early exit through the
outlined builder path. Move or wrap these computations (the
`all_numeric_elements` assignment and the loop that builds
`layout_notes_needed`) to only execute when NOT in full outline mode, so
unnecessary work is skipped on the oversized-literal path that this change
targets.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 22302bad-9b9c-4901-941e-a79acfec6872
📒 Files selected for processing (4)
crates/perry-codegen/src/expr/array_literal.rscrates/perry-codegen/src/runtime_decls/arrays.rscrates/perry-codegen/tests/typed_feedback.rscrates/perry-runtime/src/array/alloc.rs
…th array-literal outline test)
# Conflicts: # crates/perry-codegen/tests/typed_feedback.rs
…lict in compile/link/mod.rs Brings main's PerryTS#5400 (Windows response-file link) + PerryTS#5406/PerryTS#5408/PerryTS#5410/PerryTS#5412 codegen split + PerryTS#5402 require Tier 2 + PerryTS#5414 unknown builtin-namespace fix into the Windows plugin support branch. The conflict in compile/link/mod.rs is structural: main extracted the orchestrator into link/build_and_run.rs while the branch kept it inline. Resolved by taking main's refactor and re-applying the Windows plugin support on top: - compile/link/mod.rs: add PLUGIN_HOST_SYMBOLS const (the runtime + plugin export list — single source of truth for the macOS -u force-keep and the Windows .def file) plus use std::io::Write for the .def writer. - compile/link/build_and_run.rs: convert the if ctx.needs_plugins && !is_windows block to if ctx.needs_plugins with platform branches. The Windows branch writes a per-build perry_plugin_host_<pid>.def using the NAME directive (not LIBRARY — that one tells link.exe the output is a DLL and breaks the host .exe) and passes /DEF:<path> to the linker. PLUGIN_HOST_SYMBOLS replaces the inline untime_syms array on macOS too. - compile.rs: dylib link path on Windows now uses lld-link with /FORCE:UNRESOLVED (the .def file lists plugin_activate / perry_plugin_abi_version / plugin_deactivate so lld-link's empty default export table doesn't break GetProcAddress in the host) and writes a per-build .def file with the same three symbols. Lifts the pre-existing 'use link.exe' out for the LLVM-friendly linker. - codegen/entry.rs: re-apply the dylib plugin ABI shim block (emits perry_plugin_abi_version, plugin_activate, and the optional plugin_deactivate) on top of main's refactor of compile_module_entry.
What
Outlines array-literal construction for oversized modules (#5391). When full-outline is enabled (size-gated by
PERRY_FULL_OUTLINE_IC), an array literal spills its already-evaluated element values to a per-site stack buffer and builds the array in ONEjs_array_from_values(ptr, n)call — instead of the inline bump-alloc diamond + N×(store + layout-note + barrier) that codegen otherwise emits.Why
After codegen-unit splitting (#5407) removed the module memory wall and the string-init split removed the 68MB generated function, the bundle's biggest user functions are minified data-table builders. The 18MB
perry_closure…31856was ~17Kjs_gc_note_slot_layout+ 11Kjs_inline_arena_slow_alloc+ 9Kjs_write_barrier_slot— thousands of inline-allocated array literals.clang -O0time is superlinear in function size, so these were the residual wall.Result (measured on the bundle)
closure_31856(array data-table)closure_2485025,070 array-literal sites collapsed to
js_array_from_valuescalls. Per-site IR drops from ~15 + 4N instructions to ~N + 3.Correctness
The runtime helper
js_array_from_valuesmirrors the inline path exactly: allocate viajs_array_alloc_literal(length pre-set), copy each value, thenjs_gc_note_slot_layout+js_write_barrier_slotper element (both no-op for non-pointers, so the calls are unconditional and correct for any mix). The per-site stack buffer is entry-hoisted (bounded total stack) and consumed immediately by the call — no GC-visible half-built window.51 == 51).perry-codegensuite green. Default off = unchanged behavior.Next (separate)
The new biggest user function is
main(11.5MB), dominated by a different construct — closure construction (40Kjs_closure_set_capture_f64+ 5Kjs_closure_alloc_singleton). Outlining that is the next step under #5391.Refs #5391.
Summary by CodeRabbit
Optimizations
New Features
Tests
PERRY_FULL_OUTLINE_IC=1, and updated related assertions to preserve the expected inline construction behavior when set to0.