Skip to content

perf(codegen): outline array-literal construction for oversized modules (#5391)#5412

Merged
proggeramlug merged 5 commits into
mainfrom
feat/outline-literals
Jun 18, 2026
Merged

perf(codegen): outline array-literal construction for oversized modules (#5391)#5412
proggeramlug merged 5 commits into
mainfrom
feat/outline-literals

Conversation

@proggeramlug

@proggeramlug proggeramlug commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

What

Outlines array-literal construction for oversized modules (#5391). When full-outline is enabled (size-gated by PERRY_FULL_OUTLINE_IC), an array literal spills its already-evaluated element values to a per-site stack buffer and builds the array in ONE js_array_from_values(ptr, n) call — instead of the inline bump-alloc diamond + N×(store + layout-note + barrier) that codegen otherwise emits.

Why

After codegen-unit splitting (#5407) removed the module memory wall and the string-init split removed the 68MB generated function, the bundle's biggest user functions are minified data-table builders. The 18MB perry_closure…31856 was ~17K js_gc_note_slot_layout + 11K js_inline_arena_slow_alloc + 9K js_write_barrier_slot — thousands of inline-allocated array literals. clang -O0 time is superlinear in function size, so these were the residual wall.

Result (measured on the bundle)

function before after
closure_31856 (array data-table) 18.2MB 5.4MB (−70%)
closure_24850 7.6MB 5.3MB

25,070 array-literal sites collapsed to js_array_from_values calls. Per-site IR drops from ~15 + 4N instructions to ~N + 3.

Correctness

The runtime helper js_array_from_values mirrors the inline path exactly: allocate via js_array_alloc_literal (length pre-set), copy each value, then js_gc_note_slot_layout + js_write_barrier_slot per element (both no-op for non-pointers, so the calls are unconditional and correct for any mix). The per-site stack buffer is entry-hoisted (bounded total stack) and consumed immediately by the call — no GC-visible half-built window.

  • E2E: numeric + string-pointer + nested-array-pointer elements all read back correctly with full-outline ON and OFF (51 == 51).
  • New unit test (both gate states); GC store-site inventory + full perry-codegen suite green. Default off = unchanged behavior.

Next (separate)

The new biggest user function is main (11.5MB), dominated by a different construct — closure construction (40K js_closure_set_capture_f64 + 5K js_closure_alloc_singleton). Outlining that is the next step under #5391.

Refs #5391.

Summary by CodeRabbit

  • Optimizations

    • Improved array literal construction for outline-enabled configurations by switching to a builder-style path that reduces inline allocation overhead and streamlines element handling.
  • New Features

    • Added a runtime helper to construct array literals from a contiguous buffer of values, including required GC slot layout tracking and write barriers.
  • Tests

    • Added coverage to verify the builder call is emitted only when PERRY_FULL_OUTLINE_IC=1, and updated related assertions to preserve the expected inline construction behavior when set to 0.

Ralph Küpper added 3 commits June 18, 2026 18:10
…es (#5391)

The biggest remaining functions on the cli.js bundle are minified
data-table builders: the 18MB `perry_closure...31856` is ~17K
js_gc_note_slot_layout + 11K js_inline_arena_slow_alloc + 9K
js_write_barrier_slot — thousands of inline-allocated array literals. Each
literal expands to a bump-alloc diamond + N×(store + layout-note + barrier)
inline, and clang -O0 time is superlinear in function size, so these are
the residual wall after codegen-unit + string-init splitting.

When full-outline is enabled (size-gated, PERRY_FULL_OUTLINE_IC), array
literals now spill their already-evaluated element values to a per-site
stack buffer and build the array in ONE `js_array_from_values(ptr, n)`
call. The runtime helper mirrors the inline path exactly: alloc via
js_array_alloc_literal, copy each value, then js_gc_note_slot_layout +
js_write_barrier_slot per element (both no-op for non-pointers). Per-site
IR drops from ~15 + 4N instructions to ~N + 3.

Validated: numeric + string-pointer + nested-array-pointer elements all
read back correctly with full-outline ON and OFF (51 == 51); GC store-site
inventory + full codegen suite green. Default off = unchanged behavior.
@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: a6bf3a5a-916b-473b-90ce-7c1d78db89e5

📥 Commits

Reviewing files that changed from the base of the PR and between fafe6be and d52cbc6.

📒 Files selected for processing (1)
  • crates/perry-codegen/tests/typed_feedback.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/perry-codegen/tests/typed_feedback.rs

📝 Walkthrough

Walkthrough

Adds a new js_array_from_values runtime function that constructs an array from a stack-spilled buffer of f64 values with per-element GC barriers. The codegen gains a new branch in lower_array_literal that activates when full_outline_ic_enabled() is true, spilling elements to an entry-block stack buffer and delegating to a single runtime call. A matching IR declaration and tests covering both enabled and disabled states are included.

Changes

Array Literal Outline Construction Path

Layer / File(s) Summary
Runtime js_array_from_values implementation
crates/perry-runtime/src/array/alloc.rs
Adds the exported js_array_from_values C function: allocates an exact-sized array via js_array_alloc_literal, early-returns for null/zero inputs, then iterates over the values pointer performing element writes plus js_gc_note_slot_layout and js_write_barrier_slot per element.
IR declaration and codegen outline path
crates/perry-codegen/src/runtime_decls/arrays.rs, crates/perry-codegen/src/expr/array_literal.rs
Declares js_array_from_values(PTR, I32) -> I64 in declare_phase_b_arrays; adds DOUBLE to the type imports; adds the full_outline_ic_enabled() branch in lower_array_literal that allocates an entry-block stack buffer, stores all element SSA values, calls js_array_from_values, and returns the nanboxed pointer, bypassing both the inline and large-literal paths.
Tests for outline array path
crates/perry-codegen/tests/typed_feedback.rs
Adds full_outline_array_literal_uses_builder_call test that generates IR with a non-const array literal and asserts js_array_from_values is present when PERRY_FULL_OUTLINE_IC=1 and absent when =0, using ENV_LOCK and EnvVarGuard. Updates typed_feedback_marks_numeric_array_literals to pin PERRY_FULL_OUTLINE_IC to 0 for consistent inline numeric-array observation.

Sequence Diagram(s)

sequenceDiagram
  participant Codegen as lower_array_literal
  participant IRBuilder as Entry Block
  participant Runtime as js_array_from_values
  participant GC as GC Subsystem

  Codegen->>IRBuilder: alloca [N x DOUBLE] stack buffer
  loop for each element SSA value
    Codegen->>IRBuilder: store element into buffer[i]
  end
  Codegen->>Runtime: call js_array_from_values(buf_ptr, N)
  Runtime->>Runtime: js_array_alloc_literal(N)
  loop for each slot
    Runtime->>GC: js_gc_note_slot_layout(slot)
    Runtime->>GC: js_write_barrier_slot(slot)
  end
  Runtime-->>Codegen: ArrayHeader ptr (nanboxed)
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related issues

Possibly related PRs

  • PerryTS/perry#5385: Introduces the shared full_outline_ic_enabled() and PERRY_FULL_OUTLINE_IC gating mechanism that this PR wires into array-literal outlining, working alongside the class-field SET outlining in that PR.

Poem

🐇 A stack buffer spills, elements align,
One runtime call now replaces the inline!
js_array_from_values hops into place,
Barriers and GC notes land with grace.
Outline IC hops — the IR stays lean,
The cleanest array build a rabbit has seen! 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title concisely and accurately describes the primary change: outlining array-literal construction for oversized modules, a clear performance optimization referenced by issue #5391.
Description check ✅ Passed The description comprehensively covers what changed, why, performance results, correctness validation, and future work. It follows the template structure with clear sections and detailed technical context.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/outline-literals

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
crates/perry-codegen/src/expr/array_literal.rs (1)

40-57: ⚡ Quick win

Skip unused per-element analysis in full-outline mode.

all_numeric_elements and layout_notes_needed are computed even when full_outline_ic_enabled() immediately returns through the outlined builder path. Gating those computations to non-outline mode trims unnecessary work on the exact oversized-literal path this PR targets.

♻️ Proposed refactor
-    let all_numeric_elements = elements.iter().all(|e| is_numeric_expr(ctx, e));
+    let outline_mode = crate::codegen::full_outline_ic_enabled();
+    let all_numeric_elements = if outline_mode {
+        false
+    } else {
+        elements.iter().all(|e| is_numeric_expr(ctx, e))
+    };
@@
-    let mut layout_notes_needed = Vec::with_capacity(n);
+    let mut layout_notes_needed = if outline_mode {
+        Vec::new()
+    } else {
+        Vec::with_capacity(n)
+    };
     for value_expr in elements {
-        layout_notes_needed.push(!expr_produces_non_pointer_bits_by_construction(
-            ctx, value_expr,
-        ));
+        if !outline_mode {
+            layout_notes_needed.push(!expr_produces_non_pointer_bits_by_construction(
+                ctx, value_expr,
+            ));
+        }
         vals.push(lower_expr(ctx, value_expr)?);
     }
@@
-    if crate::codegen::full_outline_ic_enabled() {
+    if outline_mode {

Also applies to: 68-79

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/perry-codegen/src/expr/array_literal.rs` around lines 40 - 57, The
computation of `all_numeric_elements` and the loop that populates
`layout_notes_needed` are being performed unconditionally even when
`full_outline_ic_enabled()` returns true and causes an early exit through the
outlined builder path. Move or wrap these computations (the
`all_numeric_elements` assignment and the loop that builds
`layout_notes_needed`) to only execute when NOT in full outline mode, so
unnecessary work is skipped on the oversized-literal path that this change
targets.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@crates/perry-codegen/src/expr/array_literal.rs`:
- Around line 40-57: The computation of `all_numeric_elements` and the loop that
populates `layout_notes_needed` are being performed unconditionally even when
`full_outline_ic_enabled()` returns true and causes an early exit through the
outlined builder path. Move or wrap these computations (the
`all_numeric_elements` assignment and the loop that builds
`layout_notes_needed`) to only execute when NOT in full outline mode, so
unnecessary work is skipped on the oversized-literal path that this change
targets.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 22302bad-9b9c-4901-941e-a79acfec6872

📥 Commits

Reviewing files that changed from the base of the PR and between 22d18d2 and 2677aea.

📒 Files selected for processing (4)
  • crates/perry-codegen/src/expr/array_literal.rs
  • crates/perry-codegen/src/runtime_decls/arrays.rs
  • crates/perry-codegen/tests/typed_feedback.rs
  • crates/perry-runtime/src/array/alloc.rs

Ralph Küpper added 2 commits June 18, 2026 20:27
# Conflicts:
#	crates/perry-codegen/tests/typed_feedback.rs
@proggeramlug proggeramlug merged commit d098a51 into main Jun 18, 2026
14 of 15 checks passed
@proggeramlug proggeramlug deleted the feat/outline-literals branch June 18, 2026 19:02
nglmercer pushed a commit to nglmercer/perry that referenced this pull request Jun 18, 2026
…lict in compile/link/mod.rs

Brings main's PerryTS#5400 (Windows response-file link) + PerryTS#5406/PerryTS#5408/PerryTS#5410/PerryTS#5412 codegen
split + PerryTS#5402 require Tier 2 + PerryTS#5414 unknown builtin-namespace fix into the
Windows plugin support branch. The conflict in compile/link/mod.rs is
structural: main extracted the orchestrator into link/build_and_run.rs
while the branch kept it inline. Resolved by taking main's refactor and
re-applying the Windows plugin support on top:

- compile/link/mod.rs: add PLUGIN_HOST_SYMBOLS const (the runtime + plugin
  export list — single source of truth for the macOS -u force-keep and
  the Windows .def file) plus use std::io::Write for the .def writer.

- compile/link/build_and_run.rs: convert the if ctx.needs_plugins &&
  !is_windows block to if ctx.needs_plugins with platform branches.
  The Windows branch writes a per-build perry_plugin_host_<pid>.def
  using the NAME directive (not LIBRARY — that one tells link.exe
  the output is a DLL and breaks the host .exe) and passes
  /DEF:<path> to the linker. PLUGIN_HOST_SYMBOLS replaces the inline
  
untime_syms array on macOS too.

- compile.rs: dylib link path on Windows now uses lld-link with
  /FORCE:UNRESOLVED (the .def file lists plugin_activate /
  perry_plugin_abi_version / plugin_deactivate so lld-link's empty
  default export table doesn't break GetProcAddress in the host) and
  writes a per-build .def file with the same three symbols. Lifts the
  pre-existing 'use link.exe' out for the LLVM-friendly linker.

- codegen/entry.rs: re-apply the dylib plugin ABI shim block (emits
  perry_plugin_abi_version, plugin_activate, and the optional
  plugin_deactivate) on top of main's refactor of compile_module_entry.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant