Skip to content

perf(codegen): outline per-new-site inline allocator (smaller IR + ~17% faster)#5294

Merged
proggeramlug merged 4 commits into
mainfrom
perf/codegen-outline-new-alloc
Jun 17, 2026
Merged

perf(codegen): outline per-new-site inline allocator (smaller IR + ~17% faster)#5294
proggeramlug merged 4 commits into
mainfrom
perf/codegen-outline-new-alloc

Conversation

@proggeramlug

@proggeramlug proggeramlug commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

What

Every new C() site inlined ~50 lines of bump-allocator IR — load arena state, bump the offset, fast/slow/merge blocks, write the GC + object headers, zero-fill the field slots. Every input is a per-class compile-time constant, so the sequence is identical across all sites of a class. On a large minified bundle (the 13MB @anthropic-ai/claude-code cli.js) this is a dominant source of codegen bloat — millions of IR lines, enough to stall LLVM IR-gen + clang.

Replace the per-site inline bump with a single call to the already-existing runtime js_object_alloc_class_inline_keys, which performs the identical alloc + header init + slot zero-fill. Default on; PERRY_INLINE_NEW=1 opts back into the old inline path.

It's a win on both axes (no speed tradeoff)

Measured on an 8M-allocation loop at -O2:

inline (old) outline (new)
8M-alloc loop 7030 ms 5832 ms (~17% faster)
IR / new site baseline −45 lines

The inline bump-allocator was a pessimization at scale: inlining ~50 lines at every site bloated the hot loop and hurt icache / register allocation / LLVM optimization more than the saved call. Outlining to the tight runtime helper is smaller IR and faster.

Tests

cargo test -p perry-codegen --tests green on the default (outline) path. Output matches Node for plain fields, inheritance (super), and arrays of instances. The PERRY_INLINE_NEW=1 opt-out preserves the old path for comparison.

Context

First of several codegen size-optimizations found while compiling a real 13MB app to native — outlining pervasive inline sequences (allocators, and next the constructor + property/field IC diamonds) to make large-bundle IR tractable for clang without sacrificing runtime speed.

Summary by CodeRabbit

  • New Features
    • Improved new ClassName(...) code generation for recursive field-initializer application across inheritance, with correct treatment of string-key and computed-key fields.
  • Bug Fixes
    • Inline class instance allocation now initializes all field slots to undefined, avoiding stale arena values.
    • Field initializers no longer overwrite non-target capture fields, and this-capturing closure initializers are patched correctly.
  • Chores
    • Added PERRY_INLINE_NEW environment variable to control the inline class allocation strategy.

@coderabbitai

coderabbitai Bot commented Jun 17, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR extracts recursive field-initializer lowering logic into a dedicated field_init module with FieldInitMode and apply_field_initializers_recursive, updates the call-lowering module re-exports to reflect that move, gates the class-allocation inline fast path in lower_new behind a PERRY_INLINE_NEW environment variable (outlined to js_object_alloc_class_inline_keys when absent, fully inlined when set), and adds explicit field-slot initialization to the outlined allocator.

Changes

Field Initializers Module Extraction and Allocation Path Gating

Layer / File(s) Summary
Field initializer application logic
crates/perry-codegen/src/lower_call/field_init.rs
New module defines FieldInitMode enum controlling inheritance-chain selection (All, AncestorsOnly, SelfOnly, UpToInclusive, BetweenExclusiveTo, AfterRoot), and implements apply_field_initializers_recursive to construct and filter the inheritance chain using ctx.class_init_chains when available or fallback traversal, resolve per-class field lists with authoritative overrides, build separate string-key and computed-key initializer sets, convert missing initializers to Expr::Undefined, skip __perry_cap_* capture-field clobbering, and handle closures capturing this by recomputing auto-captures, patching the reserved capture slot, and directly storing the patched closure.
Module re-exports
crates/perry-codegen/src/lower_call/mod.rs
Declares field_init submodule and re-exports apply_field_initializers_recursive and FieldInitMode from field_init instead of new, narrowing the new re-exports to constructor/binding helpers (bind_inline_constructor_params, lower_new, restore_inline_constructor_scope).
PERRY_INLINE_NEW allocation path control
crates/perry-codegen/src/lower_call/new.rs
Updates imports to source field-initializer helpers from the relocated field_init module and removes old in-file definitions. Wraps the class_keys_globals fast path in an std::env::var_os("PERRY_INLINE_NEW").is_none() check: when the env var is absent, loads/caches the keys_array global and emits a call to the outlined runtime helper js_object_alloc_class_inline_keys; when set, preserves the existing inlined bump-allocation IR with arena state updates, header initialization, keys pointer storage, and field slot zeroing.
Inline keys allocator field-slot initialization
crates/perry-runtime/src/object/alloc.rs
js_object_alloc_class_inline_keys now explicitly initializes all physically allocated field slots (max(field_count, 8) slots) to JSValue::undefined() before GC layout initialization, aligning with the parent allocator's safety behavior and supporting the outlined allocation strategy.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • PerryTS/perry#5304: Both PRs modify crates/perry-codegen/src/lower_call/new.rs's lower_new decision-making to control per-new IR outlining vs. inlining via environment variables (PERRY_INLINE_NEW for allocation vs. PERRY_INLINE_CTOR for constructor calls), reducing hot-path duplication.

Poem

🐇 A new module hops into the codegen fold,
Field initializers, their logic to behold,
The allocator's path splits left or right—
When env vars whisper, we outline in flight,
Initialize slots with care and delight! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description fails to follow the repository template guidelines; the author edited Cargo.toml, CLAUDE.md, and CHANGELOG.md despite explicit prohibitions in the template. Remove version bumps from Cargo.toml and CLAUDE.md, and remove the CHANGELOG.md entry; let the maintainer handle these at merge time per CONTRIBUTING.md.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: outlining per-new-site allocators with specific benefits (smaller IR, ~17% faster).
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch perf/codegen-outline-new-alloc

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/perry-codegen/src/lower_call/new.rs`:
- Around line 825-860: The outlined allocation path using
js_object_alloc_class_inline_keys does not initialize object slots after
allocation, leaving stale arena bytes in freshly allocated instances. After the
ctx.block().call() to js_object_alloc_class_inline_keys, add slot zero-fill code
to ensure all slots are properly initialized to undefined, matching the behavior
of the inline bump-allocator branch. This should iterate through the field_count
and set each slot to undefined/zero to prevent stale data retention.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 2c22181a-a3c0-47bc-b0ef-23da4b132074

📥 Commits

Reviewing files that changed from the base of the PR and between 3b75ae1 and 57b69f9.

📒 Files selected for processing (1)
  • crates/perry-codegen/src/lower_call/new.rs

Comment thread crates/perry-codegen/src/lower_call/new.rs

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/perry-codegen/src/lower_call/field_init.rs (1)

298-312: ⚖️ Poor tradeoff

Computed-key fields with captures_this closures are not patched.

The comment at lines 302-304 acknowledges this gap. If a computed-key field like [Symbol.for("k")] = () => this.value appears in real code, the closure's this capture slot would remain uninitialized (0.0), causing a SIGSEGV when the arrow invokes this.value.

Consider adding a TODO/FIXME or, if feasible, extending the patching logic from the string-key branch to handle this case.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/perry-codegen/src/lower_call/field_init.rs` around lines 298 - 312,
The loop processing `init_pairs_computed` does not handle closures that capture
`this`, which can cause runtime errors when such closures are invoked. Either
add a TODO or FIXME comment documenting this known limitation at the location of
the `for (key_expr, init_expr) in init_pairs_computed` loop, or examine how the
string-keyed loop above handles `captures_this` closures and apply the same
patching logic to the computed-key branch to properly initialize the closure's
`this` capture slot.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/perry-codegen/src/lower_call/field_init.rs`:
- Around line 249-253: The fallback behavior in the code where
`ctx.this_stack.last()` is empty falls back to `double_literal(0.0)`, which when
later bitcast and used as a pointer in `js_object_set_field_by_name` would cause
a null-pointer dereference. Since this code path should be unreachable when
callers properly push `this` onto the stack before invoking this function,
replace the else clause that returns `double_literal(0.0)` with `unreachable!()`
to immediately surface any misuse rather than allowing a silent runtime crash.
This defensive change catches caller errors at the point of invocation.

---

Nitpick comments:
In `@crates/perry-codegen/src/lower_call/field_init.rs`:
- Around line 298-312: The loop processing `init_pairs_computed` does not handle
closures that capture `this`, which can cause runtime errors when such closures
are invoked. Either add a TODO or FIXME comment documenting this known
limitation at the location of the `for (key_expr, init_expr) in
init_pairs_computed` loop, or examine how the string-keyed loop above handles
`captures_this` closures and apply the same patching logic to the computed-key
branch to properly initialize the closure's `this` capture slot.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 1e7b0f94-933c-4e62-b71d-7d2d68892a33

📥 Commits

Reviewing files that changed from the base of the PR and between 57b69f9 and 440cf08.

📒 Files selected for processing (4)
  • crates/perry-codegen/src/lower_call/field_init.rs
  • crates/perry-codegen/src/lower_call/mod.rs
  • crates/perry-codegen/src/lower_call/new.rs
  • crates/perry-runtime/src/object/alloc.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/perry-codegen/src/lower_call/new.rs

Comment on lines +249 to +253
let this_val = if let Some(slot) = ctx.this_stack.last().cloned() {
ctx.block().load(DOUBLE, &slot)
} else {
double_literal(0.0)
};

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fallback to 0.0 for empty this_stack would cause null-pointer dereference.

If this_stack is empty, the code falls back to double_literal(0.0), which when bitcast and used as a pointer in js_object_set_field_by_name (line 282) would dereference address 0. While this path should be unreachable in practice (callers push this before invoking this function), a defensive assertion or unreachable!() would surface misuse immediately rather than producing a runtime crash.

Proposed fix
                 // Read the current `this` from the constructor's this_stack.
-                let this_val = if let Some(slot) = ctx.this_stack.last().cloned() {
-                    ctx.block().load(DOUBLE, &slot)
-                } else {
-                    double_literal(0.0)
-                };
+                let this_val = ctx
+                    .this_stack
+                    .last()
+                    .map(|slot| ctx.block().load(DOUBLE, slot))
+                    .expect("apply_field_initializers_recursive: this_stack empty during captures_this closure init");
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let this_val = if let Some(slot) = ctx.this_stack.last().cloned() {
ctx.block().load(DOUBLE, &slot)
} else {
double_literal(0.0)
};
let this_val = ctx
.this_stack
.last()
.map(|slot| ctx.block().load(DOUBLE, slot))
.expect("apply_field_initializers_recursive: this_stack empty during captures_this closure init");
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/perry-codegen/src/lower_call/field_init.rs` around lines 249 - 253,
The fallback behavior in the code where `ctx.this_stack.last()` is empty falls
back to `double_literal(0.0)`, which when later bitcast and used as a pointer in
`js_object_set_field_by_name` would cause a null-pointer dereference. Since this
code path should be unreachable when callers properly push `this` onto the stack
before invoking this function, replace the else clause that returns
`double_literal(0.0)` with `unreachable!()` to immediately surface any misuse
rather than allowing a silent runtime crash. This defensive change catches
caller errors at the point of invocation.

Ralph Küpper and others added 4 commits June 17, 2026 13:47
…ster)

Every `new C()` site inlined ~50 lines of bump-allocator IR (load arena state,
bump offset, fast/slow/merge, write GC+object headers, zero-fill slots) — all
per-class compile-time constants, identical across sites. On a 13MB minified
bundle this is a dominant source of codegen bloat (millions of IR lines).

Replace the per-site inline bump with a single call to the existing runtime
`js_object_alloc_class_inline_keys`, which performs the identical alloc + header
init + slot zero-fill. Default on; opt back into the inline path with
PERRY_INLINE_NEW=1.

Measured (8M-allocation loop, -O2): inline 7030ms -> outline 5832ms (~17% FASTER),
and -45 IR lines per new-site. The inline bump was a pessimization at scale — it
bloated the hot loop, hurting icache/regalloc/LLVM-opt more than the saved call.
perry-codegen suite green on the default (outline) path; output matches Node for
fields, inheritance (super), and arrays of instances.
The outlined `new C()` allocator path (default since this PR) calls
js_object_alloc_class_inline_keys, which initialized only the object
header and left the field slots holding recycled arena bytes. The
inline bump path (PERRY_INLINE_NEW=1) and json/parser.rs both zero-fill
by hand precisely because the helper didn't — so the new default path
regressed: a field read-before-write, or a GC scan of a
still-constructing instance, could observe stale bytes from a
previously-freed object (the #4717 `marked` "Cannot read properties of
undefined" failure mode).

Fold the slot zero-fill into the helper itself (mirroring
js_object_alloc_with_parent) so every caller — inline path, JSON
parser, class_registry, and the outlined codegen path — is correct by
construction. Verified the outlined path now matches Node and the
PERRY_INLINE_NEW=1 path for plain fields, inheritance, and
read-before-write.

Also reindent the inline branch so `cargo fmt --check` (the failing
lint job) passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011CDqAsmTvG7kwRTE1YPTZL
…00-LOC gate

The per-new-site allocator outlining pushed lower_call/new.rs to 2038 lines,
tripping the check_file_size.sh CI gate. Move FieldInitMode + the
apply_field_initializers_recursive walker (pure move, no behavior change)
into a sibling field_init.rs and re-export from mod.rs; new.rs drops to 1736.
@proggeramlug proggeramlug force-pushed the perf/codegen-outline-new-alloc branch from 440cf08 to d0ed314 Compare June 17, 2026 11:49

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CLAUDE.md`:
- Line 11: Revert the version bump in this PR as it should only be included
during actual release shipments. Change the **Current Version:** value in
CLAUDE.md back to its previous version number (before 0.5.1178), and similarly
revert the [workspace.package].version field in Cargo.toml back to its original
value prior to this PR's changes. Version updates should only be made by
maintainers at merge time, not by contributors in feature PRs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 489eca14-e396-40ae-9481-ccb1d53ad2fb

📥 Commits

Reviewing files that changed from the base of the PR and between 440cf08 and d0ed314.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • CHANGELOG.md
  • CLAUDE.md
  • Cargo.toml
  • crates/perry-codegen/src/lower_call/field_init.rs
  • crates/perry-codegen/src/lower_call/mod.rs
  • crates/perry-codegen/src/lower_call/new.rs
  • crates/perry-runtime/src/object/alloc.rs
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • crates/perry-codegen/src/lower_call/mod.rs
  • crates/perry-runtime/src/object/alloc.rs
  • crates/perry-codegen/src/lower_call/new.rs
  • crates/perry-codegen/src/lower_call/field_init.rs

Comment thread CLAUDE.md
Perry is a native TypeScript compiler written in Rust that compiles TypeScript source code directly to native executables. It uses SWC for TypeScript parsing and LLVM for code generation.

**Current Version:** 0.5.1177
**Current Version:** 0.5.1178

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Revert release-version metadata from this non-release PR.

This bump should not be included here unless this PR is the actual release shipment. Please revert CLAUDE.md Line 11 and Cargo.toml Line 218 in this PR to avoid version-collision churn in active review branches.

Suggested revert
diff --git a/CLAUDE.md b/CLAUDE.md
-**Current Version:** 0.5.1178
+**Current Version:** 0.5.1177

diff --git a/Cargo.toml b/Cargo.toml
-version = "0.5.1178"
+version = "0.5.1177"

As per coding guidelines, “Only bump [workspace.package].version in Cargo.toml and **Current Version:** in CLAUDE.md when shipping a release.”
Based on learnings, “External contributor PRs should NOT modify [workspace.package].version in Cargo.toml or **Current Version:** in CLAUDE.md; maintainer does this at merge time.”

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
**Current Version:** 0.5.1178
**Current Version:** 0.5.1177
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CLAUDE.md` at line 11, Revert the version bump in this PR as it should only
be included during actual release shipments. Change the **Current Version:**
value in CLAUDE.md back to its previous version number (before 0.5.1178), and
similarly revert the [workspace.package].version field in Cargo.toml back to its
original value prior to this PR's changes. Version updates should only be made
by maintainers at merge time, not by contributors in feature PRs.

Sources: Coding guidelines, Learnings

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants