perf(codegen): outline per-new-site inline allocator (smaller IR + ~17% faster) by proggeramlug · Pull Request #5294 · PerryTS/perry

proggeramlug · 2026-06-17T04:48:21Z

What

Every `new C()` site inlined ~50 lines of bump-allocator IR: load arena state, bump the offset, fast/slow/merge blocks, write the GC + object headers, zero-fill the field slots. Every input is a per-class compile-time constant, so the sequence is identical across all sites of a class. On a large minified bundle (the 13MB @anthropic-ai/claude-code cli.js) this is a dominant source of codegen bloat: millions of IR lines, enough to stall LLVM IR-gen + clang.

Replace the per-site inline bump with a single call to the existing runtime helper `js_object_alloc_class_inline_keys`, which performs the same alloc + header init + slot zero-fill (the slot zero-fill was folded into the helper by a follow-up commit in this PR). The outlined path is on by default; PERRY_INLINE_NEW=1 opts back into the old inline path.

It's a win on both axes (no speed tradeoff)

Measured on an 8M-allocation loop at -O2:

Metric	inline (old)	outline (new)
8M-alloc loop	7030 ms	5832 ms (~17% faster)
IR / `new` site	baseline	−45 lines

The inline bump-allocator was a pessimization at scale: inlining ~50 lines at every site bloated the hot loop and hurt icache / register allocation / LLVM optimization by more than the call it avoided. Outlining to the tight runtime helper produces smaller IR and runs faster.
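The headline figures can be re-derived from the table above. This is a minimal sketch: `relative_speedup` and `ir_lines_saved` are hypothetical helper names, and the site count passed to `ir_lines_saved` is an illustrative input, not a measurement from this PR.

```rust
/// Fraction of wall-clock time saved when going from `old_ms` to `new_ms`.
fn relative_speedup(old_ms: f64, new_ms: f64) -> f64 {
    (old_ms - new_ms) / old_ms
}

/// Total IR lines saved for a given number of `new` sites.
/// `lines_per_site` is 45 in the table above; `new_sites` is an assumption.
fn ir_lines_saved(new_sites: u64, lines_per_site: u64) -> u64 {
    new_sites * lines_per_site
}

fn main() {
    // 7030 ms (inline) vs 5832 ms (outline), as reported in the table.
    let s = relative_speedup(7030.0, 5832.0);
    println!("time saved: {:.1}%", s * 100.0); // prints "time saved: 17.0%"

    // Illustrative only: a bundle with 100_000 `new` sites.
    println!("IR lines saved: {}", ir_lines_saved(100_000, 45)); // prints "IR lines saved: 4500000"
}
```

At 45 lines per site, a bundle would need on the order of tens of thousands of `new` sites before the savings reach the "millions of IR lines" range mentioned in the description.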

Tests

`cargo test -p perry-codegen --tests` is green on the default (outline) path. Output matches Node for plain fields, inheritance (super), and arrays of instances. The PERRY_INLINE_NEW=1 opt-out preserves the old path for comparison.

Context

First of several codegen size-optimizations found while compiling a real 13MB app to native — outlining pervasive inline sequences (allocators, and next the constructor + property/field IC diamonds) to make large-bundle IR tractable for clang without sacrificing runtime speed.

Summary by CodeRabbit

New Features
- Improved new ClassName(...) code generation for recursive field-initializer application across inheritance, with correct treatment of string-key and computed-key fields.
Bug Fixes
- Inline class instance allocation now initializes all field slots to undefined, avoiding stale arena values.
- Field initializers no longer overwrite non-target capture fields, and this-capturing closure initializers are patched correctly.
Chores
- Added PERRY_INLINE_NEW environment variable to control the inline class allocation strategy.

coderabbitai · 2026-06-17T04:48:44Z

Walkthrough

This PR extracts recursive field-initializer lowering logic into a dedicated field_init module with FieldInitMode and apply_field_initializers_recursive, updates the call-lowering module re-exports to reflect that move, gates the class-allocation inline fast path in lower_new behind a PERRY_INLINE_NEW environment variable (outlined to js_object_alloc_class_inline_keys when absent, fully inlined when set), and adds explicit field-slot initialization to the outlined allocator.
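The gating described above can be sketched as a pure decision function. This is an illustration, not the code in `lower_new`: `AllocPath` and `select_alloc_path` are hypothetical names, and only the `std::env::var_os("PERRY_INLINE_NEW")` presence check is taken from the walkthrough.

```rust
use std::ffi::OsString;

/// Which allocation strategy codegen would emit for a `new C()` site.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq, Eq)]
enum AllocPath {
    /// Emit one call to the runtime helper.
    Outlined,
    /// Emit the full bump-allocator sequence at the site.
    Inlined,
}

/// Takes the environment value as a parameter so the decision is testable
/// without mutating the process environment.
fn select_alloc_path(inline_new: Option<OsString>) -> AllocPath {
    // Presence check only: the variable's value is never inspected.
    if inline_new.is_none() {
        AllocPath::Outlined
    } else {
        AllocPath::Inlined
    }
}

fn main() {
    let path = select_alloc_path(std::env::var_os("PERRY_INLINE_NEW"));
    println!("{:?}", path);
}
```

Because the check is on presence rather than value, a setting such as `PERRY_INLINE_NEW=0` would also select the inline path under this sketch.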

Changes

Field Initializers Module Extraction and Allocation Path Gating

Layer / File(s)	Summary
Field initializer application logic `crates/perry-codegen/src/lower_call/field_init.rs`	New module defines `FieldInitMode` enum controlling inheritance-chain selection (All, AncestorsOnly, SelfOnly, UpToInclusive, BetweenExclusiveTo, AfterRoot), and implements `apply_field_initializers_recursive` to construct and filter the inheritance chain using `ctx.class_init_chains` when available or fallback traversal, resolve per-class field lists with authoritative overrides, build separate string-key and computed-key initializer sets, convert missing initializers to `Expr::Undefined`, skip `__perry_cap_*` capture-field clobbering, and handle closures capturing `this` by recomputing auto-captures, patching the reserved capture slot, and directly storing the patched closure.
Module re-exports `crates/perry-codegen/src/lower_call/mod.rs`	Declares `field_init` submodule and re-exports `apply_field_initializers_recursive` and `FieldInitMode` from `field_init` instead of `new`, narrowing the `new` re-exports to constructor/binding helpers (`bind_inline_constructor_params`, `lower_new`, `restore_inline_constructor_scope`).
PERRY_INLINE_NEW allocation path control `crates/perry-codegen/src/lower_call/new.rs`	Updates imports to source field-initializer helpers from the relocated `field_init` module and removes old in-file definitions. Wraps the `class_keys_globals` fast path in an `std::env::var_os("PERRY_INLINE_NEW").is_none()` check: when the env var is absent, loads/caches the `keys_array` global and emits a call to the outlined runtime helper `js_object_alloc_class_inline_keys`; when set, preserves the existing inlined bump-allocation IR with arena state updates, header initialization, keys pointer storage, and field slot zeroing.
Inline keys allocator field-slot initialization `crates/perry-runtime/src/object/alloc.rs`	`js_object_alloc_class_inline_keys` now explicitly initializes all physically allocated field slots (max(field_count, 8) slots) to `JSValue::undefined()` before GC layout initialization, aligning with the parent allocator's safety behavior and supporting the outlined allocation strategy.
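The slot initialization in the last row can be illustrated with a toy model. This is a sketch under stated assumptions, not the runtime's code: `UNDEFINED_BITS` is a made-up stand-in for the bit pattern of `JSValue::undefined()`, and `init_field_slots` is a hypothetical helper; only the `max(field_count, 8)` slot count comes from the summary.

```rust
/// Hypothetical stand-in for the bit pattern of `JSValue::undefined()`.
const UNDEFINED_BITS: u64 = 0x7FFC_0000_0000_0001;

/// Minimum number of field slots physically reserved per object.
const MIN_SLOTS: usize = 8;

/// Number of slots that are physically allocated, and so must be initialized.
fn physical_slot_count(field_count: usize) -> usize {
    field_count.max(MIN_SLOTS)
}

/// Models handing out an object's field area from recycled arena memory:
/// every physically allocated slot is overwritten with `undefined`.
/// Panics if `recycled` is shorter than the physical slot count.
fn init_field_slots(recycled: &mut [u64], field_count: usize) -> &mut [u64] {
    let n = physical_slot_count(field_count);
    let slots = &mut recycled[..n];
    slots.fill(UNDEFINED_BITS);
    slots
}

fn main() {
    // Simulated arena region still holding bytes from a freed object.
    let mut arena = vec![0xDEAD_BEEF_u64; 16];

    // A class with 3 declared fields still gets 8 initialized slots.
    let slots = init_field_slots(&mut arena, 3);
    assert_eq!(slots.len(), 8);
    assert!(slots.iter().all(|&s| s == UNDEFINED_BITS));

    // Memory past the object's slots is left untouched.
    assert_eq!(arena[8], 0xDEAD_BEEF);
}
```

Initializing the physical count rather than `field_count` matters here because slots 3 to 7 exist in memory even though the class declares only three fields.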

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

PerryTS/perry#5304: Both PRs modify crates/perry-codegen/src/lower_call/new.rs's lower_new decision-making to control per-new IR outlining vs. inlining via environment variables (PERRY_INLINE_NEW for allocation vs. PERRY_INLINE_CTOR for constructor calls), reducing hot-path duplication.

Poem

🐇 A new module hops into the codegen fold,
Field initializers, their logic to behold,
The allocator's path splits left or right—
When env vars whisper, we outline in flight,
Initialize slots with care and delight! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The description fails to follow the repository template guidelines; the author edited Cargo.toml, CLAUDE.md, and CHANGELOG.md despite explicit prohibitions in the template.	Remove version bumps from Cargo.toml and CLAUDE.md, and remove the CHANGELOG.md entry; let the maintainer handle these at merge time per CONTRIBUTING.md.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: outlining per-new-site allocators with specific benefits (smaller IR, ~17% faster).
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/perry-codegen/src/lower_call/new.rs`:
- Around line 825-860: The outlined allocation path using
js_object_alloc_class_inline_keys does not initialize object slots after
allocation, leaving stale arena bytes in freshly allocated instances. After the
ctx.block().call() to js_object_alloc_class_inline_keys, add slot zero-fill code
to ensure all slots are properly initialized to undefined, matching the behavior
of the inline bump-allocator branch. This should iterate through the field_count
and set each slot to undefined/zero to prevent stale data retention.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 2c22181a-a3c0-47bc-b0ef-23da4b132074

📥 Commits

Reviewing files that changed from the base of the PR and between 3b75ae1 and 57b69f9.

📒 Files selected for processing (1)

crates/perry-codegen/src/lower_call/new.rs

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

crates/perry-codegen/src/lower_call/field_init.rs (1)
298-312: ⚖️ Poor tradeoff

Computed-key fields with captures_this closures are not patched.

The comment at lines 302-304 acknowledges this gap. If a computed-key field like [Symbol.for("k")] = () => this.value appears in real code, the closure's this capture slot would remain uninitialized (0.0), causing a SIGSEGV when the arrow invokes this.value.

Consider adding a TODO/FIXME or, if feasible, extending the patching logic from the string-key branch to handle this case.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/perry-codegen/src/lower_call/field_init.rs` around lines 298 - 312,
The loop processing `init_pairs_computed` does not handle closures that capture
`this`, which can cause runtime errors when such closures are invoked. Either
add a TODO or FIXME comment documenting this known limitation at the location of
the `for (key_expr, init_expr) in init_pairs_computed` loop, or examine how the
string-keyed loop above handles `captures_this` closures and apply the same
patching logic to the computed-key branch to properly initialize the closure's
`this` capture slot.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/perry-codegen/src/lower_call/field_init.rs`:
- Around line 249-253: The fallback behavior in the code where
`ctx.this_stack.last()` is empty falls back to `double_literal(0.0)`, which when
later bitcast and used as a pointer in `js_object_set_field_by_name` would cause
a null-pointer dereference. Since this code path should be unreachable when
callers properly push `this` onto the stack before invoking this function,
replace the else clause that returns `double_literal(0.0)` with `unreachable!()`
to immediately surface any misuse rather than allowing a silent runtime crash.
This defensive change catches caller errors at the point of invocation.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 1e7b0f94-933c-4e62-b71d-7d2d68892a33

📥 Commits

Reviewing files that changed from the base of the PR and between 57b69f9 and 440cf08.

📒 Files selected for processing (4)

crates/perry-codegen/src/lower_call/field_init.rs
crates/perry-codegen/src/lower_call/mod.rs
crates/perry-codegen/src/lower_call/new.rs
crates/perry-runtime/src/object/alloc.rs

🚧 Files skipped from review as they are similar to previous changes (1)

crates/perry-codegen/src/lower_call/new.rs

coderabbitai · 2026-06-17T11:17:06Z

+                let this_val = if let Some(slot) = ctx.this_stack.last().cloned() {
+                    ctx.block().load(DOUBLE, &slot)
+                } else {
+                    double_literal(0.0)
+                };


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fallback to 0.0 for empty this_stack would cause null-pointer dereference.

If this_stack is empty, the code falls back to double_literal(0.0), which when bitcast and used as a pointer in js_object_set_field_by_name (line 282) would dereference address 0. While this path should be unreachable in practice (callers push this before invoking this function), a defensive assertion or unreachable!() would surface misuse immediately rather than producing a runtime crash.
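To make the failure mode concrete: bitcasting `0.0` yields an all-zero bit pattern, which is address 0 when treated as a pointer. The sketch below shows that, alongside a fail-fast lookup. `fallback_this_bits` and `current_this` are hypothetical names, and the string slice stands in for whatever slot type `ctx.this_stack` actually holds.

```rust
/// Bit pattern that would be treated as a pointer if the `0.0` fallback
/// were taken (a bitcast, not a numeric conversion).
fn fallback_this_bits() -> u64 {
    0.0_f64.to_bits()
}

/// Hypothetical stand-in for reading the top of `ctx.this_stack`.
/// Panics with a descriptive message instead of returning a dummy value.
fn current_this<'a>(this_stack: &'a [&'a str]) -> &'a str {
    this_stack
        .last()
        .copied()
        .expect("this_stack empty during captures_this closure init")
}

fn main() {
    // All bits of 0.0 are zero, so the "pointer" is null.
    println!("{:#x}", fallback_this_bits()); // prints "0x0"

    // With a populated stack the lookup succeeds.
    let stack = ["%this.slot"];
    println!("{}", current_this(&stack)); // prints "%this.slot"
}
```

The design choice is where the error appears: a panic at compile time of the user's program names the offending invariant, whereas the fallback defers the failure to a crash in the generated binary.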

Proposed fix

 // Read the current `this` from the constructor's this_stack.
-                let this_val = if let Some(slot) = ctx.this_stack.last().cloned() {
-                    ctx.block().load(DOUBLE, &slot)
-                } else {
-                    double_literal(0.0)
-                };
+                let this_val = ctx
+                    .this_stack
+                    .last()
+                    .map(|slot| ctx.block().load(DOUBLE, slot))
+                    .expect("apply_field_initializers_recursive: this_stack empty during captures_this closure init");

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-let this_val = if let Some(slot) = ctx.this_stack.last().cloned() {
-    ctx.block().load(DOUBLE, &slot)
-} else {
-    double_literal(0.0)
-};
+let this_val = ctx
+    .this_stack
+    .last()
+    .map(|slot| ctx.block().load(DOUBLE, slot))
+    .expect("apply_field_initializers_recursive: this_stack empty during captures_this closure init");

…ster)

Every `new C()` site inlined ~50 lines of bump-allocator IR (load arena state, bump offset, fast/slow/merge, write GC+object headers, zero-fill slots), all per-class compile-time constants, identical across sites. On a 13MB minified bundle this is a dominant source of codegen bloat (millions of IR lines).

Replace the per-site inline bump with a single call to the existing runtime `js_object_alloc_class_inline_keys`, which performs the identical alloc + header init + slot zero-fill. Default on; opt back into the inline path with PERRY_INLINE_NEW=1.

Measured (8M-allocation loop, -O2): inline 7030ms -> outline 5832ms (~17% FASTER), and -45 IR lines per new-site. The inline bump was a pessimization at scale: it bloated the hot loop, hurting icache/regalloc/LLVM-opt more than the saved call.

perry-codegen suite green on the default (outline) path; output matches Node for fields, inheritance (super), and arrays of instances.

The outlined `new C()` allocator path (default since this PR) calls js_object_alloc_class_inline_keys, which initialized only the object header and left the field slots holding recycled arena bytes. The inline bump path (PERRY_INLINE_NEW=1) and json/parser.rs both zero-fill by hand precisely because the helper didn't, so the new default path regressed: a field read-before-write, or a GC scan of a still-constructing instance, could observe stale bytes from a previously-freed object (the #4717 `marked` "Cannot read properties of undefined" failure mode).

Fold the slot zero-fill into the helper itself (mirroring js_object_alloc_with_parent) so every caller (inline path, JSON parser, class_registry, and the outlined codegen path) is correct by construction. Verified the outlined path now matches Node and the PERRY_INLINE_NEW=1 path for plain fields, inheritance, and read-before-write.

Also reindent the inline branch so `cargo fmt --check` (the failing lint job) passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011CDqAsmTvG7kwRTE1YPTZL

…00-LOC gate

The per-new-site allocator outlining pushed lower_call/new.rs to 2038 lines, tripping the check_file_size.sh CI gate. Move FieldInitMode + the apply_field_initializers_recursive walker (pure move, no behavior change) into a sibling field_init.rs and re-export from mod.rs; new.rs drops to 1736.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CLAUDE.md`:
- Line 11: Revert the version bump in this PR as it should only be included
during actual release shipments. Change the **Current Version:** value in
CLAUDE.md back to its previous version number (before 0.5.1178), and similarly
revert the [workspace.package].version field in Cargo.toml back to its original
value prior to this PR's changes. Version updates should only be made by
maintainers at merge time, not by contributors in feature PRs.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 489eca14-e396-40ae-9481-ccb1d53ad2fb

📥 Commits

Reviewing files that changed from the base of the PR and between 440cf08 and d0ed314.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (7)

CHANGELOG.md
CLAUDE.md
Cargo.toml
crates/perry-codegen/src/lower_call/field_init.rs
crates/perry-codegen/src/lower_call/mod.rs
crates/perry-codegen/src/lower_call/new.rs
crates/perry-runtime/src/object/alloc.rs

✅ Files skipped from review due to trivial changes (1)

CHANGELOG.md

🚧 Files skipped from review as they are similar to previous changes (4)

crates/perry-codegen/src/lower_call/mod.rs
crates/perry-runtime/src/object/alloc.rs
crates/perry-codegen/src/lower_call/new.rs
crates/perry-codegen/src/lower_call/field_init.rs

coderabbitai · 2026-06-17T11:56:45Z

 Perry is a native TypeScript compiler written in Rust that compiles TypeScript source code directly to native executables. It uses SWC for TypeScript parsing and LLVM for code generation.

-**Current Version:** 0.5.1177
+**Current Version:** 0.5.1178


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Revert release-version metadata from this non-release PR.

This bump should not be included here unless this PR is the actual release shipment. Please revert CLAUDE.md Line 11 and Cargo.toml Line 218 in this PR to avoid version-collision churn in active review branches.

Suggested revert

diff --git a/CLAUDE.md b/CLAUDE.md
-**Current Version:** 0.5.1178
+**Current Version:** 0.5.1177
diff --git a/Cargo.toml b/Cargo.toml
-version = "0.5.1178"
+version = "0.5.1177"

As per coding guidelines, “Only bump [workspace.package].version in Cargo.toml and **Current Version:** in CLAUDE.md when shipping a release.”
Based on learnings, “External contributor PRs should NOT modify [workspace.package].version in Cargo.toml or **Current Version:** in CLAUDE.md; maintainer does this at merge time.”

Suggested change

-**Current Version:** 0.5.1178
+**Current Version:** 0.5.1177

Sources: Coding guidelines, Learnings

coderabbitai Bot reviewed Jun 17, 2026

Comment thread crates/perry-codegen/src/lower_call/new.rs

proggeramlug mentioned this pull request Jun 17, 2026

perf(codegen): call shared constructor symbol instead of inlining per new-site (~2.5x faster, smaller IR) #5304

Merged

coderabbitai Bot reviewed Jun 17, 2026

Ralph Küpper and others added 4 commits June 17, 2026 13:47

chore: bump v0.5.1178 + changelog for per-new-site allocator outlining

d0ed314

proggeramlug force-pushed the perf/codegen-outline-new-alloc branch from 440cf08 to d0ed314 Compare June 17, 2026 11:49

coderabbitai Bot reviewed Jun 17, 2026

proggeramlug merged commit ffaa630 into main Jun 17, 2026
14 of 15 checks passed

proggeramlug deleted the perf/codegen-outline-new-alloc branch June 17, 2026 11:57

proggeramlug mentioned this pull request Jun 17, 2026

ci: move sccache off the GHA backend onto a persisted disk cache (fix cargo-test timeouts) #5324

Merged

This was referenced Jun 17, 2026

perf(codegen): outline class-field-SET guard-miss arm to one call (#5334 lever A) #5336

Closed

Representation-aware type lowering + native-ABI material evidence gate #5466

Draft

proggeramlug merged 4 commits into main from perf/codegen-outline-new-alloc
