perf(codegen): outline per-new-site inline allocator (smaller IR + ~17% faster)#5294
Conversation
📝 WalkthroughWalkthroughThis PR extracts recursive field-initializer lowering logic into a dedicated ChangesField Initializers Module Extraction and Allocation Path Gating
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/perry-codegen/src/lower_call/new.rs`:
- Around line 825-860: The outlined allocation path using
js_object_alloc_class_inline_keys does not initialize object slots after
allocation, leaving stale arena bytes in freshly allocated instances. After the
ctx.block().call() to js_object_alloc_class_inline_keys, add slot zero-fill code
to ensure all slots are properly initialized to undefined, matching the behavior
of the inline bump-allocator branch. This should iterate through the field_count
and set each slot to undefined/zero to prevent stale data retention.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 2c22181a-a3c0-47bc-b0ef-23da4b132074
📒 Files selected for processing (1)
crates/perry-codegen/src/lower_call/new.rs
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
crates/perry-codegen/src/lower_call/field_init.rs (1)
298-312: ⚖️ Poor tradeoffComputed-key fields with
captures_thisclosures are not patched.The comment at lines 302-304 acknowledges this gap. If a computed-key field like
[Symbol.for("k")] = () => this.valueappears in real code, the closure'sthiscapture slot would remain uninitialized (0.0), causing a SIGSEGV when the arrow invokesthis.value.Consider adding a TODO/FIXME or, if feasible, extending the patching logic from the string-key branch to handle this case.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/perry-codegen/src/lower_call/field_init.rs` around lines 298 - 312, The loop processing `init_pairs_computed` does not handle closures that capture `this`, which can cause runtime errors when such closures are invoked. Either add a TODO or FIXME comment documenting this known limitation at the location of the `for (key_expr, init_expr) in init_pairs_computed` loop, or examine how the string-keyed loop above handles `captures_this` closures and apply the same patching logic to the computed-key branch to properly initialize the closure's `this` capture slot.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/perry-codegen/src/lower_call/field_init.rs`:
- Around line 249-253: The fallback behavior in the code where
`ctx.this_stack.last()` is empty falls back to `double_literal(0.0)`, which when
later bitcast and used as a pointer in `js_object_set_field_by_name` would cause
a null-pointer dereference. Since this code path should be unreachable when
callers properly push `this` onto the stack before invoking this function,
replace the else clause that returns `double_literal(0.0)` with `unreachable!()`
to immediately surface any misuse rather than allowing a silent runtime crash.
This defensive change catches caller errors at the point of invocation.
---
Nitpick comments:
In `@crates/perry-codegen/src/lower_call/field_init.rs`:
- Around line 298-312: The loop processing `init_pairs_computed` does not handle
closures that capture `this`, which can cause runtime errors when such closures
are invoked. Either add a TODO or FIXME comment documenting this known
limitation at the location of the `for (key_expr, init_expr) in
init_pairs_computed` loop, or examine how the string-keyed loop above handles
`captures_this` closures and apply the same patching logic to the computed-key
branch to properly initialize the closure's `this` capture slot.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 1e7b0f94-933c-4e62-b71d-7d2d68892a33
📒 Files selected for processing (4)
crates/perry-codegen/src/lower_call/field_init.rscrates/perry-codegen/src/lower_call/mod.rscrates/perry-codegen/src/lower_call/new.rscrates/perry-runtime/src/object/alloc.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/perry-codegen/src/lower_call/new.rs
| let this_val = if let Some(slot) = ctx.this_stack.last().cloned() { | ||
| ctx.block().load(DOUBLE, &slot) | ||
| } else { | ||
| double_literal(0.0) | ||
| }; |
There was a problem hiding this comment.
Fallback to 0.0 for empty this_stack would cause null-pointer dereference.
If this_stack is empty, the code falls back to double_literal(0.0), which when bitcast and used as a pointer in js_object_set_field_by_name (line 282) would dereference address 0. While this path should be unreachable in practice (callers push this before invoking this function), a defensive assertion or unreachable!() would surface misuse immediately rather than producing a runtime crash.
Proposed fix
// Read the current `this` from the constructor's this_stack.
- let this_val = if let Some(slot) = ctx.this_stack.last().cloned() {
- ctx.block().load(DOUBLE, &slot)
- } else {
- double_literal(0.0)
- };
+ let this_val = ctx
+ .this_stack
+ .last()
+ .map(|slot| ctx.block().load(DOUBLE, slot))
+ .expect("apply_field_initializers_recursive: this_stack empty during captures_this closure init");📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let this_val = if let Some(slot) = ctx.this_stack.last().cloned() { | |
| ctx.block().load(DOUBLE, &slot) | |
| } else { | |
| double_literal(0.0) | |
| }; | |
| let this_val = ctx | |
| .this_stack | |
| .last() | |
| .map(|slot| ctx.block().load(DOUBLE, slot)) | |
| .expect("apply_field_initializers_recursive: this_stack empty during captures_this closure init"); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/perry-codegen/src/lower_call/field_init.rs` around lines 249 - 253,
The fallback behavior in the code where `ctx.this_stack.last()` is empty falls
back to `double_literal(0.0)`, which when later bitcast and used as a pointer in
`js_object_set_field_by_name` would cause a null-pointer dereference. Since this
code path should be unreachable when callers properly push `this` onto the stack
before invoking this function, replace the else clause that returns
`double_literal(0.0)` with `unreachable!()` to immediately surface any misuse
rather than allowing a silent runtime crash. This defensive change catches
caller errors at the point of invocation.
…ster) Every `new C()` site inlined ~50 lines of bump-allocator IR (load arena state, bump offset, fast/slow/merge, write GC+object headers, zero-fill slots) — all per-class compile-time constants, identical across sites. On a 13MB minified bundle this is a dominant source of codegen bloat (millions of IR lines). Replace the per-site inline bump with a single call to the existing runtime `js_object_alloc_class_inline_keys`, which performs the identical alloc + header init + slot zero-fill. Default on; opt back into the inline path with PERRY_INLINE_NEW=1. Measured (8M-allocation loop, -O2): inline 7030ms -> outline 5832ms (~17% FASTER), and -45 IR lines per new-site. The inline bump was a pessimization at scale — it bloated the hot loop, hurting icache/regalloc/LLVM-opt more than the saved call. perry-codegen suite green on the default (outline) path; output matches Node for fields, inheritance (super), and arrays of instances.
The outlined `new C()` allocator path (default since this PR) calls js_object_alloc_class_inline_keys, which initialized only the object header and left the field slots holding recycled arena bytes. The inline bump path (PERRY_INLINE_NEW=1) and json/parser.rs both zero-fill by hand precisely because the helper didn't — so the new default path regressed: a field read-before-write, or a GC scan of a still-constructing instance, could observe stale bytes from a previously-freed object (the #4717 `marked` "Cannot read properties of undefined" failure mode). Fold the slot zero-fill into the helper itself (mirroring js_object_alloc_with_parent) so every caller — inline path, JSON parser, class_registry, and the outlined codegen path — is correct by construction. Verified the outlined path now matches Node and the PERRY_INLINE_NEW=1 path for plain fields, inheritance, and read-before-write. Also reindent the inline branch so `cargo fmt --check` (the failing lint job) passes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_011CDqAsmTvG7kwRTE1YPTZL
…00-LOC gate The per-new-site allocator outlining pushed lower_call/new.rs to 2038 lines, tripping the check_file_size.sh CI gate. Move FieldInitMode + the apply_field_initializers_recursive walker (pure move, no behavior change) into a sibling field_init.rs and re-export from mod.rs; new.rs drops to 1736.
440cf08 to
d0ed314
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@CLAUDE.md`:
- Line 11: Revert the version bump in this PR as it should only be included
during actual release shipments. Change the **Current Version:** value in
CLAUDE.md back to its previous version number (before 0.5.1178), and similarly
revert the [workspace.package].version field in Cargo.toml back to its original
value prior to this PR's changes. Version updates should only be made by
maintainers at merge time, not by contributors in feature PRs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 489eca14-e396-40ae-9481-ccb1d53ad2fb
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (7)
CHANGELOG.mdCLAUDE.mdCargo.tomlcrates/perry-codegen/src/lower_call/field_init.rscrates/perry-codegen/src/lower_call/mod.rscrates/perry-codegen/src/lower_call/new.rscrates/perry-runtime/src/object/alloc.rs
✅ Files skipped from review due to trivial changes (1)
- CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (4)
- crates/perry-codegen/src/lower_call/mod.rs
- crates/perry-runtime/src/object/alloc.rs
- crates/perry-codegen/src/lower_call/new.rs
- crates/perry-codegen/src/lower_call/field_init.rs
| Perry is a native TypeScript compiler written in Rust that compiles TypeScript source code directly to native executables. It uses SWC for TypeScript parsing and LLVM for code generation. | ||
|
|
||
| **Current Version:** 0.5.1177 | ||
| **Current Version:** 0.5.1178 |
There was a problem hiding this comment.
Revert release-version metadata from this non-release PR.
This bump should not be included here unless this PR is the actual release shipment. Please revert CLAUDE.md Line 11 and Cargo.toml Line 218 in this PR to avoid version-collision churn in active review branches.
Suggested revert
diff --git a/CLAUDE.md b/CLAUDE.md
-**Current Version:** 0.5.1178
+**Current Version:** 0.5.1177
diff --git a/Cargo.toml b/Cargo.toml
-version = "0.5.1178"
+version = "0.5.1177"As per coding guidelines, “Only bump [workspace.package].version in Cargo.toml and **Current Version:** in CLAUDE.md when shipping a release.”
Based on learnings, “External contributor PRs should NOT modify [workspace.package].version in Cargo.toml or **Current Version:** in CLAUDE.md; maintainer does this at merge time.”
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| **Current Version:** 0.5.1178 | |
| **Current Version:** 0.5.1177 |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@CLAUDE.md` at line 11, Revert the version bump in this PR as it should only
be included during actual release shipments. Change the **Current Version:**
value in CLAUDE.md back to its previous version number (before 0.5.1178), and
similarly revert the [workspace.package].version field in Cargo.toml back to its
original value prior to this PR's changes. Version updates should only be made
by maintainers at merge time, not by contributors in feature PRs.
Sources: Coding guidelines, Learnings
What
Every
new C()site inlined ~50 lines of bump-allocator IR — load arena state, bump the offset, fast/slow/merge blocks, write the GC + object headers, zero-fill the field slots. Every input is a per-class compile-time constant, so the sequence is identical across all sites of a class. On a large minified bundle (the 13MB@anthropic-ai/claude-codecli.js) this is a dominant source of codegen bloat — millions of IR lines, enough to stall LLVM IR-gen + clang.Replace the per-site inline bump with a single call to the already-existing runtime
js_object_alloc_class_inline_keys, which performs the identical alloc + header init + slot zero-fill. Default on;PERRY_INLINE_NEW=1opts back into the old inline path.It's a win on both axes (no speed tradeoff)
Measured on an 8M-allocation loop at -O2:
newsiteThe inline bump-allocator was a pessimization at scale: inlining ~50 lines at every site bloated the hot loop and hurt icache / register allocation / LLVM optimization more than the saved call. Outlining to the tight runtime helper is smaller IR and faster.
Tests
cargo test -p perry-codegen --testsgreen on the default (outline) path. Output matches Node for plain fields, inheritance (super), and arrays of instances. ThePERRY_INLINE_NEW=1opt-out preserves the old path for comparison.Context
First of several codegen size-optimizations found while compiling a real 13MB app to native — outlining pervasive inline sequences (allocators, and next the constructor + property/field IC diamonds) to make large-bundle IR tractable for clang without sacrificing runtime speed.
Summary by CodeRabbit
new ClassName(...)code generation for recursive field-initializer application across inheritance, with correct treatment of string-key and computed-key fields.undefined, avoiding stale arena values.this-capturing closure initializers are patched correctly.PERRY_INLINE_NEWenvironment variable to control the inline class allocation strategy.