
perf(codegen): call shared constructor symbol instead of inlining per new-site (~2.5x faster, smaller IR) #5304

Merged: proggeramlug merged 2 commits into main from perf/codegen-outline-ctor on Jun 17, 2026
proggeramlug (Contributor) commented Jun 17, 2026

What

The inlined constructor body (field-init stores etc.) was the dominant per-new-site IR after the allocator — ~136 IR lines per site on a class with super + fields. Default to calling the already-emitted standalone <Class>_constructor symbol instead of inlining the ctor body at every new site, so it's emitted once. PERRY_INLINE_CTOR=1 opts back into inlining.

Restricted to classes with their own constructor and an emitted standalone symbol: no-own-ctor subclasses (class C extends B {}) stay on the inline path (the symbol-call path doesn't reproduce the inline leaf-keys/shape setup); without the symbol the call would be a no-op. Classes with super(...) / rest params round-trip correctly.
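The eligibility rule above can be sketched roughly as follows. This is a minimal illustration, not the code in new.rs: `ClassInfo`, `CtorPath`, and `choose_ctor_path` are hypothetical names, and the sketch ignores the pre-existing recursion and alias-collision redirects.

```rust
/// Hypothetical summary of what codegen knows about a class at a `new` site.
/// These field names are illustrative and do not come from the PR.
struct ClassInfo {
    /// The class declares its own constructor (not `class C extends B {}`).
    has_own_ctor: bool,
    /// A standalone `<Class>_constructor` symbol was emitted for it.
    has_standalone_symbol: bool,
}

#[derive(Debug, PartialEq)]
enum CtorPath {
    /// Call the shared `<Class>_constructor` symbol (new default).
    CallSymbol,
    /// Inline the constructor body at the `new` site (old behavior).
    Inline,
}

/// Pick the lowering path for one `new` site.
/// `inline_requested` models the PERRY_INLINE_CTOR opt-out.
fn choose_ctor_path(class: &ClassInfo, inline_requested: bool) -> CtorPath {
    if !inline_requested && class.has_own_ctor && class.has_standalone_symbol {
        CtorPath::CallSymbol
    } else {
        // No-own-ctor subclasses and classes without an emitted symbol
        // stay on the inline path, as does any site when inlining is requested.
        CtorPath::Inline
    }
}
```

Under these assumptions, a class with its own constructor and an emitted symbol takes the call path, while `class C extends B {}` falls through to inlining.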

Win on both axes

8M construct-heavy allocation loop (new P(i, i+1) with this.x/this.y), -O2:

| | inline ctor (old) | call ctor (new) |
| --- | --- | --- |
| 8M loop | 5609 ms | 2251 ms (~2.5× faster) |
| IR / new site | baseline | −136 lines |

Same root cause as the allocator outline (#5294): inlining ~136 lines of ctor body at every site bloated the hot loop, hurting icache, register allocation, and LLVM optimization far more than a call costs. Calling the shared symbol produces smaller IR and runs about 2.5× faster on this benchmark.
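The reported figures can be cross-checked with a little arithmetic. The helpers below are a sketch with hypothetical names; the 10,000-site count in the usage note is an illustrative assumption, not a number from the PR.

```rust
/// Speedup factor of the new path over the old one.
fn speedup(old_ms: f64, new_ms: f64) -> f64 {
    old_ms / new_ms
}

/// Average cost of one loop iteration in nanoseconds.
fn ns_per_iter(total_ms: f64, iterations: f64) -> f64 {
    total_ms * 1_000_000.0 / iterations
}

/// Total IR lines saved across `new_sites` sites at `lines_per_site` each.
fn ir_lines_saved(new_sites: u64, lines_per_site: u64) -> u64 {
    new_sites * lines_per_site
}
```

With the measurements above, `speedup(5609.0, 2251.0)` comes out just under 2.5, and the per-iteration cost drops from roughly 701 ns to roughly 281 ns over 8M iterations. For a hypothetical bundle with 10,000 new sites, `ir_lines_saved(10_000, 136)` gives 1,360,000 fewer IR lines.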

Tests

cargo test -p perry-codegen --tests green on the default (call) path. Output matches the previous inline baseline for super, rest params, and arrays of instances (including the unrelated, pre-existing no-own-ctor by-name-read quirk, which is identical on both paths and untouched here).

Context

Second of the codegen size-optimizations from compiling a real 13MB app — together with #5294 (allocator), per-new IR drops ~181 lines/site, shrinking large-bundle IR substantially while improving runtime speed.

Summary by CodeRabbit

  • Chore
    • Added support for the PERRY_INLINE_CTOR environment variable to control constructor execution routing.

…each new-site

The inlined constructor body (field-init stores etc.) was the dominant per-new-site
IR after the allocator (~136 IR lines/site on a class with super+fields). Default to
CALLING the already-emitted standalone <Class>_constructor symbol instead, emitting
the ctor body once. Opt back into inlining with PERRY_INLINE_CTOR=1.

Restricted to classes with their OWN constructor AND an emitted standalone symbol:
no-own-ctor subclasses (class C extends B {}) stay on the inline path (the symbol-call
path doesn't reproduce the inline leaf-keys/shape setup); without the symbol the call
would be a no-op. Classes with super(...)/rest params round-trip correctly.

Measured win-win vs inlining (8M construct-heavy loop, new P(i,i+1) with this.x/this.y):
inline 5609ms -> call 2251ms (~2.5x FASTER), and ~136 fewer IR lines per new-site —
the inlined ctor bloated the hot loop. perry-codegen suite green on the default (call)
path; output matches the inline baseline (incl. the unrelated pre-existing no-own-ctor
by-name-read quirk, identical on both paths) for super, rest params, and arrays.

coderabbitai (Bot) commented Jun 17, 2026

No actionable comments were generated in the recent review. 🎉


📥 Commits

Reviewing files that changed from the base of the PR and between f0bbc2d and ff0bbc8.

📒 Files selected for processing (1)
  • crates/perry-codegen/src/lower_call/new.rs

📝 Walkthrough

In lower_new, a force_ctor_call boolean flag is introduced, enabled by default unless the PERRY_INLINE_CTOR environment variable is set. This flag is added as a third OR branch to the existing redirect condition (alongside class_stack recursion detection and ctor_alias_collision), routing constructor calls through the standalone *_constructor symbol by default.
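The redirect condition described in the walkthrough might look roughly like the sketch below. `NewSite` and both function names are hypothetical stand-ins for state inside lower_new, and treating any set value of PERRY_INLINE_CTOR as the opt-out is an assumption; the PR description only mentions `PERRY_INLINE_CTOR=1`.

```rust
/// Hypothetical stand-in for the per-site facts available in `lower_new`.
#[derive(Clone, Copy)]
struct NewSite {
    /// The class is already on the class stack (recursion detected).
    in_class_stack: bool,
    /// The constructor alias collides with another symbol.
    ctor_alias_collision: bool,
}

/// Assumption: any set value of PERRY_INLINE_CTOR opts back into inlining,
/// so the flag is true only when the variable is absent.
fn force_ctor_call(inline_ctor_env: Option<&str>) -> bool {
    inline_ctor_env.is_none()
}

/// Third OR branch added next to the two existing redirect reasons.
fn redirect_to_ctor_symbol(site: NewSite, inline_ctor_env: Option<&str>) -> bool {
    site.in_class_stack || site.ctor_alias_collision || force_ctor_call(inline_ctor_env)
}
```

A caller could feed in the real environment with `std::env::var("PERRY_INLINE_CTOR").ok()` and pass `.as_deref()`. Note that in this sketch the two older reasons still redirect to the symbol even when inlining is requested.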

Changes

Constructor Dispatch Control Flag

| Layer / File(s) | Summary |
| --- | --- |
| force_ctor_call flag and redirect condition (crates/perry-codegen/src/lower_call/new.rs) | Adds force_ctor_call defaulting to true unless PERRY_INLINE_CTOR is set; expands the redirect if to class_stack recursion \|\| ctor_alias_collision \|\| force_ctor_call, making constructor symbol dispatch the default and inline the opt-in path. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

  • PerryTS/perry#5171: Modifies the same constructor dispatch logic in lower_call/new.rs, changing walk-stop and parameter wiring for explicit zero-arg imported constructors at the same call-resolution level.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: switching from inlining constructor bodies to calling a shared constructor symbol, with quantified performance benefits (~2.5x faster, smaller IR). |
| Description check | ✅ Passed | The description covers the 'What' section explaining the change, includes quantified performance results and IR reduction, mentions testing, and provides context. However, it lacks explicit checklist completion markers and doesn't follow the template's structured sections perfectly. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


…pass file-size gate

new.rs crossed the 2000-line CI gate (2017 LOC) after the call-shared-
constructor change. Move the pure constructor-body predicate walkers
(ctor_body_calls_super / _closure_calls_super / _uses_this /
_has_value_return, node_stream_parent_kind, collect_decl_local_ids) into
a sibling new_helpers.rs and import them. Pure move, no behavior change;
new.rs drops to 1760 LOC.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>