perf(codegen): call shared constructor symbol instead of inlining per new-site (~2.5x faster, smaller IR) #5304
…each new-site
The inlined constructor body (field-init stores etc.) was the dominant per-new-site
IR after the allocator (~136 IR lines/site on a class with super+fields). Default to
CALLING the already-emitted standalone <Class>_constructor symbol instead, emitting
the ctor body once. Opt back into inlining with PERRY_INLINE_CTOR=1.
Restricted to classes with their OWN constructor AND an emitted standalone symbol:
no-own-ctor subclasses (class C extends B {}) stay on the inline path (the symbol-call
path doesn't reproduce the inline leaf-keys/shape setup); without the symbol the call
would be a no-op. Classes with super(...)/rest params round-trip correctly.
Measured win-win vs inlining (8M construct-heavy loop, new P(i,i+1) with this.x/this.y):
inline 5609ms -> call 2251ms (~2.5x FASTER), and ~136 fewer IR lines per new-site —
the inlined ctor bloated the hot loop. perry-codegen suite green on the default (call)
path; output matches the inline baseline (incl. the unrelated pre-existing no-own-ctor
by-name-read quirk, identical on both paths) for super, rest params, and arrays.
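
The dispatch rule described in this commit message can be sketched as a small decision function. This is a model in TypeScript for illustration only; the real logic lives in the Rust codegen. The `ClassInfo` fields and the `chooseCtorDispatch` name are hypothetical, and treating exactly the value `"1"` as the opt-in is an assumption taken from the `PERRY_INLINE_CTOR=1` wording.

```typescript
// Hypothetical per-class facts a codegen could consult at a `new` site.
// Field names are illustrative, not Perry's actual data structures.
interface ClassInfo {
  name: string;
  hasOwnConstructor: boolean;    // the class declares its own constructor
  hasEmittedCtorSymbol: boolean; // a standalone <Class>_constructor was emitted
}

type CtorDispatch = "call" | "inline";

// Decide how a `new` site runs the constructor body.
// `inlineCtorEnv` stands in for the value of PERRY_INLINE_CTOR.
function chooseCtorDispatch(
  cls: ClassInfo,
  inlineCtorEnv: string | undefined
): CtorDispatch {
  if (inlineCtorEnv === "1") return "inline";     // explicit opt back into inlining
  if (!cls.hasOwnConstructor) return "inline";    // e.g. class C extends B {}
  if (!cls.hasEmittedCtorSymbol) return "inline"; // a call would be a no-op
  return "call";                                  // default: shared symbol
}

const P: ClassInfo = { name: "P", hasOwnConstructor: true, hasEmittedCtorSymbol: true };
const C: ClassInfo = { name: "C", hasOwnConstructor: false, hasEmittedCtorSymbol: false };

console.log(chooseCtorDispatch(P, undefined)); // call
console.log(chooseCtorDispatch(C, undefined)); // inline
console.log(chooseCtorDispatch(P, "1"));       // inline
```

Checking the opt-out first keeps the old behavior reachable for every class, which makes it easy to compare the two paths on the same input.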
…pass file-size gate
new.rs crossed the 2000-line CI gate (2017 LOC) after the call-shared-constructor change. Move the pure constructor-body predicate walkers (ctor_body_calls_super / _closure_calls_super / _uses_this / _has_value_return, node_stream_parent_kind, collect_decl_local_ids) into a sibling new_helpers.rs and import them. Pure move, no behavior change; new.rs drops to 1760 LOC.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
The inlined constructor body (field-init stores etc.) was the dominant per-new-site IR after the allocator: ~136 IR lines per site on a class with super + fields. Default to calling the already-emitted standalone <Class>_constructor symbol instead of inlining the ctor body at every new site, so it's emitted once. PERRY_INLINE_CTOR=1 opts back into inlining.
Restricted to classes with their own constructor and an emitted standalone symbol: no-own-ctor subclasses (class C extends B {}) stay on the inline path (the symbol-call path doesn't reproduce the inline leaf-keys/shape setup); without the symbol the call would be a no-op. Classes with super(...) / rest params round-trip correctly.
Win on both axes
8M construct-heavy allocation loop (new P(i, i+1) with this.x / this.y), -O2: inline 5609ms vs. call 2251ms (~2.5x faster), with ~136 fewer IR lines per new site.
Same root cause as the allocator outline (#5294): inlining ~136 lines of ctor body at every site bloated the hot loop, hurting icache / register allocation / LLVM-opt far more than a call costs. Calling the shared symbol gives smaller IR and runs about 2.5x faster.
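
For readers who want to try a similar measurement, the construct-heavy loop can be approximated as below. This is a sketch reconstructed from the description, not the author's benchmark source: the `constructLoop` helper, the checksum, and the `Date.now()` timing are additions for illustration, and it assumes the input language is TypeScript, as the class syntax above suggests.

```typescript
// Class shape from the description: own constructor, two field-init stores.
class P {
  x: number;
  y: number;
  constructor(x: number, y: number) {
    this.x = x;
    this.y = y;
  }
}

// Construct `iterations` instances and fold the fields into a checksum so
// the allocations stay observable. The checksum is an addition for this sketch.
function constructLoop(iterations: number): number {
  let checksum = 0;
  for (let i = 0; i < iterations; i++) {
    const p = new P(i, i + 1); // the new-site under test
    checksum += p.y - p.x;     // always 1, so checksum === iterations
  }
  return checksum;
}

const ITERATIONS = 8000000; // "8M" from the description
const start = Date.now();
const checksum = constructLoop(ITERATIONS);
const elapsedMs = Date.now() - start;
console.log(`checksum=${checksum} elapsed=${elapsedMs}ms`);
```

Compiling this once with the default settings and once with PERRY_INLINE_CTOR=1 should exercise the call path and the inline path respectively, assuming the variable is read at compile time (the description implies this but does not state it). Absolute timings will differ by machine.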
Tests
cargo test -p perry-codegen --tests is green on the default (call) path. Output matches the previous inline baseline for super, rest params, and arrays of instances (including the unrelated, pre-existing no-own-ctor by-name-read quirk, which is identical on both paths and untouched here).
Context
Second of the codegen size-optimizations from compiling a real 13MB app. Together with #5294 (allocator), per-new IR drops ~181 lines/site, shrinking large-bundle IR substantially while improving runtime speed.
Summary by CodeRabbit
PERRY_INLINE_CTOR environment variable to control constructor execution routing.