perf(codegen): full-outline class-field IC diamond for oversized modules (#5334 lever B) #5385
…les (#5334 lever B)

Pathologically-large modules (the motivating case: a 13MB minified bundle that lowers to ~1.25GB of LLVM IR across ~92K functions) are forced to `clang -O0` (#4880), where the inline class-field-SET IC diamond's ~15-lines-per-site expansion is never optimized away, and clang needs ~15GB RSS just to chew through it. For such modules, replace the ENTIRE diamond (guard call + fast slot store + fallback arm) with a single `call @js_class_field_set_ic(...)`. The runtime helper reproduces the diamond's exact semantics: run the guard, then on PASS do the same raw-f64/boxed slot store, on FAIL record + route by name. This trades a function-call frame on the (cold, startup-dominated) field-set path for a large per-site IR reduction so clang can compile the module at all.

Gating (codegen-time, decided once per module in compile_module):
- `PERRY_FULL_OUTLINE_IC=1/on/true` forces ON, `=0/off/false` forces OFF;
- otherwise auto: function count >= PERRY_FULL_OUTLINE_IC_MIN_FUNCS (default 4000), the defining trait of the bundle case; ordinary per-file modules stay on the inline diamond and keep the hot fast store.

The decision is a thread-local set at the top of compile_module (codegen is sequential per module), not a process-global OnceLock, so it can't pin one module's decision across a multi-module build.

NB: the full-outline boxed store always emits the write barrier (via js_object_set_field), so the compile-time non-pointer barrier elision (#5334 lever D) does not apply on this path. This is acceptable, since it is gated to oversized, non-hot-loop modules.

Verified: forced ON collapses the diamond to one call (no fast/fallback blocks, no inline guard call); a class-field-write program runs to the correct result under full-outline; full perry-codegen suite green. The two class-field structure tests now pin PERRY_FULL_OUTLINE_IC off and serialize on ENV_LOCK against the new lever-B test.

Final lever of the IR-efficiency roadmap (#5334).
Levers A #5351, C #5350 merged; D #5381 in review.
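A minimal Rust sketch of the gating described above, assuming the env-var values are passed into the decision function rather than read inside it (so it is easy to test). The names `decide_full_outline_ic`, `set_full_outline_ic` and `full_outline_ic_enabled` are the helper names this PR uses, but these signatures, the `DEFAULT_MIN_FUNCS` constant, the `decide_from_env` wrapper, and the handling of unrecognized values (falling through to the auto gate) are assumptions, not perry-codegen's actual code.

```rust
use std::cell::Cell;

/// Default auto-gate threshold stated in the PR for PERRY_FULL_OUTLINE_IC_MIN_FUNCS.
const DEFAULT_MIN_FUNCS: usize = 4000;

thread_local! {
    // One flag per codegen thread. Because codegen is sequential per module,
    // it is overwritten at the start of every module instead of being pinned
    // once per process the way a OnceLock would be.
    static FULL_OUTLINE_IC: Cell<bool> = Cell::new(false);
}

/// Decide full-outline for one module. `force` models PERRY_FULL_OUTLINE_IC,
/// `min_funcs` models PERRY_FULL_OUTLINE_IC_MIN_FUNCS.
fn decide_full_outline_ic(callable_count: usize, force: Option<&str>, min_funcs: Option<&str>) -> bool {
    match force.map(|v| v.trim().to_ascii_lowercase()).as_deref() {
        Some("1" | "on" | "true") => return true,
        Some("0" | "off" | "false") => return false,
        _ => {} // unset or unrecognized (assumption): use the auto gate
    }
    let threshold = min_funcs
        .and_then(|v| v.trim().parse::<usize>().ok())
        .unwrap_or(DEFAULT_MIN_FUNCS);
    callable_count >= threshold
}

fn set_full_outline_ic(enabled: bool) {
    FULL_OUTLINE_IC.with(|f| f.set(enabled));
}

fn full_outline_ic_enabled() -> bool {
    FULL_OUTLINE_IC.with(|f| f.get())
}

/// Hypothetical wiring at the top of compile_module, reading the real environment.
fn decide_from_env(callable_count: usize) -> bool {
    let force = std::env::var("PERRY_FULL_OUTLINE_IC").ok();
    let min_funcs = std::env::var("PERRY_FULL_OUTLINE_IC_MIN_FUNCS").ok();
    decide_full_outline_ic(callable_count, force.as_deref(), min_funcs.as_deref())
}

fn main() {
    // Module A: bundle-sized, so the auto gate fires.
    set_full_outline_ic(decide_full_outline_ic(92_000, None, None));
    assert!(full_outline_ic_enabled());
    // Module B on the same thread: the decision is recomputed, not inherited.
    set_full_outline_ic(decide_full_outline_ic(120, None, None));
    assert!(!full_outline_ic_enabled());
    println!("env-driven decision for 120 callables: {}", decide_from_env(120));
}
```

Keeping the decision pure and storing only the result in the thread-local is what lets a multi-module build flip the mode per module.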
Addresses self-review findings on #5334 lever B (#5385):

- Gate denominator: `decide_full_outline_ic` was fed `hir.functions.len()`, which excludes class methods, static methods, accessors, and constructors (those live in `hir.classes[].*`, collected separately). A class-heavy minified bundle, the exact pathology lever B targets, could have a small `functions.len()` yet emit tens of thousands of LLVM functions, so the gate would never fire. New `module_callable_count()` counts top-level functions plus all class callables; the gate now uses it. New test `full_outline_ic_auto_gate_counts_class_methods` covers a class-heavy module triggering with only one top-level function.
- Dedup: `js_class_field_set_ic`'s guard-FAIL tail re-implemented `js_class_field_set_fallback` verbatim. It now delegates to that helper, so by-name routing (frozen / accessor / setter-in-chain) is defined once.

Full perry-codegen + perry-runtime typed_feedback suites green.
Caution: Review failed. The pull request is closed.
📝 Walkthrough

Adds an adaptive "full-outline IC" mode for class-field SET. A thread-local flag and env-var-driven decision function ( …

Changes: Full-outline IC for class-field SET
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CompileModule as compile_module
    participant Helpers as helpers.rs
    participant PropertySet as property_set.rs
    participant RuntimeDecls as declare_phase_b_objects
    participant Runtime as js_class_field_set_ic
    CompileModule->>Helpers: module_callable_count(hir)
    Helpers-->>CompileModule: callable_count
    CompileModule->>Helpers: decide_full_outline_ic(callable_count)
    Helpers-->>CompileModule: enabled bool
    CompileModule->>Helpers: set_full_outline_ic(enabled)
    CompileModule->>RuntimeDecls: declare js_class_field_set_ic
    CompileModule->>PropertySet: lower class-field SET site
    PropertySet->>Helpers: full_outline_ic_enabled()
    alt enabled
        PropertySet->>Runtime: emit js_class_field_set_ic call
    else disabled
        PropertySet->>PropertySet: emit guard/fast/fallback diamond
    end
```
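To make the `enabled` / `disabled` split concrete, here is a schematic Rust sketch of what each branch emits per site. Only `js_class_field_set_ic`, `js_class_field_set_fallback` and the `class_field_set.fast` label come from the PR; the guard name `@hypothetical_guard`, the `.slow`/`.done` labels and the elided `...` operands are placeholders, and the real inline diamond is longer (~15 to 22 lines per site, per the description).

```rust
/// Schematic per-site IR for a class-field SET. Operands are elided with `...`;
/// `@hypothetical_guard` and the `.slow`/`.done` labels are placeholders.
fn lower_class_field_set(full_outline: bool, site: usize) -> Vec<String> {
    if full_outline {
        // Guard, slot store and fallback all live inside the runtime helper.
        return vec![format!("call void @js_class_field_set_ic(...) ; site {site}")];
    }
    // Inline diamond: guard call, branch, fast store, fallback arm, join block.
    vec![
        format!("%ok{site} = call i1 @hypothetical_guard(...)"),
        format!("br i1 %ok{site}, label %class_field_set.fast{site}, label %class_field_set.slow{site}"),
        format!("class_field_set.fast{site}:"),
        "store ... ; raw-f64 or boxed slot store".to_string(),
        format!("br label %class_field_set.done{site}"),
        format!("class_field_set.slow{site}:"),
        "call void @js_class_field_set_fallback(...)".to_string(),
        format!("br label %class_field_set.done{site}"),
        format!("class_field_set.done{site}:"),
    ]
}

fn main() {
    let sites = 9;
    let inline: usize = (0..sites).map(|s| lower_class_field_set(false, s).len()).sum();
    let outlined: usize = (0..sites).map(|s| lower_class_field_set(true, s).len()).sum();
    println!("inline diamond: {inline} lines, full-outline: {outlined} lines");
}
```

The per-site saving is the whole diamond minus one line, which is why it scales linearly with the number of field-set sites in the module.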
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 5 of 5 passed
…oversized # Conflicts: # crates/perry-codegen/tests/typed_feedback.rs
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/perry-codegen/src/codegen/helpers.rs`:
- Around lines 139-148: The class_callables count calculation omits
computed_members from the denominator, but since computed_members are
lowered/registered like class callables in compile_module, they must be
included. Add the length of c.computed_members to the sum being calculated in
the map closure that processes each class c, placing it alongside the other
member counts (constructor, methods, static_methods, getters, setters).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 9f6c0907-481c-4933-b8f0-3233867fb81a
📒 Files selected for processing (6)
- crates/perry-codegen/src/codegen/helpers.rs
- crates/perry-codegen/src/codegen/mod.rs
- crates/perry-codegen/src/expr/property_set.rs
- crates/perry-codegen/src/runtime_decls/objects.rs
- crates/perry-codegen/tests/typed_feedback.rs
- crates/perry-runtime/src/typed_feedback/guards.rs
…minator

ClassComputedMember holds a Function that compile_module lowers like any method (emits an LLVM function), so computed members must count toward the oversized-module gate alongside methods/static_methods/getters/setters. A class with many computed-key methods could otherwise stay under PERRY_FULL_OUTLINE_IC_MIN_FUNCS and keep the inline diamond when the auto gate should fire.
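A minimal sketch of the corrected denominator, using hypothetical, trimmed stand-ins for the HIR types (the real `Hir` and `Class` carry much more, and modelling the constructor as an `Option` is an assumption). It counts top-level functions plus every class callable, including `computed_members`, so a class-heavy module with a single top-level function still crosses the 4000 default.

```rust
// Hypothetical, trimmed stand-ins for the HIR; only the shape of the count matters.
struct Function;

struct ClassComputedMember {
    _function: Function, // lowered to its own LLVM function, like a method
}

struct Class {
    constructor: Option<Function>, // assumption: at most one constructor per class
    methods: Vec<Function>,
    static_methods: Vec<Function>,
    getters: Vec<Function>,
    setters: Vec<Function>,
    computed_members: Vec<ClassComputedMember>,
}

struct Hir {
    functions: Vec<Function>,
    classes: Vec<Class>,
}

/// Top-level functions plus every class callable that becomes an LLVM function.
fn module_callable_count(hir: &Hir) -> usize {
    let class_callables: usize = hir
        .classes
        .iter()
        .map(|c| {
            usize::from(c.constructor.is_some())
                + c.methods.len()
                + c.static_methods.len()
                + c.getters.len()
                + c.setters.len()
                + c.computed_members.len()
        })
        .sum();
    hir.functions.len() + class_callables
}

fn class_with_methods(n: usize) -> Class {
    Class {
        constructor: Some(Function),
        methods: (0..n).map(|_| Function).collect(),
        static_methods: Vec::new(),
        getters: Vec::new(),
        setters: Vec::new(),
        computed_members: Vec::new(),
    }
}

fn main() {
    // One top-level function, 50 classes of 100 methods each: 1 + 50 * 101 = 5051.
    let hir = Hir {
        functions: vec![Function],
        classes: (0..50).map(|_| class_with_methods(100)).collect(),
    };
    println!("callables: {}", module_callable_count(&hir));
}
```

With `hir.functions.len()` alone, the same module would report 1 and never trip the gate.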
…w store

The full-outline IC helper's raw-f64 slot write is a barrier-free store, which the GC store-site inventory requires to be annotated. A passing guard with require_raw_f64 proves the slot is pointer-free (typed-shape descriptor) and the value is a plain number, identical to the inline class_field_set.fast raw-f64 store. Fixes the failing 'GC store-site inventory' lint step.
What
Final lever of the IR-efficiency roadmap (#5334, lever B): full-outline the class-field IC diamond for oversized modules.
Pathologically-large modules (the motivating case: a 13MB minified bundle that lowers to ~1.25GB of LLVM IR across ~92K functions) are forced to `clang -O0` (#4880), where the inline class-field-SET diamond's ~15–22-lines-per-site expansion is never optimized away, and clang needs ~15GB RSS just to chew through it.

For such modules, the whole diamond (guard call + fast slot store + fallback arm) collapses to a single `call @js_class_field_set_ic(...)`.
The runtime helper reproduces the diamond's exact semantics — run the guard, then on PASS do the same raw-f64/boxed slot store, on FAIL record + route by name. It trades a call frame on the (cold, startup-dominated) field-set path for a large per-site IR reduction so clang can compile the module at all.
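The helper's control flow can be modelled in a few lines of Rust. This is a toy object model, not perry-runtime's: `Obj`, `Site`, `Value`, `guard`, the miss counter and the by-name map are all hypothetical, and the functions drop the `js_` prefix to make that clear. It shows the three behaviours the PR relies on: PASS with `require_raw_f64` does a barrier-free store, any other PASS goes through the write barrier, and FAIL records the miss and delegates to the single fallback routine.

```rust
use std::collections::HashMap;

/// Toy value: a plain number (raw f64) or a heap reference (boxed).
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Num(f64),
    Boxed(String),
}

/// Toy object: shape id + slots, plus counters for illustration.
struct Obj {
    shape_id: u32,
    slots: Vec<Value>,
    barrier_hits: usize,             // number of write-barrier invocations
    by_name: HashMap<String, Value>, // stand-in for by-name routing on FAIL
}

/// Per-site IC data baked in at codegen time (hypothetical layout).
struct Site {
    expected_shape: u32,
    slot: usize,
    require_raw_f64: bool,
    name: &'static str,
}

fn guard(obj: &Obj, site: &Site, v: &Value) -> bool {
    obj.shape_id == site.expected_shape && (!site.require_raw_f64 || matches!(v, Value::Num(_)))
}

/// The one place by-name routing lives (frozen/accessor/setter checks elided).
fn class_field_set_fallback(obj: &mut Obj, site: &Site, v: Value) {
    obj.by_name.insert(site.name.to_string(), v);
}

/// Outlined IC: run the guard, then do the same store the inline diamond would.
fn class_field_set_ic(obj: &mut Obj, site: &Site, v: Value, misses: &mut u32) {
    if guard(obj, site, &v) {
        if !site.require_raw_f64 {
            // Boxed store: always runs the write barrier on this path.
            obj.barrier_hits += 1;
        }
        // With require_raw_f64 the guard proved a pointer-free slot and a
        // plain number, so the store needs no barrier.
        obj.slots[site.slot] = v;
    } else {
        *misses += 1; // record the guard failure as feedback
        class_field_set_fallback(obj, site, v);
    }
}

fn main() {
    let mut obj = Obj { shape_id: 7, slots: vec![Value::Num(0.0); 2], barrier_hits: 0, by_name: HashMap::new() };
    let mut misses = 0;
    let x = Site { expected_shape: 7, slot: 0, require_raw_f64: true, name: "x" };
    class_field_set_ic(&mut obj, &x, Value::Num(1.5), &mut misses);
    let y = Site { expected_shape: 7, slot: 1, require_raw_f64: false, name: "y" };
    class_field_set_ic(&mut obj, &y, Value::Boxed("hello".into()), &mut misses);
    println!("slots = {:?}, barriers = {}, misses = {}", obj.slots, obj.barrier_hits, misses);
}
```

Routing every FAIL through one fallback function is the "defined once" property the dedup commit is after.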
Gating (decided once per module, at codegen time)
- `PERRY_FULL_OUTLINE_IC=1/on/true` forces ON, `=0/off/false` forces OFF;
- otherwise auto: function count >= `PERRY_FULL_OUTLINE_IC_MIN_FUNCS` (default 4000), the defining trait of the bundle case (tens of thousands of functions in one module). Ordinary per-file modules stay on the inline diamond and keep the hot fast store.

The decision is a thread-local set at the top of `compile_module` (codegen is sequential per module), not a process-global `OnceLock`, so it can't pin one module's decision across a multi-module build.

Trade-offs
The full-outline boxed store always emits the write barrier (via `js_object_set_field`), so the compile-time non-pointer barrier elision (#5334 lever D, #5381) does not apply on this path. Acceptable: it's gated to oversized, non-hot-loop modules.

Verification
- Forced ON collapses each site to one `js_class_field_set_ic` call: 0 `class_field_set.fast` blocks, 0 inline guard calls. On a small fixture: 197 fewer IR lines for 9 field-sets (~22/site), scaling to ~4–5M lines on the bundle's 238K field-set sites.
- A class-field-write program runs to the correct result (`5999997`) under full-outline: the outlined runtime path is semantically and GC-correct.
- New test `full_outline_ic_collapses_class_field_set_to_single_call` (both gate states). The two class-field structure tests now pin `PERRY_FULL_OUTLINE_IC` off and serialize on `ENV_LOCK` against it. Full `perry-codegen` suite green.

Refs #5334. (Levers A #5351, C #5350 merged; D #5381 in review.)
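The PR does not show how the tests serialize on `ENV_LOCK`; a common pattern is sketched below, assuming a process-wide `Mutex<()>` and a hypothetical `with_env` helper. Environment variables are process-global and `cargo test` runs tests on several threads, so every test that pins `PERRY_FULL_OUTLINE_IC` takes the lock before touching it.

```rust
use std::env;
use std::sync::Mutex;

// Env vars are process-global and `cargo test` runs tests in parallel, so
// every test that reads or writes them takes this lock first.
static ENV_LOCK: Mutex<()> = Mutex::new(());

/// Run `f` with `key` set to `value`, then restore the previous state.
/// Hypothetical helper; the variable is not restored if `f` panics.
#[allow(unused_unsafe)]
fn with_env<T>(key: &str, value: &str, f: impl FnOnce() -> T) -> T {
    // A poisoned lock only means another test panicked; keep going.
    let _guard = ENV_LOCK.lock().unwrap_or_else(|e| e.into_inner());
    let prev = env::var(key).ok();
    // `set_var`/`remove_var` are `unsafe` in the 2024 edition; on older
    // editions the `unsafe` block is merely redundant (hence the allow).
    unsafe { env::set_var(key, value) };
    let out = f();
    match prev {
        Some(p) => unsafe { env::set_var(key, p) },
        None => unsafe { env::remove_var(key) },
    }
    out
}

fn main() {
    // A structure test pinning the inline diamond would wrap its body like this.
    let seen = with_env("PERRY_FULL_OUTLINE_IC", "0", || env::var("PERRY_FULL_OUTLINE_IC").unwrap());
    assert_eq!(seen, "0");
}
```

A drop guard that restores the variable would also cover the panic case; it is left out here to keep the sketch short.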
Summary by CodeRabbit
Release Notes
Performance
- …`SET` inline-cache through a fully-outlined `js_class_field_set_ic` entrypoint when full-outline inline-cache is enabled.

Configuration
- …`PERRY_FULL_OUTLINE_IC`, with auto-gating using `PERRY_FULL_OUTLINE_IC_MIN_FUNCS` based on estimated module callable counts (including class members).

Tests