perf(codegen): outline class-field-SET guard-miss arm to one call (#5334 lever A) #5351
The default class-field-set diamond runs the inline `js_typed_feedback_class_field_set_guard` in its entry block; on a guard PASS it stores the slot inline, on a MISS it branches to the fallback arm. That arm emitted TWO inline calls per set site: `js_typed_feedback_record_fallback_call`, then `js_object_set_field_by_name`.

Since the guard has already run and FAILED (that failure is what branched control here), nothing is left to decide: collapse the pair into a single outlined `js_class_field_set_fallback(site_id, obj_bits, key_raw, value)` that records the miss and routes the write by name.

Perf-neutral by construction: the hot `class_field_set.fast` slot store is untouched, and the change is confined to the cold guard-miss arm, which never executes on a monomorphic hot path.

IR shrinks by one call per class-field-SET site (verified on emitted IR: fallback arm 2 calls -> 1; full perry-codegen suite green).

First step of the IR-efficiency roadmap in #5334 (Tier 1, lever A: outline cold IC machinery). Establishes the outline-helper pattern reused by the larger levers.
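The collapse described in this commit message can be modelled with a small Rust sketch. Everything here is hypothetical scaffolding: `MockRuntime`, its fields, and the method signatures are assumptions made for illustration, and only the three `js_*` helper names they stand in for come from the PR. The point is simply that one cold, never-inlined helper performs the same two steps, in the same order, as the old two-call arm.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the runtime state touched by the guard-miss arm.
/// This is NOT the real runtime; it only models the control flow.
#[derive(Default)]
pub struct MockRuntime {
    /// Number of recorded fallback calls per IC site id.
    pub fallback_counts: HashMap<u32, u64>,
    /// By-name field storage: (object bits, key) -> value.
    pub fields: HashMap<(u64, String), f64>,
}

impl MockRuntime {
    /// Models `js_typed_feedback_record_fallback_call`: note a guard miss at `site_id`.
    pub fn record_fallback_call(&mut self, site_id: u32) {
        *self.fallback_counts.entry(site_id).or_insert(0) += 1;
    }

    /// Models `js_object_set_field_by_name`: the generic by-name write.
    pub fn set_field_by_name(&mut self, obj_bits: u64, key: &str, value: f64) {
        self.fields.insert((obj_bits, key.to_string()), value);
    }

    /// Models the outlined `js_class_field_set_fallback`: a single call that
    /// records the miss and then routes the write by name.
    /// `#[cold]` and `#[inline(never)]` keep it out of the caller's hot code.
    #[cold]
    #[inline(never)]
    pub fn class_field_set_fallback(
        &mut self,
        site_id: u32,
        obj_bits: u64,
        key: &str,
        value: f64,
    ) {
        self.record_fallback_call(site_id);
        self.set_field_by_name(obj_bits, key, value);
    }
}
```

In this model, calling `class_field_set_fallback` once leaves the mock runtime in the same state as calling the two separate methods back to back, which is the property the PR relies on when it calls the new helper byte-identical to the two-call block.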
…les (#5334 lever B) (#5385)

* perf(codegen): full-outline class-field IC diamond for oversized modules (#5334 lever B)

Pathologically-large modules (the motivating case: a 13MB minified bundle that lowers to ~1.25GB of LLVM IR across ~92K functions) are forced to `clang -O0` (#4880), where the inline class-field-SET IC diamond's ~15-lines-per-site expansion is never optimized away, and clang needs ~15GB RSS just to chew through it.

For such modules, replace the ENTIRE diamond (guard call + fast slot store + fallback arm) with a single `call @js_class_field_set_ic(...)`. The runtime helper reproduces the diamond's exact semantics: run the guard, then on PASS do the same raw-f64/boxed slot store, on FAIL record + route by name. This trades a function-call frame on the (cold, startup-dominated) field-set path for a large per-site IR reduction so clang can compile the module at all.

Gating (codegen-time, decided once per module in compile_module):
- `PERRY_FULL_OUTLINE_IC=1/on/true` forces ON, `=0/off/false` forces OFF;
- otherwise auto: function count >= PERRY_FULL_OUTLINE_IC_MIN_FUNCS (default 4000), the defining trait of the bundle case; ordinary per-file modules stay on the inline diamond and keep the hot fast store.

The decision is a thread-local set at the top of compile_module (codegen is sequential per module), not a process-global OnceLock, so it can't pin one module's decision across a multi-module build.

NB: the full-outline boxed store always emits the write barrier (via js_object_set_field), so the compile-time non-pointer barrier elision (#5334 lever D) does not apply on this path; this is acceptable, since it is gated to oversized, non-hot-loop modules.

Verified: forced ON collapses the diamond to one call (no fast/fallback blocks, no inline guard call); a class-field-write program runs to the correct result under full-outline; full perry-codegen suite green. The two class-field structure tests now pin PERRY_FULL_OUTLINE_IC off and serialize on ENV_LOCK against the new lever-B test.

Final lever of the IR-efficiency roadmap (#5334). Levers A #5351, C #5350 merged; D #5381 in review.

* review: count class callables in lever-B gate; dedup IC fallback tail

Addresses self-review findings on #5334 lever B (#5385):
- Gate denominator: `decide_full_outline_ic` was fed `hir.functions.len()`, which excludes class methods, static methods, accessors, and constructors (those live in `hir.classes[].*`, collected separately). A class-heavy minified bundle (the exact pathology lever B targets) could have a small `functions.len()` yet emit tens of thousands of LLVM functions, so the gate would never fire. New `module_callable_count()` counts top-level functions plus all class callables; the gate now uses it. New test `full_outline_ic_auto_gate_counts_class_methods` covers a class-heavy module triggering with only one top-level function.
- Dedup: `js_class_field_set_ic`'s guard-FAIL tail re-implemented `js_class_field_set_fallback` verbatim. It now delegates to that helper, so by-name routing (frozen / accessor / setter-in-chain) is defined once.

Full perry-codegen + perry-runtime typed_feedback suites green.

* review(coderabbit): count class computed_members in lever-B gate denominator

ClassComputedMember holds a Function that compile_module lowers like any method (emits an LLVM function), so computed members must count toward the oversized-module gate alongside methods/static_methods/getters/setters. A class with many computed-key methods could otherwise stay under PERRY_FULL_OUTLINE_IC_MIN_FUNCS and keep the inline diamond when the auto gate should fire.

* lint: GC_STORE_AUDIT(POINTER_FREE) marker on js_class_field_set_ic raw store

The full-outline IC helper's raw-f64 slot write is a barrier-free store the GC store-site inventory requires to be annotated. A passing guard with require_raw_f64 proves the slot is pointer-free (typed-shape descriptor) and the value is a plain number, identical to the inline class_field_set.fast raw-f64 store. Fixes the failing lint 'GC store-site inventory' step.

---------

Co-authored-by: Ralph Küpper <ralph2@skelpo.com>
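The gating rules in the lever-B commit message (explicit override first, otherwise a callable-count threshold, with the result held in a thread-local for the current module) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual perry-codegen code: the function names `decide_full_outline_ic` and `module_callable_count` appear in the commit message, but their signatures here, the `ClassCallables` struct, `parse_override`, `begin_module`, and the exact-match parsing of the override value are all invented for the example.

```rust
use std::cell::Cell;

/// Default auto-gate threshold quoted in the commit message.
pub const DEFAULT_MIN_FUNCS: usize = 4000;

/// Hypothetical summary of one class: how many LLVM functions it lowers to.
pub struct ClassCallables {
    pub constructors: usize,
    pub methods: usize,
    pub static_methods: usize,
    pub getters: usize,
    pub setters: usize,
    pub computed_members: usize,
}

/// Counts top-level functions plus every class callable, so a class-heavy
/// module with few top-level functions can still trip the gate.
pub fn module_callable_count(top_level_functions: usize, classes: &[ClassCallables]) -> usize {
    let class_total: usize = classes
        .iter()
        .map(|c| {
            c.constructors + c.methods + c.static_methods
                + c.getters + c.setters + c.computed_members
        })
        .sum();
    top_level_functions + class_total
}

/// Interprets the override value: Some(true) = force ON, Some(false) = force OFF,
/// None = unset or unrecognized, so fall through to the auto gate.
/// ASSUMPTION: values are matched exactly, with no trimming or case folding.
pub fn parse_override(raw: Option<&str>) -> Option<bool> {
    match raw {
        Some("1") | Some("on") | Some("true") => Some(true),
        Some("0") | Some("off") | Some("false") => Some(false),
        _ => None,
    }
}

/// Override wins; otherwise enable full-outline once the count reaches the threshold.
pub fn decide_full_outline_ic(
    override_raw: Option<&str>,
    min_funcs: usize,
    callable_count: usize,
) -> bool {
    match parse_override(override_raw) {
        Some(forced) => forced,
        None => callable_count >= min_funcs,
    }
}

thread_local! {
    /// Per-thread decision, reset at the start of each module, so one module's
    /// choice is not pinned for the whole process.
    static FULL_OUTLINE_IC: Cell<bool> = Cell::new(false);
}

/// Hypothetical hook called at the top of module compilation.
pub fn begin_module(enabled: bool) {
    FULL_OUTLINE_IC.with(|flag| flag.set(enabled));
}

/// Queried by field-set lowering to choose between inline diamond and single call.
pub fn full_outline_ic_enabled() -> bool {
    FULL_OUTLINE_IC.with(|flag| flag.get())
}
```

Passing the override string and threshold in as parameters, rather than reading the environment inside the function, keeps the sketch testable without mutating process-wide state; the commit message notes that the real tests serialize on ENV_LOCK for that reason.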
What
First step of the IR-efficiency roadmap (#5334, Tier 1 / lever A): outline the cold guard-miss arm of the class-field-SET inline-cache diamond.
The default diamond runs the inline `js_typed_feedback_class_field_set_guard` in its entry block; on a guard PASS it stores the slot inline, on a MISS it branches to the fallback arm. That arm emitted two inline calls per set site: `js_typed_feedback_record_fallback_call`, then `js_object_set_field_by_name`.

The guard has already run and failed in the entry block (that failure is what branched control here), so nothing is left to decide. Collapse the pair into one outlined call: `js_class_field_set_fallback(site_id, obj_bits, key_raw, value)`.

`js_class_field_set_fallback` records the miss and routes the write by name, byte-identical to the two-call block.

Why it's safe
The hot `class_field_set.fast` slot store is untouched; the change is confined to the cold guard-miss arm, which never executes on a monomorphic hot path.

Verification
- Point-field-churn test: fallback arm drops from 2 calls to 1 (21 `js_class_field_set_fallback`, no class-field `js_object_set_field_by_name`).
- … `60000012`).
- Full `perry-codegen` suite green. Updated two `typed_feedback` tests whose `record_fallback_call` / `by_name` assertions were satisfied by the now-folded field-set fallback (one of them incidentally, via a class's synthesized field-init).

Roadmap
Establishes the outline-helper pattern reused by the larger levers in #5334 (C: nan-box round-trips — see #5350; D: non-pointer barrier elision; B: adaptive full-outline for oversized modules). Refs #5334.
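As a rough illustration of how a larger lever can reuse the outlined helper, the sketch below models a full-outline IC call in the style described for lever B: run the guard, store into the slot on a pass, and delegate to the shared fallback on a miss. All types and signatures (`MockObject`, `Site`, `guard`, the Rust-level function shapes) are hypothetical; only the idea of the IC helper delegating its guard-FAIL tail to the fallback helper comes from the PR discussion.

```rust
use std::collections::HashMap;

/// Hypothetical object model: a shape id, fixed slots, and a by-name side table.
pub struct MockObject {
    pub shape_id: u32,
    pub slots: Vec<f64>,
    pub by_name: HashMap<String, f64>,
}

/// Hypothetical per-site inline-cache state.
pub struct Site {
    pub expected_shape: u32,
    pub slot_index: usize,
    pub fallback_calls: u64,
}

/// Models the guard: the cached shape must match and the slot must exist.
fn guard(site: &Site, obj: &MockObject) -> bool {
    obj.shape_id == site.expected_shape && site.slot_index < obj.slots.len()
}

/// Models the outlined fallback: record the miss, then write by name.
#[cold]
#[inline(never)]
pub fn class_field_set_fallback(site: &mut Site, obj: &mut MockObject, key: &str, value: f64) {
    site.fallback_calls += 1;
    obj.by_name.insert(key.to_string(), value);
}

/// Models the full-outline IC helper: the whole diamond behind one call.
/// On a guard miss it delegates to the fallback instead of re-implementing it,
/// so by-name routing is defined in exactly one place.
pub fn class_field_set_ic(site: &mut Site, obj: &mut MockObject, key: &str, value: f64) {
    if guard(site, obj) {
        // Guard passed: direct slot store, the equivalent of the fast arm.
        obj.slots[site.slot_index] = value;
    } else {
        class_field_set_fallback(site, obj, key, value);
    }
}
```

The design choice this illustrates is the one the outline-helper pattern is meant to enable: the cold path lives in a single function, so both the inline diamond's miss arm and a fully outlined helper can share it.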