perf(codegen): elide GC write barrier for non-pointer class-field stores (#5334 lever D) #5381
…res (#5334 lever D)

The boxed class-field-SET fast path emitted a generational write barrier (`js_write_barrier_slot`) unconditionally. But the barrier only matters when the stored value is a heap pointer: it records the parent→child reference so the minor GC scans the parent. Storing a value that is a non-pointer by construction (number / bool / undefined / null / comparison / arithmetic) creates no such reference, so the barrier is a semantic no-op.

Skip it in that case, reusing `expr_produces_non_pointer_bits_by_construction`, the same predicate the array-store paths already trust for barrier elision, so the GC soundness standard is unchanged. The barrier flag is computed before the block builder is borrowed; the LAYOUT NOTE is kept regardless (it tracks the slot's pointer-ness for minor-scan skipping, and a non-pointer write into a slot that previously held a pointer is a real transition the GC must observe).

Verified: a numeric-heavy class drops 6 of 9 boxed field-store barriers (numeric/boolean stores), keeps all 3 genuine pointer (string) stores, and runs to the correct result under a 2M-iteration GC-exercising loop. New unit test asserts both directions (numeric → no barrier, string literal → barrier kept); full perry-codegen suite green (incl. large_object_barriers).

Tier-2 lever of the IR-efficiency roadmap (#5334, lever D).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…les (#5334 lever B) (#5385)

* perf(codegen): full-outline class-field IC diamond for oversized modules (#5334 lever B)

Pathologically-large modules (the motivating case: a 13MB minified bundle that lowers to ~1.25GB of LLVM IR across ~92K functions) are forced to `clang -O0` (#4880), where the inline class-field-SET IC diamond's ~15-lines-per-site expansion is never optimized away, and clang needs ~15GB RSS just to chew through it.

For such modules, replace the ENTIRE diamond (guard call + fast slot store + fallback arm) with a single `call @js_class_field_set_ic(...)`. The runtime helper reproduces the diamond's exact semantics: run the guard, then on PASS do the same raw-f64/boxed slot store, on FAIL record + route by name. This trades a function-call frame on the (cold, startup-dominated) field-set path for a large per-site IR reduction so clang can compile the module at all.

Gating (codegen-time, decided once per module in compile_module):
- `PERRY_FULL_OUTLINE_IC=1/on/true` forces ON, `=0/off/false` forces OFF;
- otherwise auto: function count >= PERRY_FULL_OUTLINE_IC_MIN_FUNCS (default 4000), the defining trait of the bundle case; ordinary per-file modules stay on the inline diamond and keep the hot fast store.

The decision is a thread-local set at the top of compile_module (codegen is sequential per module), not a process-global OnceLock, so it can't pin one module's decision across a multi-module build.

NB: the full-outline boxed store always emits the write barrier (via js_object_set_field), so the compile-time non-pointer barrier elision (#5334 lever D) does not apply on this path; acceptable, since it is gated to oversized, non-hot-loop modules.

Verified: forced ON collapses the diamond to one call (no fast/fallback blocks, no inline guard call); a class-field-write program runs to the correct result under full-outline; full perry-codegen suite green.

The two class-field structure tests now pin PERRY_FULL_OUTLINE_IC off and serialize on ENV_LOCK against the new lever-B test.

Final lever of the IR-efficiency roadmap (#5334). Levers A #5351, C #5350 merged; D #5381 in review.

* review: count class callables in lever-B gate; dedup IC fallback tail

Addresses self-review findings on #5334 lever B (#5385):
- Gate denominator: `decide_full_outline_ic` was fed `hir.functions.len()`, which excludes class methods, static methods, accessors, and constructors (those live in `hir.classes[].*`, collected separately). A class-heavy minified bundle (the exact pathology lever B targets) could have a small `functions.len()` yet emit tens of thousands of LLVM functions, so the gate would never fire. New `module_callable_count()` counts top-level functions plus all class callables; the gate now uses it. New test `full_outline_ic_auto_gate_counts_class_methods` covers a class-heavy module triggering with only one top-level function.
- Dedup: `js_class_field_set_ic`'s guard-FAIL tail re-implemented `js_class_field_set_fallback` verbatim. It now delegates to that helper, so by-name routing (frozen / accessor / setter-in-chain) is defined once.

Full perry-codegen + perry-runtime typed_feedback suites green.

* review(coderabbit): count class computed_members in lever-B gate denominator

ClassComputedMember holds a Function that compile_module lowers like any method (emits an LLVM function), so computed members must count toward the oversized-module gate alongside methods/static_methods/getters/setters. A class with many computed-key methods could otherwise stay under PERRY_FULL_OUTLINE_IC_MIN_FUNCS and keep the inline diamond when the auto gate should fire.

* lint: GC_STORE_AUDIT(POINTER_FREE) marker on js_class_field_set_ic raw store

The full-outline IC helper's raw-f64 slot write is a barrier-free store the GC store-site inventory requires to be annotated. A passing guard with require_raw_f64 proves the slot is pointer-free (typed-shape descriptor) and the value is a plain number, identical to the inline class_field_set.fast raw-f64 store. Fixes the failing lint 'GC store-site inventory' step.

---------

Co-authored-by: Ralph Küpper <ralph2@skelpo.com>
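The lever-B gating rules described in the commit message can be sketched roughly as follows. This is an illustrative reconstruction, not the code from the PR: `ClassInfo`, `module_callable_count_sketch`, `decide_full_outline_ic_sketch`, `begin_module`, and `full_outline_ic_enabled` are hypothetical names and signatures, the override values are matched as exact lowercase strings by assumption, and unrecognized override values are assumed to fall through to the auto rule.

```rust
use std::cell::Cell;

/// Hypothetical per-class summary; the real HIR stores full member lists.
struct ClassInfo {
    methods: usize,
    static_methods: usize,
    getters: usize,
    setters: usize,
    computed_members: usize,
    has_constructor: bool,
}

/// Count every callable that would lower to an LLVM function:
/// top-level functions plus all class callables.
fn module_callable_count_sketch(top_level_functions: usize, classes: &[ClassInfo]) -> usize {
    let class_callables: usize = classes
        .iter()
        .map(|c| {
            c.methods
                + c.static_methods
                + c.getters
                + c.setters
                + c.computed_members
                + usize::from(c.has_constructor)
        })
        .sum();
    top_level_functions + class_callables
}

/// Default threshold stated in the commit message.
const DEFAULT_MIN_FUNCS: usize = 4000;

/// `force` stands in for the value of PERRY_FULL_OUTLINE_IC and `min_funcs`
/// for PERRY_FULL_OUTLINE_IC_MIN_FUNCS; both are passed in so the decision
/// stays a pure function.
fn decide_full_outline_ic_sketch(
    force: Option<&str>,
    min_funcs: Option<&str>,
    callable_count: usize,
) -> bool {
    match force {
        Some("1") | Some("on") | Some("true") => return true,
        Some("0") | Some("off") | Some("false") => return false,
        _ => {} // assumption: anything else falls back to the auto rule
    }
    let threshold = min_funcs
        .and_then(|s| s.parse::<usize>().ok())
        .unwrap_or(DEFAULT_MIN_FUNCS);
    callable_count >= threshold
}

thread_local! {
    /// Per-thread flag, overwritten at the start of every module, so one
    /// module's decision cannot leak into the next (unlike a OnceLock).
    static FULL_OUTLINE_IC: Cell<bool> = Cell::new(false);
}

/// Called once at the top of compiling a module (hypothetical entry point).
fn begin_module(force: Option<&str>, min_funcs: Option<&str>, callable_count: usize) {
    let decision = decide_full_outline_ic_sketch(force, min_funcs, callable_count);
    FULL_OUTLINE_IC.with(|flag| flag.set(decision));
}

fn full_outline_ic_enabled() -> bool {
    FULL_OUTLINE_IC.with(|flag| flag.get())
}
```

In a real build the two override values would presumably be read with `std::env::var(...).ok()` and passed in via `as_deref()`. Keeping the decision a pure function of its inputs also avoids the need to serialize tests on the process environment.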
What
Tier-2 lever of the IR-efficiency roadmap (#5334, lever D): skip the GC write barrier on boxed class-field stores whose value is a non-pointer by construction.
The boxed class-field-SET fast path emitted `js_write_barrier_slot` unconditionally.

But the generational write barrier only matters when the stored value is a heap pointer: it records the parent→child reference so the minor GC scans the parent. A value that is a non-pointer by construction (number / bool / undefined / null / comparison / arithmetic) creates no such reference, so the barrier is a semantic no-op.
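As a rough illustration of the elision rule, here is a minimal sketch over a toy expression type. None of this is the PR's code: `Expr`, `Op`, `is_non_pointer_by_construction`, and `lower_boxed_field_store` are hypothetical stand-ins, and the predicate is a deliberately conservative simplification of `expr_produces_non_pointer_bits_by_construction` (arithmetic counts as non-pointer only when both operands do).

```rust
/// Hypothetical, heavily simplified expression type (not Perry's real HIR).
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum Expr {
    Number(f64),
    Bool(bool),
    Undefined,
    Null,
    Str(String),                // heap-allocated: a pointer value
    Ident(String),              // unknown at compile time: may hold a pointer
    Less(Box<Expr>, Box<Expr>), // comparison: always yields a bool
    Mul(Box<Expr>, Box<Expr>),  // arithmetic
}

/// Returns true only when the value can never be a heap pointer.
/// Anything uncertain answers false, which keeps the barrier.
fn is_non_pointer_by_construction(e: &Expr) -> bool {
    match e {
        Expr::Number(_) | Expr::Bool(_) | Expr::Undefined | Expr::Null => true,
        Expr::Less(_, _) => true,
        Expr::Mul(a, b) => {
            is_non_pointer_by_construction(a) && is_non_pointer_by_construction(b)
        }
        Expr::Str(_) | Expr::Ident(_) => false,
    }
}

/// Abstract operations standing in for the emitted IR.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Op {
    StoreSlot,
    LayoutNote,
    WriteBarrier,
}

/// Lower a boxed class-field store. The barrier flag is computed up front;
/// the layout note is always emitted, the barrier only when needed.
fn lower_boxed_field_store(value: &Expr) -> Vec<Op> {
    let needs_barrier = !is_non_pointer_by_construction(value);
    let mut ops = vec![Op::StoreSlot, Op::LayoutNote];
    if needs_barrier {
        ops.push(Op::WriteBarrier);
    }
    ops
}
```

Under these assumptions, storing `Expr::Number(3.0)` lowers to a slot store plus layout note only, while storing `Expr::Str(..)` or an unknown `Expr::Ident(..)` still gets the barrier. Defaulting to "keep the barrier" whenever the predicate cannot prove non-pointer-ness is what makes the elision safe.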
Safety
- Reuses `expr_produces_non_pointer_bits_by_construction`, the same predicate the array-store paths already trust for barrier elision, so the GC soundness standard is unchanged.

Verification
- A numeric-heavy class drops 6 of 9 boxed field-store barriers (numeric/boolean stores), keeps all 3 genuine pointer (string) stores, and runs to the correct result (5999997) under a 2M-iteration GC-exercising loop, so barrier elision doesn't corrupt the heap.
- New unit test `class_field_set_elides_write_barrier_for_nonpointer_value` asserts both directions (numeric → no barrier, string literal → barrier kept).
- Full `perry-codegen` suite green, including `large_object_barriers` and the runtime barrier-metadata tests.

Refs #5334. (Levers A #5351 and C #5350 already merged.)