
perf(codegen): elide GC write barrier for non-pointer class-field stores (#5334 lever D) #5381

Merged
proggeramlug merged 2 commits into main from feat/elide-nonpointer-field-barrier
Jun 18, 2026

@proggeramlug proggeramlug commented Jun 18, 2026

Contributor

What

Tier-2 lever of the IR-efficiency roadmap (#5334, lever D): skip the GC write barrier on boxed class-field stores whose value is a non-pointer by construction.

The boxed class-field-SET fast path emitted js_write_barrier_slot unconditionally:

class_field_set.fast:
  store double %val, ptr %slot
  call void @js_gc_note_slot_layout(...)
  call void @js_write_barrier_slot(...)   ; <- always

But the generational write barrier only matters when the stored value is a heap pointer — it records the parent→child reference so the minor GC scans the parent. A value that is a non-pointer by construction (number / bool / undefined / null / comparison / arithmetic) creates no such reference, so the barrier is a semantic no-op:

class_field_set.fast:                      ; value is numeric/boolean
  store double %val, ptr %slot
  call void @js_gc_note_slot_layout(...)   ; layout note kept
                                           ; barrier elided
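
To make the "semantic no-op" claim concrete, here is a minimal sketch of a generational write barrier over a toy heap. The `Value` enum, `Heap` struct, and `write_barrier` method are hypothetical names for illustration only; they do not mirror Perry's NaN-boxed representation or the real `js_write_barrier_slot`.

```rust
use std::collections::HashSet;

/// Toy value model (hypothetical, not Perry's real value encoding).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Value {
    Number(f64),
    Bool(bool),
    Undefined,
    Null,
    /// Index of a heap object; the only variant that creates a reference.
    HeapRef(usize),
}

/// Toy heap: each object is either in the old or the young generation.
pub struct Heap {
    /// `is_old[i]` is true when object `i` lives in the old generation.
    pub is_old: Vec<bool>,
    /// Old objects that may point at young objects; a minor GC rescans these.
    pub remembered: HashSet<usize>,
}

impl Heap {
    /// Generational write barrier: only an old -> young pointer store records
    /// anything. For every non-pointer value it falls through and does nothing,
    /// which is why skipping the call for such values changes no GC state.
    pub fn write_barrier(&mut self, parent: usize, value: Value) {
        if let Value::HeapRef(child) = value {
            if self.is_old[parent] && !self.is_old[child] {
                self.remembered.insert(parent);
            }
        }
    }
}
```

In this model, calling `write_barrier` with `Value::Number`, `Value::Bool`, `Value::Undefined`, or `Value::Null` never touches `remembered`, so a compiler that can prove the value is one of those may drop the call.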

Safety

  • Reuses expr_produces_non_pointer_bits_by_construction, the same predicate the array-store paths already trust for barrier elision, so the GC soundness standard is unchanged.
  • The layout note stays (it tracks the slot's pointer-ness for minor-scan skipping; a non-pointer write into a slot that previously held a pointer is a real transition the GC must observe).
  • The barrier is kept for any value that may be a heap pointer.
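
The decision described in the points above can be sketched roughly as follows. The `Expr` enum and both functions are simplified stand-ins invented for this illustration; the real predicate is `expr_produces_non_pointer_bits_by_construction`, whose signature and coverage are not shown in this PR text.

```rust
/// Hypothetical, heavily simplified expression tree (not Perry's HIR).
pub enum Expr {
    Number(f64),
    Bool(bool),
    Undefined,
    Null,
    StringLit(String),
    /// Subtraction; treated as numeric only when both operands are
    /// themselves non-pointer by construction (a conservative assumption).
    Sub(Box<Expr>, Box<Expr>),
    /// `a < b`; a comparison always yields a boolean.
    Less(Box<Expr>, Box<Expr>),
    /// A local whose static type this sketch knows nothing about.
    Local(String),
}

/// True only when the expression cannot evaluate to a heap pointer.
/// Anything uncertain answers `false`, so the barrier is kept.
pub fn non_pointer_by_construction(e: &Expr) -> bool {
    match e {
        Expr::Number(_) | Expr::Bool(_) | Expr::Undefined | Expr::Null => true,
        Expr::Less(..) => true,
        Expr::Sub(a, b) => non_pointer_by_construction(a) && non_pointer_by_construction(b),
        Expr::StringLit(_) | Expr::Local(_) => false,
    }
}

/// The flag handed to the slot-store emitter: emit the barrier unless the
/// value is provably a non-pointer.
pub fn field_set_barrier_needed(value: &Expr) -> bool {
    !non_pointer_by_construction(value)
}
```

The important property is the direction of the default: an unknown expression keeps the barrier, so a gap in the predicate costs performance, not correctness.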

Verification

  • A numeric-heavy class drops 6 of 9 boxed field-store barriers (the numeric/boolean stores) and keeps all 3 genuine pointer (string) stores.
  • Runs to the correct result (5999997) under a 2M-iteration GC-exercising loop — barrier elision doesn't corrupt the heap.
  • New unit test class_field_set_elides_write_barrier_for_nonpointer_value asserts both directions (numeric → no barrier, string literal → barrier kept).
  • Full perry-codegen suite green, including large_object_barriers and the runtime barrier-metadata tests.
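
For readers who want to reproduce the "both directions" check on emitted IR, a test helper along these lines would do. This is a sketch of one possible approach over textual LLVM IR, not the actual code in typed_feedback.rs; `block_body` and `block_calls` are hypothetical names.

```rust
/// Drop a trailing `; comment` and surrounding whitespace from an IR line.
fn strip_comment(line: &str) -> &str {
    line.split(';').next().unwrap_or("").trim()
}

/// A basic-block label line looks like `name:` with no spaces.
fn is_label(line: &str) -> bool {
    let t = strip_comment(line);
    t.ends_with(':') && !t.contains(' ')
}

/// Lines of the basic block `label`: everything after the `label:` line up to
/// the next label or the closing `}`. Returns None when the label is absent.
pub fn block_body<'a>(ir: &'a str, label: &str) -> Option<Vec<&'a str>> {
    let header = format!("{}:", label);
    let mut lines = ir.lines().skip_while(|l| strip_comment(l) != header);
    lines.next()?; // consume the header line; None if it never appeared
    Some(
        lines
            .take_while(|l| !is_label(l) && strip_comment(l) != "}")
            .collect(),
    )
}

/// True when the named block contains a call to `callee`, ignoring comments.
pub fn block_calls(ir: &str, label: &str, callee: &str) -> bool {
    let needle = format!("@{}(", callee);
    block_body(ir, label)
        .map(|body| body.iter().any(|l| strip_comment(l).contains(&needle)))
        .unwrap_or(false)
}
```

A test can then assert `!block_calls(ir, "class_field_set.fast", "js_write_barrier_slot")` for a numeric store and the positive form for a string-literal store. Scoping the search to one block matters, because other blocks of the same function may legitimately call the barrier.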

Refs #5334. (Levers A #5351 and C #5350 already merged.)

Summary by CodeRabbit

  • Performance Improvements

    • Optimized boxed class field writes to elide unnecessary generational write barriers when the assigned value is provably non-pointer, while preserving barrier emission for pointer values.
  • Tests

    • Added coverage ensuring non-pointer class field assignments use the optimized path without emitting a write barrier, and pointer assignments still emit the required barrier.

…res (#5334 lever D)

The boxed class-field-SET fast path emitted a generational write barrier
(`js_write_barrier_slot`) unconditionally. But the barrier only matters
when the stored value is a heap pointer — it records the parent→child
reference so the minor GC scans the parent. Storing a value that is a
non-pointer by construction (number / bool / undefined / null /
comparison / arithmetic) creates no such reference, so the barrier is a
semantic no-op.

Skip it in that case, reusing `expr_produces_non_pointer_bits_by_construction`
— the same predicate the array-store paths already trust for barrier
elision, so the GC soundness standard is unchanged. The barrier flag is
computed before the block builder is borrowed; the LAYOUT NOTE is kept
regardless (it tracks the slot's pointer-ness for minor-scan skipping, and
a non-pointer write into a slot that previously held a pointer is a real
transition the GC must observe).
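
One plausible reading of "computed before the block builder is borrowed" is a plain borrow-ordering constraint: the predicate needs shared access to the codegen context, and the block builder is a mutable borrow out of that same context. The sketch below illustrates that ordering with hypothetical `Ctx` and `emit_field_store` names; it is an assumption about the motivation, not a copy of property_set.rs.

```rust
/// Hypothetical codegen context for illustration only.
pub struct Ctx {
    /// Names of locals known to hold numbers (stand-in for real analysis).
    pub numeric_locals: Vec<String>,
    /// Instructions of the block currently being emitted.
    pub block: Vec<String>,
}

impl Ctx {
    /// Stand-in predicate: numeric literals and known-numeric locals.
    fn value_is_non_pointer(&self, value: &str) -> bool {
        value.parse::<f64>().is_ok() || self.numeric_locals.iter().any(|n| n == value)
    }
}

pub fn emit_field_store(ctx: &mut Ctx, value: &str) {
    // 1. Query the context while no mutable borrow is outstanding.
    let barrier_needed = !ctx.value_is_non_pointer(value);

    // 2. Only now borrow the block mutably. Calling `ctx.value_is_non_pointer`
    //    after this line, while `blk` is still in use, would be rejected by
    //    the borrow checker.
    let blk = &mut ctx.block;
    blk.push(format!("store double {}, ptr %slot", value));
    // The layout note is emitted on both paths.
    blk.push("call void @js_gc_note_slot_layout(...)".to_string());
    if barrier_needed {
        blk.push("call void @js_write_barrier_slot(...)".to_string());
    }
}
```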

Verified: a numeric-heavy class drops 6 of 9 boxed field-store barriers
(numeric/boolean stores), keeps all 3 genuine pointer (string) stores, and
runs to the correct result under a 2M-iteration GC-exercising loop. New
unit test asserts both directions (numeric → no barrier, string literal →
barrier kept); full perry-codegen suite green (incl. large_object_barriers).

Tier-2 lever of the IR-efficiency roadmap (#5334, lever D).

coderabbitai Bot commented Jun 18, 2026


Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 11e4facc-55ba-49ca-bc7a-a934211dcf5d

📥 Commits

Reviewing files that changed from the base of the PR and between 4b47b09 and 4ab7e3a.

📒 Files selected for processing (1)
  • crates/perry-codegen/src/expr/property_set.rs

Walkthrough

The PropertySet fast path now imports expr_produces_non_pointer_bits_by_construction and uses it to compute field_set_barrier_needed. When the assigned value is provably non-pointer (numbers, booleans, null, undefined), the generational write barrier is skipped while the slot's pointer-ness note is still recorded for GC correctness. A new test verifies the elision for numeric literals and retention for string literals.

Changes

Write-barrier elision for non-pointer class field sets

| Layer | File(s) | Summary |
|---|---|---|
| Barrier elision logic in PropertySet fast path | crates/perry-codegen/src/expr/property_set.rs | Adds import of expr_produces_non_pointer_bits_by_construction, computes field_set_barrier_needed = !expr_produces_non_pointer_bits_by_construction(ctx, value) in the class-field-set fast path, and passes it to emit_jsvalue_slot_store_on_block instead of the previous hardcoded true. |
| Test: barrier elision vs. retention | crates/perry-codegen/tests/typed_feedback.rs | Adds class_field_set_elides_write_barrier_for_nonpointer_value, asserting class_field_set.fast omits js_write_barrier_slot for a numeric literal and includes it for a string literal. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • PerryTS/perry#5098: Both PRs modify the slot-store/write-barrier lowering to conditionally emit GC/barrier behavior based on whether stored/previous values are provably non-pointer (retrieved PR adds js_gc_note_slot_layout_aware + scalar-aware slot-store helper; main PR uses the non-pointer-by-construction predicate to pass field_set_barrier_needed into the slot store emission).

Poem

A bunny hops past the write barrier gate,
"No pointer here!" it thumps — no need to wait.
Numbers and booleans skip the GC queue,
Strings still pay the toll, as all heap values do.
The test confirms the elision is neat and true. 🐇✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: eliding GC write barriers for non-pointer class-field stores, with a reference to the broader IR-efficiency roadmap (#5334 lever D). |
| Description check | ✅ Passed | The description provides comprehensive coverage including summary, concrete changes, related issue, verification details, and test plan with checkboxes completed, closely following the repository template structure. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@proggeramlug proggeramlug merged commit cdbb488 into main Jun 18, 2026
13 of 15 checks passed
@proggeramlug proggeramlug deleted the feat/elide-nonpointer-field-barrier branch June 18, 2026 05:34
proggeramlug added a commit that referenced this pull request Jun 18, 2026
…les (#5334 lever B) (#5385)

* perf(codegen): full-outline class-field IC diamond for oversized modules (#5334 lever B)

Pathologically-large modules (the motivating case: a 13MB minified bundle
that lowers to ~1.25GB of LLVM IR across ~92K functions) are forced to
`clang -O0` (#4880), where the inline class-field-SET IC diamond's
~15-lines-per-site expansion is never optimized away — and clang needs
~15GB RSS just to chew through it.

For such modules, replace the ENTIRE diamond (guard call + fast slot store
+ fallback arm) with a single `call @js_class_field_set_ic(...)`. The
runtime helper reproduces the diamond's exact semantics — run the guard,
then on PASS do the same raw-f64/boxed slot store, on FAIL record + route
by name. This trades a function-call frame on the (cold, startup-
dominated) field-set path for a large per-site IR reduction so clang can
compile the module at all.

Gating (codegen-time, decided once per module in compile_module):
- `PERRY_FULL_OUTLINE_IC=1/on/true` forces ON, `=0/off/false` forces OFF;
- otherwise auto: function count >= PERRY_FULL_OUTLINE_IC_MIN_FUNCS
  (default 4000) — the defining trait of the bundle case; ordinary
  per-file modules stay on the inline diamond and keep the hot fast store.

The decision is a thread-local set at the top of compile_module (codegen
is sequential per module), not a process-global OnceLock, so it can't pin
one module's decision across a multi-module build.
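
The gating rules above can be sketched as a pure decision function plus a per-thread flag. The environment variable names and the default of 4000 come from the commit message; the function names, the handling of unrecognized or unparsable values (fall back to auto and to the default threshold), and the exact-match comparison are assumptions of this sketch.

```rust
use std::cell::Cell;

/// Default auto-gate threshold stated in the commit message.
pub const DEFAULT_MIN_FUNCS: usize = 4000;

/// Pure decision: explicit override first, otherwise compare the module's
/// callable count against the threshold.
pub fn decide_full_outline(
    flag: Option<&str>,
    min_funcs: Option<&str>,
    callable_count: usize,
) -> bool {
    match flag {
        Some("1") | Some("on") | Some("true") => return true,
        Some("0") | Some("off") | Some("false") => return false,
        _ => {} // unset or unrecognized: fall through to auto (assumption)
    }
    let threshold = min_funcs
        .and_then(|s| s.trim().parse::<usize>().ok())
        .unwrap_or(DEFAULT_MIN_FUNCS);
    callable_count >= threshold
}

thread_local! {
    /// Per-thread decision, reset at the start of every module so one
    /// module's answer cannot leak into the next.
    static FULL_OUTLINE_IC: Cell<bool> = Cell::new(false);
}

/// Hypothetical hook called at the top of module compilation.
pub fn begin_module(callable_count: usize) {
    let flag = std::env::var("PERRY_FULL_OUTLINE_IC").ok();
    let min = std::env::var("PERRY_FULL_OUTLINE_IC_MIN_FUNCS").ok();
    let on = decide_full_outline(flag.as_deref(), min.as_deref(), callable_count);
    FULL_OUTLINE_IC.with(|c| c.set(on));
}

/// Read the decision for the module currently being compiled on this thread.
pub fn full_outline_ic_enabled() -> bool {
    FULL_OUTLINE_IC.with(|c| c.get())
}
```

Keeping the decision in a pure function makes the gate testable without touching the process environment, which also sidesteps the need to serialize tests on an environment lock for that part of the logic.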

NB: the full-outline boxed store always emits the write barrier (via
js_object_set_field), so the compile-time non-pointer barrier elision
(#5334 lever D) does not apply on this path — acceptable, since it is
gated to oversized, non-hot-loop modules.

Verified: forced ON collapses the diamond to one call (no fast/fallback
blocks, no inline guard call); a class-field-write program runs to the
correct result under full-outline; full perry-codegen suite green. The two
class-field structure tests now pin PERRY_FULL_OUTLINE_IC off and
serialize on ENV_LOCK against the new lever-B test.

Final lever of the IR-efficiency roadmap (#5334). Levers A #5351, C #5350
merged; D #5381 in review.

* review: count class callables in lever-B gate; dedup IC fallback tail

Addresses self-review findings on #5334 lever B (#5385):

- Gate denominator: `decide_full_outline_ic` was fed `hir.functions.len()`,
  which excludes class methods, static methods, accessors, and constructors
  (those live in `hir.classes[].*`, collected separately). A class-heavy
  minified bundle — the exact pathology lever B targets — could have a small
  `functions.len()` yet emit tens of thousands of LLVM functions, so the gate
  would never fire. New `module_callable_count()` counts top-level functions
  plus all class callables; the gate now uses it. New test
  `full_outline_ic_auto_gate_counts_class_methods` covers a class-heavy module
  triggering with only one top-level function.

- Dedup: `js_class_field_set_ic`'s guard-FAIL tail re-implemented
  `js_class_field_set_fallback` verbatim. It now delegates to that helper, so
  by-name routing (frozen / accessor / setter-in-chain) is defined once.

Full perry-codegen + perry-runtime typed_feedback suites green.

* review(coderabbit): count class computed_members in lever-B gate denominator

ClassComputedMember holds a Function that compile_module lowers like any
method (emits an LLVM function), so computed members must count toward the
oversized-module gate alongside methods/static_methods/getters/setters.
A class with many computed-key methods could otherwise stay under
PERRY_FULL_OUTLINE_IC_MIN_FUNCS and keep the inline diamond when the auto
gate should fire.
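
A rough sketch of the corrected gate denominator, counting every callable that ends up as an LLVM function. The `Module`, `Class`, and `Function` types here are hypothetical stand-ins shaped after the field names mentioned in the commit message (methods, static methods, getters, setters, constructors, computed members); the real HIR types are not shown in this text.

```rust
/// Placeholder for a lowered function body (hypothetical).
pub struct Function;

/// Hypothetical class shape mirroring the callables named in the message.
pub struct Class {
    pub constructor: Option<Function>,
    pub methods: Vec<Function>,
    pub static_methods: Vec<Function>,
    pub getters: Vec<Function>,
    pub setters: Vec<Function>,
    pub computed_members: Vec<Function>,
}

/// Hypothetical module shape: top-level functions plus classes.
pub struct Module {
    pub functions: Vec<Function>,
    pub classes: Vec<Class>,
}

/// Count top-level functions plus every class callable, so a class-heavy
/// module is measured by what it actually emits.
pub fn module_callable_count(m: &Module) -> usize {
    let class_callables: usize = m
        .classes
        .iter()
        .map(|c| {
            usize::from(c.constructor.is_some())
                + c.methods.len()
                + c.static_methods.len()
                + c.getters.len()
                + c.setters.len()
                + c.computed_members.len()
        })
        .sum();
    m.functions.len() + class_callables
}
```

With this count, a module holding a single top-level function and thousands of class methods crosses the threshold, whereas `functions.len()` alone would report 1.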

* lint: GC_STORE_AUDIT(POINTER_FREE) marker on js_class_field_set_ic raw store

The full-outline IC helper's raw-f64 slot write is a barrier-free store the
GC store-site inventory requires to be annotated. A passing guard with
require_raw_f64 proves the slot is pointer-free (typed-shape descriptor) and
the value is a plain number — identical to the inline class_field_set.fast
raw-f64 store. Fixes the failing lint 'GC store-site inventory' step.

---------

Co-authored-by: Ralph Küpper <ralph2@skelpo.com>