perf(gc): skip per-write layout hashmap on scalar array stores (10.5× on numeric_array_downgrade) #5098
…r array stores
For an in-place `arr[i] = value` on a non-raw-layout (e.g. downgraded `any[]`)
array, codegen emits a per-write `js_gc_note_slot_layout` call that, for a
SIDE_MASK array, does a thread-local `LAYOUT_SLOT_MASKS` hashmap lookup +
clear_slot — even when the slot was already non-pointer and stays non-pointer.
On a numeric write loop over a downgraded array this is the dominant cost
(stubbing the note makes bench_numeric_array_downgrade 11x faster).
Add `js_gc_note_slot_layout_aware(parent, slot, value_bits, old_bits)`: when
neither the new nor the previous slot value is a heap pointer, the slot's
pointer-ness is unchanged, so the GC per-slot mask needs no update and the
hashmap is skipped. The mask invariant ('bit set <=> slot holds a pointer') is
preserved because the full path still runs whenever a pointer is involved on
either side (new is a pointer -> set; old was a pointer -> clear). Uses the same
`layout_pointer_bearing_bits` predicate the layout machinery uses internally,
so raw-pointer slots are classified correctly (not just NaN-box tags).
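
The fast path described above can be sketched roughly as follows. This is a minimal standalone model, not the runtime's code: the NaN-box constants, the `SlotMasks` type, and the `lookups` counter are hypothetical, `is_pointer_bearing` is only a stand-in for `layout_pointer_bearing_bits` (it checks a tag and ignores raw-pointer slots), and slots are assumed to be below 64.

```rust
use std::collections::HashMap;

// Hypothetical NaN-box encoding, chosen for illustration only.
const TAG_MASK: u64 = 0xFFFF_0000_0000_0000;
const POINTER_TAG: u64 = 0x7FFD_0000_0000_0000;

/// Simplified stand-in for `layout_pointer_bearing_bits`.
fn is_pointer_bearing(bits: u64) -> bool {
    bits & TAG_MASK == POINTER_TAG
}

/// Models the per-array side mask: parent id -> bitmask of pointer slots.
#[derive(Default)]
struct SlotMasks {
    masks: HashMap<usize, u64>,
    lookups: usize, // counts hashmap accesses on the full path
}

impl SlotMasks {
    /// Full path: always touches the hashmap, then sets or clears the bit.
    fn note_slot_layout(&mut self, parent: usize, slot: u32, value_bits: u64) {
        self.lookups += 1;
        let mask = self.masks.entry(parent).or_insert(0);
        if is_pointer_bearing(value_bits) {
            *mask |= 1u64 << slot;
        } else {
            *mask &= !(1u64 << slot);
        }
    }

    /// Aware path: skip the hashmap when the slot was and stays non-pointer.
    fn note_slot_layout_aware(&mut self, parent: usize, slot: u32, value_bits: u64, old_bits: u64) {
        if !is_pointer_bearing(value_bits) && !is_pointer_bearing(old_bits) {
            return; // pointer-ness unchanged, mask already correct
        }
        self.note_slot_layout(parent, slot, value_bits);
    }

    fn slot_is_pointer(&self, parent: usize, slot: u32) -> bool {
        self.masks.get(&parent).map_or(false, |m| m & (1u64 << slot) != 0)
    }
}

fn main() {
    let mut masks = SlotMasks::default();
    let number = 1.5f64.to_bits();
    let pointer = POINTER_TAG | 0x1000;

    masks.note_slot_layout_aware(1, 3, number, number); // scalar -> scalar: skipped
    assert_eq!(masks.lookups, 0);

    masks.note_slot_layout_aware(1, 3, pointer, number); // scalar -> pointer: set
    assert!(masks.slot_is_pointer(1, 3));

    masks.note_slot_layout_aware(1, 3, number, pointer); // pointer -> scalar: clear
    assert!(!masks.slot_is_pointer(1, 3));
    assert_eq!(masks.lookups, 2);
}
```

Only the scalar-to-scalar case returns early; both transitions that involve a pointer still reach the full path, which is what keeps the mask in step with the slot contents.
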
Codegen (`emit_jsvalue_slot_store_scalar_aware_on_block`) loads the slot's
previous value before the store and routes ONLY in-place array element
overwrites (index.rs) through the aware note — object field writes and
fresh-slot appends keep the original note, so POINTER_FREE-dominated paths
(bench_object_property) are unaffected.
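
As a rough illustration of the store sequence and routing just described, the model below loads the old slot value before the store on in-place overwrites and keeps the unconditional note on appends. It is a sketch under stated assumptions: `ModelArray`, its fields, and the tag constants are hypothetical and do not mirror the actual codegen output or runtime layout.

```rust
// Hypothetical NaN-box encoding, chosen for illustration only.
const TAG_MASK: u64 = 0xFFFF_0000_0000_0000;
const POINTER_TAG: u64 = 0x7FFD_0000_0000_0000;

fn is_pointer_bearing(bits: u64) -> bool {
    bits & TAG_MASK == POINTER_TAG
}

/// Toy array with one "is pointer" flag per slot standing in for the GC mask.
struct ModelArray {
    slots: Vec<u64>,
    pointer_mask: Vec<bool>,
    full_notes: usize, // how often the expensive full note ran
}

impl ModelArray {
    fn new() -> Self {
        ModelArray { slots: Vec::new(), pointer_mask: Vec::new(), full_notes: 0 }
    }

    /// Models the original unconditional note.
    fn note_full(&mut self, index: usize, value_bits: u64) {
        self.full_notes += 1;
        self.pointer_mask[index] = is_pointer_bearing(value_bits);
    }

    /// Fresh-slot append: no previous value exists, so it keeps the full note.
    fn push(&mut self, value_bits: u64) {
        self.slots.push(value_bits);
        self.pointer_mask.push(false);
        let index = self.slots.len() - 1;
        self.note_full(index, value_bits);
    }

    /// In-place overwrite: load the old bits first, store, then note if needed.
    fn store_in_place(&mut self, index: usize, value_bits: u64) {
        let old_bits = self.slots[index]; // must be read before the store
        self.slots[index] = value_bits;
        if is_pointer_bearing(value_bits) || is_pointer_bearing(old_bits) {
            self.note_full(index, value_bits);
        }
    }

    /// Mask invariant: flag set exactly when the slot holds a pointer.
    fn invariant_holds(&self) -> bool {
        self.slots
            .iter()
            .zip(&self.pointer_mask)
            .all(|(&bits, &flag)| is_pointer_bearing(bits) == flag)
    }
}

fn main() {
    let mut arr = ModelArray::new();
    for i in 0..4 {
        arr.push((i as f64).to_bits());
    }
    arr.store_in_place(0, POINTER_TAG | 0x1000); // the write that downgrades the array
    let before = arr.full_notes;

    // Numeric write loop over the other slots: no full note should run.
    for round in 0..1000 {
        for i in 1..4 {
            arr.store_in_place(i, ((round + i) as f64).to_bits());
        }
    }
    assert_eq!(arr.full_notes, before);

    arr.store_in_place(0, 2.0f64.to_bits()); // pointer -> number still takes the full path
    assert_eq!(arr.full_notes, before + 1);
    assert!(arr.invariant_holds());
}
```

Reading `old_bits` after the store would make every overwrite look like a scalar-to-scalar write whenever the new value is a number, so a pointer slot could keep a stale mask bit; that is why the load has to come first.
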
bench_numeric_array_downgrade (M1 Pro): 4482ms -> 427ms (~10.5x). Checksum
identical to Node. First concrete win from the #5094 GC-layout umbrella.
Verified: GC-stress correct (pointer<->number slot transitions + GC under
PERRY_GC_VERIFY_EVACUATION=1 / PERRY_GC_FORCE_EVACUATE=1 / PERRY_GEN_GC=0);
codegen + runtime tests pass; full local parity shows zero new regressions; no
other benchmark regresses.
First concrete win from the #5094 GC-layout umbrella.
Problem
For an in-place `arr[i] = value` on a non-raw-layout array (e.g. a downgraded `any[]`, i.e. a numeric array that received an object/string slot), codegen emits a per-write `js_gc_note_slot_layout` call. On a `SIDE_MASK` array that does a thread-local `LAYOUT_SLOT_MASKS` hashmap lookup + `clear_slot` on every write, even when the slot was already non-pointer and stays non-pointer. On a numeric write loop over a downgraded array this is the dominant cost.

Confirmed by an A/B stub experiment: stubbing the layout note makes `bench_numeric_array_downgrade` 11× faster (4507 → 399ms), while stubbing the write barrier is ~2%.

Fix

`js_gc_note_slot_layout_aware(parent, slot, value_bits, old_bits)`: when neither the new nor the previous slot value is a heap pointer, the slot's pointer-ness is unchanged, so the GC per-slot mask needs no update and the hashmap is skipped.

Correctness: the mask invariant "bit set ⟺ slot holds a pointer" is preserved because the full path still runs whenever a pointer is involved on either side (`new` is a pointer → set the bit; `old` was a pointer → clear it), which is exactly when the mask must change. It uses the same `layout_pointer_bearing_bits` predicate the layout machinery uses internally, so raw-pointer array slots are classified correctly (not just NaN-box tags).

Codegen (`emit_jsvalue_slot_store_scalar_aware_on_block`) loads the slot's previous value before the store and routes only in-place array element overwrites (index.rs) through the aware note. Object field writes and fresh-slot appends keep the original unconditional note, so POINTER_FREE-dominated paths are unaffected.

Results (M1 Pro)

`bench_numeric_array_downgrade`: 4482ms → 427ms (~10.5×)

Checksum identical to Node. All other benchmarks neutral: a clean A/B confirmed `bench_object_property` is unchanged (clean baseline ~268ms; an earlier snapshot caught an anomalous 201ms run).

Verification

- GC-stress correct under `PERRY_GC_VERIFY_EVACUATION=1` + `PERRY_GC_FORCE_EVACUATE=1`, and full mark-sweep (`PERRY_GEN_GC=0`). This exercises the dangerous case (pointer→number downgrade where a stale mask would make the GC trace a number as a pointer).
- `cargo test -p perry-codegen -p perry-runtime`: pass.

Background: this is the array side of #5094 (the `layout_note_slot` thread-local tracking). The class-field side (method_calls, #5093) is the codegen inline-guard + raw-f64 work described in that issue.