
perf(gc): skip per-write layout hashmap on scalar array stores (10.5× on numeric_array_downgrade)#5098

Status: Merged. proggeramlug merged 1 commit into main from perf/gc-scalar-aware-layout-note on Jun 13, 2026.

TheHypnoo (Member) commented Jun 13, 2026

First concrete win from the #5094 GC-layout umbrella.

Problem

For an in-place arr[i] = value on a non-raw-layout array (e.g. a downgraded any[]: a numeric array that has received an object or string element), codegen emits a per-write js_gc_note_slot_layout call. On a SIDE_MASK array, that call performs a thread-local LAYOUT_SLOT_MASKS hashmap lookup plus a clear_slot on every write, even when the slot was already non-pointer and stays non-pointer. In a numeric write loop over a downgraded array, this is the dominant cost.
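To make that cost concrete, here is a minimal Rust model of the unconditional per-write note. It assumes a thread-local side table keyed by array that stores one bit per slot; the NaN-box tag constants, is_pointer_bits, SLOT_MASKS and the lookup counter are hypothetical stand-ins, not Perry's actual LAYOUT_SLOT_MASKS representation. The only point it illustrates is that every store, number-over-number included, pays a hashmap lookup.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

// Hypothetical NaN-box layout for illustration only: values whose top 16 bits
// equal POINTER_TAG are treated as heap pointers. Perry's real encoding and its
// `layout_pointer_bearing_bits` predicate are not reproduced here.
const TAG_MASK: u64 = 0xFFFF_0000_0000_0000;
const POINTER_TAG: u64 = 0x7FFD_0000_0000_0000;

fn is_pointer_bits(bits: u64) -> bool {
    bits & TAG_MASK == POINTER_TAG
}

thread_local! {
    // Side table (SIDE_MASK-style): array id -> one bit per slot, set iff the slot holds a pointer.
    static SLOT_MASKS: RefCell<HashMap<usize, Vec<u64>>> = RefCell::new(HashMap::new());
    // Counts hashmap lookups so the per-write cost is visible.
    static LOOKUPS: Cell<u64> = Cell::new(0);
}

/// Unconditional per-write note: always looks up the array's mask and
/// sets or clears the slot bit, even for number-over-number writes.
fn note_slot_layout(parent: usize, slot: usize, value_bits: u64) {
    LOOKUPS.with(|n| n.set(n.get() + 1));
    SLOT_MASKS.with(|masks| {
        let mut masks = masks.borrow_mut();
        let words = masks.entry(parent).or_default();
        let (word, bit) = (slot / 64, slot % 64);
        if words.len() <= word {
            words.resize(word + 1, 0);
        }
        if is_pointer_bits(value_bits) {
            words[word] |= 1u64 << bit; // set the pointer bit
        } else {
            words[word] &= !(1u64 << bit); // clear_slot
        }
    });
}

fn slot_is_marked(parent: usize, slot: usize) -> bool {
    SLOT_MASKS.with(|masks| {
        masks
            .borrow()
            .get(&parent)
            .and_then(|w| w.get(slot / 64))
            .map_or(false, |w| w & (1u64 << (slot % 64)) != 0)
    })
}

fn lookup_count() -> u64 {
    LOOKUPS.with(|n| n.get())
}

fn main() {
    // A numeric write loop over a (downgraded) array: every store pays a lookup.
    for i in 0..1_000usize {
        note_slot_layout(7, i % 16, (i as f64).to_bits());
    }
    println!("lookups: {}", lookup_count());
}
```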

This was confirmed by an A/B stub experiment: stubbing out the layout note makes bench_numeric_array_downgrade 11× faster (4507 → 399 ms), whereas stubbing out the write barrier saves only ~2%.

Fix

Add js_gc_note_slot_layout_aware(parent, slot, value_bits, old_bits). When neither the new nor the previous slot value is a heap pointer, the slot's pointer-ness is unchanged, so the GC's per-slot mask needs no update and the hashmap lookup is skipped.
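A sketch of the early-out, under similar assumptions: is_pointer_bits is a hypothetical NaN-box check standing in for layout_pointer_bearing_bits, and LayoutTable is a toy side table rather than the runtime's. The aware note falls through to the ordinary note only when a pointer is involved on either side of the write.

```rust
use std::collections::HashMap;

// Hypothetical NaN-box pointer predicate, standing in for Perry's
// `layout_pointer_bearing_bits` (whose real encoding is not shown in the PR).
const TAG_MASK: u64 = 0xFFFF_0000_0000_0000;
const POINTER_TAG: u64 = 0x7FFD_0000_0000_0000;

fn is_pointer_bits(bits: u64) -> bool {
    bits & TAG_MASK == POINTER_TAG
}

/// Toy side-mask table: array id -> "slot holds a pointer" flags.
#[derive(Default)]
struct LayoutTable {
    masks: HashMap<usize, Vec<bool>>,
    lookups: u64,
}

impl LayoutTable {
    /// Original behaviour: a hashmap lookup on every store.
    fn note_slot_layout(&mut self, parent: usize, slot: usize, value_bits: u64) {
        self.lookups += 1;
        let flags = self.masks.entry(parent).or_default();
        if flags.len() <= slot {
            flags.resize(slot + 1, false);
        }
        flags[slot] = is_pointer_bits(value_bits);
    }

    /// Scalar-aware variant: if neither the old nor the new value is a
    /// pointer, the slot's pointer-ness cannot change, so skip the lookup.
    fn note_slot_layout_aware(&mut self, parent: usize, slot: usize, value_bits: u64, old_bits: u64) {
        if !is_pointer_bits(value_bits) && !is_pointer_bits(old_bits) {
            return; // scalar over scalar: by the invariant the bit is already clear
        }
        self.note_slot_layout(parent, slot, value_bits);
    }
}

fn main() {
    let mut t = LayoutTable::default();
    let mut old = 0.0f64.to_bits();
    for i in 0..1_000u32 {
        let new = f64::from(i).to_bits();
        t.note_slot_layout_aware(0, 5, new, old);
        old = new;
    }
    println!("lookups with aware note: {}", t.lookups);
}
```

In the demo loop every write stores a number over a number, so the aware note never touches the table.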

Correctness: the mask invariant "bit set ⟺ slot holds a pointer" is preserved because the full path still runs whenever a pointer is involved on either side (if the new value is a pointer, the bit is set; if only the old value was a pointer, the bit is cleared), which is exactly when the mask may need to change. It uses the same layout_pointer_bearing_bits predicate that the layout machinery uses internally, so raw-pointer array slots are classified correctly, not just NaN-box-tagged values.
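The invariant argument can be checked exhaustively on a single modeled slot: starting from a state where the mask bit agrees with the slot, every (old, new) combination of pointer and number leaves it agreeing. The sketch also includes a hypothetical store_new_only variant that decides from the new value alone, to show why old_bits is needed: on a pointer→number downgrade it leaves a stale bit. The pointer encoding is again an assumption.

```rust
// Hypothetical pointer predicate (stand-in for `layout_pointer_bearing_bits`).
const TAG_MASK: u64 = 0xFFFF_0000_0000_0000;
const POINTER_TAG: u64 = 0x7FFD_0000_0000_0000;

fn is_pointer_bits(bits: u64) -> bool {
    bits & TAG_MASK == POINTER_TAG
}

/// One slot plus its GC mask bit.
struct Slot {
    bits: u64,
    mask_bit: bool,
}

impl Slot {
    fn invariant_holds(&self) -> bool {
        self.mask_bit == is_pointer_bits(self.bits)
    }

    /// Store with the scalar-aware note: old bits are read before the store.
    fn store_aware(&mut self, value_bits: u64) {
        let old_bits = self.bits;
        self.bits = value_bits;
        if is_pointer_bits(value_bits) || is_pointer_bits(old_bits) {
            // Full path: set or clear the bit to match the new value.
            self.mask_bit = is_pointer_bits(value_bits);
        }
        // Otherwise scalar over scalar: the bit was clear and stays clear.
    }

    /// Broken alternative for contrast: skipping whenever the *new* value is a
    /// scalar leaves a stale bit on a pointer -> number downgrade.
    fn store_new_only(&mut self, value_bits: u64) {
        self.bits = value_bits;
        if is_pointer_bits(value_bits) {
            self.mask_bit = true;
        }
    }
}

fn main() {
    let number = 7.0f64.to_bits();
    let pointer = POINTER_TAG | 0x40;
    let samples = [number, pointer];
    // Every (old, new) pair, starting from a state where the invariant holds.
    for &old in &samples {
        for &new in &samples {
            let mut s = Slot { bits: old, mask_bit: is_pointer_bits(old) };
            s.store_aware(new);
            assert!(s.invariant_holds(), "old={old:#x} new={new:#x}");
            println!("old ptr={} new ptr={} -> mask={}", is_pointer_bits(old), is_pointer_bits(new), s.mask_bit);
        }
    }
    let mut naive = Slot { bits: pointer, mask_bit: true };
    naive.store_new_only(number);
    println!("new-value-only skip after pointer->number: invariant holds = {}", naive.invariant_holds());
}
```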

Codegen (emit_jsvalue_slot_store_scalar_aware_on_block) loads the slot’s previous value before the store and routes only in-place array element overwrites (index.rs) through the aware note. Object field writes and fresh-slot appends keep the original unconditional note, so POINTER_FREE-dominated paths are unaffected.
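A toy sketch of that codegen structure: one shared internal emitter takes a mode, and two public entry points select it. The Block type, the LayoutNote enum, the pseudo-IR text and the argument list shown for js_gc_note_slot_layout are assumptions for illustration; only the two public helper names and the aware note's (parent, slot, value_bits, old_bits) parameters come from this PR. The detail that matters is ordering: the old value has to be loaded before the store overwrites it.

```rust
/// Which layout note the shared helper emits (hypothetical name).
#[derive(Clone, Copy, PartialEq)]
enum LayoutNote {
    Regular,
    ScalarAware,
}

/// Toy stand-in for an LLVM basic block: collects pseudo-IR text.
#[derive(Default)]
struct Block {
    insts: Vec<String>,
}

/// Shared internal helper that both public emitters delegate to.
fn emit_slot_store_impl(block: &mut Block, parent: &str, slot: &str, value: &str, note: LayoutNote) {
    if note == LayoutNote::ScalarAware {
        // Must run before the store, or the previous value is lost.
        block.insts.push(format!("%old = load i64, ptr {slot}"));
    }
    block.insts.push(format!("store i64 {value}, ptr {slot}"));
    match note {
        LayoutNote::Regular => block
            .insts
            .push(format!("call @js_gc_note_slot_layout({parent}, {slot}, {value})")),
        LayoutNote::ScalarAware => block
            .insts
            .push(format!("call @js_gc_note_slot_layout_aware({parent}, {slot}, {value}, %old)")),
    }
}

/// Unchanged entry point: object field writes and fresh-slot appends.
fn emit_jsvalue_slot_store_on_block(block: &mut Block, parent: &str, slot: &str, value: &str) {
    emit_slot_store_impl(block, parent, slot, value, LayoutNote::Regular);
}

/// New entry point: in-place array element overwrites (index.rs).
fn emit_jsvalue_slot_store_scalar_aware_on_block(block: &mut Block, parent: &str, slot: &str, value: &str) {
    emit_slot_store_impl(block, parent, slot, value, LayoutNote::ScalarAware);
}

fn main() {
    let mut overwrite = Block::default();
    emit_jsvalue_slot_store_scalar_aware_on_block(&mut overwrite, "%arr", "%slot", "%v");
    let mut append = Block::default();
    emit_jsvalue_slot_store_on_block(&mut append, "%arr", "%end", "%v");
    for (name, b) in [("in-place overwrite", &overwrite), ("append", &append)] {
        println!("{name}:");
        for i in &b.insts {
            println!("  {i}");
        }
    }
}
```

Keeping a single internal emitter means the store and write-barrier logic stays in one place, and the two public names only differ in which note they request.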

Results (M1 Pro)

| Benchmark | Before | After | Speedup |
|---|---|---|---|
| bench_numeric_array_downgrade | 4482 ms | 427 ms | ~10.5× |

The checksum is identical to Node's. All other benchmarks are neutral: a clean A/B run confirmed that bench_object_property is unchanged (clean baseline ~268 ms; an earlier snapshot had caught an anomalous 201 ms run).
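A quick arithmetic check of the reported figures. The first ratio is the headline result and the second is the stub experiment from the Problem section; the third is an inference rather than a measurement, assuming the stubbed run removes exactly the layout note's cost, and it uses the stub experiment's own baseline.

```math
\frac{4482\ \text{ms}}{427\ \text{ms}} \approx 10.50\times,
\qquad
\frac{4507\ \text{ms}}{399\ \text{ms}} \approx 11.30\times,
\qquad
\frac{4507 - 399}{4507} \approx 0.91
```

Under that assumption, roughly 91% of the downgraded-array loop's time in the stub experiment was attributable to the layout note.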

Verification

  • GC-stress correct: a targeted test of pointer↔number slot transitions + forced GC over the mixed array produces identical output under normal, PERRY_GC_VERIFY_EVACUATION=1 + PERRY_GC_FORCE_EVACUATE=1, and full mark-sweep (PERRY_GEN_GC=0). This exercises the dangerous case (pointer→number downgrade where a stale mask would make the GC trace a number as a pointer).
  • cargo test -p perry-codegen -p perry-runtime — pass.
  • Full local parity: zero new regressions — every mismatch is a pre-existing categorical/module-surface gap unrelated to array stores (verified file-by-file).
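The GC-stress test itself runs compiled programs under the GC environment settings listed above; the Rust model below only mirrors its logic, under the same kind of hypothetical pointer encoding. Random pointer↔number writes go through the aware note, the mask is compared after every write against an unconditional-note reference, and a mock trace pass (mock_gc_trace, hypothetical) fails if any marked slot holds a number, which is the stale-mask failure the real test targets.

```rust
// Hypothetical pointer predicate standing in for `layout_pointer_bearing_bits`.
const TAG_MASK: u64 = 0xFFFF_0000_0000_0000;
const POINTER_TAG: u64 = 0x7FFD_0000_0000_0000;

fn is_pointer_bits(bits: u64) -> bool {
    bits & TAG_MASK == POINTER_TAG
}

/// Tiny deterministic PRNG (xorshift64) so the run is reproducible without crates.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Mock GC mark pass: every slot whose mask bit is set must really hold a
/// pointer; returns Err(index) for the first slot that would be traced wrongly.
fn mock_gc_trace(slots: &[u64], mask: &[bool]) -> Result<usize, usize> {
    let mut traced = 0;
    for (i, (&bits, &marked)) in slots.iter().zip(mask).enumerate() {
        if marked {
            if !is_pointer_bits(bits) {
                return Err(i); // stale bit: a number would be traced as a pointer
            }
            traced += 1;
        }
    }
    Ok(traced)
}

/// Runs `steps` random pointer<->number writes with the aware note and checks
/// the mask against the unconditional reference after each write.
/// Returns (notes skipped, pointer slots traced at the end).
fn stress(seed: u64, slots_len: usize, steps: usize) -> (usize, usize) {
    let mut rng = XorShift(seed);
    let mut slots = vec![0.0f64.to_bits(); slots_len];
    let mut aware_mask = vec![false; slots_len];
    let mut reference_mask = vec![false; slots_len];
    let mut skipped = 0;
    for _ in 0..steps {
        let r = rng.next();
        let i = (r % slots_len as u64) as usize;
        let value = if r & 0x100 != 0 {
            POINTER_TAG | (r & 0xFFFF_FFF0) // fake heap address
        } else {
            ((r >> 8) as f64).to_bits() // ordinary number
        };
        let old = slots[i];
        slots[i] = value;
        // Reference: unconditional note on every write.
        reference_mask[i] = is_pointer_bits(value);
        // Aware note: skip when neither side is a pointer.
        if is_pointer_bits(value) || is_pointer_bits(old) {
            aware_mask[i] = is_pointer_bits(value);
        } else {
            skipped += 1;
        }
        assert_eq!(aware_mask, reference_mask);
        mock_gc_trace(&slots, &aware_mask).expect("stale pointer bit");
    }
    let traced = mock_gc_trace(&slots, &aware_mask).unwrap();
    (skipped, traced)
}

fn main() {
    let (skipped, traced) = stress(0x9E37_79B9_7F4A_7C15, 64, 10_000);
    println!("skipped {skipped} of 10000 notes; {traced} pointer slots traced at the end");
}
```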

Background: this is the array side of #5094 (the layout_note_slot thread-local tracking). The class-field side (method_calls, #5093) is the codegen inline-guard + raw-f64 work described in that issue.

Summary by CodeRabbit

  • Performance
    • Enhanced garbage collection efficiency by implementing scalar-aware slot layout tracking that reduces bookkeeping overhead for value assignments.
  • Refactor
    • Consolidated slot store operations to improve memory management code organization.

…r array stores

For an in-place `arr[i] = value` on a non-raw-layout (e.g. downgraded `any[]`)
array, codegen emits a per-write `js_gc_note_slot_layout` call that, for a
SIDE_MASK array, does a thread-local `LAYOUT_SLOT_MASKS` hashmap lookup +
clear_slot — even when the slot was already non-pointer and stays non-pointer.
On a numeric write loop over a downgraded array this is the dominant cost
(stubbing the note makes bench_numeric_array_downgrade 11x faster).

Add `js_gc_note_slot_layout_aware(parent, slot, value_bits, old_bits)`: when
neither the new nor the previous slot value is a heap pointer, the slot's
pointer-ness is unchanged, so the GC per-slot mask needs no update and the
hashmap is skipped. The mask invariant ('bit set <=> slot holds a pointer') is
preserved because the full path still runs whenever a pointer is involved on
either side (new is a pointer -> set; old was a pointer -> clear). Uses the same
`layout_pointer_bearing_bits` predicate the layout machinery uses internally,
so raw-pointer slots are classified correctly (not just NaN-box tags).

Codegen (`emit_jsvalue_slot_store_scalar_aware_on_block`) loads the slot's
previous value before the store and routes ONLY in-place array element
overwrites (index.rs) through the aware note — object field writes and
fresh-slot appends keep the original note, so POINTER_FREE-dominated paths
(bench_object_property) are unaffected.

bench_numeric_array_downgrade (M1 Pro): 4482ms -> 427ms (~10.5x). Checksum
identical to Node. First concrete win from the #5094 GC-layout umbrella.

Verified: GC-stress correct (pointer<->number slot transitions + GC under
PERRY_GC_VERIFY_EVACUATION=1 / PERRY_GC_FORCE_EVACUATE=1 / PERRY_GEN_GC=0);
codegen + runtime tests pass; full local parity shows zero new regressions; no
other benchmark regresses.

coderabbitai Bot commented Jun 13, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 842dfeb0-5d49-4676-8483-2b226a7c56d5

📥 Commits

Reviewing files that changed from the base of the PR and between 84d59a7 and 06a692c.

📒 Files selected for processing (5)
  • crates/perry-codegen/src/expr/index.rs
  • crates/perry-codegen/src/expr/mod.rs
  • crates/perry-codegen/src/expr/write_barrier.rs
  • crates/perry-codegen/src/runtime_decls/arrays.rs
  • crates/perry-runtime/src/gc/layout.rs

📝 Walkthrough

This PR introduces scalar-aware GC layout tracking for array element stores. A new runtime function js_gc_note_slot_layout_aware tracks both old and new slot value bits, skipping redundant GC bookkeeping when scalar values overwrite scalars. Codegen refactors slot-store emission into a shared helper supporting both scalar-aware and standard paths, and index array element stores now route in-place overwrites through the scalar-aware path.

Changes

Scalar-aware GC layout tracking for in-place slot stores

| Layer | File(s) | Summary |
|---|---|---|
| Runtime scalar-aware layout tracking | crates/perry-codegen/src/runtime_decls/arrays.rs, crates/perry-runtime/src/gc/layout.rs | New js_gc_note_slot_layout_aware runtime function receives both old and new value bits, skips layout tracking when both are non-pointer scalars, and otherwise forwards to the existing layout_note_slot. |
| Codegen scalar-aware slot-store emitters | crates/perry-codegen/src/expr/write_barrier.rs, crates/perry-codegen/src/expr/mod.rs | emit_jsvalue_slot_store_on_block refactored into a shared internal helper; new emit_jsvalue_slot_store_scalar_aware_on_block loads old bits before the overwrite and selects between regular and scalar-aware layout-note emission; public helper re-exported. |
| Index array element scalar-aware store | crates/perry-codegen/src/expr/index.rs | In-bounds, non-raw-layout index element stores switch to the scalar-aware helper for in-place overwrites, skipping GC bookkeeping in the scalar-over-scalar case. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

  • #5094: The scalar-aware slot-layout tracking and in-place array element store routing directly implement the per-slot layout bookkeeping optimization described in the perf/GC proposal.

Poem

🐰 In slots where scalars dance and play,
We skip the notes that clutter the way,
Old bits checked against the new,
When both are scalars, zoom right through!
Less bookkeeping, faster still—
A rabbit's touch of GC's will.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | Title accurately describes the main performance optimization: skipping the layout hashmap on scalar array stores, with the specific benchmark improvement. |
| Description check | ✅ Passed | Description follows the template structure with clear Problem, Fix, Results, and Verification sections; includes benchmark data, correctness reasoning, and testing evidence. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


2 participants