perf(gc): O(1) class-field raw-f64 layout — header bit + inline guard hoist (#5094)#5240
perf(gc): O(1) class-field raw-f64 layout — header bit + inline guard hoist (#5094)#5240proggeramlug wants to merge 1 commit into
Conversation
…inline guard hoist (#5094) Class-field access (`this.field` in class methods) was the worst benchmark gap in #5094 (~290×: method_calls ~3300ms vs Node ~11ms). The class-field guard called `layout_typed_raw_f64_slot_for_user` — a thread-local `TYPED_LAYOUTS` hashmap lookup (`_tlv_get_addr` + hash) — on every get/set, and because the guard was an opaque runtime call, LLVM could not hoist the loop-invariant shape/layout check out of hot loops. Two phases, both shippable: Phase 3a (runtime): add a `GC_LAYOUT_TYPED_RAW_F64_INTACT` bit to the spare GcHeader `_reserved` bits, set iff an intact typed-shape descriptor with a raw-f64 slot is installed. A new guard-only fast path `layout_guard_field_is_raw_f64` answers "is this field raw-f64?" from the header bit + the object's `field_count` with no thread-local lookup, falling back to the precise per-slot predicate when the bit is clear. The bit is maintained at a single choke point: `set_layout_state` clears it (every downgrade/removal routes through there), the two descriptor-install sites re-set it, `layout_transfer` carries it across GC moves, and the sweep/free path clears it. The precise predicate is unchanged, so the GC scanner and existing tests are unaffected. Phase 3b (codegen): emit the class-field guard inline (shape + class_id + keys + the intact bit + no-own-descriptor, plus an inline plain-number check for stores) instead of an opaque call, gated to raw-f64 data fields. LLVM LICM hoists the loop-invariant part out of the loop, collapsing the per- iteration cost to the direct slot load/store. Any inline miss falls through to today's exact guard-call path, so a `false` is never unsafe — only slower. Result: method_calls ~3837ms -> ~36ms (~106x; same order as Node). Soundness: the inline fast path implies every condition the runtime guard's success requires (receiver is a POINTER_TAG object, GC_TYPE_OBJECT, not forwarded, OBJECT_TYPE_REGULAR, class_id/keys match -> field in bounds + key matches by construction, intact bit -> slot is raw-f64, own-descriptor bit clear -> a prototype accessor is shadowed by the own data slot; the store value is not a NaN-boxed non-number so a raw-f64 store can never publish a pointer). The `requires_raw_f64` flag and the descriptor's raw_f64_mask both derive from the same `class_typed_layout`, so they agree by construction; `PERRY_VERIFY_LAYOUT_FASTPATH=1` cross-checks this on every 3a fast-path hit and ran clean at 10M iterations. Verification: method_calls 106x; gap parity suite shows zero new failures vs main (branch fail set is a strict subset; Compile Fail 0); perry-codegen and single-threaded perry-runtime suites pass; targeted downgrade-via-`any` / mixed-class / GC-churn test matches Node and is byte-identical under PERRY_GC_VERIFY_EVACUATION=1, PERRY_GC_FORCE_EVACUATE=1, PERRY_GEN_GC=0, and PERRY_WRITE_BARRIERS=0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughIntroduces a ChangesRaw-f64 class field inline guard fast path
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/perry-codegen/src/expr/property_get.rs`:
- Around line 92-123: The code computes the `is_obj_tag` validation check but
does not use it to control program flow, so subsequent pointer dereferences
through `obj_ptr` (via operations like loading obj_type, gcflags, otype, cid,
and keys) execute unconditionally in the same basic block regardless of whether
the tag validation passed. Move all the header and object loads (the blk.load
and blk.gep calls that access obj_type_ptr, gcflags_ptr, reserved_ptr, otype,
cid_ptr, and keys_ptr) into a conditional block that only executes when
`is_obj_tag` is true, so that pointer dereferences are gated on successful tag
validation via control flow rather than just data dependency.
In `@crates/perry-codegen/src/expr/property_set.rs`:
- Around line 335-346: The inline predicate for the raw-f64 fast-path set
optimization does not include a check for the `OBJ_FLAG_FROZEN` flag, allowing
frozen objects to bypass protection that is enforced by the runtime contract of
`js_typed_feedback_class_field_set_guard`. Modify the inline set predicate
computation for `inline_ok` (returned from `emit_inline_class_field_guard` call)
to include a frozen-bit check that ensures frozen receivers cannot take the
fast-store path and must go through the runtime guard in `needguard_idx` block
instead.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: d998c565-4654-49f6-b24d-2c6d1a3d7b90
📒 Files selected for processing (6)
crates/perry-codegen/src/expr/property_get.rscrates/perry-codegen/src/expr/property_set.rscrates/perry-runtime/src/gc/layout.rscrates/perry-runtime/src/gc/types.rscrates/perry-runtime/src/typed_feedback/guards.rstest-files/test_issue_5094_layout_fastpath.ts
| let recv_bits = blk.bitcast_double_to_i64(recv_box); | ||
| // Receiver must be a POINTER_TAG NaN-boxed heap object before any deref. | ||
| let tag = blk.and( | ||
| I64, | ||
| &recv_bits, | ||
| &crate::nanbox::i64_literal(crate::nanbox::TAG_MASK), | ||
| ); | ||
| let is_obj_tag = blk.icmp_eq(I64, &tag, crate::nanbox::POINTER_TAG_I64); | ||
| let handle = blk.and(I64, &recv_bits, POINTER_MASK_I64); | ||
| let obj_ptr = blk.inttoptr(I64, &handle); | ||
| // GcHeader sits 8 bytes below the user pointer: obj_type u8@-8, gc_flags | ||
| // u8@-7, _reserved u16@-6. | ||
| let obj_type_ptr = blk.gep(I8, &obj_ptr, &[(I64, "-8")]); | ||
| let obj_type = blk.load(I8, &obj_type_ptr); | ||
| let type_ok = blk.icmp_eq(I8, &obj_type, "2"); // GC_TYPE_OBJECT | ||
| let gcflags_ptr = blk.gep(I8, &obj_ptr, &[(I64, "-7")]); | ||
| let gcflags = blk.load(I8, &gcflags_ptr); | ||
| let fwd = blk.and(I8, &gcflags, "-128"); // GC_FLAG_FORWARDED (0x80) | ||
| let not_fwd = blk.icmp_eq(I8, &fwd, "0"); | ||
| let reserved_ptr = blk.gep(I8, &obj_ptr, &[(I64, "-6")]); | ||
| let reserved = blk.load(I16, &reserved_ptr); | ||
| // ObjectHeader.object_type u32@+0 | ||
| let otype = blk.load(I32, &obj_ptr); | ||
| let otype_ok = blk.icmp_eq(I32, &otype, "1"); // OBJECT_TYPE_REGULAR | ||
| // class_id u32@+4 | ||
| let cid_ptr = blk.gep(I8, &obj_ptr, &[(I64, "4")]); | ||
| let cid = blk.load(I32, &cid_ptr); | ||
| let cid_ok = blk.icmp_eq(I32, &cid, expected_class_id); | ||
| // keys_array ptr@+16 | ||
| let keys_ptr = blk.gep(I8, &obj_ptr, &[(I64, "16")]); | ||
| let keys = blk.load(I64, &keys_ptr); | ||
| let keys_ok = blk.icmp_eq(I64, &keys, expected_keys); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, find and examine the emit_inline_class_field_guard function
cd crates/perry-codegen/src/expr && wc -l property_get.rsRepository: PerryTS/perry
Length of output: 78
🏁 Script executed:
# Get the full function definition to see control flow
cd crates/perry-codegen/src/expr && sed -n '85,150p' property_get.rsRepository: PerryTS/perry
Length of output: 3111
🏁 Script executed:
# Check what happens after line 123 (if there are branches gating the loads)
cd crates/perry-codegen/src/expr && sed -n '92,140p' property_get.rsRepository: PerryTS/perry
Length of output: 2485
Unsafe dereference happens before tag validation is enforced in control flow.
The function computes is_obj_tag on line 99 but does not branch on it. Lines 104–123 execute unconditionally in the same basic block, dereferencing obj_ptr (derived from unvalidated bits) regardless of whether the pointer tag check passed. The is_obj_tag boolean is later combined with other conditions via bitwise AND, but bitwise operations are data dependencies, not control-flow branches—all loads execute before any guard can reject the input. Move the header and object loads (lines 104–123) into a block that only executes when is_obj_tag is true, or ensure all callsites validate the pointer tag before entering this helper.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/perry-codegen/src/expr/property_get.rs` around lines 92 - 123, The
code computes the `is_obj_tag` validation check but does not use it to control
program flow, so subsequent pointer dereferences through `obj_ptr` (via
operations like loading obj_type, gcflags, otype, cid, and keys) execute
unconditionally in the same basic block regardless of whether the tag validation
passed. Move all the header and object loads (the blk.load and blk.gep calls
that access obj_type_ptr, gcflags_ptr, reserved_ptr, otype, cid_ptr, and
keys_ptr) into a conditional block that only executes when `is_obj_tag` is true,
so that pointer dereferences are gated on successful tag validation via control
flow rather than just data dependency.
| if requires_raw_f64 { | ||
| let inline_ok = emit_inline_class_field_guard( | ||
| ctx.block(), | ||
| &recv_box, | ||
| &expected_class_id_str, | ||
| &expected_keys, | ||
| Some(&val_double), | ||
| ); | ||
| let needguard_idx = ctx.new_block("class_field_set.needguard"); | ||
| let needguard_label = ctx.block_label(needguard_idx); | ||
| ctx.block().cond_br(&inline_ok, &fast_label, &needguard_label); | ||
| ctx.current_block = needguard_idx; |
There was a problem hiding this comment.
Inline raw-f64 set fast-path bypasses frozen-object protection.
Line 345 can jump straight to the fast store, but the inline predicate does not check OBJ_FLAG_FROZEN. The runtime contract used by js_typed_feedback_class_field_set_guard (in crates/perry-runtime/src/typed_feedback/guards.rs) rejects frozen receivers, so this optimization can allow writes that should be blocked. Add a frozen-bit condition to the inline set predicate (or force frozen checks through the runtime guard before fast-store).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/perry-codegen/src/expr/property_set.rs` around lines 335 - 346, The
inline predicate for the raw-f64 fast-path set optimization does not include a
check for the `OBJ_FLAG_FROZEN` flag, allowing frozen objects to bypass
protection that is enforced by the runtime contract of
`js_typed_feedback_class_field_set_guard`. Modify the inline set predicate
computation for `inline_ok` (returned from `emit_inline_class_field_guard` call)
to include a frozen-bit check that ensures frozen receivers cannot take the
fast-store path and must go through the runtime guard in `needguard_idx` block
instead.
|
Closing as superseded by #5198, which already shipped the inline class-field shape guard (header intact-bit, inline guard, pointer-tag gate, frozen + plain-number checks, verify mode, escape hatch). Investigation findings — including why it stays ~10% (the per-access flag load pins LICM) and why dropping the flag isn't safe yet (it's correctness-load-bearing; masks a latent dup-class-name inline-path bug) — recorded on #5094: #5094 (comment) |
Summary
Closes the class-field (
method_calls) leg of umbrella #5094 — the worst benchmark gap (~290×:method_calls~3300 ms vs Node ~11 ms). The class-field guard calledlayout_typed_raw_f64_slot_for_user— a thread-localTYPED_LAYOUTShashmap lookup (_tlv_get_addr+ hash) — on everythis.fieldget/set, and because the guard was an opaque runtime call, LLVM couldn't hoist the loop-invariant shape/layout check out of hot loops.Result:
method_calls~3837 ms → ~36 ms (~106×) on this machine (same order of magnitude as Node), with the loop result verified (value:10000000).This is the class-field analogue of the array-side win that landed in #5098.
Approach (two phases, both in this PR)
Phase 3a — runtime O(1) bit. Adds
GC_LAYOUT_TYPED_RAW_F64_INTACTto the spareGcHeader._reservedbits, set iff an intact typed-shape descriptor with a raw-f64 slot is installed. A guard-only fast pathlayout_guard_field_is_raw_f64answers "is this field raw-f64?" from the header bit + the object'sfield_count— no thread-local lookup — falling back to the precise per-slot predicate when the bit is clear. The bit is maintained at a single choke point:set_layout_stateclears it (every downgrade/removal routes through it), the two descriptor-install sites re-set it,layout_transfercarries it across GC moves, and the sweep/free path clears it. The precise predicate is unchanged, so the GC scanner and existing layout tests are unaffected.Phase 3b — codegen inline guard hoist. Emits the class-field guard inline (POINTER_TAG check →
GC_TYPE_OBJECT/not-forwarded →OBJECT_TYPE_REGULAR→class_id/keys_arraymatch → intact bit → no own descriptor, plus an inline plain-number check for stores) instead of an opaque call, gated torequires_raw_f64data fields. LLVM LICM hoists the loop-invariant part out, collapsing the per-iteration cost to the direct slot load/store. Any inline miss falls through to today's exact guard-call path, so afalseis never unsafe — only slower.Soundness (memory-corruption class — the crux of #5094)
inline_ok == trueimplies every condition the runtime guard's success requires:POINTER_TAGheap object (tag-checked before any deref);GcHeader.obj_type == GC_TYPE_OBJECT, not forwarded;ObjectHeader.object_type == OBJECT_TYPE_REGULAR;class_id/keys_arraymatch the compile-time class shape ⇒field_indexin bounds and key matches by construction;anyalias);requires_raw_f64(codegen) and the descriptor'sraw_f64_maskboth derive from the sameclass_typed_layout, so they agree by construction.PERRY_VERIFY_LAYOUT_FASTPATH=1cross-checks the 3a fast path against the precise predicate on every hit and ran clean at 10M iterations.Verification
method_calls~3837 ms → ~36 ms (~106×).main(branch fail set is a strict subset of base; Compile Fail: 0).perry-codegenunit suite passes;perry-runtimesuite passes single-threaded (1035/1035; the parallel-isolation flake reproduces identically onmain).any/ mixed-class / GC-churn test (test-files/test_issue_5094_layout_fastpath.ts) matches Node and is byte-identical underPERRY_GC_VERIFY_EVACUATION=1,PERRY_GC_FORCE_EVACUATE=1,PERRY_GEN_GC=0, andPERRY_WRITE_BARRIERS=0.Notes
requires_raw_f64data fields (the verified pointer-free case); non-raw fields andobject_propertykeep today's path and remain follow-ups under the umbrella.🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
Performance
Bug Fixes