
perf(gc): O(1) class-field raw-f64 layout — header bit + inline guard hoist (#5094)#5240

Closed
proggeramlug wants to merge 1 commit into main from fix/5094-layout-o1-header


Conversation

@proggeramlug

@proggeramlug proggeramlug commented Jun 16, 2026

Contributor

Summary

Closes the class-field (method_calls) leg of umbrella #5094 — the worst benchmark gap (~290×: method_calls ~3300 ms vs Node ~11 ms). The class-field guard called layout_typed_raw_f64_slot_for_user — a thread-local TYPED_LAYOUTS hashmap lookup (_tlv_get_addr + hash) — on every this.field get/set, and because the guard was an opaque runtime call, LLVM couldn't hoist the loop-invariant shape/layout check out of hot loops.

Result: method_calls ~3837 ms → ~36 ms (~106×) on this machine (same order of magnitude as Node), with the loop result verified (value:10000000).

This is the class-field analogue of the array-side win that landed in #5098.

Approach (two phases, both in this PR)

Phase 3a — runtime O(1) bit. Adds GC_LAYOUT_TYPED_RAW_F64_INTACT to the spare GcHeader._reserved bits, set iff an intact typed-shape descriptor with a raw-f64 slot is installed. A guard-only fast path layout_guard_field_is_raw_f64 answers "is this field raw-f64?" from the header bit + the object's field_count — no thread-local lookup — falling back to the precise per-slot predicate when the bit is clear. The bit is maintained at a single choke point: set_layout_state clears it (every downgrade/removal routes through it), the two descriptor-install sites re-set it, layout_transfer carries it across GC moves, and the sweep/free path clears it. The precise predicate is unchanged, so the GC scanner and existing layout tests are unaffected.
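The bit lifecycle described above can be sketched roughly as follows. This is a simplified model, not the runtime's code: only the names `GC_LAYOUT_TYPED_RAW_F64_INTACT` and `set_layout_state` come from this PR, and the value `0x1000` is the one reported in the review walkthrough. The `Object` struct, the `LayoutState` enum, and the method bodies are hypothetical stand-ins, and the fast path assumes the caller only asks about fields the compile-time layout already marked raw-f64.

```rust
// Sketch only. Names taken from the PR: GC_LAYOUT_TYPED_RAW_F64_INTACT and
// set_layout_state. The struct layout, LayoutState, and all method bodies
// are hypothetical and simplified for illustration.

/// Bit value 0x1000 as reported in the review walkthrough.
const GC_LAYOUT_TYPED_RAW_F64_INTACT: u16 = 0x1000;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LayoutState {
    Untyped,
    TypedShape,
    Downgraded,
}

struct Object {
    reserved: u16,      // stands in for GcHeader._reserved
    state: LayoutState, // stands in for the thread-local descriptor entry
    field_count: u32,
    raw_f64_mask: u64,
}

impl Object {
    fn new(field_count: u32) -> Self {
        Object { reserved: 0, state: LayoutState::Untyped, field_count, raw_f64_mask: 0 }
    }

    /// Single choke point: every state transition clears the intact bit.
    fn set_layout_state(&mut self, state: LayoutState) {
        self.reserved &= !GC_LAYOUT_TYPED_RAW_F64_INTACT;
        self.state = state;
    }

    /// Descriptor install: goes through the choke point, then re-sets the bit
    /// if at least one raw-f64 slot exists.
    fn install_typed_shape(&mut self, raw_f64_mask: u64) {
        self.set_layout_state(LayoutState::TypedShape);
        self.raw_f64_mask = raw_f64_mask;
        if raw_f64_mask != 0 {
            self.reserved |= GC_LAYOUT_TYPED_RAW_F64_INTACT;
        }
    }

    /// Precise per-slot predicate (the expensive lookup in the real runtime).
    fn precise_is_raw_f64(&self, field_index: u32) -> bool {
        self.state == LayoutState::TypedShape
            && field_index < self.field_count
            && field_index < 64
            && (self.raw_f64_mask >> field_index) & 1 == 1
    }

    /// Guard-only fast path: header bit plus a bounds check, with a fallback
    /// to the precise predicate when the bit is clear.
    fn guard_field_is_raw_f64(&self, field_index: u32) -> bool {
        if self.reserved & GC_LAYOUT_TYPED_RAW_F64_INTACT != 0 {
            return field_index < self.field_count;
        }
        self.precise_is_raw_f64(field_index)
    }
}
```

Routing every downgrade through one function is what makes the bit trustworthy in this model: there is no transition that can leave it set while the descriptor is no longer intact.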

Phase 3b — codegen inline guard hoist. Emits the class-field guard inline (POINTER_TAG check → GC_TYPE_OBJECT/not-forwarded → OBJECT_TYPE_REGULAR → class_id/keys_array match → intact bit → no own descriptor, plus an inline plain-number check for stores) instead of an opaque call, gated to requires_raw_f64 data fields. LLVM LICM hoists the loop-invariant part out, collapsing the per-iteration cost to the direct slot load/store. Any inline miss falls through to today's exact guard-call path, so a false is never unsafe, only slower.
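As a rough illustration of what hoisting buys, the two functions below compute the same result, once with the guard evaluated on every iteration and once with it evaluated a single time before the loop. This is a hand-written analogy in plain Rust, not generated IR: `Counter`, `COUNTER_CLASS_ID`, `shape_ok`, and `slow_add` are hypothetical names, and the hand-hoisted form is only valid here because nothing in the loop body can change the shape.

```rust
// Hypothetical model of a class instance with one numeric field.
struct Counter {
    class_id: u32,
    intact: bool, // stands in for the header intact bit
    value: f64,
}

const COUNTER_CLASS_ID: u32 = 7; // hypothetical compile-time class id

/// Loop-invariant shape check: reads only data the loop body never writes.
fn shape_ok(c: &Counter) -> bool {
    c.class_id == COUNTER_CLASS_ID && c.intact
}

/// Stands in for the runtime guard-call path taken on an inline miss.
fn slow_add(c: &mut Counter, delta: f64) {
    c.value += delta;
}

/// Guard evaluated on every iteration (what an opaque call forces).
fn run_per_iteration_guard(c: &mut Counter, n: u32) -> f64 {
    for _ in 0..n {
        if shape_ok(c) {
            c.value += 1.0;
        } else {
            slow_add(c, 1.0);
        }
    }
    c.value
}

/// Guard evaluated once; the loop body is just the slot update.
fn run_hoisted_guard(c: &mut Counter, n: u32) -> f64 {
    if shape_ok(c) {
        for _ in 0..n {
            c.value += 1.0;
        }
    } else {
        for _ in 0..n {
            slow_add(c, 1.0);
        }
    }
    c.value
}
```

Both variants take the slow path on a miss, which mirrors the claim that a failed inline guard costs time rather than correctness.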

Soundness (memory-corruption class — the crux of #5094)

inline_ok == true implies every condition the runtime guard's success requires:

  • receiver is a POINTER_TAG heap object (tag-checked before any deref);
  • GcHeader.obj_type == GC_TYPE_OBJECT, not forwarded; ObjectHeader.object_type == OBJECT_TYPE_REGULAR;
  • class_id/keys_array match the compile-time class shape ⇒ field_index in bounds and key matches by construction;
  • intact bit set ⇒ slot is raw-f64 with no downgrade (cleared the instant a non-number is written via an any alias);
  • own-descriptor bit clear ⇒ a prototype accessor is shadowed by the own data slot, so direct access stays correct;
  • (stores) the value is not a NaN-boxed non-number, so a raw-f64 store can never publish a pointer/string into a pointer-free slot (→ GC use-after-free).
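The store-side condition in the last bullet can be sketched like this. The tag constants below are placeholder values chosen for illustration; the PR names `TAG_MASK` and `POINTER_TAG` but does not state their values, and `STRING_TAG`, `is_plain_number`, and `try_raw_store` are hypothetical names.

```rust
// Placeholder NaN-boxing constants; the real values are not given in the PR.
const TAG_MASK: u64 = 0xFFFF_0000_0000_0000;
const POINTER_TAG: u64 = 0x7FFD_0000_0000_0000; // hypothetical value
const STRING_TAG: u64 = 0x7FFF_0000_0000_0000; // hypothetical tag and value

/// True when the bit pattern is an ordinary f64 (including a plain NaN),
/// false when it carries one of the boxed non-number tags modeled here.
fn is_plain_number(bits: u64) -> bool {
    let tag = bits & TAG_MASK;
    tag != POINTER_TAG && tag != STRING_TAG
}

/// Writes the value into a raw-f64 slot only if it is a plain number.
/// Returns false on a miss so the caller can take the guarded slow path.
fn try_raw_store(slot: &mut f64, value_bits: u64) -> bool {
    if !is_plain_number(value_bits) {
        return false;
    }
    *slot = f64::from_bits(value_bits);
    true
}
```

Refusing the store leaves the slot untouched, so a boxed pointer never lands in memory the collector treats as pointer-free.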

requires_raw_f64 (codegen) and the descriptor's raw_f64_mask both derive from the same class_typed_layout, so they agree by construction. PERRY_VERIFY_LAYOUT_FASTPATH=1 cross-checks the 3a fast path against the precise predicate on every hit and ran clean at 10M iterations.
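A verification mode of this kind might look roughly like the sketch below. The environment variable name is from the PR; how the runtime actually parses it is not stated, so treating the value `"1"` as "on" is an assumption, and `verify_enabled` and `guard` are hypothetical helpers.

```rust
use std::env;

/// Assumption: verification is on when the variable is set to exactly "1".
fn verify_enabled() -> bool {
    env::var("PERRY_VERIFY_LAYOUT_FASTPATH").map_or(false, |v| v == "1")
}

/// Fast path with an optional cross-check against the precise predicate.
/// `bit_set` models the header intact bit, `in_bounds` the field_count check.
fn guard(bit_set: bool, in_bounds: bool, verify: bool, precise: impl Fn() -> bool) -> bool {
    if bit_set {
        let fast = in_bounds;
        if verify {
            // On every fast-path hit, both answers must agree.
            assert_eq!(fast, precise(), "layout fast path diverged from precise predicate");
        }
        fast
    } else {
        // Bit clear: fall back to the precise per-slot predicate.
        precise()
    }
}
```

A caller would pass `verify_enabled()` (ideally read once and cached) as the `verify` argument, so the precise lookup is only paid for when the check is requested.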

Verification

  • Perf: method_calls ~3837 ms → ~36 ms (~106×).
  • Parity: gap suite shows zero new failures vs main (branch fail set is a strict subset of base; Compile Fail: 0).
  • Tests: perry-codegen unit suite passes; perry-runtime suite passes single-threaded (1035/1035; the parallel-isolation flake reproduces identically on main).
  • GC stress: targeted downgrade-via-any / mixed-class / GC-churn test (test-files/test_issue_5094_layout_fastpath.ts) matches Node and is byte-identical under PERRY_GC_VERIFY_EVACUATION=1, PERRY_GC_FORCE_EVACUATE=1, PERRY_GEN_GC=0, and PERRY_WRITE_BARRIERS=0.

Notes

  • Scoped to requires_raw_f64 data fields (the verified pointer-free case); non-raw fields and object_property keep today's path and remain follow-ups under the umbrella.
  • Version bump / CHANGELOG intentionally omitted for the maintainer to fold in at merge (avoids patch-version collisions).

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • Performance

    • Optimized class field access operations with inline shape checks for numeric fields, reducing runtime overhead for frequently-accessed properties.
    • Added fast-path detection for numeric field slots to improve data-access performance.
  • Bug Fixes

    • Fixed handling of field type transitions in the class layout fastpath.

…inline guard hoist (#5094)

Class-field access (`this.field` in class methods) was the worst benchmark
gap in #5094 (~290×: method_calls ~3300ms vs Node ~11ms). The class-field
guard called `layout_typed_raw_f64_slot_for_user` — a thread-local
`TYPED_LAYOUTS` hashmap lookup (`_tlv_get_addr` + hash) — on every get/set,
and because the guard was an opaque runtime call, LLVM could not hoist the
loop-invariant shape/layout check out of hot loops.

Two phases, both shippable:

Phase 3a (runtime): add a `GC_LAYOUT_TYPED_RAW_F64_INTACT` bit to the spare
GcHeader `_reserved` bits, set iff an intact typed-shape descriptor with a
raw-f64 slot is installed. A new guard-only fast path
`layout_guard_field_is_raw_f64` answers "is this field raw-f64?" from the
header bit + the object's `field_count` with no thread-local lookup, falling
back to the precise per-slot predicate when the bit is clear. The bit is
maintained at a single choke point: `set_layout_state` clears it (every
downgrade/removal routes through there), the two descriptor-install sites
re-set it, `layout_transfer` carries it across GC moves, and the sweep/free
path clears it. The precise predicate is unchanged, so the GC scanner and
existing tests are unaffected.

Phase 3b (codegen): emit the class-field guard inline (shape + class_id +
keys + the intact bit + no-own-descriptor, plus an inline plain-number check
for stores) instead of an opaque call, gated to raw-f64 data fields. LLVM
LICM hoists the loop-invariant part out of the loop, collapsing the per-
iteration cost to the direct slot load/store. Any inline miss falls through
to today's exact guard-call path, so a `false` is never unsafe — only slower.

Result: method_calls ~3837ms -> ~36ms (~106x; same order as Node). Soundness:
the inline fast path implies every condition the runtime guard's success
requires (receiver is a POINTER_TAG object, GC_TYPE_OBJECT, not forwarded,
OBJECT_TYPE_REGULAR, class_id/keys match -> field in bounds + key matches by
construction, intact bit -> slot is raw-f64, own-descriptor bit clear -> a
prototype accessor is shadowed by the own data slot; the store value is not a
NaN-boxed non-number so a raw-f64 store can never publish a pointer). The
`requires_raw_f64` flag and the descriptor's raw_f64_mask both derive from the
same `class_typed_layout`, so they agree by construction;
`PERRY_VERIFY_LAYOUT_FASTPATH=1` cross-checks this on every 3a fast-path hit
and ran clean at 10M iterations.

Verification: method_calls 106x; gap parity suite shows zero new failures vs
main (branch fail set is a strict subset; Compile Fail 0); perry-codegen and
single-threaded perry-runtime suites pass; targeted downgrade-via-`any` /
mixed-class / GC-churn test matches Node and is byte-identical under
PERRY_GC_VERIFY_EVACUATION=1, PERRY_GC_FORCE_EVACUATE=1, PERRY_GEN_GC=0, and
PERRY_WRITE_BARRIERS=0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 16, 2026


Review Change Stack

📝 Walkthrough

Walkthrough

Introduces a GC_LAYOUT_TYPED_RAW_F64_INTACT header bit to track raw-f64 typed-shape integrity per object. Synchronizes this bit across all descriptor lifecycle transitions. Adds layout_guard_field_is_raw_f64 for O(1) fast-path checks without hashmap lookups. Emits an inline LLVM-IR shape guard (emit_inline_class_field_guard) into property get/set lowering for raw-f64 candidate fields. Updates typed-feedback guard contracts to use the new predicate. Adds a regression test.

Changes

Raw-f64 class field inline guard fast path

| Layer / File(s) | Summary |
| --- | --- |
| GC_LAYOUT_TYPED_RAW_F64_INTACT flag and lifecycle synchronization<br>crates/perry-runtime/src/gc/types.rs, crates/perry-runtime/src/gc/layout.rs | Defines GC_LAYOUT_TYPED_RAW_F64_INTACT (bit 0x1000); updates set_layout_state to clear it on any transition; adds set_typed_raw_f64_intact helper; re-applies the bit after init_typed_shape_layout, js_gc_init_unboxed_object_layout, and layout_transfer calls that clear it; clears it directly in layout_clear_for_ptr. |
| layout_guard_field_is_raw_f64 fast-path function<br>crates/perry-runtime/src/gc/layout.rs | Adds layout_guard_field_is_raw_f64 reading GC_LAYOUT_TYPED_RAW_F64_INTACT from the object header to skip the TYPED_LAYOUTS hashmap lookup; in debug/verification mode (PERRY_VERIFY_LAYOUT_FASTPATH), cross-checks against layout_typed_raw_f64_slot_for_user and panics on divergence. |
| Typed-feedback guard contracts<br>crates/perry-runtime/src/typed_feedback/guards.rs | Switches class_field_get_contract, class_field_fast_contract, and class_field_set_contract to call layout_guard_field_is_raw_f64 instead of layout_typed_raw_f64_slot_for_user for require_raw_f64 validity; all other contract conditions unchanged. |
| emit_inline_class_field_guard LLVM-IR helper<br>crates/perry-codegen/src/expr/property_get.rs | Adds emit_inline_class_field_guard (with supporting imports) that emits an SSA ok boolean checking NaN-box tags, GC/object header fields, class id, keys-array identity, GC_LAYOUT_TYPED_RAW_F64_INTACT bit, descriptor absence, and optionally NaN-boxed non-number rejection. |
| Property-get lowering inline guard integration<br>crates/perry-codegen/src/expr/property_get.rs | Refactors the class-field-get fast-path to extract obj_bits, obj_handle, key_raw, and expected_keys as explicit SSA values; when requires_raw_f64, branches on emit_inline_class_field_guard directly to the fast slot-load block, with misses falling to the js_typed_feedback_class_field_get_guard runtime path. |
| Property-set lowering inline guard integration<br>crates/perry-codegen/src/expr/property_set.rs | Imports emit_inline_class_field_guard; refactors the class-field-set fast-path to compute key_raw/expected_keys separately and allocate fast/fallback/merge blocks earlier; when requires_raw_f64, adds a needguard block gated by emit_inline_class_field_guard before falling through to js_typed_feedback_class_field_set_guard. |
| Regression test<br>test-files/test_issue_5094_layout_fastpath.ts | Adds hot-loop scenarios covering any-alias field downgrade, mixed number+string field access, and GC churn during field reads to validate fast-path correctness. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Possibly related PRs

  • PerryTS/perry#5198: Implements the same code-level optimization across property_get.rs/property_set.rs, gc/layout.rs, and typed_feedback/guards.rs for raw-f64 inline class field guards gated on a GC typed-layout-intact bit.

Poem

🐇 Hop along the fast path, no hashmap to chase,
The header bit glows — 0x1000 in place!
NaN-box tags checked, the shape guard inline,
GC churn and downgrades? The bunny is fine.
With emit_inline_guard and has_raw_f64 set,
This rabbit's fast path is the speediest yet! 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'perf(gc): O(1) class-field raw-f64 layout — header bit + inline guard hoist (#5094)' clearly summarizes the main performance optimization: converting class-field layout checks from O(n) hashmap lookups to O(1) header bit checks with inline guard hoisting. |
| Description check | ✅ Passed | The PR description is comprehensive and well-structured, covering Summary, Changes (across all affected files), Related issue (#5094), Test plan with all checkboxes addressed, and Checklist confirmation. All required template sections are present and substantive. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/perry-codegen/src/expr/property_get.rs`:
- Around line 92-123: The code computes the `is_obj_tag` validation check but
does not use it to control program flow, so subsequent pointer dereferences
through `obj_ptr` (via operations like loading obj_type, gcflags, otype, cid,
and keys) execute unconditionally in the same basic block regardless of whether
the tag validation passed. Move all the header and object loads (the blk.load
and blk.gep calls that access obj_type_ptr, gcflags_ptr, reserved_ptr, otype,
cid_ptr, and keys_ptr) into a conditional block that only executes when
`is_obj_tag` is true, so that pointer dereferences are gated on successful tag
validation via control flow rather than just data dependency.

In `@crates/perry-codegen/src/expr/property_set.rs`:
- Around line 335-346: The inline predicate for the raw-f64 fast-path set
optimization does not include a check for the `OBJ_FLAG_FROZEN` flag, allowing
frozen objects to bypass protection that is enforced by the runtime contract of
`js_typed_feedback_class_field_set_guard`. Modify the inline set predicate
computation for `inline_ok` (returned from `emit_inline_class_field_guard` call)
to include a frozen-bit check that ensures frozen receivers cannot take the
fast-store path and must go through the runtime guard in `needguard_idx` block
instead.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: d998c565-4654-49f6-b24d-2c6d1a3d7b90

📥 Commits

Reviewing files that changed from the base of the PR and between 5258a60 and 723e9c9.

📒 Files selected for processing (6)
  • crates/perry-codegen/src/expr/property_get.rs
  • crates/perry-codegen/src/expr/property_set.rs
  • crates/perry-runtime/src/gc/layout.rs
  • crates/perry-runtime/src/gc/types.rs
  • crates/perry-runtime/src/typed_feedback/guards.rs
  • test-files/test_issue_5094_layout_fastpath.ts

Comment on lines +92 to +123
let recv_bits = blk.bitcast_double_to_i64(recv_box);
// Receiver must be a POINTER_TAG NaN-boxed heap object before any deref.
let tag = blk.and(
    I64,
    &recv_bits,
    &crate::nanbox::i64_literal(crate::nanbox::TAG_MASK),
);
let is_obj_tag = blk.icmp_eq(I64, &tag, crate::nanbox::POINTER_TAG_I64);
let handle = blk.and(I64, &recv_bits, POINTER_MASK_I64);
let obj_ptr = blk.inttoptr(I64, &handle);
// GcHeader sits 8 bytes below the user pointer: obj_type u8@-8, gc_flags
// u8@-7, _reserved u16@-6.
let obj_type_ptr = blk.gep(I8, &obj_ptr, &[(I64, "-8")]);
let obj_type = blk.load(I8, &obj_type_ptr);
let type_ok = blk.icmp_eq(I8, &obj_type, "2"); // GC_TYPE_OBJECT
let gcflags_ptr = blk.gep(I8, &obj_ptr, &[(I64, "-7")]);
let gcflags = blk.load(I8, &gcflags_ptr);
let fwd = blk.and(I8, &gcflags, "-128"); // GC_FLAG_FORWARDED (0x80)
let not_fwd = blk.icmp_eq(I8, &fwd, "0");
let reserved_ptr = blk.gep(I8, &obj_ptr, &[(I64, "-6")]);
let reserved = blk.load(I16, &reserved_ptr);
// ObjectHeader.object_type u32@+0
let otype = blk.load(I32, &obj_ptr);
let otype_ok = blk.icmp_eq(I32, &otype, "1"); // OBJECT_TYPE_REGULAR
// class_id u32@+4
let cid_ptr = blk.gep(I8, &obj_ptr, &[(I64, "4")]);
let cid = blk.load(I32, &cid_ptr);
let cid_ok = blk.icmp_eq(I32, &cid, expected_class_id);
// keys_array ptr@+16
let keys_ptr = blk.gep(I8, &obj_ptr, &[(I64, "16")]);
let keys = blk.load(I64, &keys_ptr);
let keys_ok = blk.icmp_eq(I64, &keys, expected_keys);


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, find and examine the emit_inline_class_field_guard function
cd crates/perry-codegen/src/expr && wc -l property_get.rs

Repository: PerryTS/perry

Length of output: 78


🏁 Script executed:

# Get the full function definition to see control flow
cd crates/perry-codegen/src/expr && sed -n '85,150p' property_get.rs

Repository: PerryTS/perry

Length of output: 3111


🏁 Script executed:

# Check what happens after line 123 (if there are branches gating the loads)
cd crates/perry-codegen/src/expr && sed -n '92,140p' property_get.rs

Repository: PerryTS/perry

Length of output: 2485


Unsafe dereference happens before tag validation is enforced in control flow.

The function computes is_obj_tag on line 99 but does not branch on it. Lines 104–123 execute unconditionally in the same basic block, dereferencing obj_ptr (derived from unvalidated bits) regardless of whether the pointer tag check passed. The is_obj_tag boolean is later combined with other conditions via bitwise AND, but bitwise operations are data dependencies, not control-flow branches—all loads execute before any guard can reject the input. Move the header and object loads (lines 104–123) into a block that only executes when is_obj_tag is true, or ensure all callsites validate the pointer tag before entering this helper.
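The difference between a data dependency and a control-flow gate can be shown with a toy model that counts heap loads. This is an analogy in safe Rust, not the codegen helper: `ToyHeap`, `Header`, and both guard functions are hypothetical, the tag constants are placeholder values, and only `GC_TYPE_OBJECT = 2` is taken from the comment in the excerpt.

```rust
use std::cell::Cell;

const TAG_MASK: u64 = 0xFFFF_0000_0000_0000; // hypothetical value
const POINTER_TAG: u64 = 0x7FFD_0000_0000_0000; // hypothetical value
const POINTER_MASK: u64 = 0x0000_FFFF_FFFF_FFFF; // hypothetical value
const GC_TYPE_OBJECT: u8 = 2; // value quoted in the excerpt's comment

struct Header {
    obj_type: u8,
    class_id: u32,
}

/// Toy heap where a handle is an index and every load is counted.
struct ToyHeap {
    headers: Vec<Header>,
    loads: Cell<usize>,
}

impl ToyHeap {
    fn load(&self, handle: u64) -> Option<&Header> {
        self.loads.set(self.loads.get() + 1);
        self.headers.get(handle as usize)
    }
}

/// Tag result is only AND-ed in at the end: the load always happens.
fn guard_data_dependent(heap: &ToyHeap, recv_bits: u64, expected: u32) -> bool {
    let is_obj_tag = (recv_bits & TAG_MASK) == POINTER_TAG;
    let header = heap.load(recv_bits & POINTER_MASK);
    let shape_ok =
        header.map_or(false, |h| h.obj_type == GC_TYPE_OBJECT && h.class_id == expected);
    is_obj_tag & shape_ok
}

/// Tag result controls the branch: no load happens for a non-pointer.
fn guard_control_gated(heap: &ToyHeap, recv_bits: u64, expected: u32) -> bool {
    if (recv_bits & TAG_MASK) != POINTER_TAG {
        return false; // miss: the caller would take the runtime guard path
    }
    heap.load(recv_bits & POINTER_MASK)
        .map_or(false, |h| h.obj_type == GC_TYPE_OBJECT && h.class_id == expected)
}
```

Both functions return the same boolean for every input; the gated version differs only in that a receiver failing the tag check never reaches the load, which is the property the review comment asks for.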


Comment on lines +335 to +346
if requires_raw_f64 {
    let inline_ok = emit_inline_class_field_guard(
        ctx.block(),
        &recv_box,
        &expected_class_id_str,
        &expected_keys,
        Some(&val_double),
    );
    let needguard_idx = ctx.new_block("class_field_set.needguard");
    let needguard_label = ctx.block_label(needguard_idx);
    ctx.block().cond_br(&inline_ok, &fast_label, &needguard_label);
    ctx.current_block = needguard_idx;


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Inline raw-f64 set fast-path bypasses frozen-object protection.

Line 345 can jump straight to the fast store, but the inline predicate does not check OBJ_FLAG_FROZEN. The runtime contract used by js_typed_feedback_class_field_set_guard (in crates/perry-runtime/src/typed_feedback/guards.rs) rejects frozen receivers, so this optimization can allow writes that should be blocked. Add a frozen-bit condition to the inline set predicate (or force frozen checks through the runtime guard before fast-store).
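One way to picture the suggested fix is to fold a frozen-bit test into the inline predicate so that frozen receivers always take the guarded path. The sketch below is a plain-Rust model: the bit value of `OBJ_FLAG_FROZEN`, the `Receiver` struct, and all three functions are hypothetical, and where the flag actually lives in the object header is not stated in this thread.

```rust
const OBJ_FLAG_FROZEN: u8 = 0x01; // hypothetical bit value

/// Hypothetical receiver: `shape_ok` stands in for the existing inline checks.
struct Receiver {
    shape_ok: bool,
    flags: u8,
    slot: f64,
}

/// Inline set predicate with the added frozen-bit condition.
fn inline_set_ok(r: &Receiver) -> bool {
    r.shape_ok && (r.flags & OBJ_FLAG_FROZEN) == 0
}

/// Stands in for the runtime guard path, which rejects frozen receivers.
fn runtime_guarded_store(r: &mut Receiver, v: f64) -> bool {
    if (r.flags & OBJ_FLAG_FROZEN) != 0 {
        return false;
    }
    r.slot = v;
    true
}

/// Fast store when the inline predicate holds, guarded store otherwise.
fn store(r: &mut Receiver, v: f64) -> bool {
    if inline_set_ok(r) {
        r.slot = v;
        true
    } else {
        runtime_guarded_store(r, v)
    }
}
```

With the extra condition, the fast path and the runtime contract agree on frozen objects: neither writes the slot.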


@proggeramlug

Contributor Author

Closing as superseded by #5198, which already shipped the inline class-field shape guard (header intact-bit, inline guard, pointer-tag gate, frozen + plain-number checks, verify mode, escape hatch). Investigation findings — including why it stays ~10% (the per-access flag load pins LICM) and why dropping the flag isn't safe yet (it's correctness-load-bearing; masks a latent dup-class-name inline-path bug) — recorded on #5094: #5094 (comment)

@proggeramlug proggeramlug deleted the fix/5094-layout-o1-header branch June 16, 2026 11:05