perf(codegen): full-outline class-field IC diamond for oversized modules (#5334 lever B) #5385

Merged: proggeramlug merged 5 commits into main from feat/full-outline-ic-oversized on Jun 18, 2026
@proggeramlug proggeramlug commented Jun 18, 2026

What

Final lever of the IR-efficiency roadmap (#5334, lever B): full-outline the class-field IC diamond for oversized modules.

Pathologically-large modules (the motivating case: a 13MB minified bundle that lowers to ~1.25GB of LLVM IR across ~92K functions) are forced to clang -O0 (#4880), where the inline class-field-SET diamond's ~15–22-lines-per-site expansion is never optimized away — and clang needs ~15GB RSS just to chew through it.

For such modules, the whole diamond collapses to a single call:

; before (per site): guard-operand prep + guard call + cond_br
;   + class_field_set.fast {store, layout note, write barrier}
;   + class_field_set.fallback {js_class_field_set_fallback}
;   + class_field_set.merge          (~22 lines)
; after:
  call void @js_class_field_set_ic(i64 <site>, double %recv, i32 %cls,
        i64 %keys, i64 %key, i32 %field_idx, double %val, i32 %raw_f64)   ; ~5 lines

The runtime helper reproduces the diamond's exact semantics — run the guard, then on PASS do the same raw-f64/boxed slot store, on FAIL record + route by name. It trades a call frame on the (cold, startup-dominated) field-set path for a large per-site IR reduction so clang can compile the module at all.
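The control flow described above can be sketched as a toy model. This is not the runtime's code: `Obj`, `Path`, `guard_passes`, `boxed_store_with_barrier`, and `set_fallback` are hypothetical stand-ins, the real helper takes the eight-argument signature shown in the IR, and the feedback recording on guard failure is omitted.

```rust
use std::collections::HashMap;

/// Toy object (hypothetical layout): class id, numeric slots, by-name properties.
struct Obj {
    class_id: u32,
    slots: Vec<f64>,
    by_name: HashMap<String, f64>,
    barriers: u32, // counts simulated write barriers
}

impl Obj {
    fn new(class_id: u32, n_slots: usize) -> Self {
        Obj { class_id, slots: vec![0.0; n_slots], by_name: HashMap::new(), barriers: 0 }
    }
}

/// Which arm of the (former) diamond handled the store.
#[derive(Debug, PartialEq)]
enum Path {
    RawF64Store,
    BoxedStore,
    Fallback,
}

/// Stand-in for the shape guard: class matches and the slot exists.
fn guard_passes(obj: &Obj, expected_class: u32, field_idx: usize) -> bool {
    obj.class_id == expected_class && field_idx < obj.slots.len()
}

/// Stand-in for the boxed store, which always runs the write barrier.
fn boxed_store_with_barrier(obj: &mut Obj, field_idx: usize, val: f64) {
    obj.slots[field_idx] = val;
    obj.barriers += 1;
}

/// Stand-in for the by-name fallback route.
fn set_fallback(obj: &mut Obj, key: &str, val: f64) {
    obj.by_name.insert(key.to_string(), val);
}

/// One outlined entrypoint: guard, then fast store on PASS or fallback on FAIL.
fn class_field_set_ic(
    obj: &mut Obj,
    expected_class: u32,
    key: &str,
    field_idx: usize,
    val: f64,
    raw_f64: bool,
) -> Path {
    if guard_passes(obj, expected_class, field_idx) {
        if raw_f64 {
            obj.slots[field_idx] = val; // barrier-free raw store
            Path::RawF64Store
        } else {
            boxed_store_with_barrier(obj, field_idx, val);
            Path::BoxedStore
        }
    } else {
        set_fallback(obj, key, val);
        Path::Fallback
    }
}
```

The point of the sketch is that all three arms live behind one function, so each call site only needs to pass arguments rather than carry its own copy of the branches.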

Gating (decided once per module, at codegen time)

  • PERRY_FULL_OUTLINE_IC=1/on/true forces ON, =0/off/false forces OFF;
  • otherwise auto: function count ≥ PERRY_FULL_OUTLINE_IC_MIN_FUNCS (default 4000) — the defining trait of the bundle case (tens of thousands of functions in one module). Ordinary per-file modules stay on the inline diamond and keep the hot fast store.

The decision is a thread-local set at the top of compile_module (codegen is sequential per module), not a process-global OnceLock — so it can't pin one module's decision across a multi-module build.
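A minimal sketch of the gate under stated assumptions: `decide_from_values` and `decide_from_env` are hypothetical names, the accessor bodies are guesses, and treating unrecognized values as "auto" is an assumption rather than confirmed behavior. The pure function takes the env values as arguments so it can be exercised without touching the process environment.

```rust
use std::cell::Cell;

/// Default auto-gate threshold stated in the PR description.
const DEFAULT_MIN_FUNCS: usize = 4000;

thread_local! {
    // Per-thread flag; a fresh thread starts with the gate off.
    static FULL_OUTLINE_IC: Cell<bool> = Cell::new(false);
}

fn set_full_outline_ic(on: bool) {
    FULL_OUTLINE_IC.with(|f| f.set(on));
}

fn full_outline_ic_enabled() -> bool {
    FULL_OUTLINE_IC.with(|f| f.get())
}

/// Hypothetical pure decision: forced ON/OFF wins, otherwise compare
/// the callable count against the (possibly overridden) threshold.
fn decide_from_values(force: Option<&str>, min_funcs: Option<&str>, callable_count: usize) -> bool {
    match force {
        Some("1" | "on" | "true") => return true,
        Some("0" | "off" | "false") => return false,
        _ => {} // assumption: anything else falls through to auto
    }
    let threshold = min_funcs
        .and_then(|v| v.trim().parse::<usize>().ok())
        .unwrap_or(DEFAULT_MIN_FUNCS);
    callable_count >= threshold
}

/// Hypothetical wrapper that reads the two env vars named in the PR.
fn decide_from_env(callable_count: usize) -> bool {
    let force = std::env::var("PERRY_FULL_OUTLINE_IC").ok();
    let min = std::env::var("PERRY_FULL_OUTLINE_IC_MIN_FUNCS").ok();
    decide_from_values(force.as_deref(), min.as_deref(), callable_count)
}
```

Because the flag is a `thread_local!`, setting it on one thread leaves other threads at their initial value, which is the property the paragraph above relies on.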

Trade-offs

  • The outlined path adds a call frame to every class-field SET, which is acceptable because the field-set path in these bundles is cold and startup-dominated.
  • The full-outline boxed store always emits the write barrier (via js_object_set_field), so the compile-time non-pointer barrier elision (#5334 lever D) does not apply on this path. That is acceptable because the path is gated to oversized, non-hot-loop modules.

Verification

  • Forced ON: the diamond collapses to one js_class_field_set_ic call per site — 0 class_field_set.fast blocks, 0 inline guard calls. On a small fixture: 197 fewer IR lines for 9 field-sets (~22/site), scaling to ~4–5M lines on the bundle's 238K field-set sites.
  • A class-field-write program runs to the correct result (5999997) under full-outline — the outlined runtime path is semantically and GC-correct.
  • New unit test full_outline_ic_collapses_class_field_set_to_single_call (both gate states). The two class-field structure tests now pin PERRY_FULL_OUTLINE_IC off and serialize on ENV_LOCK against it. Full perry-codegen suite green.
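The scaling estimate in the first bullet is a simple product of site count and net lines saved per site. A small sketch, where `estimated_lines_saved` is a hypothetical helper and the ~15 and ~22 per-site figures are the rough bounds quoted in this PR:

```rust
/// Estimated IR lines removed: field-set sites times net lines saved per site.
fn estimated_lines_saved(sites: u64, lines_per_site: u64) -> u64 {
    sites * lines_per_site
}

/// Range for the bundle's 238K sites using the ~15 and ~22 per-site bounds.
fn bundle_estimate_range() -> (u64, u64) {
    (estimated_lines_saved(238_000, 15), estimated_lines_saved(238_000, 22))
}
```

With these inputs the range is 3,570,000 to 5,236,000 lines, which brackets the ~4–5M figure above; the small fixture's 9 sites at ~22 lines each give 198, close to the measured 197.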

Refs #5334. (Levers A #5351, C #5350 merged; D #5381 in review.)

Summary by CodeRabbit

Release Notes

  • Performance

    • Reduced generated code size for large modules by routing class-field SET inline-cache through a fully-outlined js_class_field_set_ic entrypoint when full-outline inline-cache is enabled.
  • Configuration

    • Added a per-thread full-outline inline-cache feature gate via PERRY_FULL_OUTLINE_IC, with auto-gating using PERRY_FULL_OUTLINE_IC_MIN_FUNCS based on estimated module callable counts (including class members).
  • Tests

    • Improved test reliability by isolating environment-variable changes and added coverage for forced and auto-gated full-outline class-field behavior.

Ralph Küpper added 2 commits June 18, 2026 09:27
…les (#5334 lever B)

Pathologically-large modules (the motivating case: a 13MB minified bundle
that lowers to ~1.25GB of LLVM IR across ~92K functions) are forced to
`clang -O0` (#4880), where the inline class-field-SET IC diamond's
~15-lines-per-site expansion is never optimized away — and clang needs
~15GB RSS just to chew through it.

For such modules, replace the ENTIRE diamond (guard call + fast slot store
+ fallback arm) with a single `call @js_class_field_set_ic(...)`. The
runtime helper reproduces the diamond's exact semantics — run the guard,
then on PASS do the same raw-f64/boxed slot store, on FAIL record + route
by name. This trades a function-call frame on the (cold, startup-
dominated) field-set path for a large per-site IR reduction so clang can
compile the module at all.

Gating (codegen-time, decided once per module in compile_module):
- `PERRY_FULL_OUTLINE_IC=1/on/true` forces ON, `=0/off/false` forces OFF;
- otherwise auto: function count >= PERRY_FULL_OUTLINE_IC_MIN_FUNCS
  (default 4000) — the defining trait of the bundle case; ordinary
  per-file modules stay on the inline diamond and keep the hot fast store.

The decision is a thread-local set at the top of compile_module (codegen
is sequential per module), not a process-global OnceLock, so it can't pin
one module's decision across a multi-module build.

NB: the full-outline boxed store always emits the write barrier (via
js_object_set_field), so the compile-time non-pointer barrier elision
(#5334 lever D) does not apply on this path — acceptable, since it is
gated to oversized, non-hot-loop modules.

Verified: forced ON collapses the diamond to one call (no fast/fallback
blocks, no inline guard call); a class-field-write program runs to the
correct result under full-outline; full perry-codegen suite green. The two
class-field structure tests now pin PERRY_FULL_OUTLINE_IC off and
serialize on ENV_LOCK against the new lever-B test.

Final lever of the IR-efficiency roadmap (#5334). Levers A #5351, C #5350
merged; D #5381 in review.

Addresses self-review findings on #5334 lever B (#5385):

- Gate denominator: `decide_full_outline_ic` was fed `hir.functions.len()`,
  which excludes class methods, static methods, accessors, and constructors
  (those live in `hir.classes[].*`, collected separately). A class-heavy
  minified bundle — the exact pathology lever B targets — could have a small
  `functions.len()` yet emit tens of thousands of LLVM functions, so the gate
  would never fire. New `module_callable_count()` counts top-level functions
  plus all class callables; the gate now uses it. New test
  `full_outline_ic_auto_gate_counts_class_methods` covers a class-heavy module
  triggering with only one top-level function.

- Dedup: `js_class_field_set_ic`'s guard-FAIL tail re-implemented
  `js_class_field_set_fallback` verbatim. It now delegates to that helper, so
  by-name routing (frozen / accessor / setter-in-chain) is defined once.

Full perry-codegen + perry-runtime typed_feedback suites green.

coderabbitai Bot commented Jun 18, 2026

Caution: Review failed. The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 0b5b6b9f-941d-4d32-8277-1521f342d1e6

📥 Commits

Reviewing files that changed from the base of the PR and between 624cea7 and acc60d6.

📒 Files selected for processing (1)
  • crates/perry-runtime/src/typed_feedback/guards.rs

📝 Walkthrough

Adds an adaptive "full-outline IC" mode for class-field SET. A thread-local flag and env-var-driven decision function (decide_full_outline_ic) gate the feature per compile_module call. The runtime gains a new js_class_field_set_ic outlined entrypoint; codegen emits a single call to it instead of the inline guard/fast/fallback diamond when the flag is set.
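The codegen-side branch can be pictured with a toy emitter. This is a sketch only: `lower_class_field_set` is a hypothetical function, the emitted strings are illustrative pseudo-IR rather than valid LLVM IR, and the real lowering in property_set.rs works on an IR builder, not on strings.

```rust
/// Emit illustrative pseudo-IR lines for one class-field SET site.
/// `full_outline` stands in for the per-module flag.
fn lower_class_field_set(full_outline: bool, site: u64) -> Vec<String> {
    if full_outline {
        // Outlined mode: one call, then return early.
        return vec![format!("call void @js_class_field_set_ic(i64 {site}, ...)")];
    }
    // Inline mode: guard call, conditional branch, and three blocks.
    vec![
        format!("%ok = call i1 @js_typed_feedback_class_field_set_guard(i64 {site}, ...)"),
        "br i1 %ok, label %class_field_set.fast, label %class_field_set.fallback".to_string(),
        "class_field_set.fast:".to_string(),
        "  store double %val, ptr %slot".to_string(),
        "  br label %class_field_set.merge".to_string(),
        "class_field_set.fallback:".to_string(),
        "  call void @js_class_field_set_fallback(...)".to_string(),
        "  br label %class_field_set.merge".to_string(),
        "class_field_set.merge:".to_string(),
    ]
}
```

Even in this reduced form the inline arm is several times longer than the outlined one, which is the per-site saving the gate is trading a call frame for.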

Changes

Full-outline IC for class-field SET

| Layer / File(s) | Summary |
|---|---|
| Thread-local flag, callable-count estimator, and decision function: crates/perry-codegen/src/codegen/helpers.rs | Adds FULL_OUTLINE_IC thread-local with full_outline_ic_enabled/set_full_outline_ic accessors, module_callable_count to sum top-level and per-class callables, and decide_full_outline_ic resolving the gate from PERRY_FULL_OUTLINE_IC / PERRY_FULL_OUTLINE_IC_MIN_FUNCS env vars (default threshold 4000). |
| Runtime js_class_field_set_ic helper and linker retention: crates/perry-runtime/src/typed_feedback/guards.rs | Implements the outlined IC entrypoint: runs js_typed_feedback_class_field_set_guard; on success performs a raw f64 slot write or calls js_object_set_field with write barrier; on failure delegates to js_class_field_set_fallback. Adds a #[used] static to retain the symbol through LTO. |
| Codegen wiring (declaration, module init, and property-set outlined branch): crates/perry-codegen/src/runtime_decls/objects.rs, crates/perry-codegen/src/codegen/mod.rs, crates/perry-codegen/src/expr/property_set.rs | Declares js_class_field_set_ic in declare_phase_b_objects; re-exports new helpers and calls set_full_outline_ic(decide_full_outline_ic(module_callable_count(hir))) at the top of compile_module; adds a full_outline_ic_enabled() branch in property_set.rs that emits a single outlined call and returns early, bypassing the inline diamond. |
| Test coverage (env-var pinning and new outlined-mode assertions): crates/perry-codegen/tests/typed_feedback.rs | Introduces ENV_LOCK + EnvVarGuard to serialize env mutations; pins PERRY_FULL_OUTLINE_IC=0 in existing specialization tests; adds full_outline_ic_collapses_class_field_set_to_single_call (forced on/off toggle) and full_outline_ic_auto_gate_counts_class_methods (validates callable-count auto-gating). |
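The ENV_LOCK + EnvVarGuard pattern named in the test-coverage entry can be sketched as an RAII guard. The names come from the summary, but the fields and method bodies here are assumptions, not the repository's implementation. The guard takes a global lock, sets the variable, and restores the previous value on drop.

```rust
use std::env;
use std::ffi::OsString;
use std::sync::{Mutex, MutexGuard};

/// Global lock so tests that mutate the environment run one at a time.
static ENV_LOCK: Mutex<()> = Mutex::new(());

/// Sets an env var for the guard's lifetime and restores it on drop.
struct EnvVarGuard {
    key: &'static str,
    previous: Option<OsString>,
    // Declared last: released only after `drop` has restored the variable.
    _lock: MutexGuard<'static, ()>,
}

impl EnvVarGuard {
    #[allow(unused_unsafe)] // `set_var` is only an `unsafe fn` from edition 2024 on
    fn set(key: &'static str, value: &str) -> Self {
        // A poisoned lock still serializes access, so recover the guard.
        let lock = ENV_LOCK.lock().unwrap_or_else(|e| e.into_inner());
        let previous = env::var_os(key);
        unsafe { env::set_var(key, value) };
        EnvVarGuard { key, previous, _lock: lock }
    }
}

impl Drop for EnvVarGuard {
    #[allow(unused_unsafe)]
    fn drop(&mut self) {
        match self.previous.take() {
            Some(v) => unsafe { env::set_var(self.key, v) },
            None => unsafe { env::remove_var(self.key) },
        }
    }
}
```

A test would hold `let _g = EnvVarGuard::set("PERRY_FULL_OUTLINE_IC", "0");` for its whole body; a second guard created while the first is alive on the same thread would deadlock, so this sketch assumes one guard per test.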

Sequence Diagram(s)

sequenceDiagram
  participant CompileModule as compile_module
  participant Helpers as helpers.rs
  participant PropertySet as property_set.rs
  participant RuntimeDecls as declare_phase_b_objects
  participant Runtime as js_class_field_set_ic

  CompileModule->>Helpers: module_callable_count(hir)
  Helpers-->>CompileModule: callable_count
  CompileModule->>Helpers: decide_full_outline_ic(callable_count)
  Helpers-->>CompileModule: enabled bool
  CompileModule->>Helpers: set_full_outline_ic(enabled)
  CompileModule->>RuntimeDecls: declare js_class_field_set_ic
  CompileModule->>PropertySet: lower class-field SET site
  PropertySet->>Helpers: full_outline_ic_enabled()
  alt enabled
    PropertySet->>Runtime: emit js_class_field_set_ic call
  else disabled
    PropertySet->>PropertySet: emit guard/fast/fallback diamond
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • PerryTS/perry#5198: Both PRs modify the class-field SET typed-feedback fast path in property_set.rs; this PR adds the full_outline_ic_enabled() outlined branch that bypasses the inline shape-guard path implemented in that PR.
  • PerryTS/perry#5351: The new js_class_field_set_ic outlined path delegates guard failures to js_class_field_set_fallback, which was introduced/outlined in that PR, with both touching the same class-field-SET codegen and runtime helpers.

Poem

🐇 Hop hop, the diamond's gone!
One outlined call from dusk to dawn,
When callables grow too large to bear,
A single IC leaps through the air.
No guard, no fast, no fallback chain —
The rabbit's IR stays lean and sane! 🌟

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: implementing a full-outline class-field IC diamond optimization for oversized modules, with a relevant reference to the broader roadmap (lever B of #5334). |
| Description check | ✅ Passed | The description comprehensively covers all required template sections: a clear 'What' summary, detailed explanation of the problem and solution, gating mechanism, trade-offs, and verification steps with test results. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

…oversized

# Conflicts:
#	crates/perry-codegen/tests/typed_feedback.rs

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/perry-codegen/src/codegen/helpers.rs`:
- Around line 139-148: The class_callables count calculation omits
computed_members from the denominator, but since computed_members are
lowered/registered like class callables in compile_module, they must be
included. Add the length of c.computed_members to the sum being calculated in
the map closure that processes each class c, placing it alongside the other
member counts (constructor, methods, static_methods, getters, setters).
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 9f6c0907-481c-4933-b8f0-3233867fb81a

📥 Commits

Reviewing files that changed from the base of the PR and between 2b30f9f and 3b17b36.

📒 Files selected for processing (6)
  • crates/perry-codegen/src/codegen/helpers.rs
  • crates/perry-codegen/src/codegen/mod.rs
  • crates/perry-codegen/src/expr/property_set.rs
  • crates/perry-codegen/src/runtime_decls/objects.rs
  • crates/perry-codegen/tests/typed_feedback.rs
  • crates/perry-runtime/src/typed_feedback/guards.rs

Comment thread crates/perry-codegen/src/codegen/helpers.rs
Ralph Küpper added 2 commits June 18, 2026 09:46
…minator

ClassComputedMember holds a Function that compile_module lowers like any
method (emits an LLVM function), so computed members must count toward the
oversized-module gate alongside methods/static_methods/getters/setters.
A class with many computed-key methods could otherwise stay under
PERRY_FULL_OUTLINE_IC_MIN_FUNCS and keep the inline diamond when the auto
gate should fire.
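
The resulting denominator can be sketched as follows. The field names mirror those listed in the review comment (constructor, methods, static_methods, getters, setters, computed_members), but the struct shapes are hypothetical simplifications: the real HIR types carry far more data, and modelling `constructor` as an `Option` and computed members as bare functions is an assumption.

```rust
/// Placeholder for an HIR function body (hypothetical).
struct Function;

/// Simplified class: every entry below lowers to one LLVM function.
#[derive(Default)]
struct Class {
    constructor: Option<Function>,
    methods: Vec<Function>,
    static_methods: Vec<Function>,
    getters: Vec<Function>,
    setters: Vec<Function>,
    computed_members: Vec<Function>,
}

/// Simplified module: top-level functions plus classes.
struct Module {
    functions: Vec<Function>,
    classes: Vec<Class>,
}

/// Count top-level functions plus all class callables, computed members included.
fn module_callable_count(m: &Module) -> usize {
    let class_callables: usize = m
        .classes
        .iter()
        .map(|c| {
            usize::from(c.constructor.is_some())
                + c.methods.len()
                + c.static_methods.len()
                + c.getters.len()
                + c.setters.len()
                + c.computed_members.len()
        })
        .sum();
    m.functions.len() + class_callables
}
```

A module with one top-level function and a single class holding a constructor, two methods, and one computed member counts as five callables here, whereas `functions.len()` alone would report one.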
…w store

The full-outline IC helper's raw-f64 slot write is a barrier-free store the
GC store-site inventory requires to be annotated. A passing guard with
require_raw_f64 proves the slot is pointer-free (typed-shape descriptor) and
the value is a plain number — identical to the inline class_field_set.fast
raw-f64 store. Fixes the failing lint 'GC store-site inventory' step.