perf(hir): make large-bundle lowering linear (closure captures, registry lookups, fluent-chain re-lowering) #5284

Merged: proggeramlug merged 4 commits into main from perf/hir-large-bundle-lowering on Jun 17, 2026

proggeramlug (Contributor) commented on Jun 17, 2026

What

Three independent HIR-lowering performance fixes found while compiling a large (~13 MB) minified real-world ESM bundle, whose check-lower stage went from never finishing (>1500 s) to ~12 s. Each is a separate commit:

  1. perf(hir): O(1) closure-capture analysis. compute_closure_captures rebuilt an O(scope) membership set per closure, which is O(n²) for N closures in an N-binding scope. Lowering now maintains a live id_set on Locals (insert/remove/reindex) that is passed by reference, and shares the fn_ctor_env write-scan Shadow instead of cloning it per nested fn. A nested-closure micro-benchmark (cap_12000) drops 13.0 s → 0.07 s. Capture/mutable-capture semantics are unchanged (param / inner-decl / dayjs same-id filtering preserved).

  2. perf(hir): O(1) registry lookups. native_instances / module_native_instances / func_return_native_instances / native_modules / class_statics were Vec-scanned per call/member. They are now indexed by name (mirroring the existing imported_functions_index), preserving scope shadowing + truncation semantics exactly. Direct-lookup micro-bench at K=16000: 1033 ms → 0.54 ms.

  3. perf(hir): fix exponential re-lowering of native-fluent method chains. lower_call_inner's fall-through re-lowered a chained-native-method receiver after try_static_method_and_instance discarded it, giving 2^depth re-lowering on builder chains like x.a().b().c()… (the span-count trace showed 37M/18.5M/9.2M…, halving per level). The pre-lowered receiver is now memoized (span-keyed, single-shot, cleared on any other member lowering) and reused. This is the dominant fix: the bundle's perry check went from >1500 s (killed) to ~12 s.

Why grouped

All three touch overlapping lowering files (lowering_context.rs, context.rs, expr_call/mod.rs) and were found in one pass; splitting them into separate PRs would cause artificial cherry-pick conflicts. Each is a clean, self-contained commit if you prefer to review/merge individually.

Tests

cargo test -p perry-hir --tests green at each commit (321 → 323). Semantics-preserving throughout (these are pure performance fixes — no behavior change). The pre-existing machine-specific test_lower_rejects_deep_* / nested_object_literal_lowers_in_linear_time debug-build stack-overflow aborts are unrelated (identical on main).

Summary by CodeRabbit

  • Refactor / Performance

    • Improved native module, native instance, and class static registration/lookup with indexed registries and consistent scope truncation behavior.
    • Reduced redundant work in native fluent call lowering using span-aware receiver caching.
    • Streamlined closure-capture analysis with faster local membership tracking.
  • Debugging

    • Added optional, environment-gated relowering trace output to pinpoint excessive re-lowering.
  • Tests

    • Added coverage for native instance indexing, shadowing, truncation, and module-level behavior, plus an ignored performance benchmark.

coderabbitai Bot commented Jun 17, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 18c2dfff-0768-4617-81b8-3638a63d6d2d

📥 Commits

Reviewing files that changed from the base of the PR and between 0d52689 and 7b510b0.

📒 Files selected for processing (16)
  • crates/perry-hir/src/lower/context.rs
  • crates/perry-hir/src/lower/expr_assign.rs
  • crates/perry-hir/src/lower/expr_call/mod.rs
  • crates/perry-hir/src/lower/expr_call/static_and_instance.rs
  • crates/perry-hir/src/lower/expr_function.rs
  • crates/perry-hir/src/lower/expr_member.rs
  • crates/perry-hir/src/lower/fn_ctor_env.rs
  • crates/perry-hir/src/lower/locals.rs
  • crates/perry-hir/src/lower/lower_expr.rs
  • crates/perry-hir/src/lower/lowering_context.rs
  • crates/perry-hir/src/lower/module_decl.rs
  • crates/perry-hir/src/lower/stmt.rs
  • crates/perry-hir/src/lower/tests.rs
  • crates/perry-hir/src/lower_decl/body_stmt.rs
  • crates/perry-hir/src/lower_decl/body_stmt/nested_fn_decl.rs
  • crates/perry-hir/src/lower_decl/fn_decl.rs
🚧 Files skipped from review as they are similar to previous changes (15)
  • crates/perry-hir/src/lower_decl/body_stmt.rs
  • crates/perry-hir/src/lower_decl/body_stmt/nested_fn_decl.rs
  • crates/perry-hir/src/lower/tests.rs
  • crates/perry-hir/src/lower/stmt.rs
  • crates/perry-hir/src/lower/expr_member.rs
  • crates/perry-hir/src/lower/expr_call/static_and_instance.rs
  • crates/perry-hir/src/lower/expr_assign.rs
  • crates/perry-hir/src/lower/locals.rs
  • crates/perry-hir/src/lower/lower_expr.rs
  • crates/perry-hir/src/lower/context.rs
  • crates/perry-hir/src/lower_decl/fn_decl.rs
  • crates/perry-hir/src/lower/module_decl.rs
  • crates/perry-hir/src/lower/expr_call/mod.rs
  • crates/perry-hir/src/lower/lowering_context.rs
  • crates/perry-hir/src/lower/fn_ctor_env.rs

Walkthrough

This PR optimizes the perry-hir lowering phase by replacing O(n) linear scans across five native-instance/module registries in LoweringContext with O(1) HashMap indexes, caching pre-lowered member receivers to prevent exponential re-lowering on native fluent chains, maintaining an incremental HashSet<LocalId> in Locals to avoid rebuilding membership sets per closure, and threading mutable Shadow through fn_ctor_env scan helpers to eliminate cloning. A diagnostic relower_trace module is also added.

Changes

HIR Lowering Performance Optimizations

| Layer / File(s) | Summary |
|---|---|
| LoweringContext index fields and Locals.id_set data contracts (crates/perry-hir/src/lower/lowering_context.rs, crates/perry-hir/src/lower/locals.rs) | LoweringContext gains six new fields: five HashMap-based index structures for native-instance/module/class-statics registries and a prelowered_member_receiver memo slot. Locals gains an id_set: HashSet<LocalId> field with id_set() accessor, kept synchronized by push, drain_from, and reindex. |
| Indexed registry implementation and index initialization (crates/perry-hir/src/lower/context.rs, crates/perry-hir/src/lower/tests.rs) | context.rs initializes all six index fields in with_class_id_start and implements indexed register/lookup/truncate for all five registry types with defined shadowing semantics (shadow-stack for scoped instances, first-match for modules/class-statics/func-returns, last-match for module-level instances). Adds truncate_native_instances, push_func_return_native_instance, push_module_native_instance helpers. Delegates exit_scope to truncate_native_instances. Tests cover scope shadowing, truncation restore, module last-wins, and flat lookup cost. |
| Registry helper call site migration (crates/perry-hir/src/lower/expr_assign.rs, crates/perry-hir/src/lower/module_decl.rs, crates/perry-hir/src/lower/stmt.rs, crates/perry-hir/src/lower_decl/body_stmt.rs, crates/perry-hir/src/lower_decl/fn_decl.rs) | All direct .push() call sites across multiple files are migrated to ctx.push_func_return_native_instance(...) for function returns and ctx.push_module_native_instance(...) for module-level instances, spanning exported-function declarations, arrow functions with native return type inference, default exports, native factory/member-call initialization, and variable-to-variable instance re-registration. |
| Fluent-chain receiver caching (crates/perry-hir/src/lower/expr_call/static_and_instance.rs, crates/perry-hir/src/lower/expr_member.rs, crates/perry-hir/src/lower/expr_call/mod.rs) | In the native fluent-chain dispatch path, the receiver is lowered once and its span is captured; on a no-match fallback the lowered receiver is stashed in ctx.prelowered_member_receiver keyed by (lo, hi). lower_member_inner checks the cache before calling lower_expr on the receiver. lower_call_inner clears the cache at entry to prevent stale memo leakage. |
| Closure capture via Locals.id_set (crates/perry-hir/src/lower/expr_function.rs, crates/perry-hir/src/lower_decl/body_stmt/nested_fn_decl.rs) | compute_closure_captures signature changed to accept &HashSet<LocalId> directly. lower_arrow, lower_fn_expr_anon, and lower_named_fn_expr remove pre-scope outer_locals Vec snapshots and instead pass ctx.locals.id_set() (the live enclosing-scope membership set) after popping the closure scope. lower_nested_fn_decl is updated analogously. |
| Mutable Shadow threading in fn_ctor_env (crates/perry-hir/src/lower/fn_ctor_env.rs) | All scan-helper signatures (scan_stmt, record_pat_bindings, scan_for_head_writes, scan_assign_target_writes, scan_fn_body_writes, scan_class_writes, scan_expr_writes) changed from &Shadow to &mut Shadow. scan_fn_body_writes and arrow scanning use insert/remove push-pop over the shared shadow instead of cloning. |
| PERRY_TRACE_RELOWER diagnostic tracing (crates/perry-hir/src/lower/lower_expr.rs) | Adds pub(crate) mod relower_trace with enabled() and record(lo, hi) functions, gated by the PERRY_TRACE_RELOWER env var. Uses atomics and thread-local HashMap storage to count per-span lower_expr invocations and dump hot spans to stderr periodically. |
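The "Mutable Shadow threading" row can be illustrated with a minimal sketch. It assumes Shadow is a set of names hidden by inner declarations; `FnNode`, its fields, and `scan_fn_writes` are hypothetical stand-ins, not the actual fn_ctor_env types. The point is only the push-pop discipline: a nested function inserts its own parameters into the shared set, scans, and then removes exactly the names it added, so no per-function clone is needed.

```rust
use std::collections::HashSet;

/// Assumption: the real `Shadow` is modelled here as a set of shadowed names.
pub type Shadow = HashSet<String>;

/// Hypothetical, heavily simplified function node.
pub struct FnNode {
    pub params: Vec<String>,
    pub writes: Vec<String>,
    pub nested: Vec<FnNode>,
}

/// Collect writes that reach outer bindings, sharing one `Shadow` across nesting levels.
pub fn scan_fn_writes(f: &FnNode, shadow: &mut Shadow, out: &mut Vec<String>) {
    // Push: remember only the names this call actually inserted,
    // so names already shadowed by an outer level are left in place on pop.
    let added: Vec<&String> = f
        .params
        .iter()
        .filter(|p| shadow.insert((*p).clone()))
        .collect();

    for w in &f.writes {
        if !shadow.contains(w) {
            out.push(w.clone());
        }
    }
    for n in &f.nested {
        scan_fn_writes(n, shadow, out);
    }

    // Pop: restore the set to its state on entry.
    for p in added {
        shadow.remove(p);
    }
}
```

Compared with cloning the set for every nested function, the cost per function is proportional to its own parameter count rather than to the size of the enclosing shadow set.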

Possibly related PRs

  • PerryTS/perry#5270: This PR extends the indexed Locals container from #5270 by adding id_set/id_set() for closure-capture membership tests and refactoring compute_closure_captures to use it directly.

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 Hop, hop through the lowering pass,
No more O(n) scans in the grass!
HashMap indexes, quick as a wink,
Fluent chains cached before you can blink.
HashSets for closures, shadows pushed clean —
The fluffiest compiler you've ever seen! 🌿

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the three main performance fixes: closure captures (O(1)), registry lookups (O(1)), and fluent-chain re-lowering optimization. |
| Description check | ✅ Passed | The description includes all required sections: What (detailed), Why (rationale), Tests (results), with test checklist items checked and proper formatting. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
crates/perry-hir/src/lower_decl/body_stmt.rs (1)

1886-1889: ⚡ Quick win

Use the function-return index for the early-exit check.

Line 1851 now updates the helper-backed index, but this recursive stop check still scans the backing Vec after each statement. Use the indexed lookup to keep this path aligned with the O(1) registry goal.

♻️ Proposed change
-        if ctx
-            .func_return_native_instances
-            .iter()
-            .any(|(n, _, _)| n == func_name)
+        if ctx.lookup_func_return_native_instance(func_name).is_some()
         {
             return;
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/perry-hir/src/lower_decl/body_stmt.rs` around lines 1886 - 1889, The
recursive stop check at lines 1886-1889 performs a linear scan of the
func_return_native_instances vector using iter().any() to check if func_name
exists, which is O(n) complexity. Since line 1851 now maintains a helper-backed
index for this lookup, refactor the early-exit check to use the indexed lookup
instead of scanning the Vec. This will keep the performance characteristic
aligned with the O(1) registry goal and ensure consistency throughout the code
path.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 91fe645d-057c-4e72-8f47-459f66d7dd8e

📥 Commits

Reviewing files that changed from the base of the PR and between d46feff and 11731e5.


Ralph Küpper added 4 commits June 16, 2026 21:58
…d write-scan shadow

compute_closure_captures rebuilt an O(scope) membership set per closure
(O(n^2) for N closures in an N-binding scope). Maintain a live id_set on
Locals (insert/remove/reindex) and pass it by reference; share the
fn_ctor_env write-scan Shadow instead of cloning per nested fn.
cap_12000 check: 13.06s -> 0.07s. Captures/mutable-captures semantics
unchanged (param/inner-decl/dayjs same-id filtering preserved).
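As a rough illustration of the live id_set idea in this commit message, here is a minimal sketch. The `Locals` layout, the `LocalId` alias, and `compute_captures` are simplified assumptions rather than the real perry-hir definitions, and the sketch assumes every binding has a unique id. The membership set is updated together with the Vec on push and on scope exit, so capture analysis can borrow it instead of rebuilding it for each closure.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the real `LocalId` type.
pub type LocalId = u32;

/// Simplified model: scope-ordered bindings plus a live membership set.
#[derive(Default)]
pub struct Locals {
    entries: Vec<(String, LocalId)>,
    id_set: HashSet<LocalId>,
}

impl Locals {
    /// Declare a binding; the set is updated in the same step as the Vec.
    pub fn push(&mut self, name: &str, id: LocalId) {
        self.entries.push((name.to_string(), id));
        self.id_set.insert(id);
    }

    /// Current length, usable as a scope mark.
    pub fn len(&self) -> usize {
        self.entries.len()
    }

    /// Scope exit: drop bindings declared at or after `mark` (assumes unique ids).
    pub fn drain_from(&mut self, mark: usize) {
        for (_, id) in self.entries.drain(mark..) {
            self.id_set.remove(&id);
        }
    }

    /// Borrow the live set; nothing is rebuilt per closure.
    pub fn id_set(&self) -> &HashSet<LocalId> {
        &self.id_set
    }
}

/// Referenced ids that belong to the enclosing scope, deduplicated in first-use order.
pub fn compute_captures(referenced: &[LocalId], outer: &HashSet<LocalId>) -> Vec<LocalId> {
    let mut seen = HashSet::new();
    referenced
        .iter()
        .copied()
        .filter(|id| outer.contains(id) && seen.insert(*id))
        .collect()
}
```

With this shape, each closure costs time proportional to the ids it references, not to the size of the enclosing scope.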
…tics by name

Several per-call/per-member HIR resolution helpers did linear scans over `Vec`
registries. For a program with K classes / native bindings and M call+member
expressions, every lookup walked the whole registry — including the common
miss case (receiver not registered) which scanned to the end and returned
`None`. That is O(M*K), quadratic on large bundles, stalling check-lower.

`lookup_class` was already indexed (classes_index, #5267). This indexes the
remaining Vec-scanned lookups, mirroring the proven imported_functions_index
pattern, while preserving identical Option/tuple results:

- native_instances (scope-stack-like: pushed on scope entry, truncated on
  exit): name -> Vec<usize> shadow stack, innermost (last) on top. lookup
  reads the top index (== old `.rev().find()` last-match-wins). New
  `truncate_native_instances(mark)` pops indices >= mark off each name's stack
  (and the two prior direct `.truncate()` sites now call it), so an inner
  binding stops shadowing the moment its scope pops — same shadowing as before.
- module_native_instances (module-level, push-only): name -> usize, overwritten
  on each push (last-match-wins, matching the reverse-scan fallback arm).
- func_return_native_instances + native_modules + class_statics (push-only):
  name -> usize keeping the FIRST entry (`or_insert`), matching the old forward
  `.iter().find()` first-match-wins. has_static_method/has_static_field and
  lookup_native_module/lookup_func_return_native_instance now O(1).
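The shadow-stack index described for native_instances might look roughly like the sketch below. The struct, its field names, and the `(name, module)` payload are illustrative assumptions; only the technique (a per-name stack of Vec positions, with truncation popping positions at or above the mark) comes from the description above.

```rust
use std::collections::HashMap;

/// Simplified registry: a Vec of (name, module) plus a per-name stack of Vec positions.
#[derive(Default)]
pub struct NativeInstances {
    entries: Vec<(String, String)>,
    index: HashMap<String, Vec<usize>>,
}

impl NativeInstances {
    /// Register a binding; the newest position goes on top of the name's stack.
    pub fn register(&mut self, name: &str, module: &str) {
        let pos = self.entries.len();
        self.entries.push((name.to_string(), module.to_string()));
        self.index.entry(name.to_string()).or_default().push(pos);
    }

    /// Scope mark: the registry length at scope entry.
    pub fn mark(&self) -> usize {
        self.entries.len()
    }

    /// Innermost binding wins: read the top of the stack (same result as a reverse scan).
    pub fn lookup(&self, name: &str) -> Option<&str> {
        let pos = *self.index.get(name)?.last()?;
        Some(self.entries[pos].1.as_str())
    }

    /// Scope exit: remove entries at positions >= mark and pop them off their stacks.
    pub fn truncate(&mut self, mark: usize) {
        if mark >= self.entries.len() {
            return;
        }
        for (name, _) in self.entries.drain(mark..) {
            // Positions on each stack are increasing, so one pop per drained
            // entry removes exactly the positions >= mark.
            let emptied = match self.index.get_mut(&name) {
                Some(stack) => {
                    stack.pop();
                    stack.is_empty()
                }
                None => false,
            };
            if emptied {
                self.index.remove(&name);
            }
        }
    }
}
```

A miss is a single hash probe, which is the case the commit message singles out as previously scanning the whole registry.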

Push sites for module_native_instances / func_return_native_instances routed
through new register helpers so the index stays in sync.
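The difference between the first-match and last-match indexes comes down to how the register helper writes the map entry. The following sketch uses hypothetical field and method names (the real helpers are push_func_return_native_instance and push_module_native_instance, whose signatures are not shown here) and assumes a simple `(name, module)` payload.

```rust
use std::collections::HashMap;

/// Simplified push-only registries with name -> position indexes.
#[derive(Default)]
pub struct PushOnlyRegistries {
    func_returns: Vec<(String, String)>,
    func_returns_index: HashMap<String, usize>,
    module_instances: Vec<(String, String)>,
    module_instances_index: HashMap<String, usize>,
}

impl PushOnlyRegistries {
    /// First entry wins: `or_insert` keeps an existing position (like a forward `find`).
    pub fn push_func_return(&mut self, name: &str, module: &str) {
        let pos = self.func_returns.len();
        self.func_returns.push((name.to_string(), module.to_string()));
        self.func_returns_index.entry(name.to_string()).or_insert(pos);
    }

    /// Last entry wins: `insert` overwrites the position (like a reverse scan).
    pub fn push_module_instance(&mut self, name: &str, module: &str) {
        let pos = self.module_instances.len();
        self.module_instances.push((name.to_string(), module.to_string()));
        self.module_instances_index.insert(name.to_string(), pos);
    }

    pub fn lookup_func_return(&self, name: &str) -> Option<&str> {
        self.func_returns_index
            .get(name)
            .map(|&i| self.func_returns[i].1.as_str())
    }

    pub fn lookup_module_instance(&self, name: &str) -> Option<&str> {
        self.module_instances_index
            .get(name)
            .map(|&i| self.module_instances[i].1.as_str())
    }
}
```

Routing every push through a helper like these is what keeps the Vec and its index from drifting apart; a direct `.push()` on the Vec would silently bypass the index.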

Micro-bench (20000 x3 miss lookups vs K-sized registry, release):
  K=2000  baseline 82ms  -> fixed 0.53ms
  K=8000  baseline 335ms -> fixed 0.51ms
  K=16000 baseline 1033ms -> fixed 0.54ms  (~1900x at K=16000; flat in K)

Adds unit tests for native-instance shadowing+truncation and module-level
last-wins, plus an #[ignore] perf gate. Builds on the closure-capture perf fix.

A 13 MB minified ESM bundle (a commander-based CLI) made `perry check`
stall in the HIR `check-lower` stage forever (>1500 s, never finishing).
Instrumenting `lower_expr` (env-gated `PERRY_TRACE_RELOWER`, counting
lowerings per source span) showed a single ~360-byte commander builder
chain — `K.name(..).description(..).argument(..).helpOption(..)
.option(..).addOption(..)…` — whose receiver subtrees were lowered
EXPONENTIALLY: span counts of 37M / 18.5M / 9.2M / 4.6M / 2.3M, halving
once per nesting level (a clean 2^depth signature).

Root cause: the chained-native-method dispatch helper
`try_static_method_and_instance` (expr_call/static_and_instance.rs).
`may_lower_to_native_method_call` over-approximates to `true` whenever the
chain root is a native instance/module ident (here `K`, tagged commander
via `new Command()`), so the helper SPECULATIVELY lowers the whole
receiver prefix to inspect whether it produced a `NativeMethodCall` of a
recognized fluent module. When the inner call instead lowers to a generic
`Call` (or the outer method isn't one of the recognized fluent methods —
`hook`/`helpOption`/`addOption`…), every fluent arm misses, the lowered
receiver is discarded, and the helper returns `Err(args)`. The
`lower_call_inner` fall-through tail then RE-lowers the same member callee
(and thus the whole prefix) via `lower_member_inner`. Two full recursive
descents into the prefix per chain level ⇒ 2^depth work.
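The 2^depth growth can be reproduced with a toy counting model. This is not the real lowering code: `lower_chain` is a hypothetical function that assumes every chain level misses all fluent arms, so each level lowers its receiver once speculatively and once more on the fall-through path.

```rust
/// Toy model: count how many times nodes of a method chain are lowered.
/// `reuse_receiver = false` models the old behaviour (speculative lowering
/// plus a fall-through re-lowering); `true` models lowering each receiver once.
pub fn lower_chain(depth: u32, reuse_receiver: bool, count: &mut u64) {
    *count += 1; // this call expression is lowered
    if depth == 0 {
        return; // chain root, no receiver below it
    }
    // Speculative lowering of the receiver prefix.
    lower_chain(depth - 1, reuse_receiver, count);
    if !reuse_receiver {
        // Fall-through tail lowers the same receiver prefix again.
        lower_chain(depth - 1, reuse_receiver, count);
    }
}

/// Convenience wrapper returning the total number of lowerings.
pub fn total_lowerings(depth: u32, reuse_receiver: bool) -> u64 {
    let mut count = 0;
    lower_chain(depth, reuse_receiver, &mut count);
    count
}
```

In this model the total follows T(d) = 2·T(d-1) + 1 = 2^(d+1) - 1 without reuse and d + 1 with reuse. Each receiver one level deeper is lowered twice as often as its parent, which matches the halving-per-level pattern seen in the span counts.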

Fix: lower each receiver exactly once. When the helper lowers `member.obj`
and no fluent arm consumes it, stash it in
`LoweringContext::prelowered_member_receiver` keyed by the receiver's
source span; `lower_member_inner` (the tail's receiver-lowering site)
takes it back when re-lowering the same span instead of redoing the work.
The memo is single-shot and span-keyed, any member lowering clears a stale
entry, and `lower_call_inner` resets it as a safety net — so it can never
leak onto a different receiver. Reuse is semantics-preserving: lowering a
receiver is idempotent in the value it produces, and the fluent-success
arms already reuse that very `object_expr`.
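A minimal sketch of a span-keyed, single-shot memo slot, under stated assumptions: `Ctx`, the `Span` alias, the String stand-in for a lowered expression, and the method names are all hypothetical, and the real `LoweringContext::prelowered_member_receiver` field may be shaped differently. The key property is that `Option::take` empties the slot on every receiver lowering, so a stashed value is either consumed by the matching span or dropped.

```rust
/// Assumption: a span is modelled as a (lo, hi) byte range.
pub type Span = (u32, u32);

/// Hypothetical, simplified lowering context.
#[derive(Default)]
pub struct Ctx {
    prelowered: Option<(Span, String)>,
    pub lower_calls: usize,
}

impl Ctx {
    /// Stand-in for the expensive recursive lowering.
    pub fn lower_expr(&mut self, src: &str) -> String {
        self.lower_calls += 1;
        format!("hir({})", src)
    }

    /// Called when a speculative receiver lowering was not consumed.
    pub fn stash(&mut self, span: Span, lowered: String) {
        self.prelowered = Some((span, lowered));
    }

    /// Receiver-lowering site: reuse the stashed value only for the same span.
    /// `take()` clears the slot either way, so a stale entry cannot leak.
    pub fn lower_receiver(&mut self, span: Span, src: &str) -> String {
        match self.prelowered.take() {
            Some((s, lowered)) if s == span => lowered,
            _ => self.lower_expr(src),
        }
    }
}
```

A one-element slot is enough here because the stash and the reuse happen back to back for the same call expression; a general cache keyed by span would have to reason about invalidation, which the single-shot design avoids.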

Results:
- Real bundle: `perry check /tmp/cli.ts` >1500 s (never finishes) → 11.9 s,
  prints "All checks passed! - 2 file(s) checked".
- Minimal synthetic (commander chain mixing recognized/unrecognized
  methods), before: N=12 0.07s, N=14 0.52s, N=16 4.0s, N=18 16.3s, N≥20
  >30 s timeout (exponential). After: N=20 0.01s, N=500 0.5s — the
  exponential re-lowering is gone (no span lowered more than ~once;
  `PERRY_TRACE_RELOWER` never trips its 5M-call dump even at N=2000).
- `cargo test -p perry-hir --tests`: 323 passed, 0 failures (excluding the
  4 pre-existing machine-specific debug-build stack-overflow tests
  test_lower_rejects_deep_* / nested_object_literal_lowers_in_linear_time,
  confirmed identical on HEAD).

The `PERRY_TRACE_RELOWER` counter is left in place, fully env-gated and
zero-cost when unset, as a standing diagnostic for future lowering perf
work.
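An env-gated per-span counter of this kind could be sketched as follows. Only the PERRY_TRACE_RELOWER variable name, the `enabled()` / `record(lo, hi)` function names, and the use of atomics plus thread-local HashMap storage come from the PR text; `count_span`, `hot_spans`, the dump format, and the `DUMP_EVERY` interval (chosen to echo the "5M-call dump" mentioned above) are assumptions.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::OnceLock;

static TOTAL: AtomicU64 = AtomicU64::new(0);
/// Assumed dump interval.
const DUMP_EVERY: u64 = 5_000_000;

thread_local! {
    static COUNTS: RefCell<HashMap<(u32, u32), u64>> = RefCell::new(HashMap::new());
}

/// The env var is read once; later calls are a cached bool check.
pub fn enabled() -> bool {
    static ENABLED: OnceLock<bool> = OnceLock::new();
    *ENABLED.get_or_init(|| std::env::var_os("PERRY_TRACE_RELOWER").is_some())
}

/// Entry point meant to be called from the lowering function.
pub fn record(lo: u32, hi: u32) {
    if enabled() {
        count_span(lo, hi);
    }
}

/// Hypothetical helper: bump the per-span count and dump periodically.
pub fn count_span(lo: u32, hi: u32) {
    COUNTS.with(|c| {
        *c.borrow_mut().entry((lo, hi)).or_insert(0) += 1;
    });
    let total = TOTAL.fetch_add(1, Ordering::Relaxed) + 1;
    if total % DUMP_EVERY == 0 {
        for ((lo, hi), n) in hot_spans(5) {
            eprintln!("relower: span {}..{} lowered {} times", lo, hi, n);
        }
    }
}

/// Hypothetical helper: the `top` most frequently lowered spans on this thread.
pub fn hot_spans(top: usize) -> Vec<((u32, u32), u64)> {
    COUNTS.with(|c| {
        let mut v: Vec<_> = c.borrow().iter().map(|(k, n)| (*k, *n)).collect();
        v.sort_by(|a, b| b.1.cmp(&a.1));
        v.truncate(top);
        v
    })
}
```

A span whose count keeps growing far beyond one is the signal to look for, as with the builder chain described earlier.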
proggeramlug force-pushed the perf/hir-large-bundle-lowering branch from 0d52689 to 7b510b0 on June 17, 2026 05:00
proggeramlug merged commit 4515910 into main on Jun 17, 2026
15 checks passed
proggeramlug deleted the perf/hir-large-bundle-lowering branch on June 17, 2026 06:59