perf(hir): make large-bundle lowering linear (closure captures, registry lookups, fluent-chain re-lowering) by proggeramlug · Pull Request #5284 · PerryTS/perry

proggeramlug · 2026-06-17T02:36:40Z

What

Three independent HIR-lowering performance fixes found while compiling a large (~13 MB) minified real-world ESM bundle, whose check-lower stage went from never finishing (>1500 s) to ~12 s. Each is a separate commit:

perf(hir): O(1) closure-capture analysis — compute_closure_captures rebuilt an O(scope) membership set per closure → O(n²) for N closures in an N-binding scope. Now maintains a live id_set on Locals (insert/remove/reindex) passed by reference, and shares the fn_ctor_env write-scan Shadow instead of cloning per nested fn. A nested-closure micro-benchmark (cap_12000) drops 13.0 s → 0.07 s. Capture/mutable-capture semantics unchanged (param / inner-decl / dayjs same-id filtering preserved).
perf(hir): O(1) registry lookups — native_instances / module_native_instances / func_return_native_instances / native_modules / class_statics were Vec-scanned per call/member; indexed them by name (mirroring the existing imported_functions_index), preserving scope shadowing + truncation semantics exactly. Direct-lookup micro-bench at K=16000: 1033 ms → 0.54 ms.
perf(hir): fix exponential re-lowering of native-fluent method chains — lower_call_inner's fall-through re-lowered a chained-native-method receiver after try_static_method_and_instance discarded it, giving 2^depth re-lowering on builder chains like x.a().b().c()… (span-count trace showed 37M/18.5M/9.2M… halving per level). Now memoizes the pre-lowered receiver (span-keyed, single-shot, cleared on any other member lowering) and reuses it. This is the dominant fix: the bundle's perry check went >1500 s (killed) → ~12 s.
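The halving span counts quoted above are what a simple doubling recurrence predicts. As a rough sketch, assume each chain level triggers exactly two full descents into its receiver prefix and a constant amount of other work; the symbols $T$, $T'$, $c$, and $d$ are illustrative and do not appear in the PR:

```latex
% Before the fix: two descents into the prefix per chain level.
T(d) = 2\,T(d-1) + c, \qquad T(0) = c
\;\Longrightarrow\; T(d) = c\,\bigl(2^{d+1} - 1\bigr) = \Theta\!\bigl(2^{d}\bigr)

% After the fix: the memo removes the second descent.
T'(d) = T'(d-1) + c, \qquad T'(0) = c
\;\Longrightarrow\; T'(d) = c\,(d + 1) = \Theta(d)
```

Under the same assumption, the prefix sitting $k$ levels below the outermost call is lowered $2^{k}$ times, so adjacent levels differ by a factor of two, which matches the 37M / 18.5M / 9.2M trace.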

Why grouped

All three touch overlapping lowering files (lowering_context.rs, context.rs, expr_call/mod.rs) and were found in one pass; splitting them causes artificial cherry-pick conflicts. Each is a clean, self-contained commit if you prefer to review/merge individually.

Tests

cargo test -p perry-hir --tests green at each commit (321 → 323). Semantics-preserving throughout (these are pure performance fixes — no behavior change). The pre-existing machine-specific test_lower_rejects_deep_* / nested_object_literal_lowers_in_linear_time debug-build stack-overflow aborts are unrelated (identical on main).

Summary by CodeRabbit

Refactor / Performance
- Improved native module, native instance, and class static registration/lookup with indexed registries and consistent scope truncation behavior.
- Reduced redundant work in native fluent call lowering using span-aware receiver caching.
- Streamlined closure-capture analysis with faster local membership tracking.
Debugging
- Added optional, environment-gated relowering trace output to pinpoint excessive re-lowering.
Tests
- Added coverage for native instance indexing, shadowing, truncation, and module-level behavior, plus an ignored performance benchmark.

coderabbitai · 2026-06-17T02:36:59Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 18c2dfff-0768-4617-81b8-3638a63d6d2d

📥 Commits

Reviewing files that changed from the base of the PR and between 0d52689 and 7b510b0.

📒 Files selected for processing (16)

crates/perry-hir/src/lower/context.rs
crates/perry-hir/src/lower/expr_assign.rs
crates/perry-hir/src/lower/expr_call/mod.rs
crates/perry-hir/src/lower/expr_call/static_and_instance.rs
crates/perry-hir/src/lower/expr_function.rs
crates/perry-hir/src/lower/expr_member.rs
crates/perry-hir/src/lower/fn_ctor_env.rs
crates/perry-hir/src/lower/locals.rs
crates/perry-hir/src/lower/lower_expr.rs
crates/perry-hir/src/lower/lowering_context.rs
crates/perry-hir/src/lower/module_decl.rs
crates/perry-hir/src/lower/stmt.rs
crates/perry-hir/src/lower/tests.rs
crates/perry-hir/src/lower_decl/body_stmt.rs
crates/perry-hir/src/lower_decl/body_stmt/nested_fn_decl.rs
crates/perry-hir/src/lower_decl/fn_decl.rs

🚧 Files skipped from review as they are similar to previous changes (15)

crates/perry-hir/src/lower_decl/body_stmt.rs
crates/perry-hir/src/lower_decl/body_stmt/nested_fn_decl.rs
crates/perry-hir/src/lower/tests.rs
crates/perry-hir/src/lower/stmt.rs
crates/perry-hir/src/lower/expr_member.rs
crates/perry-hir/src/lower/expr_call/static_and_instance.rs
crates/perry-hir/src/lower/expr_assign.rs
crates/perry-hir/src/lower/locals.rs
crates/perry-hir/src/lower/lower_expr.rs
crates/perry-hir/src/lower/context.rs
crates/perry-hir/src/lower_decl/fn_decl.rs
crates/perry-hir/src/lower/module_decl.rs
crates/perry-hir/src/lower/expr_call/mod.rs
crates/perry-hir/src/lower/lowering_context.rs
crates/perry-hir/src/lower/fn_ctor_env.rs

📝 Walkthrough

Walkthrough

This PR optimizes the perry-hir lowering phase by replacing O(n) linear scans across five native-instance/module registries in LoweringContext with O(1) HashMap indexes, caching pre-lowered member receivers to prevent exponential re-lowering on native fluent chains, maintaining an incremental HashSet<LocalId> in Locals to avoid rebuilding membership sets per closure, and threading mutable Shadow through fn_ctor_env scan helpers to eliminate cloning. A diagnostic relower_trace module is also added.
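The last item in the walkthrough, threading one mutable `Shadow` through the scan helpers, can be illustrated with a push-pop helper. This is a minimal sketch, not the code in `fn_ctor_env.rs`: the `Shadow` alias, the `with_shadowed` helper, and the string-keyed set are all assumptions made for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the write-scan shadow set: names bound locally
/// that hide outer bindings while a nested function body is scanned.
type Shadow = HashSet<String>;

/// Scans a nested function body with its parameters shadowed, then restores
/// the shared set, instead of cloning the whole set per nested function.
fn with_shadowed<R>(
    shadow: &mut Shadow,
    params: &[&str],
    scan_body: impl FnOnce(&mut Shadow) -> R,
) -> R {
    // Track only the names this call actually added, so a name that an outer
    // function already shadowed stays shadowed after the pop.
    let mut added = Vec::new();
    for p in params {
        if shadow.insert((*p).to_string()) {
            added.push((*p).to_string());
        }
    }
    let result = scan_body(shadow);
    for name in added {
        shadow.remove(&name);
    }
    result
}
```

The cost per nested function is proportional to its own parameter count rather than to the size of the enclosing shadow set, which is the property the shared-`Shadow` change is after.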

Changes

HIR Lowering Performance Optimizations

| Layer / File(s) | Summary |
|---|---|
| LoweringContext index fields and Locals.id_set data contracts `crates/perry-hir/src/lower/lowering_context.rs`, `crates/perry-hir/src/lower/locals.rs` | `LoweringContext` gains six new fields: five `HashMap`-based index structures for native-instance/module/class-statics registries and a `prelowered_member_receiver` memo slot. `Locals` gains an `id_set: HashSet<LocalId>` field with `id_set()` accessor, kept synchronized by `push`, `drain_from`, and `reindex`. |
| Indexed registry implementation and index initialization `crates/perry-hir/src/lower/context.rs`, `crates/perry-hir/src/lower/tests.rs` | `context.rs` initializes all six index fields in `with_class_id_start` and implements indexed register/lookup/truncate for all five registry types with defined shadowing semantics (shadow-stack for scoped instances, first-match for modules/class-statics/func-returns, last-match for module-level instances). Adds `truncate_native_instances`, `push_func_return_native_instance`, `push_module_native_instance` helpers. Delegates `exit_scope` to `truncate_native_instances`. Tests cover scope shadowing, truncation restore, module last-wins, and flat lookup cost. |
| Registry helper call site migration `crates/perry-hir/src/lower/expr_assign.rs`, `crates/perry-hir/src/lower/module_decl.rs`, `crates/perry-hir/src/lower/stmt.rs`, `crates/perry-hir/src/lower_decl/body_stmt.rs`, `crates/perry-hir/src/lower_decl/fn_decl.rs` | All direct `.push()` call sites across multiple files are migrated to `ctx.push_func_return_native_instance(...)` for function returns and `ctx.push_module_native_instance(...)` for module-level instances, spanning exported-function declarations, arrow functions with native return type inference, default exports, native factory/member-call initialization, and variable-to-variable instance re-registration. |
| Fluent-chain receiver caching `crates/perry-hir/src/lower/expr_call/static_and_instance.rs`, `crates/perry-hir/src/lower/expr_member.rs`, `crates/perry-hir/src/lower/expr_call/mod.rs` | In the native fluent-chain dispatch path, the receiver is lowered once and its span is captured; on a no-match fallback the lowered receiver is stashed in `ctx.prelowered_member_receiver` keyed by `(lo, hi)`. `lower_member_inner` checks the cache before calling `lower_expr` on the receiver. `lower_call_inner` clears the cache at entry to prevent stale memo leakage. |
| Closure capture via Locals.id_set `crates/perry-hir/src/lower/expr_function.rs`, `crates/perry-hir/src/lower_decl/body_stmt/nested_fn_decl.rs` | `compute_closure_captures` signature changed to accept `&HashSet<LocalId>` directly. `lower_arrow`, `lower_fn_expr_anon`, and `lower_named_fn_expr` remove pre-scope `outer_locals` Vec snapshots and instead pass `ctx.locals.id_set()` (the live enclosing-scope membership set) after popping the closure scope. `lower_nested_fn_decl` is updated analogously. |
| Mutable Shadow threading in fn_ctor_env `crates/perry-hir/src/lower/fn_ctor_env.rs` | All scan-helper signatures (`scan_stmt`, `record_pat_bindings`, `scan_for_head_writes`, `scan_assign_target_writes`, `scan_fn_body_writes`, `scan_class_writes`, `scan_expr_writes`) changed from `&Shadow` to `&mut Shadow`. `scan_fn_body_writes` and arrow scanning use insert/remove push-pop over the shared shadow instead of cloning. |
| PERRY_TRACE_RELOWER diagnostic tracing `crates/perry-hir/src/lower/lower_expr.rs` | Adds `pub(crate) mod relower_trace` with `enabled()` and `record(lo, hi)` functions, gated by the `PERRY_TRACE_RELOWER` env var. Uses atomics and thread-local `HashMap` storage to count per-span `lower_expr` invocations and dump hot spans to stderr periodically. |
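An env-gated per-span counter of the kind described in the last row might look roughly like the sketch below. Only the names `enabled`, `record(lo, hi)`, and the `PERRY_TRACE_RELOWER` variable come from the PR; the `bump` and `hottest` helpers, the `u32` span type, the output format, and the use of 5,000,000 calls as the dump interval (read from the "5M-call dump" remark in the commit message) are assumptions.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::OnceLock;

/// Assumed dump interval, in total recorded calls.
const DUMP_EVERY: u64 = 5_000_000;

/// Total number of recorded lowerings across all threads.
static TOTAL: AtomicU64 = AtomicU64::new(0);

thread_local! {
    /// Per-thread count of lowerings keyed by source span `(lo, hi)`.
    static PER_SPAN: RefCell<HashMap<(u32, u32), u64>> = RefCell::new(HashMap::new());
}

/// Reads the environment variable once and caches the answer.
pub fn enabled() -> bool {
    static ENABLED: OnceLock<bool> = OnceLock::new();
    *ENABLED.get_or_init(|| std::env::var_os("PERRY_TRACE_RELOWER").is_some())
}

/// Hypothetical helper: counts one lowering of a span, returns its new count.
fn bump(lo: u32, hi: u32) -> u64 {
    PER_SPAN.with(|m| {
        let mut m = m.borrow_mut();
        let n = m.entry((lo, hi)).or_insert(0);
        *n += 1;
        *n
    })
}

/// Hypothetical helper: up to `k` spans with the highest counts, hottest first.
fn hottest(k: usize) -> Vec<((u32, u32), u64)> {
    PER_SPAN.with(|m| {
        let mut v: Vec<_> = m.borrow().iter().map(|(s, n)| (*s, *n)).collect();
        v.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
        v.truncate(k);
        v
    })
}

/// Records one `lower_expr` call; returns immediately when tracing is off.
pub fn record(lo: u32, hi: u32) {
    if !enabled() {
        return;
    }
    bump(lo, hi);
    let total = TOTAL.fetch_add(1, Ordering::Relaxed) + 1;
    if total % DUMP_EVERY == 0 {
        eprintln!("[relower-trace] {total} calls; hottest spans: {:?}", hottest(5));
    }
}
```

In this sketch the disabled path costs one cached boolean read per call, which is one way to approximate the "zero-cost when unset" behaviour the PR describes.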

Possibly related PRs

PerryTS/perry#5270: This PR extends the indexed Locals container from #5270 by adding id_set/id_set() for closure-capture membership tests and refactoring compute_closure_captures to use it directly.

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 Hop, hop through the lowering pass,
No more O(n) scans in the grass!
HashMap indexes, quick as a wink,
Fluent chains cached before you can blink.
HashSets for closures, shadows pushed clean —
The fluffiest compiler you've ever seen! 🌿

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the three main performance fixes: closure captures (O(1)), registry lookups (O(1)), and fluent-chain re-lowering optimization. |
| Description check | ✅ Passed | The description includes all required sections: What (detailed), Why (rationale), Tests (results), with test checklist items checked and proper formatting. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

🧹 Nitpick comments (1)

crates/perry-hir/src/lower_decl/body_stmt.rs (1)

1886-1889: ⚡ Quick win

Use the function-return index for the early-exit check.

Line 1851 now updates the helper-backed index, but this recursive stop check still scans the backing Vec after each statement. Use the indexed lookup to keep this path aligned with the O(1) registry goal.

♻️ Proposed change

-        if ctx
-            .func_return_native_instances
-            .iter()
-            .any(|(n, _, _)| n == func_name)
+        if ctx.lookup_func_return_native_instance(func_name).is_some()
         {
             return;
         }
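The indexed lookup the reviewer suggests relies on a name index that keeps the first registration. A minimal sketch of that pattern follows; the `FuncReturnRegistry` type, its method names, and the meaning of the second and third tuple fields are hypothetical, and only the three-field tuple shape and the first-match-wins rule come from the PR.

```rust
use std::collections::HashMap;

/// Hypothetical registry of functions whose return value is a native
/// instance: (function name, module, class) triples plus a name index.
#[derive(Default)]
struct FuncReturnRegistry {
    entries: Vec<(String, String, String)>,
    index: HashMap<String, usize>,
}

impl FuncReturnRegistry {
    /// Appends an entry and indexes it by function name.
    fn push(&mut self, func: &str, module: &str, class: &str) {
        let pos = self.entries.len();
        self.entries
            .push((func.to_string(), module.to_string(), class.to_string()));
        // `or_insert` keeps the FIRST position for a name, which matches what
        // a forward `.iter().find()` over the Vec would have returned.
        self.index.entry(func.to_string()).or_insert(pos);
    }

    /// Constant-time replacement for scanning `entries` by name.
    fn lookup(&self, func: &str) -> Option<(&str, &str)> {
        let &pos = self.index.get(func)?;
        let (_, module, class) = &self.entries[pos];
        Some((module.as_str(), class.as_str()))
    }
}
```

With a helper like this, an existence check such as the early exit above reduces to `lookup(name).is_some()`.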

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/perry-hir/src/lower_decl/body_stmt.rs` around lines 1886 - 1889, The
recursive stop check at lines 1886-1889 performs a linear scan of the
func_return_native_instances vector using iter().any() to check if func_name
exists, which is O(n) complexity. Since line 1851 now maintains a helper-backed
index for this lookup, refactor the early-exit check to use the indexed lookup
instead of scanning the Vec. This will keep the performance characteristic
aligned with the O(1) registry goal and ensure consistency throughout the code
path.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 91fe645d-057c-4e72-8f47-459f66d7dd8e

📥 Commits

Reviewing files that changed from the base of the PR and between d46feff and 11731e5.

📒 Files selected for processing (16)

crates/perry-hir/src/lower/context.rs
crates/perry-hir/src/lower/expr_assign.rs
crates/perry-hir/src/lower/expr_call/mod.rs
crates/perry-hir/src/lower/expr_call/static_and_instance.rs
crates/perry-hir/src/lower/expr_function.rs
crates/perry-hir/src/lower/expr_member.rs
crates/perry-hir/src/lower/fn_ctor_env.rs
crates/perry-hir/src/lower/locals.rs
crates/perry-hir/src/lower/lower_expr.rs
crates/perry-hir/src/lower/lowering_context.rs
crates/perry-hir/src/lower/module_decl.rs
crates/perry-hir/src/lower/stmt.rs
crates/perry-hir/src/lower/tests.rs
crates/perry-hir/src/lower_decl/body_stmt.rs
crates/perry-hir/src/lower_decl/body_stmt/nested_fn_decl.rs
crates/perry-hir/src/lower_decl/fn_decl.rs

…d write-scan shadow

compute_closure_captures rebuilt an O(scope) membership set per closure (O(n^2) for N closures in an N-binding scope). Maintain a live id_set on Locals (insert/remove/reindex) and pass it by reference; share the fn_ctor_env write-scan Shadow instead of cloning per nested fn.

cap_12000 check: 13.06s -> 0.07s. Captures/mutable-captures semantics unchanged (param/inner-decl/dayjs same-id filtering preserved).
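The live membership set described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the code in `locals.rs`: the `Locals` layout, the `u32` alias for `LocalId`, and the simplified `compute_closure_captures` signature are hypothetical, and the sketch assumes every binding has a unique id (it omits `reindex` and the param / inner-decl filtering the real function keeps).

```rust
use std::collections::HashSet;

type LocalId = u32;

/// Hypothetical scope container: bindings in push order plus a live id set.
#[derive(Default)]
struct Locals {
    entries: Vec<(String, LocalId)>,
    id_set: HashSet<LocalId>,
}

impl Locals {
    /// Adds a binding and keeps the membership set in sync.
    fn push(&mut self, name: &str, id: LocalId) {
        self.entries.push((name.to_string(), id));
        self.id_set.insert(id);
    }

    /// Pops every binding at position >= `mark` (a scope exit) and removes
    /// the popped ids from the set. Assumes ids are unique per binding.
    fn drain_from(&mut self, mark: usize) {
        for (_, id) in self.entries.drain(mark..) {
            self.id_set.remove(&id);
        }
    }

    /// Borrowed view of the enclosing-scope membership set.
    fn id_set(&self) -> &HashSet<LocalId> {
        &self.id_set
    }
}

/// Returns the referenced ids that belong to the enclosing scope, in first-use
/// order and without duplicates. Cost is proportional to the closure's own
/// references rather than to the size of the enclosing scope.
fn compute_closure_captures(referenced: &[LocalId], outer: &HashSet<LocalId>) -> Vec<LocalId> {
    let mut seen = HashSet::new();
    referenced
        .iter()
        .copied()
        .filter(|id| outer.contains(id) && seen.insert(*id))
        .collect()
}
```

Because the set is borrowed rather than rebuilt, N closures in an N-binding scope no longer pay N set constructions of size N each.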

…tics by name

Several per-call/per-member HIR resolution helpers did linear scans over `Vec` registries. For a program with K classes / native bindings and M call+member expressions, every lookup walked the whole registry, including the common miss case (receiver not registered) which scanned to the end and returned `None`. That is O(M*K), quadratic on large bundles, stalling check-lower.

`lookup_class` was already indexed (classes_index, #5267). This indexes the remaining Vec-scanned lookups, mirroring the proven imported_functions_index pattern, while preserving identical Option/tuple results:

- native_instances (scope-stack-like: pushed on scope entry, truncated on exit): name -> Vec<usize> shadow stack, innermost (last) on top. lookup reads the top index (== old `.rev().find()` last-match-wins). New `truncate_native_instances(mark)` pops indices >= mark off each name's stack (and the two prior direct `.truncate()` sites now call it), so an inner binding stops shadowing the moment its scope pops; same shadowing as before.
- module_native_instances (module-level, push-only): name -> usize, overwritten on each push (last-match-wins, matching the reverse-scan fallback arm).
- func_return_native_instances + native_modules + class_statics (push-only): name -> usize keeping the FIRST entry (`or_insert`), matching the old forward `.iter().find()` first-match-wins.

has_static_method/has_static_field and lookup_native_module/lookup_func_return_native_instance now O(1). Push sites for module_native_instances / func_return_native_instances routed through new register helpers so the index stays in sync.

Micro-bench (20000 x3 miss lookups vs K-sized registry, release):

K=2000   baseline 82ms   -> fixed 0.53ms
K=8000   baseline 335ms  -> fixed 0.51ms
K=16000  baseline 1033ms -> fixed 0.54ms
(~1900x at K=16000; flat in K)

Adds unit tests for native-instance shadowing+truncation and module-level last-wins, plus an #[ignore] perf gate. Builds on the closure-capture perf fix.
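The shadow-stack index for scoped native instances can be sketched like this. It is a simplified illustration, not the code in `context.rs`: the `NativeInstances` type, the two-field entry tuple, and the method names are hypothetical, and only the name -> Vec<usize> stack, the top-of-stack lookup, and the truncate-at-mark behaviour follow the commit message.

```rust
use std::collections::HashMap;

/// Hypothetical scoped registry with a per-name shadow stack.
#[derive(Default)]
struct NativeInstances {
    /// Backing registry in push order: (variable name, module name).
    entries: Vec<(String, String)>,
    /// name -> positions in `entries`, innermost (latest) binding last.
    index: HashMap<String, Vec<usize>>,
}

impl NativeInstances {
    /// Registers a binding; a later binding of the same name shadows earlier ones.
    fn register(&mut self, name: &str, module: &str) {
        let pos = self.entries.len();
        self.entries.push((name.to_string(), module.to_string()));
        self.index.entry(name.to_string()).or_default().push(pos);
    }

    /// Last-match-wins lookup: reads the top of the name's stack.
    fn lookup(&self, name: &str) -> Option<&str> {
        let &pos = self.index.get(name)?.last()?;
        Some(self.entries[pos].1.as_str())
    }

    /// Scope exit: drops every entry at position >= `mark` and pops the
    /// matching positions off each affected name's stack.
    fn truncate(&mut self, mark: usize) {
        let mark = mark.min(self.entries.len());
        for (name, _) in self.entries.drain(mark..) {
            let emptied = match self.index.get_mut(&name) {
                Some(stack) => {
                    while stack.last().is_some_and(|&p| p >= mark) {
                        stack.pop();
                    }
                    stack.is_empty()
                }
                None => false,
            };
            if emptied {
                self.index.remove(&name);
            }
        }
    }
}
```

Truncation in this sketch touches only the names that were actually popped, so a miss lookup stays a single hash probe regardless of registry size.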

A 13 MB minified ESM bundle (a commander-based CLI) made `perry check` stall in the HIR `check-lower` stage forever (>1500 s, never finishing). Instrumenting `lower_expr` (env-gated `PERRY_TRACE_RELOWER`, counting lowerings per source span) showed a single ~360-byte commander builder chain, `K.name(..).description(..).argument(..).helpOption(..).option(..).addOption(..)…`, whose receiver subtrees were lowered EXPONENTIALLY: span counts of 37M / 18.5M / 9.2M / 4.6M / 2.3M, halving once per nesting level (a clean 2^depth signature).

Root cause: the chained-native-method dispatch helper `try_static_method_and_instance` (expr_call/static_and_instance.rs). `may_lower_to_native_method_call` over-approximates to `true` whenever the chain root is a native instance/module ident (here `K`, tagged commander via `new Command()`), so the helper SPECULATIVELY lowers the whole receiver prefix to inspect whether it produced a `NativeMethodCall` of a recognized fluent module. When the inner call instead lowers to a generic `Call` (or the outer method isn't one of the recognized fluent methods, such as `hook`/`helpOption`/`addOption`…), every fluent arm misses, the lowered receiver is discarded, and the helper returns `Err(args)`. The `lower_call_inner` fall-through tail then RE-lowers the same member callee (and thus the whole prefix) via `lower_member_inner`. Two full recursive descents into the prefix per chain level ⇒ 2^depth work.

Fix: lower each receiver exactly once. When the helper lowers `member.obj` and no fluent arm consumes it, stash it in `LoweringContext::prelowered_member_receiver` keyed by the receiver's source span; `lower_member_inner` (the tail's receiver-lowering site) takes it back when re-lowering the same span instead of redoing the work. The memo is single-shot and span-keyed, any member lowering clears a stale entry, and `lower_call_inner` resets it as a safety net, so it can never leak onto a different receiver.

Reuse is semantics-preserving: lowering a receiver is idempotent in the value it produces, and the fluent-success arms already reuse that very `object_expr`.

Results:

- Real bundle: `perry check /tmp/cli.ts` >1500 s (never finishes) → 11.9 s, prints "All checks passed! - 2 file(s) checked".
- Minimal synthetic (commander chain mixing recognized/unrecognized methods), before: N=12 0.07s, N=14 0.52s, N=16 4.0s, N=18 16.3s, N≥20 >30 s timeout (exponential). After: N=20 0.01s, N=500 0.5s; the exponential re-lowering is gone (no span lowered more than ~once; `PERRY_TRACE_RELOWER` never trips its 5M-call dump even at N=2000).
- `cargo test -p perry-hir --tests`: 323 passed, 0 failures (excluding the 4 pre-existing machine-specific debug-build stack-overflow tests test_lower_rejects_deep_* / nested_object_literal_lowers_in_linear_time, confirmed identical on HEAD).

The `PERRY_TRACE_RELOWER` counter is left in place, fully env-gated and zero-cost when unset, as a standing diagnostic for future lowering perf work.
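The single-shot, span-keyed memo can be sketched in isolation as below. Only the field name `prelowered_member_receiver` and the span-keyed, take-once behaviour come from the commit message; the `Ctx` and `Expr` types, the `(u32, u32)` span, the `lower_calls` counter, and the two method names are hypothetical stand-ins for the speculative helper and the fall-through tail.

```rust
/// Hypothetical lowered-expression type.
#[derive(Debug, Clone, PartialEq)]
struct Expr(String);

type Span = (u32, u32);

/// Hypothetical lowering context holding the one-slot memo.
#[derive(Default)]
struct Ctx {
    prelowered_member_receiver: Option<(Span, Expr)>,
    /// Counts calls to the expensive lowering, for illustration only.
    lower_calls: usize,
}

impl Ctx {
    /// Stand-in for the expensive recursive lowering of a receiver.
    fn lower_expr(&mut self, span: Span) -> Expr {
        self.lower_calls += 1;
        Expr(format!("lowered@{}..{}", span.0, span.1))
    }

    /// Speculative path: the receiver was lowered but no fluent arm consumed
    /// it, so stash the result instead of discarding it.
    fn speculate_and_miss(&mut self, span: Span) {
        let lowered = self.lower_expr(span);
        self.prelowered_member_receiver = Some((span, lowered));
    }

    /// Fall-through path: reuse the stashed receiver when the span matches,
    /// otherwise lower from scratch.
    fn lower_member_receiver(&mut self, span: Span) -> Expr {
        // `take()` empties the slot whether or not the span matches, so the
        // memo is single-shot and a stale entry never survives this call.
        match self.prelowered_member_receiver.take() {
            Some((cached, expr)) if cached == span => expr,
            _ => self.lower_expr(span),
        }
    }
}
```

Keying on the span and clearing on every read keeps the slot from being applied to a different receiver, at the cost of caching only the most recent miss.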

coderabbitai Bot reviewed Jun 17, 2026

Ralph Küpper added 4 commits June 16, 2026 21:58

style: rustfmt the lowering-perf changes

7b510b0

proggeramlug force-pushed the perf/hir-large-bundle-lowering branch from 0d52689 to 7b510b0 Compare June 17, 2026 05:00

proggeramlug merged commit 4515910 into main Jun 17, 2026
15 checks passed

proggeramlug deleted the perf/hir-large-bundle-lowering branch June 17, 2026 06:59

This was referenced Jun 19, 2026

fix(hir+runtime): native-instance values support arbitrary own properties (read = property GET, not invoking call) #5471

Merged

fix(hir): keep upgrade-callback wsId tagged ("ws","Client") across its own param binding #5534

Merged
