perf(hir): make large-bundle lowering linear (closure captures, registry lookups, fluent-chain re-lowering)#5284
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (16)
🚧 Files skipped from review as they are similar to previous changes (15)
📝 WalkthroughWalkthroughThis PR optimizes the ChangesHIR Lowering Performance Optimizations
Possibly related PRs
🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/perry-hir/src/lower_decl/body_stmt.rs (1)
1886-1889: ⚡ Quick winUse the function-return index for the early-exit check.
Line 1851 now updates the helper-backed index, but this recursive stop check still scans the backing
Vecafter each statement. Use the indexed lookup to keep this path aligned with the O(1) registry goal.♻️ Proposed change
- if ctx - .func_return_native_instances - .iter() - .any(|(n, _, _)| n == func_name) + if ctx.lookup_func_return_native_instance(func_name).is_some() { return; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/perry-hir/src/lower_decl/body_stmt.rs` around lines 1886 - 1889, The recursive stop check at lines 1886-1889 performs a linear scan of the func_return_native_instances vector using iter().any() to check if func_name exists, which is O(n) complexity. Since line 1851 now maintains a helper-backed index for this lookup, refactor the early-exit check to use the indexed lookup instead of scanning the Vec. This will keep the performance characteristic aligned with the O(1) registry goal and ensure consistency throughout the code path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@crates/perry-hir/src/lower_decl/body_stmt.rs`:
- Around line 1886-1889: The recursive stop check at lines 1886-1889 performs a
linear scan of the func_return_native_instances vector using iter().any() to
check if func_name exists, which is O(n) complexity. Since line 1851 now
maintains a helper-backed index for this lookup, refactor the early-exit check
to use the indexed lookup instead of scanning the Vec. This will keep the
performance characteristic aligned with the O(1) registry goal and ensure
consistency throughout the code path.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 91fe645d-057c-4e72-8f47-459f66d7dd8e
📒 Files selected for processing (16)
crates/perry-hir/src/lower/context.rscrates/perry-hir/src/lower/expr_assign.rscrates/perry-hir/src/lower/expr_call/mod.rscrates/perry-hir/src/lower/expr_call/static_and_instance.rscrates/perry-hir/src/lower/expr_function.rscrates/perry-hir/src/lower/expr_member.rscrates/perry-hir/src/lower/fn_ctor_env.rscrates/perry-hir/src/lower/locals.rscrates/perry-hir/src/lower/lower_expr.rscrates/perry-hir/src/lower/lowering_context.rscrates/perry-hir/src/lower/module_decl.rscrates/perry-hir/src/lower/stmt.rscrates/perry-hir/src/lower/tests.rscrates/perry-hir/src/lower_decl/body_stmt.rscrates/perry-hir/src/lower_decl/body_stmt/nested_fn_decl.rscrates/perry-hir/src/lower_decl/fn_decl.rs
…d write-scan shadow compute_closure_captures rebuilt an O(scope) membership set per closure (O(n^2) for N closures in an N-binding scope). Maintain a live id_set on Locals (insert/remove/reindex) and pass it by reference; share the fn_ctor_env write-scan Shadow instead of cloning per nested fn. cap_12000 check: 13.06s -> 0.07s. Captures/mutable-captures semantics unchanged (param/inner-decl/dayjs same-id filtering preserved).
…tics by name Several per-call/per-member HIR resolution helpers did linear scans over `Vec` registries. For a program with K classes / native bindings and M call+member expressions, every lookup walked the whole registry — including the common miss case (receiver not registered) which scanned to the end and returned `None`. That is O(M*K), quadratic on large bundles, stalling check-lower. `lookup_class` was already indexed (classes_index, #5267). This indexes the remaining Vec-scanned lookups, mirroring the proven imported_functions_index pattern, while preserving identical Option/tuple results: - native_instances (scope-stack-like: pushed on scope entry, truncated on exit): name -> Vec<usize> shadow stack, innermost (last) on top. lookup reads the top index (== old `.rev().find()` last-match-wins). New `truncate_native_instances(mark)` pops indices >= mark off each name's stack (and the two prior direct `.truncate()` sites now call it), so an inner binding stops shadowing the moment its scope pops — same shadowing as before. - module_native_instances (module-level, push-only): name -> usize, overwritten on each push (last-match-wins, matching the reverse-scan fallback arm). - func_return_native_instances + native_modules + class_statics (push-only): name -> usize keeping the FIRST entry (`or_insert`), matching the old forward `.iter().find()` first-match-wins. has_static_method/has_static_field and lookup_native_module/lookup_func_return_native_instance now O(1). Push sites for module_native_instances / func_return_native_instances routed through new register helpers so the index stays in sync. Micro-bench (20000 x3 miss lookups vs K-sized registry, release): K=2000 baseline 82ms -> fixed 0.53ms K=8000 baseline 335ms -> fixed 0.51ms K=16000 baseline 1033ms-> fixed 0.54ms (~1900x at K=16000; flat in K) Adds unit tests for native-instance shadowing+truncation and module-level last-wins, plus an #[ignore] perf gate. Builds on the closure-capture perf fix.
A 13 MB minified ESM bundle (a commander-based CLI) made `perry check` stall in the HIR `check-lower` stage forever (>1500 s, never finishing). Instrumenting `lower_expr` (env-gated `PERRY_TRACE_RELOWER`, counting lowerings per source span) showed a single ~360-byte commander builder chain — `K.name(..).description(..).argument(..).helpOption(..) .option(..).addOption(..)…` — whose receiver subtrees were lowered EXPONENTIALLY: span counts of 37M / 18.5M / 9.2M / 4.6M / 2.3M, halving once per nesting level (a clean 2^depth signature). Root cause: the chained-native-method dispatch helper `try_static_method_and_instance` (expr_call/static_and_instance.rs). `may_lower_to_native_method_call` over-approximates to `true` whenever the chain root is a native instance/module ident (here `K`, tagged commander via `new Command()`), so the helper SPECULATIVELY lowers the whole receiver prefix to inspect whether it produced a `NativeMethodCall` of a recognized fluent module. When the inner call instead lowers to a generic `Call` (or the outer method isn't one of the recognized fluent methods — `hook`/`helpOption`/`addOption`…), every fluent arm misses, the lowered receiver is discarded, and the helper returns `Err(args)`. The `lower_call_inner` fall-through tail then RE-lowers the same member callee (and thus the whole prefix) via `lower_member_inner`. Two full recursive descents into the prefix per chain level ⇒ 2^depth work. Fix: lower each receiver exactly once. When the helper lowers `member.obj` and no fluent arm consumes it, stash it in `LoweringContext::prelowered_member_receiver` keyed by the receiver's source span; `lower_member_inner` (the tail's receiver-lowering site) takes it back when re-lowering the same span instead of redoing the work. The memo is single-shot and span-keyed, any member lowering clears a stale entry, and `lower_call_inner` resets it as a safety net — so it can never leak onto a different receiver. Reuse is semantics-preserving: lowering a receiver is idempotent in the value it produces, and the fluent-success arms already reuse that very `object_expr`. Results: - Real bundle: `perry check /tmp/cli.ts` >1500 s (never finishes) → 11.9 s, prints "All checks passed! - 2 file(s) checked". - Minimal synthetic (commander chain mixing recognized/unrecognized methods), before: N=12 0.07s, N=14 0.52s, N=16 4.0s, N=18 16.3s, N≥20 >30 s timeout (exponential). After: N=20 0.01s, N=500 0.5s — the exponential re-lowering is gone (no span lowered more than ~once; `PERRY_TRACE_RELOWER` never trips its 5M-call dump even at N=2000). - `cargo test -p perry-hir --tests`: 323 passed, 0 failures (excluding the 4 pre-existing machine-specific debug-build stack-overflow tests test_lower_rejects_deep_* / nested_object_literal_lowers_in_linear_time, confirmed identical on HEAD). The `PERRY_TRACE_RELOWER` counter is left in place, fully env-gated and zero-cost when unset, as a standing diagnostic for future lowering perf work.
0d52689 to
7b510b0
Compare
What
Three independent HIR-lowering performance fixes found while compiling a large (~13 MB) minified real-world ESM bundle, whose
check-lowerstage went from never finishing (>1500 s) to ~12 s. Each is a separate commit:perf(hir): O(1) closure-capture analysis—compute_closure_capturesrebuilt an O(scope) membership set per closure → O(n²) for N closures in an N-binding scope. Now maintains a liveid_setonLocals(insert/remove/reindex) passed by reference, and shares thefn_ctor_envwrite-scanShadowinstead of cloning per nested fn. A nested-closure micro-benchmark (cap_12000) drops 13.0 s → 0.07 s. Capture/mutable-capture semantics unchanged (param / inner-decl / dayjs same-id filtering preserved).perf(hir): O(1) registry lookups—native_instances/module_native_instances/func_return_native_instances/native_modules/class_staticswereVec-scanned per call/member; indexed them by name (mirroring the existingimported_functions_index), preserving scope shadowing + truncation semantics exactly. Direct-lookup micro-bench at K=16000: 1033 ms → 0.54 ms.perf(hir): fix exponential re-lowering of native-fluent method chains—lower_call_inner's fall-through re-lowered a chained-native-method receiver aftertry_static_method_and_instancediscarded it, giving 2^depth re-lowering on builder chains likex.a().b().c()…(span-count trace showed 37M/18.5M/9.2M… halving per level). Now memoizes the pre-lowered receiver (span-keyed, single-shot, cleared on any other member lowering) and reuses it. This is the dominant fix: the bundle'sperry checkwent >1500 s (killed) → ~12 s.Why grouped
All three touch overlapping lowering files (
lowering_context.rs,context.rs,expr_call/mod.rs) and were found in one pass; splitting them causes artificial cherry-pick conflicts. Each is a clean, self-contained commit if you prefer to review/merge individually.Tests
cargo test -p perry-hir --testsgreen at each commit (321 → 323). Semantics-preserving throughout (these are pure performance fixes — no behavior change). The pre-existing machine-specifictest_lower_rejects_deep_*/nested_object_literal_lowers_in_linear_timedebug-build stack-overflow aborts are unrelated (identical onmain).Summary by CodeRabbit
Refactor / Performance
Debugging
Tests