perf(compile): compile oversized modules at -O0 to fix wide-object-literal blowup (#4880)#5109
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthrough
ChangesIR-size-based clang optimization flag
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/perry-runtime/src/value/to_string.rs`:
- Around line 694-718: The native fast-path for URL and URLSearchParams string
coercion in crates/perry-runtime/src/value/to_string.rs#L694-L718 returns the
native href and search-params strings without checking for user-defined toString
or valueOf overrides first, causing custom overrides to be bypassed. Fix this by
adding a check to see if the object has a user-defined toString or valueOf
method before calling js_url_href_if_url and js_url_search_params_to_string;
only use the native serialization if no override exists. Apply the same fix to
the addition coercion path in
crates/perry-runtime/src/value/dynamic_arith.rs#L105-L123 to ensure consistent
override behavior across all coercion contexts.
- Around line 694-718: The code at crates/perry-runtime/src/value/to_string.rs
lines 694-718 and crates/perry-runtime/src/value/dynamic_arith.rs lines 105-123
both perform object probes (js_url_href_if_url and try_read_as_search_params) on
raw pointers without first honoring the runtime-wide small-handle cutoff. This
allows widget-handle values (< 0x100000) to be incorrectly dereferenced as
ObjectHeaders. At both sites, add a guard check that bails out and returns early
if the pointer value is less than 0x100000 before calling js_url_href_if_url or
try_read_as_search_params. In to_string.rs, insert the guard after the ptr
extraction but before the boxed assignment; in dynamic_arith.rs, insert the
guard before the to_primitive_default_for_add URL/SearchParams probes. This
ensures small pointers are detected and treated as widget handles per coding
guidelines.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 630b83eb-01fd-4005-ac57-331ef6dd944e
📒 Files selected for processing (3)
crates/perry-codegen/src/linker.rscrates/perry-runtime/src/value/dynamic_arith.rscrates/perry-runtime/src/value/to_string.rs
| // WHATWG `URL` / `URLSearchParams` have native `toString`s | ||
| // (`href` / the query string) that aren't discoverable as object | ||
| // fields. They must be checked BEFORE OrdinaryToPrimitive, which | ||
| // would otherwise find the inherited `Object.prototype.toString` | ||
| // and return "[object Object]" — so `String(url)`, `` `${url}` `` | ||
| // and `"" + url` diverged from explicit `url.toString()`. Detected | ||
| // before the GC-header object dispatch like the other native types. | ||
| // | ||
| // Normalize the raw heap pointer to a `POINTER_TAG` value first: | ||
| // the `+`/template concat path delivers the operand as a raw | ||
| // pointer (upper-16 == 0), and `js_url_href_if_url`'s | ||
| // `object_from_f64` only recognizes `POINTER_TAG`. `String(url)` | ||
| // already arrives tagged. | ||
| let boxed = f64::from_bits(POINTER_TAG | ((ptr as u64) & POINTER_MASK)); | ||
| let url_href = crate::url::url_class::js_url_href_if_url(boxed); | ||
| if url_href.to_bits() != crate::value::TAG_UNDEFINED { | ||
| return js_jsvalue_to_string(url_href); | ||
| } | ||
| if crate::url::try_read_as_search_params(ptr as *mut crate::object::ObjectHeader) | ||
| .is_some() | ||
| { | ||
| return crate::url::search_params::js_url_search_params_to_string( | ||
| ptr as *mut crate::object::ObjectHeader, | ||
| ); | ||
| } |
There was a problem hiding this comment.
Don't let the native fast-path bypass user-defined coercion methods.
These branches return the URL/SearchParams native string before the ordinary toString/valueOf lookup runs, so overrides like url.toString = () => "x" are ignored by String(url) and "" + url. The native path needs to be the fallback only when no user-visible override exists.
crates/perry-runtime/src/value/to_string.rs#L694-L718: only use the nativehref/ search-params serialization after confirming the object does not expose an overridingtoString/valueOf.crates/perry-runtime/src/value/dynamic_arith.rs#L105-L123: preserve the same override behavior in the addition coercion path instead of returning the native string unconditionally.
📍 Affects 2 files
crates/perry-runtime/src/value/to_string.rs#L694-L718(this comment)crates/perry-runtime/src/value/dynamic_arith.rs#L105-L123
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/perry-runtime/src/value/to_string.rs` around lines 694 - 718, The
native fast-path for URL and URLSearchParams string coercion in
crates/perry-runtime/src/value/to_string.rs#L694-L718 returns the native href
and search-params strings without checking for user-defined toString or valueOf
overrides first, causing custom overrides to be bypassed. Fix this by adding a
check to see if the object has a user-defined toString or valueOf method before
calling js_url_href_if_url and js_url_search_params_to_string; only use the
native serialization if no override exists. Apply the same fix to the addition
coercion path in crates/perry-runtime/src/value/dynamic_arith.rs#L105-L123 to
ensure consistent override behavior across all coercion contexts.
Guard widget handles before these object probes.
Both fast-paths box/cast ptr as an object before honoring the runtime-wide small-handle cutoff. That makes values in the widget-handle range reachable by try_read_as_search_params, which dereferences them as ObjectHeaders.
crates/perry-runtime/src/value/to_string.rs#L694-L718: bail out on values< 0x100000before callingjs_url_href_if_urlortry_read_as_search_params.crates/perry-runtime/src/value/dynamic_arith.rs#L105-L123: apply the same small-handle guard before the URL/SearchParams probes into_primitive_default_for_add.
As per coding guidelines, "Detect small pointers (value < 0x100000) as widget handles in the NaN-boxed value representation."
📍 Affects 2 files
crates/perry-runtime/src/value/to_string.rs#L694-L718(this comment)crates/perry-runtime/src/value/dynamic_arith.rs#L105-L123
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/perry-runtime/src/value/to_string.rs` around lines 694 - 718, The code
at crates/perry-runtime/src/value/to_string.rs lines 694-718 and
crates/perry-runtime/src/value/dynamic_arith.rs lines 105-123 both perform
object probes (js_url_href_if_url and try_read_as_search_params) on raw pointers
without first honoring the runtime-wide small-handle cutoff. This allows
widget-handle values (< 0x100000) to be incorrectly dereferenced as
ObjectHeaders. At both sites, add a guard check that bails out and returns early
if the pointer value is less than 0x100000 before calling js_url_href_if_url or
try_read_as_search_params. In to_string.rs, insert the guard after the ptr
extraction but before the boxed assignment; in dynamic_arith.rs, insert the
guard before the to_primitive_default_for_add URL/SearchParams probes. This
ensures small pointers are detected and treated as widget handles per coding
guidelines.
Source: Coding guidelines
d317527 to
cef3968
Compare
…teral blowup (#4880) A module dominated by a huge generated object literal (config / lookup table) lowers to one enormous function whose thousands of `alloca`s make LLVM's `-O1+` optimization pipeline (SROA / mem2reg / GVN) super-linear. Perry's own IR generation is fast (<1s); the cost is the external `clang -c -O3` on the generated `.ll`. Measured on a 2800-key literal (≈9.6 MB / 199K-line `.ll`): `clang -c` is 3.0s at `-O0`, 17.1s at `-O1`, 18.4s at `-O2`, 18.5s at `-O3` — i.e. the blowup is entirely in the `-O1+` pipeline and `-O0` is the only escape (`-O1`/`-O2` are no faster than `-O3`). Fix: in `build_clang_compile_plan`, compile a module whose IR exceeds a size threshold (default 6 MiB, override via `PERRY_LL_O0_THRESHOLD_BYTES`) at `-O0` instead of `-O3`, with a one-line note to stderr. Such modules are almost always static data where optimization is irrelevant, and the threshold is high enough that ordinary modules are unaffected (they stay `-O3`). End-to-end: the 2800-key repro drops from ~19s to ~5.4s and still runs correctly; a 400-key program is unchanged (stays `-O3`). New `compile_plan_downgrades_to_o0_for_oversized_module` test + existing linker tests pass. (The issue's headline 2100=114s is ~10x stale — current main compiles 2100 keys in ~11s — but the super-linear LLVM cost remained; this caps it for the pathological generated-data case.)
cef3968 to
5a3226f
Compare
Fixes #4880.
Root cause
A module dominated by a huge generated object literal (config / lookup table) lowers to one enormous function whose thousands of
allocas make LLVM's-O1+pipeline (SROA / mem2reg / GVN) super-linear. Profiling (sample+atos) showed Perry's own IR generation is fast (<1s); the entire cost is the externalclang -c -O3on the generated.ll(perry-main sits incompile_ll_to_object→Command::outputfor the whole compile).Measured on a 2800-key literal (≈9.6 MB / 199K-line
.ll):clang -copt-O0-O1-O2-O3The blowup is entirely in the
-O1+pipeline;-O0is the only escape (-O1/-O2are no faster than-O3).Fix
In
build_clang_compile_plan, a module whose IR exceeds a size threshold (default 6 MiB, override viaPERRY_LL_O0_THRESHOLD_BYTES) compiles at-O0instead of-O3, emitting a one-line note. Such modules are almost always static data where optimization is irrelevant, and the threshold is high enough that ordinary modules are unaffected (they stay-O3).Verification
ok(correct), with the note:perry: module IR is 9.6 MB (> 6.0 MB); compiling it at -O0 …-O3(1.6s,ok) — no note.compile_plan_downgrades_to_o0_for_oversized_moduletest + existing linker tests pass (9/9).Note
The issue's headline (2100 keys ≈ 114s) is ~10× stale — current
mainalready compiles 2100 keys in ~11s (a constant-factor improvement landed since filing) — but the super-linear LLVM cost remained and is what this caps for the pathological generated-data case. A fully opt-preserving alternative (splitting the giant module-init into batched helper functions so LLVM optimizes each in linear time) is larger/riskier and left as a follow-up; this is the contained, low-risk mitigation.No version bump / changelog per maintainer instruction.
Summary by CodeRabbit