
perf(compile): compile oversized modules at -O0 to fix wide-object-literal blowup (#4880) #5109

Merged
proggeramlug merged 1 commit into main from fix/wide-object-literal-compile-time-4880 on Jun 14, 2026


proggeramlug commented Jun 14, 2026


Fixes #4880.

Root cause

A module dominated by a huge generated object literal (config / lookup table) lowers to one enormous function whose thousands of allocas make LLVM's -O1+ pipeline (SROA / mem2reg / GVN) super-linear. Profiling (sample + atos) showed that Perry's own IR generation is fast (<1s); the entire cost is the external clang -c -O3 run on the generated .ll (perry-main sits in compile_ll_to_object, waiting on Command::output, for the whole compile).

Measured on a 2800-key literal (≈9.6 MB / 199K-line .ll):

clang -c opt level   Time
-O0                  3.0s
-O1                  17.1s
-O2                  18.4s
-O3                  18.5s

The blowup is entirely in the -O1+ pipeline, and -O0 is the only escape: -O1 and -O2 are barely faster than -O3 (17.1s and 18.4s vs 18.5s).
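In ratio terms, -O3 costs about 6.2× the -O0 time, while dropping to -O1 saves only about 8%. A minimal Rust sketch of that arithmetic over the measured numbers above (the helper names are illustrative and not part of Perry):

```rust
/// Measured `clang -c` times from the table above, in seconds.
const TIMINGS: [(&str, f64); 4] = [("-O0", 3.0), ("-O1", 17.1), ("-O2", 18.4), ("-O3", 18.5)];

/// Slowdown of each optimization level relative to `-O0`.
fn slowdown_vs_o0() -> Vec<(&'static str, f64)> {
    let base = TIMINGS[0].1; // the -O0 time
    TIMINGS.iter().map(|&(level, secs)| (level, secs / base)).collect()
}

/// Fraction of the `-O3` time saved by compiling at `level` instead.
/// Returns `None` for a level that was not measured.
fn saving_vs_o3(level: &str) -> Option<f64> {
    let o3 = TIMINGS.iter().find(|(l, _)| *l == "-O3")?.1;
    let t = TIMINGS.iter().find(|(l, _)| *l == level)?.1;
    Some(1.0 - t / o3)
}
```

Only the -O0 row saves a meaningful share of the -O3 time, which is why the fix downgrades straight to -O0 rather than to -O1.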

Fix

In build_clang_compile_plan, a module whose IR exceeds a size threshold (default 6 MiB, overridable via PERRY_LL_O0_THRESHOLD_BYTES) is compiled at -O0 instead of -O3, and a one-line note is printed to stderr. Such modules are almost always static data where optimization is irrelevant, and the threshold is high enough that ordinary modules are unaffected (they stay at -O3).
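A minimal sketch of the threshold decision, assuming the override is parsed as a plain byte count and that "exceeds" means strictly greater than. DEFAULT_LL_O0_THRESHOLD_BYTES and ll_o0_threshold_bytes are the names reported in the CodeRabbit walkthrough, but their bodies here are guesses; opt_flag_for is a hypothetical helper, not Perry's actual build_clang_compile_plan. The sizes in the note are assumed to be MiB, which matches the "(> 6.0 MB)" shown for the 6 MiB default.

```rust
use std::env;

/// Default cutoff: 6 MiB of textual LLVM IR.
const DEFAULT_LL_O0_THRESHOLD_BYTES: usize = 6 * 1024 * 1024;

/// Reads the override from `PERRY_LL_O0_THRESHOLD_BYTES`, falling back to the
/// default when it is unset or not a valid integer (assumed behavior).
fn ll_o0_threshold_bytes() -> usize {
    env::var("PERRY_LL_O0_THRESHOLD_BYTES")
        .ok()
        .and_then(|v| v.trim().parse::<usize>().ok())
        .unwrap_or(DEFAULT_LL_O0_THRESHOLD_BYTES)
}

/// Hypothetical simplification of the plan builder: only models the
/// -O0 / -O3 choice for a module of `ll_byte_size` bytes.
fn opt_flag_for(ll_byte_size: usize, threshold: usize) -> &'static str {
    if ll_byte_size > threshold {
        // Sizes printed in MiB (assumption), e.g. "9.6 MB (> 6.0 MB)".
        let mib = |b: usize| b as f64 / (1024.0 * 1024.0);
        eprintln!(
            "perry: module IR is {:.1} MB (> {:.1} MB); compiling it at -O0",
            mib(ll_byte_size),
            mib(threshold)
        );
        "-O0"
    } else {
        "-O3"
    }
}
```

Passing the threshold in as a parameter, rather than reading the environment inside the decision, keeps both branches testable without touching process-wide state; a caller would combine them as opt_flag_for(ll_text.len(), ll_o0_threshold_bytes()).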

Verification

  • 2800-key repro: ~19s → ~5.4s, still prints ok (correct), with the note:
    perry: module IR is 9.6 MB (> 6.0 MB); compiling it at -O0 …
  • 400-key program: unchanged at -O3 (1.6s, ok) — no note.
  • New compile_plan_downgrades_to_o0_for_oversized_module test + existing linker tests pass (9/9).

Note

The issue's headline figure (2100 keys ≈ 114s) is about 10× stale: current main already compiles 2100 keys in ~11s, thanks to a constant-factor improvement that landed after the issue was filed. The super-linear LLVM cost remained, however, and that is what this change caps for the pathological generated-data case. A fully optimization-preserving alternative (splitting the giant module-init into batched helper functions so LLVM optimizes each in linear time) is larger and riskier and is left as a follow-up; this PR is the contained, low-risk mitigation.
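The follow-up alternative can be pictured as a chunking step: instead of one giant module-init, emit a fixed-size batch of property initializers per helper function. If the optimizer's cost is super-linear in the size of each function, a fixed batch size makes the total roughly linear in the number of keys. A hypothetical Rust sketch (PropInit, InitHelper, batch_module_init and the helper naming scheme are all invented for illustration; Perry's lowering is not shown in this PR):

```rust
/// One property initializer in a generated object literal (hypothetical model).
#[derive(Debug, Clone, PartialEq)]
struct PropInit {
    key: String,
    value: f64,
}

/// A helper function the module-init would call, owning one batch of props.
#[derive(Debug)]
struct InitHelper {
    name: String,
    props: Vec<PropInit>,
}

/// Splits a wide literal's initializers into fixed-size batches so that no
/// single function handed to the optimizer grows with the number of keys.
fn batch_module_init(props: &[PropInit], batch_size: usize) -> Vec<InitHelper> {
    assert!(batch_size > 0, "batch_size must be non-zero");
    props
        .chunks(batch_size)
        .enumerate()
        .map(|(i, chunk)| InitHelper {
            name: format!("__module_init_batch_{i}"),
            props: chunk.to_vec(),
        })
        .collect()
}
```

With 2800 keys and a batch size of 256, this yields 11 helpers, the last holding 240 initializers.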

No version bump / changelog per maintainer instruction.

Summary by CodeRabbit

  • Chores
    • Improved build behavior for large generated modules by switching to unoptimized compilation when LLVM IR exceeds a 6 MiB threshold, while keeping optimized compilation for smaller modules.
    • Added a one-line diagnostic on stderr when a module is downgraded to -O0.
  • Tests
    • Updated existing tests and added new coverage to verify the threshold behavior and related compilation-plan metadata expectations.

coderabbitai Bot commented Jun 14, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: c13ce97b-f018-4cb9-a6e4-f272f5a604ce

📥 Commits

Reviewing files that changed from the base of the PR and between cef3968 and 5a3226f.

📒 Files selected for processing (1)
  • crates/perry-codegen/src/linker.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/perry-codegen/src/linker.rs

📝 Walkthrough


crates/perry-codegen/src/linker.rs gains a configurable byte-size threshold (default 6 MiB, overridable via PERRY_LL_O0_THRESHOLD_BYTES) that causes build_clang_compile_plan to emit -O0 instead of -O3 when the LLVM IR input exceeds that size. compile_ll_to_object passes ll_text.len() to activate this check, and unit tests cover both the small-module (-O3) and large-module (-O0) branches.

Changes

IR-size-based clang optimization flag

  • Threshold config, compile-plan flag selection, and callsite (crates/perry-codegen/src/linker.rs): Adds the DEFAULT_LL_O0_THRESHOLD_BYTES (6 MiB) constant and an env-backed ll_o0_threshold_bytes helper, updates the build_clang_compile_plan signature to accept ll_byte_size, selects -O0 or -O3 accordingly with an eprintln! diagnostic on downgrade, and passes ll_text.len() at the compile_ll_to_object callsite.
  • Unit tests for optimization-flag branches (crates/perry-codegen/src/linker.rs): Updates the existing small-module compile-plan test to assert that -O3 is present, adds a new test asserting that -O0 is selected (and -O3 absent) when the IR size exceeds the threshold (threshold + 1 bytes), and updates the metadata JSON test to pass the new ll_byte_size parameter.
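The two test branches can be sketched against a stripped-down plan that is just an argument vector. plan_args is a hypothetical stand-in, not the real build_clang_compile_plan, and the test names are illustrative:

```rust
/// Hypothetical, minimal stand-in for a clang compile plan: just the argv.
fn plan_args(ll_path: &str, obj_path: &str, ll_byte_size: usize, threshold: usize) -> Vec<String> {
    // Strictly greater than the threshold downgrades to -O0 (assumed semantics).
    let opt = if ll_byte_size > threshold { "-O0" } else { "-O3" };
    vec![
        "-c".to_string(),
        opt.to_string(),
        ll_path.to_string(),
        "-o".to_string(),
        obj_path.to_string(),
    ]
}

#[cfg(test)]
mod tests {
    use super::*;

    const THRESHOLD: usize = 6 * 1024 * 1024;

    #[test]
    fn small_module_keeps_o3() {
        let args = plan_args("m.ll", "m.o", 4096, THRESHOLD);
        assert!(args.iter().any(|a| a == "-O3"));
    }

    #[test]
    fn oversized_module_downgrades_to_o0() {
        let args = plan_args("m.ll", "m.o", THRESHOLD + 1, THRESHOLD);
        assert!(args.iter().any(|a| a == "-O0"));
        // The downgrade must replace -O3, not merely add -O0 alongside it.
        assert!(!args.iter().any(|a| a == "-O3"));
    }
}
```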

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐇 A module so vast made clang groan and wheeze,
Six megabytes crossed? I downshift with ease!
-O0 I whisper, no fuss, no delay,
The threshold now guards against compile-time fray.
Small IR keeps its -O3 crown with glee —
A bunny who benchmarks says: that's the key! 🌿

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Title check: ✅ Passed. The title clearly and concisely summarizes the main change: conditional compilation of oversized modules at -O0 instead of -O3 to address wide-object-literal performance regression.
  • Description check: ✅ Passed. The description provides a comprehensive explanation of the root cause, the fix, and verification results. It follows the template with clear sections for summary, changes, related issue, and test verification.
  • Linked Issues check: ✅ Passed. The PR successfully addresses the objective from #4880 by implementing a performance fix for compile-time blowup on modules with large generated object literals through conditional -O0 compilation above a 6 MiB threshold.
  • Out of Scope Changes check: ✅ Passed. All changes in linker.rs are directly scoped to the performance fix objective: threshold-based optimization downgrade, plan builder parameter updates, and related test adjustments. No unrelated modifications detected.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.


coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/perry-runtime/src/value/to_string.rs`:
- Around lines 694-718: The native fast-path for URL and URLSearchParams string
coercion in crates/perry-runtime/src/value/to_string.rs#L694-L718 returns the
native href and search-params strings without checking for user-defined toString
or valueOf overrides first, causing custom overrides to be bypassed. Fix this by
adding a check to see if the object has a user-defined toString or valueOf
method before calling js_url_href_if_url and js_url_search_params_to_string;
only use the native serialization if no override exists. Apply the same fix to
the addition coercion path in
crates/perry-runtime/src/value/dynamic_arith.rs#L105-L123 to ensure consistent
override behavior across all coercion contexts.
- Around lines 694-718: The code at crates/perry-runtime/src/value/to_string.rs
lines 694-718 and crates/perry-runtime/src/value/dynamic_arith.rs lines 105-123
both perform object probes (js_url_href_if_url and try_read_as_search_params) on
raw pointers without first honoring the runtime-wide small-handle cutoff. This
allows widget-handle values (< 0x100000) to be incorrectly dereferenced as
ObjectHeaders. At both sites, add a guard check that bails out and returns early
if the pointer value is less than 0x100000 before calling js_url_href_if_url or
try_read_as_search_params. In to_string.rs, insert the guard after the ptr
extraction but before the boxed assignment; in dynamic_arith.rs, insert the
guard before the to_primitive_default_for_add URL/SearchParams probes. This
ensures small pointers are detected and treated as widget handles per coding
guidelines.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 630b83eb-01fd-4005-ac57-331ef6dd944e

📥 Commits

Reviewing files that changed from the base of the PR and between 2cfa8ca and 3e00f2f.

📒 Files selected for processing (3)
  • crates/perry-codegen/src/linker.rs
  • crates/perry-runtime/src/value/dynamic_arith.rs
  • crates/perry-runtime/src/value/to_string.rs

Comment on lines +694 to +718
// WHATWG `URL` / `URLSearchParams` have native `toString`s
// (`href` / the query string) that aren't discoverable as object
// fields. They must be checked BEFORE OrdinaryToPrimitive, which
// would otherwise find the inherited `Object.prototype.toString`
// and return "[object Object]" — so `String(url)`, `` `${url}` ``
// and `"" + url` diverged from explicit `url.toString()`. Detected
// before the GC-header object dispatch like the other native types.
//
// Normalize the raw heap pointer to a `POINTER_TAG` value first:
// the `+`/template concat path delivers the operand as a raw
// pointer (upper-16 == 0), and `js_url_href_if_url`'s
// `object_from_f64` only recognizes `POINTER_TAG`. `String(url)`
// already arrives tagged.
let boxed = f64::from_bits(POINTER_TAG | ((ptr as u64) & POINTER_MASK));
let url_href = crate::url::url_class::js_url_href_if_url(boxed);
if url_href.to_bits() != crate::value::TAG_UNDEFINED {
    return js_jsvalue_to_string(url_href);
}
if crate::url::try_read_as_search_params(ptr as *mut crate::object::ObjectHeader)
    .is_some()
{
    return crate::url::search_params::js_url_search_params_to_string(
        ptr as *mut crate::object::ObjectHeader,
    );
}
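The normalization described in the snippet's comment amounts to masking the raw address into the payload bits and OR-ing in the pointer tag, so the `+`/template path and `String(url)` hand the probe the same shape of value. A self-contained sketch; the POINTER_TAG and POINTER_MASK values below are assumptions (a 48-bit payload under a 16-bit tag), not Perry's actual constants:

```rust
// Hypothetical NaN-boxing constants; Perry's real values are not shown here.
const POINTER_TAG: u64 = 0x7FFD_0000_0000_0000;
const POINTER_MASK: u64 = 0x0000_FFFF_FFFF_FFFF; // low 48 bits hold the address

/// Boxes a raw heap address as a tagged pointer, mirroring the `boxed` line:
/// any stray upper bits are masked off before the tag is applied.
fn box_pointer(ptr: u64) -> f64 {
    f64::from_bits(POINTER_TAG | (ptr & POINTER_MASK))
}

/// True when the upper 16 bits carry the pointer tag.
fn is_tagged_pointer(v: f64) -> bool {
    (v.to_bits() & !POINTER_MASK) == POINTER_TAG
}

/// Recovers the address from a tagged value.
fn unbox_pointer(v: f64) -> u64 {
    v.to_bits() & POINTER_MASK
}
```

Because the tag is applied after masking, a raw pointer (upper 16 bits zero) and an already-tagged value both normalize to the same bits.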


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Don't let the native fast-path bypass user-defined coercion methods.

These branches return the URL/SearchParams native string before the ordinary toString/valueOf lookup runs, so overrides like url.toString = () => "x" are ignored by String(url) and "" + url. The native path needs to be the fallback only when no user-visible override exists.

  • crates/perry-runtime/src/value/to_string.rs#L694-L718: only use the native href / search-params serialization after confirming the object does not expose an overriding toString/valueOf.
  • crates/perry-runtime/src/value/dynamic_arith.rs#L105-L123: preserve the same override behavior in the addition coercion path instead of returning the native string unconditionally.
📍 Affects 2 files
  • crates/perry-runtime/src/value/to_string.rs#L694-L718 (this comment)
  • crates/perry-runtime/src/value/dynamic_arith.rs#L105-L123
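One way to keep the native serializer from bypassing overrides is to treat it purely as the inherited toString inside an OrdinaryToPrimitive-style lookup, so an own toString or valueOf shadows it. A simplified, hypothetical model (Obj, Native, Hint and coerce_to_string are invented; user methods are assumed to always return a string, and Symbol.toPrimitive is ignored):

```rust
use std::collections::HashMap;

/// Which ToPrimitive hint the coercion uses.
#[derive(Clone, Copy)]
enum Hint {
    /// `String(x)` and template literals: toString is tried first.
    String,
    /// `"" + x`: the "default" hint, where valueOf is tried first.
    Default,
}

/// Hypothetical native kind of the receiver.
enum Native {
    Url { href: String },
    Plain,
}

/// Hypothetical object: native kind plus user-assigned own methods.
struct Obj {
    native: Native,
    user_methods: HashMap<&'static str, fn() -> String>,
}

/// The native URL serializer only stands in for the *inherited* toString,
/// so a user-defined method found earlier in the lookup order wins.
fn coerce_to_string(obj: &Obj, hint: Hint) -> String {
    let order = match hint {
        Hint::String => ["toString", "valueOf"],
        Hint::Default => ["valueOf", "toString"],
    };
    for name in order {
        if let Some(f) = obj.user_methods.get(name) {
            return f();
        }
        if name == "toString" {
            // Inherited toString: native href for URL, Object.prototype otherwise.
            return match &obj.native {
                Native::Url { href } => href.clone(),
                Native::Plain => "[object Object]".to_string(),
            };
        }
        // Inherited valueOf returns the object itself (not a primitive): keep going.
    }
    unreachable!("toString is always in the lookup order")
}
```

With only a user valueOf, String(url) still yields the href (toString is tried first under the string hint) while "" + url yields the valueOf result, which is the split both coercion sites would need to respect.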

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Guard widget handles before these object probes.

Both fast-paths box/cast ptr as an object before honoring the runtime-wide small-handle cutoff. That makes values in the widget-handle range reachable by try_read_as_search_params, which dereferences them as ObjectHeaders.

  • crates/perry-runtime/src/value/to_string.rs#L694-L718: bail out on values < 0x100000 before calling js_url_href_if_url or try_read_as_search_params.
  • crates/perry-runtime/src/value/dynamic_arith.rs#L105-L123: apply the same small-handle guard before the URL/SearchParams probes in to_primitive_default_for_add.
    As per coding guidelines, "Detect small pointers (value < 0x100000) as widget handles in the NaN-boxed value representation."
📍 Affects 2 files
  • crates/perry-runtime/src/value/to_string.rs#L694-L718 (this comment)
  • crates/perry-runtime/src/value/dynamic_arith.rs#L105-L123
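The guard requested here is an early return placed before any probe that could read an object header. A minimal sketch; guarded_native_probe is hypothetical, and the probe closure stands in for calls like js_url_href_if_url or try_read_as_search_params:

```rust
/// Cutoff from the coding guideline: values below this are widget handles.
const WIDGET_HANDLE_CUTOFF: usize = 0x100000;

/// Runs a native-string probe only when `ptr` cannot be a widget handle.
/// `probe` stands in for a call that would read an object header at `ptr`.
fn guarded_native_probe<F>(ptr: usize, probe: F) -> Option<String>
where
    F: FnOnce(usize) -> Option<String>,
{
    if ptr < WIDGET_HANDLE_CUTOFF {
        // Widget handle: bail out before any dereference and let the
        // ordinary coercion path handle it.
        return None;
    }
    probe(ptr)
}
```

Taking the probe as a closure keeps the guard in one place, so the to_string.rs and dynamic_arith.rs sites cannot drift apart on the cutoff.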

Source: Coding guidelines

proggeramlug force-pushed the fix/wide-object-literal-compile-time-4880 branch 2 times, most recently from d317527 to cef3968 June 14, 2026 06:06
…teral blowup (#4880)

A module dominated by a huge generated object literal (config / lookup
table) lowers to one enormous function whose thousands of `alloca`s make
LLVM's `-O1+` optimization pipeline (SROA / mem2reg / GVN) super-linear.
Perry's own IR generation is fast (<1s); the cost is the external
`clang -c -O3` on the generated `.ll`.

Measured on a 2800-key literal (≈9.6 MB / 199K-line `.ll`):
`clang -c` is 3.0s at `-O0`, 17.1s at `-O1`, 18.4s at `-O2`, 18.5s at
`-O3`, i.e. the blowup is entirely in the `-O1+` pipeline and `-O0` is
the only escape (`-O1`/`-O2` are barely faster than `-O3`).

Fix: in `build_clang_compile_plan`, compile a module whose IR exceeds a
size threshold (default 6 MiB, override via `PERRY_LL_O0_THRESHOLD_BYTES`)
at `-O0` instead of `-O3`, with a one-line note to stderr. Such modules
are almost always static data where optimization is irrelevant, and the
threshold is high enough that ordinary modules are unaffected (they stay
`-O3`).

End-to-end: the 2800-key repro drops from ~19s to ~5.4s and still runs
correctly; a 400-key program is unchanged (stays `-O3`). New
`compile_plan_downgrades_to_o0_for_oversized_module` test + existing
linker tests pass.

(The issue's headline figure of 2100 keys = 114s is ~10x stale: current
main compiles 2100 keys in ~11s. The super-linear LLVM cost remained,
though, and this caps it for the pathological generated-data case.)
proggeramlug force-pushed the fix/wide-object-literal-compile-time-4880 branch from cef3968 to 5a3226f June 14, 2026 07:33
proggeramlug merged commit 4169a34 into main Jun 14, 2026
15 checks passed
proggeramlug deleted the fix/wide-object-literal-compile-time-4880 branch June 14, 2026 08:46
proggeramlug pushed a commit that referenced this pull request Jun 14, 2026
Rolls up the issue-fix batch merged on top of 0.5.1165 (#5102, #5103,
#5105, #5106, #5107, #5108, #5109, #5110, #5112, #5117). See CHANGELOG
for the per-PR breakdown.


Development

Successfully merging this pull request may close these issues.

Compile-time blowup on wide object literals (2100 keys ≈ 2 min, 3000 keys > 7 min)

1 participant