codegen: split large modules into N codegen units (bound clang peak memory) (#5391)#5407
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (6)
🚧 Files skipped from review as they are similar to previous changes (5)
📝 WalkthroughWalkthroughImplements LLVM codegen unit splitting for large Perry modules. A new heuristic ( ChangesLLVM Codegen Unit Splitting
Sequence Diagram(s)sequenceDiagram
participant compile_module
participant decide_codegen_units
participant LlModule
participant compile_units_to_object
participant compile_ll_to_object
participant ld_r as ld -r
compile_module->>decide_codegen_units: module_callable_count(hir)
decide_codegen_units-->>compile_module: n_units
alt n_units > 1
compile_module->>LlModule: render_codegen_units(n_units)
LlModule-->>compile_module: Vec<String> unit IR texts
compile_module->>compile_units_to_object: units, target_triple
loop each unit
compile_units_to_object->>compile_ll_to_object: unit IR → .o bytes
compile_units_to_object->>compile_units_to_object: write /tmp/unit_N.o
end
compile_units_to_object->>ld_r: -r unit_0.o unit_1.o …
ld_r-->>compile_units_to_object: merged object bytes
compile_units_to_object-->>compile_module: Result<Vec<u8>>
else n_units == 1
compile_module->>LlModule: to_ir()
LlModule-->>compile_module: IR string
compile_module->>compile_ll_to_object: IR string, target_triple
compile_ll_to_object-->>compile_module: object bytes
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
crates/perry-codegen/src/function.rs (1)
368-380: ⚡ Quick winInclude
entry_post_init_setupin the IR-size estimate.Line 378 counts
entry_allocas, butto_ir()also emitsentry_post_init_setupinto block 0. Excluding it can underweight init-heavy functions and skew unit balancing.Suggested patch
pub fn estimated_ir_bytes(&self) -> usize { let body: usize = self .blocks .iter() .map(|b| { b.instructions_iter().map(|i| i.len() + 1).sum::<usize>() + b.label.len() + 4 }) .sum(); let allocas: usize = self.entry_allocas.iter().map(|a| a.len() + 1).sum(); - body + allocas + self.name.len() + 64 + let post_init_setup: usize = self.entry_post_init_setup.iter().map(|l| l.len() + 1).sum(); + body + allocas + post_init_setup + self.name.len() + 64 }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/perry-codegen/src/function.rs` around lines 368 - 380, The estimated_ir_bytes() method calculates IR size for entry_allocas but does not account for entry_post_init_setup, which is also emitted into block 0 by the to_ir() function. This causes the size estimate to be inaccurate for init-heavy functions. Add a calculation for entry_post_init_setup similar to the entry_allocas calculation (iterating through each element, summing up their lengths plus 1 for each item), and include this sum in the final body + allocas + name.len() + 64 calculation to provide a more complete IR-size estimate.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/perry-codegen/src/codegen/mod.rs`:
- Around line 2521-2531: The n_units decision logic in the codegen units block
only checks emit_ir_only but neglects to check PERRY_SAVE_LL, which according to
the comment should also use the single-text path. Update the condition that sets
n_units to 1 to also check whether PERRY_SAVE_LL is set, so that when the save
operation is enabled, it takes the whole-module text path and doesn't return
early before reaching the save block.
In `@crates/perry-codegen/src/linker.rs`:
- Around line 303-309: The temporary object files (perry_cgu_*.o) created in the
loop starting at the compile_ll_to_object call are not cleaned up when early
failures occur through the ? operator, leaving orphaned files in /tmp. Implement
a cleanup mechanism that ensures all object files in obj_paths are deleted
regardless of whether compilation succeeds or fails. This could be done by using
a scope guard or error handler that guarantees cleanup runs even when ?
operators cause early returns before reaching the intended cleanup block at line
335-340. Make sure the cleanup logic removes all accumulated object files from
obj_paths whenever the function exits due to error.
- Around line 312-318: The linker selection in the partial linking step (where
Command::new(&ld) is created) currently ignores target_triple and always uses
the host linker via PERRY_LD environment variable or hardcoded "ld" string. This
fails for cross-compilation targets like MSVC/COFF or non-GNU targets. Replace
the hardcoded linker selection logic by introducing target-aware selection logic
that examines target_triple (the same parameter used for per-unit compilation)
and selects the appropriate linker tool (e.g., lld-link for COFF targets,
link.exe for MSVC, or target-specific ld variants). Include a clear error or
diagnostic when the target is not supported or no compatible partial linker is
available.
In `@crates/perry-codegen/src/module.rs`:
- Around line 549-555: The test in the render_codegen_units_dedup function
currently does not validate the multi-unit deduplication path because after
deduplication, all definitions are collapsed into a single unit rather than
being distributed across multiple units. Modify the test setup to create a
scenario where the duplicate symbol perry_method_j__foo appears across multiple
codegen units (not consolidated into one), so that the assertion at the end
actually exercises and validates the multi-unit deduplication behavior instead
of only testing the single-unit case.
---
Nitpick comments:
In `@crates/perry-codegen/src/function.rs`:
- Around line 368-380: The estimated_ir_bytes() method calculates IR size for
entry_allocas but does not account for entry_post_init_setup, which is also
emitted into block 0 by the to_ir() function. This causes the size estimate to
be inaccurate for init-heavy functions. Add a calculation for
entry_post_init_setup similar to the entry_allocas calculation (iterating
through each element, summing up their lengths plus 1 for each item), and
include this sum in the final body + allocas + name.len() + 64 calculation to
provide a more complete IR-size estimate.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: df5fb6c3-227d-4f6f-aea7-424c8f3b5224
📒 Files selected for processing (5)
crates/perry-codegen/src/codegen/helpers.rscrates/perry-codegen/src/codegen/mod.rscrates/perry-codegen/src/function.rscrates/perry-codegen/src/linker.rscrates/perry-codegen/src/module.rs
| let combined = tmp_dir.join(format!("perry_cgu_{}_{}_combined.o", pid, nonce)); | ||
| let ld = env::var("PERRY_LD").unwrap_or_else(|_| "ld".to_string()); | ||
| let mut cmd = Command::new(&ld); | ||
| cmd.arg("-r").arg("-o").arg(&combined); | ||
| for p in &obj_paths { | ||
| cmd.arg(p); | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Inspect whether the split partial-link path propagates target/toolchain selection.
# Expected: current output shows the per-unit clang path receives `target_triple`,
# while the partial-link command is always `PERRY_LD`/`ld -r`.
rg -n -C3 'compile_units_to_object|PERRY_LD|arg\("-r"\)|target_triple' --type=rustRepository: PerryTS/perry
Length of output: 50369
🏁 Script executed:
sed -n '200,400p' crates/perry-codegen/src/linker.rs | cat -nRepository: PerryTS/perry
Length of output: 9679
🏁 Script executed:
rg 'lld|link\.exe|arm.*ld|aarch64.*ld' crates/perry-codegen/src/linker.rsRepository: PerryTS/perry
Length of output: 320
Route the partial linker through target-aware selection logic.
The per-unit compilation correctly passes target_triple to clang -c (line 104), but the subsequent merge at lines 313–318 always invokes host ld -r via PERRY_LD or hardcoded "ld". When compiling for MSVC/COFF or cross-targets (e.g., host=Linux, target=Windows), the resulting COFF objects cannot be merged with GNU ld, causing link failure or producing incompatible output.
Select the linker based on target_triple to use the correct tool (lld-link, link.exe, or target-specific ld variants), or disable splitting with a clear diagnostic when no compatible partial linker is available.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/perry-codegen/src/linker.rs` around lines 312 - 318, The linker
selection in the partial linking step (where Command::new(&ld) is created)
currently ignores target_triple and always uses the host linker via PERRY_LD
environment variable or hardcoded "ld" string. This fails for cross-compilation
targets like MSVC/COFF or non-GNU targets. Replace the hardcoded linker
selection logic by introducing target-aware selection logic that examines
target_triple (the same parameter used for per-unit compilation) and selects the
appropriate linker tool (e.g., lld-link for COFF targets, link.exe for MSVC, or
target-specific ld variants). Include a clear error or diagnostic when the
target is not supported or no compatible partial linker is available.
A large bundle interns ~190K strings, and emitting every string's init (from-bytes + nanbox + global store + GC-root register) into one `__perry_init_strings` block produced a single ~68MB function. clang -O0 (forced for oversized modules, #4880) can't compile a function that large in practical time — its per-function passes are superlinear in size — so this was the residual wall after codegen-unit splitting removed the module MEMORY wall: a single giant FUNCTION that inter-function splitting can't divide. The string inits are independent (each handle is stored to its own global; no SSA value flows between iterations), so emit them into chunk functions of PERRY_STRING_INIT_CHUNK_SIZE entries (default 4000) and call those in sequence from `__perry_init_strings`. The ~48 chunks are each ~1.4MB and, combined with codegen-unit splitting (#5407), bin-pack evenly across units instead of one unit carrying the 68MB monolith. Order is preserved (chunks called in sequence, ops in order within a chunk), so GC-root registration and any ordering stay identical. Behavior-neutral for ordinary modules (a handful of strings → one chunk). Full codegen suite green.
Update: + string-pool init splitting (function-level)Pushed a 4th commit that splits the generated string-pool initializer. On the bundle that function was a single 68MB monolith (190K independent string inits in one block) — Empirical status on the cli.js bundle
This PR is a correct, self-contained structural win on its own: it removes the module memory wall and the worst single (generated) function, both proven. |
… modules (#5391 path 2) (#5410) Extends the #5334 lever-B full-outline from field-SET to field-GET. After codegen-unit splitting (#5407) removed the module memory wall and the string-init split removed the 68MB generated function, the residual wall is the bundle's large minified USER functions (7-18MB closures): clang -O0 time is superlinear in single-function size. Full-outlining the IC diamonds shrinks those functions at their source. When full-outline is enabled (size-gated), the class-field-GET diamond (inline precheck + guard call + fast slot load + by-name fallback + phi) collapses to one `js_class_field_get_ic(...)` call returning the field value. The runtime helper runs the same guard, reads the field slot as f64 on a pass (a plain number is self-boxing in nan-boxing, so raw-f64 and boxed slots read identically — matching the inline `load double`), and records + reads by name on a miss. The outline drops the inline path's static raw-number type hint (result treated as a general JS value), which is value-correct. class-field GET is the largest remaining diamond kind on the bundle (~39K sites). Gated by the existing PERRY_FULL_OUTLINE_IC; default off = unchanged. Unit test covers both gate states; full codegen suite green. Co-authored-by: Ralph Küpper <ralph2@skelpo.com>
…ERRY_SAVE_LL; clean temp objects on failure (#5407)
|
Thanks — addressed the review:
|
6b63b2b to
3e4f12e
Compare
A large bundle interns ~190K strings, and emitting every string's init (from-bytes + nanbox + global store + GC-root register) into one `__perry_init_strings` block produced a single ~68MB function. clang -O0 (forced for oversized modules, #4880) can't compile a function that large in practical time — its per-function passes are superlinear in size — so this was the residual wall after codegen-unit splitting removed the module MEMORY wall: a single giant FUNCTION that inter-function splitting can't divide. The string inits are independent (each handle is stored to its own global; no SSA value flows between iterations), so emit them into chunk functions of PERRY_STRING_INIT_CHUNK_SIZE entries (default 4000) and call those in sequence from `__perry_init_strings`. The ~48 chunks are each ~1.4MB and, combined with codegen-unit splitting (#5407), bin-pack evenly across units instead of one unit carrying the 68MB monolith. Order is preserved (chunks called in sequence, ops in order within a chunk), so GC-root registration and any ordering stay identical. Behavior-neutral for ordinary modules (a handful of strings → one chunk). Full codegen suite green.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/perry-codegen/src/codegen/mod.rs`:
- Around line 2535-2538: The n_units decision logic at the assignment of n_units
should check three conditions to use single-unit mode (n_units = 1) instead of
deciding based on module_callable_count: the existing opts.emit_ir_only check, a
new check for the PERRY_SAVE_LL environment variable being set, and a
cross-compilation check for when opts.target.is_none(). Modify the condition so
that if any of these three conditions are true, set n_units to 1, otherwise call
decide_codegen_units with the hir. This ensures that PERRY_SAVE_LL paths reach
the save logic at line 2558 and that cross-compilation without explicit target
override uses the single linker invocation path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 6d540f14-6d75-442e-ae98-a1863bfae08e
📒 Files selected for processing (6)
crates/perry-codegen/src/codegen/helpers.rscrates/perry-codegen/src/codegen/mod.rscrates/perry-codegen/src/codegen/string_pool.rscrates/perry-codegen/src/function.rscrates/perry-codegen/src/linker.rscrates/perry-codegen/src/module.rs
🚧 Files skipped from review as they are similar to previous changes (5)
- crates/perry-codegen/src/codegen/helpers.rs
- crates/perry-codegen/src/linker.rs
- crates/perry-codegen/src/function.rs
- crates/perry-codegen/src/module.rs
- crates/perry-codegen/src/codegen/string_pool.rs
| let n_units = if opts.emit_ir_only { | ||
| 1 | ||
| } else { | ||
| decide_codegen_units(module_callable_count(hir)) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if cross-compilation gating is implemented elsewhere
rg -n "PERRY_CODEGEN_UNITS|render_codegen_units|compile_units_to_object" --type rust -C 3 | head -100Repository: PerryTS/perry
Length of output: 7863
🏁 Script executed:
# Check the full context around the n_units decision in codegen/mod.rs
sed -n '2520,2560p' crates/perry-codegen/src/codegen/mod.rsRepository: PerryTS/perry
Length of output: 1664
🏁 Script executed:
# Check the full compile_units_to_object implementation in linker.rs
sed -n '280,320p' crates/perry-codegen/src/linker.rsRepository: PerryTS/perry
Length of output: 1884
🏁 Script executed:
# Look for any environment variable checks related to PERRY_SAVE_LL or cross-compilation
rg -n "PERRY_SAVE_LL|is_cross|target_triple" crates/perry-codegen/src/ --type rustRepository: PerryTS/perry
Length of output: 4435
🏁 Script executed:
# Check git log for PR references to cross-compilation and codegen units
git log --oneline --all --grep="5391\|cross-compil\|codegen.*unit" 2>/dev/null | head -20Repository: PerryTS/perry
Length of output: 914
🏁 Script executed:
# Check if there are any comments in the code mentioning cross-compilation + codegen splitting
rg -n "cross.*compil|target.*arch" crates/perry-codegen/src/codegen/mod.rs crates/perry-codegen/src/linker.rsRepository: PerryTS/perry
Length of output: 619
🏁 Script executed:
# Check commit messages for `#5391` to see if cross-compilation gating was mentioned
git log --all --format="%B" --grep="5391" 2>/dev/null | head -100Repository: PerryTS/perry
Length of output: 5531
🏁 Script executed:
# Look for FIXME/TODO comments about cross-compilation in codegen units code
rg -n "FIXME|TODO|XXX" crates/perry-codegen/src/codegen/mod.rs crates/perry-codegen/src/linker.rs | grep -i "cross\|target\|compil\|unit"Repository: PerryTS/perry
Length of output: 39
🏁 Script executed:
# Check if ld -r command gets any target-specific flags when cross-compiling
sed -n '310,340p' crates/perry-codegen/src/linker.rsRepository: PerryTS/perry
Length of output: 1075
Add PERRY_SAVE_LL check to n_units decision logic.
The code comment at line 2532 states that both emit_ir_only and PERRY_SAVE_LL should use the single-text path, but the n_units decision at lines 2535–2537 only checks emit_ir_only. When PERRY_SAVE_LL is set, the code incorrectly proceeds to codegen splitting instead of taking the single-text path where PERRY_SAVE_LL is actually handled (line 2558). This causes the directory to be never populated because the split path returns early without reaching the PERRY_SAVE_LL save logic.
Additionally, cross-compilation with splitting should be gated: when cross-compiling (no --target flag override from host), ld -r at line ~320 uses the host linker by default, which may not support the target object format. Pass opts.target.is_some() into the n_units decision to force single-unit mode.
🔧 Proposed fix
+ let save_ll = std::env::var_os("PERRY_SAVE_LL").is_some();
+ let is_cross_compile = opts.target.is_some();
- let n_units = if opts.emit_ir_only {
+ let n_units = if opts.emit_ir_only || save_ll || is_cross_compile {
1
} else {
decide_codegen_units(module_callable_count(hir))
};🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/perry-codegen/src/codegen/mod.rs` around lines 2535 - 2538, The
n_units decision logic at the assignment of n_units should check three
conditions to use single-unit mode (n_units = 1) instead of deciding based on
module_callable_count: the existing opts.emit_ir_only check, a new check for the
PERRY_SAVE_LL environment variable being set, and a cross-compilation check for
when opts.target.is_none(). Modify the condition so that if any of these three
conditions are true, set n_units to 1, otherwise call decide_codegen_units with
the hir. This ensures that PERRY_SAVE_LL paths reach the save logic at line 2558
and that cross-compilation without explicit target override uses the single
linker invocation path.
A single LLVM translation unit per module makes clang OOM on large inputs (a 13MB minified bundle lowers to one ~1.1GB .ll; clang -c needs ~15GB RSS). This adds codegen-unit splitting: the populated LlModule renders to N independent .ll texts, each compiled by clang -c separately (peak RSS ~whole/N) and merged with `ld -r` into one object — so compile_module's single-Vec<u8> contract and the existing one-object link path are unchanged. Mechanism (LlModule::render_codegen_units): - functions partitioned into N contiguous buckets; - the full string-constant + global set carried in every unit with local and bare-external DEFINITIONS promoted to linkonce_odr (linker keeps one copy; globals are ~1.4% of a large module's IR, so duplication is cheap); - each unit gets external declares for every function it calls but does not define (deduped by name; existing declarations win), so cross-unit calls resolve; the lone internal init/wrapper is promoted to external; - shared attribute groups + metadata replicated so #N/!N references resolve. Gating (decide_codegen_units): PERRY_CODEGEN_UNITS=N forces N (1 disables); otherwise auto by callable count — 1 unit below a floor, then ceil(callables / PERRY_CODEGEN_UNIT_SIZE) capped at 48. Default is 1 unit for ordinary modules — zero behavior change. Validated: a 2-unit split of a multi-function program with cross-unit calls + shared string/class globals compiles, links, and runs identically to the 1-unit build (`hi world` / 20). Unit tests cover the partition (define-once, cross-unit declares, linkonce_odr promotion) and the n=1 == to_ir identity. Follow-on to the IR-efficiency roadmap #5334; tracks #5391.
Minified bundles can contain two distinct classes that sanitize to the same name (e.g. two classes `j`), so their mangled method symbols collide (`perry_method_..._j__getElementsByTagName` defined twice). LLVM rejects the redefinition and the whole module fails to compile — independent of codegen-unit splitting (the single-TU `to_ir` had the same latent bug; it only surfaced once a real bundle reached clang). Add `deduped_function_refs` (first occurrence per name) and route both `to_ir` and `render_codegen_units` through it, so each symbol is emitted once. Calls to the duplicate resolve to the first definition — a dispatch ambiguity limited to genuinely name-colliding members; proper disambiguation by class id is a separate concern. No-op for ordinary modules (no duplicate names). Unit test covers both render paths.
Contiguous chunk-by-function-count partitioning clumped functions unevenly: on the cli.js bundle one unit ballooned to 139MB (it caught the biggest functions) while others were ~50MB, and clang -O0 time is superlinear in unit size, so that one unit dominated the whole compile. Switch to greedy largest-first bin-packing by `LlFunction::estimated_ir_bytes` (a cheap instruction-byte sum, no second render): each function is placed into the currently-smallest unit, so big functions are isolated and the rest stay even. This bounds the max unit size to ~total/N + the single largest function. NB: a function larger than total/N is irreducible by inter-function splitting — the cli.js bundle has a single 68MB minified IIFE whose clang -O0 time is the residual bottleneck (the intra-function #4880 problem, distinct from the module memory wall this feature removes). Balancing isolates it so the other units compile fast and in bounded memory. Unit test: a giant function + several tiny ones splits so the giant is not clumped with the small set.
A large bundle interns ~190K strings, and emitting every string's init (from-bytes + nanbox + global store + GC-root register) into one `__perry_init_strings` block produced a single ~68MB function. clang -O0 (forced for oversized modules, #4880) can't compile a function that large in practical time — its per-function passes are superlinear in size — so this was the residual wall after codegen-unit splitting removed the module MEMORY wall: a single giant FUNCTION that inter-function splitting can't divide. The string inits are independent (each handle is stored to its own global; no SSA value flows between iterations), so emit them into chunk functions of PERRY_STRING_INIT_CHUNK_SIZE entries (default 4000) and call those in sequence from `__perry_init_strings`. The ~48 chunks are each ~1.4MB and, combined with codegen-unit splitting (#5407), bin-pack evenly across units instead of one unit carrying the 68MB monolith. Order is preserved (chunks called in sequence, ops in order within a chunk), so GC-root registration and any ordering stay identical. Behavior-neutral for ordinary modules (a handful of strings → one chunk). Full codegen suite green.
3e4f12e to
05fd8ef
Compare
What
Split large modules into N independently-compiled codegen units so
clang -c's peak memory is bounded to ~whole/N instead of OOMing on one giant translation unit. Closes the structural half of the single-TU wall in #5391 (follow-on to the IR-efficiency roadmap #5334).The 13MB
@anthropic-ai/claude-codecli.js lowers to one ~1.1GB / 28.7M-line.ll;clang -cneeds ~15GB RSS to compile it — the hard wall that stops the bundle producing a binary. The IR-efficiency levers (#5334 A–D) trimmed ~12%, but the floor is structural: one TU.How
LlModule::render_codegen_units(n)renders the populated module into N.lltexts:linkonce_odr(linker keeps one copy; globals are ~1.4% of a large module's IR, so duplication is cheap);declares for every function it calls but doesn't define (deduped by name; existing declarations win), so cross-unit calls resolve; the loneinternalinit/wrapper is promoted to external;#N/!Nreferences resolve.linker::compile_units_to_objectcompiles each unit withclang -c(sequential → peak RSS ≈ one unit) and merges the objects withld -rinto one — socompile_module(...) -> Result<Vec<u8>>and the existing one-object link path are unchanged.Also fixes a latent duplicate-symbol bug (two minified classes both named
j→ colliding mangled method symbol, defined twice → LLVM redefinition error).to_irand the unit renderer now emit each symbol once. This bit the single-TU path too; it only surfaced once a real bundle reached clang.Gating
PERRY_CODEGEN_UNITS=Nforces N (1 disables); otherwise auto by callable count — 1 unit below a floor (default, zero behavior change), thenceil(callables / PERRY_CODEGEN_UNIT_SIZE)capped at 48.Validation
hi world/ 20). Unit tests cover the partition (define-once, cross-unit declares,linkonce_odrpromotion), size-balancing (a giant function is isolated), dedup (both render paths), andn=1 == to_ir.Known remaining wall (out of scope here)
The bundle has a single 68MB minified IIFE.
clang -O0on one 68MB function is pathologically slow (time, not memory) — the intra-function #4880 problem, which inter-function splitting cannot divide. Size-balancing isolates it (so the other 23 units stay fast + bounded), but producing a full bundle binary needs function-level splitting, tracked separately.Follow-ups
Declare only referenced symbols per unit (vs all); bounded-parallel unit compiles for wall-clock speedup. Refs #5391.
Summary by CodeRabbit