Skip to content

codegen: split large modules into N codegen units (bound clang peak memory) (#5391)#5407

Merged
proggeramlug merged 5 commits into
mainfrom
feat/codegen-units
Jun 18, 2026
Merged

codegen: split large modules into N codegen units (bound clang peak memory) (#5391)#5407
proggeramlug merged 5 commits into
mainfrom
feat/codegen-units

Conversation

@proggeramlug

@proggeramlug proggeramlug commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

What

Split large modules into N independently-compiled codegen units so clang -c's peak memory is bounded to ~whole/N instead of OOMing on one giant translation unit. Closes the structural half of the single-TU wall in #5391 (follow-on to the IR-efficiency roadmap #5334).

The 13MB @anthropic-ai/claude-code cli.js lowers to one ~1.1GB / 28.7M-line .ll; clang -c needs ~15GB RSS to compile it — the hard wall that stops the bundle producing a binary. The IR-efficiency levers (#5334 A–D) trimmed ~12%, but the floor is structural: one TU.

How

LlModule::render_codegen_units(n) renders the populated module into N .ll texts:

  • functions bin-packed by estimated byte size (greedy largest-first) so no unit is outsized;
  • the full string-constant + global set carried in every unit with local/bare-external definitions promoted to linkonce_odr (linker keeps one copy; globals are ~1.4% of a large module's IR, so duplication is cheap);
  • each unit gets external declares for every function it calls but doesn't define (deduped by name; existing declarations win), so cross-unit calls resolve; the lone internal init/wrapper is promoted to external;
  • shared attribute groups + metadata replicated so #N/!N references resolve.

linker::compile_units_to_object compiles each unit with clang -c (sequential → peak RSS ≈ one unit) and merges the objects with ld -r into one — so compile_module(...) -> Result<Vec<u8>> and the existing one-object link path are unchanged.

Also fixes a latent duplicate-symbol bug (two minified classes both named j → colliding mangled method symbol, defined twice → LLVM redefinition error). to_ir and the unit renderer now emit each symbol once. This bit the single-TU path too; it only surfaced once a real bundle reached clang.

Gating

PERRY_CODEGEN_UNITS=N forces N (1 disables); otherwise auto by callable count — 1 unit below a floor (default, zero behavior change), then ceil(callables / PERRY_CODEGEN_UNIT_SIZE) capped at 48.

Validation

  • Correctness: a multi-function program with cross-unit calls + shared string/class globals compiles, links, and runs identically at 1, 2, 3, and 4 units (hi world / 20). Unit tests cover the partition (define-once, cross-unit declares, linkonce_odr promotion), size-balancing (a giant function is isolated), dedup (both render paths), and n=1 == to_ir.
  • The memory wall is gone: the cli.js bundle compiled 23/24 units at ~3GB peak clang RSS (vs ~15GB on the single TU) before I stopped it — proving the structural fix.

Known remaining wall (out of scope here)

The bundle has a single 68MB minified IIFE. clang -O0 on one 68MB function is pathologically slow (time, not memory) — the intra-function #4880 problem, which inter-function splitting cannot divide. Size-balancing isolates it (so the other 23 units stay fast + bounded), but producing a full bundle binary needs function-level splitting, tracked separately.

Follow-ups

Declare only referenced symbols per unit (vs all); bounded-parallel unit compiles for wall-clock speedup. Refs #5391.

Summary by CodeRabbit

  • New Features
    • Added configurable LLVM codegen-unit splitting for large modules (with environment overrides), including per-unit compilation merged into a single object and earlier native-representation artifact emission.
    • Improved cross-unit correctness with de-duplicated function symbols and safe promotion of shared globals/strings.
    • Reduced monolithic string initialization via chunked setup (configurable chunk size).
    • Added IR-size estimation for functions and exposed module function count.
  • Tests
    • Expanded coverage for multi-unit packing, cross-unit references, symbol-collision handling, and chunked string initialization behavior.

@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 4238dc44-3bd2-442f-a162-61afa40dd0e8

📥 Commits

Reviewing files that changed from the base of the PR and between 3e4f12e and 05fd8ef.

📒 Files selected for processing (6)
  • crates/perry-codegen/src/codegen/helpers.rs
  • crates/perry-codegen/src/codegen/mod.rs
  • crates/perry-codegen/src/codegen/string_pool.rs
  • crates/perry-codegen/src/function.rs
  • crates/perry-codegen/src/linker.rs
  • crates/perry-codegen/src/module.rs
🚧 Files skipped from review as they are similar to previous changes (5)
  • crates/perry-codegen/src/codegen/helpers.rs
  • crates/perry-codegen/src/codegen/string_pool.rs
  • crates/perry-codegen/src/function.rs
  • crates/perry-codegen/src/module.rs
  • crates/perry-codegen/src/codegen/mod.rs

📝 Walkthrough

Walkthrough

Implements LLVM codegen unit splitting for large Perry modules. A new heuristic (decide_codegen_units) selects unit count based on callable count and env-var overrides. Module IR emission is refactored to partition functions via greedy bin-packing by estimated IR size, with link-safe linkage promotion, cross-unit declare synthesis, and symbol deduplication. String initialization is split into multiple chunk functions to reduce per-unit IR. A new linker function (compile_units_to_object) compiles each unit and merges them via ld -r. The compile_module pipeline is wired to use the multi-unit path when applicable.

Changes

LLVM Codegen Unit Splitting

Layer / File(s) Summary
IR size estimation and unit-count heuristic
crates/perry-codegen/src/function.rs, crates/perry-codegen/src/codegen/helpers.rs
LlFunction::estimated_ir_bytes approximates function IR size by summing per-instruction and per-block overhead; decide_codegen_units selects unit count from callable count with env-var overrides (PERRY_CODEGEN_UNITS, PERRY_CODEGEN_UNIT_SIZE) clamped to [1, 48].
Module IR refactoring for multi-unit safety
crates/perry-codegen/src/module.rs
Adds private helpers for linkage stripping, global/string promotion, external-declare synthesis, and forced external linkage. Introduces deduped_function_refs() to prevent symbol collisions; updates to_ir() to use deduplicated function lists for consistent declare filtering and definition emission across full-module and split paths. Factors setjmp attributes and metadata into push_attrs_and_metadata for reuse. Exposes LlModule::function_count() for unit balancing.
Module unit partitioning with bin-packing
crates/perry-codegen/src/module.rs
Refactors render_codegen_units(n) to partition functions via greedy bin-packing by estimated IR bytes, promotes shared globals and strings to linkonce_odr for safe duplication, synthesizes external declares for cross-unit references with dedup, forces external linkage on internal/private function definitions. Tests cover 2-unit partitioning with cross-unit declares, symbol collision handling, greedy bin-packing isolation of large functions, and single-unit equivalence to to_ir().
Multi-unit object compilation and partial link
crates/perry-codegen/src/linker.rs
compile_units_to_object compiles each unit's IR to a temp .o via compile_ll_to_object, merges via ld -r (configurable via PERRY_LD), returns merged bytes on success, reports link failure with stderr, and conditionally cleans up intermediates based on PERRY_LLVM_KEEP_IR.
String pool initialization chunking
crates/perry-codegen/src/codegen/string_pool.rs
emit_string_pool partitions string initialization into multiple __perry_init_strings_<prefix>_chunk<N> functions (chunk size configurable via PERRY_STRING_INIT_CHUNK_SIZE, default 4000) and updates the main init function to sequentially call each chunk, reducing per-unit monolithic init function IR size.
compile_module pipeline integration
crates/perry-codegen/src/codegen/mod.rs
Moves native-representation artifact write before the early-return branch; computes n_units (forced to 1 for emit_ir_only, otherwise via decide_codegen_units); for n_units > 1 renders units and returns early via compile_units_to_object, leaving the single-unit path as fallback.

Sequence Diagram(s)

sequenceDiagram
  participant compile_module
  participant decide_codegen_units
  participant LlModule
  participant compile_units_to_object
  participant compile_ll_to_object
  participant ld_r as ld -r

  compile_module->>decide_codegen_units: module_callable_count(hir)
  decide_codegen_units-->>compile_module: n_units

  alt n_units > 1
    compile_module->>LlModule: render_codegen_units(n_units)
    LlModule-->>compile_module: Vec<String> unit IR texts
    compile_module->>compile_units_to_object: units, target_triple
    loop each unit
      compile_units_to_object->>compile_ll_to_object: unit IR → .o bytes
      compile_units_to_object->>compile_units_to_object: write /tmp/unit_N.o
    end
    compile_units_to_object->>ld_r: -r unit_0.o unit_1.o …
    ld_r-->>compile_units_to_object: merged object bytes
    compile_units_to_object-->>compile_module: Result<Vec<u8>>
  else n_units == 1
    compile_module->>LlModule: to_ir()
    LlModule-->>compile_module: IR string
    compile_module->>compile_ll_to_object: IR string, target_triple
    compile_ll_to_object-->>compile_module: object bytes
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Possibly related PRs

  • PerryTS/perry#5308: Both PRs change how per-function LLVM symbol names are generated and kept unique to prevent symbol-collision issues during codegen (main PR via cross-unit deduped declare/define emission; retrieved PR via injective scoped_fn_name/__dupN disambiguation).

Poem

🐇 Hop, hop, the IR grows too wide,
So I split it up and let the units glide.
Each bucket bins its bytes with greedy care,
ld -r stitches objects through the air.
No more giant monolith in memory's lair —
Just nimble units, linked without despair! 🌿

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and specifically describes the main change: splitting large modules into N codegen units to bound clang peak memory, directly matching the PR's core objective.
Description check ✅ Passed The PR description covers all required template sections: Summary (Why), Changes (What), Related issue (#5391), Test plan with validation checkboxes, and Checklist. All sections are substantively completed.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/codegen-units

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (1)
crates/perry-codegen/src/function.rs (1)

368-380: ⚡ Quick win

Include entry_post_init_setup in the IR-size estimate.

Line 378 counts entry_allocas, but to_ir() also emits entry_post_init_setup into block 0. Excluding it can underweight init-heavy functions and skew unit balancing.

Suggested patch
     pub fn estimated_ir_bytes(&self) -> usize {
         let body: usize = self
             .blocks
             .iter()
             .map(|b| {
                 b.instructions_iter().map(|i| i.len() + 1).sum::<usize>()
                     + b.label.len()
                     + 4
             })
             .sum();
         let allocas: usize = self.entry_allocas.iter().map(|a| a.len() + 1).sum();
-        body + allocas + self.name.len() + 64
+        let post_init_setup: usize = self.entry_post_init_setup.iter().map(|l| l.len() + 1).sum();
+        body + allocas + post_init_setup + self.name.len() + 64
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/perry-codegen/src/function.rs` around lines 368 - 380, The
estimated_ir_bytes() method calculates IR size for entry_allocas but does not
account for entry_post_init_setup, which is also emitted into block 0 by the
to_ir() function. This causes the size estimate to be inaccurate for init-heavy
functions. Add a calculation for entry_post_init_setup similar to the
entry_allocas calculation (iterating through each element, summing up their
lengths plus 1 for each item), and include this sum in the final body + allocas
+ name.len() + 64 calculation to provide a more complete IR-size estimate.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/perry-codegen/src/codegen/mod.rs`:
- Around line 2521-2531: The n_units decision logic in the codegen units block
only checks emit_ir_only but neglects to check PERRY_SAVE_LL, which according to
the comment should also use the single-text path. Update the condition that sets
n_units to 1 to also check whether PERRY_SAVE_LL is set, so that when the save
operation is enabled, it takes the whole-module text path and doesn't return
early before reaching the save block.

In `@crates/perry-codegen/src/linker.rs`:
- Around line 303-309: The temporary object files (perry_cgu_*.o) created in the
loop starting at the compile_ll_to_object call are not cleaned up when early
failures occur through the ? operator, leaving orphaned files in /tmp. Implement
a cleanup mechanism that ensures all object files in obj_paths are deleted
regardless of whether compilation succeeds or fails. This could be done by using
a scope guard or error handler that guarantees cleanup runs even when ?
operators cause early returns before reaching the intended cleanup block at line
335-340. Make sure the cleanup logic removes all accumulated object files from
obj_paths whenever the function exits due to error.
- Around line 312-318: The linker selection in the partial linking step (where
Command::new(&ld) is created) currently ignores target_triple and always uses
the host linker via PERRY_LD environment variable or hardcoded "ld" string. This
fails for cross-compilation targets like MSVC/COFF or non-GNU targets. Replace
the hardcoded linker selection logic by introducing target-aware selection logic
that examines target_triple (the same parameter used for per-unit compilation)
and selects the appropriate linker tool (e.g., lld-link for COFF targets,
link.exe for MSVC, or target-specific ld variants). Include a clear error or
diagnostic when the target is not supported or no compatible partial linker is
available.

In `@crates/perry-codegen/src/module.rs`:
- Around line 549-555: The test in the render_codegen_units_dedup function
currently does not validate the multi-unit deduplication path because after
deduplication, all definitions are collapsed into a single unit rather than
being distributed across multiple units. Modify the test setup to create a
scenario where the duplicate symbol perry_method_j__foo appears across multiple
codegen units (not consolidated into one), so that the assertion at the end
actually exercises and validates the multi-unit deduplication behavior instead
of only testing the single-unit case.

---

Nitpick comments:
In `@crates/perry-codegen/src/function.rs`:
- Around line 368-380: The estimated_ir_bytes() method calculates IR size for
entry_allocas but does not account for entry_post_init_setup, which is also
emitted into block 0 by the to_ir() function. This causes the size estimate to
be inaccurate for init-heavy functions. Add a calculation for
entry_post_init_setup similar to the entry_allocas calculation (iterating
through each element, summing up their lengths plus 1 for each item), and
include this sum in the final body + allocas + name.len() + 64 calculation to
provide a more complete IR-size estimate.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: df5fb6c3-227d-4f6f-aea7-424c8f3b5224

📥 Commits

Reviewing files that changed from the base of the PR and between 54ebb6c and a8b8f57.

📒 Files selected for processing (5)
  • crates/perry-codegen/src/codegen/helpers.rs
  • crates/perry-codegen/src/codegen/mod.rs
  • crates/perry-codegen/src/function.rs
  • crates/perry-codegen/src/linker.rs
  • crates/perry-codegen/src/module.rs

Comment thread crates/perry-codegen/src/codegen/mod.rs
Comment thread crates/perry-codegen/src/linker.rs Outdated
Comment on lines +312 to +318
let combined = tmp_dir.join(format!("perry_cgu_{}_{}_combined.o", pid, nonce));
let ld = env::var("PERRY_LD").unwrap_or_else(|_| "ld".to_string());
let mut cmd = Command::new(&ld);
cmd.arg("-r").arg("-o").arg(&combined);
for p in &obj_paths {
cmd.arg(p);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Inspect whether the split partial-link path propagates target/toolchain selection.
# Expected: current output shows the per-unit clang path receives `target_triple`,
# while the partial-link command is always `PERRY_LD`/`ld -r`.
rg -n -C3 'compile_units_to_object|PERRY_LD|arg\("-r"\)|target_triple' --type=rust

Repository: PerryTS/perry

Length of output: 50369


🏁 Script executed:

sed -n '200,400p' crates/perry-codegen/src/linker.rs | cat -n

Repository: PerryTS/perry

Length of output: 9679


🏁 Script executed:

rg 'lld|link\.exe|arm.*ld|aarch64.*ld' crates/perry-codegen/src/linker.rs

Repository: PerryTS/perry

Length of output: 320


Route the partial linker through target-aware selection logic.

The per-unit compilation correctly passes target_triple to clang -c (line 104), but the subsequent merge at lines 313–318 always invokes host ld -r via PERRY_LD or hardcoded "ld". When compiling for MSVC/COFF or cross-targets (e.g., host=Linux, target=Windows), the resulting COFF objects cannot be merged with GNU ld, causing link failure or producing incompatible output.

Select the linker based on target_triple to use the correct tool (lld-link, link.exe, or target-specific ld variants), or disable splitting with a clear diagnostic when no compatible partial linker is available.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/perry-codegen/src/linker.rs` around lines 312 - 318, The linker
selection in the partial linking step (where Command::new(&ld) is created)
currently ignores target_triple and always uses the host linker via PERRY_LD
environment variable or hardcoded "ld" string. This fails for cross-compilation
targets like MSVC/COFF or non-GNU targets. Replace the hardcoded linker
selection logic by introducing target-aware selection logic that examines
target_triple (the same parameter used for per-unit compilation) and selects the
appropriate linker tool (e.g., lld-link for COFF targets, link.exe for MSVC, or
target-specific ld variants). Include a clear error or diagnostic when the
target is not supported or no compatible partial linker is available.

Comment thread crates/perry-codegen/src/module.rs
proggeramlug pushed a commit that referenced this pull request Jun 18, 2026
A large bundle interns ~190K strings, and emitting every string's init
(from-bytes + nanbox + global store + GC-root register) into one
`__perry_init_strings` block produced a single ~68MB function. clang -O0
(forced for oversized modules, #4880) can't compile a function that large
in practical time — its per-function passes are superlinear in size — so
this was the residual wall after codegen-unit splitting removed the module
MEMORY wall: a single giant FUNCTION that inter-function splitting can't
divide.

The string inits are independent (each handle is stored to its own global;
no SSA value flows between iterations), so emit them into chunk functions
of PERRY_STRING_INIT_CHUNK_SIZE entries (default 4000) and call those in
sequence from `__perry_init_strings`. The ~48 chunks are each ~1.4MB and,
combined with codegen-unit splitting (#5407), bin-pack evenly across units
instead of one unit carrying the 68MB monolith. Order is preserved (chunks
called in sequence, ops in order within a chunk), so GC-root registration
and any ordering stay identical.

Behavior-neutral for ordinary modules (a handful of strings → one chunk).
Full codegen suite green.
@proggeramlug

Copy link
Copy Markdown
Contributor Author

Update: + string-pool init splitting (function-level)

Pushed a 4th commit that splits the generated string-pool initializer. On the bundle that function was a single 68MB monolith (190K independent string inits in one block) — clang -O0 (forced for oversized modules, #4880) is superlinear in single-function size, so even after codegen-unit splitting removed the module memory wall, that one function was an unbounded time wall. The inits are independent (each handle stored to its own global; no SSA flows between iterations), so they now emit as ~48 chunk functions (PERRY_STRING_INIT_CHUNK_SIZE, default 4000) called in sequence — bin-packing evenly across units. Validated: correct runtime output with multiple chunks + CGU split; behavior-neutral for ordinary modules.

Empirical status on the cli.js bundle

This PR is a correct, self-contained structural win on its own: it removes the module memory wall and the worst single (generated) function, both proven.

proggeramlug added a commit that referenced this pull request Jun 18, 2026
… modules (#5391 path 2) (#5410)

Extends the #5334 lever-B full-outline from field-SET to field-GET. After
codegen-unit splitting (#5407) removed the module memory wall and the
string-init split removed the 68MB generated function, the residual wall is
the bundle's large minified USER functions (7-18MB closures): clang -O0
time is superlinear in single-function size. Full-outlining the IC diamonds
shrinks those functions at their source.

When full-outline is enabled (size-gated), the class-field-GET diamond
(inline precheck + guard call + fast slot load + by-name fallback + phi)
collapses to one `js_class_field_get_ic(...)` call returning the field
value. The runtime helper runs the same guard, reads the field slot as f64
on a pass (a plain number is self-boxing in nan-boxing, so raw-f64 and
boxed slots read identically — matching the inline `load double`), and
records + reads by name on a miss. The outline drops the inline path's
static raw-number type hint (result treated as a general JS value), which
is value-correct.

class-field GET is the largest remaining diamond kind on the bundle (~39K
sites). Gated by the existing PERRY_FULL_OUTLINE_IC; default off = unchanged.
Unit test covers both gate states; full codegen suite green.

Co-authored-by: Ralph Küpper <ralph2@skelpo.com>
proggeramlug pushed a commit that referenced this pull request Jun 18, 2026
…ERRY_SAVE_LL; clean temp objects on failure (#5407)
@proggeramlug

Copy link
Copy Markdown
Contributor Author

Thanks — addressed the review:

  • Major (cross-target ld -r): codegen-unit splitting now gates off for any explicit --target (and for emit_ir_only/PERRY_SAVE_LL), falling back to a single TU, since the host ld -r only merges host-format objects. Host builds (the large-bundle case) still split.
  • PERRY_SAVE_LL skip: same gate now disables splitting when it's set, so the whole-module dump works.
  • Temp leak on failure: the unit objects are tracked and cleaned on every exit path (extracted an inner worker so cleanup runs even on a mid-sequence ?).
  • Weak test note: the dup-symbol test deliberately validates the dedup (one symbol → one unit); the genuine cross-unit define/declare split is covered by render_codegen_units_partitions_and_links (two distinct functions → two units).

proggeramlug pushed a commit that referenced this pull request Jun 18, 2026
A large bundle interns ~190K strings, and emitting every string's init
(from-bytes + nanbox + global store + GC-root register) into one
`__perry_init_strings` block produced a single ~68MB function. clang -O0
(forced for oversized modules, #4880) can't compile a function that large
in practical time — its per-function passes are superlinear in size — so
this was the residual wall after codegen-unit splitting removed the module
MEMORY wall: a single giant FUNCTION that inter-function splitting can't
divide.

The string inits are independent (each handle is stored to its own global;
no SSA value flows between iterations), so emit them into chunk functions
of PERRY_STRING_INIT_CHUNK_SIZE entries (default 4000) and call those in
sequence from `__perry_init_strings`. The ~48 chunks are each ~1.4MB and,
combined with codegen-unit splitting (#5407), bin-pack evenly across units
instead of one unit carrying the 68MB monolith. Order is preserved (chunks
called in sequence, ops in order within a chunk), so GC-root registration
and any ordering stay identical.

Behavior-neutral for ordinary modules (a handful of strings → one chunk).
Full codegen suite green.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/perry-codegen/src/codegen/mod.rs`:
- Around line 2535-2538: The n_units decision logic at the assignment of n_units
should check three conditions to use single-unit mode (n_units = 1) instead of
deciding based on module_callable_count: the existing opts.emit_ir_only check, a
new check for the PERRY_SAVE_LL environment variable being set, and a
cross-compilation check for when opts.target.is_none(). Modify the condition so
that if any of these three conditions are true, set n_units to 1, otherwise call
decide_codegen_units with the hir. This ensures that PERRY_SAVE_LL paths reach
the save logic at line 2558 and that cross-compilation without explicit target
override uses the single linker invocation path.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 6d540f14-6d75-442e-ae98-a1863bfae08e

📥 Commits

Reviewing files that changed from the base of the PR and between 6b63b2b and 3e4f12e.

📒 Files selected for processing (6)
  • crates/perry-codegen/src/codegen/helpers.rs
  • crates/perry-codegen/src/codegen/mod.rs
  • crates/perry-codegen/src/codegen/string_pool.rs
  • crates/perry-codegen/src/function.rs
  • crates/perry-codegen/src/linker.rs
  • crates/perry-codegen/src/module.rs
🚧 Files skipped from review as they are similar to previous changes (5)
  • crates/perry-codegen/src/codegen/helpers.rs
  • crates/perry-codegen/src/linker.rs
  • crates/perry-codegen/src/function.rs
  • crates/perry-codegen/src/module.rs
  • crates/perry-codegen/src/codegen/string_pool.rs

Comment on lines +2535 to +2538
let n_units = if opts.emit_ir_only {
1
} else {
decide_codegen_units(module_callable_count(hir))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if cross-compilation gating is implemented elsewhere
rg -n "PERRY_CODEGEN_UNITS|render_codegen_units|compile_units_to_object" --type rust -C 3 | head -100

Repository: PerryTS/perry

Length of output: 7863


🏁 Script executed:

# Check the full context around the n_units decision in codegen/mod.rs
sed -n '2520,2560p' crates/perry-codegen/src/codegen/mod.rs

Repository: PerryTS/perry

Length of output: 1664


🏁 Script executed:

# Check the full compile_units_to_object implementation in linker.rs
sed -n '280,320p' crates/perry-codegen/src/linker.rs

Repository: PerryTS/perry

Length of output: 1884


🏁 Script executed:

# Look for any environment variable checks related to PERRY_SAVE_LL or cross-compilation
rg -n "PERRY_SAVE_LL|is_cross|target_triple" crates/perry-codegen/src/ --type rust

Repository: PerryTS/perry

Length of output: 4435


🏁 Script executed:

# Check git log for PR references to cross-compilation and codegen units
git log --oneline --all --grep="5391\|cross-compil\|codegen.*unit" 2>/dev/null | head -20

Repository: PerryTS/perry

Length of output: 914


🏁 Script executed:

# Check if there are any comments in the code mentioning cross-compilation + codegen splitting
rg -n "cross.*compil|target.*arch" crates/perry-codegen/src/codegen/mod.rs crates/perry-codegen/src/linker.rs

Repository: PerryTS/perry

Length of output: 619


🏁 Script executed:

# Check commit messages for `#5391` to see if cross-compilation gating was mentioned
git log --all --format="%B" --grep="5391" 2>/dev/null | head -100

Repository: PerryTS/perry

Length of output: 5531


🏁 Script executed:

# Look for FIXME/TODO comments about cross-compilation in codegen units code
rg -n "FIXME|TODO|XXX" crates/perry-codegen/src/codegen/mod.rs crates/perry-codegen/src/linker.rs | grep -i "cross\|target\|compil\|unit"

Repository: PerryTS/perry

Length of output: 39


🏁 Script executed:

# Check if ld -r command gets any target-specific flags when cross-compiling
sed -n '310,340p' crates/perry-codegen/src/linker.rs

Repository: PerryTS/perry

Length of output: 1075


Add PERRY_SAVE_LL check to n_units decision logic.

The code comment at line 2532 states that both emit_ir_only and PERRY_SAVE_LL should use the single-text path, but the n_units decision at lines 2535–2537 only checks emit_ir_only. When PERRY_SAVE_LL is set, the code incorrectly proceeds to codegen splitting instead of taking the single-text path where PERRY_SAVE_LL is actually handled (line 2558). This causes the directory to be never populated because the split path returns early without reaching the PERRY_SAVE_LL save logic.

Additionally, cross-compilation with splitting should be gated: when cross-compiling (no --target flag override from host), ld -r at line ~320 uses the host linker by default, which may not support the target object format. Pass opts.target.is_some() into the n_units decision to force single-unit mode.

🔧 Proposed fix
+    let save_ll = std::env::var_os("PERRY_SAVE_LL").is_some();
+    let is_cross_compile = opts.target.is_some();
-    let n_units = if opts.emit_ir_only {
+    let n_units = if opts.emit_ir_only || save_ll || is_cross_compile {
         1
     } else {
         decide_codegen_units(module_callable_count(hir))
     };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/perry-codegen/src/codegen/mod.rs` around lines 2535 - 2538, The
n_units decision logic at the assignment of n_units should check three
conditions to use single-unit mode (n_units = 1) instead of deciding based on
module_callable_count: the existing opts.emit_ir_only check, a new check for the
PERRY_SAVE_LL environment variable being set, and a cross-compilation check for
when opts.target.is_none(). Modify the condition so that if any of these three
conditions are true, set n_units to 1, otherwise call decide_codegen_units with
the hir. This ensures that PERRY_SAVE_LL paths reach the save logic at line 2558
and that cross-compilation without explicit target override uses the single
linker invocation path.

Ralph Küpper added 5 commits June 18, 2026 21:03
A single LLVM translation unit per module makes clang OOM on large inputs
(a 13MB minified bundle lowers to one ~1.1GB .ll; clang -c needs ~15GB
RSS). This adds codegen-unit splitting: the populated LlModule renders to
N independent .ll texts, each compiled by clang -c separately (peak RSS
~whole/N) and merged with `ld -r` into one object — so compile_module's
single-Vec<u8> contract and the existing one-object link path are unchanged.

Mechanism (LlModule::render_codegen_units):
- functions partitioned into N contiguous buckets;
- the full string-constant + global set carried in every unit with local
  and bare-external DEFINITIONS promoted to linkonce_odr (linker keeps one
  copy; globals are ~1.4% of a large module's IR, so duplication is cheap);
- each unit gets external declares for every function it calls but does not
  define (deduped by name; existing declarations win), so cross-unit calls
  resolve; the lone internal init/wrapper is promoted to external;
- shared attribute groups + metadata replicated so #N/!N references resolve.

Gating (decide_codegen_units): PERRY_CODEGEN_UNITS=N forces N (1 disables);
otherwise auto by callable count — 1 unit below a floor, then
ceil(callables / PERRY_CODEGEN_UNIT_SIZE) capped at 48. Default is 1 unit
for ordinary modules — zero behavior change.

Validated: a 2-unit split of a multi-function program with cross-unit calls
+ shared string/class globals compiles, links, and runs identically to the
1-unit build (`hi world` / 20). Unit tests cover the partition (define-once,
cross-unit declares, linkonce_odr promotion) and the n=1 == to_ir identity.

Follow-on to the IR-efficiency roadmap #5334; tracks #5391.
Minified bundles can contain two distinct classes that sanitize to the
same name (e.g. two classes `j`), so their mangled method symbols collide
(`perry_method_..._j__getElementsByTagName` defined twice). LLVM rejects
the redefinition and the whole module fails to compile — independent of
codegen-unit splitting (the single-TU `to_ir` had the same latent bug; it
only surfaced once a real bundle reached clang).

Add `deduped_function_refs` (first occurrence per name) and route both
`to_ir` and `render_codegen_units` through it, so each symbol is emitted
once. Calls to the duplicate resolve to the first definition — a dispatch
ambiguity limited to genuinely name-colliding members; proper
disambiguation by class id is a separate concern. No-op for ordinary
modules (no duplicate names). Unit test covers both render paths.
Contiguous chunk-by-function-count partitioning clumped functions
unevenly: on the cli.js bundle one unit ballooned to 139MB (it caught the
biggest functions) while others were ~50MB, and clang -O0 time is
superlinear in unit size, so that one unit dominated the whole compile.

Switch to greedy largest-first bin-packing by `LlFunction::estimated_ir_bytes`
(a cheap instruction-byte sum, no second render): each function is placed
into the currently-smallest unit, so big functions are isolated and the
rest stay even. This bounds the max unit size to ~total/N + the single
largest function.

NB: a function larger than total/N is irreducible by inter-function
splitting — the cli.js bundle has a single 68MB minified IIFE whose clang
-O0 time is the residual bottleneck (the intra-function #4880 problem,
distinct from the module memory wall this feature removes). Balancing
isolates it so the other units compile fast and in bounded memory.

Unit test: a giant function + several tiny ones splits so the giant is not
clumped with the small set.
A large bundle interns ~190K strings, and emitting every string's init
(from-bytes + nanbox + global store + GC-root register) into one
`__perry_init_strings` block produced a single ~68MB function. clang -O0
(forced for oversized modules, #4880) can't compile a function that large
in practical time — its per-function passes are superlinear in size — so
this was the residual wall after codegen-unit splitting removed the module
MEMORY wall: a single giant FUNCTION that inter-function splitting can't
divide.

The string inits are independent (each handle is stored to its own global;
no SSA value flows between iterations), so emit them into chunk functions
of PERRY_STRING_INIT_CHUNK_SIZE entries (default 4000) and call those in
sequence from `__perry_init_strings`. The ~48 chunks are each ~1.4MB and,
combined with codegen-unit splitting (#5407), bin-pack evenly across units
instead of one unit carrying the 68MB monolith. Order is preserved (chunks
called in sequence, ops in order within a chunk), so GC-root registration
and any ordering stay identical.

Behavior-neutral for ordinary modules (a handful of strings → one chunk).
Full codegen suite green.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant