
perf(codegen): chunk the ENTIRE module-init function, not just the string loop (#5391)#5418

Merged
proggeramlug merged 1 commit into main from feat/chunk-full-init
Jun 19, 2026
@proggeramlug proggeramlug commented Jun 18, 2026

Contributor

Problem

#5407 split the string-allocation loop of __perry_init_strings_<prefix> into chunk functions. But that function also emits every closure/class/function registration call (js_register_closure_length, js_register_function_source, js_register_class_id, js_build_class_keys_array, …), and those ~250K calls were still emitted into the function's single basic block.

On a large bundle that left __perry_init_strings as one ~32MB basic block of ~405K instructions (55K register_closure_length + 55K register_closure_strict + 55K register_closure_arity + 43K register_function_source + …). clang -O0 — forced for oversized modules (#4880) — is catastrophically superlinear on a single huge basic block: this one function alone took ~36 minutes to compile, dominating the entire build and defeating the codegen-unit memory win (it's one indivisible function, so unit-splitting can't touch it).

Empirically isolated with synthetic clang -c -O0 timings: a 10.7MB function consisting of a single basic block took 15s to compile, with time climbing superlinearly in size, whereas the same byte size emitted as branchy functions took ~2s. The init function is the pathological single-block case.
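For readers who want to reproduce this kind of comparison, here is a minimal sketch of a generator for the two synthetic shapes. It is not the harness used for the numbers above. The names straight_line_c, branchy_c, sink, and gate are hypothetical, and the two outputs are only roughly comparable in size, not byte-identical.

```rust
// Hypothetical generator for a synthetic timing experiment; this is an
// illustration, not the harness behind the measurements quoted in the PR.

/// One function whose body is a long run of calls with no branches,
/// so at -O0 the whole body lands in a single basic block.
fn straight_line_c(n_calls: usize) -> String {
    let mut src = String::from("void sink(long);\n\nvoid init(void) {\n");
    for i in 0..n_calls {
        src.push_str(&format!("    sink({}L);\n", i));
    }
    src.push_str("}\n");
    src
}

/// The same calls, each guarded by a volatile flag, so every call
/// introduces a branch and the body splits into many small blocks.
fn branchy_c(n_calls: usize) -> String {
    let mut src = String::from(
        "void sink(long);\nextern volatile int gate;\n\nvoid init(void) {\n",
    );
    for i in 0..n_calls {
        src.push_str(&format!("    if (gate) sink({}L);\n", i));
    }
    src.push_str("}\n");
    src
}
```

Writing each string to a .c file and timing clang -c -O0 on it at increasing n_calls would be one way to observe how compile time scales with block size. No particular timings are implied here.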

Fix

Introduce InitChunker and route all init operations — string allocation AND every one of the 21 registration loops — through it, spilling them into a sequence of small __perry_init_strings_<prefix>_chunkN functions (default 4000 ops/chunk, PERRY_STRING_INIT_CHUNK_SIZE). The entry function just calls the chunks in order.

Every init op is independent — each writes its own global or a runtime registry; no SSA value flows between ops — so chunking at op boundaries is safe and order-preserving (chunks run in sequence; ops run in order within a chunk).
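The chunking scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the code in string_pool.rs. The method names new, roll_if_full, current_block, and finish come from this PR, but their signatures, the textual-IR representation, the emit_demo helper, and the hypothetical_register callee are all invented for the example.

```rust
// Hypothetical sketch: method names mirror the PR, everything else
// (signatures, fields, IR-as-text) is assumed for illustration.
struct InitChunker {
    prefix: String,
    limit: usize,             // max ops per chunk
    chunks: Vec<Vec<String>>, // one list of IR lines per chunk function
}

impl InitChunker {
    fn new(prefix: &str, limit: usize) -> Self {
        InitChunker {
            prefix: prefix.to_string(),
            limit: limit.max(1), // guard against a zero limit
            chunks: vec![Vec::new()],
        }
    }

    /// Open a fresh chunk once the current one holds `limit` ops.
    fn roll_if_full(&mut self) {
        let full = self.chunks.last().map_or(true, |c| c.len() >= self.limit);
        if full {
            self.chunks.push(Vec::new());
        }
    }

    /// The chunk that the next op should be appended to.
    fn current_block(&mut self) -> &mut Vec<String> {
        self.chunks.last_mut().expect("chunker always holds one chunk")
    }

    /// Emit every chunk as its own function, each ending in `ret void`,
    /// then an entry function that calls the chunks in order.
    fn finish(self) -> String {
        let mut ir = String::new();
        let mut calls = String::new();
        for (i, ops) in self.chunks.iter().enumerate() {
            let name = format!("__perry_init_strings_{}_chunk{}", self.prefix, i);
            ir.push_str(&format!("define internal void @{}() {{\nentry:\n", name));
            for op in ops {
                ir.push_str(&format!("  {}\n", op));
            }
            ir.push_str("  ret void\n}\n\n");
            calls.push_str(&format!("  call void @{}()\n", name));
        }
        ir.push_str(&format!(
            "define void @__perry_init_strings_{}() {{\nentry:\n{}  ret void\n}}\n",
            self.prefix, calls
        ));
        ir
    }
}

/// Hypothetical driver: each op is one independent call, so a chunk
/// boundary may fall between any two ops without changing their order.
fn emit_demo(n_ops: usize, limit: usize) -> String {
    let mut chunker = InitChunker::new("demo", limit);
    for i in 0..n_ops {
        chunker.roll_if_full();
        chunker
            .current_block()
            .push(format!("call void @hypothetical_register(i64 {})", i));
    }
    chunker.finish()
}
```

In this sketch, emit_demo(7, 3) produces three chunk functions holding 3, 3, and 1 ops, plus an entry function with three calls. Because no value flows from one op to the next, the only property a chunk boundary has to preserve is execution order, which the sequential calls in the entry function provide.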

Result (measured on a 13MB bundle)

  • __perry_init_strings main function: 32MB → 0.01MB (now 104 chunk calls); 104 *_chunkN functions of ~1.4MB each.
  • The previously ~36-min single-block compile is eliminated; the chunks are ordinary small functions clang compiles in milliseconds and bin-pack evenly across codegen units.
  • E2E: a program exercising classes + closures + full-outline still runs correctly (registration order preserved across chunks).

Follow-up to #5407 (same #5391 effort).

Summary by CodeRabbit

  • Refactor
    • Optimized the initialization sequence generation to improve performance when handling large-scale code compilation tasks.
    • Consolidated internal initialization logic into a more efficient and maintainable structure.

… just strings) (#5391)

The string loop was chunked, but the ~250K closure/class/function registration
calls were still dumped into the single __perry_init_strings block (405K instrs, one
basic block, ~32MB), and clang -O0 is catastrophically superlinear on a single
huge block (~36 min). Route ALL init ops — string allocation AND every
registration loop — through InitChunker, which spills them into small *_chunkN
functions called in order from the entry. All ops are independent (each writes
its own global / a runtime registry; no SSA crosses), so chunking is safe and
order-preserving.

coderabbitai Bot commented Jun 18, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: bea3b70a-4c0a-45e9-a609-84bcbd591310

📥 Commits

Reviewing files that changed from the base of the PR and between 0cc6ee1 and 99d838d.

📒 Files selected for processing (1)
  • crates/perry-codegen/src/codegen/string_pool.rs

📝 Walkthrough


Introduces a new private InitChunker struct in string_pool.rs that creates sequential *_chunkN LLVM functions, controls rollover via a configurable op-count limit, and emits a ret void terminator per chunk. All registration phases in emit_string_pool are migrated from manual per-string chunk-splitting to InitChunker, and the entry function __perry_init_strings_<prefix> is built by calling each chunk in sequence.

Changes

InitChunker-based chunk splitting in emit_string_pool

| Layer / File(s) | Summary |
| --- | --- |
| InitChunker struct and API (crates/perry-codegen/src/codegen/string_pool.rs) | Adds the private InitChunker struct with new, roll_if_full, current_block, and finish methods. Chunk rollover threshold is read from PERRY_STRING_INIT_CHUNK_SIZE (default 4000); each chunk is a new LLVM function terminated with ret void. |
| String pool and all registration phase migrations (crates/perry-codegen/src/codegen/string_pool.rs) | Removes the prior bespoke string_chunk_names/cur_idx logic from the string allocation loop and replaces it with InitChunker. Applies roll_if_full + current_block across every remaining registration phase (display names, function sources, class keys/parents/methods/static methods/constructors/ids/names, getters/setters, closure/wrapper arity/length/flags, and async/generator/strict variants). Ends by calling chunker.finish() and wiring each chunk into the __perry_init_strings_<prefix> entry function. |
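As a rough illustration of the configurable threshold, the sketch below reads PERRY_STRING_INIT_CHUNK_SIZE and falls back to 4000. Only the variable name and the default come from this PR. The helper names, and the choice to treat zero or unparsable values as "use the default", are assumptions made for the example.

```rust
use std::env;

// Default taken from the PR description; the rest of this block is a
// hypothetical sketch, not the code in string_pool.rs.
const DEFAULT_CHUNK_SIZE: usize = 4000;

/// Parse a raw setting. Assumption: missing, non-numeric, or zero
/// values all fall back to the default rather than raising an error.
fn parse_chunk_size(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(DEFAULT_CHUNK_SIZE)
}

/// Read the threshold from the environment variable named in the PR.
fn chunk_size_from_env() -> usize {
    parse_chunk_size(env::var("PERRY_STRING_INIT_CHUNK_SIZE").ok().as_deref())
}
```

Keeping the parsing separate from the environment lookup makes the fallback behavior easy to check without mutating process-wide state.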

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • PerryTS/perry#5407: Directly overlaps — also refactors emit_string_pool in string_pool.rs to split the monolithic __perry_init_strings_<prefix> initializer into *_chunkN helper functions using the same chunking mechanism.

Poem

🐇 Hop, hop, the chunks are small,
No giant init to stall the hall!
roll_if_full, a tidy trick,
Each chunkN built with a click.
The rabbit cheers — no stack overflow,
Just neat little functions, all in a row! 🎉

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: chunking the entire module-init function, expanding prior work that only chunked the string loop. |
| Description check | ✅ Passed | The PR description is comprehensive, covering the problem, fix, and measured results. However, the Test plan checklist items are not marked as completed, which is a template requirement. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@proggeramlug proggeramlug merged commit 73d1fed into main Jun 19, 2026
15 checks passed
@proggeramlug proggeramlug deleted the feat/chunk-full-init branch June 19, 2026 03:17