feat(compile): per-app feature gating for regex/Temporal/URL/normalize/segmenter (binary size)#5153
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (36)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (32)
📝 WalkthroughWalkthroughIntroduces five Cargo feature flags ( ChangesOptional runtime subsystem feature gating
🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/perry-runtime/src/object/global_this.rs (1)
5005-5014:⚠️ Potential issue | 🟠 MajorVerify that
Temporalis conditionally excluded fromglobalThiswhen the temporal feature is disabled.When
feature = "temporal"is disabled,"Temporal"remains in theGLOBAL_THIS_BUILTIN_NAMESPACESarray (line 178 ofcrates/perry-runtime/src/object/global_this_tables.rs). The namespace-installation loop at line 4968 still processes it, but the cfg-gated match arm at lines 5005–5009 is compiled out, leaving the default_ => {}handler. This installs an empty namespace object onglobalThis.Temporalinstead of omitting it entirely.Either:
- Move the cfg guard to
GLOBAL_THIS_BUILTIN_NAMESPACESitself (conditionally include"Temporal"only when enabled), or- Ensure the match arm handles the feature-disabled case explicitly (e.g.,
#[cfg(not(feature = "temporal"))] "Temporal" => skip_or_error)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/perry-runtime/src/object/global_this.rs` around lines 5005 - 5014, The `"Temporal"` entry in the `GLOBAL_THIS_BUILTIN_NAMESPACES` array is processed unconditionally during namespace installation, but the corresponding match arm for `"Temporal"` (which calls `install_temporal_namespace`) is gated behind `#[cfg(feature = "temporal")]`. When the temporal feature is disabled, this match arm is compiled out, causing the default case `_ => {}` to install an empty namespace object instead of excluding it. To fix this, either conditionally include `"Temporal"` in the `GLOBAL_THIS_BUILTIN_NAMESPACES` array itself by wrapping its declaration with `#[cfg(feature = "temporal")]`, or add an explicit feature-gated match arm for the `"Temporal"` case that handles both the enabled case (with `install_temporal_namespace`) and the disabled case (with an appropriate skip or error handler).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@Cargo.toml`:
- Around line 286-296: Increment the patch version in the [workspace.package]
section of Cargo.toml from 0.5.1168 to 0.5.1169 to reflect the change made to
the perry-runtime dependency configuration, as required by the repository rule
for all modifications landing on main.
In `@crates/perry-runtime/src/intl.rs`:
- Around line 640-651: The fallback implementation in the intl-segmenter feature
gate is passing None for the word_like parameter to make_segment_record on every
call, causing the isWordLike property to be missing from segment records when
word granularity is used. Instead of always passing None, check if the
granularity is "word" and if so, determine whether the segment is word-like
(typically by checking if it contains word characters) and pass that computed
value to make_segment_record; for other granularities, continue passing None to
maintain consistency with the feature-enabled path.
In `@crates/perry/src/commands/compile/collect_modules.rs`:
- Around line 1870-1911: The current implementation in the Temporal, URL, and
Segmenter detectors uses overly broad substring matching on the HIR debug
output, which can trigger false positives and unnecessarily enable heavy runtime
features. Replace the generic substring checks (such as "Temporal", "Url"/"URL",
and "Segmenter") with more targeted detection logic that matches
feature-specific HIR tokens instead. Consider either identifying unique HIR
token patterns specific to each feature or implementing a targeted walk through
the HIR structure (hir_module.init and hir_module.functions) rather than relying
on substring matches of the entire debug format string.
---
Outside diff comments:
In `@crates/perry-runtime/src/object/global_this.rs`:
- Around line 5005-5014: The `"Temporal"` entry in the
`GLOBAL_THIS_BUILTIN_NAMESPACES` array is processed unconditionally during
namespace installation, but the corresponding match arm for `"Temporal"` (which
calls `install_temporal_namespace`) is gated behind `#[cfg(feature =
"temporal")]`. When the temporal feature is disabled, this match arm is compiled
out, causing the default case `_ => {}` to install an empty namespace object
instead of excluding it. To fix this, either conditionally include `"Temporal"`
in the `GLOBAL_THIS_BUILTIN_NAMESPACES` array itself by wrapping its declaration
with `#[cfg(feature = "temporal")]`, or add an explicit feature-gated match arm
for the `"Temporal"` case that handles both the enabled case (with
`install_temporal_namespace`) and the disabled case (with an appropriate skip or
error handler).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 7398d1a8-bb88-4997-9c66-ef712a612cdd
📒 Files selected for processing (35)
Cargo.tomlcrates/perry-runtime/Cargo.tomlcrates/perry-runtime/src/builtins/arithmetic.rscrates/perry-runtime/src/builtins/formatting.rscrates/perry-runtime/src/builtins/globals.rscrates/perry-runtime/src/date.rscrates/perry-runtime/src/fs/dir_glob_watch.rscrates/perry-runtime/src/intl.rscrates/perry-runtime/src/json/stringify.rscrates/perry-runtime/src/object/assert.rscrates/perry-runtime/src/object/class_registry.rscrates/perry-runtime/src/object/field_get_set.rscrates/perry-runtime/src/object/global_this.rscrates/perry-runtime/src/object/iterator_prototypes.rscrates/perry-runtime/src/object/mod.rscrates/perry-runtime/src/object/native_call_method.rscrates/perry-runtime/src/object/object_ops.rscrates/perry-runtime/src/object/regex_proto_thunks.rscrates/perry-runtime/src/path.rscrates/perry-runtime/src/regex.rscrates/perry-runtime/src/regex/match_all.rscrates/perry-runtime/src/regex/replace_fn.rscrates/perry-runtime/src/string/compare.rscrates/perry-runtime/src/string/mod.rscrates/perry-runtime/src/string/split.rscrates/perry-runtime/src/symbol.rscrates/perry-runtime/src/temporal/mod.rscrates/perry-runtime/src/url/mod.rscrates/perry-runtime/src/url/node_compat.rscrates/perry-runtime/src/url/url_class.rscrates/perry-runtime/src/value/dyn_index.rscrates/perry-runtime/src/value/to_string.rscrates/perry/src/commands/compile/collect_modules.rscrates/perry/src/commands/compile/optimized_libs.rscrates/perry/src/commands/compile/types.rs
5944d40 to
3241f2a
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/perry-runtime/src/regex.rs (1)
421-440:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winStore an owned pattern header here too.
js_regexp_newnow materializesflags_ptr, butpattern_ptrstill points at the caller'sStringHeader. The fancy-regex path later dereferences(*re).pattern_ptrinlookup_fancy_regex, so a RegExp built from a temporary pattern can still read freed memory on the next exec/search/match/replace/split. Either allocate a stablepattern_ptrhere as well, or switch the fancy-cache lookup toREGEX_SOURCE_TABLE.💡 Suggested fix
- let canonical_flags_ptr = js_string_from_str(flags_str); + let owned_pattern_ptr = js_string_from_str(pattern_str); + let canonical_flags_ptr = js_string_from_str(flags_str); @@ - (*ptr).pattern_ptr = pattern; + (*ptr).pattern_ptr = owned_pattern_ptr; (*ptr).flags_ptr = canonical_flags_ptr;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/perry-runtime/src/regex.rs` around lines 421 - 440, The issue is that in the `js_regexp_new` function, the `pattern_ptr` is stored directly from the caller's temporary input without materializing a stable, owned copy (unlike `canonical_flags_ptr` which is properly materialized). This causes use-after-free when `lookup_fancy_regex` later dereferences `(*re).pattern_ptr`. Materialize a stable pattern header by calling `js_string_from_str` on the pattern string (similar to how `canonical_flags_ptr` is created from `flags_str`), and then store that stable pointer in `(*ptr).pattern_ptr` instead of storing the caller's temporary pattern directly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@crates/perry-runtime/src/regex.rs`:
- Around line 421-440: The issue is that in the `js_regexp_new` function, the
`pattern_ptr` is stored directly from the caller's temporary input without
materializing a stable, owned copy (unlike `canonical_flags_ptr` which is
properly materialized). This causes use-after-free when `lookup_fancy_regex`
later dereferences `(*re).pattern_ptr`. Materialize a stable pattern header by
calling `js_string_from_str` on the pattern string (similar to how
`canonical_flags_ptr` is created from `flags_str`), and then store that stable
pointer in `(*ptr).pattern_ptr` instead of storing the caller's temporary
pattern directly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 7c967a29-43c4-4715-9c5c-e0c41e053c0d
📒 Files selected for processing (36)
Cargo.tomlcrates/perry-runtime/Cargo.tomlcrates/perry-runtime/src/builtins/arithmetic.rscrates/perry-runtime/src/builtins/formatting.rscrates/perry-runtime/src/builtins/globals.rscrates/perry-runtime/src/date.rscrates/perry-runtime/src/fs/dir_glob_watch.rscrates/perry-runtime/src/intl.rscrates/perry-runtime/src/json/stringify.rscrates/perry-runtime/src/object/assert.rscrates/perry-runtime/src/object/class_registry.rscrates/perry-runtime/src/object/field_get_set.rscrates/perry-runtime/src/object/global_this.rscrates/perry-runtime/src/object/iterator_prototypes.rscrates/perry-runtime/src/object/mod.rscrates/perry-runtime/src/object/native_call_method.rscrates/perry-runtime/src/object/object_ops.rscrates/perry-runtime/src/object/regex_proto_thunks.rscrates/perry-runtime/src/path.rscrates/perry-runtime/src/regex.rscrates/perry-runtime/src/regex/match_all.rscrates/perry-runtime/src/regex/replace_fn.rscrates/perry-runtime/src/string/compare.rscrates/perry-runtime/src/string/mod.rscrates/perry-runtime/src/string/split.rscrates/perry-runtime/src/symbol.rscrates/perry-runtime/src/temporal/mod.rscrates/perry-runtime/src/url/mod.rscrates/perry-runtime/src/url/node_compat.rscrates/perry-runtime/src/url/url_class.rscrates/perry-runtime/src/value/dyn_index.rscrates/perry-runtime/src/value/to_string.rscrates/perry/src/commands/compile/collect_modules.rscrates/perry/src/commands/compile/optimized_libs.rscrates/perry/src/commands/compile/types.rsscripts/check_file_size.sh
✅ Files skipped from review due to trivial changes (1)
- crates/perry-runtime/src/symbol.rs
🚧 Files skipped from review as they are similar to previous changes (33)
- crates/perry-runtime/src/builtins/arithmetic.rs
- crates/perry-runtime/src/object/iterator_prototypes.rs
- crates/perry-runtime/src/regex/match_all.rs
- crates/perry-runtime/src/date.rs
- crates/perry-runtime/src/string/split.rs
- crates/perry-runtime/src/json/stringify.rs
- crates/perry-runtime/src/url/mod.rs
- crates/perry-runtime/src/object/object_ops.rs
- crates/perry-runtime/src/object/class_registry.rs
- Cargo.toml
- crates/perry-runtime/src/object/field_get_set.rs
- crates/perry-runtime/src/builtins/globals.rs
- crates/perry-runtime/src/object/mod.rs
- crates/perry/src/commands/compile/collect_modules.rs
- crates/perry-runtime/src/object/regex_proto_thunks.rs
- crates/perry-runtime/src/string/compare.rs
- crates/perry-runtime/src/string/mod.rs
- crates/perry/src/commands/compile/optimized_libs.rs
- crates/perry-runtime/src/builtins/formatting.rs
- crates/perry-runtime/src/object/assert.rs
- crates/perry-runtime/src/value/to_string.rs
- crates/perry-runtime/src/value/dyn_index.rs
- crates/perry-runtime/Cargo.toml
- crates/perry-runtime/src/url/node_compat.rs
- crates/perry-runtime/src/temporal/mod.rs
- crates/perry-runtime/src/object/native_call_method.rs
- crates/perry-runtime/src/url/url_class.rs
- crates/perry/src/commands/compile/types.rs
- crates/perry-runtime/src/path.rs
- crates/perry-runtime/src/intl.rs
- crates/perry-runtime/src/regex/replace_fn.rs
- crates/perry-runtime/src/object/global_this.rs
- crates/perry-runtime/src/fs/dir_glob_watch.rs
…e/segmenter Gate five heavy runtime subsystems behind opt-in cargo features so a compiled binary links only what the program actually uses. Mirrors the existing `wasm-host` mechanism: the compiler detects usage in HIR and forwards `perry-runtime/<feature>` to the auto-optimize runtime build. regex-engine regex + fancy-regex ~1.2 MB temporal temporal_rs + tz/calendar deps ~580 KB (JS Date is a separate impl) url-engine url + idna + transitive percent_encoding ~195 KB string-normalize unicode-normalization ~113 KB intl-segmenter unicode-segmentation ~73 KB A `console.log` hello-world drops ~2.5 MB (all five gates off); each engine auto-enables on use with identical behavior. No speed trade-off — this is pure dead-code elimination for code paths a given program never reaches. Detection (collect_modules) sets ctx.uses_* flags by grepping the HIR Debug form for the relevant tokens (regex literal / RegExp / .match/.matchAll/.search / glob; Temporal; URL/URLSearchParams/URLPattern; .normalize; Intl.Segmenter), forwarded + baked into the auto-optimize cache key in optimized_libs. Each engine keeps its identity/display layer always compiled (so value formatting / JSON / instanceof on non-engine values still work); the engine code and the engine branches inside always-linked dispatchers are #[cfg]-gated with the existing non-engine path as the fallback. The workspace perry-runtime dependency is set to default-features=false so dependency edges (perry-stdlib, ext crates, the perry binary) don't force the heavy default features back on during the auto-optimize --no-default-features build (cargo unifies features additively). Plain `cargo build` / `cargo test --workspace` and the shipped prebuilt still build perry-runtime as a *selected* package, so its own `default` applies and they keep every engine.
3241f2a to
fc2bb4c
Compare
What
Gate five heavy runtime subsystems behind opt-in cargo features so a compiled binary links only what the program actually uses. Mirrors the existing
wasm-hostmechanism: the compiler detects usage in HIR and forwardsperry-runtime/<feature>to the auto-optimize runtime build.regex-engineregex+fancy-regexRegExp/.match/.matchAll/.search/ glob APItemporaltemporal_rs+ tz/calendar depsTemporal.*(JSDateis a separate impl — unaffected)url-engineurl+idna(+ transitivepercent_encoding)new URL/URLSearchParams/URLPattern/node:urlstring-normalizeunicode-normalizationString.prototype.normalizeintl-segmenterunicode-segmentationIntl.SegmenterResult
A
console.loghello-world drops ~2.5 MB (all five gates off). Each engine auto-enables on use with identical behavior. No speed trade-off — this is pure dead-code elimination for paths a given program never reaches (noopt-levelchange).Verified per-engine (clean build off this base): hello/string-ops/Date 5.5 MB · regex 6.9 MB (
12/34,a#b#c#, …) · Temporal 6.3 MB (2020) · URL 5.7 MB (example.com 1) · normalize 5.6 MB (1).DateandAbortController(which share modules with gated code) work with the engines off.How
collect_modules):ctx.uses_*flags set by grepping the HIR Debug form for the relevant tokens; forwarded + baked into the auto-optimize cache key (optimized_libs).instanceofon non-engine values still work); the engine code + the engine branches inside always-linked dispatchers are#[cfg]-gated with the existing non-engine path as the fallback.perry-runtimedep isdefault-features = falseso dependency edges (perry-stdlib, ext crates, the perry binary) don't force the heavy default features back on during the auto-optimize--no-default-featuresbuild. Plaincargo build/cargo test --workspaceand the shipped prebuilt still buildperry-runtimeas a selected package, so its owndefaultapplies and they keep every engine (test coverage + out-of-tree correctness preserved).All five features are in
default, so behavior is unchanged for any build that isn't the per-app auto-optimize path.Summary by CodeRabbit
Release Notes
String.prototype.normalize, andIntl.Segmenter, and include only the needed runtime capabilities.