Summary
The Unicode property escape \p{Surrogate} (general category Cs) is rejected at regex-compile time: new RegExp('\\p{Surrogate}', 'u') → SyntaxError: Invalid regular expression: /\p{Surrogate}/: invalid pattern. Every other property in the same family works.
Root cause: Perry compiles JS regexes to the Rust regex crate, which matches over Unicode scalar values (UTF-8) — surrogate code points (U+D800–U+DFFF) can't occur there, so the crate rejects the Surrogate/Cs property outright instead of treating it as a never-matching class.
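To make the scalar-value constraint concrete, here is a small JavaScript sketch (plain Node, not Perry code) showing that a lone surrogate has no UTF-8 encoding. It relies only on the standard TextEncoder, which substitutes U+FFFD when asked to encode an unpaired surrogate.

```javascript
// A lone high surrogate is a valid JS string (a single UTF-16 code unit)...
const lone = '\uD800';

// ...but it is not a Unicode scalar value, so it has no UTF-8 form.
// TextEncoder replaces it with U+FFFD, whose UTF-8 bytes are EF BF BD.
const bytes = Array.from(new TextEncoder().encode(lone));

console.log(bytes.map((b) => b.toString(16)).join(' ')); // prints "ef bf bd"
```

A matcher that only ever sees UTF-8 therefore never observes a code point in U+D800–U+DFFF, which is consistent with a Surrogate class that could be treated as never matching.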
This is the next ink wall after #4877 (Segmenter). string-width/index.js builds two regexes at module top-level that include \p{Surrogate}, so importing string-width (→ ink) throws at init, before any user code runs.
Isolation (exactly one token is at fault)
OK /\p{Control}/u
OK /\p{Mark}/u
OK /\p{Format}/u
OK /\p{Default_Ignorable_Code_Point}/u
OK /[\p{Control}]/v ← the /v (unicodeSets) flag itself is fine
FAIL /\p{Surrogate}/u → invalid pattern
FAIL /^(?:…|\p{Surrogate})+$/u → invalid pattern (same with /v)
So it's specifically \p{Surrogate} — not the /v flag, not the other properties.
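The isolation above could be reproduced with a probe along these lines. This is a minimal sketch: `probe` is a hypothetical helper name, and the last pattern is an illustrative stand-in for the elided alternation, not string-width's exact regex.

```javascript
// Hypothetical helper: tries to compile a pattern and reports the outcome.
function probe(source, flags) {
  try {
    new RegExp(source, flags);
    return 'OK';
  } catch (err) {
    return `FAIL (${err.name})`;
  }
}

const cases = [
  ['\\p{Control}', 'u'],
  ['\\p{Mark}', 'u'],
  ['\\p{Format}', 'u'],
  ['\\p{Default_Ignorable_Code_Point}', 'u'],
  ['\\p{Surrogate}', 'u'],
  // Stand-in for the elided alternation; not the pattern from string-width.
  ['^(?:\\p{Control}|\\p{Surrogate})+$', 'u'],
];

for (const [source, flags] of cases) {
  console.log(probe(source, flags), `/${source}/${flags}`);
}
```

Under Node every row should report OK; under Perry, per the results above, only the rows containing \p{Surrogate} would be expected to fail.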
Expected (Node)
/\p{Surrogate}/u compiles and matches lone surrogate code units ("\uD800"). JS strings are sequences of UTF-16 code units, so unpaired surrogates can occur in the input and Node/V8 can match them.
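A short check of the Node behavior described here, as a sketch using only standard RegExp features (the variable names are illustrative):

```javascript
const surrogate = /\p{Surrogate}/u;

// An unpaired high surrogate is a code point in general category Cs.
const loneMatches = surrogate.test('\uD800');

// A well-formed pair is read as one code point (U+1F600) under /u, not as two surrogates.
const pairedMatches = surrogate.test('\u{1F600}');

// Ordinary text contains no surrogates at all.
const asciiMatches = surrogate.test('abc');

console.log({ loneMatches, pairedMatches, asciiMatches });
// prints { loneMatches: true, pairedMatches: false, asciiMatches: false }
```

Only the unpaired case matches, so for well-formed input the property never matches anything.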
Suggested fix
In the JS→Rust regex translation (crates/perry-runtime/src/regex/compile.rs / grammar.rs), special-case \p{Surrogate} / \p{gc=Cs} / \P{Surrogate} rather than passing it through to the Rust crate:
Minimum (unblocks ink): rewrite \p{Surrogate} to a never-matching subexpression (e.g. (?!) / empty class) and \P{Surrogate} to "any". For valid (non-surrogate) input, which is all string-width ever sees, this is behavior-preserving, and these regexes only ever test for zero-width/ignorable clusters.
Alternatively, map \p{Surrogate} to the code-unit range [\u{D800}-\u{DFFF}] against the WTF-8 representation so lone surrogates actually match. Heavier; not needed for the ink path.
Impact
string-width@7+ → wrap-ansi, cli-truncate, slice-ansi → ink end-to-end (Compile ink (React-based TUI framework) end-to-end via perry.compilePackages #348). After this, ink's next gate is yoga-layout's WASM runtime (the documented out-of-scope rock).
Any package using \p{Surrogate} for input sanitization / width calculation.
Sits at the intersection of the two known categorical gaps in CLAUDE.md (Rust regex crate limits + lone-surrogate/WTF-8 handling), but is a narrow, self-contained translation fix.
Related
Compile ink (React-based TUI framework) end-to-end via perry.compilePackages #348 (ink end-to-end smoke test, where this surfaced)
Intl.Segmenter (grapheme segmentation) — new Intl.Segmenter() throws, blocks string-width/wrap-ansi/ink #4877 / feat(intl): implement Intl.Segmenter (grapheme/word/sentence) — closes #4877 #4882 (previous ink wall: Intl.Segmenter)
new MessageChannel() global constructor unlinked/non-constructible — routes to stdlib symbol, breaks React scheduler init #4873 / fix(hir): global new MessageChannel() routes to always-linked runtime constructor (#4873) #4875 (new MessageChannel())
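As a rough illustration of the minimal rewrite described under Suggested fix, the pass might look like the sketch below. It is written in JavaScript for readability only and is not Perry's translation code, which lives in Rust. `rewriteSurrogateEscapes` is a hypothetical name, and the replacement strings `[^\s\S]` (never matches) and `[\s\S]` (matches anything) are assumptions that would need to be checked against what the Rust regex crate accepts. Inside a character class the escape is simply dropped, since a never-matching member contributes nothing to the class.

```javascript
// Spellings of the Surrogate general category that the sketch recognizes.
const SURROGATE_NAMES = new Set([
  'Surrogate', 'Cs', 'gc=Cs', 'gc=Surrogate',
  'General_Category=Cs', 'General_Category=Surrogate',
]);

// Hypothetical pre-pass over the JS pattern source, before translation.
// Assumes /v-style bracket nesting when tracking character classes.
function rewriteSurrogateEscapes(source) {
  let out = '';
  let classDepth = 0;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === '\\') {
      const m = /^\\([pP])\{([^}]*)\}/.exec(source.slice(i));
      if (m && SURROGATE_NAMES.has(m[2])) {
        const negated = m[1] === 'P';
        if (classDepth > 0) {
          // Inside [...]: \p adds nothing, \P adds everything.
          out += negated ? '\\s\\S' : '';
        } else {
          out += negated ? '[\\s\\S]' : '[^\\s\\S]';
        }
        i += m[0].length - 1; // skip the rest of the escape
        continue;
      }
      // Copy any other escape verbatim so \[ and \] are not counted as brackets.
      out += source.slice(i, i + 2);
      i += 1;
      continue;
    }
    if (ch === '[') classDepth++;
    else if (ch === ']' && classDepth > 0) classDepth--;
    out += ch;
  }
  return out;
}

console.log(rewriteSurrogateEscapes('^(?:\\p{Control}|\\p{Surrogate})+$'));
console.log(rewriteSurrogateEscapes('^[\\p{Control}\\p{Surrogate}]+'));
```

One caveat of this sketch: a class whose only member is \p{Surrogate} would be rewritten to `[]`, which a real implementation would have to handle separately.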