regex: `\p{RGI_Emoji}` (property of strings, /v flag) rejected — Rust regex crate has no string-properties; blocks string-width/ink

## Summary

The Unicode **property of strings** `\p{RGI_Emoji}` (ES2024, valid only with the `v` / `unicodeSets` flag) is rejected at compile time: `/^\p{RGI_Emoji}$/v` → `SyntaxError: Invalid regular expression: /^\p{RGI_Emoji}$/: invalid pattern`.

Unlike ordinary `\p{…}` character properties (which match a single code point), a *property of strings* can match a **multi-code-point cluster** (emoji ZWJ sequences, flags, keycaps, skin-tone modifiers). Rust's `regex` crate has no support for properties of strings, so the token is unrepresentable as-is.
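
A quick JS check (recent Node) shows the difference. The `v` half is built with the `RegExp` constructor inside `try`, because an engine without property-of-strings support (like the affected build) rejects it at construction time.

```javascript
// A character property matches exactly one code point; a property of strings
// can match a whole multi-code-point sequence.
const family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}"; // ZWJ family: man, woman, girl

console.log([...family].length);               // 5 code points
console.log(/^\p{Emoji}$/u.test("\u{1F44D}")); // true  (one code point)
console.log(/^\p{Emoji}$/u.test(family));      // false (five code points)

try {
  const rgi = new RegExp(String.raw`^\p{RGI_Emoji}$`, "v");
  console.log(rgi.test(family));               // true (one RGI ZWJ sequence)
} catch (err) {
  console.log(`no property-of-strings support: ${err.message}`);
}
```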

This is the **next ink wall after #4884** (`\p{Surrogate}`). `string-width/index.js:25` builds this regex **at module top-level**, so importing `string-width` (→ `ink`) throws at init before any user code:

```js
// node_modules/string-width/index.js:25
const rgiEmojiRegex = /^\p{RGI_Emoji}$/v;   // "is this whole cluster one RGI emoji?" → width 2
```

It is the only property of strings in ink's dependency tree.

## What works vs. what doesn't (probed on the #4887 build)

```
OK    /\p{Extended_Pictographic}/u      → true on "😀"
OK    /\p{Emoji}/u                      → true
OK    /\p{Emoji_Presentation}/u         → true
FAIL  /^\p{RGI_Emoji}$/v                → invalid pattern
```

So the single-code-point emoji building blocks are already supported by the Rust regex crate; only the string-property aggregate is missing. That makes a translation feasible without new Unicode tables.

## Expected (Node)

`/^\p{RGI_Emoji}$/v.test("👍")` → `true`; `.test("👨‍👩‍👧")` (ZWJ family) → `true`; `.test("🇬🇧")` (flag) → `true`; `.test("ab")` → `false`.
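
The probe and the expected Node results can be folded into one script that runs unchanged under Node and under the Perry build. `probe` is a hypothetical helper that reports a compile failure instead of throwing, so a rejected pattern shows up as `FAIL` rather than aborting the run.

```javascript
// Hypothetical probe helper: compile `source` with `flags` and test `input`.
function probe(source, flags, input) {
  try {
    return { ok: true, value: new RegExp(source, flags).test(input) };
  } catch (err) {
    return { ok: false, error: `${err.name}: ${err.message}` };
  }
}

const rgi = String.raw`^\p{RGI_Emoji}$`;
const cases = [
  ["\u{1F44D}", true],                                   // thumbs up
  ["\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}", true], // ZWJ family
  ["\u{1F1EC}\u{1F1E7}", true],                          // flag: GB
  ["ab", false],
];

for (const [input, expected] of cases) {
  const r = probe(rgi, "v", input);
  const status = !r.ok ? "FAIL" : r.value === expected ? "OK  " : "DIFF";
  console.log(`${status} ${JSON.stringify(input)} → ${r.ok ? r.value : r.error}`);
}
```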

## Suggested fix (JS→Rust regex translation, `crates/perry-runtime/src/regex/{grammar,compile}.rs`)

Per UTS #51, `RGI_Emoji = Basic_Emoji | Emoji_Keycap_Sequence | RGI_Emoji_Flag_Sequence | RGI_Emoji_Tag_Sequence | RGI_Emoji_Modifier_Sequence | RGI_Emoji_ZWJ_Sequence`. Two paths:

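- **Pragmatic (unblocks ink, no data tables):** expand `\p{RGI_Emoji}` to an alternation over the supported primitives, e.g. (written with the `regex` crate's scoped verbose flag so the comments are part of the pattern, and with the invisible VS16 / ZWJ / keycap characters spelled as escapes):

  ```
  (?x:
      [\u{1F1E6}-\u{1F1FF}]{2}                                       # flag pair (regional indicators)
    | [0-9\#*] \u{FE0F} \u{20E3}                                     # keycap: [0-9#*] + VS16 + U+20E3
    | \p{Extended_Pictographic} [\u{1F3FB}-\u{1F3FF}]? \u{FE0F}?       # base (+ skin tone, + VS16)
        (?: \u{200D} \p{Extended_Pictographic} [\u{1F3FB}-\u{1F3FF}]? \u{FE0F}? )*  # ZWJ continuation
  )
  ```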

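  Anchored as `^(…)$`, this classifies single emoji clusters well enough for `string-width`'s width-2 decision: `string-width` already splits its input into grapheme clusters with `Intl.Segmenter`, so the regex only ever sees one cluster. It is approximate at the edges: RGI tag sequences (subdivision flags such as England's) do not match, while any regional-indicator pair and a bare text-presentation pictograph such as `©` (U+00A9 without VS16) do. Those cases are rare, so width results should be essentially unchanged.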

- **Faithful:** generate the actual RGI sequence set from the Unicode `emoji-sequences.txt` / `emoji-zwj-sequences.txt` data into a table and emit an exact alternation (or a separate matcher). Heavier; only needed if exact `\p{RGI_Emoji}` semantics matter beyond width.
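
The pragmatic expansion can be prototyped in plain JS before touching the Rust side. Every piece is a single-code-point property or escape, so the `u` flag suffices and no `v` support is needed. `APPROX_RGI_EMOJI` and `approxWidth` are hypothetical names, and `approxWidth` is a deliberately simplified stand-in for `string-width`'s per-cluster decision, not its actual code.

```javascript
// Hypothetical JS prototype of the pragmatic \p{RGI_Emoji} expansion.
const ELEMENT = String.raw`\p{Extended_Pictographic}[\u{1F3FB}-\u{1F3FF}]?\u{FE0F}?`;

const ALTERNATIVES = [
  String.raw`[\u{1F1E6}-\u{1F1FF}]{2}`,          // flag: two regional indicators
  String.raw`[0-9#*]\u{FE0F}\u{20E3}`,           // keycap: [0-9#*] + VS16 + U+20E3
  String.raw`${ELEMENT}(?:\u{200D}${ELEMENT})*`, // base (+ skin tone, + VS16), ZWJ-joined
];

// Anchored so the whole cluster must be one emoji, like /^\p{RGI_Emoji}$/v.
const APPROX_RGI_EMOJI = new RegExp(`^(?:${ALTERNATIVES.join("|")})$`, "u");

// Simplified per-cluster width: 2 columns for an emoji cluster, 1 otherwise.
// (string-width also strips ANSI escapes and accounts for East Asian width
// and zero-width characters; none of that is modeled here.)
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

function approxWidth(text) {
  let width = 0;
  for (const { segment } of segmenter.segment(text)) {
    width += APPROX_RGI_EMOJI.test(segment) ? 2 : 1;
  }
  return width;
}

console.log(approxWidth("a\u{1F44D}b")); // 4
```

Building the pattern from named pieces keeps each UTS #51 sequence type visible, which should make it easier to diff against the Rust expansion emitted by the translator.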

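`\P{RGI_Emoji}` (and a property of strings inside a negated class such as `[^\p{RGI_Emoji}]`) is an early `SyntaxError` in JS itself, so it should stay rejected, ideally with a clear message. Use inside larger `/v` set operations can stay unsupported initially (string-width doesn't need it), but it should error clearly rather than mis-compile.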
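
A small conformance check for the rejection side could look like this. It passes on engines that implement the ES2024 early errors and on engines that reject the `v` flag outright, since either way construction throws a `SyntaxError`; a pattern that compiles would indicate a mis-compile. `rejectsWithSyntaxError` is a hypothetical helper name.

```javascript
// Patterns that must be rejected at construction time under the `v` flag:
// each negates a property of strings, which ES2024 makes an early SyntaxError.
const mustReject = [
  String.raw`\P{RGI_Emoji}`,    // negated property of strings
  String.raw`[^\p{RGI_Emoji}]`, // property of strings inside a negated class
];

function rejectsWithSyntaxError(source, flags) {
  try {
    new RegExp(source, flags);
    return false; // compiled: this would be a mis-compile
  } catch (err) {
    return err instanceof SyntaxError;
  }
}

for (const source of mustReject) {
  const ok = rejectsWithSyntaxError(source, "v");
  console.log(`${ok ? "OK  " : "FAIL"} /${source}/v is rejected`);
}
```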

## Impact

- `string-width@7+` → `wrap-ansi`, `cli-truncate`, `slice-ansi` → **ink** end-to-end (#348). This is the last `string-width` init regex; after it, ink's next gate is yoga-layout's WASM runtime (the documented out-of-scope rock).
- Any width/emoji-aware CLI code using modern `string-width`.

## Related

- #348 (ink end-to-end smoke test — where this surfaced)
- #4884 / #4887 (previous ink wall: `\p{Surrogate}`)
- #4877 / #4882 (`Intl.Segmenter`), #4873 / #4875 (`new MessageChannel()`)

