feat(intl): implement Intl.Segmenter (grapheme/word/sentence) — closes #4877 #4882

Merged
proggeramlug merged 1 commit into main from worktree-intl-segmenter-4877 on Jun 10, 2026
@proggeramlug

Contributor

Summary

Implements Intl.Segmenter, the last ANSI-width dependency wall before ink (#348). string-width@7+ / wrap-ansi@9+ call new Intl.Segmenter() at module top-level, so before this change, merely importing either of them (and therefore ink) threw TypeError: is not a constructor.
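
To illustrate why a top-level call turns a missing constructor into an import-time failure, here is a minimal sketch of the pattern. The module shape and the `countGraphemes` helper are hypothetical and are not taken from the string-width or wrap-ansi sources.

```javascript
// Hypothetical module mimicking the top-level pattern described above;
// this is not the actual string-width or wrap-ansi source.
const segmenter = new Intl.Segmenter(); // evaluated as soon as the module loads

// Count user-perceived characters (grapheme clusters) rather than code units.
function countGraphemes(text) {
  let count = 0;
  for (const _record of segmenter.segment(text)) {
    count += 1;
  }
  return count;
}

// 'a', a ZWJ family emoji, and 'b': 3 grapheme clusters, 10 UTF-16 code units.
const sample = 'a\u{1F468}\u200D\u{1F469}\u200D\u{1F467}b';
console.log(countGraphemes(sample), sample.length);
```

Because the constructor call sits outside any function, a runtime without Intl.Segmenter fails on the first line, before any exported function is ever called.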

Closes #4877.

What changed

crates/perry-runtime/src/intl.rs — adds a Segmenter constructor alongside the existing NumberFormat / DateTimeFormat / Collator, reusing their bound-thunk + prototype pattern:

  • new Intl.Segmenter(locales?, { granularity }) — defaults to grapheme; an invalid granularity throws the spec RangeError.
  • .segment(str) returns a JS array of { segment, index, input, isWordLike? } records — iterable / spreadable, covering both [...seg.segment(s)] and for (const {segment} of seg.segment(s)) (the shapes string-width / wrap-ansi actually use). index is the UTF-16 code-unit offset per spec; isWordLike is emitted only for word granularity.
  • .resolvedOptions() → { locale, granularity }.
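
The API surface listed above can be exercised roughly as follows. This is a sketch using only the standard Intl.Segmenter API; the sample strings and the invalid granularity value 'paragraph' are chosen here for illustration.

```javascript
// Default granularity is "grapheme".
const graphemes = new Intl.Segmenter('en');
const records = [...graphemes.segment('a\u{1F600}b')]; // 'a', grinning face, 'b'

// index is a UTF-16 code-unit offset, so the astral emoji advances it by 2.
const indexes = records.map(({ index }) => index);
console.log(indexes); // [ 0, 1, 3 ]

// Word granularity adds isWordLike, which separates words from punctuation and spaces.
const words = new Intl.Segmenter('en', { granularity: 'word' });
const wordLike = [...words.segment('Hello, world')]
  .filter((record) => record.isWordLike)
  .map((record) => record.segment);
console.log(wordLike); // [ 'Hello', 'world' ]

// resolvedOptions() reports the granularity that was actually selected.
const resolved = graphemes.resolvedOptions();
console.log(resolved.granularity); // grapheme

// An unsupported granularity is rejected with a RangeError.
let caught = null;
try {
  new Intl.Segmenter('en', { granularity: 'paragraph' });
} catch (error) {
  caught = error;
}
console.log(caught instanceof RangeError); // true
```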

Backed by the pure-Rust unicode-segmentation crate (UAX #29 — extended grapheme clusters incl. emoji ZWJ sequences / combining marks / regional-indicator flags, word, and sentence boundaries). It was already in our lock graph via convert_case; promoted to a direct perry-runtime dependency.
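
As a rough illustration of the UAX #29 cases mentioned above, the following sketch counts grapheme clusters for a combining mark, a regional-indicator flag, and an emoji ZWJ sequence, and splits a short text into sentences. The `clusterCount` helper and the sample strings are assumptions made for this example; the expected values are those of a spec-conforming Intl.Segmenter.

```javascript
const graphemeSegmenter = new Intl.Segmenter(); // grapheme granularity by default

// Hypothetical helper: number of extended grapheme clusters in a string.
const clusterCount = (text) => [...graphemeSegmenter.segment(text)].length;

const combining = 'e\u0301';                                   // e + combining acute accent
const flag = '\u{1F1E9}\u{1F1EA}';                             // regional indicators D + E
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}';      // man ZWJ woman ZWJ girl

// Each is a single cluster even though it spans several UTF-16 code units.
console.log(clusterCount(combining), combining.length); // 1 2
console.log(clusterCount(flag), flag.length);           // 1 4
console.log(clusterCount(family), family.length);       // 1 8

// Sentence granularity splits on sentence boundaries and loses no characters.
const sentenceSegmenter = new Intl.Segmenter('en', { granularity: 'sentence' });
const text = 'Hi there. How are you?';
const sentences = [...sentenceSegmenter.segment(text)].map((r) => r.segment);
console.log(sentences.length, sentences.join('') === text); // 2 true
```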

No codegen allowlist change was needed: new Intl.Segmenter() is a member-expression construct that resolves through the runtime Intl namespace, exactly like the existing new Intl.NumberFormat().

Verification (Perry vs node)

Byte-for-byte match against Node on:

  • The issue's exact repro: typeof Intl.Segmenter → function; [...s.segment('a👨‍👩‍👧b')].map(x => x.segment) → [ 'a', '👨‍👩‍👧', 'b' ]
  • word granularity with isWordLike, sentence granularity, resolvedOptions()
  • UTF-16 index on astral chars (😀 → next index 2)
  • The RangeError message for an invalid granularity
  • The for-of destructuring form
  • Existing NumberFormat / DateTimeFormat / Collator output unchanged; Segmenter.name and .prototype.segment.length correct.
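
One way to reproduce a comparison like this is to print the same checks as a single JSON line under each runtime and diff the output. The script below is only a sketch under that assumption; the PR does not show the harness that was actually used, and the `checks` object and its key names are hypothetical.

```javascript
const s = new Intl.Segmenter();

// Collect segments with the for-of destructuring form.
const viaForOf = [];
for (const { segment } of s.segment('ab')) {
  viaForOf.push(segment);
}

// Capture the error raised for an invalid granularity.
let invalidGranularityError = null;
try {
  new Intl.Segmenter(undefined, { granularity: 'bogus' });
} catch (error) {
  invalidGranularityError = { name: error.name, message: error.message };
}

const checks = {
  typeofSegmenter: typeof Intl.Segmenter,
  repro: [...s.segment('a\u{1F468}\u200D\u{1F469}\u200D\u{1F467}b')].map((x) => x.segment),
  astralIndexes: [...s.segment('\u{1F600}a')].map((x) => x.index),
  viaForOf,
  invalidGranularityError,
  constructorName: Intl.Segmenter.name,
  segmentArity: Intl.Segmenter.prototype.segment.length,
};

// One line of JSON per runtime keeps the output easy to diff.
console.log(JSON.stringify(checks));
```

Running the same file under both runtimes and comparing the two lines would surface any divergence, including differences in the RangeError message text.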

cargo fmt --check clean.

Notes

  • Per maintainer convention, version bump + CHANGELOG entry are left for merge time.
  • This clears the string-width / wrap-ansi module-init wall; ink's next documented gate is the yoga-layout WASM runtime.

Related

new Intl.Segmenter(locales?, { granularity }) now constructs and returns
an object with .segment(str) (iterable of { segment, index, input,
isWordLike? } records) and .resolvedOptions(). Backed by the pure-Rust
unicode-segmentation crate: extended grapheme clusters (default), word
boundaries (with isWordLike), and sentence boundaries. index is the
UTF-16 code-unit offset to match the spec.

This unblocks string-width@7+ / wrap-ansi@9+, which call
new Intl.Segmenter() at module top-level, and therefore ink (#348).

Fixes #4877

Development

Successfully merging this pull request may close these issues.

globals/Intl: implement Intl.Segmenter (grapheme segmentation) — new Intl.Segmenter() throws, blocks string-width/wrap-ansi/ink