
globals/Intl: implement Intl.Segmenter (grapheme segmentation) — new Intl.Segmenter() throws, blocks string-width/wrap-ansi/ink #4877

@proggeramlug


Summary

Intl.Segmenter is not implemented: Intl.Segmenter is undefined, so new Intl.Segmenter() throws "TypeError: is not a constructor". Perry ships Intl.NumberFormat, Intl.DateTimeFormat, and Intl.Collator (crates/perry-runtime/src/intl.rs), but not Segmenter.

This is the next ink wall after #4873. With the MessageChannel fix (#4875) in place, React/scheduler initialize cleanly, but importing ink still crashes during module initialization. The crash traces to its ANSI-width dependencies:

  • string-width/index.js:16: const segmenter = new Intl.Segmenter();
  • wrap-ansi/index.js:31: const segmenter = new Intl.Segmenter();

Both run at module top-level, so merely importing string-width / wrap-ansi (and therefore ink, react-three-fiber-style CLIs, and most modern width-aware CLI code) dies before any user code. Intl.Segmenter has been the default grapheme-segmentation primitive in string-width@7+ / wrap-ansi@9+ since they dropped the old codepoint-counting path.
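A minimal sketch of that failure mode, assuming a simplified stand-in for string-width: the real module also strips ANSI escapes and applies East Asian width and zero-width rules, all omitted here. The graphemeCount helper is hypothetical and is not string-width's API. The point is that the constructor call sits at module scope, so a missing Intl.Segmenter fails at import time rather than at the first call.

```typescript
// Module scope: this line runs as soon as the module is imported.
// On a runtime where Intl.Segmenter is undefined it throws a TypeError
// here, before any caller can reach the helpers below.
const segmenter = new Intl.Segmenter();

// Hypothetical helper (not string-width's real API): counts extended
// grapheme clusters, the units a width function iterates over before
// it assigns each one a column width.
function graphemeCount(text: string): number {
  return Array.from(segmenter.segment(text)).length;
}

// "a" + family emoji (man ZWJ woman ZWJ girl) + "b"
const family = "a\u{1F468}\u200D\u{1F469}\u200D\u{1F467}b";
console.log(family.length);         // 10 (UTF-16 code units)
console.log(graphemeCount(family)); // 3 under Node
```

Counting code units (or code points) instead of graphemes is exactly the older path that string-width@7+ dropped, which is why there is no fallback once the constructor throws.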

Repro

// seg.ts
console.log(typeof Intl.Segmenter);          // Perry: "undefined"  | Node: "function"
const s = new Intl.Segmenter('en', { granularity: 'grapheme' });
console.log([...s.segment('a👨‍👩‍👧b')].map(x => x.segment));

$ perry compile seg.ts -o seg && ./seg
TypeError: is not a constructor

Expected (Node)

function
[ 'a', '👨‍👩‍👧', 'b' ]

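new Intl.Segmenter(locales?, options?) → object, where options.granularity is 'grapheme' (the default), 'word', or 'sentence'. Its .segment(str) returns an iterable Segments object that yields segment data { segment, index, input, isWordLike? } (isWordLike is present only for 'word' granularity; index is a UTF-16 code-unit offset into input) and also supports .containing(index). The object also exposes .resolvedOptions(), which returns { locale, granularity }.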
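The API surface described above, exercised under Node as a reference for what Perry's implementation should return. This is a sketch: the variable names are illustrative, and the values in the comments are what Node's ICU-backed implementation produces for these inputs.

```typescript
const family = "a\u{1F468}\u200D\u{1F469}\u200D\u{1F467}b";

// Grapheme granularity: the mode string-width and wrap-ansi rely on.
const graphemes = new Intl.Segmenter("en", { granularity: "grapheme" });
const segments = graphemes.segment(family);

// Each element is segment data { segment, index, input };
// index is a UTF-16 code-unit offset into input.
const indices = Array.from(segments, (d) => d.index);
console.log(indices); // [ 0, 1, 9 ]

// containing(i) returns the segment data covering code unit i (here a
// position inside the family emoji), or undefined if i is out of range.
console.log(segments.containing(3)?.index); // 1

// resolvedOptions() reports the negotiated locale and the granularity.
console.log(graphemes.resolvedOptions().granularity); // grapheme

// Word granularity adds isWordLike: true for words, false for
// punctuation and whitespace segments.
const words = new Intl.Segmenter("en", { granularity: "word" });
const wordLike = Array.from(words.segment("Hello, world"))
  .filter((d) => d.isWordLike)
  .map((d) => d.segment);
console.log(wordLike); // [ 'Hello', 'world' ]
```

The UTF-16 index is easy to get wrong in a native runtime: the family emoji spans 8 code units, so "b" sits at index 9 even though it is only the third segment.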

Scope / suggested impl

  • granularity: 'grapheme' is what string-width / wrap-ansi actually use; both call new Intl.Segmenter() with no options, which defaults to grapheme. It has to handle extended grapheme clusters: emoji ZWJ sequences, combining marks, and regional-indicator flags. That's the must-have for ink; word / sentence can follow.
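  • Rust side: the unicode-segmentation crate implements the UAX #29 grapheme, word, and sentence boundary rules (graphemes(true), split_word_bounds(), split_sentence_bounds()), so a minimal Segmenter over it covers the ink path. One wrinkle: the crate's index-yielding iterators (e.g. grapheme_indices(true)) report UTF-8 byte offsets, whereas segment data's index must be a UTF-16 code-unit offset, so indices need converting.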
  • Wire alongside the existing Intl constructors: crates/perry-runtime/src/intl.rs, the globalThis/Intl tables in crates/perry-runtime/src/object/global_this*.rs, and the codegen builtin/new allowlist in crates/perry-codegen/src/expr/helpers.rs (where MessageChannel et al. live).
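The grapheme requirements above can be pinned down with a small smoke test. This is a sketch under stated assumptions: the Case type, the case list, and the graphemes helper are hypothetical and not taken from string-width's or Perry's test suites, and the expected values are what Node's ICU-backed Intl.Segmenter returns. It prints ok under Node; the assumption is that a Perry build of the same file should print the same once Segmenter is wired in.

```typescript
// Each case: a label, an input, and the graphemes Node (ICU) returns.
type Case = { label: string; input: string; expected: string[] };

const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}"; // ZWJ sequence
const cases: Case[] = [
  { label: "ZWJ emoji sequence", input: `a${family}b`, expected: ["a", family, "b"] },
  { label: "combining mark", input: "e\u0301x", expected: ["e\u0301", "x"] },
  // JP and US flags: two regional indicators each, so two clusters total.
  {
    label: "regional-indicator flags",
    input: "\u{1F1EF}\u{1F1F5}\u{1F1FA}\u{1F1F8}",
    expected: ["\u{1F1EF}\u{1F1F5}", "\u{1F1FA}\u{1F1F8}"],
  },
  { label: "CRLF", input: "a\r\nb", expected: ["a", "\r\n", "b"] },
];

// No arguments: the exact call string-width and wrap-ansi make at
// module scope, so the default granularity has to be "grapheme".
const segmenter = new Intl.Segmenter();

function graphemes(text: string): string[] {
  return Array.from(segmenter.segment(text), (d) => d.segment);
}

const failures: string[] = [];
for (const { label, input, expected } of cases) {
  const actual = graphemes(input);
  if (JSON.stringify(actual) !== JSON.stringify(expected)) {
    failures.push(`${label}: got ${JSON.stringify(actual)}`);
  }
}
if (segmenter.resolvedOptions().granularity !== "grapheme") {
  failures.push("default granularity is not grapheme");
}
console.log(failures.length === 0 ? "ok" : failures.join("\n"));
```

Comparing via JSON.stringify keeps the check dependency-free, so the same file can run under Node and under a compiled runtime without a test framework.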

Impact

Related
