feat(core): improve stdin-buffer emoji/IME handling with Kitty protocol #483

GreyElaina · 2026-01-07T02:40:07Z

Summary

Add proper grapheme segmentation and Kitty emoji reassembly for correct handling of emoji, CJK characters, and other multi-codepoint sequences.

Changes

StdinBuffer (grapheme segmentation)

Add grapheme-segmenter.ts using Intl.Segmenter for Unicode-correct text segmentation
Update StdinBuffer to emit complete grapheme clusters for non-escape input
Preserve Kitty keyboard protocol sequences unchanged for downstream parsing

KeyHandler (Kitty emoji reassembly)

Add emoji codepoint detection helpers (isGraphemeExtender, canStartGraphemeCluster)
Buffer Kitty sequences that form multi-codepoint emoji (ZWJ families, flags, skin tones, keycaps)
Flush buffer on timeout or when non-emoji input arrives
Preserve all raw sequences in KeyEvent.raw field

Architecture

stdin → StdinBuffer → KeyHandler.processInput() → parseKeypress() → emit
              ↓                    ↓
     grapheme clusters      emoji reassembly
     (for raw input)        (for Kitty protocol)

Key insight: StdinBuffer handles grapheme segmentation for raw UTF-8 input, while KeyHandler handles emoji reassembly for Kitty protocol sequences. This separation preserves the raw field correctly.
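A minimal sketch of the KeyHandler-side reassembly described above, assuming each codepoint arrives as its own `\x1b[<codepoint>u` sequence. The names `KITTY_RE`, `ReassembledKey`, and `reassembleKitty` are hypothetical and are not taken from the PR; the real handler also deals with timeouts and modifier fields, which this sketch omits.

```typescript
// Assumed wire format: one "\x1b[<codepoint>u" sequence per codepoint.
const KITTY_RE = /^\x1b\[(\d+)u$/

interface ReassembledKey {
  text: string // one complete grapheme cluster
  raw: string // the original escape sequences, concatenated unchanged
}

function reassembleKitty(sequences: string[]): ReassembledKey[] {
  const parts: { text: string; raw: string }[] = []
  for (const seq of sequences) {
    const match = KITTY_RE.exec(seq)
    if (!match) continue // a real handler would flush and pass this through
    const code = Number.parseInt(match[1]!, 10)
    // Skip values that are not Unicode scalar values (out of range or surrogates).
    if (code > 0x10ffff || (code >= 0xd800 && code <= 0xdfff)) continue
    parts.push({ text: String.fromCodePoint(code), raw: seq })
  }

  const joined = parts.map((p) => p.text).join("")
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" })
  const out: ReassembledKey[] = []
  let index = 0 // next unconsumed entry in `parts`
  for (const { segment } of segmenter.segment(joined)) {
    let raw = ""
    let consumed = 0 // UTF-16 code units of `segment` accounted for so far
    while (consumed < segment.length) {
      raw += parts[index]!.raw
      consumed += parts[index]!.text.length
      index++
    }
    out.push({ text: segment, raw })
  }
  return out
}
```

Keeping `raw` as the untouched concatenation of the incoming sequences is what lets the grapheme be emitted as one event without losing the original bytes.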

Supported Emoji Types

Basic emoji: 😀
ZWJ sequences: 👨‍👩‍👧 (family)
Flag emoji: 🇺🇸 🇯🇵
Skin tone modifiers: 👋🏻
Keycap sequences: #️⃣
Subdivision flags: 🏴󠁧󠁢󠁥󠁮󠁧󠁿
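To illustrate why these need grapheme-aware handling, here is a small sketch that counts UTF-16 code units, codepoints, and grapheme clusters for one sample of each type. The helper name `measure` and the sample table are illustrative only; the counts depend on the runtime's `Intl.Segmenter` following current Unicode segmentation rules.

```typescript
const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" })

function measure(s: string): { codeUnits: number; codePoints: number; graphemes: number } {
  return {
    codeUnits: s.length, // what `s.length === 1` actually checks
    codePoints: Array.from(s).length, // string iteration walks codepoints
    graphemes: Array.from(graphemeSegmenter.segment(s)).length,
  }
}

const samples: Record<string, string> = {
  basic: "\u{1F600}",
  zwjFamily: "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}",
  flag: "\u{1F1FA}\u{1F1F8}",
  skinTone: "\u{1F44B}\u{1F3FB}",
  keycap: "#\uFE0F\u20E3",
  subdivisionFlag: "\u{1F3F4}\u{E0067}\u{E0062}\u{E0065}\u{E006E}\u{E0067}\u{E007F}",
}

for (const [name, sample] of Object.entries(samples)) {
  console.log(name, measure(sample))
}
```

Every sample is a single grapheme cluster, yet none has `length === 1`, which is the mismatch the grapheme-aware checks are meant to address.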

Testing

All 113 stdin-buffer tests pass
All 60 renderer.input tests pass
All 40 KeyHandler tests pass (10 new emoji tests)

Copilot

Pull request overview

This PR enhances emoji and IME input handling in the stdin-buffer module, particularly for terminals using the Kitty keyboard protocol. The implementation adds sophisticated grapheme cluster handling to properly process complex emoji sequences like ZWJ families, flags, and skin-tone variants.

Key Changes:

Introduces grapheme-segmenter module with Intl.Segmenter polyfill support for proper Unicode grapheme cluster handling
Adds Kitty keyboard protocol support that buffers and reassembles emoji codepoints sent as separate escape sequences
Refactors sequence completion helpers for better code reuse

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| `packages/core/src/lib/stdin-buffer.ts` | Core changes: adds Kitty protocol emoji buffering, grapheme cluster extraction, and helper functions for Unicode codepoint classification |
| `packages/core/src/lib/grapheme-segmenter.ts` | New module providing grapheme segmentation utilities with `Intl.Segmenter` support and performance optimizations |
| `packages/core/src/lib/parse.keypress.ts` | Updates keypress parsing to use grapheme-aware checks instead of simple length checks |
| `packages/core/src/lib/stdin-buffer.test.ts` | Comprehensive test coverage for grapheme handling and Kitty protocol emoji reassembly |
| `packages/core/package.json` | Adds `@formatjs/intl-segmenter` dependency for polyfill support |
| `bun.lock` | Lock file updates for new dependencies |

Copilot · 2026-01-07T02:46:18Z

packages/core/src/lib/stdin-buffer.ts

+    (codepoint >= 0x1f1e6 && codepoint <= 0x1f1ff) || // Regional Indicators
+    (codepoint >= 0x1f300 && codepoint <= 0x1faff) || // Emoji ranges (simplified)
+    codepoint === 0x1f3f4 || // Black Flag
+    (codepoint >= 0x23 && codepoint <= 0x39) || // #, *, 0-9 for keycaps


The range check for keycap characters (0x23 to 0x39) includes some ASCII characters that shouldn't start grapheme clusters. The range 0x23-0x39 includes:

`#` (0x23) ✓ valid for keycaps

`$` (0x24) ✗ not used for keycaps

`%` through `)` (0x25-0x29) ✗ not used for keycaps

`*` (0x2a) ✓ valid for keycaps

`+ , - . /` (0x2b-0x2f) ✗ not used for keycaps

`0`-`9` (0x30-0x39) ✓ valid for keycaps

This overly broad range could cause unintended buffering of characters like $, %, &, etc. Consider using a more precise check: (codepoint >= 0x30 && codepoint <= 0x39) || codepoint === 0x23 || codepoint === 0x2a

Suggested change

-    (codepoint >= 0x23 && codepoint <= 0x39) || // #, *, 0-9 for keycaps
+    ((codepoint >= 0x30 && codepoint <= 0x39) || codepoint === 0x23 || codepoint === 0x2a) || // #, *, 0-9 for keycaps
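As a quick check of the point above, this sketch isolates the precise keycap test and lists the characters that the original 0x23-0x39 range admitted by accident. `isKeycapBase` is a hypothetical helper name, not a function from the PR.

```typescript
// Keycap sequences are built only on '#', '*', and the digits '0'..'9'.
function isKeycapBase(codepoint: number): boolean {
  return (
    codepoint === 0x23 || // '#'
    codepoint === 0x2a || // '*'
    (codepoint >= 0x30 && codepoint <= 0x39) // '0'..'9'
  )
}

// Characters inside the broad 0x23..0x39 range that no keycap sequence uses.
const falsePositives: string[] = []
for (let cp = 0x23; cp <= 0x39; cp++) {
  if (!isKeycapBase(cp)) falsePositives.push(String.fromCodePoint(cp))
}
console.log(falsePositives.join(" "))
```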

Copilot · 2026-01-07T02:46:19Z

packages/core/src/lib/grapheme-segmenter.ts

+let segmenter: Intl.Segmenter | null = null
+
+if (typeof Intl === "undefined" || typeof (Intl as any).Segmenter !== "function") {
+  await import("@formatjs/intl-segmenter/polyfill-force.js").catch(() => {})


Top-level await is used here which can cause issues in certain module systems and bundlers. This is executed at module load time, which could:

Block the entire module graph from loading if the polyfill import is slow

Cause problems in environments that don't support top-level await (though increasingly rare)

The error is silently caught, which means there's no feedback if the polyfill fails to load

Consider moving this to a lazy initialization pattern inside getGraphemeSegmenter() or make the polyfill loading explicit and required upfront during application initialization.

Suggested change

-  await import("@formatjs/intl-segmenter/polyfill-force.js").catch(() => {})
+  import("@formatjs/intl-segmenter/polyfill-force.js").catch(() => {})
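One possible shape for the lazy initialization the comment suggests is sketched below. It is an assumption, not the PR's code: `ensureSegmenter`, `hasNativeSegmenter`, and the injected `loadPolyfill` callback are hypothetical, and the callback is injected only so the sketch stays self-contained without the polyfill package installed.

```typescript
let cachedSegmenter: Intl.Segmenter | null = null
let polyfillLoad: Promise<void> | null = null

function hasNativeSegmenter(): boolean {
  return typeof Intl !== "undefined" && typeof (Intl as any).Segmenter === "function"
}

// Explicit, awaitable initialization. The polyfill is requested only when the
// runtime lacks Intl.Segmenter, and a failed load rejects instead of being
// swallowed silently.
function ensureSegmenter(loadPolyfill: () => Promise<unknown>): Promise<void> {
  if (hasNativeSegmenter()) return Promise.resolve()
  polyfillLoad ??= loadPolyfill().then(() => undefined)
  return polyfillLoad
}

// Synchronous accessor: creates the segmenter on first use and memoizes it.
function getGraphemeSegmenter(): Intl.Segmenter {
  if (cachedSegmenter) return cachedSegmenter
  if (!hasNativeSegmenter()) {
    throw new Error("Intl.Segmenter is not available; await ensureSegmenter() first")
  }
  cachedSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" })
  return cachedSegmenter
}
```

An application would call `ensureSegmenter(() => import("@formatjs/intl-segmenter/polyfill-force.js"))` once during startup and await it before reading input, so no module has to block on a top-level await.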

Copilot · 2026-01-07T02:46:19Z

packages/core/src/lib/stdin-buffer.ts

+    codepoint === 0x200d || // ZWJ
+    (codepoint >= 0xfe00 && codepoint <= 0xfe0f) || // Variation Selectors
+    (codepoint >= 0x1f3fb && codepoint <= 0x1f3ff) || // Skin Tone Modifiers
+    (codepoint >= 0x1f1e6 && codepoint <= 0x1f1ff) || // Regional Indicators


Regional Indicator codepoints (0x1f1e6-0x1f1ff) are listed in both isGraphemeExtender and canStartGraphemeCluster. While this works because the grapheme segmenter ultimately handles the splitting correctly, the classification is semantically confusing. Regional Indicators don't "extend" a grapheme - they combine in pairs to form flag emoji. Consider clarifying the logic or renaming the functions to better reflect their purpose in the Kitty buffering strategy (e.g., shouldBufferForCombining instead of isGraphemeExtender).

Suggested change

-    (codepoint >= 0x1f1e6 && codepoint <= 0x1f1ff) || // Regional Indicators

Copilot · 2026-01-07T02:46:19Z

packages/core/src/lib/grapheme-segmenter.ts

+export function firstGrapheme(str: string): string {
+  if (str.length === 0) return ""
+  const firstCode = str.charCodeAt(0)
+  if (firstCode < 128) {
+    if (str.length === 1 || str.charCodeAt(1) < 128) return str[0]!
+  }
+
+  const segments = getGraphemeSegmenter().segment(str)
+  const first = segments[Symbol.iterator]().next()
+  return first.done ? "" : first.value.segment


Potential issue with surrogate pairs handling. If firstCode < 128 is false but str.length === 1, the code will try to get the first grapheme from the segmenter. However, if the string contains a single unpaired surrogate (which is technically invalid UTF-16), this could cause issues. While unpaired surrogates are rare in practice, consider adding a check or documenting that the input is expected to be valid UTF-16.

Copilot · 2026-01-07T02:46:19Z

packages/core/src/lib/stdin-buffer.ts

+    if (remaining.length > 0) this.process(remaining)
+    return true


The tryCompletePaste() method can trigger recursive calls through this.process(remaining) at line 312. If the pasted content contains multiple nested bracketed paste sequences or a large amount of remaining data, this could potentially cause a deep call stack. While unlikely in practice, consider whether this recursion is bounded or if an iterative approach would be safer for handling remaining content after paste completion.

Copilot · 2026-01-07T02:46:20Z

packages/core/src/lib/stdin-buffer.ts

+    }
+    if (this.kittyBuffer.length === 0) return []
+
+    const text = String.fromCodePoint(...this.kittyBuffer)


Using the spread operator with String.fromCodePoint(...this.kittyBuffer) could cause issues if the kittyBuffer becomes very large. JavaScript has a maximum call stack size, and spreading a large array as function arguments can exceed this limit (typically around 65,536 arguments, but varies by engine).

While emoji sequences are typically short (< 20 codepoints), consider adding a safeguard or chunking mechanism if there's any possibility of the buffer growing unexpectedly large. Alternatively, document the expected maximum size of the kittyBuffer.

Suggested change

-    const text = String.fromCodePoint(...this.kittyBuffer)
+    let text = ""
+    for (let i = 0; i < this.kittyBuffer.length; i++) {
+      text += String.fromCodePoint(this.kittyBuffer[i]!)
+    }

Copilot · 2026-01-07T02:46:20Z

packages/core/src/lib/stdin-buffer.ts

+    if (this.kittyTimeout) {
+      clearTimeout(this.kittyTimeout)
+      this.kittyTimeout = null
+    }


The clear() method now clears kittyTimeout in addition to the regular timeout, but there's no test coverage for this scenario. Consider adding a test that:

Processes a Kitty sequence that would trigger buffering (e.g., a Regional Indicator)

Calls clear() before the kittyTimeout fires

Waits beyond the timeout duration

Verifies that nothing is emitted

This would ensure that the kittyTimeout is properly cleaned up and doesn't cause spurious emissions after clear() is called.
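The four steps above could be exercised deterministically with an injected clock, as in this sketch. `FakeClock` and `MiniKittyBuffer` are hypothetical stand-ins written for illustration; they are not the PR's `StdinBuffer`, and a real test would drive the actual class with the project's test runner and timers.

```typescript
type Timer = { fn: () => void; cancelled: boolean }

class FakeClock {
  private timers: Timer[] = []
  setTimeout(fn: () => void): Timer {
    const timer: Timer = { fn, cancelled: false }
    this.timers.push(timer)
    return timer
  }
  clearTimeout(timer: Timer): void {
    timer.cancelled = true
  }
  // Fire every timer that was not cancelled, as if the timeout had elapsed.
  runAll(): void {
    const due = this.timers
    this.timers = []
    for (const timer of due) if (!timer.cancelled) timer.fn()
  }
}

class MiniKittyBuffer {
  private kittyBuffer: number[] = []
  private kittyTimeout: Timer | null = null
  private clock: FakeClock
  private emit: (text: string) => void

  constructor(clock: FakeClock, emit: (text: string) => void) {
    this.clock = clock
    this.emit = emit
  }

  push(codepoint: number): void {
    this.kittyBuffer.push(codepoint)
    if (this.kittyTimeout) this.clock.clearTimeout(this.kittyTimeout)
    this.kittyTimeout = this.clock.setTimeout(() => this.flush())
  }

  clear(): void {
    if (this.kittyTimeout) {
      this.clock.clearTimeout(this.kittyTimeout)
      this.kittyTimeout = null
    }
    this.kittyBuffer = []
  }

  private flush(): void {
    this.kittyTimeout = null
    if (this.kittyBuffer.length === 0) return
    const text = String.fromCodePoint(...this.kittyBuffer)
    this.kittyBuffer = []
    this.emit(text)
  }
}

const clock = new FakeClock()
const emitted: string[] = []
const buffer = new MiniKittyBuffer(clock, (text) => {
  emitted.push(text)
})
buffer.push(0x1f1fa) // 1. buffer a Regional Indicator
buffer.clear() // 2. clear before the timeout fires
clock.runAll() // 3. let the timeout elapse
console.log(emitted.length === 0) // 4. nothing should have been emitted
```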

Copilot · 2026-01-07T02:46:20Z

packages/core/src/lib/stdin-buffer.ts

-        }
+        this.emitSequences(extractCompleteSequences(this.buffer.slice(0, startIndex)).sequences)
      }
+      this.emitKittyBuffer()


When entering bracketed paste mode, emitKittyBuffer() is called to flush any pending Kitty codepoints. However, there's no test coverage for the scenario where:

A Kitty emoji sequence is partially buffered (e.g., first part of a skin-tone emoji)

Bracketed paste mode is entered

The buffered Kitty sequence should be emitted before processing the pasted content

Consider adding a test to verify this edge case is handled correctly.

Copilot · 2026-01-07T02:46:20Z

packages/core/src/lib/stdin-buffer.ts

+function extractKittyCodepoint(sequence: string): number | null {
+  const match = KITTY_UNICODE_RE.exec(sequence)
+  if (!match) return null
+  const cp = parseInt(match[1]!, 10)


The non-null assertion match[1]! is safe because the regex requires at least one \d+ group, so match[1] will always be defined when the regex matches. However, for code clarity and to avoid potential confusion, consider documenting this assumption or restructuring the code to make it more explicit.

Suggested change

-  const cp = parseInt(match[1]!, 10)
+  const [, group] = match
+  if (group === undefined) return null
+  const cp = parseInt(group, 10)

Copilot · 2026-01-07T02:46:21Z

packages/core/src/lib/stdin-buffer.ts

+function isGraphemeExtender(codepoint: number): boolean {
+  return (
+    codepoint === 0x200d || // ZWJ
+    (codepoint >= 0xfe00 && codepoint <= 0xfe0f) || // Variation Selectors
+    (codepoint >= 0x1f3fb && codepoint <= 0x1f3ff) || // Skin Tone Modifiers
+    (codepoint >= 0x1f1e6 && codepoint <= 0x1f1ff) || // Regional Indicators
+    codepoint === 0x20e3 || // Combining Enclosing Keycap
+    (codepoint >= 0xe0020 && codepoint <= 0xe007f) // Tag characters
+  )
+}
+
+function canStartGraphemeCluster(codepoint: number): boolean {
+  return (
+    (codepoint >= 0x1f1e6 && codepoint <= 0x1f1ff) || // Regional Indicators
+    (codepoint >= 0x1f300 && codepoint <= 0x1faff) || // Emoji ranges (simplified)
+    codepoint === 0x1f3f4 || // Black Flag
+    (codepoint >= 0x23 && codepoint <= 0x39) || // #, *, 0-9 for keycaps
+    (codepoint >= 0x2600 && codepoint <= 0x27bf) // Misc Symbols & Dingbats
+  )
+}
+
+function extractKittyCodepoint(sequence: string): number | null {
+  const match = KITTY_UNICODE_RE.exec(sequence)
+  if (!match) return null
+  const cp = parseInt(match[1]!, 10)
+  return cp >= 0 && cp <= 0x10ffff ? cp : null
+}


The helper functions isGraphemeExtender, canStartGraphemeCluster, and extractKittyCodepoint lack documentation comments. While the inline comments provide some context, these functions would benefit from JSDoc comments explaining:

Their purpose in the Kitty keyboard protocol handling

Parameter descriptions

Return value meanings

Example usage or edge cases

This would improve code maintainability, especially given the complexity of grapheme cluster handling.

Copilot

Pull request overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated 6 comments.

Copilot · 2026-01-07T03:18:22Z

packages/core/src/lib/stdin-buffer.ts

+    const CHUNK_SIZE = 8192
+    for (let i = 0; i < buffer.length; i += CHUNK_SIZE) {
+      const slice = buffer.slice(i, i + CHUNK_SIZE)
+      chunks.push(String.fromCodePoint.apply(null, slice))


Using String.fromCodePoint.apply(null, slice) can throw a "Maximum call stack size exceeded" error if the slice array is very large, even though CHUNK_SIZE is set to 8192 to mitigate this. Different JavaScript engines have different limits (typically 65536-100000 arguments).

Consider using the spread operator with a safer approach or a loop:

chunks.push(String.fromCodePoint(...slice))

Or reduce CHUNK_SIZE to a more conservative value like 1000-5000 to ensure compatibility across all engines.

Suggested change

-    const CHUNK_SIZE = 8192
-    for (let i = 0; i < buffer.length; i += CHUNK_SIZE) {
-      const slice = buffer.slice(i, i + CHUNK_SIZE)
-      chunks.push(String.fromCodePoint.apply(null, slice))
+    const CHUNK_SIZE = 4096
+    for (let i = 0; i < buffer.length; i += CHUNK_SIZE) {
+      const slice = buffer.slice(i, i + CHUNK_SIZE)
+      chunks.push(String.fromCodePoint(...slice))
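For reference, a self-contained sketch of chunked conversion follows. Note that spread and `apply` both pass the array elements as individual call arguments, so it is the chunk size, not the call syntax, that keeps each call within engine limits. `codePointsToString` is a hypothetical helper name, and 4096 is simply the value proposed in the suggestion.

```typescript
function codePointsToString(codePoints: readonly number[], chunkSize = 4096): string {
  let text = ""
  for (let i = 0; i < codePoints.length; i += chunkSize) {
    // Each call receives at most `chunkSize` arguments, regardless of how
    // large the whole buffer has grown.
    text += String.fromCodePoint(...codePoints.slice(i, i + chunkSize))
  }
  return text
}
```

Since emoji sequences are short, a plain per-codepoint loop would work as well; chunking only matters if the buffer can grow without bound.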

Copilot · 2026-01-07T03:18:22Z

packages/core/src/lib/parse.keypress.ts

  } else if (s.length === 1 && s >= "A" && s <= "Z") {
    // shift+letter
    key.name = s.toLowerCase()
    key.shift = true


The call to isSingleGrapheme(s) at line 302 invokes the Intl.Segmenter for every keypress that isn't matched by earlier conditions. This happens after checking for single ASCII characters (lines 291-301), but the isSingleGrapheme function itself also checks for ASCII at lines 54-58 in grapheme-segmenter.ts, resulting in duplicate checks.

For better performance, the ASCII check could be moved earlier in this function to avoid calling isSingleGrapheme for common ASCII characters. However, the current structure might be intentional for code clarity.

Suggested change

-    key.shift = true
+    key.shift = true
+  } else if (s.length === 1 && s >= " " && s <= "\x7f") {
+    // remaining single printable ASCII characters (punctuation, symbols, etc.)
+    key.name = s

Copilot · 2026-01-07T03:18:22Z

packages/core/src/lib/stdin-buffer.ts

+function isGraphemeExtender(codepoint: number): boolean {
+  return (
+    codepoint === 0x200d || // ZWJ
+    (codepoint >= 0xfe00 && codepoint <= 0xfe0f) || // Variation Selectors
+    (codepoint >= 0x1f3fb && codepoint <= 0x1f3ff) || // Skin Tone Modifiers
+    (codepoint >= 0x1f1e6 && codepoint <= 0x1f1ff) || // Regional Indicators
+    codepoint === 0x20e3 || // Combining Enclosing Keycap
+    (codepoint >= 0xe0020 && codepoint <= 0xe007f) // Tag characters
+  )
+}
+
+function canStartGraphemeCluster(codepoint: number): boolean {
+  return (
+    (codepoint >= 0x1f1e6 && codepoint <= 0x1f1ff) || // Regional Indicators
+    (codepoint >= 0x1f300 && codepoint <= 0x1faff) || // Emoji ranges (simplified)
+    codepoint === 0x1f3f4 || // Black Flag
+    codepoint === 0x23 || // # for keycap
+    codepoint === 0x2a || // * for keycap
+    (codepoint >= 0x30 && codepoint <= 0x39) || // 0-9 for keycaps
+    (codepoint >= 0x2600 && codepoint <= 0x27bf) // Misc Symbols & Dingbats
+  )
+}


The helper functions isGraphemeExtender and canStartGraphemeCluster lack documentation explaining their purpose and the Unicode ranges they cover. Given the complexity of Unicode grapheme cluster rules, these functions should have comments explaining:

What each Unicode range represents (the inline comments are good but could be in function-level docs)

Why these specific ranges were chosen

That this is a heuristic for buffering, not a complete grapheme cluster algorithm (which is delegated to Intl.Segmenter)
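A sketch of what such documentation might look like for one of the helpers is shown below. The function body is copied from the diff above; the comment wording is an assumption about intent rather than text from the PR.

```typescript
/**
 * Buffering heuristic for Kitty keyboard input, not a grapheme algorithm.
 *
 * Returns true when `codepoint` commonly continues an emoji sequence, so the
 * caller should keep buffering instead of emitting immediately. The final
 * split into grapheme clusters is still delegated to Intl.Segmenter
 * (see UAX #29 for cluster boundaries and UTS #51 for emoji sequences).
 *
 * Ranges covered:
 * - U+200D            zero width joiner (ZWJ sequences such as families)
 * - U+FE00..U+FE0F    variation selectors (text vs. emoji presentation)
 * - U+1F3FB..U+1F3FF  emoji skin tone modifiers
 * - U+1F1E6..U+1F1FF  regional indicators (pair up to form flags)
 * - U+20E3            combining enclosing keycap
 * - U+E0020..U+E007F  tag characters (subdivision flags)
 *
 * @param codepoint A Unicode codepoint extracted from a Kitty sequence.
 * @returns true if the codepoint may attach to the previously buffered one.
 */
function isGraphemeExtender(codepoint: number): boolean {
  return (
    codepoint === 0x200d ||
    (codepoint >= 0xfe00 && codepoint <= 0xfe0f) ||
    (codepoint >= 0x1f3fb && codepoint <= 0x1f3ff) ||
    (codepoint >= 0x1f1e6 && codepoint <= 0x1f1ff) ||
    codepoint === 0x20e3 ||
    (codepoint >= 0xe0020 && codepoint <= 0xe007f)
  )
}
```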

Copilot · 2026-01-07T03:18:23Z

packages/core/src/lib/grapheme-segmenter.ts

+    if (str.length === 1) return str[0]!
+    const secondCode = str.charCodeAt(1)
+    if (secondCode < 128) return str[0]!
+  } else if (str.length === 1) {


The firstGrapheme function has an optimization path that returns str[0]! for single-character strings (line 74), but this doesn't handle the edge case where the string might be a single invalid surrogate. While unlikely in practice since the function is called on stdin input, it would be more robust to check firstCode >= 0xd800 && firstCode <= 0xdfff and fall through to the segmenter for safety.

Suggested change

-  } else if (str.length === 1) {
+  } else if (str.length === 1 && (firstCode < 0xd800 || firstCode > 0xdfff)) {
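Put together, the fast paths with the surrogate guard might look like the sketch below. It is reconstructed from the fragments quoted in the review, so the exact structure of the PR's `firstGrapheme` is an assumption; the function is named `firstGraphemeSketch` to make that clear.

```typescript
const fallbackSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" })

function firstGraphemeSketch(str: string): string {
  if (str.length === 0) return ""
  const firstCode = str.charCodeAt(0)
  if (firstCode < 128) {
    // ASCII fast path: a lone ASCII char, or ASCII followed by ASCII.
    if (str.length === 1) return str
    if (str.charCodeAt(1) < 128) return str[0]!
  } else if (str.length === 1 && (firstCode < 0xd800 || firstCode > 0xdfff)) {
    // A single BMP code unit that is not a surrogate is a complete grapheme.
    return str
  }
  // Everything else, including a lone surrogate, goes through the segmenter.
  const first = fallbackSegmenter.segment(str)[Symbol.iterator]().next()
  return first.done ? "" : first.value.segment
}
```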

Copilot · 2026-01-07T03:18:23Z

packages/core/src/lib/grapheme-segmenter.ts

+export function getGraphemeSegmenter(): Intl.Segmenter {
+  if (segmenter) return segmenter
+
+  if (typeof Intl !== "undefined" && typeof (Intl as any).Segmenter === "function") {
+    segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" })
+    return segmenter
+  }
+
+  if (initError) {
+    throw initError
+  }
+
+  throw new Error(
+    "Intl.Segmenter is not available. Please ensure your runtime supports it or install @formatjs/intl-segmenter",
+  )
+}


The polyfill initialization is asynchronous but called at module load time (line 23). However, getGraphemeSegmenter() can be called synchronously before the polyfill finishes loading. If the native Intl.Segmenter is not available and the polyfill hasn't loaded yet, the function will throw an error instead of waiting for initialization to complete.

This could cause runtime errors when the stdin-buffer is used immediately after import in environments without native Intl.Segmenter support. Consider either:

Making getGraphemeSegmenter() async and awaiting the initialization

Checking if initPromise is still pending and waiting for it

Documenting that users must await initialization before using the module

Copilot · 2026-01-07T03:18:23Z

packages/core/src/lib/grapheme-segmenter.ts

+        await import("@formatjs/intl-segmenter/polyfill-force.js")
+      } catch (e) {
+        initError = new Error(
+          "Failed to load Intl.Segmenter polyfill. Please ensure @formatjs/intl-segmenter is installed or use a runtime that supports Intl.Segmenter natively.",


The error from the polyfill import is caught but not stored correctly. The catch block at line 12 receives the error as parameter e, but then creates a new generic Error message without preserving the original error information. This makes debugging polyfill loading issues difficult.

The initError should either wrap the original error or include its message: initError = new Error("Failed to load Intl.Segmenter polyfill: " + (e instanceof Error ? e.message : String(e)))
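The same idea as a standalone helper, shown only as a sketch: `polyfillLoadError` is a hypothetical name, and the message text follows the wording quoted in this comment.

```typescript
// Build an error that keeps the original failure reason visible to the caller.
function polyfillLoadError(e: unknown): Error {
  const detail = e instanceof Error ? e.message : String(e)
  return new Error(
    "Failed to load Intl.Segmenter polyfill: " +
      detail +
      " Please ensure @formatjs/intl-segmenter is installed or use a runtime that supports Intl.Segmenter natively.",
  )
}
```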

Suggested change

-          "Failed to load Intl.Segmenter polyfill. Please ensure @formatjs/intl-segmenter is installed or use a runtime that supports Intl.Segmenter natively.",
+          "Failed to load Intl.Segmenter polyfill: " +
+            (e instanceof Error ? e.message : String(e)) +
+            " Please ensure @formatjs/intl-segmenter is installed or use a runtime that supports Intl.Segmenter natively.",

…n StdinBuffer
- Add grapheme-segmenter.ts using Intl.Segmenter for Unicode-correct text segmentation
- Update StdinBuffer to emit complete grapheme clusters instead of single characters
- Preserve Kitty keyboard protocol sequences unchanged for downstream parsing
- Add comprehensive tests for grapheme cluster handling

Buffer Kitty keyboard protocol emoji sequences and reassemble them into complete grapheme clusters before emitting keypress events.
- Add emoji codepoint detection helpers (isGraphemeExtender, canStartGraphemeCluster)
- Buffer Kitty sequences that form multi-codepoint emoji (ZWJ, flags, skin tones, keycaps)
- Flush buffer on timeout or when non-emoji input arrives
- Preserve all raw sequences in the emitted KeyEvent.raw field
- Add comprehensive tests for emoji reassembly scenarios

kommander · 2026-01-08T01:59:43Z

Interesting, I hadn't thought about it from that perspective. That means full grapheme parsing for input though, which can get out of hand really fast. The stdin buffer was meant to handle incomplete sequences, split at byte level, which would break some behaviours. For input in the Textarea for example, even if the bytes arrive chunked, they would end up forming a joined grapheme at the end. Is this meant to have full grapheme character key events?

GreyElaina · 2026-01-08T05:28:42Z

The grapheme-aware input is already on my PR branch with ASCII fast paths, so performance impact is minimal for typical input.
Scope clarification:

StdinBuffer: Only changes how non-escape sequences are split (uses firstGrapheme() instead of single char). This handles raw UTF-8 correctly.
KeyHandler: Already has emoji buffering for Kitty protocol (shouldBufferForEmoji() + 10ms timeout + Intl.Segmenter reassembly).

The issue I hit was specifically with Rime IME emoji input - it sends input through some keystroke-based mechanism that differs from:

Clipboard paste (works fine)
Raycast emoji picker (works fine)
macOS fn emoji menu (works fine)

These might be using the same underlying mechanism, but I haven't dug deeper.
So this is an edge case fix. But using grapheme-based checking (isSingleGrapheme(s) instead of s.length === 1) is more robust anyway - who knows what weird input methods exist out there (CJK IMEs, etc.). Better to handle grapheme clusters correctly from the start than chase down individual edge cases.

kommander · 2026-01-08T14:10:01Z

It feels like something is more fundamentally wrong with the stdin buffer if it causes such a complexity explosion. Grapheme detection is horribly fragile at the moment and already has a huge complexity in the native part to render them correctly. Doing something similar for input seems like beating a symptom of wrong underlying assumptions in the stdin buffer.

GreyElaina · 2026-01-08T14:30:53Z

The Zig/native grapheme handling is used for rendering, while the stdin buffer is for input parsing. I must say they are two different things.

The stdin buffer itself is simple. This complexity comes from the terminal protocol and Unicode itself (see libghostty, UTF-8 Everywhere, etc.), so it's not a design issue. It's not correct to split UTF-8 at byte boundaries, or to split emoji at codepoint boundaries, when those splits don't fall on grapheme boundaries.

kommander · 2026-01-08T18:34:53Z

Okay, I think I understand better now. This is specifically a Kitty Keyboard issue, which emits separate input events for a ZWJ emoji, for example, right? So even though inserting the parts one by one would render correctly, the emoji is not emitted as a single input event but as separate ones for each byte?

GreyElaina · 2026-01-09T00:44:43Z

Yeah exactly. Kitty protocol emits separate escape sequences per codepoint, not per byte. So 👨‍👩‍👧 (family emoji: 5 codepoints, three people joined by two ZWJs) comes as 5 separate \x1b[<codepoint>u sequences.
The fix buffers these with a 10ms timeout and reassembles using Intl.Segmenter. StdinBuffer itself didn't change much for this: the Kitty reassembly logic is in KeyHandler.
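To make the wire format concrete, this sketch produces the per-codepoint sequences for the man-woman-girl family emoji (U+1F468 U+200D U+1F469 U+200D U+1F467), assuming the plain `\x1b[<codepoint>u` form mentioned above with no modifier fields. `toKittySequences` is a hypothetical helper for illustration only.

```typescript
// Encode each codepoint of `text` as its own Kitty-style escape sequence.
function toKittySequences(text: string): string[] {
  // Array.from iterates a string by codepoint, not by UTF-16 code unit.
  return Array.from(text, (ch) => `\x1b[${ch.codePointAt(0)!}u`)
}

const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}"
for (const seq of toKittySequences(family)) {
  console.log(JSON.stringify(seq)) // one line per codepoint
}
```

Each of those sequences would otherwise surface as its own key event, which is why the handler buffers them briefly before emitting.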

Copilot AI review requested due to automatic review settings January 7, 2026 02:40

GreyElaina requested review from Hona, kommander and msmps as code owners January 7, 2026 02:40

Copilot started reviewing on behalf of GreyElaina January 7, 2026 02:40

Copilot AI reviewed Jan 7, 2026

GreyElaina force-pushed the core-stdin-emoji-ime branch from 85e4eec to f76840a Compare January 7, 2026 03:06

GreyElaina requested a review from Copilot January 7, 2026 03:13

Copilot started reviewing on behalf of GreyElaina January 7, 2026 03:13

Copilot AI reviewed Jan 7, 2026

GreyElaina force-pushed the core-stdin-emoji-ime branch from f76840a to 6dbb9fa Compare January 7, 2026 03:42

GreyElaina mentioned this pull request Jan 7, 2026

feat(core): add Intl.Segmenter for grapheme cluster handling #482

Closed

GreyElaina added 3 commits January 7, 2026 12:03

docs(core): add Unicode spec references (UAX #29, UTS #51) to emoji helpers

b68c742

fix(core): handle surrogate edge cases and preserve polyfill errors

b0a0d66

- Reject lone surrogates in firstGrapheme fast path (fall through to segmenter)
- Preserve original error message when polyfill loading fails

feat(core): use isSingleGrapheme in parse.keypress for proper emoji/CJK detection

4f580bb

Update parse.keypress.ts to use isSingleGrapheme() instead of s.length === 1 for proper handling of multi-codepoint characters like emoji and CJK.

