Skip to content

macos: fall back to RTF/HTML when pasted plain text is encoding-lossy#45

Open
HelloBoLun wants to merge 1 commit into
manaflow-ai:mainfrom
HelloBoLun:fix-paste-encoding-lossy-fallback
Open

macos: fall back to RTF/HTML when pasted plain text is encoding-lossy#45
HelloBoLun wants to merge 1 commit into
manaflow-ai:mainfrom
HelloBoLun:fix-paste-encoding-lossy-fallback

Conversation

@HelloBoLun
Copy link
Copy Markdown

@HelloBoLun HelloBoLun commented Apr 20, 2026

Summary

  • Some macOS apps (certain Chinese IMs, Electron ports, legacy native apps) write a mangled plain-text pasteboard variant alongside a correctly-encoded RTF/HTML variant — non-ASCII chars arrive as U+FFFD or literal ? runs because the app's plain-text writer fell back to an ASCII encoding. Pasting into a terminal surface produced ?????.
  • Pasting the same clipboard into another app (e.g. TextEdit) and re-copying normalizes it to clean UTF-8, which is why the bug only reproduces for certain source apps.
  • getOpinionatedStringContents() now recovers from the rich-text variant when the plain-text variant is encoding-lossy, preserving existing behavior for clean pastes.

How

  1. Reads plain text first (unchanged fast path).
  2. If plain text looks encoding-lossy (contains U+FFFD or a run of 3+ ?) and the rich-text variant carries strictly more non-ASCII scalars, returns the rich-text-derived plain string.
  3. Rich-text extraction prefers RTF (local parser, no network calls); HTML is a fallback so WebKit is only touched when an app writes HTML but no RTF.
  4. Falls back to rich-text extraction when plain text is absent entirely (previously returned nil).

The "more non-ASCII scalars" swap condition prevents regressing legitimate user text like "really???" — both variants have the same non-ASCII count, so the rich-text version is not preferred.

Test plan

  • Copy Chinese from a source app that previously reproduced ????? → now shows the correct characters.
  • Copy ASCII text from Chrome (plain + HTML both present) → plain-text path used; HTML never parsed.
  • Text containing "really???" style runs is preserved untouched.
  • Copy Chinese from Slack / Discord / other Electron apps that write only HTML → recovered via HTML fallback.

No new unit tests — this path depends on live NSPasteboard + NSAttributedString runtime behavior.

🤖 Generated with Claude Code


Summary by cubic

Fixes garbled pastes like “?????” from some macOS apps by falling back to RTF/HTML when the plain-text pasteboard data is encoding-lossy. Keeps the fast plain-text path for clean pastes and prefers RTF to avoid WebKit.

  • Bug Fixes
    • In getOpinionatedStringContents(), detect lossy plain text (contains U+FFFD or runs of 3+ '?').
    • Prefer rich-text only if it has more non-ASCII scalars than the plain variant to avoid replacing valid text like “really???”.
    • Parse RTF first; use HTML only if RTF is absent.
    • Recover from rich-text when plain text is missing entirely.

Written for commit 50904e2. Summary will update on new commits.

Summary by CodeRabbit

  • Bug Fixes
    • Improved clipboard text retrieval to better handle text copied from rich-text sources (such as word processors). The system now detects corrupted or incorrectly encoded text and attempts to recover the original content from formatting data, providing more reliable paste functionality.

Some applications (certain IMs, Electron ports, legacy native apps)
write a mangled plain-text pasteboard variant alongside a
correctly-encoded RTF or HTML variant — non-ASCII characters arrive
as U+FFFD or literal '?' runs because the app's plain-text writer
fell back to an ASCII encoding. Pasting from such apps produced
"?????" in the terminal.

When getOpinionatedStringContents() detects an encoding-lossy
plain-text variant (contains U+FFFD, or runs of 3+ '?'), recover
the string from the rich-text variant provided it carries
strictly more non-ASCII characters than plain text — this avoids
regressing legitimate user text containing '?' runs.

RTF parsing (local) is preferred; HTML is tried only when RTF is
absent, minimizing WebKit involvement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 20, 2026

📝 Walkthrough

Walkthrough

The getOpinionatedStringContents() method in the NSPasteboard extension has been enhanced to intelligently recover text from rich-text formats when plain-text extraction encounters encoding loss, determined by detecting Unicode replacement characters or repeated ? characters.

Changes

Cohort / File(s) Summary
NSPasteboard Plain-Text Recovery
macos/Sources/Helpers/Extensions/NSPasteboard+Extension.swift
Enhanced getOpinionatedStringContents() with encoding-loss detection via Unicode replacement character and ? streaks; conditionally recovers text from RTF or HTML sources when plain-text is lossy and recovered text contains more non-ASCII scalars; RTF extraction via NSAttributedString with HTML fallback.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 From plain text's loss, a clever way,
RTF and HTML come out to play,
With Unicode marks and question streaks,
The pasteboard now recovers what it seeks! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: detecting encoding-lossy plain text on macOS and falling back to RTF/HTML alternatives.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Apr 20, 2026

Greptile Summary

This PR adds a best-effort recovery path in getOpinionatedStringContents() that reads the RTF or HTML pasteboard variant when the plain-text variant appears encoding-lossy (contains U+FFFD or 3+ consecutive ?). The approach is sound, but the nonASCIIScalarCount guard used to decide whether to swap doesn't filter out U+FFFC (Object Replacement Character), which NSAttributedString inserts for every embedded image or attachment in RTF/HTML. This means the swap condition can trigger on a clipboard that has mangled plain text and an RTF with only images, returning a bare \u{FFFC} to the terminal instead of the original ? run — the opposite of the bug being fixed.

Confidence Score: 3/5

Not safe to merge without addressing the U+FFFC / nonASCIIScalarCount issue, which can paste garbage characters in a broader set of cases than the bug being fixed.

One P1 finding: embedded objects in RTF/HTML produce U+FFFC which is counted as non-ASCII, causing the swap heuristic to fire incorrectly and return garbage to the terminal. The two P2 findings (3-? threshold, main-thread assertion) are not blocking on their own.

macos/Sources/Helpers/Extensions/NSPasteboard+Extension.swift — specifically the nonASCIIScalarCount helper and richTextFallback return value.

Important Files Changed

Filename Overview
macos/Sources/Helpers/Extensions/NSPasteboard+Extension.swift Adds RTF/HTML fallback in getOpinionatedStringContents() when plain text is encoding-lossy; introduces looksEncodingLossy, nonASCIIScalarCount, and richTextFallback helpers. The nonASCIIScalarCount heuristic is skewed by U+FFFC (object replacement chars from embedded images), which can cause garbage characters to be pasted or trigger a wrong swap.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[getOpinionatedStringContents] --> B{File URL in pasteboard?}
    B -- Yes --> C[Return escaped path]
    B -- No --> D[Read plain text]
    D --> E{plainText non-nil?}
    E -- No --> J[richTextFallback]
    E -- Yes --> F{looksEncodingLossy?}
    F -- No --> G[Return plainText]
    F -- Yes --> H[richTextFallback]
    H --> I{recovered non-nil AND nonASCII recovered > nonASCII plain?}
    I -- Yes --> K[Return recovered ⚠️ may contain U+FFFC]
    I -- No --> G
    J --> L{RTF data available?}
    L -- Yes --> M[NSAttributedString RTF parse]
    L -- No --> N{HTML data available?}
    N -- Yes --> O[NSAttributedString HTML parse via WebKit]
    N -- No --> P[Return nil]
    M --> Q{string non-empty?}
    Q -- Yes --> R[Return string]
    Q -- No --> N
    O --> S{string non-empty?}
    S -- Yes --> T[Return string]
    S -- No --> P
Loading

Reviews (1): Last reviewed commit: "macos: fall back to RTF/HTML when pasted..." | Re-trigger Greptile

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
macos/Sources/Helpers/Extensions/NSPasteboard+Extension.swift (2)

70-82: Optional: simplify looksEncodingLossy.

The manual run-length loop can be replaced with a single contains check, which is easier to read and equivalent for the "3+ consecutive ?" rule.

♻️ Proposed simplification
 private static func looksEncodingLossy(_ s: String) -> Bool {
-    if s.contains("\u{FFFD}") { return true }
-    var run = 0
-    for scalar in s.unicodeScalars {
-        if scalar == "?" {
-            run += 1
-            if run >= 3 { return true }
-        } else {
-            run = 0
-        }
-    }
-    return false
+    s.contains("\u{FFFD}") || s.contains("???")
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macos/Sources/Helpers/Extensions/NSPasteboard`+Extension.swift around lines
70 - 82, The looksEncodingLossy function can be simplified by replacing the
manual run-length loop with a single contains check for three consecutive
question marks: change the body of private static func looksEncodingLossy(_ s:
String) to return s.contains("\u{FFFD}") || s.contains("???"), keeping the same
function name (looksEncodingLossy) so it still detects the replacement character
or a run of 3+ '?' characters.

60-67: LGTM — fallback ordering and strict non-ASCII guard look correct.

The "more non-ASCII scalars" gate effectively preserves legitimate ? runs while still recovering mangled CJK/emoji pastes, and the plainText == nil branch falling through to richTextFallback() is a nice addition for HTML-only sources (Slack/Discord).

One small consideration: the comparison uses strict >. If a source app writes a plain variant where some non-ASCII scalars survived (say 2) but the RTF variant has the full 5, the replacement still happens correctly. But if both counts happen to be equal while the plain one is still partially mangled (e.g., mixed U+FFFD + recovered scalars), the recovered version will be dropped. Given U+FFFD alone already flags lossy, you may want to tighten: when plainText contains U+FFFD, prefer recovered whenever it's non-empty (since U+FFFD is unambiguous evidence of loss), reserving the strict count comparison for the ??? heuristic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macos/Sources/Helpers/Extensions/NSPasteboard`+Extension.swift around lines
60 - 67, The logic should prefer the richTextFallback when the plainText
contains the replacement character U+FFFD (an unambiguous sign of lossy
encoding) even if nonASCIIScalarCount(recovered) is not strictly greater; update
the branch that currently checks Self.looksEncodingLossy(s) and
Self.nonASCIIScalarCount(recovered) > Self.nonASCIIScalarCount(s) to first test
if s contains U+FFFD and if recovered is non-empty then return recovered,
otherwise keep the existing nonASCIIScalarCount comparison (or change to >= if
you prefer) so that recovered is chosen on clear lossy evidence while preserving
the current heuristic for ambiguous cases; refer to plainText (s),
Self.looksEncodingLossy, self.richTextFallback(), and Self.nonASCIIScalarCount
when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@macos/Sources/Helpers/Extensions/NSPasteboard`+Extension.swift:
- Around line 104-115: The HTML-to-NSAttributedString conversion inside
getOpinionatedStringContents() must only run on the main thread; add a
main-thread guard at the start of that function (e.g.,
dispatchPrecondition(condition: .onQueue(.main))) to assert caller correctness
and prevent background-thread invocation, and also update the function's doc
comment to state the main-thread requirement so callers know the contract;
ensure the guard is added before the call to
NSAttributedString(data:options:documentAttributes:) and the doc mentions
getOpinionatedStringContents() explicitly.

---

Nitpick comments:
In `@macos/Sources/Helpers/Extensions/NSPasteboard`+Extension.swift:
- Around line 70-82: The looksEncodingLossy function can be simplified by
replacing the manual run-length loop with a single contains check for three
consecutive question marks: change the body of private static func
looksEncodingLossy(_ s: String) to return s.contains("\u{FFFD}") ||
s.contains("???"), keeping the same function name (looksEncodingLossy) so it
still detects the replacement character or a run of 3+ '?' characters.
- Around line 60-67: The logic should prefer the richTextFallback when the
plainText contains the replacement character U+FFFD (an unambiguous sign of
lossy encoding) even if nonASCIIScalarCount(recovered) is not strictly greater;
update the branch that currently checks Self.looksEncodingLossy(s) and
Self.nonASCIIScalarCount(recovered) > Self.nonASCIIScalarCount(s) to first test
if s contains U+FFFD and if recovered is non-empty then return recovered,
otherwise keep the existing nonASCIIScalarCount comparison (or change to >= if
you prefer) so that recovered is chosen on clear lossy evidence while preserving
the current heuristic for ambiguous cases; refer to plainText (s),
Self.looksEncodingLossy, self.richTextFallback(), and Self.nonASCIIScalarCount
when making the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fd3c093e-e646-4cd9-a842-e9489894c217

📥 Commits

Reviewing files that changed from the base of the PR and between 3b684a0 and 50904e2.

📒 Files selected for processing (1)
  • macos/Sources/Helpers/Extensions/NSPasteboard+Extension.swift

Comment on lines +104 to +115
if let data = self.data(forType: .html),
let attr = try? NSAttributedString(
data: data,
options: [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue,
],
documentAttributes: nil
),
!attr.string.isEmpty {
return attr.string
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find all call sites of getOpinionatedStringContents to assess threading context.
rg -nP --type=swift -C3 '\bgetOpinionatedStringContents\s*\('

Repository: manaflow-ai/ghostty

Length of output: 2940


🌐 Web query:

Does NSAttributedString init with NSAttributedString.DocumentType.html require the main thread on macOS?

💡 Result:

Yes, initializing NSAttributedString with NSAttributedString.DocumentType.html requires the main thread on macOS. Apple's official documentation explicitly states: "Don’t call this method from a background thread if the options dictionary includes the documentType attribute with a value of html. If you do, the method tries to synchronize with the main thread, fails, and times out." This applies to the synchronous initializers like init(data:options:documentAttributes:). An older Apple document confirms that since OS X v10.4, NSAttributedString uses WebKit for HTML import, which is not thread-safe on background threads; on macOS (OS X v10.5+), it transfers to the main thread via performSelectorOnMainThread:withObject:waitUntilDone:, requiring the main thread's run loop to be running. Recent forum discussions (up to 2025) affirm this behavior persists on macOS. New asynchronous APIs like loadFromHTML(data:options:completionHandler:) exist but are separate from the synchronous init methods.

Citations:


🏁 Script executed:

# Examine the call site contexts to understand threading guarantees
sed -n '2100,2130p' macos/Sources/Ghostty/Surface\ View/SurfaceView_AppKit.swift

Repository: manaflow-ai/ghostty

Length of output: 1001


🏁 Script executed:

# Check the clipboard request handler context
sed -n '330,345p' macos/Sources/Ghostty/Ghostty.App.swift

Repository: manaflow-ai/ghostty

Length of output: 738


🏁 Script executed:

# Check the C interop callback pattern to understand threading context
rg -nB5 "completeClipboardRequest" macos/Sources/Ghostty/Ghostty.App.swift | head -30

Repository: manaflow-ai/ghostty

Length of output: 587


🏁 Script executed:

# Look for the definition of the clipboard callback to understand its threading context
rg -nB10 "performable paste bindings" macos/Sources/Ghostty/Ghostty.App.swift

Repository: manaflow-ai/ghostty

Length of output: 628


🏁 Script executed:

# Search for where this callback function is registered/called to understand threading
rg -n "readClipboard\|clipboard.*callback" macos/Sources/Ghostty/Ghostty.App.swift | head -20

Repository: manaflow-ai/ghostty

Length of output: 45


🏁 Script executed:

# Check if there's a dispatch or threading guard anywhere in the clipboard handling chain
rg -nB2 "readClipboard" macos/Sources/Ghostty/Ghostty.App.swift | head -40

Repository: manaflow-ai/ghostty

Length of output: 666


🏁 Script executed:

# Check the C library interface to understand threading guarantees for clipboard callbacks
rg -n "read_clipboard_cb|ghostty_app_callbacks_s" macos/Sources/ -A3 | head -40

Repository: manaflow-ai/ghostty

Length of output: 870


🏁 Script executed:

# Look for documentation or comments about the C callback threading model
rg -n "ghostty_clipboard_e|clipboard.*thread|read.*clipboard" macos/Sources/Ghostty/ -B2 -A2 | head -50

Repository: manaflow-ai/ghostty

Length of output: 3491


🏁 Script executed:

# Search for any comments or documentation about threading in the C library headers
find . -name "*.h" -type f | head -20

Repository: manaflow-ai/ghostty

Length of output: 673


🏁 Script executed:

# Check if there are any assertions or main-thread checks elsewhere in the codebase for clipboard handling
rg -n "dispatchPrecondition\|MainActor\|Thread\|main thread" macos/Sources/Ghostty/Ghostty.App.swift | head -20

Repository: manaflow-ai/ghostty

Length of output: 45


🏁 Script executed:

# Check the main app event loop and threading model to understand if clipboard callbacks are main-thread guaranteed
head -100 macos/Sources/Ghostty/Ghostty.App.swift | grep -A20 "class App\|init\|runModal"

Repository: manaflow-ai/ghostty

Length of output: 3304


🏁 Script executed:

# Look for documentation or comments about the C library's threading model in the header
grep -n "read_clipboard_cb\|thread\|main" ./include/ghostty.h | head -30

Repository: manaflow-ai/ghostty

Length of output: 398


Add main-thread guarantee to getOpinionatedStringContents() for HTML parsing.

Apple's official documentation confirms that NSAttributedString(data:options:documentAttributes:) with DocumentType.html requires the main thread (it uses WebKit internally, which is not thread-safe). Calling this on a background thread will cause the method to attempt synchronization with the main thread, which fails and times out.

While current call sites (readSelection, pasteSelection, clipboard request handlers) appear to execute on the main thread, the method lacks defensive guards, making it fragile against future misuse.

Add either:

  • A main-thread assertion (dispatchPrecondition(condition: .onQueue(.main))) at method entry, or
  • A documented caller contract with explicit main-thread requirement in the method documentation
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macos/Sources/Helpers/Extensions/NSPasteboard`+Extension.swift around lines
104 - 115, The HTML-to-NSAttributedString conversion inside
getOpinionatedStringContents() must only run on the main thread; add a
main-thread guard at the start of that function (e.g.,
dispatchPrecondition(condition: .onQueue(.main))) to assert caller correctness
and prevent background-thread invocation, and also update the function's doc
comment to state the main-thread requirement so callers know the contract;
ensure the guard is added before the call to
NSAttributedString(data:options:documentAttributes:) and the doc mentions
getOpinionatedStringContents() explicitly.

Comment on lines +84 to +86
private static func nonASCIIScalarCount(_ s: String) -> Int {
s.unicodeScalars.reduce(0) { $1.value > 127 ? $0 + 1 : $0 }
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 \u{FFFC} skews non-ASCII count and leaks into terminal output

NSAttributedString.string replaces every embedded image/attachment in RTF or HTML with U+FFFC (OBJECT REPLACEMENT CHARACTER, value 0xFFFC). Because 0xFFFC > 127, nonASCIIScalarCount counts each \u{FFFC} as a non-ASCII scalar. This has two effects:

  1. A clipboard with mangled plain text ("?????", 0 non-ASCII) and RTF containing only a logo image (\u{FFFC}, 1 non-ASCII) satisfies 1 > 0, so richTextFallback() is returned — a single garbage character pasted instead of the original ?-run.
  2. Even in the happy path (Chinese text recovered correctly), any embedded image in the RTF/HTML causes \u{FFFC} to appear in the pasted string, which renders as visible garbage in the terminal.

Consider filtering replacement characters out of the richTextFallback result:

private func richTextFallback() -> String? {
    // ... existing extraction ...
    let cleaned = result.unicodeScalars
        .filter { $0.value != 0xFFFC && $0.value != 0xFFFD }
        .reduce(into: "") { $0.append(Character($1)) }
    return cleaned.isEmpty ? nil : cleaned
}

Comment on lines +70 to +82
private static func looksEncodingLossy(_ s: String) -> Bool {
if s.contains("\u{FFFD}") { return true }
var run = 0
for scalar in s.unicodeScalars {
if scalar == "?" {
run += 1
if run >= 3 { return true }
} else {
run = 0
}
}
return false
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 3-? threshold triggers on common English punctuation

With run >= 3, text like "What???" or "Seriously???" causes looksEncodingLossy to return true, unconditionally invoking the RTF/HTML parser (which may load WebKit for HTML). The non-ASCII guard prevents a wrong swap, but the parsing overhead is paid every time a user pastes text ending in three question marks.

U+FFFD is an unambiguous encoding failure signal and never appears in correctly-produced text. Consider requiring U+FFFD as the primary trigger and raising the ?-run threshold (e.g. 5 or more) to reduce false-positive parse invocations:

private static func looksEncodingLossy(_ s: String) -> Bool {
    if s.contains("\u{FFFD}") { return true }
    var run = 0
    for scalar in s.unicodeScalars {
        if scalar == "?" {
            run += 1
            if run >= 5 { return true }
        } else {
            run = 0
        }
    }
    return false
}

Comment on lines +104 to +115
if let data = self.data(forType: .html),
let attr = try? NSAttributedString(
data: data,
options: [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue,
],
documentAttributes: nil
),
!attr.string.isEmpty {
return attr.string
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 HTML NSAttributedString init requires the main thread

NSAttributedString(data:options:documentAttributes:) with .html internally drives a WebKit document parser, which must run on the main thread. The comment in the PR description acknowledges the WebKit cost, but the code doesn't assert the calling thread. The readClipboard C callback (registered as read_clipboard_cb in Ghostty.App.swift) is invoked from the libghostty core and its threading guarantee isn't obvious from this file.

Consider adding a runtime assertion to make the requirement explicit:

if let data = self.data(forType: .html) {
    assert(Thread.isMainThread, "NSAttributedString HTML init must be called on the main thread")
    // ...
}

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 1 file

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="macos/Sources/Helpers/Extensions/NSPasteboard+Extension.swift">

<violation number="1" location="macos/Sources/Helpers/Extensions/NSPasteboard+Extension.swift:62">
P2: Lossy plain text containing U+FFFD can bypass rich-text recovery because U+FFFD is counted as non-ASCII, making the strict `>` gate fail.</violation>

<violation number="2" location="macos/Sources/Helpers/Extensions/NSPasteboard+Extension.swift:101">
P1: `NSAttributedString.string` emits `U+FFFC` (Object Replacement Character) for every embedded image/attachment in RTF or HTML. Since `0xFFFC > 127`, each one inflates `nonASCIIScalarCount`, which can cause an incorrect swap (e.g., mangled plain text with 0 non-ASCII vs. RTF with only an image → `1 > 0` triggers fallback). Worse, any `U+FFFC` that survives into the return value renders as garbage in the terminal.

Filter `\u{FFFC}` (and `\u{FFFD}`) out of the string returned by `richTextFallback()` before using it.</violation>

<violation number="3" location="macos/Sources/Helpers/Extensions/NSPasteboard+Extension.swift:104">
P2: `NSAttributedString(data:options:documentAttributes:)` with `.html` drives WebKit internally and **must** run on the main thread — calling it from a background thread causes a synchronization attempt that silently times out (Apple docs: "the HTML importer should not be called from a background thread"). Add a `dispatchPrecondition(condition: .onQueue(.main))` before the HTML branch to make this requirement explicit and catch future misuse.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@@ -36,6 +36,8 @@ extension NSPasteboard {
/// Does these things in order:
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: NSAttributedString.string emits U+FFFC (Object Replacement Character) for every embedded image/attachment in RTF or HTML. Since 0xFFFC > 127, each one inflates nonASCIIScalarCount, which can cause an incorrect swap (e.g., mangled plain text with 0 non-ASCII vs. RTF with only an image → 1 > 0 triggers fallback). Worse, any U+FFFC that survives into the return value renders as garbage in the terminal.

Filter \u{FFFC} (and \u{FFFD}) out of the string returned by richTextFallback() before using it.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At macos/Sources/Helpers/Extensions/NSPasteboard+Extension.swift, line 101:

<comment>`NSAttributedString.string` emits `U+FFFC` (Object Replacement Character) for every embedded image/attachment in RTF or HTML. Since `0xFFFC > 127`, each one inflates `nonASCIIScalarCount`, which can cause an incorrect swap (e.g., mangled plain text with 0 non-ASCII vs. RTF with only an image → `1 > 0` triggers fallback). Worse, any `U+FFFC` that survives into the return value renders as garbage in the terminal.

Filter `\u{FFFC}` (and `\u{FFFD}`) out of the string returned by `richTextFallback()` before using it.</comment>

<file context>
@@ -45,7 +47,74 @@ extension NSPasteboard {
+               documentAttributes: nil
+           ),
+           !attr.string.isEmpty {
+            return attr.string
+        }
+
</file context>
Fix with Cubic

// rendering.
if let s = plainText, Self.looksEncodingLossy(s),
let recovered = self.richTextFallback(),
Self.nonASCIIScalarCount(recovered) > Self.nonASCIIScalarCount(s) {
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Lossy plain text containing U+FFFD can bypass rich-text recovery because U+FFFD is counted as non-ASCII, making the strict > gate fail.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At macos/Sources/Helpers/Extensions/NSPasteboard+Extension.swift, line 62:

<comment>Lossy plain text containing U+FFFD can bypass rich-text recovery because U+FFFD is counted as non-ASCII, making the strict `>` gate fail.</comment>

<file context>
@@ -45,7 +47,74 @@ extension NSPasteboard {
+        // rendering.
+        if let s = plainText, Self.looksEncodingLossy(s),
+           let recovered = self.richTextFallback(),
+           Self.nonASCIIScalarCount(recovered) > Self.nonASCIIScalarCount(s) {
+            return recovered
+        }
</file context>
Suggested change
Self.nonASCIIScalarCount(recovered) > Self.nonASCIIScalarCount(s) {
(Self.nonASCIIScalarCount(recovered) > Self.nonASCIIScalarCount(s) || s.contains("\u{FFFD}")) {
Fix with Cubic

return attr.string
}

if let data = self.data(forType: .html),
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: NSAttributedString(data:options:documentAttributes:) with .html drives WebKit internally and must run on the main thread — calling it from a background thread causes a synchronization attempt that silently times out (Apple docs: "the HTML importer should not be called from a background thread"). Add a dispatchPrecondition(condition: .onQueue(.main)) before the HTML branch to make this requirement explicit and catch future misuse.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At macos/Sources/Helpers/Extensions/NSPasteboard+Extension.swift, line 104:

<comment>`NSAttributedString(data:options:documentAttributes:)` with `.html` drives WebKit internally and **must** run on the main thread — calling it from a background thread causes a synchronization attempt that silently times out (Apple docs: "the HTML importer should not be called from a background thread"). Add a `dispatchPrecondition(condition: .onQueue(.main))` before the HTML branch to make this requirement explicit and catch future misuse.</comment>

<file context>
@@ -45,7 +47,74 @@ extension NSPasteboard {
+            return attr.string
+        }
+
+        if let data = self.data(forType: .html),
+           let attr = try? NSAttributedString(
+               data: data,
</file context>
Fix with Cubic

@HelloBoLun
Copy link
Copy Markdown
Author

Hi maintainers — the Vouch - Check PR / check job timed out after 24h waiting for a namespace-profile-ghostty-xsm runner to pick it up (no failure, just never dispatched). Could you re-run the workflow or check on the runner pool?

All AI reviewers and security scans have passed. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant