
Implement 8-bit ANSI #49

Merged: clipperhouse merged 15 commits into master from ansi-8-bit on Feb 16, 2026



@clipperhouse

Owner

Based on this discussion with @aymanbagabas, implement “plain” 8-bit ANSI initiators, instead of my notion of “UTF-8 encoded” initiators. This represents a mode for interpreting a raw terminal stream, as opposed to assuming that everything coming in is valid UTF-8.

This is a bit of a departure / scope creep for this graphemes package, but the opportunity is to allow charmbracelet/x/ansi to have faster / simpler width calculations. This PR takes care to test for identical boundaries between their parser and mine.

Here are some preliminary local benchmarks, just iterating, not doing anything else:

goos: darwin
goarch: arm64
pkg: github.com/clipperhouse/uax29/graphemes/comparative
cpu: Apple M2
BenchmarkAnsiIteration/clipperhouse/uax29-8        26598 ns/op    506.80 MB/s    0 B/op    0 allocs/op
BenchmarkAnsiIteration/charmbracelet/x/ansi-8      42117 ns/op    320.06 MB/s    0 B/op    0 allocs/op
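
To read the two benchmark lines against each other, the arithmetic can be sketched as below. The `result` type and the helper names are made up for illustration; the only inputs are the ns/op and MB/s figures quoted above, and the sketch assumes Go's usual convention of reporting MB/s as 10^6 bytes per second.

```go
package main

import "fmt"

// result holds the figures from one benchmark line in the PR description.
// The type and field names are illustrative, not part of any package.
type result struct {
	name     string
	nsPerOp  float64
	mbPerSec float64
}

var (
	uax29Result = result{"clipperhouse/uax29", 26598, 506.80}
	charmResult = result{"charmbracelet/x/ansi", 42117, 320.06}
)

// impliedBytes recovers the per-iteration input size from ns/op and MB/s,
// assuming MB/s means 1e6 bytes per second.
func impliedBytes(r result) float64 {
	return r.mbPerSec * 1e6 * r.nsPerOp / 1e9
}

// speedup reports how many times faster a is than b, by time per operation.
func speedup(a, b result) float64 {
	return b.nsPerOp / a.nsPerOp
}

func main() {
	for _, r := range []result{uax29Result, charmResult} {
		fmt.Printf("%s: about %.0f bytes per iteration\n", r.name, impliedBytes(r))
	}
	fmt.Printf("speedup: %.2fx\n", speedup(uax29Result, charmResult))
}
```

Both lines imply roughly the same input size (about 13.5 KB), which is consistent with the two parsers iterating over the same data, and the time per operation differs by a factor of about 1.58.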

Copilot AI review requested due to automatic review settings February 15, 2026 19:23

Copilot AI left a comment


Pull request overview

This PR implements 8-bit ANSI escape sequence support by treating raw bytes 0x80-0x9F as C1 control codes, replacing the previous UTF-8-encoded C1 approach. The change is motivated by enabling faster ANSI width calculations in charmbracelet/x/ansi by working with raw terminal streams rather than assuming all input is valid UTF-8.

Changes:

  • Switched from UTF-8-safe C1 detection (0xC2 0x80-0x9F) to raw byte detection (0x80-0x9F)
  • Updated ANSI parsing to handle single-byte C1 controls directly instead of two-byte UTF-8 sequences
  • Added comprehensive boundary agreement tests comparing against charmbracelet/x/ansi's parser
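
The difference between the two detection strategies listed above can be sketched in a few lines. This is a minimal illustration, not the code from this PR: `isC1` and `isUTF8C1` are hypothetical helper names.

```go
package main

import "fmt"

// isC1 reports whether b is a raw 8-bit C1 control byte (0x80-0x9F).
// Hypothetical helper for illustration.
func isC1(b byte) bool {
	return b >= 0x80 && b <= 0x9F
}

// isUTF8C1 reports whether data begins with a UTF-8 encoded C1 control
// (U+0080 to U+009F), which is always 0xC2 followed by 0x80-0x9F.
// Hypothetical helper for illustration.
func isUTF8C1(data []byte) bool {
	return len(data) >= 2 && data[0] == 0xC2 && isC1(data[1])
}

func main() {
	raw := []byte("\x9B31m")       // single-byte CSI, then "31m"
	encoded := []byte("\u009B31m") // U+009B encoded as UTF-8: C2 9B, then "31m"

	fmt.Println(isC1(raw[0]), isUTF8C1(raw))         // true false
	fmt.Println(isC1(encoded[0]), isUTF8C1(encoded)) // false true
}
```

Note that bytes in 0x80-0x9F also occur as continuation bytes in valid UTF-8 (U+00C0 encodes as C3 80, for example), so a raw-byte check like `isC1` only makes sense at the start of a segment and when the stream is being interpreted as raw terminal output. How the package itself positions this check is not shown here.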

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| graphemes/iterator.go | Changed ANSI detection from UTF-8-encoded C1 (checking for 0xC2 lead byte) to raw bytes 0x80-0x9F, added ST constant |
| graphemes/ansi.go | Refactored to handle 8-bit C1 controls as single bytes instead of UTF-8 two-byte sequences; updated terminator detection |
| graphemes/ansi_test.go | Updated all test inputs from UTF-8-encoded C1 sequences (\xC2\x9B) to raw bytes (\x9B); removed UTF-8 validation test |
| graphemes/comparative/comparative_test.go | Added comprehensive boundary agreement tests against charmbracelet/x/ansi parser with 60+ test cases |
| graphemes/comparative/go.mod | Updated Go version to 1.24.2 and added charmbracelet/x/ansi dependency |
| graphemes/comparative/go.sum | Added checksums for new dependencies |
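
As a rough picture of what handling 8-bit C1 controls as single bytes means for a parser, here is a minimal sketch that measures a CSI sequence introduced by either ESC [ or the single byte 0x9B, and recognizes both forms of the String Terminator. The byte ranges follow ECMA-48; the function names `csiLen` and `stLen` are hypothetical, and the real code in graphemes/ansi.go or charmbracelet/x/ansi may treat malformed or truncated input differently.

```go
package main

import "fmt"

const (
	esc = 0x1B
	csi = 0x9B // 8-bit CSI, equivalent to ESC [
	st  = 0x9C // 8-bit String Terminator, equivalent to ESC \
)

// csiLen returns the byte length of a complete CSI sequence at the start of
// data, or 0 if there is none. Hypothetical helper for illustration.
func csiLen(data []byte) int {
	i := 0
	switch {
	case len(data) >= 1 && data[0] == csi:
		i = 1 // single-byte initiator
	case len(data) >= 2 && data[0] == esc && data[1] == '[':
		i = 2 // two-byte 7-bit initiator
	default:
		return 0
	}
	for i < len(data) && data[i] >= 0x30 && data[i] <= 0x3F {
		i++ // parameter bytes
	}
	for i < len(data) && data[i] >= 0x20 && data[i] <= 0x2F {
		i++ // intermediate bytes
	}
	if i < len(data) && data[i] >= 0x40 && data[i] <= 0x7E {
		return i + 1 // final byte completes the sequence
	}
	return 0 // truncated or malformed
}

// stLen returns the length of a String Terminator at the start of data:
// 1 for the 8-bit form, 2 for ESC \, 0 if neither is present.
func stLen(data []byte) int {
	switch {
	case len(data) >= 1 && data[0] == st:
		return 1
	case len(data) >= 2 && data[0] == esc && data[1] == '\\':
		return 2
	}
	return 0
}

func main() {
	fmt.Println(csiLen([]byte("\x9B31mhello")))   // 4
	fmt.Println(csiLen([]byte("\x1b[31mhello")))  // 5
	fmt.Println(stLen([]byte("\x9C")), stLen([]byte("\x1b\\"))) // 1 2
}
```

The 8-bit form is one byte shorter than its 7-bit equivalent, which is why test inputs in graphemes/ansi_test.go change length when \xC2\x9B becomes \x9B.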


Comment thread graphemes/iterator.go Outdated
Comment thread graphemes/ansi.go Outdated
Comment thread graphemes/ansi_test.go
Comment thread graphemes/ansi.go Outdated
Comment thread graphemes/ansi.go Outdated

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 1 comment.



Comment thread graphemes/comparative/go.mod
@clipperhouse merged commit 9378e43 into master on Feb 16, 2026
19 checks passed
@clipperhouse deleted the ansi-8-bit branch on February 16, 2026 15:14

2 participants