Implement 8-bit ANSI #49
Pull request overview
This PR implements 8-bit ANSI escape sequence support by treating raw bytes 0x80-0x9F as C1 control codes, replacing the previous UTF-8-encoded C1 approach. The change is motivated by enabling faster ANSI width calculations in charmbracelet/x/ansi by working with raw terminal streams rather than assuming all input is valid UTF-8.
Changes:
- Switched from UTF-8-safe C1 detection (0xC2 0x80-0x9F) to raw byte detection (0x80-0x9F)
- Updated ANSI parsing to handle single-byte C1 controls directly instead of two-byte UTF-8 sequences
- Added comprehensive boundary agreement tests comparing against charmbracelet/x/ansi's parser
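To make the first two changes concrete, here is a minimal sketch of the two detection strategies side by side. The function names (`isC1`, `rawInitiatorLen`, `utf8InitiatorLen`) are hypothetical and are not taken from the PR; the byte ranges are the ones listed above.

```go
package main

import "fmt"

// isC1 reports whether b falls in the C1 control range 0x80-0x9F.
func isC1(b byte) bool {
	return b >= 0x80 && b <= 0x9F
}

// rawInitiatorLen (hypothetical name) returns 1 when data starts with a raw
// 8-bit C1 byte, and 0 otherwise.
func rawInitiatorLen(data []byte) int {
	if len(data) > 0 && isC1(data[0]) {
		return 1
	}
	return 0
}

// utf8InitiatorLen (hypothetical name) returns 2 when data starts with a
// UTF-8 encoded C1 control (lead byte 0xC2 followed by 0x80-0x9F), and 0
// otherwise.
func utf8InitiatorLen(data []byte) int {
	if len(data) >= 2 && data[0] == 0xC2 && isC1(data[1]) {
		return 2
	}
	return 0
}

func main() {
	raw := []byte("\x9B31mred")     // raw 8-bit CSI
	enc := []byte("\xC2\x9B31mred") // UTF-8 encoded U+009B

	fmt.Println(rawInitiatorLen(raw), utf8InitiatorLen(raw)) // 1 0
	fmt.Println(rawInitiatorLen(enc), utf8InitiatorLen(enc)) // 0 2
}
```

The two checks never both match at the same position, since 0xC2 lies outside 0x80-0x9F, so a parser has to commit to one interpretation of the stream.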
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| graphemes/iterator.go | Changed ANSI detection from UTF-8-encoded C1 (checking for 0xC2 lead byte) to raw bytes 0x80-0x9F, added ST constant |
| graphemes/ansi.go | Refactored to handle 8-bit C1 controls as single bytes instead of UTF-8 two-byte sequences; updated terminator detection |
| graphemes/ansi_test.go | Updated all test inputs from UTF-8-encoded C1 sequences (\xC2\x9B) to raw bytes (\x9B); removed UTF-8 validation test |
| graphemes/comparative/comparative_test.go | Added comprehensive boundary agreement tests against charmbracelet/x/ansi parser with 60+ test cases |
| graphemes/comparative/go.mod | Updated Go version to 1.24.2 and added charmbracelet/x/ansi dependency |
| graphemes/comparative/go.sum | Added checksums for new dependencies |
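As a rough illustration of what single-byte terminator detection can look like, the sketch below scans an 8-bit OSC string (initiator 0x9D) up to BEL, the 8-bit ST byte (0x9C), or the 7-bit ST form (ESC `\`). The constant and function names are assumptions made for illustration and are not the PR's code; the handling of a stray ESC is deliberately simplified.

```go
package main

import "fmt"

// Byte values for the controls used here; names are illustrative only.
const (
	esc = 0x1B // ESC
	bel = 0x07 // BEL, commonly accepted as an OSC terminator
	osc = 0x9D // 8-bit OSC initiator
	st  = 0x9C // 8-bit ST (string terminator)
)

// oscLen (hypothetical name) returns the length in bytes of an 8-bit OSC
// sequence at the start of data, including its terminator. It returns 0 if
// data does not start with a complete sequence.
func oscLen(data []byte) int {
	if len(data) == 0 || data[0] != osc {
		return 0
	}
	for i := 1; i < len(data); i++ {
		switch data[i] {
		case bel, st:
			return i + 1
		case esc:
			// 7-bit ST is ESC followed by a backslash.
			if i+1 < len(data) && data[i+1] == '\\' {
				return i + 2
			}
			// Simplification: any other ESC is skipped rather than
			// aborting the sequence.
		}
	}
	return 0
}

func main() {
	fmt.Println(oscLen([]byte("\x9D0;title\x07rest"))) // 9
	fmt.Println(oscLen([]byte("\x9D0;title\x9C")))     // 9
	fmt.Println(oscLen([]byte("\x9D0;title\x1B\\")))   // 10
	fmt.Println(oscLen([]byte("\x9D0;title")))         // 0 (unterminated)
}
```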
Pull request overview
Copilot reviewed 9 out of 10 changed files in this pull request and generated 1 comment.
Based on this discussion with @aymanbagabas, implement “plain” 8-bit ANSI initiators, instead of my notion of “UTF-8 encoded” initiators. This represents a mode for interpreting a raw terminal stream, as opposed to assuming that everything coming in is valid UTF-8.
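The distinction between the two modes can be checked with the standard library alone. The sketch below (the helper name `describe` is made up for illustration) shows that a stream containing a raw 0x9B byte is not valid UTF-8, while the two-byte form decodes to U+009B.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe (hypothetical helper) reports whether data is valid UTF-8 and
// what the first decoded rune and its byte width are.
func describe(data []byte) (valid bool, first rune, size int) {
	first, size = utf8.DecodeRune(data)
	return utf8.Valid(data), first, size
}

func main() {
	// Raw 8-bit CSI: 0x9B is a UTF-8 continuation byte, so it cannot start
	// a rune and decodes as RuneError with width 1.
	v, r, n := describe([]byte("\x9B31m"))
	fmt.Printf("%v %U %d\n", v, r, n) // false U+FFFD 1

	// UTF-8 encoded CSI: 0xC2 0x9B is the two-byte encoding of U+009B.
	v, r, n = describe([]byte("\xC2\x9B31m"))
	fmt.Printf("%v %U %d\n", v, r, n) // true U+009B 2
}
```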
This is a bit of a departure / scope creep for this `graphemes` package, but the opportunity is to allow `charmbracelet/x/ansi` to have faster / simpler width calculations. This PR takes care to test for identical boundaries between their parser and mine.

Here are some preliminary local benchmarks, just iterating, not doing anything else: