I'm working on a code editor that uses tree-sitter for syntax highlighting, and supporting text like "👻👩‍👩‍👦‍👦" is a design goal.
Tree-sitter's API references all text snippets by their byte offsets. For example, if tree-sitter parses the Clojure code `(declare foo)`, it will tell you there's a list literal starting at byte offset 0, a symbol `declare` at byte offset 1, and a symbol `foo` at byte offset 9. To construct a view of the syntax-highlighted text, I then need to grab chunks of text by their byte offsets.
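As a rough illustration of the lookup this needs, here is a minimal sketch that uses only the JDK, not Bifurcan's API. `sliceByBytes` is a hypothetical helper name, and the sketch assumes both offsets fall on code point boundaries, as tree-sitter node boundaries normally would.

```java
import java.nio.charset.StandardCharsets;

final class ByteOffsetSlice {
    // Hypothetical helper: returns the text covered by the UTF-8 byte range
    // [startByte, endByte). Assumes both offsets sit on code point boundaries.
    static String sliceByBytes(String text, int startByte, int endByte) {
        // Re-encodes the whole string on every call, so each lookup is O(n).
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, startByte, endByte - startByte, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String source = "(declare foo)";
        System.out.println(sliceByBytes(source, 1, 8));   // declare
        System.out.println(sliceByBytes(source, 9, 12));  // foo

        // With a non-ASCII prefix, byte offsets and char indices diverge:
        // the ghost emoji is 4 UTF-8 bytes but 2 Java chars.
        String ghost = "\uD83D\uDC7B foo";
        System.out.println(sliceByBytes(ghost, 5, 8));    // foo
        System.out.println(ghost.indexOf("foo"));         // 3
    }
}
```

Re-encoding the full text for each slice is linear in the document size, which is why having the rope track byte offsets alongside its existing counts would be attractive.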
On the UI side, the cursor needs to move forward and backward by grapheme cluster. As an example, 👩‍👩‍👦‍👦 is 1 grapheme cluster, 25 UTF-8 bytes, 7 code points, or 11 Java characters (UTF-16 code units). It's really convenient that Bifurcan's Rope already deals with the discrepancy between code points and Java characters.
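The counts above can be checked with a small sketch using only the JDK. The class and method names are hypothetical, and the grapheme count comes from `java.text.BreakIterator`, whose handling of emoji joined by zero-width joiners varies between JDK versions, so it may not report 1 on older runtimes.

```java
import java.nio.charset.StandardCharsets;
import java.text.BreakIterator;

final class TextUnits {
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    static int javaChars(String s) {
        return s.length(); // UTF-16 code units
    }

    // Counts boundaries reported by the JDK's character BreakIterator.
    // The result for ZWJ emoji sequences depends on the JDK version.
    static int graphemeClusters(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // woman, ZWJ, woman, ZWJ, boy, ZWJ, boy
        String family = "\uD83D\uDC69\u200D\uD83D\uDC69\u200D\uD83D\uDC66\u200D\uD83D\uDC66";
        System.out.println(utf8Bytes(family));        // 25
        System.out.println(codePoints(family));       // 7
        System.out.println(javaChars(family));        // 11
        System.out.println(graphemeClusters(family)); // version dependent
    }
}
```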
I'm not sure if there's interest in extending Bifurcan's Rope bookkeeping to also support lookup by byte offset and/or grapheme cluster offset.
Thanks for the great library!