str — Grapheme-Aware Core Utilities

Note (2.0+): The str/core module is now internal (str/internal/core). All functions documented here are available via import str. Use str.function_name() in your code.

Overview

The core of str provides fundamental string operations that correctly handle Unicode grapheme clusters, including:

Complex emoji sequences (ZWJ, skin tones, flags)
Combining character sequences (diacritics, accents)
Multi-codepoint grapheme clusters
CRLF line endings (treated as single grapheme)

All functions in this module operate at the grapheme boundary level, ensuring Unicode correctness.

API Reference

Truncation

`truncate(text: String, max_len: Int, suffix: String) -> String`

Truncates text to a maximum number of grapheme clusters, appending a suffix.

Example:

truncate("Hello 👨‍👩‍👧‍👦 World", 10, "...")  // "Hello 👨‍👩‍👧‍👦..."

`truncate_preserve(text: String, max_len: Int, suffix: String) -> String`

Variant that prioritizes preserving complete emoji sequences.

`truncate_strict(text: String, max_len: Int, suffix: String) -> String`

Strict truncation that may split complex sequences if necessary.

`truncate_default(text: String, max_len: Int) -> String`

Convenience function using "..." as the default suffix.

`ellipsis(text: String, max_len: Int) -> String`

Truncates text with ellipsis (…) suffix.

Example:

ellipsis("Hello World", 8)  // "Hello W…"
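
The length budget behind truncation can be sketched as follows. This is a minimal illustration and not the library's implementation: truncate_sketch is a hypothetical name, and it assumes that max_len counts the suffix, which is what the ellipsis example suggests.

```gleam
import gleam/int
import gleam/list
import gleam/string

// Hypothetical sketch: assumes max_len includes the suffix length.
pub fn truncate_sketch(text: String, max_len: Int, suffix: String) -> String {
  let graphemes = string.to_graphemes(text)
  case list.length(graphemes) <= max_len {
    // Short enough already: return the text untouched, without a suffix.
    True -> text
    False -> {
      // Reserve room for the suffix, never going below zero graphemes.
      let keep = int.max(max_len - string.length(suffix), 0)
      string.concat(list.take(graphemes, keep)) <> suffix
    }
  }
}
```

Under that assumption, truncate_sketch("Hello World", 8, "…") keeps seven graphemes and appends the one-grapheme suffix, giving "Hello W…".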

String Reversal

`reverse(text: String) -> String`

Reverses text at grapheme cluster boundaries.

Example:

reverse("café")       // "éfac"
reverse("👨‍👩‍👧‍👦")  // "👨‍👩‍👧‍👦" (single cluster, unchanged)
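
Reversal at grapheme boundaries amounts to segmenting, reversing the list of clusters, and joining again. The sketch below shows that idea with the standard library only; reverse_sketch is a hypothetical name and the library's own code may differ.

```gleam
import gleam/list
import gleam/string

// Hypothetical sketch: reverse the list of clusters, not the codepoints.
pub fn reverse_sketch(text: String) -> String {
  text
  |> string.to_graphemes
  |> list.reverse
  |> string.concat
}
```

Because each cluster is moved as a unit, a combining accent stays attached to its base character and a ZWJ emoji sequence is never split.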

Grapheme Extraction

`length(text: String) -> Int`

Returns the number of grapheme clusters in text. This is a grapheme-aware length function that correctly counts complex emoji, combining sequences, flags, and other multi-codepoint graphemes.

Example:

length("hello")       // 5
length("👨‍👩‍👧‍👦")        // 1 (single family emoji cluster)
length("café")        // 4 (with combining accent)
length("🇮🇹")          // 1 (flag is a single grapheme)
length("")            // 0

`take(text: String, n: Int) -> String`

Returns the first N grapheme clusters from text.

Example:

take("hello", 3)       // "hel"
take("👨‍👩‍👧‍👦abc", 2)  // "👨‍👩‍👧‍👦a"

`drop(text: String, n: Int) -> String`

Drops the first N grapheme clusters from text.

Example:

drop("hello", 2)       // "llo"
drop("👨‍👩‍👧‍👦abc", 1)  // "abc"

`at(text: String, index: Int) -> Result(String, Nil)`

Returns the grapheme cluster at the given index (0-based).

Example:

at("hello", 1)       // Ok("e")
at("👨‍👩‍👧‍👦abc", 0)  // Ok("👨‍👩‍👧‍👦")
at("hi", 10)         // Error(Nil)

`chunk(text: String, size: Int) -> List(String)`

Splits text into chunks of N graphemes. Like Rust's chunks() or Lodash's chunk(). The last chunk may be smaller.

Example:

chunk("abcdef", 2)   // ["ab", "cd", "ef"]
chunk("abcdefg", 3)  // ["abc", "def", "g"]
chunk("hello", 10)   // ["hello"]
chunk("👨‍👩‍👧‍👦ab", 2)   // ["👨‍👩‍👧‍👦a", "b"]
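
The extraction functions above all follow the same pattern: segment into graphemes, apply a list operation, and join. The sketch below illustrates this using only the Gleam standard library. The *_sketch names are hypothetical, and the handling of negative indexes and non-positive chunk sizes is an assumption, since the documented examples do not cover those cases.

```gleam
import gleam/list
import gleam/string

pub fn take_sketch(text: String, n: Int) -> String {
  text |> string.to_graphemes |> list.take(n) |> string.concat
}

pub fn drop_sketch(text: String, n: Int) -> String {
  text |> string.to_graphemes |> list.drop(n) |> string.concat
}

pub fn at_sketch(text: String, index: Int) -> Result(String, Nil) {
  case index < 0 {
    // Assumption: negative indexes are treated as out of range.
    True -> Error(Nil)
    False -> text |> string.to_graphemes |> list.drop(index) |> list.first
  }
}

pub fn chunk_sketch(text: String, size: Int) -> List(String) {
  case size < 1 {
    // Assumption: a non-positive size yields no chunks.
    True -> []
    False ->
      text
      |> string.to_graphemes
      |> list.sized_chunk(size)
      |> list.map(string.concat)
  }
}
```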

Padding

`pad_left(text: String, width: Int, pad: String) -> String`

Pads text on the left to reach the specified width.

Example:

pad_left("hi", 5, " ")    // "   hi"
pad_left("x", 3, "->")    // "->->x"

`pad_right(text: String, width: Int, pad: String) -> String`

Pads text on the right.

`center(text: String, width: Int, pad: String) -> String`

Centers text within the specified width (right-biased when uneven: extra padding goes to the right).

Example:

center("hi", 6, " ")  // "  hi  "

`fill(text: String, width: Int, pad: String, position: FillPosition) -> String`

Flexible padding function. The position argument is a FillPosition value: Left, Right, or Both (center).

Example:

fill("x", 5, "-", Left)   // "----x"
fill("x", 5, "-", Right)  // "x----"
fill("x", 5, "-", Both)   // "--x--"
fill("42", 5, "0", Left)  // "00042"
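
The padding arithmetic can be sketched as below. This is an illustration under assumptions, not the library's code: the *_sketch names are hypothetical, the pad string is repeated once per missing grapheme (as the pad_left("x", 3, "->") example implies), and uneven centering puts the extra unit on the right, as described for center.

```gleam
import gleam/int
import gleam/string

pub fn pad_left_sketch(text: String, width: Int, pad: String) -> String {
  // Number of graphemes still needed to reach the target width.
  let missing = int.max(width - string.length(text), 0)
  string.repeat(pad, missing) <> text
}

pub fn center_sketch(text: String, width: Int, pad: String) -> String {
  let missing = int.max(width - string.length(text), 0)
  // Integer division rounds down, so any odd unit lands on the right.
  let left = missing / 2
  let right = missing - left
  string.repeat(pad, left) <> text <> string.repeat(pad, right)
}
```

With a multi-grapheme pad such as "->", the result is wider than the requested width, which matches the documented "->->x" output.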

Counting

`count(haystack: String, needle: String, overlapping: Bool) -> Int`

Counts occurrences of a substring (grapheme-aware).

Example:

count("aaaa", "aa", True)   // 3 (overlapping)
count("aaaa", "aa", False)  // 2 (non-overlapping)
count("👩👩👩", "👩", True)   // 3
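
The difference between the two counting modes is only how far the scan advances after a match: one grapheme when overlapping, the full needle length otherwise. The sketch below illustrates this on grapheme lists. The names are hypothetical, and returning 0 for an empty needle is an assumption.

```gleam
import gleam/list
import gleam/string

pub fn count_sketch(haystack: String, needle: String, overlapping: Bool) -> Int {
  let needle_graphemes = string.to_graphemes(needle)
  case needle_graphemes {
    // Assumption: an empty needle never matches.
    [] -> 0
    _ -> {
      let step = case overlapping {
        True -> 1
        False -> list.length(needle_graphemes)
      }
      do_count(string.to_graphemes(haystack), needle_graphemes, step, 0)
    }
  }
}

fn do_count(
  haystack: List(String),
  needle: List(String),
  step: Int,
  acc: Int,
) -> Int {
  case haystack {
    [] -> acc
    _ ->
      case has_prefix(haystack, needle) {
        // After a match, skip `step` graphemes before scanning again.
        True -> do_count(list.drop(haystack, step), needle, step, acc + 1)
        False -> do_count(list.drop(haystack, 1), needle, step, acc)
      }
  }
}

fn has_prefix(items: List(String), prefix: List(String)) -> Bool {
  case items, prefix {
    _, [] -> True
    [], _ -> False
    [a, ..rest_a], [b, ..rest_b] -> a == b && has_prefix(rest_a, rest_b)
  }
}
```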

Blank Detection

`is_blank(text: String) -> Bool`

Checks if a string is empty or contains only whitespace characters.

Example:

is_blank("")           // True
is_blank("   ")        // True
is_blank("  hello  ")  // False

Word and Line Operations

`words(text: String) -> List(String)`

Splits text into words by whitespace.

Example:

words("Hello  world\n\ttest")  // ["Hello", "world", "test"]

`lines(text: String) -> List(String)`

Splits text into lines. Handles \n, \r\n, and \r.

Example:

lines("a\nb\nc")    // ["a", "b", "c"]
lines("a\r\nb")     // ["a", "b"]

`splitn(text: String, sep: String, n: Int) -> List(String)`

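Splits text on separator into at most n parts, like Rust's str::splitn(). Python's str.split(sep, maxsplit) counts splits rather than parts, so the Python equivalent is str.split(sep, n - 1).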

Example:

splitn("a-b-c-d", "-", 2)  // ["a", "b-c-d"]
splitn("a-b-c-d", "-", 3)  // ["a", "b", "c-d"]
splitn("a-b", "-", 10)     // ["a", "b"]
splitn("hello", "-", 2)    // ["hello"]
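
One way to picture the "at most n parts" rule is a recursive split that stops once only one part remains in the budget. The sketch below uses string.split_once from the standard library. splitn_sketch is a hypothetical name, and the behaviour for n below 2 or an empty separator is an assumption.

```gleam
import gleam/string

pub fn splitn_sketch(text: String, sep: String, n: Int) -> List(String) {
  case n <= 1 || sep == "" {
    // Assumption: with no splits left (or no separator), return text whole.
    True -> [text]
    False ->
      case string.split_once(text, sep) {
        // Emit the part before the separator, then spend one unit of budget.
        Ok(#(before, after)) -> [before, ..splitn_sketch(after, sep, n - 1)]
        Error(Nil) -> [text]
      }
  }
}
```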

`dedent(text: String) -> String`

Removes common leading whitespace from all lines.

Example:

dedent("  a\n  b\n  c")  // "a\nb\nc"

`indent(text: String, spaces: Int) -> String`

Adds indentation to each line.

Example:

indent("hello\nworld", 2)  // "  hello\n  world"

`wrap_at(text: String, width: Int) -> String`

Wraps text at the specified width, breaking on word boundaries.

Example:

wrap_at("hello world foo bar", 11)  // "hello world\nfoo bar"

`chomp(text: String) -> String`

Removes trailing newline if present (handles \n, \r\n, \r as graphemes).

Example:

chomp("hello\n")    // "hello"
chomp("hello\r\n")  // "hello"

String Wrapping

`surround(text: String, prefix: String, suffix: String) -> String`

Wraps text with prefix and suffix.

Example:

surround("world", "Hello ", "!")  // "Hello world!"

`unwrap(text: String, prefix: String, suffix: String) -> String`

Removes prefix and suffix if both are present.

Character Stripping

`strip(text: String, chars: String) -> String`

Removes specified characters from both ends of text.

Example:

strip("..hello..", ".")  // "hello"

`squeeze(text: String, char: String) -> String`

Collapses consecutive occurrences of a character to a single instance.

Example:

squeeze("heeello", "e")              // "hello"
squeeze("   hello   world   ", " ")  // " hello world "

Partitioning

`partition(text: String, sep: String) -> #(String, String, String)`

Splits text into three parts: before, separator, and after.

Example:

partition("a-b-c", "-")  // #("a", "-", "b-c")
partition("hello", "-")  // #("hello", "", "")

`rpartition(text: String, sep: String) -> #(String, String, String)`

Splits text from the last occurrence of separator. Like Python's str.rpartition(). If separator not found, returns #("", "", text).

Example:

rpartition("a-b-c", "-")    // #("a-b", "-", "c")
rpartition("hello", "-")    // #("", "", "hello")
rpartition("a--b--c", "--") // #("a--b", "--", "c")
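
The two partition functions differ only in which occurrence of the separator they cut at and in where the whole text goes when nothing is found. A minimal sketch, assuming a non-empty separator and using hypothetical *_sketch names:

```gleam
import gleam/list
import gleam/string

pub fn partition_sketch(text: String, sep: String) -> #(String, String, String) {
  case string.split_once(text, sep) {
    Ok(#(before, after)) -> #(before, sep, after)
    // Not found: the whole text goes in the first slot.
    Error(Nil) -> #(text, "", "")
  }
}

pub fn rpartition_sketch(text: String, sep: String) -> #(String, String, String) {
  // Split on every occurrence, then peel off the final piece.
  case list.reverse(string.split(text, sep)) {
    [last, second, ..rest] -> #(
      string.join(list.reverse([second, ..rest]), sep),
      sep,
      last,
    )
    // Not found: the whole text goes in the last slot.
    _ -> #("", "", text)
  }
}
```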

`common_prefix(strings: List(String)) -> String`

Finds the longest common prefix among a list of strings.

Example:

common_prefix(["abc", "abd", "abe"])  // "ab"

`common_suffix(strings: List(String)) -> String`

Finds the longest common suffix among a list of strings.

Example:

common_suffix(["abc", "xbc", "zbc"])  // "bc"
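
Both functions can be understood as a fold over the list that keeps shrinking a candidate to the part shared with the next string. The suffix case is the same computation on reversed grapheme lists. The sketch below is an illustration with hypothetical names, and returning "" for an empty list is an assumption.

```gleam
import gleam/list
import gleam/string

pub fn common_prefix_sketch(strings: List(String)) -> String {
  case list.map(strings, string.to_graphemes) {
    // Assumption: no strings means no common prefix.
    [] -> ""
    [first, ..rest] ->
      rest |> list.fold(first, shared_prefix) |> string.concat
  }
}

pub fn common_suffix_sketch(strings: List(String)) -> String {
  let reversed =
    list.map(strings, fn(s) { list.reverse(string.to_graphemes(s)) })
  case reversed {
    [] -> ""
    [first, ..rest] ->
      rest
      |> list.fold(first, shared_prefix)
      |> list.reverse
      |> string.concat
  }
}

// Longest run of equal graphemes at the start of both lists.
fn shared_prefix(a: List(String), b: List(String)) -> List(String) {
  case a, b {
    [x, ..rest_a], [y, ..rest_b] if x == y -> [
      x,
      ..shared_prefix(rest_a, rest_b)
    ]
    _, _ -> []
  }
}
```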

Character Type Checks

`is_numeric(text: String) -> Bool`

Checks if text contains only ASCII digits (0-9).

Example:

is_numeric("12345")   // True
is_numeric("123.45")  // False

`is_alpha(text: String) -> Bool`

Checks if text contains only ASCII letters (a-z, A-Z).

Example:

is_alpha("hello")     // True
is_alpha("hello123")  // False

`is_alphanumeric(text: String) -> Bool`

Checks if text contains only ASCII letters and digits.

Example:

is_alphanumeric("hello123")    // True
is_alphanumeric("hello-world") // False

Prefix/Suffix Manipulation

`remove_prefix(text: String, prefix: String) -> String`

Removes prefix from text if present.

Example:

remove_prefix("hello world", "hello ")  // "world"
remove_prefix("hello", "bye")           // "hello"

`remove_suffix(text: String, suffix: String) -> String`

Removes suffix from text if present.

`ensure_prefix(text: String, prefix: String) -> String`

Adds prefix if not already present.

Example:

ensure_prefix("world", "hello ")        // "hello world"
ensure_prefix("hello world", "hello ")  // "hello world"

`ensure_suffix(text: String, suffix: String) -> String`

Adds suffix if not already present.

`starts_with_any(text: String, prefixes: List(String)) -> Bool`

Checks if text starts with any of the given prefixes. Similar to Lodash's startsWith, but accepts a list of candidate prefixes (like passing a tuple to Python's str.startswith()).

Example:

starts_with_any("hello", ["hi", "he", "ha"])  // True
starts_with_any("hello", ["x", "y", "z"])     // False
starts_with_any("", ["a"])                     // False
starts_with_any("hello", [])                   // False

`ends_with_any(text: String, suffixes: List(String)) -> Bool`

Checks if text ends with any of the given suffixes.

Example:

ends_with_any("file.txt", [".txt", ".md"])   // True
ends_with_any("file.rs", [".txt", ".md"])    // False
ends_with_any("hello", ["lo", "llo", "o"])   // True

Case Manipulation

`swapcase(text: String) -> String`

Swaps case of all ASCII letters.

Example:

swapcase("Hello World")  // "hELLO wORLD"

`capitalize(text: String) -> String`

Capitalizes first grapheme and lowercases the rest. Like Python's str.capitalize().

Example:

capitalize("hello world")  // "Hello world"
capitalize("hELLO wORLD")  // "Hello world"
capitalize("HELLO")        // "Hello"
capitalize("123abc")       // "123abc"
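
The behaviour shown above can be sketched by popping the first grapheme and recombining. capitalize_sketch is a hypothetical name and this is only an illustration of the documented rule, not the library's code.

```gleam
import gleam/string

pub fn capitalize_sketch(text: String) -> String {
  case string.pop_grapheme(text) {
    // Uppercase the first cluster, lowercase everything after it.
    Ok(#(first, rest)) -> string.uppercase(first) <> string.lowercase(rest)
    Error(Nil) -> ""
  }
}
```

When the first grapheme has no case, as in "123abc", uppercasing it is a no-op and only the lowercasing of the remainder applies.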

String Distance

`distance(a: String, b: String) -> Int`

Calculates Levenshtein distance between two strings.

Example:

distance("kitten", "sitting")  // 3
distance("hello", "hello")     // 0

Search and Index

`index_of(text: String, needle: String) -> Result(Int, Nil)`

Finds the index of the first occurrence of needle in text (grapheme-aware).

Example:

index_of("hello world", "world")  // Ok(6)
index_of("👨‍👩‍👧‍👦 family", "family")   // Ok(2)
index_of("hello", "x")            // Error(Nil)

`last_index_of(text: String, needle: String) -> Result(Int, Nil)`

Finds the index of the last occurrence of needle in text.

Example:

last_index_of("hello hello", "hello")  // Ok(6)
last_index_of("a-b-c", "-")            // Ok(3)

`contains(text: String, needle: String) -> Bool`

Returns True if needle is found in text. This is grapheme-aware and correctly handles complex Unicode sequences.

Example:

contains("hello world", "world")  // True
contains("hello", "x")            // False
contains("👨‍👩‍👧‍👦 family", "👨‍👩‍👧‍👦")    // True
contains("", "")                  // False

`starts_with(text: String, prefix: String) -> Bool`

Returns True if text starts with prefix on grapheme boundaries.

Example:

starts_with("hello", "he")         // True
starts_with("hello", "")           // True
starts_with("hi", "hello")         // False
starts_with("👨‍👩‍👧‍👦abc", "👨‍👩‍👧‍👦")      // True

`ends_with(text: String, suffix: String) -> Bool`

Returns True if text ends with suffix on grapheme boundaries.

Example:

ends_with("hello.txt", ".txt")     // True
ends_with("hello", "")             // True
ends_with("hi", "hello")           // False
ends_with("abc👨‍👩‍👧‍👦", "👨‍👩‍👧‍👦")        // True

`contains_any(text: String, needles: List(String)) -> Bool`

Checks if text contains any of the given needles.

Example:

contains_any("hello world", ["foo", "world"])  // True
contains_any("hello", ["x", "y", "z"])         // False

`contains_all(text: String, needles: List(String)) -> Bool`

Checks if text contains all of the given needles.

Example:

contains_all("hello world", ["hello", "world"])  // True
contains_all("hello", ["hello", "x"])            // False

Replacement Variants

`replace_first(text: String, old: String, new: String) -> String`

Replaces only the first occurrence of old with new.

Example:

replace_first("hello hello", "hello", "hi")  // "hi hello"
replace_first("aaa", "a", "b")               // "baa"

`replace_last(text: String, old: String, new: String) -> String`

Replaces only the last occurrence of old with new.

Example:

replace_last("hello hello", "hello", "hi")  // "hello hi"
replace_last("aaa", "a", "b")               // "aab"

Validation Functions

`is_uppercase(text: String) -> Bool`

Checks if all cased characters are uppercase. Non-cased characters are ignored, but text with no cased characters at all returns False.

Example:

is_uppercase("HELLO")     // True
is_uppercase("Hello")     // False
is_uppercase("HELLO123")  // True (numbers ignored)
is_uppercase("123")       // False (no cased chars)

`is_lowercase(text: String) -> Bool`

Checks if all cased characters are lowercase.

Example:

is_lowercase("hello")     // True
is_lowercase("Hello")     // False
is_lowercase("hello123")  // True

`is_title_case(text: String) -> Bool`

Checks if text is in Title Case format: each word starts with uppercase and continues with lowercase. Words that don't start with a letter (numbers, emoji, punctuation) are ignored.

Example:

is_title_case("Hello World")        // True
is_title_case("Hello world")        // False
is_title_case("Hello 123 World")    // True (numbers ignored)
is_title_case("Hello 🎉 World")     // True (emoji ignored)
is_title_case("")                   // False

`is_empty(text: String) -> Bool`

Returns True if text is an empty string.

Example:

is_empty("")   // True
is_empty(" ")  // False
is_empty("a")  // False

`is_ascii(text: String) -> Bool`

Checks if text contains only ASCII characters (0x00-0x7F).

Example:

is_ascii("hello!@#")  // True
is_ascii("café")      // False
is_ascii("👋")        // False

`is_printable(text: String) -> Bool`

Checks if text contains only printable ASCII characters (0x20-0x7E).

Example:

is_printable("hello")    // True
is_printable("hello\n")  // False
is_printable("hello\t")  // False

`is_hex(text: String) -> Bool`

Checks if text contains only hexadecimal characters (0-9, a-f, A-F).

Example:

is_hex("abc123")   // True
is_hex("DEADBEEF") // True
is_hex("xyz")      // False

HTML Escaping

`escape_html(text: String) -> String`

Escapes HTML special characters to their entity equivalents.

Example:

escape_html("<div>Hello</div>")  // "&lt;div&gt;Hello&lt;/div&gt;"
escape_html("Tom & Jerry")       // "Tom &amp; Jerry"
escape_html("Say \"hello\"")     // "Say &quot;hello&quot;"
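
When escaping is done by successive replacement, the order matters: the ampersand has to be handled first, otherwise the ampersands introduced by the other entities would be escaped a second time. The sketch below covers only the four characters shown in the examples; escape_html_sketch is a hypothetical name, and whether the library also escapes apostrophes is not stated here.

```gleam
import gleam/string

pub fn escape_html_sketch(text: String) -> String {
  text
  // Must run first, before any entity (which contains "&") is inserted.
  |> string.replace("&", "&amp;")
  |> string.replace("<", "&lt;")
  |> string.replace(">", "&gt;")
  |> string.replace("\"", "&quot;")
}
```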

`unescape_html(text: String) -> String`

Unescapes HTML entities to their character equivalents.

Example:

unescape_html("&lt;div&gt;")     // "<div>"
unescape_html("Tom &amp; Jerry") // "Tom & Jerry"

`escape_regex(text: String) -> String`

Escapes regex metacharacters for use as a literal pattern.

Example:

escape_regex("hello.world")  // "hello\\.world"
escape_regex("[test]")       // "\\[test\\]"
escape_regex("a+b*c?")       // "a\\+b\\*c\\?"

Similarity

`similarity(a: String, b: String) -> Float`

Calculates similarity as a score from 0.0 (nothing in common) to 1.0 (identical), based on Levenshtein distance.

Example:

similarity("hello", "hello")  // 1.0
similarity("hello", "hallo")  // 0.8
similarity("abc", "xyz")      // 0.0
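
The relationship between distance and similarity can be sketched as below. This is an illustration under assumptions rather than the library's implementation: the names are hypothetical, the distance uses the standard row-by-row dynamic-programming recurrence over graphemes, and the score is assumed to be 1 minus the distance divided by the longer length, which reproduces the three examples above.

```gleam
import gleam/int
import gleam/list
import gleam/string

pub fn distance_sketch(a: String, b: String) -> Int {
  let b_graphemes = string.to_graphemes(b)
  // Row 0: turning "" into each prefix of b costs 0, 1, 2, ... insertions.
  let first_row = initial_row(b_graphemes, 0)
  let final_row =
    list.fold(string.to_graphemes(a), first_row, fn(prev_row, a_char) {
      case prev_row {
        [first, ..] -> [
          first + 1,
          ..build_row(a_char, b_graphemes, prev_row, first + 1)
        ]
        [] -> []
      }
    })
  case list.last(final_row) {
    Ok(d) -> d
    Error(Nil) -> 0
  }
}

fn initial_row(b_graphemes: List(String), start: Int) -> List(Int) {
  case b_graphemes {
    [] -> [start]
    [_, ..rest] -> [start, ..initial_row(rest, start + 1)]
  }
}

// prev_row starts at the diagonal cell; left is the cell just computed.
fn build_row(
  a_char: String,
  b_graphemes: List(String),
  prev_row: List(Int),
  left: Int,
) -> List(Int) {
  case b_graphemes, prev_row {
    [b_char, ..b_rest], [diagonal, above, ..prev_rest] -> {
      let cost = case a_char == b_char {
        True -> 0
        False -> 1
      }
      let cell = int.min(int.min(above + 1, left + 1), diagonal + cost)
      [cell, ..build_row(a_char, b_rest, [above, ..prev_rest], cell)]
    }
    _, _ -> []
  }
}

pub fn similarity_sketch(a: String, b: String) -> Float {
  let longest = int.max(string.length(a), string.length(b))
  case longest {
    // Assumption: two empty strings count as identical.
    0 -> 1.0
    _ -> 1.0 -. int.to_float(distance_sketch(a, b)) /. int.to_float(longest)
  }
}
```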

`hamming_distance(a: String, b: String) -> Result(Int, Nil)`

Calculates Hamming distance between two strings of equal length.

Example:

hamming_distance("karolin", "kathrin")  // Ok(3)
hamming_distance("hello", "hallo")      // Ok(1)
hamming_distance("abc", "ab")           // Error(Nil)
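
Hamming distance is a position-by-position comparison, so it is only defined when both inputs have the same number of graphemes. A minimal sketch with a hypothetical name, using list.strict_zip to detect the length mismatch:

```gleam
import gleam/list
import gleam/string

pub fn hamming_sketch(a: String, b: String) -> Result(Int, Nil) {
  let a_graphemes = string.to_graphemes(a)
  let b_graphemes = string.to_graphemes(b)
  // strict_zip fails when the two lists differ in length.
  case list.strict_zip(a_graphemes, b_graphemes) {
    Ok(pairs) ->
      Ok(list.length(list.filter(pairs, fn(pair) { pair.0 != pair.1 })))
    Error(_) -> Error(Nil)
  }
}
```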

Additional Transformations

`take_right(text: String, n: Int) -> String`

Returns the last N grapheme clusters from text.

Example:

take_right("hello", 3)       // "llo"
take_right("👨‍👩‍👧‍👦abc", 2)  // "bc"

`drop_right(text: String, n: Int) -> String`

Drops the last N grapheme clusters from text.

Example:

drop_right("hello", 2)       // "hel"
drop_right("👨‍👩‍👧‍👦abc", 2)  // "👨‍👩‍👧‍👦a"

`reverse_words(text: String) -> String`

Reverses the order of words in text.

Example:

reverse_words("hello world")    // "world hello"
reverse_words("one two three")  // "three two one"

`initials(text: String) -> String`

Extracts initials from text (first letter of each word, uppercase).

Example:

initials("John Doe")            // "JD"
initials("visual studio code")  // "VSC"

`normalize_whitespace(text: String) -> String`

Collapses all consecutive whitespace (spaces, tabs, newlines) into single spaces and trims both ends. Equivalent to the common JavaScript idiom text.trim().replace(/\s+/g, " ").

Example:

normalize_whitespace("  hello   world  ")     // "hello world"
normalize_whitespace("hello\n\tworld")        // "hello world"
normalize_whitespace("  a  b  c  ")           // "a b c"
normalize_whitespace("")                      // ""
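
One simple way to get this effect is to map every whitespace character to a space, split on spaces, and drop the empty pieces. The sketch below is an illustration only: normalize_whitespace_sketch is a hypothetical name, and it assumes that only space, tab, CR, and LF count as whitespace, which may be narrower than what the library recognises.

```gleam
import gleam/list
import gleam/string

pub fn normalize_whitespace_sketch(text: String) -> String {
  text
  // Handle CRLF as a unit first, then the individual characters.
  |> string.replace("\r\n", " ")
  |> string.replace("\r", " ")
  |> string.replace("\n", " ")
  |> string.replace("\t", " ")
  |> string.split(" ")
  // Runs of spaces and leading/trailing spaces produce empty pieces.
  |> list.filter(fn(word) { word != "" })
  |> string.join(" ")
}
```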

Implementation Notes

Grapheme Cluster Detection

The module uses string.to_graphemes/1 from the Gleam standard library for grapheme segmentation, which provides Unicode-compliant grapheme cluster boundaries (UAX #29).

Key behaviors:

\r\n is treated as a single grapheme (CRLF cluster)
Emoji ZWJ sequences are single graphemes
Combining marks stay attached to their base character

Performance Considerations

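Most functions run in time linear in the number of grapheme clusters. The Levenshtein-based distance and similarity are exceptions: the standard dynamic-programming algorithm takes time proportional to the product of the two input lengths. For very large strings (>100KB), consider pre-processing or chunking.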
