This guide explains how to integrate OTP-based Unicode normalization with the str library while maintaining the library's zero-dependency design.
Keep OTP integration in your application code, not in the library.
This approach ensures:
- The str library remains dependency-free
- Applications can opt into OTP features as needed
- The library works in any Gleam runtime environment
Unicode normalization (NFC, NFD, NFKC, NFKD) requires substantial Unicode data and algorithms. Erlang/OTP provides robust built-in support via the unicode module, but making it a hard dependency of the library would:
- Force all users to accept the OTP dependency
- Prevent the library from running on JavaScript or other non-BEAM targets
- Increase compilation and deployment overhead for simple use cases
By accepting normalizer functions as parameters, str allows applications to choose their normalization strategy.
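The parameter-passing design can be sketched in a few lines. The code below is a hypothetical stand-in, not str's actual implementation: strip_marks_with_normalizer, is_combining_mark, and identity are names invented for illustration. It shows how a library function can stay free of OTP calls while the caller decides which normalizer runs first.

```gleam
import gleam/list
import gleam/string

/// Hypothetical library-side function: it never calls OTP itself,
/// it only applies whatever normalizer the caller supplies.
pub fn strip_marks_with_normalizer(
  text: String,
  normalizer: fn(String) -> String,
) -> String {
  text
  |> normalizer
  |> string.to_utf_codepoints
  |> list.filter(fn(cp) { !is_combining_mark(string.utf_codepoint_to_int(cp)) })
  |> string.from_utf_codepoints
}

/// True for code points in the Combining Diacritical Marks block
/// (U+0300 to U+036F). Other combining blocks are ignored in this sketch.
fn is_combining_mark(code: Int) -> Bool {
  code >= 0x0300 && code <= 0x036F
}

/// A normalizer that does nothing; it works on any Gleam target.
pub fn identity(text: String) -> String {
  text
}
```

With identity, only input that is already decomposed loses its marks; passing an NFD normalizer from application code would let precomposed characters such as "é" be handled as well.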
Create a module in your application (e.g., src/unicode_helpers.gleam):
// src/unicode_helpers.gleam

/// Normalize to NFD (canonical decomposition).
/// Binds to Erlang's unicode:characters_to_nfd_binary/1, which returns a
/// binary for valid UTF-8 input such as a Gleam String.
@external(erlang, "unicode", "characters_to_nfd_binary")
pub fn nfd(text: String) -> String

/// Normalize to NFC (canonical composition).
/// Binds to Erlang's unicode:characters_to_nfc_binary/1.
@external(erlang, "unicode", "characters_to_nfc_binary")
pub fn nfc(text: String) -> String

Pass your normalizer to the *_with_normalizer variants:
import str/extra
import unicode_helpers

pub fn process_title(title: String) -> String {
  // Use NFC normalization before ASCII folding
  extra.ascii_fold_with_normalizer(title, unicode_helpers.nfc)
}

pub fn create_slug(text: String) -> String {
  // Use NFD for decomposition-based transliteration
  extra.slugify_with_normalizer(text, unicode_helpers.nfd)
}

For more control, use slugify_opts_with_normalizer:
pub fn create_url_slug(text: String, max_words: Int) -> String {
  extra.slugify_opts_with_normalizer(
    text,
    max_words, // Token limit
    "-", // Separator
    False, // Convert to ASCII
    unicode_helpers.nfd,
  )
}

For testing or development without OTP, create mock normalizers:
// test/helpers.gleam
import gleam/string

pub fn mock_nfd(text: String) -> String {
  text
  |> string.replace("é", "e\u{0301}")
  |> string.replace("ñ", "n\u{0303}")
  // Add more decompositions as needed
}

The following module combines these pieces into a complete example:

// src/slug.gleam
import str/extra
import unicode_helpers

/// Create a URL-friendly slug from arbitrary text
pub fn from_text(text: String) -> String {
  extra.slugify_opts_with_normalizer(
    text,
    0, // No word limit
    "-", // Hyphen separator
    False, // ASCII output only
    unicode_helpers.nfd,
  )
}

/// Create a slug preserving Unicode characters
pub fn from_text_unicode(text: String) -> String {
  extra.slugify_opts_with_normalizer(
    text,
    0,
    "-",
    True, // Preserve Unicode
    unicode_helpers.nfc,
  )
}

// Usage:
// slug.from_text("Café Münchner") -> "cafe-munchner"
// slug.from_text_unicode("Café Münchner") -> "café-münchner"

The *_with_normalizer variants are:
- ascii_fold_with_normalizer(text, normalizer)
- ascii_fold_no_decompose_with_normalizer(text, normalizer)
- slugify_with_normalizer(text, normalizer): convenience alias
- slugify_opts_with_normalizer(text, max_len, sep, preserve_unicode, normalizer)

Each takes a normalizer of this type:
pub type Normalizer = fn(String) -> String

All normalizers must accept a string and return a normalized string.
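Because a normalizer is a plain function, any fn(String) -> String can be passed, including one built from smaller steps. A minimal sketch, where compose and trim_then_lowercase are hypothetical helper names and not part of str:

```gleam
import gleam/string

pub type Normalizer =
  fn(String) -> String

/// Build a normalizer that runs `first`, then `second`.
pub fn compose(first: Normalizer, second: Normalizer) -> Normalizer {
  fn(text) { second(first(text)) }
}

/// Example pipeline: trim whitespace, then lowercase.
/// In an application, either step could be replaced by an
/// OTP-backed normalizer such as unicode_helpers.nfd.
pub fn trim_then_lowercase() -> Normalizer {
  compose(string.trim, string.lowercase)
}
```

The composed value has the same type as a single normalizer, so it can be handed to any of the *_with_normalizer functions unchanged.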
- Centralize Normalizers: Keep all OTP normalization logic in one module for easy testing and maintenance
- Choose Normalization Form:
  - Use NFD (decomposition) for transliteration and ASCII folding
  - Use NFC (composition) for display and storage
  - Consider NFKD/NFKC for compatibility normalization
- Test with Real Data: Include test cases with actual Unicode edge cases (emoji, combining marks, ligatures)
- Document Choices: Explain which normalization form you're using and why
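The compatibility forms can be bound in the same way as NFC and NFD, and a ligature makes a convenient real-data check. This sketch assumes the Erlang target and OTP 20 or later, where unicode:characters_to_nfkd_binary/1 and unicode:characters_to_nfkc_binary/1 are available; the function names nfkd and nfkc are illustrative and not part of str.

```gleam
// Erlang target only: these bindings have no JavaScript implementation.

/// Normalize to NFKD (compatibility decomposition).
@external(erlang, "unicode", "characters_to_nfkd_binary")
pub fn nfkd(text: String) -> String

/// Normalize to NFKC (compatibility composition).
@external(erlang, "unicode", "characters_to_nfkc_binary")
pub fn nfkc(text: String) -> String

pub fn main() {
  // The "fi" ligature (U+FB01) has a compatibility mapping to "f" + "i".
  let assert "fi" = nfkc("\u{FB01}")
  // "é" (U+00E9) decomposes to "e" followed by U+0301.
  let assert "e\u{0301}" = nfkd("\u{00E9}")
}
```

Checks like these belong in the application's test suite next to the normalizer module, so that the chosen form is both documented and verified.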