Basic Filtering Explained
Basic Filtering is the matching algorithm defined in RFC 4647 Section 3.3.1. It determines whether a user's language preference (called a language range) matches one of your available languages (called a language tag). Understanding this algorithm is key to predicting how AcceptLanguage will behave.
A language range matches a language tag if:
- It exactly equals the tag, OR
- It exactly equals a prefix of the tag, where the next character is a hyphen (-)
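The two rules above can be sketched in a few lines of plain Ruby. This is only an illustration under stated assumptions: `basic_filter_match?` is a hypothetical helper, not part of the AcceptLanguage API, and it leaves out the `*` wildcard range.

```ruby
# Hypothetical helper for illustration; not the AcceptLanguage implementation.
# Returns true when `range` matches `tag` under the two rules listed above.
def basic_filter_match?(range, tag)
  range = range.to_s.downcase
  tag   = tag.to_s.downcase

  # Rule 1: the range equals the tag.
  return true if range == tag

  # Rule 2: the range is a prefix of the tag and the next character is a hyphen.
  tag.start_with?("#{range}-")
end

basic_filter_match?("en", :"en-US") # => true  (prefix + hyphen)
basic_filter_match?("en", :eng)     # => false (no hyphen after "en")
basic_filter_match?("en-US", :en)   # => false (range is longer than tag)
```

Appending the hyphen to the range before calling `start_with?` is what enforces the boundary; a bare `start_with?("en")` would wrongly accept `eng`.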
The simplest case: the range and tag are identical (case-insensitive).
AcceptLanguage.parse("en").match(:en)
# => :en ✓ (exact match)
AcceptLanguage.parse("en-US").match(:"en-US")
# => :"en-US" ✓ (exact match)
AcceptLanguage.parse("zh-Hant-TW").match(:"zh-Hant-TW")
# => :"zh-Hant-TW" ✓ (exact match)

A shorter range can match a longer tag if the range is a valid prefix:
# "en" is a prefix of "en-US" (followed by hyphen)
AcceptLanguage.parse("en").match(:"en-US")
# => :"en-US" ✓
# "en" is a prefix of "en-GB" (followed by hyphen)
AcceptLanguage.parse("en").match(:"en-GB")
# => :"en-GB" ✓
# "zh" matches "zh-Hant-TW"
AcceptLanguage.parse("zh").match(:"zh-Hant-TW")
# => :"zh-Hant-TW" ✓

Prefix matching only works at hyphen boundaries. This prevents false matches:
# "en" does NOT match "eng" (no hyphen after "en")
AcceptLanguage.parse("en").match(:eng)
# => nil ✗
# "zh" does NOT match "zhx" (different language code)
AcceptLanguage.parse("zh").match(:zhx)
# => nil ✗
# "de" does NOT match "deu"
AcceptLanguage.parse("de").match(:deu)
# => nil ✗

This is crucial: en and eng are separate codes (the two-letter and three-letter ISO 639 codes, respectively), and eng is not a regional or script variant of en.
Prefix matching only works in one direction: a shorter range can match a longer tag, but not vice versa.
# ✓ Short range → Long tag
AcceptLanguage.parse("en").match(:"en-US")
# => :"en-US"
# ✗ Long range → Short tag (does NOT match)
AcceptLanguage.parse("en-US").match(:en)
# => nil

Why? Because a user asking for en-US specifically wants American English. Generic en might be British, Australian, or any other variant, which is not what they requested.
Language Range: "en"

Does it match?
  en              → ✓ YES (exact match)
  en-US           → ✓ YES (prefix + hyphen)
  en-GB           → ✓ YES (prefix + hyphen)
  en-Latn-US      → ✓ YES (prefix + hyphen)
  eng             → ✗ NO  (no hyphen after "en")
  en-US-x-custom  → ✓ YES (prefix + hyphen)

Language Range: "en-US"

Does it match?
  en-US           → ✓ YES (exact match)
  en-US-x-foo     → ✓ YES (prefix + hyphen)
  en              → ✗ NO  (range is longer than tag)
  en-GB           → ✗ NO  (not a prefix match)
Matching is case-insensitive, but the original case of your available tags is preserved:
# Header uses uppercase, your tag uses lowercase
AcceptLanguage.parse("EN-US").match(:"en-us")
# => :"en-us" (your case preserved)
# Header uses lowercase, your tag uses uppercase
AcceptLanguage.parse("en-us").match(:"EN-US")
# => :"EN-US" (your case preserved)
# Mixed case works fine
AcceptLanguage.parse("eN-uS").match(:"En-Us")
# => :"En-Us"

When multiple tags could match, the first one in your list wins (for the same quality level):
parser = AcceptLanguage.parse("en")
# en-US comes first in the list
parser.match(:"en-US", :"en-GB")
# => :"en-US"
# en-GB comes first in the list
parser.match(:"en-GB", :"en-US")
# => :"en-GB"

This means the order of your available languages matters when there are ties.
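The case-preserving, first-listed-wins behaviour described above can be sketched as follows. `first_matching_tag` is a hypothetical helper written for this page, not the library's own code: it compares lowercased copies but returns the tag exactly as the caller supplied it.

```ruby
# Hypothetical helper for illustration; not the AcceptLanguage implementation.
# Returns the first available tag matched by `range`, in its original case,
# or nil when nothing matches.
def first_matching_tag(range, *available)
  wanted = range.to_s.downcase

  # `find` walks the list in order, so earlier tags win ties.
  available.find do |tag|
    candidate = tag.to_s.downcase
    candidate == wanted || candidate.start_with?("#{wanted}-")
  end
end

first_matching_tag("en", :"en-US", :"en-GB") # => :"en-US"
first_matching_tag("en", :"en-GB", :"en-US") # => :"en-GB"
first_matching_tag("eN-uS", :"En-Us")        # => :"En-Us"
```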
User wants Swiss German, your app has German variants:
header = "de-CH"
# You have de-DE and de-AT, but not de-CH
AcceptLanguage.parse(header).match(:"de-DE", :"de-AT", :de)
# => nil (de-CH doesn't match any of these)

Wait, why nil? Because de-CH is more specific than de, and Basic Filtering doesn't match specific→generic.
Solution: The user's browser typically sends multiple preferences:
header = "de-CH, de;q=0.9"
AcceptLanguage.parse(header).match(:"de-DE", :"de-AT", :de)
# => :"de-DE" (the fallback "de" range matches, and de-DE is the first listed tag it covers)

User wants Traditional Chinese:
header = "zh-Hant"
AcceptLanguage.parse(header).match(:"zh-Hant-TW", :"zh-Hans-CN")
# => :"zh-Hant-TW" (zh-Hant is a prefix of zh-Hant-TW)

The range zh-Hant matches zh-Hant-TW but not zh-Hans-CN (different script).
User accepts any Portuguese:
header = "pt"
AcceptLanguage.parse(header).match(:"pt-BR", :"pt-PT")
# => :"pt-BR" (first match wins)

# User wants "en-US", you only have "en"
AcceptLanguage.parse("en-US").match(:en)
# => nil (NOT :en!)

If your app only supports generic :en, make sure to also accept generic language ranges, or encourage users to configure broader preferences.
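To see why a broader fallback range in the header rescues this case, here is a minimal sketch of quality-ordered matching. Both `ranges_by_quality` and `match_header` are hypothetical helpers, not the AcceptLanguage API, and the sketch assumes a well-formed header. It skips `q=0` ranges rather than modelling exclusion, and it ignores wildcards.

```ruby
# Hypothetical helpers for illustration; not the AcceptLanguage implementation.

# Splits a header into [range, quality] pairs, most preferred first.
def ranges_by_quality(header)
  entries = header.split(",").map(&:strip).reject(&:empty?)

  pairs = entries.map do |entry|
    range, *params = entry.split(";").map(&:strip)
    q = params.find { |param| param.start_with?("q=") }
    [range.downcase, q ? q.delete_prefix("q=").to_f : 1.0]
  end

  # sort_by is not stable, so the index keeps header order for equal qualities.
  pairs.each_with_index.sort_by { |(_, quality), index| [-quality, index] }.map(&:first)
end

# Tries each range in preference order and returns the first available tag it matches.
def match_header(header, *available)
  ranges_by_quality(header).each do |range, quality|
    next if quality <= 0 # q=0 ranges are skipped here, not treated as exclusions

    found = available.find do |tag|
      candidate = tag.to_s.downcase
      candidate == range || candidate.start_with?("#{range}-")
    end
    return found if found
  end

  nil
end

match_header("en-US", :en)           # => nil
match_header("en-US, en;q=0.9", :en) # => :en
```

The specific range is tried first and fails; only the generic `en` range, present in the second header, can match the generic `:en` tag.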
# "no" (Norwegian) vs "nb" (Norwegian Bokmål)
AcceptLanguage.parse("no").match(:nb)
# => nil (different language codes)
# "zh" (Chinese) vs "zhx" (used here only to illustrate the boundary rule)
AcceptLanguage.parse("zh").match(:zhx)
# => nil (no hyphen boundary)

parser = AcceptLanguage.parse("en")
# Different order = different result
parser.match(:"en-US", :"en-GB") # => :"en-US"
parser.match(:"en-GB", :"en-US") # => :"en-GB"

Be intentional about the order of your available locales.
- Working with Wildcards: Match any language with *
- Excluding Languages: Reject languages with q=0
- BCP 47 Language Tags: Understand scripts, regions, and variants