
Basic Filtering Explained

Cyril Kato edited this page Jan 20, 2026 · 1 revision

Basic Filtering is the matching algorithm defined in RFC 4647 Section 3.3.1. It determines whether a user's language preference (called a language range) matches one of your available languages (called a language tag). Understanding this algorithm is key to predicting how AcceptLanguage will behave.

The Core Rule

A language range matches a language tag if:

  1. It exactly equals the tag, OR
  2. It exactly equals a prefix of the tag, where the next character is a hyphen (-)

That's it. Every range-to-tag comparison on this page follows from this rule, applied case-insensitively.
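
The two-part rule can be sketched in a few lines of plain Ruby. This is a minimal illustration, assuming both the range and the tag are simple strings or symbols; `basic_filter_match?` is a hypothetical helper name, not part of the AcceptLanguage API, and the gem's internals may differ.

```ruby
# Hypothetical helper for illustration only (not the gem's implementation).
# Returns true when `range` matches `tag` under the two-part rule.
def basic_filter_match?(range, tag)
  range = range.to_s.downcase
  tag   = tag.to_s.downcase

  # Rule 1: exact equality.
  # Rule 2: the range is a prefix of the tag and the next character is "-".
  tag == range || tag.start_with?("#{range}-")
end

basic_filter_match?("en", "en")     # => true  (exact match)
basic_filter_match?("en", "en-US")  # => true  (prefix followed by "-")
basic_filter_match?("en", "eng")    # => false (next character is "g")
```

Appending the hyphen to the range before calling `start_with?` checks the prefix and the boundary in a single step.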

Exact Matching

The simplest case: the range and tag are identical (case-insensitive).

AcceptLanguage.parse("en").match(:en)
# => :en ✓ (exact match)

AcceptLanguage.parse("en-US").match(:"en-US")
# => :"en-US" ✓ (exact match)

AcceptLanguage.parse("zh-Hant-TW").match(:"zh-Hant-TW")
# => :"zh-Hant-TW" ✓ (exact match)

Prefix Matching

A shorter range can match a longer tag if the range is a valid prefix:

# "en" is a prefix of "en-US" (followed by hyphen)
AcceptLanguage.parse("en").match(:"en-US")
# => :"en-US" ✓

# "en" is a prefix of "en-GB" (followed by hyphen)
AcceptLanguage.parse("en").match(:"en-GB")
# => :"en-GB" ✓

# "zh" matches "zh-Hant-TW"
AcceptLanguage.parse("zh").match(:"zh-Hant-TW")
# => :"zh-Hant-TW" ✓

The Hyphen Boundary Rule

Prefix matching only works at hyphen boundaries. This prevents false matches:

# "en" does NOT match "eng" (no hyphen after "en")
AcceptLanguage.parse("en").match(:eng)
# => nil ✗

# "zh" does NOT match "zhx" (different language code)
AcceptLanguage.parse("zh").match(:zhx)
# => nil ✗

# "de" does NOT match "deu"
AcceptLanguage.parse("de").match(:deu)
# => nil ✗

This is crucial: en and eng are distinct ISO 639 codes (one two-letter, one three-letter), and Basic Filtering treats them as unrelated tags, not as variants of each other.
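
To see why the hyphen check matters, the sketch below contrasts a naive prefix test with a boundary-aware one. Both method names are hypothetical and exist only for this comparison; the boundary-aware version covers the prefix case only, not exact equality.

```ruby
# Hypothetical helpers for illustration only.

# Naive check: any leading substring counts as a match.
def naive_prefix?(range, tag)
  tag.downcase.start_with?(range.downcase)
end

# Boundary-aware check: the character right after the prefix must be "-".
def boundary_prefix?(range, tag)
  range = range.downcase
  tag   = tag.downcase
  tag.start_with?(range) && tag[range.length] == "-"
end

naive_prefix?("en", "en-US")     # => true
naive_prefix?("en", "eng")       # => true  (false positive)
boundary_prefix?("en", "en-US")  # => true
boundary_prefix?("en", "eng")    # => false
```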

One-Way Matching

Prefix matching only works in one direction: a shorter range can match a longer tag, but not vice versa.

# ✓ Short range → Long tag
AcceptLanguage.parse("en").match(:"en-US")
# => :"en-US"

# ✗ Long range → Short tag (does NOT match)
AcceptLanguage.parse("en-US").match(:en)
# => nil

Why? Because a user asking for en-US specifically wants American English. Generic en might be British, Australian, or any other variant, which is not what they requested.

Visual Diagram

Language Range: "en"

Does it match?

  en             → ✓ YES (exact match)
  en-US          → ✓ YES (prefix + hyphen)
  en-GB          → ✓ YES (prefix + hyphen)
  en-Latn-US     → ✓ YES (prefix + hyphen)
  en-US-x-custom → ✓ YES (prefix + hyphen)
  eng            → ✗ NO  (no hyphen after "en")


Language Range: "en-US"

Does it match?

  en-US        → ✓ YES (exact match)
  en-US-x-foo  → ✓ YES (prefix + hyphen)
  en           → ✗ NO  (range is longer than tag)
  en-GB        → ✗ NO  (not a prefix match)
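
The diagrams above can be reproduced by filtering a list of tags against a range. The following is a rough sketch in plain Ruby; `matching_tags` is a hypothetical name and is not provided by AcceptLanguage, whose `match` method returns a single tag rather than a list.

```ruby
# Hypothetical helper for illustration only.
# Returns every tag that the given range matches, in the original order.
def matching_tags(range, tags)
  wanted = range.downcase
  tags.select do |tag|
    candidate = tag.downcase
    candidate == wanted || candidate.start_with?("#{wanted}-")
  end
end

tags = %w[en en-US en-GB en-Latn-US eng en-US-x-custom]

matching_tags("en", tags)
# => ["en", "en-US", "en-GB", "en-Latn-US", "en-US-x-custom"]

matching_tags("en-US", tags)
# => ["en-US", "en-US-x-custom"]
```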

Case Insensitivity

Matching is case-insensitive, but the original case of your available tags is preserved:

# Header uses uppercase, your tag uses lowercase
AcceptLanguage.parse("EN-US").match(:"en-us")
# => :"en-us" (your case preserved)

# Header uses lowercase, your tag uses uppercase
AcceptLanguage.parse("en-us").match(:"EN-US")
# => :"EN-US" (your case preserved)

# Mixed case works fine
AcceptLanguage.parse("eN-uS").match(:"En-Us")
# => :"En-Us"
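
One way to get this behavior is to compare lowercased copies while returning the original object. The sketch below assumes that approach; `find_preserving_case` is a hypothetical name, and the gem may implement this differently.

```ruby
# Hypothetical helper for illustration only.
# Compares lowercased copies, but returns the available tag untouched.
def find_preserving_case(range, available)
  wanted = range.to_s.downcase
  available.find do |tag|
    candidate = tag.to_s.downcase
    candidate == wanted || candidate.start_with?("#{wanted}-")
  end
end

find_preserving_case("EN-US", [:"en-us"])  # => :"en-us"
find_preserving_case("en-us", [:"EN-US"])  # => :"EN-US"
find_preserving_case("eN-uS", [:"En-Us"])  # => :"En-Us"
```

Because `find` yields the original element, the caller gets back exactly the symbol it passed in, which can be used directly as a locale key.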

Matching Multiple Available Languages

When multiple tags could match, the first one in your list wins (for the same quality level):

parser = AcceptLanguage.parse("en")

# en-US comes first in the list
parser.match(:"en-US", :"en-GB")
# => :"en-US"

# en-GB comes first in the list
parser.match(:"en-GB", :"en-US")
# => :"en-GB"

This means the order of your available languages matters when there are ties.

Real-World Scenarios

Scenario 1: Regional Fallback

User wants Swiss German, your app has German variants:

header = "de-CH"

# You have de-DE and de-AT, but not de-CH
AcceptLanguage.parse(header).match(:"de-DE", :"de-AT", :de)
# => nil (de-CH doesn't match any of these)

Wait, why nil? Because de-CH is neither equal to nor a prefix of de-DE or de-AT, and it is more specific than de. Basic Filtering doesn't match specific→generic.

Solution: The user's browser typically sends multiple preferences:

header = "de-CH, de;q=0.9"

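# :de is listed first here: per the first-match rule, the "de" range
# would otherwise select :"de-DE", the first listed tag it prefix-matches
AcceptLanguage.parse(header).match(:de, :"de-DE", :"de-AT")
# => :de (matched via the fallback "de" preference)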
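
The fallback can be pictured as trying each range in descending quality order until one matches. The sketch below is a simplified stand-in written in plain Ruby, not the gem's parser: `parse_ranges` and `pick` are hypothetical names, wildcards and q=0 exclusions are not handled, and the header is assumed to be well-formed.

```ruby
# Hypothetical, simplified header handling for illustration only.

# Returns the ranges from the header, highest quality first.
# Ranges without a q parameter default to 1.0; header order breaks ties.
def parse_ranges(header)
  entries = header.split(",").each_with_index.map do |part, index|
    range, *params = part.strip.split(";").map(&:strip)
    q_param = params.find { |param| param.start_with?("q=") }
    quality = q_param ? q_param.delete_prefix("q=").to_f : 1.0
    [range.downcase, quality, index]
  end
  entries.sort_by { |_range, quality, index| [-quality, index] }.map(&:first)
end

# Tries each range in order and returns the first available tag it matches.
def pick(header, available)
  parse_ranges(header).each do |range|
    hit = available.find do |tag|
      candidate = tag.to_s.downcase
      candidate == range || candidate.start_with?("#{range}-")
    end
    return hit if hit
  end
  nil
end

available = [:de, :"de-DE", :"de-AT"]

pick("de-CH", available)               # => nil
pick("de-CH, de;q=0.9", available)     # => :de
pick("fr;q=0.5, de;q=0.9", %i[fr de])  # => :de (higher quality wins)
```

In this sketch the generic :de is listed first among the available tags; with :"de-DE" first, the "de" range would select :"de-DE" instead, because the first matching tag in the list is returned.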

Scenario 2: Script Variants

User wants Traditional Chinese:

header = "zh-Hant"

AcceptLanguage.parse(header).match(:"zh-Hant-TW", :"zh-Hans-CN")
# => :"zh-Hant-TW" (zh-Hant is a prefix of zh-Hant-TW)

The range zh-Hant matches zh-Hant-TW but not zh-Hans-CN (different script).

Scenario 3: Language Without Region

User accepts any Portuguese:

header = "pt"

AcceptLanguage.parse(header).match(:"pt-BR", :"pt-PT")
# => :"pt-BR" (first match wins)

Common Pitfalls

Pitfall 1: Expecting Reverse Matching

# User wants "en-US", you only have "en"
AcceptLanguage.parse("en-US").match(:en)
# => nil (NOT :en!)

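If your app only supports generic :en, it matches only when the header itself contains a generic range such as en (for example, en-US, en;q=0.9). Either offer region-specific tags as well, or encourage users to configure broader preferences.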

Pitfall 2: Confusing Similar Codes

# "no" (Norwegian) vs "nb" (Norwegian Bokmål)
AcceptLanguage.parse("no").match(:nb)
# => nil (different language codes)

# "zh" (Chinese) vs "zhx" (a different three-letter code)
AcceptLanguage.parse("zh").match(:zhx)
# => nil (no hyphen boundary)

Pitfall 3: Order of Available Languages

parser = AcceptLanguage.parse("en")

# Different order = different result
parser.match(:"en-US", :"en-GB")  # => :"en-US"
parser.match(:"en-GB", :"en-US")  # => :"en-GB"

Be intentional about the order of your available locales.

What's Next?