Basic Filtering Explained

Basic Filtering is the matching algorithm defined in RFC 4647 Section 3.3.1. It determines whether a user's language preference (called a language range) matches one of your available languages (called a language tag). Understanding this algorithm is key to predicting how AcceptLanguage will behave.

The Core Rule

A language range matches a language tag if:

It exactly equals the tag, OR
It exactly equals a prefix of the tag, where the next character is a hyphen (-)

That's it. Every example on this page follows from this one rule; wildcards and q=0 exclusions are covered on their own pages.
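The rule can be sketched in a few lines of plain Ruby. The helper name basic_filter_match? is hypothetical and is not part of AcceptLanguage's API; it only illustrates the comparison described above, assuming both inputs are well-formed tags.

```ruby
# Hypothetical helper illustrating the core rule; not AcceptLanguage's API.
def basic_filter_match?(range, tag)
  range = range.to_s.downcase
  tag   = tag.to_s.downcase

  # Case 1: the range equals the tag.
  return true if range == tag

  # Case 2: the range is a prefix of the tag and the next character is "-".
  tag.start_with?("#{range}-")
end

basic_filter_match?("en", :en)       # => true
basic_filter_match?("en", :"en-US")  # => true
basic_filter_match?("en", :eng)      # => false
basic_filter_match?("en-US", :en)    # => false
```

Appending the hyphen to the range before calling start_with? is what enforces the boundary: "eng" starts with "en" but not with "en-".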

Exact Matching

The simplest case: the range and tag are identical (case-insensitive).

AcceptLanguage.parse("en").match(:en)
# => :en ✓ (exact match)

AcceptLanguage.parse("en-US").match(:"en-US")
# => :"en-US" ✓ (exact match)

AcceptLanguage.parse("zh-Hant-TW").match(:"zh-Hant-TW")
# => :"zh-Hant-TW" ✓ (exact match)

Prefix Matching

A shorter range can match a longer tag if the range is a valid prefix:

# "en" is a prefix of "en-US" (followed by hyphen)
AcceptLanguage.parse("en").match(:"en-US")
# => :"en-US" ✓

# "en" is a prefix of "en-GB" (followed by hyphen)
AcceptLanguage.parse("en").match(:"en-GB")
# => :"en-GB" ✓

# "zh" matches "zh-Hant-TW"
AcceptLanguage.parse("zh").match(:"zh-Hant-TW")
# => :"zh-Hant-TW" ✓

The Hyphen Boundary Rule

Prefix matching only works at hyphen boundaries. This prevents false matches:

# "en" does NOT match "eng" (no hyphen after "en")
AcceptLanguage.parse("en").match(:eng)
# => nil ✗

# "zh" does NOT match "zhx" (no hyphen after "zh")
AcceptLanguage.parse("zh").match(:zhx)
# => nil ✗

# "de" does NOT match "deu"
AcceptLanguage.parse("de").match(:deu)
# => nil ✗

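This is crucial: en (ISO 639-1) and eng (ISO 639-2/3) are distinct codes, and Basic Filtering compares whole subtags rather than individual characters. A range never matches a tag merely because the tag starts with the same letters.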
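One way to picture the boundary rule is to compare subtags instead of characters. The sketch below uses a hypothetical helper, subtag_prefix?, which is not taken from the library; it splits both values on hyphens and compares the leading subtags.

```ruby
# Hypothetical helper: compare whole subtags instead of raw characters.
def subtag_prefix?(range, tag)
  range_parts = range.downcase.split("-")
  tag_parts   = tag.downcase.split("-")

  # The tag's leading subtags must equal the range's subtags exactly.
  tag_parts.first(range_parts.size) == range_parts
end

"eng".start_with?("en")        # => true  (naive character prefix: a false match)
subtag_prefix?("en", "eng")    # => false (["eng"] is not ["en"])
subtag_prefix?("en", "en-US")  # => true
subtag_prefix?("de", "deu")    # => false
```

The naive start_with? check accepts "eng" for the range "en", which is exactly the false match the hyphen boundary is meant to prevent.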

One-Way Matching

Prefix matching only works in one direction: a shorter range can match a longer tag, but not vice versa.

# ✓ Short range → Long tag
AcceptLanguage.parse("en").match(:"en-US")
# => :"en-US"

# ✗ Long range → Short tag (does NOT match)
AcceptLanguage.parse("en-US").match(:en)
# => nil

Why? Because a user asking for en-US specifically wants American English. Generic en might be British, Australian, or any other variant—not what they requested.

Visual Diagram

Language Range: "en"

Does it match?

  en             → ✓ YES (exact match)
  en-US          → ✓ YES (prefix + hyphen)
  en-GB          → ✓ YES (prefix + hyphen)
  en-Latn-US     → ✓ YES (prefix + hyphen)
  en-US-x-custom → ✓ YES (prefix + hyphen)
  eng            → ✗ NO  (no hyphen after "en")


Language Range: "en-US"

Does it match?

  en-US        → ✓ YES (exact match)
  en-US-x-foo  → ✓ YES (prefix + hyphen)
  en           → ✗ NO  (range is longer than tag)
  en-GB        → ✗ NO  (not a prefix match)

Case Insensitivity

Matching is case-insensitive, but the original case of your available tags is preserved:

# Header uses uppercase, your tag uses lowercase
AcceptLanguage.parse("EN-US").match(:"en-us")
# => :"en-us" (your case preserved)

# Header uses lowercase, your tag uses uppercase
AcceptLanguage.parse("en-us").match(:"EN-US")
# => :"EN-US" (your case preserved)

# Mixed case works fine
AcceptLanguage.parse("eN-uS").match(:"En-Us")
# => :"En-Us"
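A plausible way to get this behavior is to compare lowercased copies while returning the caller's original object. The sketch below assumes that approach; find_tag is a hypothetical name, and the library's internals may differ.

```ruby
# Hypothetical helper: compare lowercased copies, return the caller's original object.
def find_tag(range, available)
  wanted = range.downcase

  available.find do |tag|
    candidate = tag.to_s.downcase
    candidate == wanted || candidate.start_with?("#{wanted}-")
  end
end

find_tag("EN-US", [:"en-us"])  # => :"en-us"
find_tag("en-us", [:"EN-US"])  # => :"EN-US"
find_tag("eN", [:"En-Us"])     # => :"En-Us"
```

Because find returns the element itself rather than the lowercased copy, the casing you declared is what you get back.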

Matching Multiple Available Languages

When multiple tags could match, the first one in your list wins (for the same quality level):

parser = AcceptLanguage.parse("en")

# en-US comes first in the list
parser.match(:"en-US", :"en-GB")
# => :"en-US"

# en-GB comes first in the list
parser.match(:"en-GB", :"en-US")
# => :"en-GB"

This means the order of your available languages matters when there are ties.
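The tie-breaking can be illustrated with a small stand-alone matcher. This is a minimal sketch under simplifying assumptions, not the library's implementation: parse_ranges and first_match are hypothetical names, and wildcards, q=0 exclusions, and input validation are left out.

```ruby
# Hypothetical mini-matcher; wildcards, q=0 exclusions and validation are omitted.
def parse_ranges(header)
  entries = header.split(",").each_with_index.map do |part, index|
    range, q = part.strip.split(/\s*;\s*q=/i)
    [range.downcase, q ? q.to_f : 1.0, index]
  end

  # Highest quality first; header position breaks ties between equal qualities.
  entries.sort_by { |_range, quality, index| [-quality, index] }.map(&:first)
end

def first_match(header, *available)
  parse_ranges(header).each do |range|
    # Within one range, the first available tag that matches wins.
    found = available.find do |tag|
      candidate = tag.to_s.downcase
      candidate == range || candidate.start_with?("#{range}-")
    end
    return found if found
  end
  nil
end

first_match("en", :"en-US", :"en-GB")            # => :"en-US"
first_match("en", :"en-GB", :"en-US")            # => :"en-GB"
first_match("fr;q=0.5, en", :"fr-FR", :"en-GB")  # => :"en-GB"
```

The last call shows that list order only decides between tags matched by the same range: en outranks fr by quality, so :"en-GB" wins even though :"fr-FR" is listed first.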

Real-World Scenarios

Scenario 1: Regional Fallback

User wants Swiss German, your app has German variants:

header = "de-CH"

# You have de-DE, de-AT, and generic de, but not de-CH
AcceptLanguage.parse(header).match(:"de-DE", :"de-AT", :de)
# => nil (de-CH doesn't match any of these)

Why nil? The range de-CH is not a prefix of de-DE or de-AT (they name different regions), and it is more specific than de, which Basic Filtering never matches in the specific→generic direction.

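Solution: The user's browser typically sends multiple preferences, including a generic fallback:

header = "de-CH, de;q=0.9"

AcceptLanguage.parse(header).match(:"de-DE", :"de-AT", :de)
# => :"de-DE" (the fallback range "de" matches all three tags; the first one listed wins)

List :de first if the generic variant should be preferred over a regional one.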

Scenario 2: Script Variants

User wants Traditional Chinese:

header = "zh-Hant"

AcceptLanguage.parse(header).match(:"zh-Hant-TW", :"zh-Hans-CN")
# => :"zh-Hant-TW" (zh-Hant is a prefix of zh-Hant-TW)

The range zh-Hant matches zh-Hant-TW but not zh-Hans-CN (different script).

Scenario 3: Language Without Region

User accepts any Portuguese:

header = "pt"

AcceptLanguage.parse(header).match(:"pt-BR", :"pt-PT")
# => :"pt-BR" (first match wins)

Common Pitfalls

Pitfall 1: Expecting Reverse Matching

# User wants "en-US", you only have "en"
AcceptLanguage.parse("en-US").match(:en)
# => nil (NOT :en!)

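If your app only supports generic :en, a header that carries nothing but en-US will not match it. Browsers typically send a generic range alongside the regional one (for example en-US, en;q=0.9), which does match :en. For clients that omit it, decide explicitly what to serve when match returns nil, such as a default locale.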
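If you would rather serve :en than nothing, one app-side option is to retry with progressively shorter forms of the range after a strict match has failed. The sketch below is an assumption about how you might do that yourself, loosely modelled on the truncation step of RFC 4647 Lookup (Section 3.4) rather than on Basic Filtering; truncations and fallback_tag are hypothetical names, and the special handling Lookup gives to single-character subtags is left out.

```ruby
# Hypothetical app-side fallback, applied only after a strict match has returned nil.
# Returns the range and each shorter form: "zh-Hant-TW" -> "zh-Hant-TW", "zh-Hant", "zh".
def truncations(range)
  parts = range.split("-")
  parts.size.downto(1).map { |n| parts.first(n).join("-") }
end

def fallback_tag(range, available)
  truncations(range).each do |candidate|
    # Exact, case-insensitive comparison against each available tag.
    found = available.find { |tag| tag.to_s.casecmp?(candidate) }
    return found if found
  end
  nil
end

truncations("zh-Hant-TW")     # => ["zh-Hant-TW", "zh-Hant", "zh"]
fallback_tag("en-US", [:en])  # => :en
fallback_tag("en-US", [:fr])  # => nil
```

Keeping this outside the matcher makes the trade-off explicit: the strict result stays RFC-conformant, and the looser fallback is a deliberate application decision.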

Pitfall 2: Confusing Similar Codes

# "no" (Norwegian) vs "nb" (Norwegian Bokmål)
AcceptLanguage.parse("no").match(:nb)
# => nil (different language codes)

# "zh" (Chinese) vs "zhx" (a distinct three-letter code, used here to illustrate the point)
AcceptLanguage.parse("zh").match(:zhx)
# => nil (no hyphen boundary)

Pitfall 3: Order of Available Languages

parser = AcceptLanguage.parse("en")

# Different order = different result
parser.match(:"en-US", :"en-GB")  # => :"en-US"
parser.match(:"en-GB", :"en-US")  # => :"en-GB"

Be intentional about the order of your available locales.

What's Next?

Working with Wildcards — Match any language with *
Excluding Languages — Reject languages with q=0
BCP 47 Language Tags — Understand scripts, regions, and variants
