source.md contains the complete RFC 5322 specification (Internet Message Format). We need a fully conformant email address parser in Python that implements the complete ABNF grammar from sections 3.2 through 3.4, plus obsolete syntax from §4.4.
This parser must handle every edge case defined in the RFC — not just simple user@domain patterns, but the full complexity of quoted strings, comments, folding whitespace, group addresses, and domain literals.
Background
RFC 5322 defines email address syntax through a chain of ABNF productions that build on each other:
address = mailbox / group
mailbox = name-addr / addr-spec
name-addr = [display-name] angle-addr
angle-addr = [CFWS] "<" addr-spec ">" [CFWS]
addr-spec = local-part "@" domain
local-part = dot-atom / quoted-string / obs-local-part
domain = dot-atom / domain-literal / obs-domain
Each of these references further productions (CFWS, FWS, quoted-pair, dtext, etc.) that span multiple sections. You must read source.md completely to trace the full grammar dependency chain.
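One way to trace the dependency chain is to transcribe each production's direct references into a mapping and walk it from address. The sketch below is only a starting point under stated assumptions: `GRAMMAR`, `reachable`, and `leaves` are hypothetical names, the obs-* alternatives are left out, and quoted-pair and FWS are treated as leaves even though the RFC defines them further in terms of core rules such as VCHAR and WSP.

```python
from collections import deque

# Direct references between productions, transcribed by hand from RFC 5322
# §3.2 and §3.4. Assumption: obs-* alternatives are omitted for brevity.
GRAMMAR = {
    "address": ["mailbox", "group"],
    "mailbox": ["name-addr", "addr-spec"],
    "name-addr": ["display-name", "angle-addr"],
    "angle-addr": ["CFWS", "addr-spec"],
    "group": ["display-name", "group-list", "CFWS"],
    "group-list": ["mailbox-list", "CFWS"],
    "mailbox-list": ["mailbox"],
    "display-name": ["phrase"],
    "phrase": ["word"],
    "word": ["atom", "quoted-string"],
    "atom": ["CFWS", "atext"],
    "addr-spec": ["local-part", "domain"],
    "local-part": ["dot-atom", "quoted-string"],
    "domain": ["dot-atom", "domain-literal"],
    "dot-atom": ["CFWS", "dot-atom-text"],
    "dot-atom-text": ["atext"],
    "domain-literal": ["CFWS", "FWS", "dtext"],
    "quoted-string": ["CFWS", "FWS", "qcontent"],
    "qcontent": ["qtext", "quoted-pair"],
    "CFWS": ["FWS", "comment"],
    "comment": ["FWS", "ccontent"],
    "ccontent": ["ctext", "quoted-pair", "comment"],  # comments nest
}


def reachable(start: str) -> set[str]:
    """Breadth-first walk over production references, starting at `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        name = queue.popleft()
        for ref in GRAMMAR.get(name, []):
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return seen


def leaves(start: str) -> set[str]:
    """Productions reachable from `start` that this mapping does not expand."""
    return {name for name in reachable(start) if name not in GRAMMAR}


if __name__ == "__main__":
    print(sorted(leaves("address")))
```

Each name returned by `leaves("address")` is a production the parser needs a character-level matcher for. Adding the obs-* entries to the mapping extends the same walk to permissive mode.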
Requirements
1. Parser Implementation — parser.py
class RFC5322Address:
    """Parsed RFC 5322 email address."""
    display_name: str | None
    local_part: str
    domain: str
    is_group: bool
    group_members: list['RFC5322Address']
    comments: list[str]
    source: str  # original unparsed input

class AddressParser:
    """
    RFC 5322 compliant email address parser.
    Implements full ABNF grammar from §3.2-§3.4 with optional
    obsolete syntax support from §4.4.
    """
    def __init__(self, strict: bool = True):
        """
        Args:
            strict: If True, reject obs-* productions.
                If False, accept obsolete forms per §4.4.
        """
        ...

    def parse(self, raw: str) -> RFC5322Address:
        """Parse a single mailbox or group address."""
        ...

    def parse_address_list(self, raw: str) -> list[RFC5322Address]:
        """Parse a comma-separated address-list per §3.4."""
        ...

    def parse_mailbox_list(self, raw: str) -> list[RFC5322Address]:
        """Parse a comma-separated mailbox-list per §3.4."""
        ...
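To illustrate how the `strict` flag might gate an obsolete production, here is a minimal sketch for local-part alone. It is not the required implementation: `classify_local_part` and the pattern constants are hypothetical names, CFWS is ignored entirely, and folding inside quoted strings is reduced to plain spaces and tabs.

```python
import re
from typing import Optional

# atext per RFC 5322 §3.2.3
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
ATOM = ATEXT + "+"
DOT_ATOM_TEXT = ATOM + r"(?:\." + ATOM + ")*"

# Simplified quoted-string: qtext, WSP, or a quoted-pair between DQUOTEs.
# Assumption: no CFWS around the string and no CRLF folding inside it.
QUOTED_STRING = r'"(?:[\x21\x23-\x5b\x5d-\x7e \t]|\\[\x21-\x7e \t])*"'

# obs-local-part = word *("." word), with word = atom / quoted-string (§4.4)
WORD = "(?:" + ATOM + "|" + QUOTED_STRING + ")"
OBS_LOCAL_PART = WORD + r"(?:\." + WORD + ")*"


def classify_local_part(text: str, strict: bool = True) -> Optional[str]:
    """Return the name of the production that matches, or None if none does."""
    if re.fullmatch(DOT_ATOM_TEXT, text):
        return "dot-atom"
    if re.fullmatch(QUOTED_STRING, text):
        return "quoted-string"
    # The obsolete form is only tried when strict mode is off.
    if not strict and re.fullmatch(OBS_LOCAL_PART, text):
        return "obs-local-part"
    return None


if __name__ == "__main__":
    for sample in ['user+tag', '" "', 'user."quoted"']:
        print(sample, classify_local_part(sample), classify_local_part(sample, strict=False))
```

Trying the current-syntax alternatives first means a permissive parser still reports the stricter production whenever one applies, which may be useful when filling in the compliance matrix.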
Must correctly handle ALL of these (and more):
| Input | Expected Parse |
|---|---|
| `user@example.com` | Simple addr-spec |
| `"John Doe" <john@example.com>` | name-addr with display-name |
| `"quoted\"string"@example.com` | Quoted local-part with escaped chars |
| `user+tag@[192.168.1.1]` | Domain literal (IPv4) |
| `user@[IPv6:2001:db8::1]` | Domain literal (IPv6) |
| `(comment)user(mid)@(end)example.com` | CFWS comments extracted |
| `A Group:user1@a.com, user2@b.com;` | Group address |
| `"very.(),:;<>\"@[]\\ long"@example.com` | All special chars in quoted-string |
| `user."quoted"@example.com` | Mixed dot-atom and quoted-string (obs-local-part) |
| `user@.leading-dot.com` | obs-domain (permissive mode only) |
| `" "@example.com` | Space in quoted local-part |
| `postmaster@[IPv6:2001:db8:85a3::8a2e:370:7334]` | Full IPv6 domain literal |
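The comment row above can be approached with a small scanner that lifts comments out of the input before the rest is parsed. The following is a sketch under assumptions, not the required design: `extract_comments` and `_read_comment` are hypothetical helpers, nesting is tracked with a depth counter instead of recursive calls, and parentheses inside domain literals (where they are ordinary dtext) are not special-cased.

```python
def _read_comment(text: str, start: int) -> tuple[str, int]:
    """Return (body, index after the closing paren) for the comment at `start`."""
    depth = 0
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            i += 2  # quoted-pair: the escaped character never opens or closes
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return text[start + 1:i], i + 1
        i += 1
    raise ValueError("unterminated comment")


def extract_comments(text: str) -> tuple[str, list[str]]:
    """Split `text` into (text without comments, list of top-level comment bodies)."""
    kept: list[str] = []
    comments: list[str] = []
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            kept.append(text[i:i + 2])  # keep quoted-pairs verbatim
            i += 2
        elif ch == '"':
            in_quotes = not in_quotes
            kept.append(ch)
            i += 1
        elif ch == "(" and not in_quotes:
            body, i = _read_comment(text, i)
            comments.append(body)
        else:
            kept.append(ch)
            i += 1
    return "".join(kept), comments


if __name__ == "__main__":
    print(extract_comments("(comment)user(mid)@(end)example.com"))
```

Nested comments are returned as a single top-level body with the inner parentheses preserved, so a caller that needs the inner comments separately would have to scan the body again.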
2. Test Suite — test_parser.py
Minimum 60 test cases organized by RFC section:
- §3.2.1 (quoted-pair): at least 5 cases
- §3.2.2 (FWS): at least 5 cases
- §3.2.2 (CFWS/comments): at least 8 cases
- §3.2.4 (quoted-string): at least 8 cases
- §3.2.5 (miscellaneous tokens): at least 3 cases
- §3.4 (address/mailbox/group): at least 12 cases
- §3.4.1 (addr-spec/domain-literal): at least 8 cases
- §4.4 (obsolete addressing): at least 8 cases
- Edge cases (max lengths, empty parts, nested comments): at least 5 cases
- Invalid/rejection cases: at least 8 cases
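The per-category minimums above add up to 70, so a suite that meets each of them already clears the overall minimum of 60. A small self-check along these lines could confirm coverage before submission. This is a sketch: `MINIMUMS` and `shortfalls` are hypothetical names, and it assumes each test case is tagged with one of the category labels from the list.

```python
from collections import Counter

# Minimum case counts, keyed by the category labels used in the list above.
MINIMUMS = {
    "quoted-pair": 5,
    "FWS": 5,
    "CFWS/comments": 8,
    "quoted-string": 8,
    "miscellaneous tokens": 3,
    "address/mailbox/group": 12,
    "addr-spec/domain-literal": 8,
    "obsolete addressing": 8,
    "edge cases": 5,
    "invalid/rejection": 8,
}


def shortfalls(case_categories: list[str]) -> dict[str, int]:
    """Map each under-covered category to the number of cases still missing."""
    counts = Counter(case_categories)
    return {
        category: needed - counts[category]
        for category, needed in MINIMUMS.items()
        if counts[category] < needed
    }


if __name__ == "__main__":
    tagged = ["FWS"] * 5 + ["quoted-pair"] * 2
    print(shortfalls(tagged))
```

An empty result from `shortfalls` means every category has reached its minimum.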
3. Compliance Matrix — compliance.md
Table mapping EVERY ABNF production used in address parsing to:
- The RFC section defining it
- The test case(s) exercising it
- Implementation status (complete/partial/N/A)
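The matrix could be generated from structured data so it stays in step with the test suite. The sketch below assumes a hypothetical `MatrixRow` record and `render_matrix` helper; the test names in the usage example are placeholders, not names the issue prescribes.

```python
from dataclasses import dataclass

VALID_STATUSES = {"complete", "partial", "N/A"}


@dataclass
class MatrixRow:
    production: str   # ABNF production name, e.g. "quoted-pair"
    section: str      # RFC section defining it, e.g. "§3.2.1"
    tests: list[str]  # names of the test cases exercising it
    status: str       # one of VALID_STATUSES


def render_matrix(rows: list[MatrixRow]) -> str:
    """Render the rows as a markdown pipe table."""
    lines = [
        "| Production | RFC section | Test case(s) | Status |",
        "|---|---|---|---|",
    ]
    for row in rows:
        if row.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {row.status!r}")
        tests = ", ".join(row.tests) or "none"
        lines.append(f"| {row.production} | {row.section} | {tests} | {row.status} |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder test names for illustration only.
    print(render_matrix([
        MatrixRow("quoted-pair", "§3.2.1", ["test_escaped_quote"], "complete"),
        MatrixRow("addr-spec", "§3.4.1", [], "partial"),
    ]))
```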
4. source.md Annotations
While reading source.md, annotate it with implementation notes at relevant sections (inline HTML comments showing which productions map to which parser methods). Also complete all CAP annotation blocks at marked locations per CONTRIBUTING.md requirements.
Acceptance Criteria
- parser.py: AddressParser class with parse(), parse_address_list(), parse_mailbox_list()
- test_parser.py: 60+ test cases covering all sections listed above
- compliance.md: maps all ABNF productions to tests and implementation
- [CAP-ANNOTATION-REQUIRED] markers in source.md populated per CONTRIBUTING.md §Supply Chain Security (CI validates these)
Technical Notes
- Start by tracing the ABNF dependency graph from address down to terminal productions
- source.md sections 3.2.1–3.2.5 define the building blocks; §3.4 assembles them
- §4.1–§4.4 define obsolete forms that real-world email uses extensively
- CFWS can appear almost anywhere; read §3.2.2 very carefully
- quoted-pair allows escaping any VCHAR or WSP character, including \ and "; comments nest and may themselves contain quoted-pairs, so handle recursion
- obs-local-part allows mixing dot-atoms and quoted-strings (§4.4); this is the hardest part
Read source.md from start to finish before writing any code. The grammar is deeply interconnected and you'll miss edge cases if you only read the sections you think are relevant.
/bounty $400