Add skip_dns option for normalization without MX lookups#16
Conversation
Add a static DomainMap of well-known email domains to providers, a skip_dns parameter to Normalizer and the sync wrapper, and domain-based provider lookup for offline/fast-path use cases. Closes #9 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Reformat email_normalize/__init__.py and tests/test_normalize.py - Remove unused `providers` import in tests Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdds a Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Normalizer
participant DomainMap as DomainMap
participant DNSResolver as DNS Resolver
participant Provider
rect rgba(100,150,200,0.5)
Note over Client,Provider: skip_dns=True path
Client->>Normalizer: normalize(email, skip_dns=True)
Normalizer->>DomainMap: _lookup_provider_by_domain(domain)
DomainMap-->>Normalizer: MailboxProvider class | None
Normalizer->>Provider: apply provider rules (if found)
Provider-->>Client: normalized result
end
rect rgba(200,150,100,0.5)
Note over Client,Provider: skip_dns=False path
Client->>Normalizer: normalize(email, skip_dns=False)
Normalizer->>DNSResolver: resolve MX records
DNSResolver-->>Normalizer: mx_records
Normalizer->>Provider: _lookup_provider(mx_records)
Provider-->>Normalizer: MailboxProvider class | None
Normalizer->>Provider: apply provider rules (if found)
Provider-->>Client: normalized result
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
email_normalize/providers.py (1)
88-113: Rackspace is missing from DomainMap.The
Providerslist includesRackspace, butDomainMaphas no entries for it. This is likely intentional since Rackspace primarily hosts custom domains rather than public email domains. However, if users withskip_dns=Trueexpect consistent provider detection across both paths, this asymmetry could cause confusion.Consider adding a brief comment explaining why Rackspace is excluded, or verify this is the intended behavior.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@email_normalize/providers.py` around lines 88 - 113, DomainMap currently lacks entries for the Rackspace provider while the Providers list includes Rackspace, causing inconsistent detection when skip_dns=True; update the code by either adding Rackspace domains to DomainMap (if there are known public Rackspace domains to map) or — preferably — add a clear inline comment next to DomainMap explaining that Rackspace is intentionally omitted because it primarily hosts custom domains and therefore cannot be reliably mapped by public domains, and reference the Providers list and the skip_dns=True behavior so future readers understand the asymmetry (check symbols: DomainMap, Providers, Rackspace, skip_dns).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@email_normalize/__init__.py`:
- Around line 104-109: The mx_records method will raise AttributeError when
Normalizer is constructed with skip_dns=True because __init__ does not set
self._resolver; update mx_records to check the skip flag (self._skip_dns) at the
start and return an empty list (or a well-documented sentinel) immediately when
DNS is disabled, avoiding any access to self._resolver; ensure the change
references the mx_records method and the skip_dns/__init__ state, update related
type hints/docs/tests to reflect the early-return behavior.
---
Nitpick comments:
In `@email_normalize/providers.py`:
- Around line 88-113: DomainMap currently lacks entries for the Rackspace
provider while the Providers list includes Rackspace, causing inconsistent
detection when skip_dns=True; update the code by either adding Rackspace domains
to DomainMap (if there are known public Rackspace domains to map) or —
preferably — add a clear inline comment next to DomainMap explaining that
Rackspace is intentionally omitted because it primarily hosts custom domains and
therefore cannot be reliably mapped by public domains, and reference the
Providers list and the skip_dns=True behavior so future readers understand the
asymmetry (check symbols: DomainMap, Providers, Rackspace, skip_dns).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 843a1218-dfce-4cc0-9e23-d531a42cdd2c
📒 Files selected for processing (3)
email_normalize/__init__.pyemail_normalize/providers.pytests/test_normalize.py
Return an empty list early if skip_dns is True, preventing an AttributeError from accessing the unset _resolver attribute. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@email_normalize/__init__.py`:
- Around line 166-171: When _skip_dns=True the code path can still call
_local_part_as_hostname which invokes tldextract.extract and may perform a
network fetch; configure a network-disabled extractor (e.g. create
OFFLINE_TLD_EXTRACT via tldextract.TLDExtract with suffix_list_urls=None/cache
disabled) and replace direct calls to tldextract.extract(domain_part) inside
_local_part_as_hostname (and any other places) with
OFFLINE_TLD_EXTRACT(domain_part) so no network access occurs when DNS is
skipped; reference symbols: _skip_dns, _local_part_as_hostname,
LOCAL_PART_AS_HOSTNAME, tldextract.extract, OFFLINE_TLD_EXTRACT.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6a328bdf-4bba-4193-99fa-cb39b88503a1
📒 Files selected for processing (1)
email_normalize/__init__.py
- deploy: trigger on release creation, add twine check, set pypi environment with URL - testing: remove paths-ignore from pull_request, remove timeout, remove conditional on codecov upload Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace tldextract.extract() with a module-level TLDExtract instance configured with no suffix list URLs and no cache dir, using only the bundled PSL snapshot. This prevents surprise HTTP requests during TLD extraction, particularly in the skip_dns code path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/deploy.yaml:
- Around line 11-13: The environment.url value in the workflow uses a
nonstandard PyPI path; update the environment block (environment: name: pypi
url:) to the canonical project URL by replacing
https://pypi.org/p/email-normalize with the correct PyPI project URL, e.g.
https://pypi.org/project/email-normalize or
https://pypi.org/project/email-normalize/ so the environment.url points to the
official project page.
- Around line 15-19: The workflow's explicit job permissions only set "id-token:
write" which can cause actions/checkout@v5 to fail because unspecified
permissions default to none; update the permissions block to include "contents:
read" alongside "id-token: write" so the checkout step (actions/checkout@v5) has
the required read access; locate the top-level permissions: block and add the
contents: read entry next to id-token: write.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2827b519-2f15-4ca3-aaf8-db007d6bb7f3
📒 Files selected for processing (2)
.github/workflows/deploy.yaml.github/workflows/testing.yaml
- Use canonical PyPI project URL format (pypi.org/project/...) - Add contents: read permission required by actions/checkout@v5 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
PR Monitor: Fixed deploy workflow (PyPI URL format and contents:read permission). All 3 review threads resolved. Waiting for CodeRabbit re-review. |
Summary
skip_dnsparameter toNormalizerand the syncnormalize()wrapper for offline/fast-path normalization without DNS MX lookupsDomainMapinproviders.pymapping 24 well-known email domains (Gmail, Outlook, iCloud, Fastmail, ProtonMail, Yahoo, Yandex, Zoho) to their provider classesskip_dns=TrueTest plan
u.s.e.r+tag@gmail.com→user@gmail.com)mailbox_provider=Noneand emptymx_recordsskip_dns=True(mock raises on call)Normalizerpaths testedCloses #9
Summary by CodeRabbit
New Features
Tests
Chores