Add skip_dns option for normalization without MX lookups by gmr · Pull Request #16 · gmr/email-normalize

gmr · 2026-03-13T17:20:25Z

Summary

- Add skip_dns parameter to Normalizer and the sync normalize() wrapper for offline/fast-path normalization without DNS MX lookups
- Add DomainMap in providers.py mapping 24 well-known email domains (Gmail, Outlook, iCloud, Fastmail, ProtonMail, Yahoo, Yandex, Zoho) to their provider classes
- Provider-specific rules (plus addressing, period stripping) still apply when using skip_dns=True
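
As a rough illustration of the domain-based lookup summarized above, the following self-contained sketch maps two domains to simplified rule sets and applies them without any DNS access. The names `Provider`, `DOMAIN_MAP`, and `normalize_offline` are hypothetical stand-ins rather than the library's actual `DomainMap` or `Normalizer` API, and the rule flags assigned to each provider are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provider:
    """Hypothetical stand-in for a mailbox provider's normalization rules."""

    name: str
    plus_addressing: bool = False
    strip_periods: bool = False


GMAIL = Provider('Gmail', plus_addressing=True, strip_periods=True)
MICROSOFT = Provider('Microsoft', plus_addressing=True)

# Illustrative entries only; the PR's real DomainMap is not reproduced here.
DOMAIN_MAP = {
    'gmail.com': GMAIL,
    'outlook.com': MICROSOFT,
}


def normalize_offline(address: str) -> tuple[str, Optional[str]]:
    """Return (normalized_address, provider_name) without touching DNS."""
    local_part, _, domain = address.rpartition('@')
    domain = domain.lower()
    provider = DOMAIN_MAP.get(domain)
    if provider is None:
        # Unknown domain: no provider-specific rules are applied.
        return f'{local_part}@{domain}', None
    if provider.plus_addressing:
        # Drop everything from the first "+" onward.
        local_part = local_part.split('+', 1)[0]
    if provider.strip_periods:
        local_part = local_part.replace('.', '')
    return f'{local_part}@{domain}', provider.name


print(normalize_offline('u.s.e.r+tag@gmail.com'))
print(normalize_offline('first.last+x@outlook.com'))
print(normalize_offline('someone@example.org'))
```

A plain dictionary keeps the lookup O(1) and makes the set of recognized domains easy to audit, at the cost of missing custom domains hosted by these providers, which only an MX lookup could detect.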

Test plan

- All 24 domain mappings tested individually
- Rule application verified (e.g., u.s.e.r+tag@gmail.com → user@gmail.com)
- Unknown domains return mailbox_provider=None and empty mx_records
- DNS is never called when skip_dns=True (mock raises on call)
- Both sync wrapper and async Normalizer paths tested
- All 52 tests pass, 99% coverage
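
The "mock raises on call" technique from the test plan could look roughly like the sketch below. It uses a hypothetical `FakeNormalizer` class instead of the library's `Normalizer`, so the class, its synchronous `mx_records` method, and the result dictionary are assumptions for illustration only; the `unittest.mock` calls are standard library.

```python
import unittest
from unittest import mock


class FakeNormalizer:
    """Hypothetical stand-in; not the library's Normalizer."""

    def __init__(self, skip_dns: bool = False) -> None:
        self.skip_dns = skip_dns

    def mx_records(self, domain: str) -> list:
        raise NotImplementedError('a real DNS lookup would happen here')

    def normalize(self, address: str) -> dict:
        local_part, _, domain = address.rpartition('@')
        domain = domain.lower()
        # Only consult DNS when it has not been disabled.
        records = [] if self.skip_dns else self.mx_records(domain)
        return {'address': f'{local_part}@{domain}', 'mx_records': records}


class SkipDNSSketchTest(unittest.TestCase):
    def test_dns_not_called_when_skipped(self) -> None:
        # The patched method raises if anything calls it.
        with mock.patch.object(
            FakeNormalizer,
            'mx_records',
            side_effect=AssertionError('DNS must not be called'),
        ) as mx_records:
            result = FakeNormalizer(skip_dns=True).normalize('user@Example.org')
        mx_records.assert_not_called()
        self.assertEqual(result['mx_records'], [])
        self.assertEqual(result['address'], 'user@example.org')

    def test_dns_called_by_default(self) -> None:
        with mock.patch.object(
            FakeNormalizer, 'mx_records', return_value=[(10, 'mx.example.org')]
        ) as mx_records:
            result = FakeNormalizer().normalize('user@example.org')
        mx_records.assert_called_once_with('example.org')
        self.assertEqual(result['mx_records'], [(10, 'mx.example.org')])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(SkipDNSSketchTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Giving the mock a raising `side_effect` means an accidental DNS call fails the test immediately, and `assert_not_called()` documents the intent even if the exception were swallowed somewhere.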

Closes #9

Summary by CodeRabbit

New Features
- Added a skip_dns option to the email normalization API to bypass DNS MX lookups and use domain-based provider resolution.
- Introduced a domain-to-provider mapping to determine mailbox providers when DNS is skipped.
Tests
- Added tests covering skip_dns behavior across multiple providers, Gmail/Microsoft rules, async/concurrent normalization, and asserting DNS is not invoked when skipped.
Chores
- Updated CI workflows for release publishing and testing triggers.

Add a static DomainMap of well-known email domains to providers, a skip_dns parameter to Normalizer and the sync wrapper, and domain-based provider lookup for offline/fast-path use cases.

Closes #9

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Reformat email_normalize/__init__.py and tests/test_normalize.py
- Remove unused `providers` import in tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-03-13T19:40:55Z

Warning

Rate limit exceeded

@gmr has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 15 minutes and 39 seconds before requesting another review.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b4f705a8-e27e-4c3a-967c-4c44a0ef9aa1

📥 Commits

Reviewing files that changed from the base of the PR and between 6cc928b and 6532e2d.

📒 Files selected for processing (2)

.github/workflows/deploy.yaml
email_normalize/__init__.py

📝 Walkthrough

Walkthrough

Adds a skip_dns option to normalization: when true, DNS MX lookups are bypassed and provider detection uses a new DomainMap via _lookup_provider_by_domain; when false, existing DNS-based provider resolution remains unchanged.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Core normalization logic `email_normalize/__init__.py` | Added `skip_dns` parameter to `Normalizer.__init__()` and public `normalize()`; conditional DNS resolver initialization; new static `_lookup_provider_by_domain(domain_part)` that queries `DomainMap` when `skip_dns=True`; control flow and docstrings updated. |
| Provider domain mapping `email_normalize/providers.py` | Added public `DomainMap: dict[str, type[MailboxProvider]]` mapping common provider domains (e.g., `gmail.com`, `outlook.com`, `yahoo.com`, `icloud.com`, `yandex.com`, `zoho.com`, etc.) to their MailboxProvider classes for domain-based lookup. |
| Tests `tests/test_normalize.py` | Added `SkipDNSTestCase` to validate `skip_dns=True` across multiple providers, async parity, and that DNS resolution is not invoked when skipped; patches `Normalizer.mx_records` to assert non-use and imports `asyncio`. |
| CI workflows `.github/workflows/deploy.yaml`, `.github/workflows/testing.yaml` | Adjusted release trigger and PyPI publish steps; removed some path filters and tightened testing workflow behavior (coverage upload and timeout changes). |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Normalizer
    participant DomainMap as DomainMap
    participant DNSResolver as DNS Resolver
    participant Provider

    rect rgba(100,150,200,0.5)
    Note over Client,Provider: skip_dns=True path
    Client->>Normalizer: normalize(email, skip_dns=True)
    Normalizer->>DomainMap: _lookup_provider_by_domain(domain)
    DomainMap-->>Normalizer: MailboxProvider class | None
    Normalizer->>Provider: apply provider rules (if found)
    Provider-->>Client: normalized result
    end

    rect rgba(200,150,100,0.5)
    Note over Client,Provider: skip_dns=False path
    Client->>Normalizer: normalize(email, skip_dns=False)
    Normalizer->>DNSResolver: resolve MX records
    DNSResolver-->>Normalizer: mx_records
    Normalizer->>Provider: _lookup_provider(mx_records)
    Provider-->>Normalizer: MailboxProvider class | None
    Normalizer->>Provider: apply provider rules (if found)
    Provider-->>Client: normalized result
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Modernize project tooling and packaging #15: Modifies email_normalize/__init__.py and email_normalize/providers.py around provider definitions and typing; likely related to provider lookup and typing modernizations.

Poem

🐰 I skip the MX and hop straight to the map,
Domains lined up tidy in my lap —
Offline I find the provider with cheer,
Normalize fast, no DNS near.
Hooray — addresses neat and clear. 🥕

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 7.32% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Out of Scope Changes check | ❓ Inconclusive | Workflow file changes in .github/workflows/ appear peripheral to the core skip_dns feature and warrant clarification on their necessity. | Clarify whether workflow modifications (deploy.yaml and testing.yaml) are essential to this feature or should be addressed separately. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding a skip_dns option for normalization that bypasses MX lookups. |
| Linked Issues check | ✅ Passed | All requirements from issue `#9` are met: skip_dns parameter added, provider rules applied when domain recognized, and DNS lookups bypassed in offline mode. |

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

email_normalize/providers.py (1)
88-113: Rackspace is missing from DomainMap.

The Providers list includes Rackspace, but DomainMap has no entries for it. This is likely intentional since Rackspace primarily hosts custom domains rather than public email domains. However, if users with skip_dns=True expect consistent provider detection across both paths, this asymmetry could cause confusion.

Consider adding a brief comment explaining why Rackspace is excluded, or verify this is the intended behavior.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@email_normalize/providers.py` around lines 88 - 113, DomainMap currently
lacks entries for the Rackspace provider while the Providers list includes
Rackspace, causing inconsistent detection when skip_dns=True; update the code by
either adding Rackspace domains to DomainMap (if there are known public
Rackspace domains to map) or — preferably — add a clear inline comment next to
DomainMap explaining that Rackspace is intentionally omitted because it
primarily hosts custom domains and therefore cannot be reliably mapped by public
domains, and reference the Providers list and the skip_dns=True behavior so
future readers understand the asymmetry (check symbols: DomainMap, Providers,
Rackspace, skip_dns).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@email_normalize/__init__.py`:
- Around line 104-109: The mx_records method will raise AttributeError when
Normalizer is constructed with skip_dns=True because __init__ does not set
self._resolver; update mx_records to check the skip flag (self._skip_dns) at the
start and return an empty list (or a well-documented sentinel) immediately when
DNS is disabled, avoiding any access to self._resolver; ensure the change
references the mx_records method and the skip_dns/__init__ state, update related
type hints/docs/tests to reflect the early-return behavior.

---

Nitpick comments:
In `@email_normalize/providers.py`:
- Around line 88-113: DomainMap currently lacks entries for the Rackspace
provider while the Providers list includes Rackspace, causing inconsistent
detection when skip_dns=True; update the code by either adding Rackspace domains
to DomainMap (if there are known public Rackspace domains to map) or —
preferably — add a clear inline comment next to DomainMap explaining that
Rackspace is intentionally omitted because it primarily hosts custom domains and
therefore cannot be reliably mapped by public domains, and reference the
Providers list and the skip_dns=True behavior so future readers understand the
asymmetry (check symbols: DomainMap, Providers, Rackspace, skip_dns).

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 843a1218-dfce-4cc0-9e23-d531a42cdd2c

📥 Commits

Reviewing files that changed from the base of the PR and between 5bf5014 and 77adec7.

📒 Files selected for processing (3)

email_normalize/__init__.py
email_normalize/providers.py
tests/test_normalize.py

Return an empty list early if skip_dns is True, preventing an AttributeError from accessing the unset _resolver attribute.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
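
The early return described in the commit message above can be sketched as follows. `SketchNormalizer` is a hypothetical stand-in, not the library's class: the placeholder resolver object, the async signature, and the return type are assumptions chosen to mirror the review comment, and the DNS branch is deliberately left unimplemented.

```python
import asyncio


class SketchNormalizer:
    """Hypothetical stand-in showing the guard; not the library's class."""

    def __init__(self, skip_dns: bool = False) -> None:
        self._skip_dns = skip_dns
        if not skip_dns:
            # Placeholder for a real async DNS resolver object.
            self._resolver = object()

    async def mx_records(self, domain_part: str) -> list[tuple[int, str]]:
        if self._skip_dns:
            # Early return: self._resolver was never created on this path,
            # so touching it would raise AttributeError.
            return []
        # A real implementation would query self._resolver here.
        raise NotImplementedError('DNS lookup is outside this sketch')


records = asyncio.run(SketchNormalizer(skip_dns=True).mx_records('example.org'))
print(records)  # prints []
```

Checking the flag before any attribute access keeps the method safe to call directly, even though the skip_dns code path is not expected to reach it.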

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@email_normalize/__init__.py`:
- Around line 166-171: When _skip_dns=True the code path can still call
_local_part_as_hostname which invokes tldextract.extract and may perform a
network fetch; configure a network-disabled extractor (e.g. create
OFFLINE_TLD_EXTRACT via tldextract.TLDExtract with suffix_list_urls=None/cache
disabled) and replace direct calls to tldextract.extract(domain_part) inside
_local_part_as_hostname (and any other places) with
OFFLINE_TLD_EXTRACT(domain_part) so no network access occurs when DNS is
skipped; reference symbols: _skip_dns, _local_part_as_hostname,
LOCAL_PART_AS_HOSTNAME, tldextract.extract, OFFLINE_TLD_EXTRACT.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6a328bdf-4bba-4193-99fa-cb39b88503a1

📥 Commits

Reviewing files that changed from the base of the PR and between 77adec7 and 15a204c.

📒 Files selected for processing (1)

email_normalize/__init__.py

- deploy: trigger on release creation, add twine check, set pypi environment with URL
- testing: remove paths-ignore from pull_request, remove timeout, remove conditional on codecov upload

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace tldextract.extract() with a module-level TLDExtract instance configured with no suffix list URLs and no cache dir, using only the bundled PSL snapshot. This prevents surprise HTTP requests during TLD extraction, particularly in the skip_dns code path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/deploy.yaml:
- Around line 11-13: The environment.url value in the workflow uses a
nonstandard PyPI path; update the environment block (environment: name: pypi
url:) to the canonical project URL by replacing
https://pypi.org/p/email-normalize with the correct PyPI project URL, e.g.
https://pypi.org/project/email-normalize or
https://pypi.org/project/email-normalize/ so the environment.url points to the
official project page.
- Around line 15-19: The workflow's explicit job permissions only set "id-token:
write" which can cause actions/checkout@v5 to fail because unspecified
permissions default to none; update the permissions block to include "contents:
read" alongside "id-token: write" so the checkout step (actions/checkout@v5) has
the required read access; locate the top-level permissions: block and add the
contents: read entry next to id-token: write.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2827b519-2f15-4ca3-aaf8-db007d6bb7f3

📥 Commits

Reviewing files that changed from the base of the PR and between 15a204c and 6cc928b.

📒 Files selected for processing (2)

.github/workflows/deploy.yaml
.github/workflows/testing.yaml

- Use canonical PyPI project URL format (pypi.org/project/...)
- Add contents: read permission required by actions/checkout@v5

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

gmr · 2026-03-13T20:57:17Z

PR Monitor: Fixed deploy workflow (PyPI URL format and contents:read permission). All 3 review threads resolved. Waiting for CodeRabbit re-review.

gmr and others added 2 commits March 13, 2026 13:20

Fix ruff formatting and remove unused import

77adec7

coderabbitai Bot reviewed Mar 13, 2026

Comment thread email_normalize/__init__.py

Guard mx_records() against use when skip_dns is enabled

15a204c

coderabbitai Bot reviewed Mar 13, 2026

Comment thread email_normalize/__init__.py

gmr and others added 2 commits March 13, 2026 16:49

coderabbitai Bot reviewed Mar 13, 2026

Comment thread .github/workflows/deploy.yaml Outdated

Comment thread .github/workflows/deploy.yaml

Fix deploy workflow: correct PyPI URL and add contents:read permission

6532e2d

gmr merged commit f44b3ab into main Mar 13, 2026
5 checks passed

gmr merged 6 commits into main from feature/skip-dns
