
Add skip_dns option for normalization without MX lookups #16

Merged: gmr merged 6 commits into main from feature/skip-dns on Mar 13, 2026

@gmr (Owner) commented Mar 13, 2026

Summary

  • Add skip_dns parameter to Normalizer and the sync normalize() wrapper for offline/fast-path normalization without DNS MX lookups
  • Add DomainMap in providers.py mapping 24 well-known email domains (Gmail, Outlook, iCloud, Fastmail, ProtonMail, Yahoo, Yandex, Zoho) to their provider classes
  • Provider-specific rules (plus addressing, period stripping) still apply when using skip_dns=True
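A minimal sketch of the domain-based fast path summarized above, assuming simplified stand-ins for the provider classes. `MailboxProvider`, `Gmail`, `Fastmail`, `DOMAIN_MAP`, `Result` and `normalize_offline()` are hypothetical illustrations, not the library's actual `DomainMap` or `Normalizer` code; lower-casing the whole address is also an assumption of this sketch. It shows the key property: the provider comes from a static dictionary lookup, and provider rules still run without any DNS query.

```python
import dataclasses
import typing


class MailboxProvider:
    """Hypothetical base class: each provider declares which rules apply."""
    plus_addressing = False
    strip_periods = False


class Gmail(MailboxProvider):
    plus_addressing = True
    strip_periods = True


class Fastmail(MailboxProvider):
    plus_addressing = True


# Hypothetical subset of a static domain -> provider map.
DOMAIN_MAP: dict[str, type[MailboxProvider]] = {
    'gmail.com': Gmail,
    'googlemail.com': Gmail,
    'fastmail.com': Fastmail,
}


@dataclasses.dataclass
class Result:
    address: str
    normalized_address: str
    mailbox_provider: typing.Optional[str]
    mx_records: list[tuple[int, str]]


def normalize_offline(address: str) -> Result:
    """Normalize using only the domain part; no DNS query is made."""
    local_part, _, domain_part = address.lower().rpartition('@')
    provider = DOMAIN_MAP.get(domain_part)
    if provider is not None:
        if provider.plus_addressing:
            # Drop everything after the first '+' in the local part.
            local_part = local_part.split('+', 1)[0]
        if provider.strip_periods:
            local_part = local_part.replace('.', '')
    return Result(
        address=address,
        normalized_address=f'{local_part}@{domain_part}',
        mailbox_provider=provider.__name__ if provider else None,
        mx_records=[],  # always empty: the MX lookup is skipped entirely
    )


print(normalize_offline('u.s.e.r+tag@gmail.com').normalized_address)
# user@gmail.com
```

Unknown domains fall through with `mailbox_provider=None` and no rules applied, which mirrors the behavior the test plan checks for.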

Test plan

  • All 24 domain mappings tested individually
  • Rule application verified (e.g., u.s.e.r+tag@gmail.com → user@gmail.com)
  • Unknown domains return mailbox_provider=None and empty mx_records
  • DNS is never called when skip_dns=True (mock raises on call)
  • Both sync wrapper and async Normalizer paths tested
  • All 52 tests pass, 99% coverage
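The "mock raises on call" check from the test plan can be sketched with `unittest.mock`. `ToyNormalizer` and `SkipDNSSketch` below are hypothetical stand-ins, not the project's `Normalizer` or `SkipDNSTestCase`; the sketch only illustrates the technique of patching the DNS method so that any call fails the test.

```python
import asyncio
import unittest
from unittest import mock


class ToyNormalizer:
    """Hypothetical stand-in: only consults DNS when skip_dns is False."""

    def __init__(self, skip_dns: bool = False) -> None:
        self._skip_dns = skip_dns

    async def mx_records(self, domain: str) -> list:
        raise NotImplementedError('a real implementation would query DNS')

    async def normalize(self, address: str) -> str:
        local_part, _, domain = address.rpartition('@')
        if not self._skip_dns:
            await self.mx_records(domain)
        return f'{local_part}@{domain.lower()}'


class SkipDNSSketch(unittest.TestCase):

    def test_dns_never_called(self) -> None:
        # patch() substitutes an AsyncMock for the async method; its
        # side_effect makes any call to mx_records fail the test loudly.
        with mock.patch.object(
                ToyNormalizer, 'mx_records',
                side_effect=AssertionError('DNS must not be queried')) as mx:
            result = asyncio.run(
                ToyNormalizer(skip_dns=True).normalize('User@Example.COM'))
        self.assertEqual(result, 'User@example.com')
        mx.assert_not_called()
```

Run it with `python -m unittest` against the module that contains it; the same pattern works for a sync wrapper by calling it directly instead of through `asyncio.run()`.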

Closes #9

Summary by CodeRabbit

  • New Features

    • Added a skip_dns option to the email normalization API to bypass DNS MX lookups and use domain-based provider resolution.
    • Introduced a domain-to-provider mapping to determine mailbox providers when DNS is skipped.
  • Tests

    • Added tests covering skip_dns behavior across multiple providers, Gmail/Microsoft rules, async/concurrent normalization, and asserting DNS is not invoked when skipped.
  • Chores

    • Updated CI workflows for release publishing and testing triggers.

gmr and others added 2 commits March 13, 2026 13:20
Add a static DomainMap of well-known email domains to providers,
a skip_dns parameter to Normalizer and the sync wrapper, and
domain-based provider lookup for offline/fast-path use cases.

Closes #9

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Reformat email_normalize/__init__.py and tests/test_normalize.py
- Remove unused `providers` import in tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Mar 13, 2026

Warning

Rate limit exceeded

@gmr has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 15 minutes and 39 seconds before requesting another review.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b4f705a8-e27e-4c3a-967c-4c44a0ef9aa1

📥 Commits

Reviewing files that changed from the base of the PR and between 6cc928b and 6532e2d.

📒 Files selected for processing (2)
  • .github/workflows/deploy.yaml
  • email_normalize/__init__.py
📝 Walkthrough


Adds a skip_dns option to normalization: when true, DNS MX lookups are bypassed and provider detection uses a new DomainMap via _lookup_provider_by_domain; when false, existing DNS-based provider resolution remains unchanged.

Changes

  • Core normalization logic (email_normalize/__init__.py): Added a skip_dns parameter to Normalizer.__init__() and the public normalize(); conditional DNS resolver initialization; a new static _lookup_provider_by_domain(domain_part) that queries DomainMap when skip_dns=True; updated control flow and docstrings.
  • Provider domain mapping (email_normalize/providers.py): Added a public DomainMap: dict[str, type[MailboxProvider]] mapping common provider domains (e.g., gmail.com, outlook.com, yahoo.com, icloud.com, yandex.com, zoho.com) to their MailboxProvider classes for domain-based lookup.
  • Tests (tests/test_normalize.py): Added SkipDNSTestCase to validate skip_dns=True across multiple providers, async parity, and that DNS resolution is not invoked when skipped; patches Normalizer.mx_records to assert it is never used, and imports asyncio.
  • CI workflows (.github/workflows/deploy.yaml, .github/workflows/testing.yaml): Adjusted the release trigger and PyPI publish steps; in the testing workflow, removed the pull_request paths-ignore filter, the job timeout, and the conditional on the codecov upload.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Normalizer
    participant DomainMap as DomainMap
    participant DNSResolver as DNS Resolver
    participant Provider

    rect rgba(100,150,200,0.5)
    Note over Client,Provider: skip_dns=True path
    Client->>Normalizer: normalize(email, skip_dns=True)
    Normalizer->>DomainMap: _lookup_provider_by_domain(domain)
    DomainMap-->>Normalizer: MailboxProvider class | None
    Normalizer->>Provider: apply provider rules (if found)
    Provider-->>Client: normalized result
    end

    rect rgba(200,150,100,0.5)
    Note over Client,Provider: skip_dns=False path
    Client->>Normalizer: normalize(email, skip_dns=False)
    Normalizer->>DNSResolver: resolve MX records
    DNSResolver-->>Normalizer: mx_records
    Normalizer->>Provider: _lookup_provider(mx_records)
    Provider-->>Normalizer: MailboxProvider class | None
    Normalizer->>Provider: apply provider rules (if found)
    Provider-->>Client: normalized result
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰 I skip the MX and hop straight to the map,
Domains lined up tidy in my lap —
Offline I find the provider with cheer,
Normalize fast, no DNS near.
Hooray — addresses neat and clear. 🥕

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 7.32%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Out of Scope Changes check (❓ Inconclusive): Workflow file changes in .github/workflows/ appear peripheral to the core skip_dns feature and warrant clarification on their necessity. Resolution: clarify whether the workflow modifications (deploy.yaml and testing.yaml) are essential to this feature or should be addressed separately.

✅ Passed checks (3 passed)

  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): The title accurately summarizes the main change: adding a skip_dns option for normalization that bypasses MX lookups.
  • Linked Issues check (✅ Passed): All requirements from issue #9 are met: skip_dns parameter added, provider rules applied when the domain is recognized, and DNS lookups bypassed in offline mode.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
email_normalize/providers.py (1)

Lines 88-113: Rackspace is missing from DomainMap.

The Providers list includes Rackspace, but DomainMap has no entries for it. This is likely intentional since Rackspace primarily hosts custom domains rather than public email domains. However, if users with skip_dns=True expect consistent provider detection across both paths, this asymmetry could cause confusion.

Consider adding a brief comment explaining why Rackspace is excluded, or verify this is the intended behavior.
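One way to document the asymmetry, sketched under the assumption that DomainMap is a plain dict next to a list of provider classes. The class names, `PROVIDERS`, `DOMAIN_MAP` and `lookup_provider_by_domain()` here are hypothetical placeholders, not the contents of providers.py; the point is the inline comment and the resulting `None` for custom domains.

```python
import typing


class MailboxProvider:
    """Hypothetical base class standing in for the real provider classes."""


class Gmail(MailboxProvider):
    pass


class Rackspace(MailboxProvider):
    pass


# Providers that can be detected from MX records (hypothetical subset).
PROVIDERS: list[type[MailboxProvider]] = [Gmail, Rackspace]

# Static domain -> provider map consulted when skip_dns=True.
#
# Rackspace is intentionally omitted: it hosts mail for customers' own
# domains rather than a public consumer domain, so there is no fixed
# domain to map. Rackspace-hosted addresses are only detected via MX
# records and resolve to no provider when DNS is skipped.
DOMAIN_MAP: dict[str, type[MailboxProvider]] = {
    'gmail.com': Gmail,
    'googlemail.com': Gmail,
}


def lookup_provider_by_domain(
        domain: str) -> typing.Optional[type[MailboxProvider]]:
    """Return the provider class for a well-known domain, else None."""
    return DOMAIN_MAP.get(domain.lower())
```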

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@email_normalize/providers.py` around lines 88 - 113, DomainMap currently
lacks entries for the Rackspace provider while the Providers list includes
Rackspace, causing inconsistent detection when skip_dns=True; update the code by
either adding Rackspace domains to DomainMap (if there are known public
Rackspace domains to map) or — preferably — add a clear inline comment next to
DomainMap explaining that Rackspace is intentionally omitted because it
primarily hosts custom domains and therefore cannot be reliably mapped by public
domains, and reference the Providers list and the skip_dns=True behavior so
future readers understand the asymmetry (check symbols: DomainMap, Providers,
Rackspace, skip_dns).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@email_normalize/__init__.py`:
- Around lines 104-109: The mx_records method will raise AttributeError when
Normalizer is constructed with skip_dns=True because __init__ does not set
self._resolver; update mx_records to check the skip flag (self._skip_dns) at the
start and return an empty list (or a well-documented sentinel) immediately when
DNS is disabled, avoiding any access to self._resolver; ensure the change
references the mx_records method and the skip_dns/__init__ state, update related
type hints/docs/tests to reflect the early-return behavior.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 843a1218-dfce-4cc0-9e23-d531a42cdd2c

📥 Commits

Reviewing files that changed from the base of the PR and between 5bf5014 and 77adec7.

📒 Files selected for processing (3)
  • email_normalize/__init__.py
  • email_normalize/providers.py
  • tests/test_normalize.py

Comment thread email_normalize/__init__.py
Return an empty list early if skip_dns is True, preventing an
AttributeError from accessing the unset _resolver attribute.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
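A minimal sketch of the guard this commit describes, assuming a simplified normalizer whose resolver is only created when DNS is enabled. `SketchNormalizer`, `FakeResolver` and its `query_mx()` coroutine are hypothetical placeholders for the real class and DNS client; the point is that `mx_records()` returns before it ever touches `self._resolver`.

```python
import asyncio


class FakeResolver:
    """Hypothetical DNS client used only to make the sketch runnable."""

    async def query_mx(self, domain: str) -> list[tuple[int, str]]:
        return [(10, f'mx1.{domain}')]


class SketchNormalizer:
    def __init__(self, skip_dns: bool = False) -> None:
        self._skip_dns = skip_dns
        if not skip_dns:
            # The resolver only exists when DNS lookups are enabled, so an
            # unguarded self._resolver access would raise AttributeError.
            self._resolver = FakeResolver()

    async def mx_records(self, domain: str) -> list[tuple[int, str]]:
        if self._skip_dns:
            return []  # early return: never touch self._resolver
        return await self._resolver.query_mx(domain)


print(asyncio.run(SketchNormalizer(skip_dns=True).mx_records('example.com')))
# []
print(asyncio.run(SketchNormalizer().mx_records('example.com')))
# [(10, 'mx1.example.com')]
```

Returning an empty list keeps the return type identical on both paths, so callers that iterate over MX records need no special case.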

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@email_normalize/__init__.py`:
- Around lines 166-171: When _skip_dns=True, the code path can still call
_local_part_as_hostname which invokes tldextract.extract and may perform a
network fetch; configure a network-disabled extractor (e.g. create
OFFLINE_TLD_EXTRACT via tldextract.TLDExtract with suffix_list_urls=None/cache
disabled) and replace direct calls to tldextract.extract(domain_part) inside
_local_part_as_hostname (and any other places) with
OFFLINE_TLD_EXTRACT(domain_part) so no network access occurs when DNS is
skipped; reference symbols: _skip_dns, _local_part_as_hostname,
LOCAL_PART_AS_HOSTNAME, tldextract.extract, OFFLINE_TLD_EXTRACT.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6a328bdf-4bba-4193-99fa-cb39b88503a1

📥 Commits

Reviewing files that changed from the base of the PR and between 77adec7 and 15a204c.

📒 Files selected for processing (1)
  • email_normalize/__init__.py

Comment thread email_normalize/__init__.py
gmr and others added 2 commits March 13, 2026 16:49
- deploy: trigger on release creation, add twine check, set pypi
  environment with URL
- testing: remove paths-ignore from pull_request, remove timeout,
  remove conditional on codecov upload

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace tldextract.extract() with a module-level TLDExtract instance
configured with no suffix list URLs and no cache dir, using only the
bundled PSL snapshot. This prevents surprise HTTP requests during
TLD extraction, particularly in the skip_dns code path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/deploy.yaml:
- Around lines 11-13: The environment.url value in the workflow uses a
nonstandard PyPI path; update the environment block (environment: name: pypi
url:) to the canonical project URL by replacing
https://pypi.org/p/email-normalize with the correct PyPI project URL, e.g.
https://pypi.org/project/email-normalize or
https://pypi.org/project/email-normalize/ so the environment.url points to the
official project page.
- Around lines 15-19: The workflow's explicit job permissions only set "id-token:
write" which can cause actions/checkout@v5 to fail because unspecified
permissions default to none; update the permissions block to include "contents:
read" alongside "id-token: write" so the checkout step (actions/checkout@v5) has
the required read access; locate the top-level permissions: block and add the
contents: read entry next to id-token: write.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2827b519-2f15-4ca3-aaf8-db007d6bb7f3

📥 Commits

Reviewing files that changed from the base of the PR and between 15a204c and 6cc928b.

📒 Files selected for processing (2)
  • .github/workflows/deploy.yaml
  • .github/workflows/testing.yaml

Comment thread .github/workflows/deploy.yaml Outdated
Comment thread .github/workflows/deploy.yaml
- Use canonical PyPI project URL format (pypi.org/project/...)
- Add contents: read permission required by actions/checkout@v5

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@gmr (Owner, Author) commented Mar 13, 2026

PR Monitor: Fixed deploy workflow (PyPI URL format and contents:read permission). All 3 review threads resolved. Waiting for CodeRabbit re-review.

@gmr gmr merged commit f44b3ab into main Mar 13, 2026
5 checks passed

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Option to normalize without mxrecords lookup

1 participant