perf: avoid urljoin for absolute HTML links#10903
Merged
radoering merged 3 commits intoMay 17, 2026
Merged
Conversation
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The absolute URL detection currently relies on hard-coded string prefixes; consider using
urllib.parse.urlparse(href).scheme(or reusing a shared helper with the JSON Simple API) to avoid missing valid schemes (e.g.//example.org/...,ftp:) and to reduce the risk of the two code paths diverging. - Since
ABSOLUTE_LINK_PREFIXESis only used inside this module and is performance-related, you might want to annotate this with a short comment explaining why only these schemes are handled specially and why others still go throughurljoin, to make future modifications safer.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The absolute URL detection currently relies on hard-coded string prefixes; consider using `urllib.parse.urlparse(href).scheme` (or reusing a shared helper with the JSON Simple API) to avoid missing valid schemes (e.g. `//example.org/...`, `ftp:`) and to reduce the risk of the two code paths diverging.
- Since `ABSOLUTE_LINK_PREFIXES` is only used inside this module and is performance-related, you might want to annotate this with a short comment explaining why only these schemes are handled specially and why others still go through `urljoin`, to make future modifications safer.
## Individual Comments
### Comment 1
<location path="tests/repositories/link_sources/test_html.py" line_range="99" />
<code_context>
assert link.hashes == {"sha256": "abcd1234"}
+def test_absolute_url_skips_urljoin(
+ html_page_content: HTMLPageGetter, monkeypatch: pytest.MonkeyPatch
+) -> None:
</code_context>
<issue_to_address>
**suggestion (testing):** Consider parametrizing this test over all supported absolute URL prefixes
`ABSOLUTE_LINK_PREFIXES` includes `("http://", "https://", "file://")`, but this test only covers `https://`. Please parametrize it over all entries in `ABSOLUTE_LINK_PREFIXES` (or at least add explicit `http://` and `file://` cases) so we verify that `urljoin` is consistently skipped for every supported absolute scheme and remain robust if the tuple changes in the future.
Suggested implementation:
```python
@pytest.mark.parametrize("prefix", ABSOLUTE_LINK_PREFIXES)
def test_absolute_url_skips_urljoin(
html_page_content: HTMLPageGetter,
monkeypatch: pytest.MonkeyPatch,
prefix: str,
) -> None:
def fail_urljoin(base: str, url: str) -> str:
raise AssertionError("urljoin should not be called for absolute URLs")
monkeypatch.setattr(
"poetry.repositories.link_sources.html.urllib.parse.urljoin", fail_urljoin
)
anchor = (
f'<a href="{prefix}files.pythonhosted.org/packages/demo-0.1.whl">'
"demo-0.1.whl</a><br/>"
```
1. Ensure `ABSOLUTE_LINK_PREFIXES` is imported in this test module, for example:
`from poetry.repositories.link_sources.html import ABSOLUTE_LINK_PREFIXES, HTMLPageGetter`
or by extending the existing import that already brings in `HTMLPageGetter`.
2. If `pytest` is not yet imported under the name `pytest` in this file, add `import pytest` to the imports so that `@pytest.mark.parametrize` is available.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
80cbce6 to
7a8c413
Compare
radoering
requested changes
May 17, 2026
fc175ab to
56bc6a8
Compare
Member
|
@sourcery-ai review |
radoering
approved these changes
May 17, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This mirrors the JSON Simple API shortcut from #10896 for HTML Simple API pages: absolute file links do not need
urllib.parse.urljoin(), and PyPI-style repository pages commonly use absolute links.A small local benchmark over absolute
https://links showed the prefix check avoidingurljoin()is significantly faster:Pull Request Check List
Local checks: