fix: duplicate cover image on imported posts #13
Merged
Imported (email-HTML) posts wrap the cover in <tr> table markup, so the leading-cover strip missed it and the cover showed twice (template + body). Now sanitize first, then strip the body's copy of the cover by matching the cover URL anywhere (not just the leading node). article() takes the cover URL; post_page passes it. Verified 0 duplicate covers across all 23 posts + regression test.
Pull request overview
Fixes duplicate cover images in archived posts (notably imported email-table HTML) by adjusting the render pipeline so the body copy of the cover is removed after sanitization, and by threading the cover URL into the article renderer.
Changes:
- Update `render.article()` to accept a `cover` URL, sanitize first, then remove a duplicate cover image from the body.
- Pass `post["image"]` into `render.article()` from the post page template.
- Add a regression test and regenerate affected archive HTML pages to remove the duplicated cover image.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/test_render.py | Adds regression test for duplicate cover removal in imported email-table HTML. |
| render.py | Adds _strip_cover() and updates article() pipeline to sanitize before stripping duplicate cover. |
| pages.py | Passes post cover URL into render.article() so duplicate stripping can be accurate. |
| archive/sauga/index.html | Regenerated output: removes duplicated cover image from body. |
| archive/rand-paradoksas/index.html | Regenerated output: removes duplicated cover image from body. |
| archive/nieko-nedarau-o-turiu-ko-noriu/index.html | Regenerated output: removes duplicated cover image from body. |
| archive/knygskaitys/index.html | Regenerated output: removes duplicated cover image from body. |
| archive/ivairios-suvestines/index.html | Regenerated output: removes duplicated cover image from body. |
| archive/darbuotoju-atranka/index.html | Regenerated output: removes duplicated cover image from body. |
| archive/darbo-imitacijos-rinka/index.html | Regenerated output reflecting updated sanitizer/strip order (no duplicate cover). |
| archive/dalinuosi-dar-vieno-analitiko-mintimis/index.html | Regenerated output: removes duplicated cover image from body. |
Comment on lines +94 to +96:

    name = re.escape(cover.rsplit("/", 1)[-1])
    img = re.compile(r"<img\b[^>]*" + name + r"[^>]*>", re.IGNORECASE)
    new = img.sub("", html, count=1)
Comment on lines +100 to +101:

    new = re.sub(r"<p>\s*</p>", "", new, count=1)
    new = re.sub(r"<figure>\s*(?:<figcaption\b[^>]*>.*?</figcaption>\s*)?</figure>", "", new, count=1, flags=re.S)
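For context, the two commented snippets can be read together as one helper. The following is a minimal sketch only, not the merged `_strip_cover()`: the function name `strip_cover`, its signature, the early returns, and the empty-file-name guard are assumptions added for illustration; only the five regex lines come from the diff above. Note that the quoted lines match on the cover's file name (the last path segment), not the full URL.

```python
import re


def strip_cover(html: str, cover: str) -> str:
    """Remove the first <img> in `html` whose tag mentions the cover's file name.

    Hypothetical reconstruction for illustration; the real helper may differ.
    """
    if not cover:
        return html
    file_name = cover.rsplit("/", 1)[-1]
    if not file_name:
        # Assumed guard: a URL ending in "/" would otherwise match every <img>.
        return html

    name = re.escape(file_name)
    img = re.compile(r"<img\b[^>]*" + name + r"[^>]*>", re.IGNORECASE)
    new = img.sub("", html, count=1)
    if new == html:
        return html  # the body holds no copy of the cover

    # Remove at most one empty <p> and one empty <figure> (optionally
    # holding only a caption). These are the first such wrappers in the
    # document, which is not necessarily where the image was removed.
    new = re.sub(r"<p>\s*</p>", "", new, count=1)
    new = re.sub(
        r"<figure>\s*(?:<figcaption\b[^>]*>.*?</figcaption>\s*)?</figure>",
        "",
        new,
        count=1,
        flags=re.S,
    )
    return new
```

Because each substitution uses `count=1`, a body that repeats the cover more than once, or that already contains an unrelated empty `<p>` earlier in the document, would behave differently from the simple case; the sketch keeps that behavior as quoted rather than changing it.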
QA of `/archive/nieko-nedarau-o-turiu-ko-noriu/` found the cover image rendered twice (template cover + a copy at the top of the body).

Cause: imported posts are email-table HTML. The cover `<img>` is wrapped in `<tr id="content-blocks">`, so the leading-cover strip (which expected a leading `<figure>`/`<img>`) missed it; the sanitizer then unwrapped the `<tr>`, leaving a duplicate.

Fix: sanitize first, then strip the body's copy of the cover by matching the cover URL anywhere (not just position 0). `render.article(body, cover)` now takes the cover; `post_page` passes `post['image']`. Verified 0 duplicate covers across all 23 posts; regression test added. 20 tests green. Squash-merge.
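The cause described here can be illustrated with a small, self-contained comparison. This is a sketch under assumptions: the `LEADING` pattern is a guess at what a "leading `<figure>`/`<img>`" check might look like, and the URL and body markup are made-up examples, not taken from the repository.

```python
import re

COVER = "https://example.org/images/cover.jpg"  # hypothetical cover URL

# Hypothetical imported body: the cover sits inside email-table markup,
# so the document does not begin with <figure> or <img>.
BODY = (
    '<table><tr id="content-blocks"><td>'
    '<img src="https://example.org/images/cover.jpg" alt="">'
    "</td></tr></table><p>First paragraph.</p>"
)

# Assumed shape of a leading-only check: the body must START with an
# <img>, optionally preceded by an opening <figure>.
LEADING = re.compile(r"\A\s*(?:<figure\b[^>]*>\s*)?<img\b[^>]*>", re.IGNORECASE)


def has_leading_cover(html: str) -> bool:
    """True only when the very first node is a (figure-wrapped) image."""
    return LEADING.search(html) is not None


def has_cover_anywhere(html: str, cover: str) -> bool:
    """True when any <img> tag mentions the cover's file name."""
    name = re.escape(cover.rsplit("/", 1)[-1])
    pattern = r"<img\b[^>]*" + name + r"[^>]*>"
    return re.search(pattern, html, re.IGNORECASE) is not None


if __name__ == "__main__":
    print(has_leading_cover(BODY))          # leading-only check misses it
    print(has_cover_anywhere(BODY, COVER))  # position-independent check finds it
```

A position-independent match does not depend on what wraps the image, which is why it keeps working whether the table markup is still present or has already been unwrapped by the sanitizer.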