fix: March 2026 integration test fixes — 4 councils repaired + diagnostics#1899
…tCouncil, WirralCouncil - date parsing, null handling, input.json config
📝 Walkthrough
This PR documents integration test results from 2026-03-28 and implements corresponding infrastructure changes. It updates test result visualization to display detailed error categorization, fixes test data for one council, and improves three council scrapers to handle multiple collection dates and missing data robustly.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1899   +/-   ##
=======================================
  Coverage   86.67%   86.67%
=======================================
  Files           9        9
  Lines        1141     1141
=======================================
  Hits          989      989
  Misses        152      152
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@uk_bin_collection/map.html`:
- Around line 105-128: The popup content currently injects unescaped fields
(e.g., tr.error_summary, tr.error_category, name, wiki.module, wiki.wiki_name)
into the HTML passed to layer.bindPopup, allowing arbitrary HTML injection; add
an HTML-escaping helper (e.g., htmlEscape) and apply it to all interpolated
user-sourced values used when building statusText/extra and the final popup
string (references: tr.error_summary, tr.error_category, name, wiki.module,
wiki.wiki_name, and the layer.bindPopup call) so that bindPopup receives only
escaped text; alternatively construct the popup via a DOM element and use
setContent with text nodes to avoid HTML interpretation.
In `@uk_bin_collection/uk_bin_collection/councils/RotherDistrictCouncil.py`:
- Around line 68-69: The current check in RotherDistrictCouncil (inside the
method that iterates collection rows) uses "if not date or 'no data' in
date.lower(): continue", which silently skips empty/missing dates and can mask
page-structure changes; change this so only the explicit upstream sentinel ("no
data" case) is ignored, while truly missing or empty dates cause an explicit
failure: keep the "'no data' in date.lower()" branch to continue, but replace
the "not date" branch with an explicit error/exception (or processLogger.error +
raise) so the scraper fails fast when date is missing; update the relevant
method in class RotherDistrictCouncil where date is parsed (the loop referencing
the date variable) to implement this behavior.
📒 Files selected for processing (7)
- integration_test_results.md
- uk_bin_collection/map.html
- uk_bin_collection/tests/generate_map_test_results.py
- uk_bin_collection/tests/input.json
- uk_bin_collection/uk_bin_collection/councils/BaberghDistrictCouncil.py
- uk_bin_collection/uk_bin_collection/councils/MidSuffolkDistrictCouncil.py
- uk_bin_collection/uk_bin_collection/councils/RotherDistrictCouncil.py
let statusText, extra = '';
if (status === 'pass') {
  statusText = '✅ Covered (Test Passed)';
} else if (status === 'fail') {
  statusText = '🟠 Covered (Test Failed)';
  if (tr?.error_category) {
    const cat = tr.error_category.replace(/_/g, ' ');
    extra = `<br>Error: <em>${cat}</em>`;
  }
  if (tr?.error_summary) {
    const summary = tr.error_summary.length > 100
      ? tr.error_summary.substring(0, 100) + '…'
      : tr.error_summary;
    extra += `<br><small>${summary}</small>`;
  }
} else {
  statusText = '✅ Covered (No test result)';
}
layer.bindPopup(
  `<strong>${name}</strong><br>` +
  `Module: <code>${wiki.module || ''}</code><br>` +
  `Status: ${statusText}${extra}<br>` +
  `<a href="${wiki.url}" target="_blank">📘 ${wiki.wiki_name}</a>`
);
Escape popup content before building HTML.
tr.error_summary is sourced from JUnit failure text. Injecting it directly into bindPopup() turns the popup into an HTML sink, so scraper error messages can render arbitrary markup in the map artifact.
🛡️ Proposed fix
+ const escapeHtml = str => String(str)
+   .replace(/&/g, '&amp;')
+   .replace(/</g, '&lt;')
+   .replace(/>/g, '&gt;')
+   .replace(/"/g, '&quot;')
+   .replace(/'/g, '&#39;');
+
  ...
-   extra = `<br>Error: <em>${cat}</em>`;
+   extra = `<br>Error: <em>${escapeHtml(cat)}</em>`;
  ...
-   extra += `<br><small>${summary}</small>`;
+   extra += `<br><small>${escapeHtml(summary)}</small>`;
  ...
-   `<strong>${name}</strong><br>` +
-   `Module: <code>${wiki.module || ''}</code><br>` +
+   `<strong>${escapeHtml(name)}</strong><br>` +
+   `Module: <code>${escapeHtml(wiki.module || '')}</code><br>` +
    `Status: ${statusText}${extra}<br>` +
-   `<a href="${wiki.url}" target="_blank">📘 ${wiki.wiki_name}</a>`
+   `<a href="${wiki.url}" target="_blank" rel="noopener noreferrer">📘 ${escapeHtml(wiki.wiki_name)}</a>`
📝 Committable suggestion
const escapeHtml = str => String(str)
  .replace(/&/g, '&amp;')
  .replace(/</g, '&lt;')
  .replace(/>/g, '&gt;')
  .replace(/"/g, '&quot;')
  .replace(/'/g, '&#39;');
let statusText, extra = '';
if (status === 'pass') {
  statusText = '✅ Covered (Test Passed)';
} else if (status === 'fail') {
  statusText = '🟠 Covered (Test Failed)';
  if (tr?.error_category) {
    const cat = tr.error_category.replace(/_/g, ' ');
    extra = `<br>Error: <em>${escapeHtml(cat)}</em>`;
  }
  if (tr?.error_summary) {
    const summary = tr.error_summary.length > 100
      ? tr.error_summary.substring(0, 100) + '…'
      : tr.error_summary;
    extra += `<br><small>${escapeHtml(summary)}</small>`;
  }
} else {
  statusText = '✅ Covered (No test result)';
}
layer.bindPopup(
  `<strong>${escapeHtml(name)}</strong><br>` +
  `Module: <code>${escapeHtml(wiki.module || '')}</code><br>` +
  `Status: ${statusText}${extra}<br>` +
  `<a href="${wiki.url}" target="_blank" rel="noopener noreferrer">📘 ${escapeHtml(wiki.wiki_name)}</a>`
);
if not date or "no data" in date.lower():
    continue
Don't swallow missing dates here.
"No data found" is a valid upstream sentinel, but not date also covers missing spans / empty text when the page shape changes. Silently continue-ing there can return a partial schedule instead of failing the scraper.
🔧 Proposed fix
- if not date or "no data" in date.lower():
-     continue
+ if not date:
+     raise ValueError(
+         f"Missing collection date for {bin_type!r} at UPRN {user_uprn}"
+     )
+ if "no data" in date.lower():
+     continue
Based on learnings: when parsing council bin collection data, prefer explicit failures over silent defaults so format changes are detected early.
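The distinction drawn in this comment, between the upstream "no data" sentinel and a value that is simply missing, can be sketched as a small standalone helper. This is only an illustration under stated assumptions: the function name `classify_date_text` and its parameters are hypothetical and do not appear in `RotherDistrictCouncil.py`, which performs the check inline inside its row loop.

```python
from typing import Optional


def classify_date_text(date: Optional[str], bin_type: str, uprn: str) -> Optional[str]:
    """Hypothetical helper: separate the sentinel from a missing value.

    Returns the stripped date text when one is present, returns None when
    the upstream "no data" sentinel is seen (the caller may skip the row),
    and raises ValueError when the text is missing or empty, which usually
    indicates that the page structure has changed.
    """
    if date is None or not date.strip():
        # Fail fast: an empty or absent value is not a valid upstream state.
        raise ValueError(f"Missing collection date for {bin_type!r} at UPRN {uprn}")
    if "no data" in date.lower():
        # Explicit sentinel from the council site: safe to skip this row.
        return None
    return date.strip()
```

A caller would `continue` when the helper returns `None` and let the `ValueError` propagate otherwise, so a partial schedule is never returned silently.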
Summary
Ran full integration test suite (334 councils) and fixed 4 broken scrapers. Also improved the test results map and diagnostics tooling.
Fixes (4 councils, all tested and passing)
Fri 27 Mar 2026, Mon 27 Apr 2026). Now splits on comma and parses each date.
- `input.json` used `paon` key but test harness expects `house_number`. Renamed the key.
Tooling Improvements
- `generate_map_test_results.py`: added `--detailed` flag that outputs error categories and summaries per council.
- `map.html`: now loads detailed test results JSON, shows module name, error category, and error summary in popups for failed councils.
Remaining 46 Failures (diagnosed, documented in `integration_test_results.md`)
Test Results
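As a rough illustration of the comma-separated date fix listed under Fixes, the sketch below splits a value such as `Fri 27 Mar 2026, Mon 27 Apr 2026` and parses each part. The function name `parse_collection_dates`, the input format `%a %d %b %Y` (inferred from the example dates), and the `%d/%m/%Y` output format are assumptions made for this example; the actual scraper code may differ.

```python
from datetime import datetime
from typing import List

SOURCE_FORMAT = "%a %d %b %Y"  # assumed from the example "Fri 27 Mar 2026"
OUTPUT_FORMAT = "%d/%m/%Y"     # assumed output format for this sketch


def parse_collection_dates(raw: str) -> List[str]:
    """Split a comma-separated date string and parse each date.

    An empty or malformed part raises ValueError from strptime, so an
    unexpected upstream format fails loudly instead of being skipped.
    """
    parsed = []
    for part in raw.split(","):
        collection_date = datetime.strptime(part.strip(), SOURCE_FORMAT)
        parsed.append(collection_date.strftime(OUTPUT_FORMAT))
    return parsed
```

A single-date value passes through the same path, since splitting a string with no comma yields one part.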