Replies: 2 comments 3 replies
I originally wrote the Leeds parser (and some others, I forget which), but the reason Selenium was introduced was to get around the firewalls that councils (like Leeds) run when you try to emulate browser behaviour repeatedly.
@PaulBrack nice work. I never really wanted Selenium in the project, as it adds a big layer of setup for the user. However, there are some that need temp creds and API keys, which only make sense if you access the site like a human does. I'm cool with reducing the need for Selenium if we can provide a stable way to get the data that won't rely on a cred or key that will expire.
I went through the test input list and targeted the Selenium-based scrapers to see whether any of them boil down to GET endpoints where the data can be looked up without doing anything like mocking authentication. Of the exactly 100 Selenium-based scrapers, I was able to immediately identify 11 with such endpoints.
I am posting them here to double-check my thinking before rewriting any parsers that use them.
Notes:
GET Endpoints:
GET https://www.lbbd.gov.uk/rest/bin/<UPRN>
GET https://area.southnorfolkandbroadland.gov.uk/MyArea.Data={"Uprn":"<UPRN>","Address":"<POSTCODE>"}
GET https://wasterecyclingapi.eastriding.gov.uk/api/RecyclingData/CollectionsData?APIKey=ekBWR8tSiv6qwMo31REEeTZ5FAiMNB&Licensee=BinCollectionWebTeam&Postcode=<POSTCODE>
GET https://www.enfield.gov.uk/_design/integrations/bartec/find-my-collection/rest/schedule?uprn=<UPRN>
GET https://api.leeds.gov.uk/public/waste/v1/BinsDays?uprn=<UPRN>&startDate=2026-01-12&endDate=2026-06-29
Ocp-Apim-Subscription-Key: ad8dd80444fe45fcad376f82cf9a5ab4
GET https://shared-frontends.leeds.gov.uk/CheckYourBinDay/dist/dist/assets/index.js
key = re.search(r'apiKey\s*:\s*["\']([a-f0-9]{32})["\']', js, re.I).group(1)
(you'll want to null check though)
GET https://www.richmond.gov.uk/my_richmond?pid=<UPRN>
GET https://my.redbridge.gov.uk/RecycleRefuseUPRN=<UPRN>
GET https://midulsterbincalendar.azurewebsites.net/api/collectiondates/<UPRN>
GET https://my.nwleics.gov.uk/my-property-finder?address_id=nwl<UPRN>
GET https://report.peterborough.gov.uk/waste/<POSTCODE>:<UPRN>
GET https://www.renfrewshire.gov.uk/bins-and-recycling/bin-collection/bin-collection-calendar/check-your-bin-collection-day/view/<UPRN>
POST Endpoints:
POST https://www1.arun.gov.uk/when-are-my-bins-collected/selectdata={"address": <UPRN>}
What now?
Put in issues for these tagged as 'improvement' maybe?
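For the Leeds entry above, here is a rough sketch of what the key lookup with the null check could look like, along with building the BinsDays request once a key is found. The function names `extract_api_key` and `build_bins_days_request` are made up for illustration, and the 24-week window is an assumption (it is simply what the example dates above happen to span). Nothing here touches the network; the caller would fetch `index.js` and send the request with whatever HTTP client the parser already uses.

```python
import re
from datetime import date, timedelta
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode

# Pattern from the Leeds entry above: a 32-character hex value assigned to `apiKey`.
API_KEY_PATTERN = re.compile(r'apiKey\s*:\s*["\']([a-f0-9]{32})["\']', re.I)


def extract_api_key(js: str) -> Optional[str]:
    """Return the key found in the JS bundle text, or None if the pattern is absent."""
    match = API_KEY_PATTERN.search(js)
    # The null check: search() returns None when the bundle layout changes.
    return match.group(1) if match else None


def build_bins_days_request(
    uprn: str, key: str, start: date, weeks: int = 24
) -> Tuple[str, Dict[str, str]]:
    """Build the URL and headers for the BinsDays lookup (assumed 24-week window)."""
    params = {
        "uprn": uprn,
        "startDate": start.isoformat(),
        "endDate": (start + timedelta(weeks=weeks)).isoformat(),
    }
    url = "https://api.leeds.gov.uk/public/waste/v1/BinsDays?" + urlencode(params)
    return url, {"Ocp-Apim-Subscription-Key": key}
```

If `extract_api_key` returns `None`, the parser can raise a clear error (or fall back to Selenium) instead of failing with an `AttributeError` on `.group(1)`.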