A HACS integration that scrapes text from PDF files on the web (available via http: and https: only). Using a combination of regular expressions and limited templates, it creates sensors that are updated by polling.
Configuration is done via the Home Assistant UI.
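To illustrate the regex idea, here is a minimal sketch of pulling a single sensor value out of text that has already been extracted from a PDF. The helper name `extract_value`, the use of the first capture group, and the sample text are all hypothetical and are not taken from the integration's code; PDF text extraction itself and the template step are not modeled here.

```python
import re
from typing import Optional


def extract_value(pdf_text: str, pattern: str, group: int = 1) -> Optional[str]:
    """Return the given capture group of the first match of `pattern`.

    Hypothetical helper: assumes the PDF text has already been extracted.
    Returns None when nothing matches (the sensor would have no value).
    """
    match = re.search(pattern, pdf_text, flags=re.MULTILINE)
    if match is None:
        return None
    return match.group(group).strip()


# Made-up text standing in for the output of a PDF text extractor.
SAMPLE_TEXT = """Community Pool Schedule
Updated: 2024-05-01
Water temperature: 27.5 C
Status: Open
"""

if __name__ == "__main__":
    print(extract_value(SAMPLE_TEXT, r"Water temperature:\s*([\d.]+)"))  # 27.5
    print(extract_value(SAMPLE_TEXT, r"Status:\s*(\w+)"))  # Open
```

Using a capture group lets the pattern anchor on a label ("Water temperature:") while returning only the value that follows it.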
The recommended way to install this is via HACS:
- Click on HACS in the Home Assistant sidebar.
- Click on the three dots in the upper right-hand corner and select "Custom repositories".
- In the form enter:
  - Repository: iluvdata/pdf_scrape
  - Select "Integration" as "Type"
Alternatively, to install manually, copy the pdf_scrape directory to the custom_components directory of your Home Assistant instance.
Add the integration to Home Assistant:
You will be prompted to enter a name (optional), a URL (required), and a polling interval (minimum 30 s). Long intervals are recommended: PDF files tend to be static, and frequent requests risk getting you blocked by the server or burdening your system with unnecessary downloads and updates.
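The constraints on these fields (an http/https URL and a polling interval of at least 30 s) can be expressed as a small validation step. This is a minimal sketch using only the standard library; `PdfConfig`, `validate_config`, and `MIN_SCAN_INTERVAL` are hypothetical names, and the integration's real config flow almost certainly uses Home Assistant's own form validation instead.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional
from urllib.parse import urlparse

# Minimum polling interval stated in this README (hypothetical constant name).
MIN_SCAN_INTERVAL = timedelta(seconds=30)


@dataclass
class PdfConfig:
    url: str
    scan_interval: timedelta
    name: Optional[str] = None  # the name is optional


def validate_config(url: str, interval_seconds: int, name: Optional[str] = None) -> PdfConfig:
    """Hypothetical validator mirroring the documented field constraints."""
    parsed = urlparse(url)
    # Only http: and https: URLs with a host are accepted.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Only http:// and https:// URLs are supported: {url!r}")
    interval = timedelta(seconds=interval_seconds)
    if interval < MIN_SCAN_INTERVAL:
        raise ValueError(
            f"Polling interval must be at least {MIN_SCAN_INTERVAL.total_seconds():.0f} s"
        )
    return PdfConfig(url=url, scan_interval=interval, name=name)


if __name__ == "__main__":
    # An hourly poll, well above the minimum.
    print(validate_config("https://example.com/schedule.pdf", 3600, name="Pool"))
```

Rejecting bad input with `ValueError` keeps the check independent of any UI framework; a config flow would translate such errors into form messages.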
Each PDF creates a device, listed under "Devices that don't belong to a sub-entry", with a timestamp sensor "Last Modified" containing the date the PDF document was last updated. The sensor's source_of_date attribute records where this date came from, and its MD5_checksum attribute holds the document's MD5 checksum. The possible sources of the date, in order of priority:
- PDF metadata (last modified date)
- HTTP server response header (last-modified)
- Initial load time, if neither of the above is available
- Changes in the document's MD5 checksum (subsequent updates of the document)
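The priority order above can be sketched as a single resolution function. This is an illustrative interpretation, not the integration's code: it assumes the PDF metadata date has already been parsed into a `datetime`, and the names `LastModified` and `resolve_last_modified`, as well as the `source_of_date` strings, are hypothetical. When neither the metadata nor the header provides a date, the first load uses the load time, and later polls only move the date forward if the MD5 checksum changes.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


@dataclass
class LastModified:
    value: datetime
    source_of_date: str  # hypothetical labels, not the integration's actual values
    md5_checksum: str


def resolve_last_modified(
    content: bytes,
    pdf_mod_date: Optional[datetime],
    last_modified_header: Optional[str],
    previous: Optional[LastModified] = None,
    now: Optional[datetime] = None,
) -> LastModified:
    """Pick the last-modified date using the priority order in the README."""
    if now is None:
        now = datetime.now(timezone.utc)
    checksum = hashlib.md5(content).hexdigest()

    # 1. Modification date from the PDF's own metadata.
    if pdf_mod_date is not None:
        return LastModified(pdf_mod_date, "pdf_metadata", checksum)

    # 2. HTTP Last-Modified header, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    if last_modified_header:
        try:
            parsed = parsedate_to_datetime(last_modified_header)
            return LastModified(parsed, "http_header", checksum)
        except (TypeError, ValueError):
            pass  # malformed header: fall through to the remaining sources

    # 3. First load with no other date available: use the load time.
    if previous is None:
        return LastModified(now, "initial_load", checksum)

    # 4. Later polls: bump the date only when the content actually changed.
    if previous.md5_checksum != checksum:
        return LastModified(now, "checksum_change", checksum)
    return previous
```

Returning the previous result unchanged when the checksum matches means a static PDF keeps its original timestamp across polls.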
Click "Add Service" in the integration's configuration screen.
Go to the integration's dashboard.
Click the three dots to the right of your integration's entry name, then click "+ Add Search Target" in the menu to start the configuration flow for the search target. The flow should be self-explanatory.
This creates a subentry for the new sensor under the PDF's configuration entry.