Integration

A HACS integration for Home Assistant that scrapes text-based PDF files from the web (http: and https: URLs only). Using a combination of regex and limited templates, it creates sensors that are then updated by polling.

Configuration is done through the Home Assistant UI.
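
To illustrate the regex step, here is a minimal sketch of pulling a value out of text that has already been extracted from a PDF. The sample text, the pattern, and the helper name extract_value are hypothetical and are not taken from the integration's code.

```python
import re
from typing import Optional


def extract_value(text: str, pattern: str, group: int = 1) -> Optional[str]:
    """Return the requested capture group of the first match, or None."""
    match = re.search(pattern, text)
    return match.group(group) if match else None


# Hypothetical text, as it might look after extraction from a PDF page.
page_text = "Water quality report\nChlorine residual: 1.8 mg/L\nSampled 2024-05-01"

# The capture group isolates the number that would become the sensor state.
value = extract_value(page_text, r"Chlorine residual:\s*([\d.]+)")
```

A capture group keeps the pattern specific (it anchors on the label) while returning only the part that is useful as a sensor value.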

Requirements

Installation with HACS

The recommended way to install this integration is via HACS.

Semi-manual install

Click on HACS in the Home Assistant sidebar.
Click on the three dots in the upper right-hand corner and select "Custom repositories".
In the form, enter:
1. Repository: iluvdata/pdf_scrape
2. Type: Integration

Manual Installation

Copy the pdf_scrape directory to the custom_components directory of your Home Assistant instance.

Configuration

Configure a PDF

Add the integration to Home Assistant.

You should be prompted to enter a name (optional), a URL (required), and the polling interval (minimum 30 s). Long intervals are recommended: PDF files tend to be static, and you don't want to be blocked for making requests too frequently or to overburden your system with unnecessary downloads and updates.
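
The constraints described above (http/https URLs only, a 30-second minimum interval) can be sketched as a small validation helper. This is only an illustration under those stated constraints; the function name validate_pdf_config and the error messages are hypothetical, and the integration's own config flow may validate differently.

```python
from datetime import timedelta
from urllib.parse import urlparse

# Minimum polling interval stated in this README.
MIN_INTERVAL = timedelta(seconds=30)


def validate_pdf_config(url: str, interval_seconds: int) -> timedelta:
    """Check the URL scheme and polling interval; return the interval."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("URL must be an http: or https: address")
    interval = timedelta(seconds=interval_seconds)
    if interval < MIN_INTERVAL:
        raise ValueError("Polling interval must be at least 30 seconds")
    return interval


# A long interval, as recommended for mostly static documents.
interval = validate_pdf_config("https://example.com/report.pdf", 6 * 60 * 60)
```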

Devices and Entities

Each PDF creates a device, listed under "Devices that don't belong to a sub-entry", with a timestamp sensor "Last Modified" containing the date the PDF document was last updated. The source of this date is stored in the attribute source_of_date, alongside an MD5 checksum of the document (MD5_checksum). The date is taken from the following sources, in order of priority:

PDF Metadata (last modified date)
HTTP Server Response Header (last-modified)
Initial load time if the above were missing
Changes in the document MD5 checksum (subsequent updates of the document)
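
The priority order above can be sketched roughly as follows. This is an assumption-laden illustration, not the integration's actual code: the function names, the tuple layout for the previous state, and the source labels ("pdf_metadata", "http_header", "initial_load", "checksum_change") are all hypothetical.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional, Tuple


def md5_checksum(data: bytes) -> str:
    """Hex MD5 digest of the downloaded document bytes."""
    return hashlib.md5(data).hexdigest()


def resolve_last_modified(
    pdf_metadata_date: Optional[datetime],
    http_last_modified: Optional[str],
    previous: Optional[Tuple[datetime, str, str]],  # (date, source, checksum)
    checksum: str,
    now: datetime,
) -> Tuple[datetime, str]:
    """Pick the last-modified date and a label describing where it came from."""
    # 1. A modification date in the PDF metadata wins.
    if pdf_metadata_date is not None:
        return pdf_metadata_date, "pdf_metadata"
    # 2. Otherwise fall back to the HTTP last-modified response header.
    if http_last_modified:
        return parsedate_to_datetime(http_last_modified), "http_header"
    # 3. With neither available, the first download uses the load time.
    if previous is None:
        return now, "initial_load"
    # 4. On later polls, a changed checksum marks the document as updated.
    prev_date, prev_source, prev_checksum = previous
    if checksum != prev_checksum:
        return now, "checksum_change"
    return prev_date, prev_source


# Example: no PDF metadata date, but the server sent a last-modified header.
date, source = resolve_last_modified(
    None,
    "Wed, 21 Oct 2015 07:28:00 GMT",
    None,
    md5_checksum(b"%PDF-1.7 example bytes"),
    datetime.now(timezone.utc),
)
```

Keeping the checksum alongside the date lets an unchanged document retain its earlier timestamp even when neither the metadata nor the server reports one.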

Configure additional PDFs

Click "Add Service" in the integration's configuration screen.

Configure a sensor

Go to the integration's dashboard.

Click on the three dots to the right of your integration's entry name. In the menu, click "+ Add Search Target" to start the configuration flow for the search target. The flow should be intuitive.

This will create a subentry under the PDF configuration entry for the individual sensor that you've created.
