
Define mandatory content for crawler documentation pages #15

@garyillyes

Description


Section 2.6 of the draft mandates that crawler operators host a documentation page, yet it does not specify what that page must actually contain. As written, the text suggests providing a contact address, a REP example, and a vague explanation of data usage. This level of ambiguity is inefficient for both the operator and the site owner. If we want crawlers to be transparent, we should define the specific technical parameters they are required to disclose so that site owners can make informed decisions about their traffic.

The documentation should include a standardized checklist of required and recommended fields:

  • Crawler Identity: The specific User-Agent strings and substrings used for identification.
  • Purpose: Clear disclosure of whether the data is used for public search, private LLM training, or research.
  • Technical Behavior: An explicit statement of whether the crawler renders JavaScript or only fetches source content.
  • Verification: A link to the JAFAR-formatted IP ranges as defined in Section 2.5, plus any other supported verification methods (see the sketch after this list).
  • Opt-out: Example robots.txt snippets for blocking the specific crawler (an example follows below).
  • Exemptions: If the crawler has exempted itself from certain best practices, the page must identify which practice(s) and give the rationale for each.
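
To make the Verification field concrete, here is a minimal sketch of how a site owner might check a visiting IP against an operator's published ranges. The URL, the `prefixes`/`ipv4Prefix`/`ipv6Prefix` JSON shape, and the `is_official_crawler` helper are all assumptions for illustration; the actual schema is whatever the JAFAR definition in Section 2.5 specifies.

```python
import ipaddress
import json
from urllib.request import urlopen

# Hypothetical URL; the documentation page would link to the real one.
RANGES_URL = "https://crawler.example.com/ranges.json"

def load_prefixes(url=RANGES_URL):
    """Fetch the published IP ranges and parse them into network objects.

    Assumes a JSON body shaped like
    {"prefixes": [{"ipv4Prefix": "192.0.2.0/24"}, {"ipv6Prefix": "2001:db8::/32"}]};
    the actual JAFAR schema is defined in Section 2.5 of the draft.
    """
    with urlopen(url) as resp:
        data = json.load(resp)
    networks = []
    for entry in data.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            networks.append(ipaddress.ip_network(prefix))
    return networks

def is_official_crawler(client_ip, networks):
    """Return True if the requesting IP falls inside a published range."""
    addr = ipaddress.ip_address(client_ip)
    # Only compare against networks of the same IP version.
    return any(addr in net for net in networks if addr.version == net.version)
```

Publishing the ranges at a stable, documented URL is what makes a check like this possible without resorting to reverse-DNS lookups.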
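And for the Opt-out field, a snippet of the kind the page should carry, assuming a hypothetical crawler that identifies itself as `ExampleBot` in its User-Agent:

```
# Block ExampleBot from the entire site (RFC 9309 syntax):
User-agent: ExampleBot
Disallow: /
```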
