Implement controlled documentation scraper (URL ingestion)

Create a scraper that fetches documentation from a given URL.

Requirements:
- extract only meaningful content (ignore navigation, ads, and other page chrome)
- normalize the extracted content into a structured format
- do not store raw HTML; persist only the normalized output
- support re-fetching a page to pick up updates
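
One way the requirements above could fit together is sketched below, using only the Python standard library. All names here (`DocExtractor`, `normalize`, `has_changed`, `SKIP_TAGS`, `BLOCK_TAGS`) and the output shape are assumptions for illustration, not a settled design. The sketch keeps heading, paragraph, and list-item text, drops anything inside navigation-like containers, and returns a plain dict with a content hash, so a re-fetch can be compared against the stored copy without keeping any HTML.

```python
import hashlib
import json
from html.parser import HTMLParser

# Hypothetical tag sets; a real implementation would likely tune these per site.
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style", "form"}
BLOCK_TAGS = {"h1", "h2", "h3", "h4", "p", "li"}


class DocExtractor(HTMLParser):
    """Collects text from content blocks and ignores skipped containers."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []
        self._skip_depth = 0   # > 0 while inside a SKIP_TAGS element
        self._current = None   # tag name of the block being collected
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif self._skip_depth == 0 and tag in BLOCK_TAGS and self._current is None:
            self._current = tag
            self._buf = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == self._current:
            # Collapse runs of whitespace so formatting noise does not alter the hash.
            text = " ".join("".join(self._buf).split())
            if text:
                self.blocks.append({"type": tag, "text": text})
            self._current = None

    def handle_data(self, data):
        if self._skip_depth == 0 and self._current is not None:
            self._buf.append(data)


def normalize(url, html):
    """Turn fetched HTML into a structured record; the HTML itself is discarded."""
    parser = DocExtractor()
    parser.feed(html)
    parser.close()
    title = next((b["text"] for b in parser.blocks if b["type"] == "h1"), "")
    payload = json.dumps(parser.blocks, sort_keys=True).encode("utf-8")
    return {
        "url": url,
        "title": title,
        "blocks": parser.blocks,
        "content_hash": hashlib.sha256(payload).hexdigest(),
    }


def has_changed(stored, fresh):
    """Re-fetch support: compare hashes of the normalized content."""
    return stored["content_hash"] != fresh["content_hash"]
```

This is deliberately simple: nested lists, code blocks, and pages with unclosed container tags would need extra handling. Hashing the normalized blocks rather than the raw response means cosmetic markup changes do not register as updates.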

The scraper must NOT blindly crawl entire websites.
Only explicitly targeted pages should be processed.
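
A minimal sketch of how the "targeted pages only" rule might be enforced, assuming an explicit allow-list of URLs supplied by the caller. `TargetedFetcher`, `canonicalize`, `_http_get`, and the User-Agent string are hypothetical names chosen for illustration. The fetcher refuses anything not on the list and never follows links on its own.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen


def canonicalize(url):
    """Normalize a URL for comparison: lowercase host, drop the fragment."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"unsupported URL: {url!r}")
    path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))


def _http_get(url, timeout=10.0):
    """Default transport: a single GET, no link discovery."""
    req = Request(url, headers={"User-Agent": "doc-ingest-sketch/0.1"})  # assumed UA
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class TargetedFetcher:
    """Fetches only URLs that were explicitly approved up front."""

    def __init__(self, allowed_urls, fetch=None):
        self.allowed = {canonicalize(u) for u in allowed_urls}
        self._fetch = fetch or _http_get  # injectable for tests

    def fetch(self, url):
        target = canonicalize(url)
        if target not in self.allowed:
            raise PermissionError(f"URL is not an approved target: {target}")
        return self._fetch(target)
```

Usage would look like `TargetedFetcher(["https://example.com/docs/intro"]).fetch("https://example.com/docs/intro")`; any other URL raises `PermissionError`. An exact-match allow-list is the strictest option; a path-prefix rule would be a looser alternative if whole documentation sections need to be targeted.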