Thanane15M/scrapling-skill


scrapling-skill

A Claude Skill for Scrapling — an adaptive web-scraping framework that goes from a single request to a full-scale async crawl.

BeautifulSoup4 is roughly 780× slower than Scrapling on the official parsing benchmark, yet most Python scraping tutorials still default to it. The official Scrapling docs cover the API surface. This skill covers what to actually do with it — and is verified against the real v0.4.9 API, not from memory.

Why this skill exists

Three high-value capabilities aren't obvious from the README:

  • Self-healing selectors — a two-phase workflow (auto_save then adaptive), not a magic flag.
  • Fetcher selection — knowing when to use Fetcher vs StealthyFetcher vs DynamicFetcher (and their Session/Async variants) means reading several doc pages. This skill ships a one-glance decision tree.
  • MCP + spiders — the native MCP server, multi-session spiders, lifecycle hooks (on_scraped_item, is_blocked, retry_blocked_request), and pause/resume crawling, with copy-paste-correct invocations.
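The two-phase idea behind self-healing selectors can be illustrated with a toy model that uses only the standard library. This is not Scrapling's implementation or API: the `select` function, the in-memory `_saved` store, and the dict-based "elements" are all hypothetical, and only the `auto_save` and `adaptive` flag names are borrowed from the text above. Phase one records what the matched element looked like; phase two, run after the page has changed, falls back to the most similar element when the selector no longer matches.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory store, used here only for illustration.
_saved = {}

def fingerprint(el):
    """Flatten an element (a plain dict in this toy) into a comparable string."""
    attrs = " ".join(f"{k}={v}" for k, v in sorted(el["attrs"].items()))
    return f'{el["tag"]} {attrs} {el["text"]}'

def select(elements, css_class, identifier, auto_save=False, adaptive=False):
    """Return the first element carrying css_class, or relocate it by similarity."""
    for el in elements:
        if css_class in el["attrs"].get("class", "").split():
            if auto_save:
                # Phase 1: remember what the match looked like.
                _saved[identifier] = fingerprint(el)
            return el
    if adaptive and identifier in _saved:
        # Phase 2: the selector broke, so pick the closest-looking element.
        target = _saved[identifier]
        return max(
            elements,
            key=lambda e: SequenceMatcher(None, target, fingerprint(e)).ratio(),
        )
    return None

old_page = [
    {"tag": "div", "attrs": {"class": "nav"}, "text": "Home"},
    {"tag": "span", "attrs": {"class": "price", "id": "p1"}, "text": "$19.99"},
]
# Same page after a redesign: the class was renamed from "price" to "cost".
new_page = [
    {"tag": "div", "attrs": {"class": "nav"}, "text": "Home"},
    {"tag": "span", "attrs": {"class": "cost", "id": "p1"}, "text": "$21.49"},
]

first = select(old_page, "price", "product-price", auto_save=True)
healed = select(new_page, "price", "product-price", adaptive=True)
missed = select(new_page, "price", "never-saved", adaptive=True)
print(first["text"], healed["text"], missed)
```

The last call shows why this is a workflow and not a flag: with nothing saved in phase one, the adaptive lookup has nothing to compare against and returns nothing.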

What's covered

  • Fetcher selection decision tree (HTTP / stealth / dynamic × one-shot / session / async)
  • Self-healing selectors with the correct two-phase auto_save → adaptive workflow
  • Spider architecture — concurrency, download delays, real lifecycle hooks, multi-session routing
  • Correct ProxyRotator usage (proxy_rotator=, not proxy=rotator.next())
  • Native MCP server wiring (scrapling mcp, stdio + HTTP transports)
  • CLI usage for one-off scraping without writing a script
  • BeautifulSoup → Scrapling migration cheat sheet
  • A gotchas checklist (the scrapling install prerequisite, the v0.4.9 proxy-leak fix, etc.)
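As a rough idea of what a fetcher selection decision tree might look like, here is a minimal sketch. The skill's actual tree lives in SKILL.md and is not reproduced in this README, so the criteria below (`anti_bot`, `needs_js`, `many_requests`, `concurrent`) and the `choose_fetcher` helper are assumptions made for illustration. The function returns names only and does not import Scrapling.

```python
def choose_fetcher(needs_js=False, anti_bot=False, many_requests=False, concurrent=False):
    """Return (fetcher family, usage mode) for a scraping job.

    The branching rules are illustrative assumptions, not the skill's own tree.
    """
    # Axis 1: HTTP / stealth / dynamic.
    if anti_bot:
        family = "StealthyFetcher"   # assumed: protection outweighs other needs
    elif needs_js:
        family = "DynamicFetcher"    # assumed: content is rendered client-side
    else:
        family = "Fetcher"           # plain HTTP is enough

    # Axis 2: one-shot / session / async.
    if concurrent:
        mode = "async"
    elif many_requests:
        mode = "session"
    else:
        mode = "one-shot"
    return family, mode

print(choose_fetcher())
print(choose_fetcher(needs_js=True, many_requests=True))
print(choose_fetcher(anti_bot=True, needs_js=True, concurrent=True))
```

Keeping the two axes independent mirrors the "HTTP / stealth / dynamic × one-shot / session / async" grid in the first bullet: each combination maps to one cell.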

Install

# As a Claude Skill
claude skill install https://github.com/Thanane15M/scrapling-skill
# or copy SKILL.md into .claude/skills/scrapling/

# The underlying library (pin the version)
pip install "scrapling[fetchers]==0.4.9"
scrapling install   # required before any browser-based fetcher

Versioning

This skill targets Scrapling v0.4.x (verified on 0.4.9). The 0.4 line introduced breaking API changes (async spiders, ProxyRotator). If you're on 0.3.x, upgrade before using these patterns.
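One way to check which line is installed before applying these patterns is a small script built on the standard library's `importlib.metadata`. This is a sketch: the helper names `targets_skill` and `installed_scrapling_version` are made up for this example, and the check only looks at the first two version components.

```python
from importlib.metadata import PackageNotFoundError, version

def targets_skill(version_string):
    """True for any 0.4.x release, the line this skill targets."""
    return version_string.split(".")[:2] == ["0", "4"]

def installed_scrapling_version():
    """Return the installed scrapling version string, or None if absent."""
    try:
        return version("scrapling")
    except PackageNotFoundError:
        return None

found = installed_scrapling_version()
if found is None:
    print("scrapling is not installed")
elif targets_skill(found):
    print(f"scrapling {found} is on the 0.4 line")
else:
    print(f"scrapling {found} is outside the 0.4 line; these patterns may not apply")
```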

A note on the upstream project

Scrapling is authored by Karim Shoair (D4Vinci) and licensed BSD-3-Clause. It's intended for educational and research use — respect each target site's robots.txt, terms of service, and applicable data-protection law (e.g. GDPR). This skill repository is MIT-licensed; the Scrapling library is not.

License

MIT
