
@meirk-brd meirk-brd released this 13 May 14:20

v0.2.0 - Scraper Studio AI

Build and run Bright Data scrapers from the CLI. Two new commands under a new scraper group.

brightdata scraper create <url> <description>

Build a custom scraper from a natural-language description. Wraps the Scraper Studio AI flow: it creates the scraper template, triggers AI generation, and polls until the scraper is ready.

brightdata scraper create https://example.com/product/1 \
  "Extract title, price, and image URL from this product page"

Returns a collector_id that you reuse with scraper run. Delivery defaults to a placeholder webhook target that you can reconfigure later in the Bright Data web UI, or set up front with --deliver-webhook.

Flags: --name, --deliver-webhook, --timeout, -o, --json, --pretty, --timing, -k.
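The create-then-poll loop described above can be sketched as a generic helper. This is an illustrative sketch only: the status names ("pending", "ready", "failed") and the shape of the status payload are assumptions, not the CLI's actual wire format.

```python
import time

def poll_until_ready(fetch_status, timeout=300.0, interval=2.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll fetch_status() until it reports a terminal state or timeout expires.

    fetch_status is any zero-argument callable returning a dict with a
    'status' key -- 'pending', 'ready', or 'failed' (assumed names).
    clock and sleep are injectable so the loop is testable without waiting.
    """
    deadline = clock() + timeout
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "ready":
            return result
        if status == "failed":
            raise RuntimeError(f"generation failed: {result}")
        if clock() >= deadline:
            raise TimeoutError(f"still '{status}' after {timeout}s")
        sleep(interval)
```

Injecting the clock and sleep keeps the loop deterministic under test, which is presumably why the release can cover "pure helpers" with unit tests.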

brightdata scraper run <collector_id> <url>

Run a scraper against a URL and get the data back. Three execution paths, picked automatically:

  • Async + poll (default) — triggers /dca/trigger_immediate, polls /dca/get_result until ready. Right for most jobs.
  • Sync (--sync) — one-shot /dca/crawl with a 25–50s server-side cap. Right for fast pages where you want to skip polling entirely. On server-side timeout, prints the response_id so you can re-run without --sync to recover.
  • Auto-fallback to batch — if the realtime endpoint reports the page limit was exceeded (paginated listings, infinite scroll, etc.), the CLI switches to the batch endpoint (/dca/trigger → poll /dca/dataset) with a longer poll interval and a 1-hour default timeout. No flag required.

# Default async + poll
brightdata scraper run c_mp3tuab31lswoxvpws https://www.amazon.com/dp/B08N5WRWNW --pretty

# Sync for fast pages
brightdata scraper run c_mp3tuab31lswoxvpws https://example.com/p/1 --sync

# Large/paginated URL — falls back to batch automatically
brightdata scraper run c_mp3tuab31lswoxvpws \
  "https://www.ycombinator.com/companies?batch=Spring%202026" --pretty

Flags: --sync, --sync-timeout, --timeout, --name, --version, -o, --json, --pretty, --timing, -k.
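The three-way routing above can be sketched as a pair of pure functions. The endpoint paths come from the notes above, but the page-limit error wording and the exact decision logic are assumptions about the implementation, not a copy of it.

```python
def is_page_limit_error(response_body: str) -> bool:
    """Heuristic detector for the realtime endpoint's page-limit error.

    The exact error wording is an assumption; the real CLI matches
    whatever /dca/trigger_immediate actually returns.
    """
    return "page limit" in response_body.lower()

def choose_endpoint(sync: bool, page_limit_hit: bool) -> str:
    """Pick an execution path per the rules above (sketch, not the real code)."""
    if page_limit_hit:
        return "/dca/trigger"          # batch trigger, polled via /dca/dataset
    if sync:
        return "/dca/crawl"            # one-shot sync, 25-50s server-side cap
    return "/dca/trigger_immediate"    # default async, polled via /dca/get_result
```

Keeping the detector and the router as pure functions matches the release's note about unit-testing "the page-limit detector and the fallback wiring" without network access.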

Implementation notes

  • Errors surface the relevant collector_id / response_id / collection_id so partial state is recoverable from the web UI.
  • TTY output is human-readable; piped / --json / --output always emit JSON.
  • 45 new unit tests covering pure helpers, both run modes, the page-limit detector, and the fallback wiring.
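The TTY-vs-JSON rule in the notes above amounts to a small output shim. A minimal sketch, assuming a flat dict payload; the real CLI's formatting and flag plumbing are not shown here.

```python
import json
import sys

def render(data: dict, stream=sys.stdout, force_json: bool = False) -> None:
    """Emit human-readable text on a TTY, JSON otherwise.

    force_json stands in for --json / --output; when the stream is piped
    (not a TTY), JSON is emitted regardless.
    """
    if force_json or not stream.isatty():
        stream.write(json.dumps(data) + "\n")
    else:
        for key, value in data.items():
            stream.write(f"{key}: {value}\n")
```

Routing the decision through stream.isatty() means redirecting the command's output to a file or pipe automatically switches it to machine-readable JSON.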

Not included (planned)

  • Self-healing (scraper refactor) — reuses the same scraper command group.
  • Resuming a job by response_id / collection_id after Ctrl+C.

Full diff: v0.1.8...v0.2.0