# v0.2.0 - Scraper Studio AI
Build and run Bright Data scrapers from the CLI. Two new commands under a new `scraper` group.
### `brightdata scraper create <url> <description>`
Build a custom scraper from a natural-language description. Wraps the Scraper Studio AI Flow: creates the scraper template, triggers AI generation, polls until done.
```
brightdata scraper create https://example.com/product/1 \
  "Extract title, price, and image URL from this product page"
```

Returns a `collector_id` you reuse with `scraper run`. Delivery defaults to a placeholder webhook target that you reconfigure in the Bright Data web UI; override it with `--deliver-webhook`.
Flags: `--name`, `--deliver-webhook`, `--timeout`, `-o`, `--json`, `--pretty`, `--timing`, `-k`.
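The create flow is described above as "polls until done" with a `--timeout`. A minimal sketch of that polling pattern might look like the following (an illustration only, not the CLI's actual internals: `check_fn`, the default timeout, and every name here are assumptions):

```python
import time

def poll_until_done(check_fn, timeout=300.0, interval=2.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Call check_fn() until it returns a non-None result or the deadline passes.

    check_fn is a stand-in for whatever status request the CLI actually makes;
    the 300s default timeout is illustrative, not the CLI's documented default.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        result = check_fn()
        if result is not None:
            # Build finished; the real command would print the collector_id here.
            return result
        sleep(interval)
    raise TimeoutError(f"scraper build did not finish within {timeout}s")
```

The injectable `clock`/`sleep` parameters keep a loop like this testable without real waiting, which matches the release's emphasis on unit-testing pure helpers.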
### `brightdata scraper run <collector_id> <url>`
Run a scraper against a URL and get the data back. Three execution paths, picked automatically:
- **Async + poll** (default) — triggers `/dca/trigger_immediate`, polls `/dca/get_result` until ready. Right for most jobs.
- **Sync** (`--sync`) — one-shot `/dca/crawl` with a 25–50s server-side cap. Right for fast pages where you want to skip polling entirely. On a server-side timeout, prints the `response_id` so you can re-run without `--sync` to recover.
- **Auto-fallback to batch** — if the realtime endpoint reports that the page limit was exceeded (paginated listings, infinite scroll, etc.), the CLI switches to the batch endpoint (`/dca/trigger` → poll `/dca/dataset`) with a longer poll interval and a 1-hour default timeout. No flag required.
```
# Default async + poll
brightdata scraper run c_mp3tuab31lswoxvpws https://www.amazon.com/dp/B08N5WRWNW --pretty

# Sync for fast pages
brightdata scraper run c_mp3tuab31lswoxvpws https://example.com/p/1 --sync

# Large/paginated URL — falls back to batch automatically
brightdata scraper run c_mp3tuab31lswoxvpws \
  "https://www.ycombinator.com/companies?batch=Spring%202026" --pretty
```

Flags: `--sync`, `--sync-timeout`, `--timeout`, `--name`, `--version`, `-o`, `--json`, `--pretty`, `--timing`, `-k`.
### Implementation notes
- Errors surface the relevant `collector_id` / `response_id` / `collection_id`, so partial state is recoverable from the web UI.
- TTY output is human-readable; piped output, `--json`, and `--output` always emit JSON.
- 45 new unit tests cover pure helpers, both run modes, the page-limit detector, and the fallback wiring.
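The TTY-vs-pipe rule above can be illustrated with a small sketch (everything here is invented for the example; the CLI's actual renderer is not shown in these notes):

```python
import json
import sys

def render(result, json_flag=False, output_path=None, stream=sys.stdout):
    """Pick the output format the way the notes describe.

    Emit JSON when output is piped (stream is not a TTY), when --json is
    passed, or when -o/--output is set; otherwise emit human-readable text.
    """
    machine = json_flag or output_path is not None or not stream.isatty()
    if machine:
        return json.dumps(result)
    # Human-readable form: one "key: value" line per field (illustrative).
    return "\n".join(f"{k}: {v}" for k, v in result.items())
```

Returning a string instead of printing keeps the format decision a pure helper, which is the kind of function the release says is covered by unit tests.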
### Not included (planned)
- Self-healing (`scraper refactor`) — reuses the same `scraper` command group.
- Resuming a job by `response_id` / `collection_id` after Ctrl+C.
Full diff: v0.1.8...v0.2.0