Skip to content

us/crw

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

342 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

fastCRW — Self-hosted, Rust-native web crawler & scraper for AI agents

The open-source alternative to Firecrawl. One static binary, ~50 MB RAM idle, Firecrawl-compatible REST API (/v1/scrape, /v1/crawl, /v1/extract, /v1/map, /v1/search) plus first-class MCP. Self-host free under AGPL-3.0, or hit our managed API at api.fastcrw.com. Reproducible 63.74% truth-recall on the public 1,000-URL dataset (diagnose_3way.py, 2026-05-08) — see fastcrw.com/benchmarks. Built in Rust because every millisecond of agent latency compounds.

crates.io CI License GitHub Stars fastcrw.com

Follow on X Follow on LinkedIn Join our Discord

Works with: Claude Code · Cursor · Windsurf · Cline · Copilot · Continue.dev · Codex · Gemini CLI


Why fastCRW?

  • Rust-native, single static binary — no Redis, no Node.js, no Python venv, no headless-browser sidecar in the request path. One binary, one config file, one process.
  • ~50 MB RAM idle — leaves headroom on a $5 VPS. Browser-render-first stacks (Firecrawl, Crawl4AI) carry a Chromium heap baseline measured in hundreds of MB before a single request lands.
  • Firecrawl-compatible drop-in — same /v1/scrape, /v1/crawl, /v1/extract, /v1/map, /v1/search endpoints with compatible request/response shapes. Swap the base URL and keep your code.
  • AGPL-3.0 open core + managed option — self-host free, or point at api.fastcrw.com for managed proxy network, dashboard, and SLA without the AGPL obligations on your application code.

Comparison Table

Qualitative positioning vs. the three most-cited alternatives. Numerical claims trace to the inline sources noted; everything else is descriptive.

fastCRW Firecrawl Crawl4AI Spider
Language Rust Node.js + Playwright Python + Playwright Rust
License AGPL-3.0 (commercial avail.) AGPL-3.0 (commercial avail.) Apache-2.0 Source-available / commercial (spider.cloud)
Self-host install size Single static binary (~8 MB) Multi-container (~500 MB+ image) ~2 GB image (browser bundled) Managed-first; self-host via crate
Memory baseline (idle) ~50 MB Large (Chromium heap) Large (Chromium heap) Light (Rust)
Firecrawl-compat API Yes — /v1/scrape, /v1/crawl, /v1/extract, /v1/map, /v1/search Native No No
MCP server Built-in (crw-mcp) Separate package Community add-on No first-party
Hosted option api.fastcrw.com (BYOK or managed) firecrawl.dev None official spider.cloud (primary product)
Reproducible public benchmark Yes — 63.74% truth-recall on 1,000-URL dataset (diagnose_3way.py, 2026-05-08) Vendor-published only Vendor-published only Vendor-published only

Pricing/spec cells where claimed link to the vendor page; everything else is the qualitative architectural shape, not a comparison number.


Quickstart

Hit the managed API at api.fastcrw.com, or self-host the same binary.

# /v1/scrape — URL → markdown / HTML / JSON / links
curl -X POST https://api.fastcrw.com/v1/scrape \
  -H "Authorization: Bearer $CRW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com","formats":["markdown"]}'
# /v1/extract — structured JSON from a URL via a JSON Schema
curl -X POST https://api.fastcrw.com/v1/extract \
  -H "Authorization: Bearer $CRW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url":"https://example.com",
    "schema":{
      "type":"object",
      "properties":{"title":{"type":"string"}}
    }
  }'
# /v1/crawl — async multi-page job (returns a job id; poll with /v1/crawl/:id)
curl -X POST https://api.fastcrw.com/v1/crawl \
  -H "Authorization: Bearer $CRW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://docs.example.com","maxDepth":2,"maxPages":50}'
# Self-host (no auth, localhost) — single docker command
docker run -p 3000:3000 ghcr.io/us/crw
curl http://localhost:3000/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com"}'

Other install paths (each documented under Install further down):

npx crw-mcp                           # zero install — runs the embedded engine
pip install crw                        # Python SDK (auto-downloads binary)
brew install us/crw/crw                # Homebrew
cargo install crw-cli                  # Cargo
curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh | sh

Why Rust?

Cold start is sub-second and the resident memory ceiling is bounded by the crawl queue, not by a JavaScript runtime or a headless browser parked in the background. An agent that issues N scrapes per task pays the network floor N times — anything you add on top (process spawn, JIT warmup, browser navigation overhead) multiplies. Pushing the request-path language down to Rust strips that surcharge out of every call. The same property lets one static binary saturate a $5 VPS instead of needing a multi-container compose stack, which is why the idle footprint is in the tens of MB rather than the hundreds.


MCP + SDK quickstart

fastCRW ships a built-in MCP server so any MCP-compatible agent (Claude Code, Cursor, Windsurf, Cline, Continue.dev, Codex, Gemini CLI) can call scraping tools without bespoke glue. Embedded mode runs the engine in-process — no server, no API key, no setup. The crw Python SDK and the crw-mcp Node binary both shell to the same Rust core.

npm install -g crw-mcp          # MCP server (Node wrapper)
pip install crw                 # Python SDK (auto-downloads binary)
claude mcp add crw -- npx crw-mcp                                          # Claude Code, embedded
claude mcp add -e CRW_API_URL=https://api.fastcrw.com -e CRW_API_KEY=… \
  crw -- npx crw-mcp                                                       # Claude Code, managed

Per-client config recipes (Claude Desktop, Cursor, Windsurf, Cline, Continue.dev) live under docs.fastcrw.com/mcp-clients/.


Self-host vs Managed

Self-host (free) Managed — api.fastcrw.com
Best when You want full data residency, AGPL is fine, you can run your own proxy strategy, latency to your infra matters more than ours. You want zero infra, a global proxy network, a dashboard, usage metering, and AGPL carve-out for closed-source product code.
Install docker run -p 3000:3000 ghcr.io/us/crw or cargo install crw-server. Sign up at fastcrw.com — 500 free credits, no card.
Search Bundled SearXNG sidecar (docker compose up). Managed search backend.
Proxy rotation Bring your own pool via env vars. Built-in.
Cost $0 + your hosting bill. From $13/mo; pricing on fastcrw.com/pricing.
License obligations AGPL-3.0 applies if you expose the API to third parties. AGPL carve-out included.

The binary is the same in both modes — you can develop against your self-hosted instance and ship to managed without code changes.


Install

MCP server (crw-mcp) — recommended for AI agents

npx crw-mcp                              # zero install (npm)
pip install crw                          # Python SDK (auto-downloads binary)
brew install us/crw/crw-mcp              # Homebrew
cargo install crw-mcp                    # Cargo
docker run -i ghcr.io/us/crw crw-mcp     # Docker

CLI (crw) — scrape URLs from your terminal

brew install us/crw/crw

# One-line install (auto-detects OS & arch):
curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh | CRW_BINARY=crw sh

# APT (Debian/Ubuntu):
curl -fsSL https://apt.fastcrw.com/gpg.key | sudo gpg --dearmor -o /usr/share/keyrings/crw.gpg
echo "deb [signed-by=/usr/share/keyrings/crw.gpg] https://apt.fastcrw.com stable main" \
  | sudo tee /etc/apt/sources.list.d/crw.list
sudo apt update && sudo apt install crw

cargo install crw-cli

API server (crw-server) — Firecrawl-compatible REST API

For serving multiple apps, other languages (Node.js, Go, Java), or as a shared microservice.

brew install us/crw/crw-server

# One-line install:
curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh | CRW_BINARY=crw-server sh

# Docker:
docker run -p 3000:3000 ghcr.io/us/crw

Docker Compose ships with lightpanda by default; chrome is opt-in:

docker compose up -d                                         # http + lightpanda
docker compose --profile heavy up -d                         # + chrome failover
docker compose -f docker-compose.yml \
  -f docker-compose.stealth.yml --profile stealth up -d      # browserless stealth tier

See the self-hosting guide for production hardening, auth, reverse proxy, and resource tuning.


API endpoints

Method Endpoint Description
POST /v1/scrape Scrape a single URL, optionally with LLM extraction or summary
POST /v1/crawl Start async BFS crawl (returns job ID)
GET /v1/crawl/:id Check crawl status and retrieve results
DELETE /v1/crawl/:id Cancel a running crawl job
POST /v1/map Discover all URLs on a site
POST /v1/extract Structured JSON extraction from a URL via JSON Schema
POST /v1/search Web search via SearXNG sidecar, with optional content scraping
GET /health Health check (no auth required)
POST /mcp Streamable HTTP MCP transport

Full reference at docs.fastcrw.com/#rest-api. The Firecrawl compatibility matrix (field-by-field diff) lives in COMPATIBILITY-firecrawl.md.


Benchmark

Reproduce it yourself first — the canonical harness is diagnose_3way.py (matches truth text against md + strip_md_links(md), applied identically to all three tools — a fairness control, not a looser number):

cd ~/coding/crw/crw-opencore
docker compose -f docker-compose.yml -f docker-compose.override.yml \
               -f docker-compose.stealth.yml --profile stealth up -d
docker start crawl4ai-bench
cd ~/coding/crw/competitors/firecrawl && docker compose up -d

cd ~/coding/crw/crw-opencore
uv run python bench/diagnose_3way.py \
  --max-urls 1000 --tools crw,crawl4ai,firecrawl \
  --concurrency 5 --timeout 120 \
  --out bench/server-runs/diag3w-1000-full.jsonl

3-way scrape benchmark, full 1,000-URL run on Firecrawl's scrape-content-dataset-v1 (diagnose_3way.py, 2026-05-08, concurrency 5, timeout 120s):

Metric fastCRW crawl4ai Firecrawl
Truth-recall (522/819 labeled URLs) 63.74% 59.95% 56.04%
Scrape-success (of 1,000) 877 (87.7%) 835 (83.5%) 897 (89.7%)
Thrown errors (3,000 requests) 0 0 0
p50 latency 1914 ms 1916 ms 2305 ms
p90 latency 14157 ms 4754 ms 6937 ms

Single 1,000-URL run (N=1,000 crushes variance; the 150-URL subset oscillated ±0.83pp). The 63.74% denominator is 819 labeled/matchable URLs — not 3,000 requests, not 1,000. Read the 87.7% scrape-success adjacent to "0 errors": 12.3% returned no usable content without throwing. fastCRW's p50 beats Firecrawl; its p90 is the worst of the three — the chrome-stealth fallback that recovers the URLs the others miss is also why the tail is worst. We publish the full distribution because the recall is worth the tail.

Full result of record: bench/server-runs/RESULT_3WAY_1000_FULL.md.


SDKs and integrations

Python

pip install crw
from crw import CrwClient

# Managed (includes web search):
client = CrwClient(api_url="https://api.fastcrw.com", api_key="YOUR_API_KEY")
# Local (embedded, no server needed):
# client = CrwClient()

result = client.scrape("https://example.com", formats=["markdown", "links"])
pages = client.crawl("https://docs.example.com", max_depth=2, max_pages=50)
urls = client.map("https://example.com")
results = client.search("AI news", limit=10, sources=["web", "news"])

Requires Python 3.9+. Local mode auto-downloads crw-mcp on first use.

Community SDKs

Node.js: no official SDK yet — use the REST API directly or npx crw-mcp for MCP. SDK examples →

Frameworks & platforms

CrewAI · LangChain · Agno · Dify · n8n · Flowise

All integrations →


Architecture

┌─────────────────────────────────────────────┐
│                 crw-server                  │
│         Axum HTTP API + Auth + MCP          │
├──────────┬──────────┬───────────────────────┤
│ crw-crawl│crw-extract│    crw-renderer      │
│ BFS crawl│ HTML→MD   │  HTTP + CDP(WS)      │
│ robots   │ LLM/JSON  │  LightPanda/Chrome   │
│ sitemap  │ clean/read│  auto-detect SPA     │
├──────────┴──────────┴───────────────────────┤
│                 crw-core                    │
│        Types, Config, Errors                │
└─────────────────────────────────────────────┘
Crate Description
crw-core Core types, config, and error handling
crw-renderer HTTP + CDP browser rendering engine
crw-extract HTML → markdown/plaintext extraction
crw-crawl Async BFS crawler with robots.txt & sitemap
crw-server Axum API server (Firecrawl-compatible)
crw-mcp MCP stdio server (embedded + proxy mode)
crw-cli Standalone CLI (crw binary, no server)

Full architecture docs →


Security

  • SSRF protection — blocks loopback, private IPs, cloud metadata (169.254.x.x), IPv6 mapped addresses, and non-HTTP schemes (file://, data:)
  • Auth — optional Bearer token with constant-time comparison
  • robots.txt — RFC 9309 compliant with wildcard patterns
  • Rate limiting — token-bucket algorithm, returns 429 with error_code
  • Resource limits — max body 1 MB, max crawl depth 10, max pages 1,000

Full security docs →


Contributing

Contributions are welcome — issues and PRs both.

  1. Fork the repository
  2. Install pre-commit hooks: make hooks
  3. Create your feature branch (git checkout -b feat/my-feature)
  4. Commit your changes (git commit -m 'feat: add my feature')
  5. Push to the branch (git push origin feat/my-feature)
  6. Open a Pull Request

The pre-commit hook runs the same checks as CI (cargo fmt, cargo clippy, cargo test). Run manually with make check.

contributors

License

fastCRW is open source under AGPL-3.0. If you embed fastCRW in a closed-source product or expose it as a hosted service to third parties and you can't comply with AGPL's source-availability requirements, the managed offering at fastcrw.com includes a commercial carve-out, and standalone commercial licenses are available on request — write to hello@fastcrw.com.


Links


It is the sole responsibility of end users to respect websites' policies when scraping. Users are advised to adhere to applicable privacy policies and terms of use. By default, fastCRW respects robots.txt directives.

About

Fast, lightweight Firecrawl alternative in Rust. Web scraper, crawler & search API with MCP server for AI agents. Drop-in Firecrawl-compatible API (/v1/scrape, /v1/crawl, /v1/search). 2.3x faster than Tavily, 1.5x faster than Firecrawl in 1K-URL benchmarks. 6 MB RAM, single binary. Self-host or use managed cloud.

Topics

Resources

License

Stars

Watchers

Forks

Sponsor this project

Packages

 
 
 

Contributors