curllm = curl + LLM

AI Cost Tracking

This project uses AI-generated code. Total cost: $17.4471 with 169 AI commits.

Generated on 2026-06-29 using openrouter/qwen/qwen3-coder-next

Intelligent Browser Automation with Local LLMs

Quick Start • Features • Examples • Documentation • API

🎯 What is curllm?

curllm is a CLI tool that combines browser automation with local LLMs (models such as Qwen, Llama, and Mistral served through Ollama) to extract data, fill forms, and automate web workflows. With the default Ollama setup everything runs on your own machine, so page content and prompts stay private.

🆕 v2 LLM-DSL Architecture! Dynamic element detection, semantic goal understanding, no hardcoded selectors. 388 tests passing.

# Extract products with prices from any e-commerce site
# (single quotes stop the shell from expanding $1 inside "$100")
curllm "https://shop.example.com" -d 'Find all products under $100'

# Fill contact forms automatically
curllm --stealth "https://example.com/contact" -d "Fill form: name=John, email=john@example.com"

# Extract all emails from a page
curllm "https://example.com" -d "extract all email addresses"

✨ Features

| Feature | Description |
| --- | --- |
| 🧠 Local LLM | Works with 8GB GPUs (Qwen 2.5, Llama 3, Mistral) |
| 🎯 Smart Extraction | LLM-guided DOM analysis, no hardcoded selectors |
| 📝 Form Automation | Auto-fill forms with intelligent field mapping |
| 🥷 Stealth Mode | Bypass anti-bot detection |
| 👁️ Visual Mode | See browser actions in real time |
| 🔍 BQL Support | Browser Query Language for structured queries |
| 📊 Export Formats | JSON, CSV, HTML, XLS output |
| 🔒 Privacy-First | Everything runs locally, no cloud APIs needed |

🧠 LLM-DSL Architecture

curllm v2 uses LLM-DSL (LLM Domain-Specific Language), a dynamic approach that generates selectors at run time instead of relying on hardcoded ones:

┌─────────────────────────────────────────────────────────────┐
│                     LLM-DSL Flow                            │
├─────────────────────────────────────────────────────────────┤
│  1. Goal Detection (semantic)                               │
│     "Find RAM DDR5" → FIND_PRODUCTS                         │
│                                                             │
│  2. Strategy Selection                                      │
│     FIND_PRODUCTS → use search flow                         │
│     FIND_CART → find link by semantic scoring               │
│                                                             │
│  3. Element Finding (LLM-first)                             │
│     LLM analysis → Statistical scoring → Fallback           │
│                                                             │
│  4. Dynamic Selector Generation                             │
│     Analyze DOM → Score elements → Generate selector        │
└─────────────────────────────────────────────────────────────┘
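
Step 3 of the flow above (LLM analysis, then statistical scoring, then a fallback) can be read as a chain of finders tried in order. The sketch below only illustrates that ordering. The names `find_element`, `find_with_llm`, `find_by_scoring`, and `find_by_fallback` are hypothetical stand-ins and are not taken from the curllm code base.

```python
from typing import Callable, Optional, Sequence

# A finder takes a goal and a page snapshot and returns a CSS selector, or None.
Finder = Callable[[str, str], Optional[str]]


def find_element(goal: str, html: str, finders: Sequence[Finder]) -> Optional[str]:
    """Try each finder in order and return the first selector produced."""
    for finder in finders:
        selector = finder(goal, html)
        if selector:
            return selector
    return None


# Hypothetical stand-ins for the three stages named in the flow diagram.
def find_with_llm(goal: str, html: str) -> Optional[str]:
    return None  # pretend the LLM was unavailable or unsure


def find_by_scoring(goal: str, html: str) -> Optional[str]:
    # Toy scoring stage: accept a search input when the goal asks to find something.
    if "find" in goal.lower() and 'type="search"' in html:
        return 'input[type="search"]'
    return None


def find_by_fallback(goal: str, html: str) -> Optional[str]:
    return "body"  # last resort: hand back the whole page


selector = find_element(
    "Find RAM DDR5",
    '<form><input type="search" name="q"></form>',
    [find_with_llm, find_by_scoring, find_by_fallback],
)
print(selector)
```

Keeping each stage behind the same signature means a cheaper stage can be reordered or dropped without touching the caller.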

Key Benefits

| Aspect | Traditional | LLM-DSL |
| --- | --- | --- |
| Selectors | Hardcoded CSS/XPath | Dynamic generation |
| Keywords | Static lists | Semantic analysis |
| Language | English only | Multi-language (PL, EN) |
| Maintenance | Manual updates | Self-adapting |

🚀 Quick Start

Installation

pip install -U curllm
curllm-setup      # One-time setup (installs Playwright browsers)
curllm-doctor     # Verify installation

Requirements

Python 3.10+
GPU: NVIDIA with 6-8GB VRAM (RTX 3060/4060) or CPU mode
Ollama: For local LLM inference

# Install Ollama (if not installed)
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen2.5:7b

📖 Examples

Extract Data

# Extract all links
curllm "https://example.com" -d "extract all links"

# Extract emails
curllm "https://example.com/contact" -d "extract all email addresses"
# Output: {"emails": ["info@example.com", "sales@example.com"]}

# Extract products with price filter
curllm --stealth "https://shop.example.com" -d "Find all products under 500zł"
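
When these commands are driven from a script, passing the arguments as a list avoids shell quoting problems such as `$` expansion. The following is a minimal sketch that uses only the flags shown in this README (`--visual`, `--stealth`, `-d`). The helper names are hypothetical, and `run_curllm` assumes the CLI prints a single JSON document on stdout, as the email example above suggests.

```python
import json
import subprocess
from typing import List


def build_command(url: str, instruction: str, *, stealth: bool = False,
                  visual: bool = False) -> List[str]:
    """Build an argv list for the curllm CLI from flags shown in this README."""
    argv = ["curllm"]
    if visual:
        argv.append("--visual")
    if stealth:
        argv.append("--stealth")
    argv += [url, "-d", instruction]
    return argv


def run_curllm(url: str, instruction: str, **flags: bool) -> dict:
    """Run curllm and parse stdout as JSON (assumption: output is one JSON document)."""
    result = subprocess.run(build_command(url, instruction, **flags),
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


# Only the command is built here; nothing is executed.
command = build_command("https://shop.example.com",
                        "Find all products under 500zł", stealth=True)
print(command)
```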

Form Automation

# Fill contact form
curllm --visual --stealth "https://example.com/contact" \
  -d "Fill form: name=John Doe, email=john@example.com, message=Hello"

# Login automation
curllm --visual "https://app.example.com/login" \
  -d '{"instruction":"Login", "credentials":{"user":"admin", "pass":"secret"}}'
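
The JSON form of `-d` is easy to get wrong by hand because of nested quotes. As a small sketch, the payload can be built with `json.dumps` and the command line rendered with `shlex.join`, which quotes each argument for a POSIX shell. The payload keys are copied from the login example above; nothing else about the accepted schema is assumed.

```python
import json
import shlex

payload = {
    "instruction": "Login",
    "credentials": {"user": "admin", "pass": "secret"},
}

argv = ["curllm", "--visual", "https://app.example.com/login",
        "-d", json.dumps(payload)]

# shlex.join quotes every argument so the line can be pasted into a shell.
command_line = shlex.join(argv)
print(command_line)
```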

Export Results

# Export to CSV
curllm "https://example.com" -d "extract all products" --csv -o products.csv

# Export to HTML
curllm "https://example.com" -d "extract all links" --html -o links.html

# Export to Excel
curllm "https://example.com" -d "extract all data" --xls -o data.xlsx

Screenshots

# Take screenshot
curllm "https://example.com" -d "screenshot"

# Visual mode (watch browser)
curllm --visual "https://example.com" -d "extract all links"

BQL Queries

curllm --bql -d 'query {
  page(url: "https://news.ycombinator.com") {
    title
    links: select(css: "a.titlelink") { text url: attr(name: "href") }
  }
}'

🌐 Web Interface

curllm-web start   # Start web UI at http://localhost:5000
curllm-web status  # Check status
curllm-web stop    # Stop server

Features:

🎨 Modern responsive UI
📝 19 pre-configured prompts
📊 Real-time log viewer
📤 File upload support

🔧 Configuration

Environment variables (.env):

CURLLM_MODEL=qwen2.5:7b          # LLM model
CURLLM_OLLAMA_HOST=http://localhost:11434
CURLLM_HEADLESS=true             # Run browser headlessly
CURLLM_STEALTH_MODE=false        # Anti-detection
CURLLM_LOCALE=en-US              # Browser locale
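
For scripts that need the same settings, the variables above can be read into a typed object. This is a sketch, not curllm's own loader: the `Settings` class and `load_settings` function are hypothetical, and the fallback values simply repeat the example values listed above, which may differ from the tool's real defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    model: str
    ollama_host: str
    headless: bool
    stealth_mode: bool
    locale: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the CURLLM_* variables, falling back to the example values above."""
    return Settings(
        model=env.get("CURLLM_MODEL", "qwen2.5:7b"),
        ollama_host=env.get("CURLLM_OLLAMA_HOST", "http://localhost:11434"),
        headless=_as_bool(env.get("CURLLM_HEADLESS", "true")),
        stealth_mode=_as_bool(env.get("CURLLM_STEALTH_MODE", "false")),
        locale=env.get("CURLLM_LOCALE", "en-US"),
    )


# Passing a plain dict keeps the example independent of the real environment.
settings = load_settings({"CURLLM_STEALTH_MODE": "true"})
print(settings)
```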

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         curllm CLI                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌────────────────┐    ┌────────────────┐    ┌───────────────┐  │
│  │  DSL Executor  │───▶│ Knowledge Base │───▶│ Strategy YAML │  │
│  │  (Orchestrator)│    │   (SQLite)     │    │    Files      │  │
│  └────────────────┘    └────────────────┘    └───────────────┘  │
│          │                                                      │
│          ▼                                                      │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │                    DOM Toolkit (Pure JS)                   │ │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────────┐  │ │
│  │  │Structure │  │ Patterns │  │Selectors │  │   Prices   │  │ │
│  │  │ Analyzer │  │ Detector │  │Generator │  │  Detector  │  │ │
│  │  └──────────┘  └──────────┘  └──────────┘  └────────────┘  │ │
│  └────────────────────────────────────────────────────────────┘ │
│          │                                                      │
│          ▼                                                      │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │              Playwright Browser Engine                     │ │
│  │         (Chromium with Stealth & Anti-Detection)           │ │
│  └────────────────────────────────────────────────────────────┘ │
│          │                                                      │
│          ▼                                                      │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │                 Ollama / LiteLLM                           │ │
│  │      (Local LLM: Qwen 2.5, Llama 3, Mistral, GPT, etc)     │ │
│  └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

Key Components

| Component | Description | LLM Calls |
| --- | --- | --- |
| URL Resolver | Smart navigation with goal detection | 0-1 |
| Goal Detector | Semantic intent understanding | 0-1 |
| Element Finder | Dynamic selector generation | 0-1 |
| DOM Toolkit | Pure JavaScript atomic queries | 0 |
| SPA Hydration | Wait for CSR/SPA content | 0 |

📖 Full Architecture Documentation →

🧬 DSL System (Strategy-Based Extraction)

Note: The YAML DSL system works alongside the newer LLM-DSL. YAML strategies are used for known sites with proven extraction patterns, while LLM-DSL handles unknown sites dynamically.

curllm automatically learns and saves successful extraction strategies as YAML files:

# dsl/ceneo_products.yaml - Auto-generated from successful extraction
url_pattern: "*.ceneo.pl/*"
task: extract_products
algorithm: statistical_containers

selector: div.product-card
fields:
  name: h3.title
  price: span.price
  url: a[href]

metadata:
  success_rate: 0.95
  use_count: 42
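
The `url_pattern` field looks like a shell-style glob. Assuming that reading is correct (the README does not state the matching rules), a strategy could be selected for a URL as sketched below. `STRATEGIES` is a hypothetical in-memory copy of the YAML above and `match_strategy` is an illustrative helper.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical in-memory copy of saved strategies, keyed by url_pattern.
STRATEGIES = {
    "*.ceneo.pl/*": {
        "task": "extract_products",
        "algorithm": "statistical_containers",
        "selector": "div.product-card",
        "fields": {"name": "h3.title", "price": "span.price", "url": "a[href]"},
    },
}


def match_strategy(url: str) -> Optional[dict]:
    """Return the first strategy whose glob pattern matches the URL."""
    for pattern, strategy in STRATEGIES.items():
        # fnmatchcase avoids platform-dependent case and slash normalization.
        if fnmatchcase(url, pattern):
            return strategy
    return None


hit = match_strategy("https://www.ceneo.pl/Laptopy")
miss = match_strategy("https://shop.example.com/")
print(hit["selector"] if hit else None, miss)
```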

How It Works

1. First visit: LLM-DSL dynamically analyzes the page and extracts data
2. Successful: the strategy is saved to dsl/*.yaml and recorded in the Knowledge Base
3. Next visit: the Knowledge Base loads the saved strategy (fast path)
4. Unknown site: falls back to LLM-DSL dynamic discovery

┌─────────────────────────────────────────────────────────┐
│                   Request Flow                          │
├─────────────────────────────────────────────────────────┤
│  URL → Knowledge Base lookup                            │
│        │                                                │
│        ├─ Found? → Load YAML strategy (fast)            │
│        │                                                │
│        └─ Not found? → LLM-DSL dynamic (flexible)       │
│                        │                                │
│                        └─ Success? → Save to YAML       │
└─────────────────────────────────────────────────────────┘
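
The request flow above amounts to a cache lookup with a slow path that fills the cache on success. The sketch below models it with an in-memory SQLite table. The table layout, the `resolve` function, and `fake_discover` are assumptions made for illustration; curllm's actual Knowledge Base schema is not described in this README.

```python
import sqlite3
from fnmatch import fnmatchcase
from typing import Callable, Optional, Tuple

Discovery = Callable[[str], Optional[Tuple[str, str]]]  # url -> (pattern, yaml_path)


def open_kb() -> sqlite3.Connection:
    """Create a throwaway knowledge base with one assumed table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE strategies (url_pattern TEXT PRIMARY KEY, yaml_path TEXT NOT NULL)"
    )
    return conn


def lookup(conn: sqlite3.Connection, url: str) -> Optional[str]:
    """Return the YAML path of the first stored pattern that matches the URL."""
    for pattern, path in conn.execute("SELECT url_pattern, yaml_path FROM strategies"):
        if fnmatchcase(url, pattern):
            return path
    return None


def resolve(conn: sqlite3.Connection, url: str,
            discover: Discovery) -> Tuple[str, Optional[str]]:
    """Return (route, yaml_path), where route is 'fast' or 'dynamic'."""
    path = lookup(conn, url)
    if path is not None:
        return "fast", path              # found: load the saved strategy
    found = discover(url)                # not found: dynamic discovery
    if found is None:
        return "dynamic", None           # discovery failed, nothing to save
    pattern, path = found
    with conn:                           # commit the newly learned strategy
        conn.execute("INSERT OR REPLACE INTO strategies VALUES (?, ?)", (pattern, path))
    return "dynamic", path


def fake_discover(url: str) -> Optional[Tuple[str, str]]:
    """Stand-in for LLM-DSL discovery; always 'succeeds' for this demo."""
    return "*.ceneo.pl/*", "dsl/ceneo_products.yaml"


kb = open_kb()
first = resolve(kb, "https://www.ceneo.pl/Laptopy", fake_discover)
second = resolve(kb, "https://www.ceneo.pl/Telefony", fake_discover)
print(first, second)
```

The first call takes the dynamic route and stores the strategy; the second call for the same site is served from the table.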

Algorithms

| Algorithm | Best For | Speed |
| --- | --- | --- |
| `statistical_containers` | Product grids | ⚡ Fast |
| `pattern_detection` | Lists, tables | ⚡ Fast |
| `llm_guided` | Complex layouts | 🐢 Slower |
| `form_fill` | Contact forms | ⚡ Fast |
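
The README does not spell out how `statistical_containers` works. One plausible reading, sketched here purely as an assumption, is that product grids reveal themselves through a container element whose tag and class combination repeats many times. The example counts such signatures with the standard library HTML parser; `SignatureCounter`, `most_repeated_container`, and the `CONTAINER_TAGS` set are illustrative names, and this is not curllm's implementation.

```python
from collections import Counter
from html.parser import HTMLParser
from typing import Optional

# Tags treated as potential item containers (an arbitrary choice for this sketch).
CONTAINER_TAGS = {"div", "li", "article", "section", "tr"}


class SignatureCounter(HTMLParser):
    """Count how often each tag.class signature appears among container tags."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        if tag not in CONTAINER_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if classes:
            self.counts[tag + "." + ".".join(classes)] += 1


def most_repeated_container(html: str, min_repeats: int = 3) -> Optional[str]:
    """Return the most frequent container signature as a CSS selector, if any."""
    parser = SignatureCounter()
    parser.feed(html)
    if not parser.counts:
        return None
    selector, count = parser.counts.most_common(1)[0]
    return selector if count >= min_repeats else None


card = "<div class='product-card'><h3 class='title'>Item</h3><span class='price'>10</span></div>"
page = "<div class='grid'>" + card * 4 + "</div>"
print(most_repeated_container(page))
```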

📖 DSL System Documentation →

🤝 Multi-Provider LLM Support

curllm supports multiple LLM providers via LiteLLM:

from curllm_core import LLMConfig

# OpenAI
config = LLMConfig(provider="openai/gpt-4o-mini")

# Anthropic
config = LLMConfig(provider="anthropic/claude-3-haiku-20240307")

# Google Gemini
config = LLMConfig(provider="gemini/gemini-2.0-flash")

# Local Ollama (default)
config = LLMConfig(provider="ollama/qwen2.5:7b")
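
The provider strings above follow LiteLLM's `provider/model` naming. As a small sketch, such a string can be split to check whether a configuration keeps inference on the local machine. `ModelRef`, `parse_model`, and `is_local` are hypothetical helpers and not part of `curllm_core`; treating only the `ollama` prefix as local is an assumption.

```python
from typing import NamedTuple


class ModelRef(NamedTuple):
    provider: str
    model: str


def parse_model(ref: str) -> ModelRef:
    """Split 'provider/model' at the first slash."""
    provider, sep, model = ref.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {ref!r}")
    return ModelRef(provider, model)


def is_local(ref: str) -> bool:
    """Assumption for this sketch: only the 'ollama' provider runs locally."""
    return parse_model(ref).provider == "ollama"


for ref in ("ollama/qwen2.5:7b", "openai/gpt-4o-mini"):
    print(ref, "->", parse_model(ref), "local" if is_local(ref) else "remote")
```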

📚 Documentation

Getting Started

Architecture

🏗️ System Architecture
🧬 DSL System - Strategy-based extraction
⚛️ DOM Toolkit - Pure JS queries
🧩 Components - Module overview
🔗 LLM-DSL URL Resolution - Smart URL navigation

Reference

🔌 API Reference
🤖 MCP (agents) — curllm-mcp for Cursor / Claude Desktop
🛠️ Configuration
❓ Troubleshooting

🧪 Development

# Clone and install
git clone https://github.com/wronai/curllm.git
cd curllm
make install

# Run tests (388 tests passing)
make test

# Run URL resolver examples
cd examples/url_resolver && python run_all.py

# Run with Docker
docker compose up -d

📄 License

Apache License 2.0 - see LICENSE

🙏 Acknowledgments

Built with:

Playwright - Browser automation
Ollama - Local LLM inference
LiteLLM - Multi-provider LLM support
Flask - Web framework

⭐ Star this repo if you find it useful!

Made with ❤️ by wronai
