Automating Walmart Product Scraping:
OpenbrowserAI-Top40Walmart.mp4
OpenBrowserAI Automatic Flight Booking:
OpenBrowserAI.-.Automatic.Flight.Booking.mp4
AI-powered browser automation using LangGraph and CDP (Chrome DevTools Protocol)
OpenBrowser is a framework for intelligent browser automation. It combines direct CDP communication with LangGraph orchestration to create AI agents that can navigate, interact with, and extract information from web pages autonomously.
Full documentation: https://docs.openbrowser.me
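To make the "direct CDP communication" mentioned above more concrete: a CDP command is a JSON message with an `id`, a `method` such as `Page.navigate`, and `params`, and the browser answers with a message carrying the same `id`. The sketch below only builds and matches such messages using the standard library. `CDPCommandBuilder` and `match_response` are hypothetical names chosen for illustration and are not part of OpenBrowser's API; a real client would send these strings over the browser's WebSocket debugging endpoint.

```python
import itertools
import json


class CDPCommandBuilder:
    """Hypothetical helper that serializes CDP commands with increasing ids."""

    def __init__(self):
        self._ids = itertools.count(1)

    def build(self, method, params=None):
        # Each command needs a unique id so its response can be matched later.
        message = {"id": next(self._ids), "method": method, "params": params or {}}
        return json.dumps(message)


def match_response(raw_messages, command_id):
    """Return the 'result' of the response whose id equals command_id, else None."""
    for raw in raw_messages:
        message = json.loads(raw)
        # Events carry a 'method' but no 'id'; only responses carry an id.
        if message.get("id") == command_id:
            return message.get("result")
    return None


builder = CDPCommandBuilder()
navigate = builder.build("Page.navigate", {"url": "https://example.com"})
screenshot = builder.build("Page.captureScreenshot", {"format": "png"})

# Simulated browser traffic: one event followed by the reply to command 1.
incoming = [
    json.dumps({"method": "Page.loadEventFired", "params": {"timestamp": 1.0}}),
    json.dumps({"id": 1, "result": {"frameId": "FRAME-1"}}),
]
navigate_result = match_response(incoming, 1)
print(navigate)
print(navigate_result)
```

Matching on `id` is what lets a client keep several commands in flight while unrelated events arrive in between.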
- LangGraph-Powered Agents - Stateful workflow orchestration with perceive-plan-execute loop
- Raw CDP Communication - Direct Chrome DevTools Protocol for maximum control and speed
- Vision Support - Screenshot analysis for visual understanding of pages
- 12+ LLM Providers - OpenAI, Anthropic, Google, Groq, AWS Bedrock, Azure OpenAI, Ollama, and more
- Code Agent Mode - Jupyter notebook-like code execution for complex automation
- MCP Server - Model Context Protocol support for Claude Desktop integration
- Video Recording - Record browser sessions as video files
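The perceive-plan-execute loop named in the feature list can be pictured with a toy example. This is a minimal sketch in plain Python, not OpenBrowser's LangGraph implementation: `AgentState`, `perceive`, `plan`, `execute`, and `run_loop` are hypothetical names, the "page" is just a dictionary, and a rule-based function stands in for the LLM that would normally choose the next action.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """State carried between steps (hypothetical; not OpenBrowser's data model)."""
    task: str
    page: dict  # stand-in for a live browser page
    observation: str = ""
    history: list = field(default_factory=list)
    done: bool = False


def perceive(state):
    # Summarize what the "page" currently shows.
    state.observation = f"url={state.page['url']} query={state.page['query']!r}"
    return state


def plan(state):
    # Rule-based stand-in for the LLM call that picks the next action.
    if state.page["url"] != "https://google.com":
        return ("navigate", "https://google.com")
    if not state.page["query"]:
        return ("type", "Python tutorials")
    return ("done", None)


def execute(state, action):
    # Apply the chosen action to the page and record it in the history.
    kind, value = action
    if kind == "navigate":
        state.page["url"] = value
    elif kind == "type":
        state.page["query"] = value
    else:
        state.done = True
    state.history.append(kind)
    return state


def run_loop(state, max_steps=10):
    # Repeat perceive -> plan -> execute until the planner reports completion.
    for _ in range(max_steps):
        state = perceive(state)
        state = execute(state, plan(state))
        if state.done:
            break
    return state


final = run_loop(AgentState(task="Search for Python tutorials",
                            page={"url": "about:blank", "query": ""}))
print(final.history)  # ['navigate', 'type', 'done']
```

The `max_steps` cap is a design choice of this sketch: it keeps the loop from running forever if the planner never reports completion.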
pip install openbrowser-ai

# Install with all LLM providers
pip install openbrowser-ai[all]
# Install specific providers
pip install openbrowser-ai[anthropic] # Anthropic Claude
pip install openbrowser-ai[groq] # Groq
pip install openbrowser-ai[ollama] # Ollama (local models)
pip install openbrowser-ai[aws] # AWS Bedrock
pip install openbrowser-ai[azure] # Azure OpenAI
# Install with video recording support
pip install openbrowser-ai[video]

uvx openbrowser install
# or
playwright install chromium

import asyncio
from openbrowser import Agent, ChatGoogle

async def main():
    agent = Agent(
        task="Go to google.com and search for 'Python tutorials'",
        llm=ChatGoogle(),
    )
    result = await agent.run()
    print(f"Result: {result}")

asyncio.run(main())

from openbrowser import Agent, ChatOpenAI, ChatAnthropic, ChatGoogle
# OpenAI
agent = Agent(task="...", llm=ChatOpenAI(model="gpt-4o"))
# Anthropic
agent = Agent(task="...", llm=ChatAnthropic(model="claude-sonnet-4-0"))
# Google Gemini
agent = Agent(task="...", llm=ChatGoogle(model="gemini-2.0-flash"))

import asyncio
from openbrowser import BrowserSession, BrowserProfile

async def main():
    profile = BrowserProfile(
        headless=True,
        viewport_width=1920,
        viewport_height=1080,
    )
    session = BrowserSession(browser_profile=profile)
    await session.start()
    await session.navigate_to("https://example.com")
    screenshot = await session.screenshot()
    await session.stop()

asyncio.run(main())

# Google (recommended)
export GOOGLE_API_KEY="..."
# OpenAI
export OPENAI_API_KEY="sk-..."
# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."
# Groq
export GROQ_API_KEY="gsk_..."
# AWS Bedrock
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_DEFAULT_REGION="us-west-2"
# Azure OpenAI
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
# Browser-Use LLM (external service)
export BROWSER_USE_API_KEY="..."

from openbrowser import BrowserProfile

profile = BrowserProfile(
    headless=True,
    viewport_width=1280,
    viewport_height=720,
    disable_security=False,
    extra_chromium_args=["--disable-gpu"],
    record_video_dir="./recordings",
    proxy={
        "server": "http://proxy.example.com:8080",
        "username": "user",
        "password": "pass",
    },
)

| Provider | Class | Models |
|---|---|---|
| Google | ChatGoogle | gemini-2.0-flash, gemini-1.5-pro |
| OpenAI | ChatOpenAI | gpt-4o, o3, gpt-4-turbo |
| Anthropic | ChatAnthropic | claude-sonnet-4-0, claude-3-opus |
| Groq | ChatGroq | llama-3.3-70b-versatile, mixtral-8x7b |
| AWS Bedrock | ChatAWSBedrock | claude-3, amazon.titan |
| Azure OpenAI | ChatAzureOpenAI | Any Azure-deployed model |
| Ollama | ChatOllama | llama3, mistral (local) |
| OCI | ChatOCIRaw | Oracle Cloud GenAI models |
| Browser-Use | ChatBrowserUse | External LLM service |
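Before constructing an agent, it can help to check which of the providers in the table have their credentials set. The sketch below pairs the provider classes with the environment variables listed earlier in this README. The pairing is inferred from the names alone, so whether each class strictly requires every variable is an assumption, and `REQUIRED_ENV`, `missing_env`, and `usable_providers` are hypothetical helpers rather than part of OpenBrowser. Ollama and OCI are left out because no variables are listed for them.

```python
import os

# Environment variables per provider class, as listed in this README.
# Assumption: each class needs all of its listed variables to be set.
REQUIRED_ENV = {
    "ChatGoogle": ["GOOGLE_API_KEY"],
    "ChatOpenAI": ["OPENAI_API_KEY"],
    "ChatAnthropic": ["ANTHROPIC_API_KEY"],
    "ChatGroq": ["GROQ_API_KEY"],
    "ChatAWSBedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION"],
    "ChatAzureOpenAI": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"],
    "ChatBrowserUse": ["BROWSER_USE_API_KEY"],
}


def missing_env(provider_class, env=None):
    """Return the variables that are unset or empty for the given provider class."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV.get(provider_class, []) if not env.get(name)]


def usable_providers(env=None):
    """List provider classes whose variables are all present."""
    return [cls for cls in REQUIRED_ENV if not missing_env(cls, env)]


# Example with an explicit mapping instead of the real environment.
example_env = {"GOOGLE_API_KEY": "placeholder", "AWS_ACCESS_KEY_ID": "placeholder"}
print(usable_providers(example_env))
print(missing_env("ChatAWSBedrock", example_env))
```

Calling `usable_providers()` with no argument checks the real process environment instead of the example mapping.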
OpenBrowser includes an MCP server for integration with Claude Desktop.
python -m openbrowser.mcp

Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json):
{
  "mcpServers": {
    "openbrowser": {
      "command": "uvx",
      "args": ["openbrowser-ai", "mcp"],
      "env": {
        "GOOGLE_API_KEY": "..."
      }
    }
  }
}

# Run a browser automation task
uvx openbrowser run "Search for Python tutorials on Google"
# Install browser
uvx openbrowser install
# Run MCP server
uvx openbrowser mcp

openbrowser-ai/
├── src/openbrowser/
│   ├── __init__.py          # Main exports
│   ├── cli.py               # CLI commands
│   ├── config.py            # Configuration
│   ├── actor/               # Element interaction
│   ├── agent/               # LangGraph agent
│   │   ├── graph.py         # Agent workflow
│   │   ├── service.py       # Agent class
│   │   └── views.py         # Data models
│   ├── browser/             # CDP browser control
│   │   ├── session.py       # BrowserSession
│   │   └── profile.py       # BrowserProfile
│   ├── code_use/            # Code agent
│   ├── dom/                 # DOM extraction
│   ├── llm/                 # LLM providers
│   │   ├── openai/
│   │   ├── anthropic/
│   │   ├── google/
│   │   ├── groq/
│   │   ├── aws/
│   │   ├── azure/
│   │   └── ...
│   ├── mcp/                 # MCP server
│   └── tools/               # Action registry
└── tests/                   # Test suite
# Run tests
pytest tests/
# Run with verbose output
pytest tests/ -v

Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Email: billy.suharno@gmail.com
- GitHub: @billy-enrizky
- Repository: github.com/billy-enrizky/openbrowser-ai
- Documentation: https://docs.openbrowser.me
Made with love for the AI automation community