A modular web agent designed for the AGI SDK REAL benchmark, featuring dynamic prompt routing and Chain-of-Thought planning for autonomous web navigation.
📐 System Architecture Diagram (Miro)
- 🧠 Modular Architecture — High-level Orchestrator (project manager) + focused Agent (LLM specialist)
- 🔀 Dynamic Prompt Routing — Automatically selects task-specific prompts based on the goal
- 💭 Chain-of-Thought Planning — Self-verification step reviews plans for logical flaws before execution
- 🔄 Advanced Self-Correction — Detects stuck states and changes strategy to recover
- 📋 Granular Planning — Breaks down complex goals into single-action steps for reliability
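At its simplest, dynamic prompt routing can be a keyword lookup from the task identifier to a task-specific system prompt. The sketch below is illustrative only — the prompt texts and the `select_prompt` name are assumptions, not the actual `prompt_selector.py` implementation:

```python
# Illustrative keyword-based prompt routing; the real prompt_selector.py may differ.
OMNIZON_PROMPT = "You are an e-commerce shopping assistant..."
DASHDISH_PROMPT = "You are a food-delivery ordering assistant..."
DEFAULT_PROMPT = "You are a general-purpose web navigation agent."

PROMPT_TABLE = {
    "omnizon": OMNIZON_PROMPT,
    "dashdish": DASHDISH_PROMPT,
}

def select_prompt(task_name: str) -> str:
    """Pick a task-specific system prompt based on the task identifier."""
    for keyword, prompt in PROMPT_TABLE.items():
        if keyword in task_name.lower():
            return prompt
    return DEFAULT_PROMPT
```

With this scheme, `webclones.omnizon-1` routes to the e-commerce prompt while unrecognized tasks fall back to a generic prompt.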
```
agiinc/
├── README.md
├── requirements.txt        # Root dependencies
├── Dockerfile              # Docker support
├── docker-compose.yml      # Easy orchestration
│
├── agiwebagent/            # Web agent implementation
│   ├── main.py             # Entry point
│   ├── requirements.txt    # Agent-specific dependencies
│   └── agent_src/
│       ├── agent.py            # LLM communication layer
│       ├── config.py           # Configuration dataclass
│       ├── memory.py           # Action history tracking
│       ├── orchestrator.py     # Task lifecycle manager
│       ├── prompt_selector.py  # Dynamic prompt routing
│       ├── llm_utils.py        # LLM utilities with retry logic
│       ├── vision_tools.py     # Visual OCR extraction
│       ├── utils.py            # Helper functions
│       └── prompts/            # Task-specific prompt files
│           ├── omnizon_prompts.py
│           ├── dashdish_prompts.py
│           └── ...
│
└── agisdk/                 # AGI SDK (submodule/dependency)
```
1. Create a virtual environment

   ```bash
   python -m venv agienv
   source agienv/bin/activate  # On Windows: agienv\Scripts\activate
   ```

2. Install dependencies

   ```bash
   pip install -r requirements.txt
   pip install -r agiwebagent/requirements.txt
   ```

3. Configure your API key

   Copy the example environment file and add your API key:

   ```bash
   cp .env.example .env
   # Edit .env and replace 'sk-your-api-key-here' with your actual OpenAI API key
   ```

4. Run the agent

   ```bash
   python agiwebagent/main.py --task_name webclones.omnizon-1 --headless true
   ```
1. Configure your API key

   ```bash
   cp .env.example .env
   # Edit .env and add your OpenAI API key
   ```

2. Build and run

   ```bash
   docker compose build
   docker compose run --rm agiwebagent \
     --task_name webclones.omnizon-1 \
     --headless true
   ```
```bash
# Re-run a single task, ignoring the cache
python agiwebagent/main.py --task_name webclones.omnizon-1 --no-cache --headless true

# Run all Omnizon (e-commerce) tasks
python agiwebagent/main.py --task_type omnizon --headless true

# Run all DashDish (food delivery) tasks
python agiwebagent/main.py --task_type dashdish --headless true

# Run all NetworkIn (professional networking) tasks
python agiwebagent/main.py --task_type networkin --headless true
```

| Argument | Description | Example |
|---|---|---|
| `--task_name` | Run a single task by ID | `webclones.omnizon-1` |
| `--task_type` | Run all tasks of a type | `omnizon`, `dashdish` |
| `--headless` | Run browser in background | `true` / `false` |
| `--no-cache` | Force re-run without cache | Flag |
| `--model` | OpenAI model for main agent | `gpt-4o`, `gpt-4o-mini` |
| `--vision_model` | Model for OCR/vision | `gpt-4o` |
| `--use_ocr` | Enable visual OCR | `true` / `false` |
| Variable | Description | Required |
|---|---|---|
| `OPENAI_API_KEY` | Your OpenAI API key | ✅ Yes |
Edit `agiwebagent/agent_src/config.py` to customize:

```python
from dataclasses import dataclass

@dataclass
class AgentConfig:
    model_name: str = "gpt-4o"              # Main execution model
    plan_model_name: str = "gpt-4o"         # Planning model
    parser_model_name: str = "gpt-4o-mini"  # Prompt routing model
    vision_model_name: str = "gpt-4o"       # OCR/vision model
    max_steps: int = 25                     # Max steps per task
    max_retries: int = 3                    # Max plan generation retries
    use_screenshot: bool = True             # Include screenshots
    use_axtree: bool = True                 # Include accessibility tree
    use_ocr: bool = False                   # Enable visual OCR
```

**Troubleshooting**

**1. Playwright browsers not installed**
```bash
playwright install chromium
playwright install-deps
```

**2. Rate limit errors**

The agent has built-in exponential backoff retry logic. If you're hitting limits frequently, consider:
- Using a model with higher rate limits
- Reducing parallel execution
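The backoff behavior works roughly like the following generic wrapper. This is a hedged sketch — the `with_backoff` name and the `RuntimeError` stand-in for a rate-limit exception are assumptions, not the project's actual `llm_utils.py` code:

```python
import time

def with_backoff(call, max_retries: int = 3, base_delay: float = 1.0):
    """Retry `call` with exponential backoff on failure.

    Illustrative sketch: delays double on each attempt (1s, 2s, 4s, ...).
    """
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:  # stand-in for an API rate-limit exception
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * (2 ** attempt))
```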
**3. Docker display issues (headed mode)**

For headed mode in Docker on Linux:

```bash
xhost +local:docker
docker-compose run --rm agiwebagent --headless false
```

```
┌─────────────────────────────────────────────────────────────┐
│                           main.py                           │
│                        (Entry Point)                        │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                      TaskOrchestrator                       │
│                  (Manages task lifecycle)                   │
│   • Creates plans   • Handles errors   • Tracks progress    │
└──────────────────────────┬──────────────────────────────────┘
                           │
              ┌────────────┴────────────┐
              ▼                         ▼
  ┌──────────────────────┐   ┌──────────────────────┐
  │    PromptSelector    │   │ HighPerformanceAgent │
  │ (Routes to prompts)  │   │   (LLM Interface)    │
  └──────────────────────┘   └──────────────────────┘
              │                         │
              ▼                         ▼
  ┌──────────────────────┐   ┌──────────────────────┐
  │     prompts/*.py     │   │     AgentMemory      │
  │   (Task-specific)    │   │  (History tracking)  │
  └──────────────────────┘   └──────────────────────┘
```
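The self-correction feature depends on AgentMemory noticing when recent actions repeat. A minimal sketch of stuck-state detection over a fixed-size action window — illustrative only; the real `memory.py` likely tracks richer state than this:

```python
from collections import deque

class AgentMemory:
    """Illustrative sketch of action-history tracking with stuck detection."""

    def __init__(self, window: int = 3):
        # Only the last `window` actions are kept.
        self.history = deque(maxlen=window)

    def record(self, action: str) -> None:
        self.history.append(action)

    def is_stuck(self) -> bool:
        # Stuck = the window is full and every action in it is identical.
        return (len(self.history) == self.history.maxlen
                and len(set(self.history)) == 1)
```

When `is_stuck()` returns true, the orchestrator can discard the current plan and prompt the agent for an alternative strategy.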
- Integrate DSPy for better prompt optimization
- Add RL post-training (GRPO, PPO)
- Test with different LLMs for role-based cost optimization
- Build on Nova-act and browser-use frameworks
- Fine-tune multimodal LLM components
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request