1. This project contains a modular Universal Web Agent designed to operate within the AGI SDK REAL benchmark and on the open web.
2. The agent uses a Dynamic Router to intelligently switch between Browsing (Playwright), Researching (Perplexity), and Coding (E2B) strategies based on the user's goal.
- Universal Router: An intelligent "Traffic Controller" that routes tasks to the best specialist (Researcher, Coder, or Navigator).
- Cloud-Native Tooling: Integrated Perplexity AI (via Docker MCP) and E2B Code Interpreter running in secure cloud sandboxes.
- High-Level Orchestrator: Manages task lifecycles, planning, and self-correction.
- Dynamic Prompt Routing: Automatically selects specialized prompts for benchmarks (Omnizon, NetworkIn) vs. strict safety prompts for real web tasks.
- Agent Memory: Stores interaction history and thought processes (JSON-based).
- Self-Healing: Implemented self-critique loops to recover from navigation errors.
- Strict Mode: Special handling for anti-bot sites (Amazon/Google) using keyboard shortcuts and OCR.
- Benchmark Ready: Full setup and evaluation on REAL Bench for Omnizon and NetworkIn.
- Integrating DSPy to optimize the Router and Planner prompts automatically.
- Using RL (Reinforcement Learning) for post-training (GRPO, PPO) to improve navigation efficiency.
- Testing with open-weights models (Llama 3, DeepSeek) to reduce costs.
- Building on better browser-use frameworks (Nova-act) and fine-tuning parts of the multi-modal LLM.
- Screenshot capture and BrowserGym's HighLevelActionSet occasionally fall out of sync on heavy pages.
- A better semantic map of clickable elements (the bid identifiers used for button tasks) could be built by fine-tuning prompts or by using a dedicated VLM for UI element detection.
- Integrating with more agentic frameworks (LangGraph) for multi-agent collaboration.
- The agent defaults to GPT-4o for planning and routing to ensure high reliability. Switching to gpt-4o-mini in config.py saves costs but reduces complex reasoning capabilities.
- Requires E2B and Perplexity credits for the advanced tools functionality.
- Post-training using GRPO with DSPy could improve performance significantly.
- Universal Router: The agent doesn't just click buttons; it thinks. If you ask "Who is the CEO?", it calls Perplexity. If you ask "Calculate Fibonacci", it spins up an E2B sandbox.
- Cloud-Native Infrastructure: Heavy tools (Docker containers, Python runtime) run in E2B Cloud Sandboxes, ensuring sub-second startup times and zero load on your local machine.
- Strict Mode for Real Web: When browsing real sites (Amazon, Google), the agent switches to a "Strict" prompt set that enforces keyboard shortcuts and prevents hallucination.
- Dynamic Prompt Routing: Uses a smart selector to load the correct "instruction manual" (prompt file) for benchmarks (Omnizon, DashDish) while using safe defaults for the open web.
- Chain-of-Thought Planning: The agent performs a "self-verification" step after creating a plan, critiquing it for logical flaws before execution.
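The "self-verification" step in the list above can be pictured as a small draft-critique-revise loop. The sketch below is illustrative only: `plan_with_self_check`, `toy_planner`, and `toy_critic` are hypothetical names, and plain Python functions stand in for the LLM calls that the real orchestrator would presumably make.

```python
from typing import Callable, List

Planner = Callable[[str, List[str]], List[str]]
Critic = Callable[[str, List[str]], List[str]]


def plan_with_self_check(
    goal: str,
    make_plan: Planner,
    critique: Critic,
    max_rounds: int = 3,
) -> List[str]:
    """Draft a plan, critique it, and revise until the critic has no objections."""
    feedback: List[str] = []
    plan: List[str] = []
    for _ in range(max_rounds):
        # The planner sees the previous round's objections (empty on round one).
        plan = make_plan(goal, feedback)
        feedback = critique(goal, plan)
        if not feedback:
            break  # The critic found no logical flaws; the plan is accepted.
    return plan


# Stand-ins for LLM calls (assumptions, not the project's real prompts).
def toy_planner(goal: str, feedback: List[str]) -> List[str]:
    steps = ["open site", "search for item"]
    if "missing final step" in feedback:
        steps.append("add item to cart")
    return steps


def toy_critic(goal: str, plan: List[str]) -> List[str]:
    return [] if "add item to cart" in plan else ["missing final step"]


if __name__ == "__main__":
    print(plan_with_self_check("Buy headphones", toy_planner, toy_critic))
```

Capping the loop with `max_rounds` keeps a stubborn critic from stalling the agent; the last draft is returned even if objections remain.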
The agent's source code is located entirely within the agiwebagent/ directory.
agiwebagent/
├── main.py                     # The "Brain": Universal entry point with the Router logic.
├── requirements.txt            # Python dependencies.
└── agent_src/                  # Core source code.
    ├── __init__.py
    ├── agent.py                # The "Specialist": Manages the LLM loop and Tool execution.
    ├── config.py               # Configuration (Models, API Keys, Template IDs).
    ├── memory.py               # Stores history of actions and thoughts.
    ├── orchestrator.py         # The "Manager": Oversees the plan-act-critique loop.
    ├── prompt_selector.py      # The "Librarian": Picks the right prompt file.
    ├── tools.py                # The "Toolbelt": Manages E2B Cloud Sandbox & Perplexity.
    ├── utils.py                # Helper functions (Image processing).
    └── prompts/                # Prompt Engineering logic.
        ├── __init__.py
        ├── general_prompts.py      # Strict prompts for Real Web tasks.
        ├── networkIn_prompts.py    # Specialized brain for NetworkIn.
        ├── omnizon_prompts.py      # Specialized brain for Omnizon.
        └── ...
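To make the role of prompt_selector.py more concrete, here is a minimal sketch of how a task name could be mapped to one of the prompt modules listed in the tree. It is an assumption about the approach, not the project's actual code: `SITE_PROMPTS` and `select_prompt_module` are hypothetical names, and only the two site-specific modules visible in the tree are mapped.

```python
from typing import Optional

# Hypothetical mapping from benchmark site to prompt module name.
# Only the modules visible in the directory tree are listed.
SITE_PROMPTS = {
    "omnizon": "omnizon_prompts",
    "networkin": "networkIn_prompts",
}
DEFAULT_PROMPTS = "general_prompts"  # Strict prompts for real web tasks.


def select_prompt_module(task_name: Optional[str]) -> str:
    """Return the prompt module name for a task ID such as 'webclones.networkin-3'."""
    if not task_name:
        # No benchmark task: fall back to the safe defaults for the open web.
        return DEFAULT_PROMPTS
    # 'webclones.networkin-3' -> 'networkin-3' -> 'networkin'
    site = task_name.split(".")[-1].rsplit("-", 1)[0].lower()
    return SITE_PROMPTS.get(site, DEFAULT_PROMPTS)


if __name__ == "__main__":
    print(select_prompt_module("webclones.networkin-3"))
    print(select_prompt_module(None))
```

Unknown sites fall through to the general prompts, which matches the "safe defaults" behaviour described for open-web tasks.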
Follow these steps from the root directory of the project.
It's highly recommended to use a virtual environment to manage dependencies.
python -m venv agienv
source agienv/bin/activate
# On Windows: agienv\Scripts\activate
Install the required Python packages.
pip install -r requirements.txt
pip install -r agiwebagent/requirements.txt
playwright install chromium # Required for the browser
Create a file named .env in the agiwebagent/ directory. You need keys for the LLM and the tools.
OPENAI_API_KEY="sk-..."
PERPLEXITY_API_KEY="pplx-..."
E2B_API_KEY="e2b_..."
All commands should be run from the root directory. The main script is agiwebagent/main.py.
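Before the first run, it can help to confirm that the three keys are actually visible. The following is a standalone sketch using only the standard library; it does not reflect how the project itself loads its .env file, and the helper names (`parse_env_text`, `missing_keys`, `check_env`) are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict, List

REQUIRED_KEYS = ["OPENAI_API_KEY", "PERPLEXITY_API_KEY", "E2B_API_KEY"]


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, ignoring blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes such as "sk-..." or 'sk-...'.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: Dict[str, str]) -> List[str]:
    """List the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


def check_env(path: str = "agiwebagent/.env") -> List[str]:
    """Merge the .env file with the process environment and report gaps."""
    env_file = Path(path)
    file_values = parse_env_text(env_file.read_text()) if env_file.exists() else {}
    # Real environment variables take precedence over the file.
    merged = {**file_values, **os.environ}
    return missing_keys(merged)


if __name__ == "__main__":
    gaps = check_env()
    print("All keys found." if not gaps else f"Missing keys: {', '.join(gaps)}")
```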
Use this for open-ended tasks on real websites or for general research. The Router will automatically decide if it needs a browser.
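The routing decision can be approximated with a few lines of Python. This is a deliberately simplified stand-in: the actual Router presumably asks the LLM to classify the goal, whereas the sketch below uses a keyword heuristic, and `route` and `CODING_HINTS` are hypothetical names. The one rule taken directly from this README is that supplying a URL forces browsing.

```python
import re
from typing import Optional

# Words that suggest the goal needs code execution (illustrative, not exhaustive).
CODING_HINTS = re.compile(
    r"\b(calculate|compute|generate|python|code|script)\b", re.IGNORECASE
)


def route(goal: str, url: Optional[str] = None) -> str:
    """Pick a specialist: 'navigator', 'coder', or 'researcher'."""
    if url:
        # A starting URL forces browsing.
        return "navigator"
    if CODING_HINTS.search(goal):
        # Computation-style goals go to the sandboxed code interpreter.
        return "coder"
    # Everything else is treated as a research question.
    return "researcher"


if __name__ == "__main__":
    print(route("Who is the current CEO of OpenAI?"))
    print(route("Generate a Fibonacci sequence in Python."))
    print(route("Find Sony Headphones and add to cart.", url="https://www.amazon.com"))
```

A keyword heuristic misclassifies ambiguous goals easily, which is one reason an LLM-based classifier is a more robust choice for the real Router.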
Research (No Browser):
python agiwebagent/main.py --goal "Who is the current CEO of OpenAI?"
Coding (No Browser):
python agiwebagent/main.py --goal "Generate a Fibonacci sequence in Python."
Browsing (Opens Chrome):
python agiwebagent/main.py \
--url "https://www.amazon.com" \
--goal "Find Sony Headphones and add to cart." \
--use_ocr True --headless False
To run specific scientific benchmarks (WebClones), use the --task_name argument.
Run a specific task:
python agiwebagent/main.py --task_name webclones.networkin-3 --no-cache --headless true
Run a full suite:
python agiwebagent/main.py --task_type omnizon --no-cache --headless true

| Argument | Description | Example |
|---|---|---|
| --goal | (New) The instruction for the agent in Custom Mode. | "Find the price of TSLA" |
| --url | (New) The starting URL for Custom Mode. Forces browsing. | https://google.com |
| --task_name | Runs a single benchmark task by ID. | webclones.dashdish-2 |
| --headless | true/false. Hides the browser window. | --headless false |
| --use_ocr | Enables visual text reading (crucial for Amazon/Google). | --use_ocr True |
| --model | Specifies the OpenAI model (default: gpt-4o). | --model gpt-4o |
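For readers who want to see how these flags might be wired together, here is a hedged argparse sketch. It mirrors the arguments used in this README, but the defaults for --headless and --use_ocr, the help strings, and the `str2bool` helper are assumptions; the real main.py may parse them differently.

```python
import argparse
from typing import List, Optional


def str2bool(value: str) -> bool:
    """Accept 'True'/'true'/'False'/'false' style values from the command line."""
    lowered = value.lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected true/false, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Universal Web Agent (sketch)")
    parser.add_argument("--goal", help="Instruction for the agent in Custom Mode")
    parser.add_argument("--url", help="Starting URL for Custom Mode; forces browsing")
    parser.add_argument("--task_name", help="Single benchmark task ID")
    parser.add_argument("--task_type", help="Benchmark suite, e.g. omnizon")
    # Defaults below are assumptions, not taken from the project.
    parser.add_argument("--headless", type=str2bool, default=False)
    parser.add_argument("--use_ocr", type=str2bool, default=False)
    parser.add_argument("--model", default="gpt-4o", help="OpenAI model name")
    parser.add_argument("--no-cache", action="store_true")
    return parser


def parse_cli(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    print(parse_cli())
```

Using a custom `str2bool` type rather than `type=bool` matters here: argparse's `bool("False")` evaluates to `True`, so flags like `--headless False` would otherwise be read incorrectly.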