A Playwright-based “computer-use” agent that completes a browser navigation challenge (30 steps) and reaches /finish in under 5 minutes in headless Chromium.
This repo is a worked example of the Agent Harbor long-horizon task methodology: write the spec, encode success as tests, track progress explicitly, and iterate on failures using traces until the suite is green. The result is a “one-shot” agent you can run end-to-end without human intervention.
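To make “encode success as tests” concrete, the criteria from the opening line (30 steps, ending on /finish, under 5 minutes) can be written as a pure check over a run summary. This is a minimal sketch: the `RunSummary` shape and the `checkRun` function are hypothetical, not the repo's actual test code.

```typescript
// Hypothetical summary of one solver run; these field names are assumptions
// for illustration, not the repo's actual types.
interface RunSummary {
  stepsCompleted: number; // challenge steps solved
  finalPath: string;      // URL pathname the browser ended on
  durationMs: number;     // wall-clock time of the whole run
}

const REQUIRED_STEPS = 30;            // "30 steps"
const TIME_BUDGET_MS = 5 * 60 * 1000; // "under 5 minutes"

// Returns human-readable failures; an empty array means every gate passed.
function checkRun(run: RunSummary): string[] {
  const failures: string[] = [];
  if (run.stepsCompleted !== REQUIRED_STEPS) {
    failures.push(`expected ${REQUIRED_STEPS} steps, got ${run.stepsCompleted}`);
  }
  if (run.finalPath !== "/finish") {
    failures.push(`expected to end on /finish, ended on ${run.finalPath}`);
  }
  if (run.durationMs >= TIME_BUDGET_MS) {
    failures.push(`took ${run.durationMs} ms; budget is under ${TIME_BUDGET_MS} ms`);
  }
  return failures;
}
```

A test can then assert that `checkRun(summary)` is empty, so a failing run reports every violated gate at once instead of stopping at the first.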
- Spec the task (inputs/outputs, constraints, non-cheating policy): `docs/SPEC.md`
- Make success verifiable with correctness and performance gates: `tests/`
- Track the plan and verification criteria in a single status file: `computer-use-agent.status.org`
- Build the smallest end-to-end loop (open → solve step → submit → advance) and extend it until all success criteria pass.

Fully automated loop: run the tests, inspect the failure artifacts, patch, and repeat until the agent can reliably “one-shot” the full run.
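The open → solve step → submit → advance loop can be sketched independently of the browser. In this sketch, `ChallengeSession`, `runLoop`, and `FakeSession` are hypothetical names; a real runner would implement the session with Playwright page calls. The attempt and time budgets are illustrative safeguards (assumptions, not taken from the spec) so that a stuck step ends the run instead of hanging.

```typescript
// Hypothetical interface over one challenge session. In a real runner each
// method would wrap Playwright page calls; the names here are illustrative.
interface ChallengeSession {
  open(): Promise<void>;
  currentStep(): Promise<number>;           // 1-based step shown on the page
  solveStep(step: number): Promise<string>; // compute this step's answer
  submit(answer: string): Promise<void>;    // a correct answer advances the step
  isFinished(): Promise<boolean>;           // true once /finish is reached
}

interface LoopResult {
  finished: boolean;
  attempts: number; // number of submissions made
  elapsedMs: number;
}

// open -> (solve step -> submit -> advance)* until finished or out of budget.
async function runLoop(
  session: ChallengeSession,
  maxAttempts: number,
  deadlineMs: number,
  now: () => number = () => Date.now(),
): Promise<LoopResult> {
  const start = now();
  await session.open();
  let attempts = 0;
  while (!(await session.isFinished())) {
    // Stop instead of looping forever once the attempt or time budget is spent.
    if (attempts >= maxAttempts || now() - start > deadlineMs) {
      return { finished: false, attempts, elapsedMs: now() - start };
    }
    const step = await session.currentStep();
    await session.submit(await session.solveStep(step));
    attempts++;
  }
  return { finished: true, attempts, elapsedMs: now() - start };
}

// In-memory stand-in used to exercise the loop without a browser.
class FakeSession implements ChallengeSession {
  private step = 0;
  constructor(private readonly totalSteps: number) {}
  async open(): Promise<void> { this.step = 1; }
  async currentStep(): Promise<number> { return this.step; }
  async solveStep(step: number): Promise<string> { return `code-${step}`; }
  async submit(answer: string): Promise<void> {
    // Only the expected answer advances, mirroring a real challenge page.
    if (answer === `code-${this.step}`) this.step++;
  }
  async isFinished(): Promise<boolean> { return this.step > this.totalSteps; }
}
```

Injecting `now` lets a test drive the deadline with a fake clock instead of waiting five real minutes.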
Requirements: Node >=22.
```bash
npm ci
npx playwright install chromium
npm test
```

Run the solver:
```bash
npm run solve -- --version 3 --headless
```

Override the challenge URL:
```bash
BNC_BASE_URL='https://serene-frangipane-7fd25b.netlify.app' npm run solve -- --version 3 --headless
```

Run only the performance test:

```bash
npx playwright test tests/perf.spec.ts
```

Layout:

- `docs/SPEC.md`: requirements/spec
- `computer-use-agent.status.org`: milestones and verification criteria
- `src/`: agent runner and per-method solvers
- `tests/`: correctness and performance tests (with trace, video, and screenshot on failure)
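The `solve` commands pass `--version` and `--headless` through npm (arguments after `--` go to the script), and `BNC_BASE_URL` overrides the challenge URL. Below is a dependency-free sketch of how a runner might resolve these options. `SolveOptions`, `resolveOptions`, the requirement that `--version` be given, and the `defaultBaseUrl` parameter are all assumptions, since this README does not state the default URL or how the real CLI parses its flags.

```typescript
// Options for the solver CLI; field names are assumptions for this sketch.
interface SolveOptions {
  version: number;   // challenge version, e.g. 3
  headless: boolean; // run Chromium without a window
  baseUrl: string;   // challenge URL
}

// Parses `--version <n>` and `--headless` (the flags used in the solve
// commands) and lets BNC_BASE_URL override the default challenge URL.
function resolveOptions(
  argv: readonly string[],
  env: Readonly<Record<string, string | undefined>>,
  defaultBaseUrl: string,
): SolveOptions {
  let version: number | undefined;
  let headless = false;
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--headless") {
      headless = true;
    } else if (arg === "--version") {
      const value = argv[++i];
      const parsed = Number(value);
      if (value === undefined || !Number.isInteger(parsed)) {
        throw new Error(`--version expects an integer, got ${value}`);
      }
      version = parsed;
    } else {
      throw new Error(`unknown argument: ${arg}`);
    }
  }
  if (version === undefined) {
    throw new Error("--version is required"); // assumption: no default version
  }
  // A non-empty BNC_BASE_URL wins; otherwise fall back to the given default.
  const override = env["BNC_BASE_URL"]?.trim();
  return { version, headless, baseUrl: override ? override : defaultBaseUrl };
}
```

An entry point would call it as `resolveOptions(process.argv.slice(2), process.env, defaultUrl)`. Node's built-in `parseArgs` from `node:util` is an alternative to the hand-rolled loop.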