AI-driven test generation and self-healing with trace-based backend validation -- proving not just that the UI looks right, but that every backend mutation actually happened correctly.
Auto QA Workbench is an autonomous QA testing tool that replaces brittle, coordinate-based E2E scripts with self-healing AI agents. It generates deterministic Playwright TypeScript tests from natural language descriptions, validates backend behavior through OpenTelemetry trace correlation (no direct database connections), and files defects in Jira automatically. It runs fully autonomously in CI/CD pipelines.
- AI-Powered Test Generation -- Describe tests in natural language; a multi-agent LangGraph pipeline (Planner, Executor, Healer, Evaluator) generates production-ready Playwright TypeScript specs
- Self-Healing Tests -- When locators break, a Healer agent re-extracts the DOM accessibility snapshot and reconstructs semantic locators (`getByRole`, `getByLabel`, `getByText`) automatically
- Trace-Based Backend Validation -- Injects OpenTelemetry `traceparent` headers and uses Tracetest to assert that backend mutations (database writes, service calls) actually happened, not just that the UI looks right
- GenAI Application Testing -- Validates AI application internals via OTel GenAI semantic conventions with version-adaptive attribute handling across 3 semconv epochs
- Ephemeral CI Infrastructure -- Testcontainers spins up OTel Collector, Jaeger, and Tracetest on demand; zero permanent infrastructure needed
- Automated Defect Filing -- Failures auto-create Jira tickets with full evidence: root-cause analysis (13-category taxonomy), trace deep links, and reproduction steps
- CI/CD Native -- JUnit XML output, structured JSON/HTML reports, headless mode; designed for pipeline integration
A LangGraph `StateGraph` orchestrates five specialized agents:
| Agent | Role |
|---|---|
| Supervisor | Routes work between agents, manages retries, enforces token budget |
| Planner | Creates step-by-step test plans by probing live DOM via browser accessibility snapshots |
| Executor | Generates deterministic Playwright TypeScript code from test plans |
| Healer | Re-extracts DOM snapshots and reconstructs broken locators using semantic selectors |
| Evaluator | Validates backend mutations via OpenTelemetry trace correlation with Tracetest assertions |
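The Supervisor's control loop can be sketched roughly as follows. This is assumed logic, not the project's implementation: the agent callables, return shapes, and status values are illustrative; only the retry-and-budget responsibilities come from the table above.

```python
# Illustrative Supervisor loop: per-agent retries plus a global token budget.
# Agent functions and their return shapes are assumptions, not the real API.

def run_pipeline(agents, max_retries=3, token_budget=150_000):
    tokens_used = 0
    for name, agent in agents:
        for attempt in range(1, max_retries + 1):
            result = agent()
            tokens_used += result["tokens"]
            if tokens_used > token_budget:
                return {"status": "aborted", "reason": "token budget exceeded"}
            if result["ok"]:
                break  # agent succeeded; move to the next one
        else:
            return {"status": "failed", "agent": name}  # retries exhausted
    return {"status": "passed", "tokens": tokens_used}

# Stub agents that succeed immediately:
agents = [(n, lambda: {"ok": True, "tokens": 500})
          for n in ("planner", "executor", "evaluator")]
print(run_pipeline(agents))
# {'status': 'passed', 'tokens': 1500}
```

The token budget is checked after every agent call, so a runaway agent aborts the whole run rather than silently burning through the budget on retries.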
Each agent uses Playwright MCP Server for browser interaction with separate MCP sessions -- the Executor intentionally starts from a clean browser state to avoid inheriting the Planner's navigated session. This prevents generated tests from silently skipping setup steps (login, navigation) that would fail in CI.
Reports are generated post-graph with three output formats: JUnit XML (CI integration), structured JSON (programmatic consumption), and HTML (human review with trace deep links).
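The JUnit XML output can be produced with nothing but the standard library. A minimal sketch (the function name and result-dict fields are assumptions, not the project's reporting code):

```python
# Minimal JUnit XML emitter sketch using only the standard library.
import xml.etree.ElementTree as ET

def to_junit_xml(suite_name: str, results: list[dict]) -> str:
    failures = sum(1 for r in results if not r["passed"])
    suite = ET.Element("testsuite", name=suite_name,
                       tests=str(len(results)), failures=str(failures))
    for r in results:
        case = ET.SubElement(suite, "testcase", name=r["name"],
                             time=f'{r["seconds"]:.3f}')
        if not r["passed"]:
            # CI systems surface this message next to the failing test case.
            ET.SubElement(case, "failure", message=r["message"])
    return ET.tostring(suite, encoding="unicode")

results = [
    {"name": "verify login page", "passed": True, "seconds": 2.1},
    {"name": "verify checkout", "passed": False, "seconds": 4.0,
     "message": "locator not found"},
]
print(to_junit_xml("auto-qa", results))
```

Because virtually every CI system (Jenkins, GitLab, GitHub Actions via plugins) parses this schema, the same report feeds dashboards without any custom integration.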
- Describe -- Provide a URL and a natural language test specification
- Plan -- The Planner agent navigates the target app, probes the DOM accessibility tree, and builds a step-by-step test plan with semantic locators
- Generate -- The Executor agent converts the plan into a deterministic Playwright TypeScript test file
- Validate -- The Evaluator agent runs the test with OTel trace injection and asserts backend behavior via Tracetest
- Heal -- If locators break on subsequent runs, the Healer agent automatically reconstructs them from fresh accessibility snapshots
- Report -- Results are written as JUnit XML, JSON, and HTML; failures auto-create Jira tickets with root-cause analysis
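The Validate step above hinges on the W3C Trace Context `traceparent` header, which is simple to construct by hand. A minimal sketch (the helper name is illustrative; the header layout follows the W3C spec):

```python
# Building a W3C Trace Context `traceparent` header:
# version "00", a 16-byte trace-id, an 8-byte parent-id, and trace flags.
import secrets

def make_traceparent(sampled: bool = True) -> str:
    trace_id = secrets.token_hex(16)   # 32 lowercase hex chars
    parent_id = secrets.token_hex(8)   # 16 lowercase hex chars
    flags = "01" if sampled else "00"  # 01 = sampled
    return f"00-{trace_id}-{parent_id}-{flags}"

header = make_traceparent()
print(header)  # e.g. 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```

Injecting a known trace-id into the browser's requests is what lets the Evaluator look up exactly the backend spans produced by that UI action, instead of guessing by timestamp.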
- Python 3.11+
- Node.js 18+ (Playwright runtime)
- Docker (Testcontainers)
- Anthropic API key
```bash
# Install
git clone https://github.com/damir-topic/auto-qa-workbench.git
cd auto-qa-workbench
uv venv && source .venv/bin/activate
uv pip install -e .
cp .env.example .env
# Add your ANTHROPIC_API_KEY to .env

# Install Playwright browsers
npx playwright install chromium

# Generate a test
auto-qa run --url https://example.com --spec "Verify the page has a heading"

# Run the generated test
npx playwright test generated/specs/src/
```
```
cli/        # Typer CLI entry point
config/     # Pydantic settings
graph/      # LangGraph agents (supervisor, planner, executor, healer, evaluator)
mcp/        # Playwright MCP client
infra/      # Testcontainers (OTel Collector, Jaeger, Tracetest)
tracetest/  # Tracetest API client
jira/       # Jira REST API integration
genai/      # GenAI semantic convention support
codegen/    # Playwright code generation and validation
reporting/  # JUnit XML, JSON, HTML report generation
```
All settings are managed via environment variables. See `.env.example` for the full list of available options, or check `src/config/settings.py` for detailed field documentation.
| Variable | Required | Default | Description |
|---|---|---|---|
| `ANTHROPIC_API_KEY` | Yes | -- | Anthropic API key for Claude |
| `MODEL_NAME` | No | `claude-sonnet-4-5-20250929` | Model to use for agents |
| `MAX_RETRIES` | No | `3` | Max retry attempts per agent |
| `TOKEN_BUDGET` | No | `150000` | Token budget per run |
| `OUTPUT_DIR` | No | `generated/specs` | Output directory for specs |
| `HEADLESS` | No | `false` | Run browser in headless mode |
Jira integration is disabled by default. See .env.example for Jira configuration variables.
Each run produces a Playwright test file in `generated/specs/`:
```typescript
// generated/specs/verify-login-page.spec.ts
import { test, expect } from '@playwright/test';

test('Verify login page has username and password fields', async ({ page }) => {
  await page.goto('https://example.com/login');
  await expect(page.getByRole('textbox', { name: 'Username' })).toBeVisible();
  await expect(page.getByRole('textbox', { name: 'Password' })).toBeVisible();
  await expect(page.getByRole('button', { name: 'Sign in' })).toBeEnabled();
});
```

Tests use accessibility-driven locators exclusively -- no CSS selectors or XPath.
Apache 2.0 -- see LICENSE for details.
See CONTRIBUTING.md for development setup and guidelines.