Auto QA Workbench

AI-driven test generation and self-healing with trace-based backend validation -- proving not just that the UI looks right, but that every backend mutation actually happened correctly.

License: Apache 2.0 · Python 3.11+

What is this?

Auto QA Workbench is an autonomous QA testing tool that replaces brittle, coordinate-based E2E scripts with self-healing AI agents. It generates deterministic Playwright TypeScript tests from natural language descriptions, validates backend behavior through OpenTelemetry trace correlation (no direct database connections), and files defects in Jira automatically. It runs fully autonomously in CI/CD pipelines.

Key Features

  • AI-Powered Test Generation -- Describe tests in natural language; a multi-agent LangGraph pipeline (Planner, Executor, Healer, Evaluator) generates production-ready Playwright TypeScript specs
  • Self-Healing Tests -- When locators break, a Healer agent re-extracts the DOM accessibility snapshot and reconstructs semantic locators (getByRole, getByLabel, getByText) automatically
  • Trace-Based Backend Validation -- Injects OpenTelemetry traceparent headers and uses Tracetest to assert that backend mutations (database writes, service calls) actually happened, not just that the UI looks right
  • GenAI Application Testing -- Validates AI application internals via OTel GenAI semantic conventions with version-adaptive attribute handling across 3 semconv epochs
  • Ephemeral CI Infrastructure -- Testcontainers spins up OTel Collector, Jaeger, and Tracetest on demand; zero permanent infrastructure needed
  • Automated Defect Filing -- Failures auto-create Jira tickets with full evidence: root-cause analysis (13-category taxonomy), trace deep links, and reproduction steps
  • CI/CD Native -- JUnit XML output, structured JSON/HTML reports, headless mode; designed for pipeline integration

Architecture Overview

A LangGraph StateGraph orchestrates five specialized agents:

Agent       Role
----------  ----------------------------------------------------------------------
Supervisor  Routes work between agents, manages retries, enforces the token budget
Planner     Creates step-by-step test plans by probing the live DOM via browser
            accessibility snapshots
Executor    Generates deterministic Playwright TypeScript code from test plans
Healer      Re-extracts DOM snapshots and reconstructs broken locators using
            semantic selectors
Evaluator   Validates backend mutations via OpenTelemetry trace correlation with
            Tracetest assertions

Each agent uses Playwright MCP Server for browser interaction with separate MCP sessions -- the Executor intentionally starts from a clean browser state to avoid inheriting the Planner's navigated session. This prevents generated tests from silently skipping setup steps (login, navigation) that would fail in CI.
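
The supervisor's routing policy can be sketched in plain Python. This is an illustrative model, not the project's actual StateGraph wiring: the state fields, agent names, and transition map are assumptions chosen to show how retries and a token budget might be enforced at a single routing point.

```python
from dataclasses import dataclass, field

@dataclass
class RunState:
    """Minimal per-run state a supervisor might track (hypothetical fields)."""
    tokens_used: int = 0
    token_budget: int = 150_000
    retries: dict = field(default_factory=dict)
    max_retries: int = 3

def route(state: RunState, last_agent: str, succeeded: bool) -> str:
    """Pick the next node: abort on exhausted budget, retry or heal on
    failure, otherwise advance along the happy path."""
    if state.tokens_used >= state.token_budget:
        return "END"  # budget enforced centrally, regardless of progress
    if not succeeded:
        attempts = state.retries.get(last_agent, 0) + 1
        state.retries[last_agent] = attempts
        if attempts > state.max_retries:
            return "END"  # give up after max_retries failed attempts
        # a failed Executor run is handed to the Healer; others retry in place
        return "healer" if last_agent == "executor" else last_agent
    next_agent = {"planner": "executor", "executor": "evaluator",
                  "healer": "evaluator", "evaluator": "END"}
    return next_agent[last_agent]
```

Centralizing these decisions in one routing function keeps each worker agent simple: agents only report success or failure, and budget or retry policy lives in one place.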

Reports are generated post-graph with three output formats: JUnit XML (CI integration), structured JSON (programmatic consumption), and HTML (human review with trace deep links).
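
The JUnit XML output can be produced entirely with the standard library. The sketch below is a minimal, assumed result shape (`name`, `passed`, `seconds`, `message`) rather than the project's actual reporting code, but it shows the structure CI systems expect.

```python
import xml.etree.ElementTree as ET

def to_junit_xml(results: list[dict]) -> str:
    """Render run results as a minimal JUnit XML document (sketch)."""
    failures = sum(1 for r in results if not r["passed"])
    suite = ET.Element("testsuite", name="auto-qa",
                       tests=str(len(results)), failures=str(failures))
    for r in results:
        case = ET.SubElement(suite, "testcase", name=r["name"],
                             time=f"{r['seconds']:.3f}")
        if not r["passed"]:
            # CI dashboards surface the message attribute; the body can
            # carry longer detail such as a trace deep link
            fail = ET.SubElement(case, "failure", message=r["message"])
            fail.text = r.get("detail", "")
    return ET.tostring(suite, encoding="unicode")

xml = to_junit_xml([
    {"name": "login page shows fields", "passed": True, "seconds": 2.1},
    {"name": "order write reaches DB", "passed": False, "seconds": 4.7,
     "message": "Tracetest assertion failed"},
])
```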

How it Works

  1. Describe -- Provide a URL and a natural language test specification
  2. Plan -- The Planner agent navigates the target app, probes the DOM accessibility tree, and builds a step-by-step test plan with semantic locators
  3. Generate -- The Executor agent converts the plan into a deterministic Playwright TypeScript test file
  4. Validate -- The Evaluator agent runs the test with OTel trace injection and asserts backend behavior via Tracetest
  5. Heal -- If locators break on subsequent runs, the Healer agent automatically reconstructs them from fresh accessibility snapshots
  6. Report -- Results are written as JUnit XML, JSON, and HTML; failures auto-create Jira tickets with root-cause analysis
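
The trace injection in step 4 relies on the W3C Trace Context format. As a sketch (the actual injection mechanism belongs to the tool and the OTel SDK), a `traceparent` header value is four dash-separated fields: version, a 16-byte trace ID, an 8-byte span ID, and trace flags, all lowercase hex.

```python
import os

def make_traceparent(sampled: bool = True) -> str:
    """Build a W3C traceparent header value: version-traceid-spanid-flags."""
    trace_id = os.urandom(16).hex()   # 32 hex chars; must not be all zeros
    span_id = os.urandom(8).hex()     # 16 hex chars; must not be all zeros
    flags = "01" if sampled else "00" # 01 marks the trace as sampled
    return f"00-{trace_id}-{span_id}-{flags}"

header = make_traceparent()
# e.g. "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
```

Because the test generates the trace ID itself, it can later query the trace backend for that exact ID and assert on the backend spans, which is what makes "the database write actually happened" a checkable claim.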

Prerequisites

  • Python 3.11+
  • Node.js 18+ (Playwright runtime)
  • Docker (Testcontainers)
  • Anthropic API key

Quickstart

# Install
git clone https://github.com/damir-topic/auto-qa-workbench.git
cd auto-qa-workbench
uv venv && source .venv/bin/activate
uv pip install -e .
cp .env.example .env
# Add your ANTHROPIC_API_KEY to .env

# Install Playwright browsers
npx playwright install chromium

# Generate a test
auto-qa run --url https://example.com --spec "Verify the page has a heading"

# Run the generated test
npx playwright test generated/specs/

Project Structure

src/
  cli/          # Typer CLI entry point
  config/       # Pydantic settings
  graph/        # LangGraph agents (supervisor, planner, executor, healer, evaluator)
  mcp/          # Playwright MCP client
  infra/        # Testcontainers (OTel Collector, Jaeger, Tracetest)
  tracetest/    # Tracetest API client
  jira/         # Jira REST API integration
  genai/        # GenAI semantic convention support
  codegen/      # Playwright code generation and validation
  reporting/    # JUnit XML, JSON, HTML report generation

Configuration

All settings are managed via environment variables. See .env.example for the full list of available options, or check src/config/settings.py for detailed field documentation.

Variable           Required  Default                     Description
ANTHROPIC_API_KEY  Yes       --                          Anthropic API key for Claude
MODEL_NAME         No        claude-sonnet-4-5-20250929  Model to use for agents
MAX_RETRIES        No        3                           Max retry attempts per agent
TOKEN_BUDGET       No        150000                      Token budget per run
OUTPUT_DIR         No        generated/specs             Output directory for specs
HEADLESS           No        false                       Run browser in headless mode

Jira integration is disabled by default. See .env.example for Jira configuration variables.
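
To make the table concrete, here is a stdlib-only sketch of resolving those variables with their defaults. The project itself uses Pydantic settings (see src/config/settings.py); this illustrative version only shows the required/default semantics from the table above.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    """Settings resolved from environment variables (illustrative sketch)."""
    anthropic_api_key: str
    model_name: str = "claude-sonnet-4-5-20250929"
    max_retries: int = 3
    token_budget: int = 150_000
    output_dir: str = "generated/specs"
    headless: bool = False

def load_settings(env: dict[str, str]) -> Settings:
    key = env.get("ANTHROPIC_API_KEY")
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY is required")
    return Settings(
        anthropic_api_key=key,
        model_name=env.get("MODEL_NAME", Settings.model_name),
        max_retries=int(env.get("MAX_RETRIES", Settings.max_retries)),
        token_budget=int(env.get("TOKEN_BUDGET", Settings.token_budget)),
        output_dir=env.get("OUTPUT_DIR", Settings.output_dir),
        headless=env.get("HEADLESS", "false").lower() == "true",
    )
```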

Generated Output

Each run produces a Playwright test file in generated/specs/:

// generated/specs/verify-login-page.spec.ts
import { test, expect } from '@playwright/test';

test('Verify login page has username and password fields', async ({ page }) => {
  await page.goto('https://example.com/login');
  await expect(page.getByRole('textbox', { name: 'Username' })).toBeVisible();
  await expect(page.getByRole('textbox', { name: 'Password' })).toBeVisible();
  await expect(page.getByRole('button', { name: 'Sign in' })).toBeEnabled();
});

Tests use accessibility-driven locators exclusively -- no CSS selectors or XPath.
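
That invariant is easy to check mechanically. The following is a hypothetical lint-style check (not part of the project's codegen validation) that flags the most common brittle-selector patterns in a generated spec.

```python
import re

# Patterns that indicate brittle selectors a generated spec should not contain.
FORBIDDEN = [
    r"page\.locator\(\s*['\"]//",    # XPath passed to page.locator('//...')
    r"page\.locator\(\s*['\"][.#]",  # CSS class/id selectors ('.foo', '#bar')
    r"page\.\$\(",                   # legacy ElementHandle query API
]

def uses_only_semantic_locators(spec_source: str) -> bool:
    """Return True if the spec source avoids CSS/XPath-style selectors."""
    return not any(re.search(p, spec_source) for p in FORBIDDEN)

good = "await expect(page.getByRole('button', { name: 'Sign in' })).toBeEnabled();"
bad = "await page.locator('#login-form .submit').click();"
```

A check like this can run as a post-generation gate, rejecting any spec that regresses to positional or style-coupled selectors before it reaches CI.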

License

Apache 2.0 -- see LICENSE for details.

Contributing

See CONTRIBUTING.md for development setup and guidelines.
