
   ███████╗██████╗ ███████╗ ██████╗██╗███████╗██╗   ██╗
   ██╔════╝██╔══██╗██╔════╝██╔════╝██║██╔════╝╚██╗ ██╔╝
   ███████╗██████╔╝█████╗  ██║     ██║█████╗   ╚████╔╝
   ╚════██║██╔═══╝ ██╔══╝  ██║     ██║██╔══╝    ╚██╔╝
   ███████║██║     ███████╗╚██████╗██║██║        ██║
   ╚══════╝╚═╝     ╚══════╝ ╚═════╝╚═╝╚═╝        ╚═╝

   Write specs. Validate behavior. Ship with evidence.

Specify turns functional requirements into machine-verifiable specs and runs an autonomous agent against them. Define what your app should do — pages, flows, assertions, API contracts — and Specify tells you what's met, what's not, and what's untested. Every assertion shows its work: expected value, actual value, raw output.

Cooperative QA: the agent runs, you watch the activity stream in the browser, flag what looks wrong, and the next run remembers. Per-spec memory, session transcripts, and a confidence model accumulate into mined skills the agent replays automatically.

No opinions about your test framework. No lock-in. Just structured truth.


Screenshot: Specify review webapp — narrative, activity stream, learned skills


Install

npm install
npm run build
(cd webapp && npm install && npm run build)   # builds the review UI into dist/webapp

The wrapper script at ./specify auto-builds on first run.

Quickstart

# 1. Generate a contract from existing capture data (or run `specify capture` first)
specify spec generate --input ./captures/my-app --output app.spec.yaml

# 2. Verify the implementation
specify verify --spec app.spec.yaml --url http://localhost:3000

# 3. Review results in the browser — flag what looks wrong, the next run remembers
specify review --spec app.spec.yaml

specify review opens the webapp shown above. Click any timeline event to flag it; flags become observations the agent reads as preamble next run.

Commands

| Command | What |
|---------|------|
| spec generate | Generate a spec from a capture directory |
| capture | Agent-driven capture from a live system (--url) or code (--from code) |
| compare | Live side-by-side comparison of remote vs local targets |
| review | Browser UI: narrative, activity stream, feedback, skill drafts |
| verify | Verify against a live target (--url) — emits a structured report |
| replay | Replay captured traffic against a target and diff results |
| impersonate | Spin up a MockServer container that impersonates the captured system |
| lint / spec lint | Structural validation (no captures needed) |
| spec guide | Authoring guide for LLM spec writers |
| schema | Emit JSON Schema for spec, report, or commands |
| mcp | MCP server — any LLM client can use Specify as a tool |
| daemon | Long-running HTTP inbox; other agents push verify/capture/compare jobs |
| serve / ui / ui start / ui stop | Lower-level review-UI controls |
| human | Interactive wizard / REPL / TUI dashboard |
| clean | Remove generated reports, agent output, and *.review.html files |

Run specify <cmd> --help for full flags. Source: src/cli/commands-manifest.ts.

Reports you can trust

Every validation report includes expected vs actual evidence for every assertion. No "100% passed, trust me" — you get the raw output, the exact match, and the assertion logic.

Formats: JSON (machine), Markdown (diff-friendly), HTML (interactive, filterable, single file).

| Status | Type           | Expected          | Actual                              |
|--------|----------------|-------------------|-------------------------------------|
| ✅     | text_contains  | spec validate     | ..."name": "spec validate", ...     |
| ✅     | json_path      | 0.1.0             | 0.1.0                               |
| ❌     | json_schema    | matches schema    | /items: must have >= 5 items        |
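
In the JSON format, each assertion carries that evidence inline. A minimal sketch of one row (the field names here are assumptions; run specify schema to see the real report shape):

{
  "type": "json_path",
  "status": "passed",
  "expected": "0.1.0",
  "actual": "0.1.0",
  "evidence": "{\"version\": \"0.1.0\"}"
}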

The learning loop

Specify is more than a one-shot verifier. Every run reads, writes, and refines state under <spec_dir>/.specify/:

.specify/
  memory/<spec_id>/<target_key>.json   # learned rows: quirks, playbooks, observations
  sessions.db                          # SQLite + FTS5 transcripts of every session
  confidence.json                      # accept/override tally per behavior
  specify.observations.yaml            # per-spec observations (user feedback + reflection)
  skill-drafts/<id>.md                 # mined-pattern → SKILL.md draft, pending review
  skills/<name>/SKILL.md               # approved skills, replayed in future runs
  prompts/<id>.md                      # versioned, evolved system prompts
  verify/verify-result.json            # latest agent run result

Memory rows (src/agent/memory-provider.ts, src/agent/memory.ts) persist across runs, scoped strictly by (spec_id, target_key) so staging and prod never cross-contaminate. The agent injects them into the next prompt as a preamble; subsequent runs read/update via memory_record + memory_list MCP tools.
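
A memory file is small and inspectable. A hypothetical row (file name and fields are illustrative; the real shape lives in src/agent/memory.ts and may differ):

memory/my-app/localhost-3000.json:
{
  "rows": [
    {
      "kind": "quirk",
      "text": "GET /api/v1/stats returns 202 on cold start; retry once before asserting.",
      "source": "reflection"
    }
  ]
}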

Three context layers (src/agent/memory-layers.ts) are merged into every system prompt: user (~/.specify/memory.md), project (SPECIFY.md or CLAUDE.md), and per-spec (specify.observations.yaml). Missing layers are silently skipped.

Sessions store (src/agent/session-store.ts) indexes every event in SQLite with FTS5 so the agent (and you) can search prior runs by content.
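
Since it's plain SQLite, you can also grep past runs from the shell. The table and column names below are assumptions; inspect the real schema with .schema first:

sqlite3 .specify/sessions.db \
  "SELECT session_id, snippet(events_fts, -1, '[', ']', '…', 8)
   FROM events_fts WHERE events_fts MATCH 'timeout' LIMIT 10;"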

Confidence model (src/agent/confidence-store.ts) tallies accept vs override per behavior id. The autonomy preset (ask_everything / ask_uncertain / autonomous) decides whether to ask before flagging, run silently, or skip.
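
Conceptually, confidence.json is just that tally keyed by behavior id, something like (keys and shape are illustrative):

{
  "dashboard-sidebar-visible": { "accept": 7, "override": 1 },
  "login-redirect": { "accept": 2, "override": 3 }
}

A behavior you keep overriding drops in confidence, so under ask_uncertain the agent will tend to ask before flagging it again.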

Pattern miner → skill drafts (src/agent/pattern-miner.ts, src/agent/skill-synthesizer.ts) walks the session corpus, extracts recurring (role, kind) n-grams, and emits draft SKILL.md files. You approve or reject in the webapp; approved drafts move to .specify/skills/<name>/SKILL.md and are injected as a preamble in future runs.
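
A draft is an ordinary markdown file you can read before approving. A hypothetical example (structure is an assumption):

.specify/skill-drafts/retry-cold-start.md:

# SKILL: Retry stats endpoint on cold start
Mined from 4 sessions (pattern: tool/http_request followed by tool/retry).
When GET /api/v1/stats returns 202 on first hit, wait and refetch once
before asserting on the response body.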

Prompt evolution loop (src/agent/prompt-evolution.ts) folds high-confidence observations and frequently-overridden behaviors into a "lessons learned" preamble. Pure text + deterministic by default; if a Python script lives at scripts/evolve-prompt.py, it's used as an optional DSPy/GEPA-style optimiser. Evolved prompts are versioned under .specify/prompts/.
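
The output is readable text, not weights. An evolved preamble might read (illustrative only):

## Lessons learned
- nav.sidebar visibility was overridden 3 times at narrow viewports; check viewport width before flagging.
- /api/v1/stats is eventually consistent on cold start; retry once before asserting.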

Optional dialectic provider (src/agent/honcho-provider.ts) — when HONCHO_URL is set, an external dialectic user-model service is used instead of the file-backed memory provider. Optional env vars: HONCHO_APP (default specify), HONCHO_USER (default $USER), HONCHO_TOKEN. Without those vars, Specify uses the file-backed provider.
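
For example, to point a verify run at a dialectic service (the URL is a placeholder):

HONCHO_URL=https://honcho.example.com \
HONCHO_APP=specify \
HONCHO_USER=$USER \
specify verify --spec app.spec.yaml --url http://localhost:3000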

Cooperative QA via the review webapp

specify review --spec app.spec.yaml boots a Hono server with a React UI. The UI subscribes to a WebSocket of agent events and lets you flag rows inline.

Screenshot: Activity stream with cooperative-QA feedback form

Each flag is one of: note, important_pattern, missed_check, false_positive, ignore_pattern, file_bug. Behaviour (src/agent/feedback.ts):

  • writes an observation into specify.observations.yaml with source: user_feedback and the originating session id (see the sketch after this list)
  • updates the confidence store (important_pattern / file_bug reinforce; missed_check / false_positive / ignore_pattern override)
  • on file_bug, best-effort spawns bd create if available
  • on important_pattern, publishes a feedback:ingested event so the sibling-check propagator pre-flags similar rows in the same session
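
For instance, a false_positive flag lands in specify.observations.yaml roughly like this (only source and the session id are documented above; the other fields are assumptions):

observations:
  - source: user_feedback
    session: sess_9f3a
    flag: false_positive
    note: "Sidebar is intentionally hidden on /settings; stop flagging element_exists there."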

Approved skill drafts surface in a dedicated panel:

Screenshot: Learned skills panel with mined pending draft

MCP — use Specify from any LLM

# Local (stdio)
specify mcp

# Remote (HTTP)
specify mcp --http --port 8080

Claude Desktop / Cursor / Claude Code config:

{ "mcpServers": { "specify": { "command": "specify", "args": ["mcp"] } } }

Tools exposed include spec authoring helpers and bridge tools for the daemon (daemon_verify, daemon_submit, daemon_status).

Daemon — background agent

Run Specify as a long-lived background process. Idle = 0 tokens. Other agents (or chat bots, webhooks, CI runners) push jobs into an HTTP inbox; each job spawns an Agent SDK run, streams progress, and writes its structured result to disk.

specify daemon --port 4100
# → listens on 127.0.0.1:4100
# → writes a bearer token to ~/.specify/daemon.token on first start

Submit a verify job from any agent:

TOKEN=$(cat ~/.specify/daemon.token)

curl -s -H "Authorization: Bearer $TOKEN" \
     -H 'Content-Type: application/json' \
     -d '{"spec":"/abs/path/spec.yaml","url":"http://localhost:3000"}' \
     http://127.0.0.1:4100/verify
# → {"id":"msg_ab12","status":"queued","stream":"/inbox/msg_ab12/stream"}

# Stream agent events for this message (SSE)
curl -N -H "Authorization: Bearer $TOKEN" \
     http://127.0.0.1:4100/inbox/msg_ab12/stream

# Poll the final result (includes path to on-disk verify-result.json)
curl -s -H "Authorization: Bearer $TOKEN" \
     http://127.0.0.1:4100/inbox/msg_ab12

Endpoints (all require Authorization: Bearer <token> except /health):

| Method | Path | Purpose |
|--------|------|---------|
| GET | /health | Liveness + active session count |
| POST | /verify | {spec, url} shorthand |
| POST | /capture | {url} shorthand |
| POST | /inbox | Generic: {task, prompt, spec?, url?, mode?, session?} |
| GET | /inbox | Recent messages |
| GET | /inbox/:id | Status + result + resultPath |
| GET | /inbox/:id/stream | SSE stream of agent events |
| GET | /events/stream | SSE stream of all daemon events |
| GET | /sessions | Active persistent sessions |
| POST | /sessions/:id/close | Close a persistent session |

Dispatch modes:

  • stateless (default) — fresh SDK run per message, bounded cost. Concurrent jobs run in forked worker processes up to --max-workers (default 2), each with its own Playwright/Chromium.
  • attach — injects into a persistent SDK session keyed by session. Holds context across messages; idle still uses 0 tokens. Always in-process, serial per session (example below).
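
For example, to keep one session warm across related checks (the task, prompt, and session values here are illustrative):

curl -s -H "Authorization: Bearer $TOKEN" \
     -H 'Content-Type: application/json' \
     -d '{"task":"verify","prompt":"Re-check only the flows flagged last run","spec":"/abs/path/spec.yaml","mode":"attach","session":"qa-loop"}' \
     http://127.0.0.1:4100/inbox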

Live inspector: GET / on the daemon serves a zero-build HTML page that streams agent events, lists recent messages, and shows structured results. Prompts for the token on first load.

Deploy as a QA agent in Kubernetes

Specify ships a container image and a Terraform module so it can run as a long-lived QA agent inside a cluster. One pod per spec, PVC-backed memory that survives restarts, and pluggable triggers (k8s informer, webhook, or both).

module "qa" {
  source = "github.com/gm2211/specify//deploy/terraform/modules/specify-qa?ref=main"

  name      = "renzo-qa"
  namespace = "qa"

  target_url  = "http://renzo.app.svc.cluster.local:8080"
  spec_inline = file("${path.module}/specify.spec.yaml")

  discovery = { mode = "watch", namespaces = ["app"] }

  report_slack_webhook     = var.slack_webhook_url
  anthropic_api_key_secret = "anthropic-api-key"
}

| Group | Pick exactly one |
|-------|------------------|
| Target | target_url · target_dns · target_cluster_ip · target_from_configmap |
| Spec | spec_inline · spec_url (+ optional bearer) · spec_git |
| Discovery | webhook (default) · watch · both · none |
| Reports | report_file_dir (default) + optional report_slack_webhook |

Self-describing install for agents. specify deploy describe --format=json emits a structured manifest: image coordinates, module ref, oneof groups, required Secrets, outputs, and an agent_install_recipe. Drop specify deploy print-tf <preset> into a consumer repo for a working .tf skeleton (minimal, watch-mode, webhook-mode, gitops-spec).

specify deploy describe --format=json | jq .
specify deploy print-tf watch-mode > specify-qa.tf

Worked examples live in deploy/terraform/examples/: minimal, watch-mode, gitops-spec. Each example is a runnable terraform apply directory with a per-example README.

The pod's /work PVC keeps everything the daemon learns:

| Path | Content |
|------|---------|
| /work/.specify/memory/<spec_id>/<target>.json | learned memory rows |
| /work/.specify/sessions.db | session SQLite + FTS5 |
| /work/.specify/skill-drafts/ | mined skills awaiting approval |
| /work/.specify/skills/ | active skills replayed each run |
| /work/reports/ | per-run JSON reports (file sink) |
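
Because everything lives on the PVC, you can inspect learned state in place. With the module example above (assuming the workload is a Deployment named after the module's name input):

kubectl -n qa exec deploy/renzo-qa -- ls /work/.specify/skills
kubectl -n qa exec deploy/renzo-qa -- cat /work/.specify/confidence.json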

See deploy/terraform/modules/specify-qa/README.md for the full input / output reference.

Spec format

YAML or JSON. Human-readable, machine-verifiable.

version: "1.0"
name: "My App"
description: "Behavioral contract for My App"

pages:
  - id: dashboard
    path: /dashboard
    title: "Dashboard"
    visual_assertions:
      - type: element_exists
        selector: "nav.sidebar"
        description: "Navigation sidebar is present"
    expected_requests:
      - method: GET
        url_pattern: "/api/v1/stats"
    scenarios:
      - id: user-login
        description: "User logs in and sees dashboard"
        steps:
          - action: fill
            selector: "#email"
            value: "{{email}}"
          - action: click
            selector: "button[type=submit]"
          - action: wait_for_navigation
            url_pattern: "/dashboard"
          - action: assert_visible
            selector: ".welcome-message"

variables:
  base_url: "${TARGET_BASE_URL}"
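
The ${TARGET_BASE_URL} reference resolves from the environment at run time, so one spec can point at any deployment. Assuming standard shell-style substitution:

export TARGET_BASE_URL=https://staging.example.com
specify verify --spec app.spec.yaml --url "$TARGET_BASE_URL"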

Self-verifying

Specify eats its own dogfood. The repo includes specify.spec.yaml — a spec for Specify itself — validated on every release.

License

GPL-3.0