███████╗██████╗ ███████╗ ██████╗██╗███████╗██╗ ██╗
██╔════╝██╔══██╗██╔════╝██╔════╝██║██╔════╝╚██╗ ██╔╝
███████╗██████╔╝█████╗ ██║ ██║█████╗ ╚████╔╝
╚════██║██╔═══╝ ██╔══╝ ██║ ██║██╔══╝ ╚██╔╝
███████║██║ ███████╗╚██████╗██║██║ ██║
╚══════╝╚═╝ ╚══════╝ ╚═════╝╚═╝╚═╝ ╚═╝
Write specs. Validate behavior. Ship with evidence.
Specify turns functional requirements into machine-verifiable specs and runs an autonomous agent against them. Define what your app should do — pages, flows, assertions, API contracts — and Specify tells you what's met, what's not, and what's untested. Every assertion shows its work: expected value, actual value, raw output.
Cooperative QA: the agent runs, you watch the activity stream in the browser, flag what looks wrong, and the next run remembers. Per-spec memory, session transcripts, and a confidence model accumulate into mined skills the agent replays automatically.
No opinions about your test framework. No lock-in. Just structured truth.
npm install
npm run build
(cd webapp && npm install && npm run build) # builds the review UI into dist/webapp
The wrapper script at ./specify auto-builds on first run.
# 1. Generate a contract from existing capture data (or run `specify capture` first)
specify spec generate --input ./captures/my-app --output app.spec.yaml
# 2. Verify the implementation
specify verify --spec app.spec.yaml --url http://localhost:3000
# 3. Review results in the browser — flag what looks wrong, the next run remembers
specify review --spec app.spec.yaml
specify review opens the review webapp. Click any timeline event to flag
it; flags become observations the agent reads as preamble next run.
| Command | What |
|---|---|
| spec generate | Generate a spec from a capture directory |
| capture | Agent-driven capture from a live system (--url) or code (--from code) |
| compare | Live side-by-side comparison of remote vs local targets |
| review | Browser UI: narrative, activity stream, feedback, skill drafts |
| verify | Verify against a live target (--url); emits a structured report |
| replay | Replay captured traffic against a target and diff results |
| impersonate | Spin up a MockServer container that impersonates the captured system |
| lint / spec lint | Structural validation (no captures needed) |
| spec guide | Authoring guide for LLM spec writers |
| schema | Emit JSON Schema for spec, report, or commands |
| mcp | MCP server: any LLM client can use Specify as a tool |
| daemon | Long-running HTTP inbox; other agents push verify/capture/compare jobs |
| serve / ui / ui start / ui stop | Lower-level review-UI controls |
| human | Interactive wizard / REPL / TUI dashboard |
| clean | Remove generated reports, agent output, and *.review.html files |
Run specify <cmd> --help for full flags. Source: src/cli/commands-manifest.ts.
Every validation report includes expected vs actual evidence for every assertion. No "100% passed, trust me" — you get the raw output, the exact match, and the assertion logic.
Formats: JSON (machine), Markdown (diff-friendly), HTML (interactive, filterable, single file).
| Status | Type | Expected | Actual |
|--------|----------------|-------------------|-------------------------------------|
| ✅ | text_contains | spec validate | ..."name": "spec validate", ... |
| ✅ | json_path | 0.1.0 | 0.1.0 |
| ❌ | json_schema | matches schema | /items: must have >= 5 items |
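As a rough illustration of the evidence each assertion carries, the sketch below models one result and renders it as a row like those in the table above. The `AssertionResult` shape and the `renderRow` helper are hypothetical names chosen for this example, not Specify's actual report schema (`specify schema` emits the real one).

```typescript
// Hypothetical shape for one assertion's evidence; the real report schema
// may differ (see `specify schema`).
interface AssertionResult {
  passed: boolean;
  type: string;
  expected: string;
  actual: string;
}

// Render one result as a Markdown table row, mirroring the table above.
function renderRow(r: AssertionResult): string {
  const mark = r.passed ? "✅" : "❌";
  return `| ${mark} | ${r.type} | ${r.expected} | ${r.actual} |`;
}

const row = renderRow({
  passed: false,
  type: "json_schema",
  expected: "matches schema",
  actual: "/items: must have >= 5 items",
});
console.log(row);
```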
Specify is more than a one-shot verifier. Every run reads, writes, and refines
state under <spec_dir>/.specify/:
.specify/
  memory/<spec_id>/<target_key>.json # learned rows: quirks, playbooks, observations
  sessions.db # SQLite + FTS5 transcripts of every session
  confidence.json # accept/override tally per behavior
  specify.observations.yaml # per-spec observations (user feedback + reflection)
  skill-drafts/<id>.md # mined-pattern → SKILL.md draft, pending review
  skills/<name>/SKILL.md # approved skills, replayed in future runs
  prompts/<id>.md # versioned, evolved system prompts
  verify/verify-result.json # latest agent run result
Memory rows (src/agent/memory-provider.ts, src/agent/memory.ts)
persist across runs, scoped strictly by (spec_id, target_key) so staging and
prod never cross-contaminate. The agent injects them into the next prompt as a
preamble; subsequent runs read/update via memory_record + memory_list MCP
tools.
Three context layers (src/agent/memory-layers.ts)
are merged into every system prompt: user (~/.specify/memory.md), project
(SPECIFY.md or CLAUDE.md), and per-spec (specify.observations.yaml).
Missing layers are silently skipped.
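A minimal sketch of the merge described above, assuming the layers are concatenated in user, project, per-spec order with a small heading per layer. The `Layer` type, the `mergeLayers` helper, and the heading format are all illustrative; the actual logic lives in src/agent/memory-layers.ts and may differ.

```typescript
// Hypothetical layer record: content is undefined when the file is missing.
type Layer = { label: string; content: string | undefined };

// Concatenate the layers that exist; missing or empty ones are skipped
// without raising an error.
function mergeLayers(layers: Layer[]): string {
  return layers
    .filter((l) => l.content !== undefined && l.content.trim() !== "")
    .map((l) => `## ${l.label}\n${l.content!.trim()}`)
    .join("\n\n");
}

const merged = mergeLayers([
  { label: "user", content: "Prefer short reports." },
  { label: "project", content: undefined }, // no SPECIFY.md or CLAUDE.md: skipped
  { label: "per-spec", content: "Login button is slow on staging." },
]);
console.log(merged);
```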
Sessions store (src/agent/session-store.ts)
indexes every event in SQLite with FTS5 so the agent (and you) can search prior
runs by content.
Confidence model (src/agent/confidence-store.ts)
tallies accept vs override per behavior id. The autonomy preset
(ask_everything / ask_uncertain / autonomous) decides whether to ask
before flagging, run silently, or skip.
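One way to picture the tally and the preset is sketched below. The `Tally` shape, the 0.8 threshold, and the rule that `ask_uncertain` asks only when past acceptance is low are assumptions made for illustration; the source does not spell out the actual policy, which lives in src/agent/confidence-store.ts.

```typescript
type Preset = "ask_everything" | "ask_uncertain" | "autonomous";

// Hypothetical per-behavior tally of user decisions.
interface Tally {
  accept: number;
  override: number;
}

// Fraction of past decisions the user accepted; 0.5 when there is no history.
function confidence(t: Tally): number {
  const total = t.accept + t.override;
  return total === 0 ? 0.5 : t.accept / total;
}

// Assumed policy: 0.8 is an illustrative threshold, not a Specify default.
function shouldAsk(preset: Preset, t: Tally, threshold = 0.8): boolean {
  if (preset === "ask_everything") return true;
  if (preset === "autonomous") return false;
  return confidence(t) < threshold;
}

console.log(shouldAsk("ask_uncertain", { accept: 9, override: 1 })); // false
console.log(shouldAsk("ask_uncertain", { accept: 1, override: 3 })); // true
```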
Pattern miner → skill drafts
(src/agent/pattern-miner.ts,
src/agent/skill-synthesizer.ts)
walks the session corpus, extracts recurring (role, kind) n-grams, and emits
draft SKILL.md files. You approve or reject in the webapp; approved drafts
move to .specify/skills/<name>/SKILL.md and are injected as a preamble in
future runs.
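The n-gram extraction step can be sketched roughly as follows. The `SessionEvent` shape, the key format, and the `minCount` cutoff are hypothetical; the real miner in src/agent/pattern-miner.ts may window, weight, or filter events differently.

```typescript
// Hypothetical event record: only the two fields the miner keys on.
interface SessionEvent {
  role: string;
  kind: string;
}

// Count every contiguous run of n events, keyed by its (role, kind) sequence.
function countNgrams(events: SessionEvent[], n: number): Map<string, number> {
  const counts = new Map<string, number>();
  for (let i = 0; i + n <= events.length; i++) {
    const key = events
      .slice(i, i + n)
      .map((e) => `${e.role}:${e.kind}`)
      .join(" > ");
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Keep only the n-grams that occur at least minCount times.
function recurring(counts: Map<string, number>, minCount: number): string[] {
  const keys: string[] = [];
  counts.forEach((count, key) => {
    if (count >= minCount) keys.push(key);
  });
  return keys;
}

const sessionEvents: SessionEvent[] = [
  { role: "agent", kind: "navigate" },
  { role: "agent", kind: "click" },
  { role: "agent", kind: "assert" },
  { role: "agent", kind: "navigate" },
  { role: "agent", kind: "click" },
];
const patterns = recurring(countNgrams(sessionEvents, 2), 2);
console.log(patterns); // the only bigram seen twice is navigate followed by click
```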
Prompt evolution loop
(src/agent/prompt-evolution.ts)
folds high-confidence observations and frequently-overridden behaviors into a
"lessons learned" preamble. Pure text + deterministic by default; if a Python
script lives at scripts/evolve-prompt.py, it's used as an optional
DSPy/GEPA-style optimiser. Evolved prompts are versioned under
.specify/prompts/.
Optional dialectic provider
(src/agent/honcho-provider.ts) —
when HONCHO_URL is set, an external dialectic user-model service is used
instead of the file-backed memory provider. Optional env vars:
HONCHO_APP (default specify), HONCHO_USER (default $USER),
HONCHO_TOKEN. Without those vars, Specify uses the file-backed provider.
specify review --spec app.spec.yaml boots a Hono server with a React UI.
The UI subscribes to a WebSocket of agent events and lets you flag rows inline.
Each flag is one of: note, important_pattern, missed_check,
false_positive, ignore_pattern, file_bug. Behaviour
(src/agent/feedback.ts):
- writes an observation into specify.observations.yaml with source: user_feedback and the originating session id
- updates the confidence store (important_pattern / file_bug reinforce; missed_check / false_positive / ignore_pattern override)
- on file_bug, best-effort spawns bd create if available
- on important_pattern, publishes a feedback:ingested event so the sibling-check propagator pre-flags similar rows in the same session
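The flag-to-confidence mapping in the list above can be sketched as a small lookup. The flag names come from the text; the `Effect` type, the treatment of `note` as having no confidence effect, and the idea that reinforce and override each bump one counter are assumptions for illustration, not the behaviour of src/agent/feedback.ts.

```typescript
type FlagKind =
  | "note"
  | "important_pattern"
  | "missed_check"
  | "false_positive"
  | "ignore_pattern"
  | "file_bug";

type Effect = "reinforce" | "override" | "none";

// Map each flag to its effect on the confidence store.
function confidenceEffect(kind: FlagKind): Effect {
  switch (kind) {
    case "important_pattern":
    case "file_bug":
      return "reinforce";
    case "missed_check":
    case "false_positive":
    case "ignore_pattern":
      return "override";
    case "note":
      return "none"; // assumption: the text lists no confidence effect for note
  }
}

// Hypothetical tally update: one counter per effect.
interface FlagTally {
  accept: number;
  override: number;
}

function applyFlag(t: FlagTally, kind: FlagKind): FlagTally {
  const effect = confidenceEffect(kind);
  return {
    accept: t.accept + (effect === "reinforce" ? 1 : 0),
    override: t.override + (effect === "override" ? 1 : 0),
  };
}

const afterFlag = applyFlag({ accept: 2, override: 0 }, "false_positive");
console.log(afterFlag);
```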
Approved skill drafts surface in a dedicated panel.
# Local (stdio)
specify mcp
# Remote (HTTP)
specify mcp --http --port 8080
Claude Desktop / Cursor / Claude Code config:
{ "mcpServers": { "specify": { "command": "specify", "args": ["mcp"] } } }
Tools exposed include spec authoring helpers and bridge tools for the daemon
(daemon_verify, daemon_submit, daemon_status).
Run Specify as a long-lived background process. Idle = 0 tokens. Other agents (or chat bots, webhooks, CI runners) push jobs into an HTTP inbox; each job spawns an Agent SDK run, streams progress, and writes its structured result to disk.
specify daemon --port 4100
# → listens on 127.0.0.1:4100
# → writes a bearer token to ~/.specify/daemon.token on first start
Submit a verify job from any agent:
TOKEN=$(cat ~/.specify/daemon.token)
curl -s -H "Authorization: Bearer $TOKEN" \
-H 'Content-Type: application/json' \
-d '{"spec":"/abs/path/spec.yaml","url":"http://localhost:3000"}' \
http://127.0.0.1:4100/verify
# → {"id":"msg_ab12","status":"queued","stream":"/inbox/msg_ab12/stream"}
# Stream agent events for this message (SSE)
curl -N -H "Authorization: Bearer $TOKEN" \
http://127.0.0.1:4100/inbox/msg_ab12/stream
# Poll the final result (includes path to on-disk verify-result.json)
curl -s -H "Authorization: Bearer $TOKEN" \
http://127.0.0.1:4100/inbox/msg_ab12
Endpoints (all require Authorization: Bearer <token> except /health):
| Method | Path | Purpose |
|---|---|---|
| GET | /health | Liveness + active session count |
| POST | /verify | {spec, url} shorthand |
| POST | /capture | {url} shorthand |
| POST | /inbox | Generic: {task, prompt, spec?, url?, mode?, session?} |
| GET | /inbox | Recent messages |
| GET | /inbox/:id | Status + result + resultPath |
| GET | /inbox/:id/stream | SSE stream of agent events |
| GET | /events/stream | SSE stream of all daemon events |
| GET | /sessions | Active persistent sessions |
| POST | /sessions/:id/close | Close a persistent session |
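For agents written in TypeScript, submitting to POST /verify might look like the sketch below. It only builds the request so it can be inspected offline; `buildVerifyRequest`, `VerifyJob`, and the placeholder token are hypothetical names for this example, while the path, headers, and body fields follow the curl call and table above.

```typescript
// Body fields accepted by the /verify shorthand, per the table above.
interface VerifyJob {
  spec: string;
  url: string;
}

// Build the request for POST /verify without sending it.
function buildVerifyRequest(baseUrl: string, token: string, job: VerifyJob) {
  return {
    url: `${baseUrl}/verify`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(job),
    },
  };
}

const req = buildVerifyRequest("http://127.0.0.1:4100", "example-token", {
  spec: "/abs/path/spec.yaml",
  url: "http://localhost:3000",
});
console.log(req.url, req.init.body);

// To actually submit (requires a running daemon and the real token):
//   const res = await fetch(req.url, req.init);
//   const queued = await res.json(); // expected fields: id, status, stream
```

Keeping request construction separate from the network call makes the payload easy to check in unit tests without a daemon running.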
Dispatch modes:
- stateless (default): fresh SDK run per message, bounded cost. Concurrent jobs run in forked worker processes up to --max-workers (default 2), each with its own Playwright/Chromium.
- attach: injects into a persistent SDK session keyed by session. Holds context across messages; idle still uses 0 tokens. Always in-process, serial per session.
Live inspector: GET / on the daemon serves a zero-build HTML page
that streams agent events, lists recent messages, and shows structured
results. Prompts for the token on first load.
Specify ships a container image and a Terraform module so it can run as a long-lived QA agent inside a cluster. One pod per spec, PVC-backed memory that survives restarts, and pluggable triggers (k8s informer, webhook, or both).
module "qa" {
  source = "github.com/gm2211/specify//deploy/terraform/modules/specify-qa?ref=main"
  name = "renzo-qa"
  namespace = "qa"
  target_url = "http://renzo.app.svc.cluster.local:8080"
  spec_inline = file("${path.module}/specify.spec.yaml")
  discovery = { mode = "watch", namespaces = ["app"] }
  report_slack_webhook = var.slack_webhook_url
  anthropic_api_key_secret = "anthropic-api-key"
}
| Group | Pick exactly one |
|---|---|
| Target | target_url · target_dns · target_cluster_ip · target_from_configmap |
| Spec | spec_inline · spec_url (+ optional bearer) · spec_git |
| Discovery | webhook (default) · watch · both · none |
| Reports | report_file_dir (default) + optional report_slack_webhook |
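The "pick exactly one" rule for the Target and Spec groups can be checked before running terraform with something like the sketch below. The input names come from the table; `checkExactlyOne` and the pre-flight approach itself are hypothetical, and the module may enforce the rule differently (or with different error text).

```typescript
type ModuleInputs = Record<string, unknown>;

// Names from `group` that are actually set in `inputs`.
function chosen(inputs: ModuleInputs, group: string[]): string[] {
  return group.filter((key) => inputs[key] !== undefined && inputs[key] !== null);
}

// Return null when exactly one member of the group is set, else a message.
function checkExactlyOne(inputs: ModuleInputs, label: string, group: string[]): string | null {
  const set = chosen(inputs, group);
  if (set.length === 1) return null;
  return `${label}: expected exactly one of [${group.join(", ")}], found ${set.length}`;
}

const TARGET_GROUP = ["target_url", "target_dns", "target_cluster_ip", "target_from_configmap"];
const SPEC_GROUP = ["spec_inline", "spec_url", "spec_git"];

const moduleInputs: ModuleInputs = {
  target_url: "http://renzo.app.svc.cluster.local:8080",
  spec_inline: "version: \"1.0\"",
};

const problems = [
  checkExactlyOne(moduleInputs, "Target", TARGET_GROUP),
  checkExactlyOne(moduleInputs, "Spec", SPEC_GROUP),
].filter((p): p is string => p !== null);

console.log(problems.length === 0 ? "inputs ok" : problems.join("\n"));
```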
Self-describing install for agents. specify deploy describe --format=json
emits a structured manifest: image coordinates, module ref, oneof groups,
required Secrets, outputs, and an agent_install_recipe. Run specify deploy print-tf <preset> and drop the output into a consumer repo for a working .tf
skeleton (presets: minimal, watch-mode, webhook-mode, gitops-spec).
specify deploy describe --format=json | jq .
specify deploy print-tf watch-mode > specify-qa.tf
Worked examples live in deploy/terraform/examples/: minimal, watch-mode, gitops-spec. Each example is a
runnable terraform apply directory with a per-example README.
The pod's /work PVC keeps everything the daemon learns:
| Path | Content |
|---|---|
| /work/.specify/memory/<spec_id>/<target>.json | learned memory rows |
| /work/.specify/sessions.db | session SQLite + FTS5 |
| /work/.specify/skill-drafts/ | mined skills awaiting approval |
| /work/.specify/skills/ | active skills replayed each run |
| /work/reports/ | per-run JSON reports (file sink) |
See deploy/terraform/modules/specify-qa/README.md
for the full input / output reference.
YAML or JSON. Human-readable, machine-verifiable.
version: "1.0"
name: "My App"
description: "Behavioral contract for My App"
pages:
  - id: dashboard
    path: /dashboard
    title: "Dashboard"
    visual_assertions:
      - type: element_exists
        selector: "nav.sidebar"
        description: "Navigation sidebar is present"
    expected_requests:
      - method: GET
        url_pattern: "/api/v1/stats"
scenarios:
  - id: user-login
    description: "User logs in and sees dashboard"
    steps:
      - action: fill
        selector: "#email"
        value: "{{email}}"
      - action: click
        selector: "button[type=submit]"
      - action: wait_for_navigation
        url_pattern: "/dashboard"
      - action: assert_visible
        selector: ".welcome-message"
variables:
  base_url: "${TARGET_BASE_URL}"
Specify eats its own dogfood. The repo includes specify.spec.yaml, a spec for Specify itself, validated on every release.
GPL-3.0


