Autonomous software delivery pipeline using Claude Code or OpenCode. Takes a project from PRD to merged PRs without human intervention.
# Install (once per machine)
git clone <repo> ~/dev/claude-workflow
~/dev/claude-workflow/bin/cw setup # symlinks 'cw' to ~/.local/bin
cw sync # copies agents + skills to ~/.claude (+ opencode if installed)
# Bootstrap a project (once per project)
cd my-project
cw init # creates .claude/CLAUDE.md + hooks.json
# Edit .claude/CLAUDE.md — fill in {{PROJECT_NAME}}, {{CHECK_CMD}}, etc.
# Run the pipeline
cw prd-qa # refine PRD.md
cw create-issues # decompose into GitHub issues
cw implement # autonomous TDD + CI + review + mergeAfter editing agents or skills in the repo, run cw sync to update ~/.claude. If OpenCode is installed, agents are also converted and synced to ~/.config/opencode/agents automatically.
PRD.md → cw prd-qa → cw create-issues → cw implement → merged PRs
(you) (refine PRD) (GitHub issues) (TDD + CI + review + merge)
Multi-agent loop with 3 personas (Integrator, Critic, Architect) that iterate until the PRD is implementation-ready. Each agent is a separate Claude invocation with no shared context, forcing the PRD to be self-contained.
Requires PRD.md in the current directory. See examples/PRD.md for a sample.
Turn a rough feature idea into a well-structured issue through interactive QA. Explores the codebase, asks clarifying questions, then creates a single issue when you're satisfied.
cw create-issue "Add dark mode support"
cw create-issue "Retry failed webhook deliveries with exponential backoff"Reads PRD.md and creates GitHub issues via gh. Issues are vertical slices (not layer-sliced), each with full context, acceptance criteria, test plan, and implementation hints.
Processes open issues in dependency-aware batches, parallelizing independent issues. For each:
- Planner — reads issue + codebase, outputs sequenced test plan for the implementor
- Spawns implementor agent with TDD (red-green-refactor)
- Refactorer — scans new code against existing codebase, extracts shared patterns if warranted
- CI check — if CI fails, invokes ci-fixer agent (up to
CI_FIX_RETRIES) - Code review — single-pass reviewer→judge pipeline, implementor fixes findings
- Squash merges, then synthesizer distills patterns into
LEARNINGS.md - Independent issues run in parallel via git worktrees (up to
MAX_PARALLEL) - Writes diagnostic report to
failures/before any halt
Runs the full reviewer→judge pipeline independently.
cw review 42 # review PR #42
cw review --branch feat/foo # review branch vs main
cw review 42 --skip-checks # skip deterministic checksEvery cw command delegates workflow logic to an orchestrator agent. You can use these same orchestrators directly in Claude Code or OpenCode for a human-in-the-loop version of the pipeline:
| Orchestrator | What it does | Script equivalent |
|---|---|---|
implementation-orchestrator |
Plan → implement (TDD) | cw implement (per issue) |
review-orchestrator |
Checks → code-reviewer → review-judge | cw review |
prd-orchestrator |
Critic → architect → integrator loop | cw prd-qa |
issue-creation-orchestrator |
Validate PRD → create issues | cw create-issues |
ci-fix-orchestrator |
Diagnose → fix → verify CI | CI fix step in cw implement |
The difference: scripts add automation glue (git worktrees, parallelism, retries, failure reports, merge). Orchestrators own the workflow logic and work identically in both modes.
LLM-based reviews are non-deterministic — the same diff can surface different findings each run. To improve consistency, add project-specific review rules to your CLAUDE.md:
# Code Review Rules
- All API routes must use auth middleware
- No raw SQL — use the query builder
- React components must not call hooks conditionallyThese rules constrain the reviewer to check specific invariants rather than freestyle, making results more repeatable across runs. The generic checklist (Logic, Security, Error Handling, Performance, Test Quality) in the code-review skill still applies alongside your project rules.
Analyzes pipeline run history — failure reports, merged PRs, review findings, CI logs, and LEARNINGS.md — to identify recurring patterns. Generates concrete patches to agent/skill/template files and opens a PR against claude-workflow for review.
cw evolve # run from any project that has pipeline data (failures/, merged PRs, etc.)Requires at least one data source: failures/issue-*.md, LEARNINGS.md, merged PRs, or failed CI runs. Every proposed change needs 2+ independent data points — one failure is an anecdote, two is a pattern.
The pipeline integrates with Stitch for UI design reference. When a project has Stitch designs:
- DESIGN.md — auto-generated by Stitch, placed in the project root. Contains the design system (colors, typography, components, spacing, do's/don'ts). This is the styling authority for all UI work.
- Stitch project ID — specified in the PRD or issue. Used to fetch screen mockups.
- Screen HTML — the planner discovers available screens via
list_screensand maps them to components. The implementor fetches screen HTML viaget_screenand matches it exactly. - Missing screens — when implementing a UI component with no existing screen, the implementor generates one via
generate_screen_from_text, fetches the HTML, and implements to match.
Requires the Stitch MCP server to be configured in your Claude Code or OpenCode settings.
| Command | Description |
|---|---|
cw setup |
Install cw to ~/.local/bin (once per machine) |
cw sync |
Copy agents + skills to ~/.claude (+ opencode if installed) |
cw init |
Bootstrap current project (.claude/CLAUDE.md + hooks.json) |
cw prd-qa |
Refine PRD via multi-agent loop |
cw create-issue "<desc>" |
Interactive QA → single GitHub issue |
cw create-issues |
Decompose PRD.md into GitHub issues |
cw implement |
Autonomous implementation pipeline |
cw review <PR#> |
Code review (reviewer → judge) |
cw evolve |
Pipeline retrospective → agent/skill improvements (PR) |
Any agent can be invoked directly:
# Claude Code
claude --agent implementor -p "implement feature X using TDD"
claude --agent ci-fixer -p "fix CI failure on branch feat/foo, run ID 12345"
# OpenCode
opencode run --agent implementor "implement feature X using TDD"
opencode run --agent ci-fixer "fix CI failure on branch feat/foo, run ID 12345"bin/
cw — CLI entrypoint
scripts/
implement-issues.sh — autonomous implementation loop
create-issue.sh — interactive single issue creator
create-issues.sh — PRD to GitHub issues
prd-qa.sh — multi-agent PRD refinement
review.sh — standalone code review (reviewer + judge)
evolve.sh — pipeline retrospective (analyze failures → patch agents)
lib.sh — shared utilities (run_agent, check_signal, extract_text)
stream-filter.jq — Claude stream output formatter
stream-filter-opencode.jq — OpenCode stream output formatter
convert-agent-opencode.awk — converts Claude agents to OpenCode format
agents/
# Orchestrators (use as main agent in IDE — same workflow as scripts)
implementation-orchestrator.md — plan → implement pipeline
review-orchestrator.md — checks → code-reviewer → review-judge
prd-orchestrator.md — critic → architect → integrator loop
issue-creation-orchestrator.md — validate PRD → create issues
ci-fix-orchestrator.md — diagnose → fix → verify CI
# Core agents (invoked by orchestrators)
planner.md — pre-implementation test sequencer (researches unfamiliar deps)
implementor.md — TDD feature builder
refactorer.md — post-implementation cross-codebase deduplication
code-reviewer.md — structured checklist-driven reviewer
review-judge.md — validates review findings against actual code
ci-fixer.md — diagnoses and fixes CI failures
issue-creator.md — decomposes PRD into vertical-slice issues
prd-integrator.md — incorporates answers into PRD
prd-critic.md — reviews PRD for completeness
prd-architect.md — answers PRD questions using code context + library research
# Standalone agents
synthesizer.md — post-merge learnings distiller
retrospective.md — pipeline retrospective analyzer
single-issue-creator.md — interactive QA → single issue
skills/
tdd/ — TDD philosophy, test examples, mocking, refactoring, interface design
code-review/ — structured review criteria, categories, confidence scoring, anti-patterns
issue-template/ — shared issue format (used by both issue creators)
prd-quality/ — PRD completeness criteria
opensrc/ — fetch library source code for any language
stitch/ — UI design reference via Stitch (DESIGN.md convention, screen fetching, generation)
retrospective/ — pipeline retrospective analysis patterns + output format
templates/
CLAUDE.md.global — user-level agent instructions (planning, vertical slicing)
CLAUDE.md.project — project-level template (parameterized)
hooks.json — CI reminder hook (fires after git push / gh pr create)
Environment variables for cw implement:
| Variable | Default | Description |
|---|---|---|
TEST_CMD |
(auto-detected) | Command to run tests |
CHECK_CMD |
(auto-detected) | Full CI check command |
INSTALL_CMD |
(auto-detected) | Dependency install command |
CI_FIX_RETRIES |
3 |
Max ci-fixer attempts per CI failure |
MAX_PARALLEL |
1 |
Max concurrent issue implementations (via git worktrees) |
CW_BACKEND |
claude |
Agent backend: claude or opencode |
CW_MODEL |
(default) | Override model ID for agent invocations |
NO_COLOR |
(unset) | Disable colored output (no-color.org) |
CW_VERBOSE |
(unset) | Show full tool output (no truncation) |
Commands are auto-detected from project files. Env var overrides always win.
| Project file | TEST_CMD | CHECK_CMD | INSTALL_CMD |
|---|---|---|---|
mix.exs |
mix test |
mix format --check-formatted && mix credo && mix test |
mix deps.get |
Cargo.toml |
cargo test |
cargo clippy -- -D warnings && cargo test |
(none) |
go.mod |
go test ./... |
go vet ./... && go test ./... |
go mod download |
*.sln / *.csproj |
dotnet test |
dotnet build --warnaserror && dotnet test |
dotnet restore |
pyproject.toml / requirements.txt |
pytest |
ruff check . && pytest |
pip install -e '.[dev]' |
package.json |
<pm> test |
<pm> run check |
<pm> install --frozen-lockfile |
Package manager (<pm>) is detected from lockfile: pnpm-lock.yaml → pnpm, yarn.lock → yarn, bun.lockb → bun, else npm.
Override when auto-detection doesn't fit:
CHECK_CMD="mix precommit" cw implement
TEST_CMD="go test -race ./..." cw implementThe pipeline enforces quality at 5 layers:
- Pre-commit hook — blocks commits to main, runs typecheck + lint
- Claude hook (
.claude/hooks.json) — reminds agent to watch CI after push - CLAUDE.md rules — agent reads before acting (branch workflow, TDD, regression prevention)
- Code review — single-pass reviewer→judge pipeline, implementor fixes validated findings
- Bash script gates — verifies PR creation, CI pass, review pass, and diagnostics on failure
Orchestrators emit structured signals that scripts and the IDE both understand:
| Signal | Meaning | Used by |
|---|---|---|
<COMPLETE> |
Work finished successfully | prd-orchestrator |
<PASS> |
Review passed, no findings | review-orchestrator |
<HALT> |
Blocker encountered | All orchestrators |
<HALT> includes structured metadata:
<HALT>
REASON: ci-failed
RUN_ID: 12345
DETAILS: CI failed after 3 attempts on branch feat/issue-42.
In scripts, <HALT> triggers failure reports and exits. In the IDE, the orchestrator explains the blocker before halting.
"tests failing on main" — Pre-flight check failed. Run your TEST_CMD on main manually, fix, then re-run cw implement.
"all issues in batch failed" — Every issue in the batch failed. Check failures/issue-N.md for diagnostics (includes CI logs and review findings). Fix the root cause, then re-run.
"unresolved dependencies" — Remaining issues depend on failed/incomplete issues. Check issue dependencies on GitHub, resolve the blocker, then re-run.
"agent-halted" / "implementor-blocked" — Implementor hit a conflict with previous work. Check the draft PR it created for details.
Pipeline crashed mid-batch: Re-run cw implement. It resumes automatically — already-merged issues are skipped, and issues with existing PRs resume from CI/review instead of re-implementing.
CI keeps failing: Check failures/issue-N.md for the last 100 lines of CI logs. Common fixes: missing env vars in CI, dependency version mismatches, flaky tests.
Review findings not fixed: Reviewer found issues the implementor couldn't resolve. Check failures/issue-N.md for unresolved findings. Fix manually and push.
| Log | Location | Purpose |
|---|---|---|
| Real-time progress | tail -f progress.log |
Timeline of all issue progress |
| Per-issue detail | ./logs/issue-N.log |
Full agent output (parallel mode only) |
| Failure reports | failures/issue-N.md |
Diagnostics with CI logs and review findings |
The CLI uses colored output with Unicode symbols by default:
┌──────────────────────────────────────────┐
│ Batch: issues 1 2 3 (3 issue(s), 5 open) │
└──────────────────────────────────────────┘
● Pre-flight: running tests on main
[14:32:01] #1 Add user auth — planning
▸ Read src/index.ts
│ (42 lines)
▸ Bash pnpm test
│ PASS src/auth.test.ts
✓ Done (end_turn, 45.2s, $0.23)
✓ #1 — merged (PR #12)
✗ #2 — FAILED: ci-failed-after-2-attempts
Disable colors with NO_COLOR=1. Show full tool output with CW_VERBOSE=1.
The pipeline supports OpenCode as an alternative backend. Set CW_BACKEND=opencode for all commands:
cw sync # auto-detects opencode and syncs agents
CW_BACKEND=opencode cw prd-qa
CW_BACKEND=opencode cw implementThe conversion (convert-agent-opencode.awk) handles differences between the two:
| Claude Code | OpenCode |
|---|---|
skills: in frontmatter (pre-loaded) |
Skills auto-discovered; injected as ## Skills load instruction |
tools: as array |
Stripped (incompatible record format; all tools enabled by default) |
model: inherit |
Stripped (OpenCode uses its configured default) |
color: cyan (named) |
Converted to hex (#00FFFF) |
| No mode concept | Script-called agents get mode: all; others get mode: subagent |
- Claude Code CLI or OpenCode
- GitHub CLI (
gh) jq