From 2ffc58f68a40309c1826f726810120348b594ecc Mon Sep 17 00:00:00 2001
From: kasin-it
Date: Wed, 6 May 2026 11:40:37 +0200
Subject: [PATCH 1/4] docs: add SETUP.md

---
 README.md |  82 ++++++++----
 SETUP.md  | 387 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 445 insertions(+), 24 deletions(-)
 create mode 100644 SETUP.md

diff --git a/README.md b/README.md
index 6376e77..d880c16 100644
--- a/README.md
+++ b/README.md
@@ -77,7 +77,7 @@ flowchart TD
 ### 1. Clone and install
 
 ```bash
-git clone https://github.com/AmeliaBlaworiq/ai-workflow.git
+git clone https://github.com/Blazity/ai-workflow.git
 cd ai-workflow
 pnpm install
 ```
@@ -104,13 +104,15 @@ Walk through each section:
 
 **Jira** — Your Atlassian instance and API credentials:
 ```bash
-ISSUE_TRACKER_KIND=jira
 JIRA_BASE_URL=https://your-domain.atlassian.net
 JIRA_EMAIL=your-email@example.com
 JIRA_API_TOKEN=your-jira-api-token # Generate at https://id.atlassian.com/manage-profile/security/api-tokens
 JIRA_PROJECT_KEY=PROJ # Your Jira project key (e.g., AWT)
+JIRA_WEBHOOK_SECRET= # Optional: openssl rand -hex 32. Without it, dispatch falls back to 1-min cron polling.
 ```
 
+> The Jira webhook is registered separately (see [SETUP.md § 8](./SETUP.md#8-register-the-jira-webhook)). The handler at `/webhooks/jira` verifies an `X-Hub-Signature` HMAC-SHA256 header.
+
 **Jira columns** — The board column names Blazebot watches and moves tickets between:
 ```bash
 COLUMN_AI=AI # Column where tickets are assigned to the agent
@@ -118,21 +120,34 @@ COLUMN_AI_REVIEW=AI Review # Column where completed tickets go for human review
 COLUMN_BACKLOG=Backlog # Column where tickets go when clarification is needed
 ```
 
-**GitHub** — Repository where PRs will be created:
+**VCS** — Choose `github` or `gitlab`. Only fill the block matching your provider.
+
 ```bash
 VCS_KIND=github
+
+# GitHub (active when VCS_KIND=github)
 GITHUB_TOKEN=ghp_xxxxxxxxxxxx # Personal access token with repo scope
 GITHUB_OWNER=your-org # GitHub org or username
 GITHUB_REPO=your-repo # Target repository name
-GITHUB_BASE_BRANCH=main # Branch PRs will target
+GITHUB_BASE_BRANCH=main # Branch PRs will target
+```
+
+```bash
+VCS_KIND=gitlab
+
+# GitLab (active when VCS_KIND=gitlab)
+GITLAB_TOKEN=glpat-xxxxxxxxxxxx # PAT with api, read_repository, write_repository scopes
+GITLAB_PROJECT_ID=group/repo # Project ID or full path
+GITLAB_BASE_BRANCH=main # Branch MRs will target
+GITLAB_HOST=https://gitlab.com # Override for self-hosted
 ```
 
-**Slack** — Bot notifications and slash commands:
+**Slack** — Bot notifications and slash commands. Bot scopes: `chat:write`, `commands`, `files:read`, `users:read`.
 ```bash
-CHAT_SDK_SLACK_TOKEN=xoxb-xxxxxxxxxxxx # Slack bot token (chat:write scope)
+CHAT_SDK_SLACK_TOKEN=xoxb-xxxxxxxxxxxx # Slack bot token
 CHAT_SDK_CHANNEL_ID=C0123456789 # Channel ID for notifications
 CHAT_SDK_BOT_NAME=blazebot # Display name for the bot
-SLACK_SIGNING_SECRET=xxxxxxxxxxxxxxxx # Required for /ai-workflow slash commands
+SLACK_SIGNING_SECRET=xxxxxxxxxxxxxxxx # Required — verifies /ai-workflow slash commands
 SLACK_ALLOWED_USER_IDS=U0123,U4567 # Optional: comma-separated allowlist
 ```
 
@@ -224,20 +239,25 @@ curl -H "Authorization: Bearer $CRON_SECRET" http://localhost:3000/cron/poll
 | Variable | Required | Default | Description |
 |----------|----------|---------|-------------|
 | **Jira** | | | |
-| `ISSUE_TRACKER_KIND` | Yes | — | Issue tracker type (`jira`) |
+| `ISSUE_TRACKER_KIND` | No | `jira` | Issue tracker type (only `jira` supported today) |
 | `JIRA_BASE_URL` | Yes | — | Atlassian instance URL |
 | `JIRA_EMAIL` | Yes | — | Jira account email |
 | `JIRA_API_TOKEN` | Yes | — | Jira API token |
 | `JIRA_PROJECT_KEY` | Yes | — | Jira project key |
+| `JIRA_WEBHOOK_SECRET` | No | — | HMAC secret for `/webhooks/jira`. Without it, dispatch is cron-bound. |
 | `COLUMN_AI` | Yes | — | Board column for AI-assigned tickets |
 | `COLUMN_AI_REVIEW` | Yes | — | Board column for completed tickets |
 | `COLUMN_BACKLOG` | Yes | — | Board column for tickets needing clarification |
-| **GitHub** | | | |
-| `VCS_KIND` | Yes | — | VCS provider (`github`) |
-| `GITHUB_TOKEN` | Yes | — | GitHub PAT with repo scope |
-| `GITHUB_OWNER` | Yes | — | GitHub org or username |
-| `GITHUB_REPO` | Yes | — | Target repository |
+| **VCS** | | | |
+| `VCS_KIND` | Yes | — | `github` or `gitlab` |
+| `GITHUB_TOKEN` | Yes† | — | GitHub PAT with `repo` scope (when `VCS_KIND=github`) |
+| `GITHUB_OWNER` | Yes† | — | GitHub org or username (when `VCS_KIND=github`) |
+| `GITHUB_REPO` | Yes† | — | Target repository (when `VCS_KIND=github`) |
 | `GITHUB_BASE_BRANCH` | No | `main` | Base branch for PRs |
+| `GITLAB_TOKEN` | Yes† | — | GitLab PAT with `api`, `read_repository`, `write_repository` (when `VCS_KIND=gitlab`) |
+| `GITLAB_PROJECT_ID` | Yes† | — | Project ID or `group/repo` path (when `VCS_KIND=gitlab`) |
+| `GITLAB_BASE_BRANCH` | No | `main` | Base branch for MRs |
+| `GITLAB_HOST` | No | `https://gitlab.com` | Override for self-hosted GitLab |
 | **Slack** | | | |
 | `CHAT_SDK_SLACK_TOKEN` | Yes | — | Slack bot token |
 | `CHAT_SDK_CHANNEL_ID` | Yes | — | Notification channel ID |
@@ -246,10 +266,10 @@ curl -H "Authorization: Bearer $CRON_SECRET" http://localhost:3000/cron/poll
 | `SLACK_ALLOWED_USER_IDS` | No | — | Comma-separated Slack user IDs allowed to run `/ai-workflow`; empty = anyone |
 | **Agent** | | | |
 | `AGENT_KIND` | No | `claude` | Runtime: `claude` or `codex` |
-| `ANTHROPIC_API_KEY` | Yes* | — | Anthropic API key (required when `AGENT_KIND=claude`) |
+| `ANTHROPIC_API_KEY` | Yes‡ | — | Anthropic API key (required when `AGENT_KIND=claude`) |
 | `CLAUDE_CODE_OAUTH_TOKEN` | No | — | Alternative to `ANTHROPIC_API_KEY` |
 | `CLAUDE_MODEL` | No | `claude-opus-4-6` | Claude model ID |
-| `CODEX_API_KEY` | Yes* | — | OpenAI Codex API key (required when `AGENT_KIND=codex`) |
+| `CODEX_API_KEY` | Yes‡ | — | OpenAI Codex API key (required when `AGENT_KIND=codex`) |
 | `CODEX_CHATGPT_OAUTH_TOKEN` | No | — | Alternative to `CODEX_API_KEY` |
 | `CODEX_MODEL` | No | `gpt-5-codex` | Codex model ID |
 | `CODEX_PRICING_URL` | No | LiteLLM JSON | Pricing source for Codex cost reporting |
@@ -259,18 +279,31 @@ curl -H "Authorization: Bearer $CRON_SECRET" http://localhost:3000/cron/poll
 | **Sandbox** | | | |
 | `MAX_CONCURRENT_AGENTS` | No | `3` | Max parallel sandboxes |
 | `JOB_TIMEOUT_MS` | No | `1800000` | Agent timeout (ms) |
+| **Attachments** | | | |
+| `ATTACHMENT_MAX_FILE_SIZE_MB` | No | `25` | Per-file size limit |
+| `ATTACHMENT_MAX_TOTAL_SIZE_MB` | No | `100` | Combined attachment size limit |
+| `ATTACHMENT_MAX_COUNT` | No | `20` | Max attachments per ticket |
+| `ATTACHMENT_DOWNLOAD_TIMEOUT_MS` | No | `30000` | Download timeout per attachment |
 | **Polling** | | | |
-| `POLL_INTERVAL_MS` | No | `300000` | Poll interval (ms) |
+| `POLL_INTERVAL_MS` | No | `300000` | Internal poll cadence (ms) — separate from the 1-min Vercel cron |
 | **Vercel** | | | |
 | `VERCEL_TOKEN` | No* | — | Vercel API token (local dev only) |
 | `VERCEL_TEAM_ID` | No* | — | Vercel team ID (local dev only) |
 | `VERCEL_PROJECT_ID` | No* | — | Vercel project ID (local dev only) |
+| `WORKFLOW_POSTGRES_URL` | No* | — | Local Postgres for Vercel Workflow durable state (dev only) |
+| **Arthur (optional)** | | | |
+| `GENAI_ENGINE_API_KEY` | No | — | Arthur AI Engine API key |
+| `GENAI_ENGINE_TRACE_ENDPOINT` | No | — | Arthur trace endpoint URL |
+| `GENAI_ENGINE_PROMPT_TASK_ID` | No | — | Hosted prompt task ID (set after `pnpm setup:arthur-prompts`) |
 | **Redis** | | | |
-| `AI_WORKFLOW_KV_REST_API_URL` | Yes | — | Upstash Redis REST URL |
-| `AI_WORKFLOW_KV_REST_API_TOKEN` | Yes | — | Upstash Redis REST token |
+| `AI_WORKFLOW_KV_REST_API_URL` | Yes | — | Upstash Redis REST URL (auto-injected by Marketplace integration) |
+| `AI_WORKFLOW_KV_REST_API_TOKEN` | Yes | — | Upstash Redis REST token (auto-injected) |
 | **Security** | | | |
-| `CRON_SECRET` | No | — | Cron endpoint auth token |
-\* On Vercel, OIDC authenticates automatically. These are only needed for local development if `vercel env pull` doesn't cover your setup.
+| `CRON_SECRET` | No | — | Cron endpoint auth token. When it is set, Vercel Cron sends it as a Bearer token in the `Authorization` header |
+
+† Required only for the matching `VCS_KIND`. `env.ts` cross-validates at startup.
+‡ Required only for the matching `AGENT_KIND` (the OAuth token alternative also satisfies this).
+\* Local development only. On Vercel, OIDC authenticates the sandbox and Vercel manages Workflow state, so these are not needed there. Set the `VERCEL_*` trio locally only if `vercel env pull` doesn't cover your setup.
 
 ## Deploying to Vercel
 
@@ -313,10 +346,11 @@ This hits `/cron/poll` every minute. Vercel injects the `CRON_SECRET` header aut
 
 Two GitHub Actions workflows are included:
 
-- **CI** (`ci.yml`) — Runs on every push to `main`/`dev` and on pull requests. Runs typecheck and unit tests.
-- **E2E** (`e2e.yml`) — Manual trigger with tier selection:
-  - **Tier 1** — Basic integration tests (15 min timeout)
-  - **Tier 2** — Full end-to-end with real Jira/GitHub (150 min timeout, requires Tier 1 to pass first)
+- **CI** (`ci.yml`) — Runs on pull requests targeting `main`/`dev` and on `merge_group` events. Runs typecheck and unit tests; gates the merge queue on `e2e-orchestration → e2e-capacity → e2e-agent`.
+- **E2E** (`e2e.yml`) — Manual `workflow_dispatch` with tier selection (`orchestration`, `capacity`, `agent`, `all`) and an `agent` choice (`claude` | `codex`):
+  - **orchestration** — dispatch / cron / webhook flows (60 min timeout)
+  - **capacity** — concurrency, claim/release, reconciler (30 min timeout, runs after orchestration)
+  - **agent** — full ticket → PR run against real Jira + GitHub (120 min timeout, runs after capacity)
 
 ## Workflow Deep-dive
 
diff --git a/SETUP.md b/SETUP.md
new file mode 100644
index 0000000..9a50714
--- /dev/null
+++ b/SETUP.md
@@ -0,0 +1,387 @@
+# ai-workflow — Setup & Deployment Guide
+
+End-to-end instructions for deploying ai-workflow to your own Vercel account. Read the [README](./README.md) first for architectural context.
+
+---
+
+## Table of Contents
+
+1. [Prerequisites](#1-prerequisites)
+2. [Provision external accounts](#2-provision-external-accounts)
+3. [Clone the repo and link to Vercel](#3-clone-the-repo-and-link-to-vercel)
+4. [Install the Upstash marketplace integration](#4-install-the-upstash-marketplace-integration)
+5. [Configure environment variables](#5-configure-environment-variables)
+6. [Local development (optional)](#6-local-development-optional)
+7. [Deploy to Vercel](#7-deploy-to-vercel)
+8. [Register the Jira webhook](#8-register-the-jira-webhook)
+9. [Register the Slack slash command](#9-register-the-slack-slash-command)
+10. [Smoke test the deployment](#10-smoke-test-the-deployment)
+11. [CI / GitHub Actions](#11-ci--github-actions)
+12. [Optional integrations](#12-optional-integrations)
+13. [Troubleshooting](#13-troubleshooting)
+
+---
+
+## 1. Prerequisites
+
+Local toolchain:
+
+| Tool | Version | Install |
+|------|---------|---------|
+| Node.js | 20+ | https://nodejs.org |
+| pnpm | 10+ | `npm i -g pnpm` |
+| Vercel CLI | latest | `npm i -g vercel@latest` |
+| Git | 2.40+ | https://git-scm.com |
+
+Accounts you must own:
+
+- **Vercel** — Pro plan recommended. Hobby cron jobs run at most once per day, so the every-minute `/cron/poll` schedule needs a paid plan. Sandbox and Workflow also have tighter usage limits on Hobby.
+- **Atlassian Jira Cloud** — admin access on the project to manage columns, transitions, and webhooks.
+- **GitHub** *or* **GitLab** — admin on the target repository (PR + branch creation).
+- **Slack** workspace — admin to install a custom app and register slash commands.
+- **Anthropic** *or* **OpenAI** — API key for the agent runtime.
+- **Upstash** — installed via Vercel Marketplace in step 4.
+
+---
+
+## 2. Provision external accounts
+
+Do these in any order — you'll paste the resulting values into Vercel in step 5.
+
+### 2.1 Jira
+
+1. Go to https://id.atlassian.com/manage-profile/security/api-tokens and create an API token. Save it as `JIRA_API_TOKEN`.
+2. Note your Atlassian instance URL (e.g. `https://your-domain.atlassian.net`) → `JIRA_BASE_URL`.
+3. Note the email of the Jira user the token belongs to → `JIRA_EMAIL`.
+4. Open the project ai-workflow will operate on. Note its key (e.g. `AWT`) → `JIRA_PROJECT_KEY`.
+5. On the project board, identify the three columns ai-workflow uses. Create them if they don't exist:
+   - `COLUMN_AI` — tickets assigned to the agent (e.g. `AI`)
+   - `COLUMN_AI_REVIEW` — completed tickets pending human review (e.g. `AI Review`)
+   - `COLUMN_BACKLOG` — tickets bounced back for clarification (e.g. `Backlog`)
+6. Generate a webhook secret to authenticate Jira → Vercel deliveries:
+   ```bash
+   openssl rand -hex 32
+   ```
+   Save as `JIRA_WEBHOOK_SECRET`. You'll register the webhook itself in step 8.
+
+> Without a webhook, dispatch falls back to the 1-minute cron poll — workable for testing, sluggish in production.
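The secret from step 6 lets the handler reject forged deliveries. Below is a minimal TypeScript sketch of checking an `X-Hub-Signature` header with Node's built-in `node:crypto`. It is an illustration only, not the code in `src/routes/webhooks/jira.post.ts`. It assumes:

- the header value has the form `sha256=<hex digest>`;
- the digest is an HMAC-SHA256 of the raw request body, keyed with `JIRA_WEBHOOK_SECRET`;
- `verifyJiraSignature` and `generateWebhookSecret` are hypothetical names.

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical helper (not the project's handler): checks an assumed
// "sha256=<hex digest>" header against an HMAC-SHA256 of the raw body.
function verifyJiraSignature(
  rawBody: string,
  header: string | undefined,
  secret: string,
): boolean {
  const prefix = "sha256=";
  if (!header || !header.startsWith(prefix)) return false;

  // Decode the hex digest sent by the caller.
  const received = Buffer.from(header.slice(prefix.length), "hex");
  // Recompute the digest locally with the shared secret.
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();

  // timingSafeEqual throws on a length mismatch, so bail out early.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}

// Same shape as `openssl rand -hex 32`: 32 random bytes as 64 hex characters.
function generateWebhookSecret(): string {
  return randomBytes(32).toString("hex");
}

// Usage: sign a sample body the way the sender is assumed to, then verify it.
const secret = generateWebhookSecret();
const body = JSON.stringify({ webhookEvent: "jira:issue_updated" });
const signature =
  "sha256=" + createHmac("sha256", secret).update(body, "utf8").digest("hex");

console.log(verifyJiraSignature(body, signature, secret)); // true
console.log(verifyJiraSignature(body + " ", signature, secret)); // false
```

The length check before `timingSafeEqual` is deliberate, because that function throws when the two buffers differ in length.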
+
+### 2.2 GitHub (or GitLab)
+
+**GitHub:**
+1. Create a classic PAT with `repo` scope (or a fine-grained token on the target repo with read/write **Contents** and **Pull requests** permissions) at https://github.com/settings/tokens → `GITHUB_TOKEN`.
+2. Note the target repo's `owner` and `name` → `GITHUB_OWNER`, `GITHUB_REPO`.
+3. Note the base branch (usually `main`) → `GITHUB_BASE_BRANCH`.
+
+**GitLab:**
+1. Create a project access token (or PAT) with `api`, `read_repository`, `write_repository` scopes → `GITLAB_TOKEN`.
+2. Note the project ID or `group/repo` path → `GITLAB_PROJECT_ID`, and the base branch (usually `main`) → `GITLAB_BASE_BRANCH`.
+3. For self-hosted, set `GITLAB_HOST` to your instance base URL.
+
+### 2.3 Slack
+
+1. Create a new Slack app at https://api.slack.com/apps → **From scratch**.
+2. Under **OAuth & Permissions**, add bot scopes: `chat:write`, `commands`, `files:read`, `users:read`.
+3. Install the app to your workspace and copy the **Bot User OAuth Token** (`xoxb-...`) → `CHAT_SDK_SLACK_TOKEN`.
+4. Under **Basic Information → App Credentials**, copy **Signing Secret** → `SLACK_SIGNING_SECRET`.
+5. In the Slack client, right-click the destination channel → **View channel details** → copy the channel ID (`C...`) → `CHAT_SDK_CHANNEL_ID`. Invite the bot to the channel.
+6. Optional: choose a display name → `CHAT_SDK_BOT_NAME` (default `blazebot`).
+
+The slash command itself is registered in step 9 (after you have a deployment URL).
+
+### 2.4 Agent runtime
+
+Pick one — controlled by `AGENT_KIND`.
+
+**Claude (default):**
+- Create an API key at https://console.anthropic.com → `ANTHROPIC_API_KEY`.
+- Optionally pin a model: `CLAUDE_MODEL=claude-opus-4-6` (default).
+
+**Codex:**
+- `AGENT_KIND=codex`
+- `CODEX_API_KEY=sk-...` (or `CODEX_CHATGPT_OAUTH_TOKEN`)
+- Optionally `CODEX_MODEL=gpt-5-codex`.
+
+---
+
+## 3. Clone the repo and link to Vercel
+
+```bash
+git clone .git
+cd ai-workflow
+pnpm install
+vercel link
+```
+
+`vercel link` walks you through selecting the team and either creating a new project or linking to an existing one. The result is `.vercel/project.json` — keep it out of source control (already gitignored).
+
+---
+
+## 4. Install the Upstash marketplace integration
+
+ai-workflow uses Upstash Redis as its run registry (atomic claim/release for concurrent runs).
+
+1. Open https://vercel.com/marketplace/upstash and click **Install**.
+2. Pick the team and project you just linked.
+3. **Critical:** when prompted for the env-var prefix, set it to `AI_WORKFLOW_KV`. The code reads `AI_WORKFLOW_KV_REST_API_URL` and `AI_WORKFLOW_KV_REST_API_TOKEN` — wrong prefix means ai-workflow can't find the registry.
+4. Vercel auto-injects both vars into Production, Preview, and Development environments.
+
+Verify:
+```bash
+vercel env ls | grep AI_WORKFLOW_KV
+```
+
+---
+
+## 5. Configure environment variables
+
+Two paths — pick the one that matches your workflow.
+
+### 5a. Via Vercel Dashboard (recommended for production)
+
+Open **Project → Settings → Environment Variables** and add every required variable from the table below. Set scope to **Production, Preview, Development** unless noted otherwise.
+
+### 5b. Via the CLI
+
+```bash
+cp .env.example .env
+# fill in values, then add each one (the CLI prompts for its value):
+vercel env add JIRA_BASE_URL production
+vercel env add JIRA_API_TOKEN production
+# ... repeat
+```
+
+### Required variables
+
+| Variable | Purpose |
+|----------|---------|
+| `JIRA_BASE_URL`, `JIRA_EMAIL`, `JIRA_API_TOKEN`, `JIRA_PROJECT_KEY` | Jira credentials |
+| `COLUMN_AI`, `COLUMN_AI_REVIEW`, `COLUMN_BACKLOG` | Board columns |
+| `VCS_KIND` | `github` or `gitlab` |
+| `GITHUB_TOKEN`, `GITHUB_OWNER`, `GITHUB_REPO` | If `VCS_KIND=github` |
+| `GITLAB_TOKEN`, `GITLAB_PROJECT_ID` | If `VCS_KIND=gitlab` |
+| `CHAT_SDK_SLACK_TOKEN`, `CHAT_SDK_CHANNEL_ID` | Slack bot |
+| `SLACK_SIGNING_SECRET` | Verifies `/ai-workflow` slash commands |
+| `AGENT_KIND` | `claude` (default) or `codex`; selects which API key below is required |
+| `ANTHROPIC_API_KEY` | If `AGENT_KIND=claude` |
+| `CODEX_API_KEY` | If `AGENT_KIND=codex` |
+| `AI_WORKFLOW_KV_REST_API_URL`, `AI_WORKFLOW_KV_REST_API_TOKEN` | Auto-injected by Upstash integration |
+| `CRON_SECRET` | Generate: `openssl rand -hex 32`. Treat as required: without it, `/cron/poll` accepts unauthenticated callers. |
+| `JIRA_WEBHOOK_SECRET` | Generate: `openssl rand -hex 32`. Strongly recommended — without it, dispatch is cron-bound. |
+
+### Optional / has defaults
+
+| Variable | Default | Purpose |
+|----------|---------|---------|
+| `GITHUB_BASE_BRANCH` / `GITLAB_BASE_BRANCH` | `main` | PR / MR target branch |
+| `CHAT_SDK_BOT_NAME` | `blazebot` | Slack display name |
+| `SLACK_ALLOWED_USER_IDS` | empty (anyone) | Comma-separated user IDs allowed to run slash commands |
+| `CLAUDE_MODEL` | `claude-opus-4-6` | Anthropic model |
+| `CODEX_MODEL` | `gpt-5-codex` | Codex model |
+| `MAX_CONCURRENT_AGENTS` | `3` | Parallel sandbox cap |
+| `JOB_TIMEOUT_MS` | `1800000` (30 min) | Per-run timeout |
+| `POLL_INTERVAL_MS` | `300000` (5 min) | Internal poll cadence |
+| `COMMIT_AUTHOR`, `COMMIT_EMAIL` | `ai-workflow-blazity`, `ai-workflow@blazity.com` | Git identity inside sandboxes |
+
+`env.ts` cross-validates at startup — missing required vars or wrong combinations (e.g. `VCS_KIND=github` without `GITHUB_OWNER`) crash the process with a precise error.
+
+---
+
+## 6. Local development (optional)
+
+For local runs, pull the Vercel env (provisions OIDC tokens for Sandbox auth automatically):
+
+```bash
+vercel env pull .env.local
+```
+
+Vercel Workflow needs a local Postgres for durable state in dev:
+
+```bash
+# example with Docker (give the container a few seconds to start before creating the database)
+docker run -d --name workflow-pg -p 5432:5432 -e POSTGRES_PASSWORD=postgres postgres:16
+docker exec workflow-pg createdb -U postgres ai_workflow
+
+# add to .env.local
+WORKFLOW_POSTGRES_URL=postgresql://postgres:postgres@localhost:5432/ai_workflow
+```
+
+Run:
+```bash
+pnpm dev
+curl http://localhost:3000/health # → {"status":"ok",...}
+```
+
+If `vercel env pull` doesn't cover Sandbox auth, set `VERCEL_TOKEN`, `VERCEL_TEAM_ID`, `VERCEL_PROJECT_ID` manually.
+
+---
+
+## 7. Deploy to Vercel
+
+### First deploy (preview)
+
+```bash
+vercel
+```
+
+Confirm the preview URL works:
+```bash
+curl https:///health
+```
+
+### Promote to production
+
+```bash
+vercel --prod
+```
+
+Or push to your production branch if you've connected the Vercel Git integration — production deployments fire automatically.
+
+### What deploys
+
+- HTTP routes from `src/routes/` — health, cron, webhooks, slash commands.
+- Vercel Workflow definitions — workflow state is managed by Vercel in production (no Postgres needed).
+- Cron job from `vercel.json` (`* * * * *` → `/cron/poll`), which activates automatically on production deployments (Vercel does not run cron jobs for previews). Vercel sends `CRON_SECRET` as a Bearer token in the `Authorization` header.
+
+---
+
+## 8. Register the Jira webhook
+
+Without this, ai-workflow only learns about ticket changes via the 1-minute cron poll.
+
+1. Go to **Jira → System Settings → WebHooks** (admin only) or use the Atlassian REST API.
+2. Create a webhook:
+   - **URL:** `https:///webhooks/jira`
+   - **Secret:** the `JIRA_WEBHOOK_SECRET` value from step 5. Jira signs each delivery with HMAC-SHA256 in the `X-Hub-Signature` header; the handler at `src/routes/webhooks/jira.post.ts` verifies it with `timingSafeEqual`.
+   - **Events:** `jira:issue_updated` (required). Add `jira:issue_created` and `comment_created` if you want creates and comments to dispatch instantly.
+   - **JQL filter** (optional): `project = AWT` (substitute your `JIRA_PROJECT_KEY`) to limit deliveries to the relevant project.
+3. Save.
+
+Verify by moving a test ticket into the AI column and watching the Vercel runtime logs.
+
+---
+
+## 9. Register the Slack slash command
+
+1. In your Slack app config, go to **Slash Commands → Create New Command**.
+2. Configure:
+   - **Command:** `/ai-workflow`
+   - **Request URL:** `https:///webhooks/slack`
+   - **Short description:** `Manage ai-workflow runs`
+   - **Usage hint:** `list | status | cancel `
+3. Save and **reinstall the app** to your workspace if Slack prompts you.
+4. Confirm `SLACK_SIGNING_SECRET` is set in Vercel (step 5) — `/webhooks/slack` rejects requests with bad signatures.
+
+Test in Slack:
+```
+/ai-workflow list
+```
+
+If you set `SLACK_ALLOWED_USER_IDS`, only those Slack user IDs can invoke the command — useful for limiting to your engineering team.
+
+> See `.claude/skills/init-slack/references/slash-commands.md` for the full walkthrough.
+
+---
+
+## 10. Smoke test the deployment
+
+### Health
+```bash
+curl https:///health
+# → {"status":"ok","timestamp":"..."}
+```
+
+### Cron auth
+```bash
+curl https:///cron/poll
+# → 401 Unauthorized
+
+curl -H "Authorization: Bearer $CRON_SECRET" https:///cron/poll
+# → 200 with the poll result
+```
+
+### End-to-end
+1. Create a test Jira ticket with a clear acceptance criterion (e.g. "add a `/ping` route returning `pong`").
+2. Move it to the **AI** column.
+3. Within ~1 minute (cron) or instantly (webhook), watch:
+   - Vercel logs — workflow starts, sandbox provisions.
+   - Jira ticket — moves to **AI Review** (success) or **Backlog** (clarification needed).
+   - Target repo — new branch `blazebot/` and an open PR.
+   - Slack channel — notification fires.
+
+If anything stalls, jump to [troubleshooting](#13-troubleshooting).
+
+---
+
+## 11. CI / GitHub Actions
+
+Two workflows ship in `.github/workflows/`:
+
+- **`ci.yml`** — runs on pull requests against `main`/`dev` and on `merge_group` events. The `ci` job runs typecheck + unit tests with no secrets. The merge-queue path additionally runs `e2e-orchestration → e2e-capacity → e2e-agent` against the same `e2e` GitHub environment.
+- **`e2e.yml`** — manual `workflow_dispatch` with two inputs:
+  - `tier`: `orchestration` | `capacity` | `agent` | `all` (default `all`).
+  - `agent`: `claude` | `codex` — passed as `E2E_AGENT_KIND`, only consumed by the `agent` tier.
+
+  Tiers and timeouts:
+  - **orchestration** — dispatch / cron / webhook (60 min).
+  - **capacity** — concurrency, claim/release, reconciler (30 min, gated on orchestration).
+  - **agent** — full ticket → PR run against real Jira + GitHub (120 min, gated on capacity).
+
+The E2E jobs need the production env vars exposed as GitHub Actions secrets in the `e2e` environment (Repo Settings → Environments → e2e → Secrets). They additionally require `E2E_BASE_URL`, `E2E_GITHUB_TOKEN`, `E2E_GITHUB_OWNER`, `E2E_GITHUB_REPO`, and `VERCEL_AUTOMATION_BYPASS_SECRET`.
+
+---
+
+## 12. Optional integrations
+
+### Arthur AI Engine (tracing + hosted prompts)
+
+Set both:
+```bash
+GENAI_ENGINE_API_KEY=...
+GENAI_ENGINE_TRACE_ENDPOINT=https://your-arthur-host/api/v1/traces
+```
+
+Then run once to register hosted prompts:
+```bash
+pnpm setup:arthur-prompts
+# saves the resulting task ID — set it as:
+GENAI_ENGINE_PROMPT_TASK_ID=
+```
+
+The tracer is built into every sandbox via `pnpm build:arthur-tracer` during deploy.
+
+### GitLab instead of GitHub
+
+Flip `VCS_KIND=gitlab` and provide `GITLAB_TOKEN` + `GITLAB_PROJECT_ID`. For self-hosted, also set `GITLAB_HOST`. `GITHUB_*` vars become inert.
+
+---
+
+## 13. Troubleshooting
+
+| Symptom | Likely cause | Fix |
+|---------|--------------|-----|
+| Startup crash: `Invalid environment variables` | Missing required var or wrong cross-field combination | Read the error — `env.ts` lists exactly what's missing. |
+| `/cron/poll` returns 401 from Vercel Cron | `CRON_SECRET` mismatch | Ensure the var is set in the Production environment. Redeploy after changing. |
+| Tickets in AI column never get picked up | Cron disabled / webhook misregistered | Check **Vercel → Project → Cron Jobs** is enabled. Curl `/cron/poll` with the secret to test manually. |
+| Workflow starts but sandbox fails to provision | Missing Vercel OIDC / Sandbox quota | On Vercel, OIDC is automatic. Check the project has Sandbox enabled (Pro plan). For local dev, set `VERCEL_TOKEN`/`VERCEL_TEAM_ID`/`VERCEL_PROJECT_ID`. |
+| Run registry: `AI_WORKFLOW_KV_REST_API_URL undefined` | Upstash integration installed with wrong prefix | Reinstall with prefix `AI_WORKFLOW_KV`. |
+| Agent runs but PR isn't created | `GITHUB_TOKEN` lacks `repo` scope, or wrong owner/repo | Re-create the PAT with `repo` scope. Verify `GITHUB_OWNER`/`GITHUB_REPO` point at the *target* repo, not this repo. |
+| Slack messages don't arrive | Bot not in channel, or wrong `CHAT_SDK_CHANNEL_ID` | Invite bot to the channel. Re-copy the channel ID. |
+| Slash command returns `dispatch_failed` | Signing secret wrong, or app not reinstalled | Verify `SLACK_SIGNING_SECRET`. Reinstall the Slack app after adding the slash command. |
+| Two pollers race on the same ticket | Stale claim sentinel | The reconciler clears claims older than 5 minutes on every poll — wait one cycle, or flush the registry key in Upstash. |
+| Sandbox times out | Job too large for `JOB_TIMEOUT_MS` | Increase `JOB_TIMEOUT_MS` to 60–90 minutes (`3600000`–`5400000`) for complex tickets, or split the work. |
+
+### Useful logs
+
+- **Vercel runtime logs:** `vercel logs ` or **Project → Logs**.
+- **Workflow runs:** **Project → Workflows** in the Vercel dashboard — shows step-by-step state, failures, retries. +- **Local logs:** Pino prints structured JSON. Pipe through `pnpm dlx pino-pretty`. + +--- + +## Reference + +- Architecture and workflow internals → [README.md](./README.md) +- Spec → [docs/SPEC.md](./docs/SPEC.md) +- User stories → [docs/user-stories.md](./docs/user-stories.md) +- Per-integration walkthroughs → `.claude/skills/init-*/` (Jira, Slack, Upstash, VCS, agent runtime) From ea2cd43da51a065ce3f250ead5629df3476e11c5 Mon Sep 17 00:00:00 2001 From: kasin-it Date: Wed, 6 May 2026 12:04:29 +0200 Subject: [PATCH 2/4] docs: update skills --- .claude/skills/init-env/SKILL.md | 154 ++++++++++++++---- .../init-jira/references/webhook-setup.md | 4 + .claude/skills/init-slack/SKILL.md | 2 +- .../init-slack/references/bot-app-setup.md | 6 +- .claude/skills/init-upstash/SKILL.md | 8 + 5 files changed, 140 insertions(+), 34 deletions(-) diff --git a/.claude/skills/init-env/SKILL.md b/.claude/skills/init-env/SKILL.md index ea3cedd..75804f1 100644 --- a/.claude/skills/init-env/SKILL.md +++ b/.claude/skills/init-env/SKILL.md @@ -1,11 +1,13 @@ --- name: init-env -description: First-time setup orchestrator for the Blazebot ai-workflow repo. Coordinates project linking, env var population across Jira / VCS / Agent / Slack / Upstash, deployment, and webhook registration in a single guided flow. Use when starting fresh on this repo for the first time — "init project", "first-time setup", "bootstrap this repo", "onboard me", "set up env from scratch". +description: First-time setup orchestrator for the Blazebot ai-workflow repo. Mirrors SETUP.md as an agent-driven flow — project linking, env vars across Jira / VCS / Agent / Slack / Upstash, production deploy, post-deploy registrations (Jira webhook + Slack /ai-workflow slash command), and smoke checks. 
Use when starting fresh on this repo for the first time — "init project", "first-time setup", "bootstrap this repo", "onboard me", "set up env from scratch". --- # Initialize Project Environment (Cold Start) -Cold-start orchestrator. Coordinates project linking, paste-template-driven env population across 5 domains, a single production deploy, webhook registration, and a manual smoke handoff. Self-contained — does not invoke other plugins. +Cold-start orchestrator. Coordinates project linking, paste-template-driven env population across 5 domains, a single production deploy, post-deploy registrations (Jira webhook + Slack slash command), and a smoke handoff. Self-contained — does not invoke other plugins. + +> **Canonical reference:** [SETUP.md](../../../SETUP.md) at the repo root is the human-readable end-to-end guide. This skill is the agent-driven orchestration of that same flow. When the two diverge, SETUP.md wins — update this skill. ## What this skill does NOT do @@ -28,19 +30,19 @@ If the user replies with anything other than a clear go-signal, do not advance ## Sequence ``` -0. Pre-flight → vercel whoami, existing-link check, team scope +0. Pre-flight → tool versions, vercel whoami, existing-link check, team scope 1. vercel link → only if not already linked 2. init-jira (phase 1) → credentials + columns + JIRA_WEBHOOK_SECRET 3. init-vcs → branch on github | gitlab 4. init-agent → branch on claude | codex -5. init-slack +5. init-slack → bot token, channel, signing secret 6. init-upstash → Marketplace install runbook 7. Inline: CRON_SECRET → auto-generate, paste-template -8. vercel env pull → produces .env.local -9. Validate → pnpm tsx --env-file=.env.local env.ts -10. vercel --prod → single production deploy -11. init-jira (phase 2) → webhook registration with deploy URL -12. Manual smoke → user drags a ticket, reports result +8. vercel env pull + validate → .env.local + pnpm tsx env.ts +9. vercel --prod → single production deploy +10. 
init-jira (phase 2) → webhook registration with deploy URL +11. Slack slash command → /ai-workflow registration with deploy URL +12. Smoke checks → /health + /cron/poll auth + manual ticket 13. Final summary ``` @@ -50,27 +52,42 @@ If the user replies with anything other than a clear go-signal, do not advance Run these in order. Halt with a clear message on any failure; never invoke `vercel login` from this skill. -### 0a. Authentication +### 0a. Toolchain + +Required (per [SETUP.md §1](../../../SETUP.md#1-prerequisites)): + +```bash +node --version # v20+ +pnpm --version # v10+ +vercel --version # latest — older CLIs may miss flags this skill expects +git --version # 2.40+ +``` + +If any version is below the floor, HALT with the exact mismatch and the install command from SETUP.md §1 (`npm i -g pnpm`, `npm i -g vercel@latest`, etc.). Don't try to upgrade for the user. + +Also assume the user has run `pnpm install` after cloning. If `node_modules/` is missing, halt and direct them to run it. + +### 0b. Authentication ```bash vercel whoami ``` - **Fails:** HALT. Tell the user: *"Vercel CLI not authenticated. Run `vercel login`, then re-invoke `init-env`."* -- **OK:** record the current scope (team or personal) for step 0c. +- **OK:** record the current scope (team or personal) for step 0d. -### 0b. Existing link +### 0c. Existing link ```bash test -f .vercel/project.json && cat .vercel/project.json ``` -- **No link:** continue to step 0c. +- **No link:** continue to step 0d. - **Link present:** read its `orgId` / `projectId`. Print: *"Existing link found: scope=\ project=\. Use this link or relink?"* - **Use:** skip step 1 entirely; carry this link forward. - **Relink:** HALT. Tell the user: *"Remove `.vercel/project.json` (`rm .vercel/project.json`) and re-invoke `init-env`."* -### 0c. Team-scope confirmation +### 0d. Team-scope confirmation Compare the existing link's scope (if any) with `vercel whoami` output. If they differ, surface the mismatch explicitly. 
Otherwise: @@ -85,7 +102,7 @@ Print: *"Will link to team scope: \. Correct?"* ## Step 1 — `vercel link` -Skip if step 0b found a usable existing link. +Skip if step 0c found a usable existing link. ```bash vercel link @@ -218,16 +235,81 @@ Phase 2 derives the webhook URL from `.vercel/project.json` (`https://. If the user opts to defer webhook registration (custom domain coming, admin permission missing, etc.), record it as a TODO for the final summary and continue. -→ **Stop. Ask:** *"Webhook registered (or deferred). Ready for Step 11: smoke test?"* +→ **Stop. Ask:** *"Webhook registered (or deferred). Ready for Step 11: Slack slash command?"* --- -## Step 11 — Manual smoke +## Step 11 — Slack slash command + +Register the `/ai-workflow` slash command against the deployed URL. This is the Slack analogue of Step 10 — both webhook and slash command need a live deploy URL, hence both run post-deploy. + +Read `.vercel/project.json` for the project name and construct: + +``` +https://.vercel.app/webhooks/slack +``` + +Walk the user through the runbook (full version: `init-slack/references/slash-commands.md`). The TL;DR: + +1. Open the Slack app's config page (https://api.slack.com/apps → your Blazebot app). +2. **Slash Commands → Create New Command.** +3. Fill: + - **Command:** `/ai-workflow` + - **Request URL:** the slash URL from above + - **Short description:** `Manage ai-workflow runs` + - **Usage hint:** `list | status | cancel ` +4. Save and **reinstall the app** to the workspace if Slack prompts. +5. Confirm `SLACK_SIGNING_SECRET` is set in Vercel (collected in Step 5). The handler at `src/routes/webhooks/slack.post.ts` rejects bad signatures. + +Verify in Slack: + +``` +/ai-workflow list +``` + +If the user can't register the slash command now (admin permission missing, custom domain pending, etc.), record it as a TODO for the final summary and continue. + +→ **Stop. Ask:** *"Slash command registered (or deferred). 
Ready for Step 12: smoke checks?"* + +--- + +## Step 12 — Smoke checks + +Three checks, in order. Don't skip the first two — they catch deploy/env issues without needing a live ticket round-trip. + +### 12a. Health endpoint + +```bash +curl https://.vercel.app/health +# expect: {"status":"ok","timestamp":"..."} +``` + +Failure modes: +- Non-200: build is broken or env-var validation crashed at startup. Run `vercel logs --prod` to inspect; the user must fix and redeploy. +- 200 but wrong body: middleware or routing regression — surface the response and stop. + +### 12b. Cron auth + +```bash +# unauth — must reject +curl -i https://.vercel.app/cron/poll +# expect: HTTP/1.1 401 + +# authed — must succeed +curl -i -H "Authorization: Bearer $CRON_SECRET" https://.vercel.app/cron/poll +# expect: HTTP/1.1 200 +``` + +If the unauth call returns 200, `CRON_SECRET` is missing in Production. Ask the user to set it (Step 7's value) and redeploy. + +The user can use `$CRON_SECRET` directly from their local shell if they ran `vercel env pull` in Step 8 and then sourced `.env.local`. + +### 12c. End-to-end ticket Print: ``` -Last step. Drop a test ticket in Jira to verify the bot end-to-end. +Last check. Drop a test ticket in Jira to verify the bot end-to-end. 1. Open ${JIRA_BASE_URL}/jira/your-projects 2. Create a small issue: @@ -246,13 +328,13 @@ Within ~5s (with webhook) or ~60s (cron fallback), expect: Reply when you've seen the PR (or "stuck on X" if a step is missing). ``` -Wait for the user's response. If they report a failure, capture which milestone was missing and include it in the final summary. +Wait for the user's response. If they report a failure, capture which milestone was missing and route to the matching row in [SETUP.md §13 — Troubleshooting](../../../SETUP.md#13-troubleshooting). Include the result in the final summary. → **Stop.
Ask:** *"Smoke passed?"* --- -## Step 12 — Final summary +## Step 13 — Final summary Print the summary template below, populated with the values gathered during the flow. Use the actual project name from `.vercel/project.json` and the user-reported smoke result. @@ -261,27 +343,35 @@ Cold start complete. Linked Vercel project: / Production URL: https://.vercel.app -Webhook URL: https://.vercel.app/webhooks/jira +Jira webhook URL: https://.vercel.app/webhooks/jira +Slack request URL: https://.vercel.app/webhooks/slack Configured: Jira webhook VCS / Agent model - Slack channel bot @ + Slack channel bot @, slash Upstash AI_WORKFLOW_KV prefix via Marketplace Cron CRON_SECRET set schedule * * * * * -Skipped (you can add these later): - - Arthur AI tracing — see https://www.arthur.ai/ for setup; both - GENAI_ENGINE_API_KEY and GENAI_ENGINE_TRACE_ENDPOINT, then run - `pnpm setup:arthur-prompts`. +Skipped (see SETUP.md for the full how-to): + - Arthur AI tracing — SETUP.md §12. Set GENAI_ENGINE_API_KEY and + GENAI_ENGINE_TRACE_ENDPOINT, then run `pnpm setup:arthur-prompts` + and persist the resulting GENAI_ENGINE_PROMPT_TASK_ID. + - GitLab swap — SETUP.md §12. Flip VCS_KIND=gitlab and provide + GITLAB_TOKEN + GITLAB_PROJECT_ID (+ GITLAB_HOST for self-hosted). + - CI / GitHub Actions — SETUP.md §11. The `e2e` GitHub environment + needs the prod env vars plus E2E_BASE_URL, E2E_GITHUB_TOKEN/OWNER/ + REPO, and VERCEL_AUTOMATION_BYPASS_SECRET as secrets. - Custom domain — point a domain at the Vercel project for a stable - webhook URL (replace .vercel.app in Jira's webhook config). - - WORKFLOW_POSTGRES_URL — local dev only. - - VERCEL_TOKEN local PAT — local dev only; Vercel uses OIDC. + webhook URL (then update Jira webhook + Slack request URLs). + - WORKFLOW_POSTGRES_URL — local dev only (SETUP.md §6). + - VERCEL_TOKEN local PAT — local dev only; Vercel uses OIDC in prod. 
-Smoke test: - +Smoke checks: + /health + /cron/poll + end-to-end Maintenance: Rotate one integration later by invoking that subskill standalone: @@ -291,6 +381,8 @@ Maintenance: vercel logs --prod https://vercel.com///observability + Troubleshooting matrix: SETUP.md §13. + No git changes were made. .env.local and .vercel/project.json are gitignored. ``` diff --git a/.claude/skills/init-jira/references/webhook-setup.md b/.claude/skills/init-jira/references/webhook-setup.md index a5935be..73c390e 100644 --- a/.claude/skills/init-jira/references/webhook-setup.md +++ b/.claude/skills/init-jira/references/webhook-setup.md @@ -6,6 +6,10 @@ The handler dispatches when the ticket lands in `COLUMN_AI` and cancels in-flight runs when it leaves. Both cases are detected on `jira:issue_updated`. Subscribing to `created`, `deleted`, etc. just adds noise that gets filtered away — the handler ignores anything without a project-key match or without an issue key. +### Optional: instant create / comment dispatch + +If you want creates and new comments to dispatch immediately (rather than waiting for the next `jira:issue_updated` from a transition or edit), also check **Issue → Issue created** and **Comment → Comment created**. Tradeoff: more webhook traffic, but no perceptible latency on freshly-created tickets or replies. The handler still applies the same column/state filters either way — extra events are filtered out, not acted on. + ## Open the webhook admin page ``` diff --git a/.claude/skills/init-slack/SKILL.md b/.claude/skills/init-slack/SKILL.md index 69e2e8f..bf41ef8 100644 --- a/.claude/skills/init-slack/SKILL.md +++ b/.claude/skills/init-slack/SKILL.md @@ -34,7 +34,7 @@ Ask: - `CHAT_SDK_CHANNEL_ID` — channel ID like `C0123456789` (not `#channel-name`) - `CHAT_SDK_BOT_NAME` — defaults to `blazebot`; only ask if the user wants to override - `SLACK_SIGNING_SECRET` — required. App settings → **Basic Information** → **App Credentials** → **Signing Secret**. 
Used to verify inbound `/ai-workflow` slash command requests. See `references/slash-commands.md` for the full slash-command setup. -- `SLACK_ALLOWED_USER_IDS` — optional. Comma-separated Slack user IDs (`U…`) allowed to run `/ai-workflow`. Empty = anyone in the workspace. +- `SLACK_ALLOWED_USER_IDS` — optional. Comma-separated Slack user IDs (`U…`) allowed to run `/ai-workflow`. Defaults to empty, which lets anyone in the workspace run the slash command. Only ask if the user wants to restrict access. ### Finding the channel ID diff --git a/.claude/skills/init-slack/references/bot-app-setup.md b/.claude/skills/init-slack/references/bot-app-setup.md index d0e9fd0..40cea8d 100644 --- a/.claude/skills/init-slack/references/bot-app-setup.md +++ b/.claude/skills/init-slack/references/bot-app-setup.md @@ -16,10 +16,12 @@ In the app settings sidebar: 1. **OAuth & Permissions** → **Scopes** → **Bot Token Scopes** → **Add an OAuth Scope**. 2. Add these scopes: - `chat:write` — required. Lets the bot post messages. + - `commands` — required. Lets Slack deliver `/ai-workflow` slash command invocations to the bot. + - `files:read` — required. Lets the bot read file uploads attached to messages and threads. + - `users:read` — required. Resolves user IDs to display names. - `chat:write.public` — optional. Lets the bot post in public channels it isn't a member of. Skip if you'll always invite the bot. - - `users:read` — optional. For mentioning specific users in messages. -Only `chat:write` is hard-required. +The four above (`chat:write`, `commands`, `files:read`, `users:read`) are all required. 
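With the `commands` scope granted, Slack POSTs each `/ai-workflow` invocation to the request URL, and the handler has to reject anything not signed with `SLACK_SIGNING_SECRET`. The TypeScript sketch below follows Slack's documented v0 scheme: an HMAC-SHA256 of `v0:<timestamp>:<raw body>`, compared against the `X-Slack-Signature` header, plus a five-minute replay window on `X-Slack-Request-Timestamp`. The `verifySlackSignature` name and parameter list are hypothetical; this is an illustration, not the code in `src/routes/webhooks/slack.post.ts`.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Slack recommends rejecting requests whose timestamp is more than 5 minutes off.
const MAX_SKEW_SECONDS = 60 * 5;

// Hypothetical helper: true only for a fresh request signed with signingSecret.
export function verifySlackSignature(
  signingSecret: string,
  rawBody: string, // the exact body Slack sent, before any form parsing
  timestampHeader: string | undefined, // X-Slack-Request-Timestamp
  signatureHeader: string | undefined, // X-Slack-Signature, e.g. "v0=ab12..."
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  if (!timestampHeader || !signatureHeader) return false;

  const timestamp = Number(timestampHeader);
  if (!Number.isInteger(timestamp)) return false;
  if (Math.abs(nowSeconds - timestamp) > MAX_SKEW_SECONDS) return false; // replay guard

  const expected =
    "v0=" +
    createHmac("sha256", signingSecret)
      .update(`v0:${timestampHeader}:${rawBody}`)
      .digest("hex");

  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  const a = Buffer.from(expected, "utf8");
  const b = Buffer.from(signatureHeader, "utf8");
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Hash the raw request body exactly as received; re-encoding the parsed form fields can change the bytes and make valid requests fail.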
## Install the app to the workspace diff --git a/.claude/skills/init-upstash/SKILL.md b/.claude/skills/init-upstash/SKILL.md index 66da1da..edf5a42 100644 --- a/.claude/skills/init-upstash/SKILL.md +++ b/.claude/skills/init-upstash/SKILL.md @@ -44,6 +44,14 @@ Tell the user to confirm in Vercel → Project Settings → Environment Variable - `AI_WORKFLOW_KV_REST_API_URL` (value: `https://.upstash.io`) - `AI_WORKFLOW_KV_REST_API_TOKEN` +CLI alternative (faster from a terminal): + +```bash +vercel env ls | grep AI_WORKFLOW_KV +``` + +Success: both `AI_WORKFLOW_KV_REST_API_URL` and `AI_WORKFLOW_KV_REST_API_TOKEN` appear, scoped to all three environments (Production, Preview, Development). + If the keys are named differently (e.g. `KV_REST_API_URL` without the `AI_WORKFLOW_KV` prefix), the prefix wasn't set correctly during install. Two recovery paths: - **Easier:** uninstall the Upstash integration (Storage → Upstash → Disconnect), reinstall with the correct prefix. From 0184fcd9f1b623faf0bdc77558e767bd6b43d23a Mon Sep 17 00:00:00 2001 From: kasin-it Date: Wed, 6 May 2026 12:15:53 +0200 Subject: [PATCH 3/4] docs: update readme --- README.md | 226 +++++++++++++++++++++++++++--------------------------- 1 file changed, 114 insertions(+), 112 deletions(-) diff --git a/README.md b/README.md index d880c16..cf545a0 100644 --- a/README.md +++ b/README.md @@ -1,47 +1,54 @@ -# Blazebot +# ai workflow -A workflow-driven AI coding automation service that turns Jira tickets into merge-ready pull requests. Blazebot polls your issue tracker for tickets assigned to AI, implements features end-to-end inside isolated [Vercel Sandboxes](https://vercel.com/docs/sandbox), and delivers PRs for human approval — no manual intervention required. +A workflow-driven AI coding automation service that turns Jira tickets into merge-ready pull requests. 
ai workflow picks up tickets assigned to AI in your issue tracker, implements features end-to-end inside isolated [Vercel Sandboxes](https://vercel.com/docs/sandbox), and delivers PRs for human approval — no manual intervention required. Designed for **self-hosting**: bring your own API keys (Jira, GitHub, Slack, Anthropic) and run on your own Vercel infrastructure. ## How It Works 1. **You move a Jira ticket** to the "AI" column on your board -2. **Blazebot's poller** discovers the ticket (runs every minute via Vercel Cron) -3. **A durable Vercel Workflow** orchestrates the full implementation lifecycle -4. **Claude Code** runs inside an isolated Vercel Sandbox — one sandbox per ticket, no access to production -5. **A pull request is created**, the ticket moves to "AI Review", and your team gets a Slack notification +2. **ai workflow dispatches** the ticket — instantly via the Jira webhook, or within ~1 min via the Vercel Cron poller as a fallback +3. **A durable Vercel Workflow** runs the agent in phases (research → implementation) inside a single Vercel Sandbox per ticket +4. **The sandbox pushes commits** directly to the feature branch, a pull request is opened (or the existing one is updated), the ticket moves to "AI Review", and your team gets a Slack notification -If the PR gets review feedback, Blazebot picks it up again, runs a fix cycle, and pushes updates. If the agent can't proceed without human input, it posts clarification questions on the ticket and waits. +If the ticket already has an open PR (review feedback), the same workflow re-runs and feeds the PR comments and conflict status into the agent's context. If the agent can't proceed without human input, it posts clarification questions on the ticket and moves it to Backlog.
```mermaid flowchart TD - A["Jira ticket moved to AI column"] --> B["Poller discovers ticket"] - B --> C{"PR already exists?"} - - C -- No --> D["Implementation Workflow"] - C -- Yes --> E["Review-Fix Workflow"] - - D --> F["Create feature branch"] - F --> G["Provision Vercel Sandbox"] - G --> H["Run Claude Code agent"] - H --> I{Agent result?} - - I -- "implemented" --> J["Push changes & create PR"] - J --> K["Move ticket to AI Review"] - K --> L["Notify via Slack"] - - I -- "clarification_needed" --> M["Post questions on ticket"] - M --> N["Move ticket to Backlog"] - N --> O["Notify via Slack"] - - I -- "failed" --> P["Notify failure via Slack"] - - E --> Q["Fetch PR comments & conflict status"] - Q --> G - - R["Reconciler runs on every poll"] -.-> S["Cleans finished runs"] - R -.-> T["Cancels orphaned runs"] + A["Jira ticket moved to AI column"] --> B{"Dispatch"} + B -- "webhook (instant)" --> D["agentWorkflow"] + B -- "cron poll (~1 min)" --> D + + D --> E["fetchPRContext (existing PR?)"] + E --> F["createFeatureBranch (only if no PR)"] + F --> G["provisionSandbox + register sandbox"] + + G --> P1["Phase 1: Research / Plan"] + P1 --> P1R{Research result?} + P1R -- "clarification_needed" --> CL["Post questions → Backlog → notify"] + P1R -- "failed / timeout" --> FB["Move to Backlog → notify failed"] + P1R -- "completed" --> P2["Phase 2: Implementation"] + + P2 --> P2R{Impl result?} + P2R -- "clarification_needed" --> CL + P2R -- "failed / timeout" --> FB + P2R -- "implemented" --> PUSH["pushFromSandbox (git push --force from inside sandbox)"] + + PUSH --> PUSHR{Push ok?} + PUSHR -- "no" --> FIX["fixAndRetryPush (lightweight fix agent)"] + FIX --> PUSHR + PUSHR -- "yes" --> PR["createPullRequest / findPR"] + PR --> MV["Move to AI Review → notify pr_ready"] + + TD["teardownSandbox (always runs in finally)"] + MV -.-> TD + CL -.-> TD + FB -.-> TD + + R["Reconciler (every poll)"] -.-> R1["Stale claims (>5 min)"] + R -.-> R2["Finished runs"] + R -.-> R3["Orphaned 
runs (ticket left AI column)"] + R -.-> R4["Stale failed-ticket markers"] ``` ## Tech Stack @@ -51,11 +58,12 @@ flowchart TD | Server | [Nitropack](https://nitro.build) | HTTP server framework (Vercel Functions) | | Orchestration | [Vercel Workflows](https://vercel.com/docs/workflow) | Durable execution — survives crashes and deploys | | Agent Execution | [Vercel Sandbox](https://vercel.com/docs/sandbox) | Isolated per-ticket environments | -| AI Agent | [Claude Code](https://docs.anthropic.com/en/docs/claude-code) | Coding agent (Anthropic) | +| AI Agent | [Claude Code](https://docs.anthropic.com/en/docs/claude-code) or [OpenAI Codex CLI](https://github.com/openai/codex) | Coding agent (selectable via `AGENT_KIND`) | | Issue Tracker | Jira REST API | Ticket lifecycle management | -| VCS | GitHub ([Octokit](https://github.com/octokit/rest.js)) | Branches, PRs, file pushes | -| Messaging | [Chat SDK](https://chat-sdk.dev) + Slack | Team notifications | -| Run Registry | [Upstash Redis](https://upstash.com) via [Vercel KV](https://vercel.com/docs/storage/vercel-kv) | Atomic claim/release for concurrent runs | +| VCS | GitHub ([Octokit](https://github.com/octokit/rest.js)) or GitLab ([@gitbeaker/rest](https://github.com/jdalrymple/gitbeaker)) | Branches, PRs/MRs, comments | +| Messaging | [Chat SDK](https://chat-sdk.dev) + Slack | Team notifications + `/ai-workflow` slash commands | +| Run Registry | [Upstash Redis](https://upstash.com) (via Vercel Marketplace integration) | Atomic claim/release for concurrent runs | +| Tracing (optional) | [Arthur AI Engine](https://www.arthur.ai/) | Per-run prompt/tool tracing inside the sandbox | | Validation | [Zod](https://zod.dev) | Schema validation for config and agent output | | Logging | [Pino](https://getpino.io) | Structured JSON logs | | Testing | [Vitest](https://vitest.dev) | Unit and E2E tests | @@ -84,7 +92,7 @@ pnpm install ### 2. Link to Vercel -Blazebot runs on Vercel and uses OIDC for Sandbox authentication. 
Link the project first: +ai workflow runs on Vercel and uses OIDC for Sandbox authentication. Link the project first: ```bash vercel link @@ -113,7 +121,7 @@ JIRA_WEBHOOK_SECRET= # Optional: openssl rand -hex 32. Without > The Jira webhook is registered separately (see [SETUP.md § 8](./SETUP.md#8-register-the-jira-webhook)). The handler at `/webhooks/jira` verifies an `X-Hub-Signature` HMAC-SHA256 header. -**Jira columns** — The board column names Blazebot watches and moves tickets between: +**Jira columns** — The board column names ai workflow watches and moves tickets between: ```bash COLUMN_AI=AI # Column where tickets are assigned to the agent COLUMN_AI_REVIEW=AI Review # Column where completed tickets go for human review @@ -161,7 +169,7 @@ COMMIT_AUTHOR=ai-workflow-blazity # Git commit author name COMMIT_EMAIL=ai-workflow@blazity.com # Git commit author email ``` -**Switching agents** — Blazebot supports two CLI runtimes. Set `AGENT_KIND` once per deployment: +**Switching agents** — ai workflow supports two CLI runtimes. Set `AGENT_KIND` once per deployment: ```bash AGENT_KIND=claude # default — Anthropic Claude Code @@ -309,7 +317,7 @@ curl -H "Authorization: Bearer $CRON_SECRET" http://localhost:3000/cron/poll ### 1. Push to GitHub -Blazebot deploys automatically when connected to Vercel via Git integration. +ai workflow deploys automatically when connected to Vercel via Git integration. ### 2. 
Import project @@ -354,42 +362,33 @@ Two GitHub Actions workflows are included: ## Workflow Deep-dive -### Implementation Workflow - -When a ticket is discovered in the AI column and no PR exists yet, the **implementation workflow** runs: +### One workflow, two phases -| Step | What happens | -|------|-------------| -| `fetchAndValidateTicket` | Fetches ticket from Jira, verifies it's still in the AI column | -| `createFeatureBranch` | Creates `blazebot/{ticket-key}` branch from the base branch | -| `assembleImplementationRequirements` | Combines ticket title, description, acceptance criteria, and comments into a `requirements.md` prompt | -| `provisionAndStartAgent` | Provisions a Vercel Sandbox, installs Claude Code + global skills, starts the agent detached with a JSON output schema | -| *poll loop* | Polls the sandbox every 30s for completion (workflow suspends between polls) | -| `collectAgentResults` | Reads agent output and extracts changed files from the sandbox | -| `pushChanges` | Pushes all modified files to the feature branch via the GitHub API | -| `createPullRequest` | Opens a PR targeting the base branch | -| `moveTicket` | Moves the Jira ticket to the "AI Review" column | -| `notifySlack` | Sends a Slack message to the configured channel | -| `unregisterRun` | Removes the ticket from the run registry | - -If the agent returns `clarification_needed`, the workflow instead posts the questions as a Jira comment, moves the ticket to Backlog, and notifies via Slack. When someone answers and moves the ticket back to AI, Blazebot picks it up again with the full conversation history. - -### Review-Fix Workflow - -When a ticket is in the AI column but a PR already exists (indicating review feedback), the **review-fix workflow** runs: +There is a single durable workflow — `agentWorkflow` in [`src/workflows/agent.ts`](./src/workflows/agent.ts) — that handles both fresh tickets and review-fix re-runs. 
The branching happens at *context-assembly* time, not at the workflow level: if an open PR for `blazebot/{ticket-key}` already exists, its comments, check results, and conflict status are folded into the agent's input. | Step | What happens | |------|-------------| -| `fetchAndValidateTicket` | Same as implementation | -| `fetchPRContext` | Fetches all PR comments (review + issue) and merge conflict status | -| `assembleReviewFixRequirements` | Builds requirements including the original ticket context plus PR feedback and conflict status | -| `provisionAndStartFixingAgent` | Starts the agent detached with the fixing prompt | -| *poll loop* | Polls the sandbox every 30s for completion | -| `collectAgentResults` | Reads agent output and extracts changed files | -| `pushChanges` | Pushes fixes to the existing branch | -| `moveTicket` | Moves back to AI Review | -| `notifySlack` | Notifies the team | -| `unregisterRun` | Cleans up | +| `fetchAndValidateTicket` | Fetches the ticket from Jira; aborts if it's no longer in the AI column | +| `fetchPRContext` | Looks up an open PR for `blazebot/{ticket-key}`; returns comments, check results, conflict status (or `null` for fresh tickets) | +| `createFeatureBranch` | Only when there's no existing PR — creates/resets `blazebot/{ticket-key}` from the base branch | +| `fetchAttachments` | Downloads ticket attachments (size/count limited by `ATTACHMENT_*` env vars) | +| `ensureArthurTaskForTicket` | Optional — creates an Arthur trace task when `GENAI_ENGINE_*` is configured | +| `resolveAgentKindOverride` | Per-ticket override via labels (e.g. 
`agent:codex`); falls back to `AGENT_KIND` | +| `provisionSandbox` | Provisions a Vercel Sandbox, installs the agent CLI + skills, configures auth + Arthur tracer | +| `registerTicketSandbox` | Pins the sandbox id to the ticket in Redis so cleanup paths can stop it by id | +| `writeAttachments` | Writes downloaded attachments under `/tmp/attachments/` inside the sandbox | +| **Phase 1 — Research/Plan** | `setCommitGuardStep(false)` → `planPhaseStep("research")` → `writeAndStartPhase` → `pollUntilDone` (20 min) → `collectPhase` → `parseResearchStep`. Result is `completed`, `clarification_needed`, or `failed` | +| **Phase 2 — Implementation** | `setCommitGuardStep(true)` → `planPhaseStep("impl", AGENT_SCHEMA)` → `writeAndStartPhase` → `pollUntilDone` (35 min) → `collectPhase` → `parseAgentOutputStep` | +| `pushFromSandbox` | Injects the VCS token into the sandbox's git remote (after the agent process is dead) and runs `git push --force` from inside the sandbox | +| `fixAndRetryPush` | Fallback: if the push is rejected (e.g. pre-receive hook), spawns a lightweight fix agent in the same sandbox, then retries the push once | +| `createPullRequest` / `findPRForBranch` | Opens a new PR (no prior PR) or re-fetches the existing PR (review-fix path) | +| `moveTicket` → `notifyTicket("pr_ready")` | Moves the ticket to "AI Review" and sends the Slack notification with the usage report | +| `unregisterRun` | Removes the ticket from the Redis run registry | +| `teardownSandbox` | Always runs in `finally` — destroys the sandbox regardless of outcome | + +If either phase returns `clarification_needed`, the workflow posts numbered questions as a Jira comment, moves the ticket to Backlog, and emits a `needs_clarification` Slack event. If a phase fails or times out, the ticket is moved to Backlog with a `failed` event. + +> A third "Review" phase exists as commented-out scaffolding in `agent.ts`. It's intentionally disabled today. 
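Stripped of the optional steps, the table above reduces to one control-flow shape: provision a sandbox once, run the phases in order, stop at the first `clarification_needed` or `failed` outcome, and tear the sandbox down whatever happens. The sketch below models only that shape. `Steps`, `runTicket`, and the outcome types are hypothetical stand-ins, not the actual step signatures in `src/workflows/agent.ts`, which also cover PR context, attachments, tracing, push retries, and registry cleanup.

```typescript
// Hypothetical shapes; the real step signatures in src/workflows/agent.ts differ.
export type Phase = "research" | "impl";

export type PhaseOutcome =
  | { kind: "completed" } // research finished, or impl reported "implemented"
  | { kind: "clarification_needed"; questions: string[] }
  | { kind: "failed"; reason: string }; // includes phase timeouts

export interface Steps {
  provisionSandbox(): Promise<string>;
  runPhase(sandboxId: string, phase: Phase): Promise<PhaseOutcome>;
  pushAndOpenPR(sandboxId: string): Promise<void>; // push, create or re-fetch PR, move to AI Review
  askForClarification(questions: string[]): Promise<void>; // Jira comment, Backlog, Slack
  markFailed(reason: string): Promise<void>; // Backlog, Slack "failed"
  teardownSandbox(sandboxId: string): Promise<void>;
}

export type RunResult = "pr_ready" | "needs_clarification" | "failed";

export async function runTicket(steps: Steps): Promise<RunResult> {
  const sandboxId = await steps.provisionSandbox();
  try {
    // Phases run strictly in order inside the same sandbox.
    for (const phase of ["research", "impl"] as const) {
      const outcome = await steps.runPhase(sandboxId, phase);
      if (outcome.kind === "clarification_needed") {
        await steps.askForClarification(outcome.questions);
        return "needs_clarification";
      }
      if (outcome.kind === "failed") {
        await steps.markFailed(outcome.reason);
        return "failed";
      }
    }
    await steps.pushAndOpenPR(sandboxId);
    return "pr_ready";
  } finally {
    // Mirrors teardownSandbox running in `finally`, whatever the outcome.
    await steps.teardownSandbox(sandboxId);
  }
}
```

Returning from inside `try` still runs `finally`, which is what guarantees teardown on the clarification and failure paths as well as the happy path.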
### Sandbox Lifecycle @@ -399,26 +398,34 @@ Each agent run gets a fresh, isolated [Vercel Sandbox](https://vercel.com/docs/s | Input | How it's provided | |-------|-------------------| -| Repository source code | Cloned via `git` source at the feature branch (shallow clone, `depth=1`) | -| `ANTHROPIC_API_KEY` | Injected as an environment variable | -| `CLAUDE_MODEL` | Injected as an environment variable | -| `requirements.md` | Written to the sandbox root via `sandbox.writeFiles()` — contains ticket title, description, acceptance criteria, comments, and the agent prompt | -| Git identity | Configured inside the sandbox (`git config user.name` / `user.email`) | -| Claude Code | Installed globally via `npm i -g @anthropic-ai/claude-code` | -| Skills | Installed globally to `~/.claude/skills/` (not in the repo) — includes `using-superpowers`, `requesting-code-review`, and `frontend-design` | +| Repository source code | Cloned via `git` source at the feature branch (shallow `depth=1`); unshallowed before push if needed | +| Auth env vars | `ANTHROPIC_API_KEY` / `CLAUDE_CODE_OAUTH_TOKEN` (Claude) or `CODEX_API_KEY` / `CODEX_CHATGPT_OAUTH_TOKEN` (Codex) — written to `/tmp/agent-env.sh` (mode 0600) and sourced by each phase script | +| Model | `CLAUDE_MODEL` or `CODEX_MODEL` baked into the phase wrapper script | +| Per-phase input | `/tmp/research-requirements.md` and `/tmp/impl-requirements.md` — assembled by `assembleResearchPlanContext` / `assembleImplementationContext` | +| Attachments | Written to `/tmp/attachments/` | +| Git identity | `git config user.name` / `user.email` from `COMMIT_AUTHOR` / `COMMIT_EMAIL` | +| Agent CLI | `@anthropic-ai/claude-code` (Claude) or `@openai/codex` (Codex), installed globally | +| Skills | Installed via `npx skills add ... -g --agent claude-code codex --copy` to **both** `~/.claude/skills/` and `~/.agents/skills/`. 
Currently only [`frontend-design`](https://github.com/anthropics/skills) is in `GLOBAL_SKILLS` | +| Arthur tracer (optional) | Python tracer + `~/.claude/arthur_config.json` + hook entries in `~/.claude/settings.json` | The sandbox runs on **Node.js 24** with a configurable timeout (`JOB_TIMEOUT_MS`, default 30 minutes). On Vercel, OIDC authenticates the sandbox automatically. For local dev, explicit `VERCEL_TOKEN` / `VERCEL_TEAM_ID` / `VERCEL_PROJECT_ID` are needed. #### How the agent runs -Claude Code is invoked inside the sandbox with: -- `--dangerously-skip-permissions` — safe because the sandbox is fully isolated -- `--output-format json` — enforces structured output -- `--json-schema '{...}'` — the agent must return output matching the schema below +Each phase has its own wrapper script (`/tmp/{phase}-wrapper.sh`) that sources `/tmp/agent-env.sh` and pipes the phase input into the agent CLI: + +- **Claude** (`buildPhaseScript` in [`src/sandbox/agents/claude.ts`](./src/sandbox/agents/claude.ts)): + ``` + cat /tmp/{phase}-requirements.md | claude \ + --print --model '' --dangerously-skip-permissions --output-format json \ + [--json-schema ''] \ + > /tmp/{phase}-stdout.txt 2>/tmp/{phase}-stderr.txt + ``` +- **Codex** (`buildPhaseScript` in [`src/sandbox/agents/codex.ts`](./src/sandbox/agents/codex.ts)) uses `codex exec --model … --dangerously-bypass-approvals-and-sandbox --skip-git-repo-check --json` with `--output-schema` for structured output. -The agent reads `requirements.md` via stdin and implements the feature autonomously. It has access to the full repository, can run tests, install dependencies, and make commits. +The script ends by writing a sentinel file (`/tmp/{phase}-done`). The workflow polls every 30 seconds via `checkPhaseDone` and suspends between polls — durable across redeploys. 
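The sentinel polling is a bounded loop: check for `/tmp/{phase}-done`, and if it is missing, suspend for 30 seconds and check again until the phase budget (20 minutes for research, 35 for implementation, per the step table) is spent. The sketch below is a simplified, hypothetical stand-in for `pollUntilDone`, whose real signature isn't shown in this README; the existence check and the durable sleep are injected as functions so the loop can be exercised without a sandbox.

```typescript
export type PollResult = "done" | "timeout";

// Simplified stand-in for the workflow's pollUntilDone step (real signature not shown here).
export async function pollUntilDone(
  checkDone: () => Promise<boolean>, // e.g. "does /tmp/impl-done exist in the sandbox?"
  sleep: (ms: number) => Promise<void>, // a durable suspension in the real workflow
  timeoutMs: number, // 20 min for research, 35 min for implementation
  intervalMs = 30_000,
): Promise<PollResult> {
  // Elapsed time is tracked by summing intervals, which keeps the sketch deterministic.
  let waitedMs = 0;
  for (;;) {
    if (await checkDone()) return "done";
    if (waitedMs >= timeoutMs) return "timeout";
    await sleep(intervalMs);
    waitedMs += intervalMs;
  }
}
```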
-The agent must return structured output conforming to: +The implementation phase enforces the structured contract: ```json { @@ -429,34 +436,27 @@ The agent must return structured output conforming to: } ``` -#### How commits are extracted - -The agent commits inside the sandbox via a **stop hook** that blocks exit until all changes are committed. A **wrapper script** runs the agent detached, cleans up artifacts (`.claude/`, `requirements.md`), and writes a sentinel file (`/tmp/agent-done`) on completion. The workflow polls for this sentinel every 30 seconds, then: - -1. Reads agent stdout/stderr from `/tmp/agent-stdout.txt` and `/tmp/agent-stderr.txt` -2. Diffs against the pre-agent SHA to find changed files (`git diff --name-only`) -3. Reads each modified file's content from the sandbox (excluding `requirements.md` and `.claude/`) -4. Returns the file list `Array<{ path, content }>` to the workflow +A **commit-guard stop hook** (toggled per phase via `setCommitGuardStep`) blocks the agent from exiting with uncommitted changes. Phase 1 has it disabled (research only — no commits expected); phase 2 enables it so the implementation phase can't return `result: "implemented"` while leaving the working tree dirty. -#### How changes get pushed to GitHub +#### How changes get pushed -Blazebot does **not** push from inside the sandbox. Instead, the extracted files are pushed via the **GitHub Git Data API** (Octokit), which builds a commit from the outside: +ai workflow pushes from **inside the sandbox**, but only after the agent process has exited. The flow in [`src/sandbox/poll-agent.ts`](./src/sandbox/poll-agent.ts): -1. **Create blobs** — each file's content is uploaded as a base64-encoded blob -2. **Create tree** — a new Git tree is assembled referencing all blobs, based on the branch's current tree -3. **Create commit** — a new commit object is created with the tree and the branch tip as parent -4. 
**Update ref** — the branch ref (`refs/heads/blazebot/{ticket-key}`) is fast-forwarded to the new commit +1. **Verify commits exist** — compare the saved `/tmp/.pre-agent-sha` to the current `HEAD`. If unchanged, the workflow fails the run with "Agent reported success but made no commits." +2. **Inject the token** — `git remote set-url origin `. The agent process is already dead at this point and never sees the token. +3. **Unshallow if needed** — shallow clones miss shared ancestry with `main`, which breaks PR creation. +4. **Push** — `git push --force origin HEAD:refs/heads/{branch}` (force-push is safe; `blazebot/*` branches have no concurrent pushers). -This approach avoids giving the sandbox push credentials to the target repository. +If the push is rejected (e.g. by a remote pre-receive hook), `fixAndRetryPush` strips the token, spawns a smaller fix agent in the same sandbox with the push error as context, lets it commit fixes, then re-injects the token and retries the push once. #### How PRs are created -After pushing, the workflow calls `octokit.pulls.create()` to open a PR: -- **Head**: the feature branch (`blazebot/{ticket-key}`) -- **Base**: the configured base branch (default `main`) -- **Title and body**: generated from the ticket title and the agent's summary +For fresh tickets, the workflow opens a PR via the VCS adapter (`octokit.pulls.create()` for GitHub, `@gitbeaker/rest` for GitLab): +- **Head**: `blazebot/{ticket-key}` +- **Base**: `GITHUB_BASE_BRANCH` / `GITLAB_BASE_BRANCH` (default `main`) +- **Title**: the ticket title -For the review-fix workflow, no new PR is created — the existing PR is updated by pushing to the same branch. +For tickets that already had a PR (the review-fix path), no new PR is created — the existing PR is updated by the force-push and re-fetched via `findPRForBranch`. 
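The four push steps can be modeled as a pure function that either refuses (no new commits) or returns the git commands to run inside the sandbox, in order. This is a hypothetical sketch, not the code in `src/sandbox/poll-agent.ts`. The README elides the authenticated remote URL, so the `https://x-access-token:<token>@github.com/<owner>/<repo>.git` form used here is an assumption that only fits GitHub; a GitLab remote would need its own host and credentials.

```typescript
export interface PushInput {
  preAgentSha: string; // contents of /tmp/.pre-agent-sha
  headSha: string; // `git rev-parse HEAD` after the agent exits
  isShallow: boolean; // `git rev-parse --is-shallow-repository` printed "true"
  token: string;
  owner: string;
  repo: string;
  branch: string; // e.g. "blazebot/PROJ-123"
}

// Returns argv arrays to run in the sandbox, in order.
export function buildPushCommands(input: PushInput): string[][] {
  // 1. Verify commits exist.
  if (input.headSha === input.preAgentSha) {
    throw new Error("Agent reported success but made no commits.");
  }
  // 2. Inject the token only now, after the agent process has exited (URL shape is assumed).
  const remote = `https://x-access-token:${encodeURIComponent(input.token)}@github.com/${input.owner}/${input.repo}.git`;
  const commands: string[][] = [["git", "remote", "set-url", "origin", remote]];
  // 3. Unshallow so the branch shares history with the base branch.
  if (input.isShallow) {
    commands.push(["git", "fetch", "--unshallow", "origin"]);
  }
  // 4. Force-push; blazebot/* branches have a single writer.
  commands.push(["git", "push", "--force", "origin", `HEAD:refs/heads/${input.branch}`]);
  return commands;
}
```

Commands are returned as argv arrays rather than shell strings, so a branch name or token never passes through shell interpolation.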
 #### Teardown
@@ -464,15 +464,17 @@ The sandbox is **always destroyed** after each run (in a `finally` block), whether
 
 ### Run Registry and Reconciliation
 
-Blazebot uses an **atomic claim pattern** via Upstash Redis to prevent duplicate runs:
+ai workflow uses an **atomic claim pattern** via Upstash Redis to prevent duplicate runs:
 
 - When a ticket is dispatched, a `claiming:{timestamp}` sentinel is set atomically (`hsetnx`)
 - Only one poller instance can win the claim — others see it's taken
-- After the workflow starts, the sentinel is replaced with the real workflow run ID
-- On every poll cycle, the **reconciler** cleans up:
-  - Stale claims older than 5 minutes
-  - Finished runs still tracked in the registry
-  - Orphaned runs for tickets that left the AI column (cancels the workflow)
+- After the workflow starts, the sentinel is replaced with the real workflow run ID and the sandbox id is pinned to the ticket
+- On every poll cycle, the **reconciler** ([`src/lib/reconcile.ts`](./src/lib/reconcile.ts)) cleans up:
+  - Stale claims older than 5 minutes (kills any orphaned sandbox + clears the sentinel)
+  - Finished runs still tracked in the registry (status `completed` / `failed` / `cancelled`)
+  - Orphaned runs for tickets that left the AI column — cancels the workflow and stops the sandbox
+  - Stale failed-ticket markers (cleared once the ticket leaves the AI column)
+  - A 30-second grace window guards against Jira's JQL index lag during column transitions
 
 ## License
 

From c6c6853d21e99804fbe8c8eb84e842a69e2b9bda Mon Sep 17 00:00:00 2001
From: kasin-it
Date: Wed, 6 May 2026 12:20:56 +0200
Subject: [PATCH 4/4] feat: clean the stale docs

---
 README.md | 2 +-
 ...26-04-01-on-prem-aws.md => ON-PREM-AWS.md} | 0
 ...ty-design.md => SECURUTY-OBSERVABILITY.md} | 0
 .../plans/2026-03-20-initial-scaffold.md | 2582 ----------------
 .../plans/2026-03-20-redis-dedup.md | 488 ----
 .../plans/2026-03-23-agent-session-memory.md | 445 ---
 .../superpowers/plans/2026-03-26-e2e-tests.md | 1515 ----------
 .../2026-04-01-sandbox-polling-suspension.md | 889 ------
 .../2026-04-06-three-phase-agent-workflow.md | 1942 ------------
 .../plans/2026-04-09-gitlab-vcs-adapter.md | 1016 -------
 .../2026-04-13-jira-ticket-attachments.md | 1841 ------------
 .../2026-04-21-arthur-tracer-in-sandbox.md | 773 -----
 .../plans/2026-04-22-arthur-hosted-prompts.md | 626 ----
 .../plans/2026-04-27-codex-integration.md | 2600 -----------------
 .../2026-04-30-slack-threaded-messages.md | 1405 ---------
 .../plans/2026-05-01-slack-slash-commands.md | 141 -
 .../2026-03-23-agent-session-memory-design.md | 105 -
 .../specs/2026-03-26-e2e-tests-design.md | 259 --
 ...26-04-02-failed-ticket-safeguard-design.md | 147 -
 ...eview-fix-cicd-and-line-comments-design.md | 170 --
 .../specs/2026-04-02-sandbox-push-design.md | 313 --
 .../2026-04-06-three-phase-workflow-design.md | 504 ----
 .../2026-04-09-gitlab-vcs-adapter-design.md | 195 --
 ...26-04-13-jira-ticket-attachments-design.md | 232 --
 .../2026-04-27-codex-integration-design.md | 423 ---
 .../2026-04-27-setup-onboarding-research.md | 169 --
 ...26-04-30-slack-threaded-messages-design.md | 287 --
 27 files changed, 1 insertion(+), 19068 deletions(-)
 rename docs/{superpowers/specs/2026-04-01-on-prem-aws.md => ON-PREM-AWS.md} (100%)
 rename docs/{superpowers/specs/2026-04-09-security-observability-design.md => SECURUTY-OBSERVABILITY.md} (100%)
 delete mode 100644 docs/superpowers/plans/2026-03-20-initial-scaffold.md
 delete mode 100644 docs/superpowers/plans/2026-03-20-redis-dedup.md
 delete mode 100644 docs/superpowers/plans/2026-03-23-agent-session-memory.md
 delete mode 100644 docs/superpowers/plans/2026-03-26-e2e-tests.md
 delete mode 100644 docs/superpowers/plans/2026-04-01-sandbox-polling-suspension.md
 delete mode 100644 docs/superpowers/plans/2026-04-06-three-phase-agent-workflow.md
 delete mode 100644 docs/superpowers/plans/2026-04-09-gitlab-vcs-adapter.md
 delete mode 100644 docs/superpowers/plans/2026-04-13-jira-ticket-attachments.md
 delete mode 100644 docs/superpowers/plans/2026-04-21-arthur-tracer-in-sandbox.md
 delete mode 100644 docs/superpowers/plans/2026-04-22-arthur-hosted-prompts.md
 delete mode 100644 docs/superpowers/plans/2026-04-27-codex-integration.md
 delete mode 100644 docs/superpowers/plans/2026-04-30-slack-threaded-messages.md
 delete mode 100644 docs/superpowers/plans/2026-05-01-slack-slash-commands.md
 delete mode 100644 docs/superpowers/specs/2026-03-23-agent-session-memory-design.md
 delete mode 100644 docs/superpowers/specs/2026-03-26-e2e-tests-design.md
 delete mode 100644 docs/superpowers/specs/2026-04-02-failed-ticket-safeguard-design.md
 delete mode 100644 docs/superpowers/specs/2026-04-02-review-fix-cicd-and-line-comments-design.md
 delete mode 100644 docs/superpowers/specs/2026-04-02-sandbox-push-design.md
 delete mode 100644 docs/superpowers/specs/2026-04-06-three-phase-workflow-design.md
 delete mode 100644 docs/superpowers/specs/2026-04-09-gitlab-vcs-adapter-design.md
 delete mode 100644 docs/superpowers/specs/2026-04-13-jira-ticket-attachments-design.md
 delete mode 100644 docs/superpowers/specs/2026-04-27-codex-integration-design.md
 delete mode 100644 docs/superpowers/specs/2026-04-27-setup-onboarding-research.md
 delete mode 100644 docs/superpowers/specs/2026-04-30-slack-threaded-messages-design.md

diff --git a/README.md b/README.md
index cf545a0..c805177 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 
 A workflow-driven AI coding automation service that turns Jira tickets into merge-ready pull requests. ai workflow polls your issue tracker for tickets assigned to AI, implements features end-to-end inside isolated [Vercel Sandboxes](https://vercel.com/docs/sandbox), and delivers PRs for human approval — no manual intervention required.
 
-Designed for **self-hosting**: bring your own API keys (Jira, GitHub, Slack, Anthropic) and run on your own Vercel infrastructure.
+Designed to work with **Vercel infrastructure**: bring your own API keys (Jira, GitHub, Slack, Anthropic) and deploy onto Vercel — Functions for the HTTP server, Workflows for durable orchestration, and Sandboxes for isolated agent execution.
 
 ## How It Works
 
diff --git a/docs/superpowers/specs/2026-04-01-on-prem-aws.md b/docs/ON-PREM-AWS.md
similarity index 100%
rename from docs/superpowers/specs/2026-04-01-on-prem-aws.md
rename to docs/ON-PREM-AWS.md
diff --git a/docs/superpowers/specs/2026-04-09-security-observability-design.md b/docs/SECURUTY-OBSERVABILITY.md
similarity index 100%
rename from docs/superpowers/specs/2026-04-09-security-observability-design.md
rename to docs/SECURUTY-OBSERVABILITY.md
diff --git a/docs/superpowers/plans/2026-03-20-initial-scaffold.md b/docs/superpowers/plans/2026-03-20-initial-scaffold.md
deleted file mode 100644
index 43fc165..0000000
--- a/docs/superpowers/plans/2026-03-20-initial-scaffold.md
+++ /dev/null
@@ -1,2582 +0,0 @@
-# Blazebot MVP — Initial Scaffold Implementation Plan
-
-> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
-
-**Goal:** Build the complete Blazebot MVP — a polling-driven, Vercel Workflow-orchestrated service that discovers Jira tickets, runs Claude Code in Vercel Sandboxes, and delivers merge-ready PRs.
-
-**Architecture:** Nitro serverless app on Vercel. Vercel Cron polls Jira for tickets in the AI column. Each ticket dispatches a durable Vercel Workflow that provisions a Vercel Sandbox, runs Claude Code with a structured prompt, and handles the result (PR creation, clarification, or retry). No database — workflow state lives in Vercel Workflows. Messaging via Chat SDK.
-
-**Tech Stack:** Nitro (Vercel preset), Workflow DevKit (`workflow` + `workflow/nitro`), `@vercel/sandbox`, `@octokit/rest`, Chat SDK (`chat` + `@chat-adapter/slack`), `pino`, `zod`, `vitest`
-
----
-
-## File Structure
-
-```
-ai-workflow/
-├── src/
-│   ├── routes/
-│   │   ├── cron/
-│   │   │   └── poll.get.ts     # Vercel Cron → poll tracker, start workflows
-│   │   └── health.get.ts       # Health check
-│   ├── plugins/
-│   │   └── workflow-world.ts   # Boot workflow runtime on server start
-│   ├── workflows/
-│   │   ├── implementation.ts   # "use workflow" — implementation flow
-│   │   └── review-fix.ts       # "use workflow" — fixing feedback flow
-│   ├── adapters/
-│   │   ├── issue-tracker/
-│   │   │   ├── types.ts        # IssueTrackerAdapter interface
-│   │   │   ├── jira.ts         # Jira REST API implementation
-│   │   │   └── jira.test.ts
-│   │   ├── vcs/
-│   │   │   ├── types.ts        # VCSAdapter interface
-│   │   │   ├── github.ts       # GitHub via @octokit/rest
-│   │   │   └── github.test.ts
-│   │   └── messaging/
-│   │       ├── types.ts        # MessagingAdapter interface
-│   │       ├── chatsdk.ts      # Chat SDK wrapper
-│   │       └── chatsdk.test.ts
-│   ├── sandbox/
-│   │   ├── manager.ts          # Sandbox lifecycle (provision, end hook, teardown)
-│   │   ├── manager.test.ts
-│   │   ├── agent-runner.ts     # Launch Claude Code, parse structured output
-│   │   ├── agent-runner.test.ts
-│   │   ├── context.ts          # Assemble requirements.md
-│   │   └── context.test.ts
-│   └── lib/
-│       ├── env.ts              # Zod-validated env config
-│       ├── env.test.ts
-│       ├── logger.ts           # Pino structured JSON logger
-│       └── adapters.ts         # Adapter factory (instantiate from env config)
-├── .blazebot/
-│   └── prompts/
-│       ├── implement.md        # Implementation prompt
-│       └── review-fix.md       # Review fix prompt
-├── nitro.config.ts
-├── vitest.config.ts
-├── vercel.json
-├── package.json
-├── tsconfig.json
-├── .env.example
-└── .gitignore
-```
-
-**Key design decisions:**
-- Flat structure (no monorepo) — monorepo is deferred per spec.
-- Tests co-located with source files.
-- Prompts live in `.blazebot/prompts/` per spec Section 5.
-- Adapters use raw `fetch` for Jira (no `jira.js` — keeps it simple and matches prior codebase). -- Workflows use `"use workflow"` for orchestration, `"use step"` for all real work (steps have full Node.js access, can do I/O). -- Deduplication via deterministic `id` in `start()` — e.g., `id: "ticket-${ticketId}"` prevents concurrent duplicate runs without a database. -- **No `git push` from inside sandbox** — per spec Section 15.2, the orchestrator pushes via VCS adapter from outside the sandbox. The sandbox manager extracts changes via `readFileToBuffer` + git diff. -- **Concurrency enforced in poller** — checks active sandbox count via `Sandbox.list()` before dispatching. - ---- - -### Task 1: Project Scaffold - -**Files:** -- Create: `package.json` -- Create: `tsconfig.json` -- Create: `nitro.config.ts` -- Create: `vitest.config.ts` -- Create: `vercel.json` -- Create: `.env.example` -- Create: `.gitignore` - -- [ ] **Step 1: Create `package.json`** - -```json -{ - "name": "ai-workflow", - "private": true, - "type": "module", - "scripts": { - "dev": "nitro dev", - "build": "nitro build", - "preview": "nitro preview", - "test": "vitest run", - "test:watch": "vitest", - "typecheck": "tsc --noEmit" - }, - "dependencies": { - "nitropack": "^2", - "h3": "^1", - "workflow": "latest", - "@workflow/world-postgres": "latest", - "@vercel/sandbox": "^1.8.1", - "@octokit/rest": "^22.0.1", - "chat": "^4.20.2", - "@chat-adapter/slack": "^4.20.2", - "pino": "^10.3.1", - "zod": "^3.25.76" - }, - "devDependencies": { - "typescript": "^5.8", - "vitest": "^3", - "@workflow/vitest": "latest" - } -} -``` - -- [ ] **Step 2: Create `tsconfig.json`** - -```json -{ - "compilerOptions": { - "target": "ES2022", - "module": "ESNext", - "moduleResolution": "Bundler", - "strict": true, - "esModuleInterop": true, - "skipLibCheck": true, - "outDir": "dist", - "rootDir": ".", - "types": ["vitest/globals"] - }, - "include": ["src/**/*.ts", "nitro.config.ts", "vitest.config.ts"], - "exclude": 
["node_modules", "dist", ".output", ".nitro"] -} -``` - -- [ ] **Step 3: Create `nitro.config.ts`** - -```ts -import { defineNitroConfig } from "nitropack/config"; - -export default defineNitroConfig({ - preset: "vercel", - modules: ["workflow/nitro"], - compatibilityDate: "2025-01-01", - srcDir: "src", -}); -``` - -- [ ] **Step 4: Create `vitest.config.ts`** - -```ts -import { defineConfig } from "vitest/config"; - -export default defineConfig({ - test: { - globals: true, - environment: "node", - include: ["src/**/*.test.ts"], - }, -}); -``` - -- [ ] **Step 5: Create `vercel.json`** - -```json -{ - "crons": [ - { - "path": "/cron/poll", - "schedule": "*/5 * * * *" - } - ] -} -``` - -- [ ] **Step 6: Create `.env.example`** - -```bash -# Issue Tracker (Jira) -ISSUE_TRACKER_KIND=jira -JIRA_BASE_URL=https://your-domain.atlassian.net -JIRA_EMAIL=your-email@example.com -JIRA_API_TOKEN=your-jira-api-token -JIRA_PROJECT_KEY=PROJ -COLUMN_AI=AI -COLUMN_AI_REVIEW=AI Review -COLUMN_BACKLOG=Backlog - -# VCS (GitHub) -VCS_KIND=github -GITHUB_TOKEN=ghp_xxxxxxxxxxxx -GITHUB_OWNER=your-org -GITHUB_REPO=your-repo -GITHUB_BASE_BRANCH=main - -# Messaging (Chat SDK) -CHAT_SDK_SLACK_TOKEN=xoxb-xxxxxxxxxxxx -CHAT_SDK_CHANNEL_ID=C0123456789 -CHAT_SDK_BOT_NAME=blazebot - -# Agent -ANTHROPIC_API_KEY=sk-ant-xxxxxxxxxxxx -CLAUDE_MODEL=claude-opus-4-6 -COMMIT_AUTHOR=ai-workflow-blazity -COMMIT_EMAIL=ai-workflow@blazity.com - -# Sandbox -MAX_CONCURRENT_AGENTS=3 -JOB_TIMEOUT_MS=1800000 - -# Polling -POLL_INTERVAL_MS=300000 - -# Vercel (for local dev — automatic on Vercel via OIDC) -VERCEL_TOKEN= -VERCEL_TEAM_ID= -VERCEL_PROJECT_ID= - -# Cron auth -CRON_SECRET= - -# Workflow (local dev only) -WORKFLOW_POSTGRES_URL=postgresql://localhost:5432/ai_workflow -``` - -- [ ] **Step 7: Create `.gitignore`** - -``` -node_modules/ -dist/ -.output/ -.nitro/ -.env -*.local -``` - -- [ ] **Step 8: Install dependencies** - -Run: `pnpm install` - -- [ ] **Step 9: Verify scaffold builds** - -Run: `npx nitro 
build` -Expected: Build completes with no errors. - -- [ ] **Step 10: Commit** - -```bash -git add package.json tsconfig.json nitro.config.ts vitest.config.ts vercel.json .env.example .gitignore pnpm-lock.yaml -git commit -m "chore: scaffold project with Nitro + Workflow DevKit" -``` - ---- - -### Task 2: Environment Validation - -**Files:** -- Create: `src/lib/env.ts` -- Create: `src/lib/env.test.ts` - -- [ ] **Step 1: Write failing test for env validation** - -```ts -// src/lib/env.test.ts -import { describe, it, expect, vi, beforeEach, afterEach } from "vitest"; - -describe("env", () => { - const VALID_ENV = { - ISSUE_TRACKER_KIND: "jira", - JIRA_BASE_URL: "https://test.atlassian.net", - JIRA_EMAIL: "test@example.com", - JIRA_API_TOKEN: "token", - JIRA_PROJECT_KEY: "PROJ", - COLUMN_AI: "AI", - COLUMN_AI_REVIEW: "AI Review", - COLUMN_BACKLOG: "Backlog", - VCS_KIND: "github", - GITHUB_TOKEN: "ghp_test", - GITHUB_OWNER: "test-org", - GITHUB_REPO: "test-repo", - GITHUB_BASE_BRANCH: "main", - CHAT_SDK_SLACK_TOKEN: "xoxb-test", - CHAT_SDK_CHANNEL_ID: "C123", - CHAT_SDK_BOT_NAME: "blazebot", - ANTHROPIC_API_KEY: "sk-ant-test", - CLAUDE_MODEL: "claude-opus-4-6", - COMMIT_AUTHOR: "ai-workflow-blazity", - COMMIT_EMAIL: "bot@blazity.com", - MAX_CONCURRENT_AGENTS: "3", - JOB_TIMEOUT_MS: "1800000", - POLL_INTERVAL_MS: "300000", - }; - - let originalEnv: NodeJS.ProcessEnv; - - beforeEach(() => { - originalEnv = { ...process.env }; - vi.resetModules(); - }); - - afterEach(() => { - process.env = originalEnv; - }); - - it("parses valid env", async () => { - Object.assign(process.env, VALID_ENV); - const { parseEnv } = await import("./env.js"); - const env = parseEnv(); - expect(env.JIRA_BASE_URL).toBe("https://test.atlassian.net"); - expect(env.MAX_CONCURRENT_AGENTS).toBe(3); - expect(env.JOB_TIMEOUT_MS).toBe(1800000); - }); - - it("uses defaults for optional fields", async () => { - const partial = { ...VALID_ENV }; - delete (partial as any).COMMIT_AUTHOR; - delete (partial as 
any).MAX_CONCURRENT_AGENTS; - Object.assign(process.env, partial); - const { parseEnv } = await import("./env.js"); - const env = parseEnv(); - expect(env.COMMIT_AUTHOR).toBe("ai-workflow-blazity"); - expect(env.MAX_CONCURRENT_AGENTS).toBe(3); - }); - - it("throws on missing required field", async () => { - const partial = { ...VALID_ENV }; - delete (partial as any).ANTHROPIC_API_KEY; - Object.assign(process.env, partial); - const { parseEnv } = await import("./env.js"); - expect(() => parseEnv()).toThrow(); - }); -}); -``` - -- [ ] **Step 2: Run test to verify it fails** - -Run: `pnpm test src/lib/env.test.ts` -Expected: FAIL — `parseEnv` not found. - -- [ ] **Step 3: Implement `env.ts`** - -```ts -// src/lib/env.ts -import { z } from "zod"; - -const envSchema = z.object({ - // Issue Tracker - ISSUE_TRACKER_KIND: z.enum(["jira"]), - JIRA_BASE_URL: z.string().url(), - JIRA_EMAIL: z.string().email(), - JIRA_API_TOKEN: z.string().min(1), - JIRA_PROJECT_KEY: z.string().min(1), - COLUMN_AI: z.string().min(1), - COLUMN_AI_REVIEW: z.string().min(1), - COLUMN_BACKLOG: z.string().min(1), - - // VCS - VCS_KIND: z.enum(["github"]), - GITHUB_TOKEN: z.string().min(1), - GITHUB_OWNER: z.string().min(1), - GITHUB_REPO: z.string().min(1), - GITHUB_BASE_BRANCH: z.string().default("main"), - - // Messaging - CHAT_SDK_SLACK_TOKEN: z.string().min(1), - CHAT_SDK_CHANNEL_ID: z.string().min(1), - CHAT_SDK_BOT_NAME: z.string().default("blazebot"), - - // Agent - ANTHROPIC_API_KEY: z.string().min(1), - CLAUDE_MODEL: z.string().default("claude-opus-4-6"), - COMMIT_AUTHOR: z.string().default("ai-workflow-blazity"), - COMMIT_EMAIL: z.string().default("ai-workflow@blazity.com"), - - // Sandbox - MAX_CONCURRENT_AGENTS: z.coerce.number().int().positive().default(3), - JOB_TIMEOUT_MS: z.coerce.number().int().positive().default(1_800_000), - - // Polling - POLL_INTERVAL_MS: z.coerce.number().int().positive().default(300_000), - - // Vercel (optional — auto via OIDC on Vercel) - VERCEL_TOKEN: 
z.string().optional(), - VERCEL_TEAM_ID: z.string().optional(), - VERCEL_PROJECT_ID: z.string().optional(), - - // Cron - CRON_SECRET: z.string().optional(), -}); - -export type Env = z.infer; - -let cached: Env | null = null; - -export function parseEnv(): Env { - if (cached) return cached; - cached = envSchema.parse(process.env); - return cached; -} -``` - -- [ ] **Step 4: Run test to verify it passes** - -Run: `pnpm test src/lib/env.test.ts` -Expected: PASS (all 3 tests). - -- [ ] **Step 5: Commit** - -```bash -git add src/lib/env.ts src/lib/env.test.ts -git commit -m "feat: add zod-validated environment config" -``` - ---- - -### Task 3: Structured Logger - -**Files:** -- Create: `src/lib/logger.ts` - -- [ ] **Step 1: Create logger** - -```ts -// src/lib/logger.ts -import pino from "pino"; - -export const logger = pino({ - level: process.env.LOG_LEVEL ?? "info", - formatters: { - level(label) { - return { level: label }; - }, - }, - timestamp: pino.stdTimeFunctions.isoTime, -}); - -export function ticketLogger(ticketId: string, identifier: string) { - return logger.child({ ticket_id: ticketId, ticket_identifier: identifier }); -} - -export function workflowLogger( - ticketId: string, - identifier: string, - workflowRunId: string, -) { - return logger.child({ - ticket_id: ticketId, - ticket_identifier: identifier, - workflow_run_id: workflowRunId, - }); -} -``` - -- [ ] **Step 2: Commit** - -```bash -git add src/lib/logger.ts -git commit -m "feat: add pino structured logger with ticket context" -``` - ---- - -### Task 4: Adapter Interfaces - -**Files:** -- Create: `src/adapters/issue-tracker/types.ts` -- Create: `src/adapters/vcs/types.ts` -- Create: `src/adapters/messaging/types.ts` - -- [ ] **Step 1: Create issue tracker interface** - -```ts -// src/adapters/issue-tracker/types.ts -export interface TicketContent { - id: string; - identifier: string; - title: string; - description: string; - acceptanceCriteria: string; - comments: TicketComment[]; - labels: 
string[]; - trackerStatus: string; -} - -export interface TicketComment { - author: string; - body: string; - createdAt: string; -} - -export interface IssueTrackerAdapter { - fetchTicket(id: string): Promise; - moveTicket(id: string, column: string): Promise; - postComment(id: string, comment: string): Promise; - searchTickets(query: string): Promise; -} -``` - -- [ ] **Step 2: Create VCS interface** - -```ts -// src/adapters/vcs/types.ts -export interface PullRequest { - id: number; - url: string; - branch: string; -} - -export interface PRComment { - author: string; - body: string; - liked: boolean; -} - -export interface VCSAdapter { - createBranch(name: string, base: string): Promise; - createPR(branch: string, title: string, body: string): Promise; - push(branch: string, files: Array<{ path: string; content: string }>): Promise; - getPRComments(prId: number): Promise; - getPRConflictStatus(prId: number): Promise; - findPR(branch: string): Promise; -} -``` - -- [ ] **Step 3: Create messaging interface** - -```ts -// src/adapters/messaging/types.ts -export interface MessagingAdapter { - notify(message: string): Promise; -} -``` - -- [ ] **Step 4: Commit** - -```bash -git add src/adapters/ -git commit -m "feat: define adapter interfaces for issue tracker, VCS, and messaging" -``` - ---- - -### Task 5: Jira Adapter - -**Files:** -- Create: `src/adapters/issue-tracker/jira.ts` -- Create: `src/adapters/issue-tracker/jira.test.ts` - -- [ ] **Step 1: Write failing tests** - -```ts -// src/adapters/issue-tracker/jira.test.ts -import { describe, it, expect, vi, beforeEach } from "vitest"; -import { JiraAdapter } from "./jira.js"; - -const mockFetch = vi.fn(); -global.fetch = mockFetch; - -function jiraAdapter() { - return new JiraAdapter({ - baseUrl: "https://test.atlassian.net", - email: "test@example.com", - apiToken: "token", - projectKey: "PROJ", - }); -} - -describe("JiraAdapter", () => { - beforeEach(() => { - mockFetch.mockReset(); - }); - - 
describe("fetchTicket", () => { - it("returns normalized ticket content", async () => { - mockFetch.mockResolvedValueOnce({ - ok: true, - json: async () => ({ - id: "10001", - key: "PROJ-1", - fields: { - summary: "Add login page", - description: { content: [{ content: [{ text: "Build a login page" }] }] }, - comment: { - comments: [ - { author: { displayName: "Alice" }, body: { content: [{ content: [{ text: "Use OAuth" }] }] }, created: "2026-03-20T10:00:00Z" }, - ], - }, - labels: ["frontend"], - status: { name: "AI" }, - }, - }), - }); - - const adapter = jiraAdapter(); - const ticket = await adapter.fetchTicket("10001"); - - expect(ticket.id).toBe("10001"); - expect(ticket.identifier).toBe("PROJ-1"); - expect(ticket.title).toBe("Add login page"); - expect(ticket.comments).toHaveLength(1); - expect(ticket.trackerStatus).toBe("AI"); - }); - }); - - describe("searchTickets", () => { - it("returns ticket keys matching JQL", async () => { - mockFetch.mockResolvedValueOnce({ - ok: true, - json: async () => ({ - issues: [{ key: "PROJ-1" }, { key: "PROJ-2" }], - }), - }); - - const adapter = jiraAdapter(); - const keys = await adapter.searchTickets('project = PROJ AND status = "AI"'); - expect(keys).toEqual(["PROJ-1", "PROJ-2"]); - }); - }); - - describe("moveTicket", () => { - it("fetches transitions then posts the matching one", async () => { - mockFetch - .mockResolvedValueOnce({ - ok: true, - json: async () => ({ - transitions: [ - { id: "31", name: "AI Review" }, - { id: "41", name: "Backlog" }, - ], - }), - }) - .mockResolvedValueOnce({ ok: true, json: async () => ({}) }); - - const adapter = jiraAdapter(); - await adapter.moveTicket("10001", "AI Review"); - - expect(mockFetch).toHaveBeenCalledTimes(2); - const transitionCall = mockFetch.mock.calls[1]; - expect(JSON.parse(transitionCall[1].body)).toEqual({ - transition: { id: "31" }, - }); - }); - }); - - describe("postComment", () => { - it("posts ADF-formatted comment", async () => { - 
mockFetch.mockResolvedValueOnce({ ok: true, json: async () => ({}) }); - - const adapter = jiraAdapter(); - await adapter.postComment("10001", "Need more details"); - - const call = mockFetch.mock.calls[0]; - const body = JSON.parse(call[1].body); - expect(body.body.type).toBe("doc"); - }); - }); -}); -``` - -- [ ] **Step 2: Run tests to verify they fail** - -Run: `pnpm test src/adapters/issue-tracker/jira.test.ts` -Expected: FAIL — `JiraAdapter` not found. - -- [ ] **Step 3: Implement Jira adapter** - -```ts -// src/adapters/issue-tracker/jira.ts -import type { IssueTrackerAdapter, TicketContent, TicketComment } from "./types.js"; - -export interface JiraConfig { - baseUrl: string; - email: string; - apiToken: string; - projectKey: string; -} - -export class JiraAdapter implements IssueTrackerAdapter { - private baseUrl: string; - private authHeader: string; - - constructor(private config: JiraConfig) { - this.baseUrl = config.baseUrl.replace(/\/$/, ""); - this.authHeader = - "Basic " + - Buffer.from(`${config.email}:${config.apiToken}`).toString("base64"); - } - - private async request(path: string, options?: RequestInit) { - const res = await fetch(`${this.baseUrl}${path}`, { - ...options, - headers: { - Authorization: this.authHeader, - "Content-Type": "application/json", - ...options?.headers, - }, - }); - if (!res.ok) { - throw new Error(`Jira API error: ${res.status} ${res.statusText} on ${path}`); - } - return res.json(); - } - - async fetchTicket(id: string): Promise { - const data = await this.request( - `/rest/api/3/issue/${id}?fields=summary,description,comment,labels,status`, - ); - return { - id: data.id, - identifier: data.key, - title: data.fields.summary ?? "", - description: extractAdfText(data.fields.description), - acceptanceCriteria: extractAcceptanceCriteria(data.fields.description), - comments: (data.fields.comment?.comments ?? []).map( - (c: any): TicketComment => ({ - author: c.author?.displayName ?? 
"unknown", - body: extractAdfText(c.body), - createdAt: c.created, - }), - ), - labels: data.fields.labels ?? [], - trackerStatus: data.fields.status?.name ?? "", - }; - } - - async moveTicket(id: string, column: string): Promise { - const data = await this.request(`/rest/api/3/issue/${id}/transitions`); - const transition = data.transitions.find( - (t: any) => t.name.toLowerCase() === column.toLowerCase(), - ); - if (!transition) { - throw new Error( - `No transition to "${column}" found for issue ${id}. Available: ${data.transitions.map((t: any) => t.name).join(", ")}`, - ); - } - await this.request(`/rest/api/3/issue/${id}/transitions`, { - method: "POST", - body: JSON.stringify({ transition: { id: transition.id } }), - }); - } - - async postComment(id: string, comment: string): Promise { - await this.request(`/rest/api/3/issue/${id}/comment`, { - method: "POST", - body: JSON.stringify({ - body: { - type: "doc", - version: 1, - content: [ - { - type: "paragraph", - content: [{ type: "text", text: comment }], - }, - ], - }, - }), - }); - } - - async searchTickets(jql: string): Promise { - const data = await this.request( - `/rest/api/3/search/jql?jql=${encodeURIComponent(jql)}&fields=key&maxResults=50`, - ); - return (data.issues ?? []).map((issue: any) => issue.key); - } -} - -function extractAdfText(adf: any): string { - if (!adf) return ""; - if (typeof adf === "string") return adf; - if (adf.text) return adf.text; - if (adf.content) { - return adf.content.map(extractAdfText).join("\n"); - } - return ""; -} - -function extractAcceptanceCriteria(description: any): string { - const text = extractAdfText(description); - const match = text.match(/acceptance criteria[:\s]*([\s\S]*?)(?:\n\n|\n#|$)/i); - return match?.[1]?.trim() ?? ""; -} -``` - -- [ ] **Step 4: Run tests to verify they pass** - -Run: `pnpm test src/adapters/issue-tracker/jira.test.ts` -Expected: PASS. 
- -- [ ] **Step 5: Commit** - -```bash -git add src/adapters/issue-tracker/jira.ts src/adapters/issue-tracker/jira.test.ts -git commit -m "feat: implement Jira adapter with fetch, move, comment, search" -``` - ---- - -### Task 6: GitHub Adapter - -**Files:** -- Create: `src/adapters/vcs/github.ts` -- Create: `src/adapters/vcs/github.test.ts` - -- [ ] **Step 1: Write failing tests** - -```ts -// src/adapters/vcs/github.test.ts -import { describe, it, expect, vi, beforeEach } from "vitest"; -import { GitHubAdapter } from "./github.js"; - -const mockOctokit = { - git: { - getRef: vi.fn(), - createRef: vi.fn(), - }, - repos: { - createOrUpdateFileContents: vi.fn(), - }, - pulls: { - create: vi.fn(), - list: vi.fn(), - get: vi.fn(), - }, - issues: { - listComments: vi.fn(), - }, -}; - -vi.mock("@octokit/rest", () => ({ - Octokit: vi.fn(() => mockOctokit), -})); - -function ghAdapter() { - return new GitHubAdapter({ - token: "ghp_test", - owner: "test-org", - repo: "test-repo", - baseBranch: "main", - }); -} - -describe("GitHubAdapter", () => { - beforeEach(() => { - vi.clearAllMocks(); - }); - - describe("createBranch", () => { - it("creates branch from base ref", async () => { - mockOctokit.git.getRef.mockResolvedValueOnce({ - data: { object: { sha: "abc123" } }, - }); - mockOctokit.git.createRef.mockResolvedValueOnce({ data: {} }); - - const adapter = ghAdapter(); - await adapter.createBranch("feat/test", "main"); - - expect(mockOctokit.git.createRef).toHaveBeenCalledWith({ - owner: "test-org", - repo: "test-repo", - ref: "refs/heads/feat/test", - sha: "abc123", - }); - }); - - it("seeds empty repo on 409 then creates branch", async () => { - const error = new Error("Git Repository is empty") as any; - error.status = 409; - mockOctokit.git.getRef.mockRejectedValueOnce(error); - mockOctokit.repos.createOrUpdateFileContents.mockResolvedValueOnce({ - data: { commit: { sha: "seed123" } }, - }); - mockOctokit.git.createRef.mockResolvedValueOnce({ data: {} }); - - const 
adapter = ghAdapter(); - await adapter.createBranch("feat/test", "main"); - - expect(mockOctokit.repos.createOrUpdateFileContents).toHaveBeenCalled(); - expect(mockOctokit.git.createRef).toHaveBeenCalledWith( - expect.objectContaining({ sha: "seed123" }), - ); - }); - }); - - describe("createPR", () => { - it("creates pull request", async () => { - mockOctokit.pulls.create.mockResolvedValueOnce({ - data: { number: 42, html_url: "https://github.com/test-org/test-repo/pull/42" }, - }); - - const adapter = ghAdapter(); - const pr = await adapter.createPR("feat/test", "Add feature", "Description"); - - expect(pr.id).toBe(42); - expect(pr.url).toContain("/pull/42"); - }); - }); - - describe("findPR", () => { - it("returns null when no PR exists", async () => { - mockOctokit.pulls.list.mockResolvedValueOnce({ data: [] }); - - const adapter = ghAdapter(); - const pr = await adapter.findPR("feat/test"); - expect(pr).toBeNull(); - }); - - it("returns PR when one exists", async () => { - mockOctokit.pulls.list.mockResolvedValueOnce({ - data: [{ number: 42, html_url: "https://github.com/test-org/test-repo/pull/42", head: { ref: "feat/test" } }], - }); - - const adapter = ghAdapter(); - const pr = await adapter.findPR("feat/test"); - expect(pr).not.toBeNull(); - expect(pr!.id).toBe(42); - }); - }); -}); -``` - -- [ ] **Step 2: Run tests to verify they fail** - -Run: `pnpm test src/adapters/vcs/github.test.ts` -Expected: FAIL — `GitHubAdapter` not found. 
- -- [ ] **Step 3: Implement GitHub adapter** - -```ts -// src/adapters/vcs/github.ts -import { Octokit } from "@octokit/rest"; -import type { VCSAdapter, PullRequest, PRComment } from "./types.js"; - -export interface GitHubConfig { - token: string; - owner: string; - repo: string; - baseBranch: string; -} - -export class GitHubAdapter implements VCSAdapter { - private octokit: Octokit; - - constructor(private config: GitHubConfig) { - this.octokit = new Octokit({ auth: config.token }); - } - - private get ownerRepo() { - return { owner: this.config.owner, repo: this.config.repo }; - } - - async createBranch(name: string, base: string): Promise { - let baseSha: string; - try { - const ref = await this.octokit.git.getRef({ - ...this.ownerRepo, - ref: `heads/${base}`, - }); - baseSha = ref.data.object.sha; - } catch (err: any) { - if (err.status === 409) { - baseSha = await this.seedEmptyRepo(); - } else { - throw err; - } - } - await this.octokit.git.createRef({ - ...this.ownerRepo, - ref: `refs/heads/${name}`, - sha: baseSha, - }); - } - - private async seedEmptyRepo(): Promise { - try { - const result = await this.octokit.repos.createOrUpdateFileContents({ - ...this.ownerRepo, - path: "README.md", - message: "Initial commit", - content: Buffer.from("# Repository\n").toString("base64"), - }); - return result.data.commit.sha!; - } catch (err: any) { - throw new Error( - `Failed to seed empty repository ${this.config.owner}/${this.config.repo}: ${err.message}`, - ); - } - } - - async createPR( - branch: string, - title: string, - body: string, - ): Promise { - const { data } = await this.octokit.pulls.create({ - ...this.ownerRepo, - head: branch, - base: this.config.baseBranch, - title, - body, - }); - return { id: data.number, url: data.html_url, branch }; - } - - async push(branch: string, files: Array<{ path: string; content: string }>): Promise { - // Get the latest commit SHA on the branch - const { data: refData } = await this.octokit.git.getRef({ - 
...this.ownerRepo, - ref: `heads/${branch}`, - }); - const latestCommitSha = refData.object.sha; - - // Get the tree of the latest commit - const { data: commitData } = await this.octokit.git.getCommit({ - ...this.ownerRepo, - commit_sha: latestCommitSha, - }); - - // Create blobs for each file - const treeItems = await Promise.all( - files.map(async (file) => { - const { data: blob } = await this.octokit.git.createBlob({ - ...this.ownerRepo, - content: Buffer.from(file.content).toString("base64"), - encoding: "base64", - }); - return { - path: file.path, - mode: "100644" as const, - type: "blob" as const, - sha: blob.sha, - }; - }), - ); - - // Create new tree - const { data: tree } = await this.octokit.git.createTree({ - ...this.ownerRepo, - base_tree: commitData.tree.sha, - tree: treeItems, - }); - - // Create commit - const { data: newCommit } = await this.octokit.git.createCommit({ - ...this.ownerRepo, - message: "feat: agent implementation", - tree: tree.sha, - parents: [latestCommitSha], - }); - - // Update branch ref - await this.octokit.git.updateRef({ - ...this.ownerRepo, - ref: `heads/${branch}`, - sha: newCommit.sha, - }); - } - - async getPRComments(prId: number): Promise { - const { data: reviewComments } = - await this.octokit.pulls.listReviewComments({ - ...this.ownerRepo, - pull_number: prId, - }); - const { data: issueComments } = await this.octokit.issues.listComments({ - ...this.ownerRepo, - issue_number: prId, - }); - const comments: PRComment[] = [ - ...reviewComments.map((c) => ({ - author: c.user?.login ?? "unknown", - body: c.body ?? "", - liked: (c.reactions?.total_count ?? 0) > 0, - })), - ...issueComments.map((c) => ({ - author: c.user?.login ?? "unknown", - body: c.body ?? "", - liked: (c.reactions?.total_count ?? 
0) > 0, - })), - ]; - return comments; - } - - async getPRConflictStatus(prId: number): Promise { - const { data } = await this.octokit.pulls.get({ - ...this.ownerRepo, - pull_number: prId, - }); - return data.mergeable === false; - } - - async findPR(branch: string): Promise { - const { data } = await this.octokit.pulls.list({ - ...this.ownerRepo, - head: `${this.config.owner}:${branch}`, - state: "open", - }); - if (data.length === 0) return null; - const pr = data[0]; - return { id: pr.number, url: pr.html_url, branch: pr.head.ref }; - } -} -``` - -- [ ] **Step 4: Run tests to verify they pass** - -Run: `pnpm test src/adapters/vcs/github.test.ts` -Expected: PASS. - -- [ ] **Step 5: Commit** - -```bash -git add src/adapters/vcs/github.ts src/adapters/vcs/github.test.ts -git commit -m "feat: implement GitHub VCS adapter with branch, PR, and empty repo handling" -``` - ---- - -### Task 7: Chat SDK Messaging Adapter - -**Files:** -- Create: `src/adapters/messaging/chatsdk.ts` -- Create: `src/adapters/messaging/chatsdk.test.ts` - -- [ ] **Step 1: Write failing test** - -```ts -// src/adapters/messaging/chatsdk.test.ts -import { describe, it, expect, vi, beforeEach } from "vitest"; -import { ChatSDKAdapter } from "./chatsdk.js"; - -const mockPost = vi.fn(); -const mockChannel = vi.fn(() => ({ post: mockPost })); - -vi.mock("chat", () => ({ - Chat: vi.fn(() => ({ channel: mockChannel })), -})); - -vi.mock("@chat-adapter/slack", () => ({ - createSlackAdapter: vi.fn(() => ({})), -})); - -describe("ChatSDKAdapter", () => { - beforeEach(() => { - vi.clearAllMocks(); - mockPost.mockResolvedValue({ id: "msg-1" }); - }); - - it("sends notification to configured channel", async () => { - const adapter = new ChatSDKAdapter({ - slackToken: "xoxb-test", - channelId: "C123", - botName: "blazebot", - }); - - await adapter.notify("PR ready for review"); - - expect(mockChannel).toHaveBeenCalledWith("slack:C123"); - expect(mockPost).toHaveBeenCalledWith("PR ready for review"); - }); - 
- it("does not throw on notification failure", async () => { - mockPost.mockRejectedValueOnce(new Error("Slack API down")); - - const adapter = new ChatSDKAdapter({ - slackToken: "xoxb-test", - channelId: "C123", - botName: "blazebot", - }); - - // notify() swallows the error, so the promise resolves with no value - await expect(adapter.notify("test")).resolves.toBeUndefined(); - }); -}); -``` - -- [ ] **Step 2: Run tests to verify they fail** - -Run: `pnpm test src/adapters/messaging/chatsdk.test.ts` -Expected: FAIL — `ChatSDKAdapter` not found. - -- [ ] **Step 3: Implement Chat SDK adapter** - -```ts -// src/adapters/messaging/chatsdk.ts -import { Chat } from "chat"; -import { createSlackAdapter } from "@chat-adapter/slack"; -import { logger } from "../../lib/logger.js"; -import type { MessagingAdapter } from "./types.js"; - -export interface ChatSDKConfig { - slackToken: string; - channelId: string; - botName: string; -} - -export class ChatSDKAdapter implements MessagingAdapter { - private chat: InstanceType<typeof Chat>; - private channelId: string; - - constructor(config: ChatSDKConfig) { - this.channelId = config.channelId; - this.chat = new Chat({ - userName: config.botName, - adapters: { - slack: createSlackAdapter({ token: config.slackToken }), - }, - }); - } - - async notify(message: string): Promise<void> { - try { - const channel = this.chat.channel(`slack:${this.channelId}`); - await channel.post(message); - } catch (err) { - logger.warn( - { error: (err as Error).message }, - "notification_failed", - ); - } - } -} -``` - -- [ ] **Step 4: Run tests to verify they pass** - -Run: `pnpm test src/adapters/messaging/chatsdk.test.ts` -Expected: PASS.
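`ChatSDKAdapter.notify` logs and swallows delivery errors, so a Slack outage never fails a workflow step. The same best-effort contract can be expressed as a small wrapper around any sender. This is an illustrative sketch only: `bestEffort` and `Notifier` are hypothetical names, not part of the Chat SDK or this repository.

```typescript
// Hypothetical helper illustrating the best-effort notification contract.
type Notifier = (message: string) => Promise<void>;

// Wraps a sender so failures are reported through onError instead of
// propagating to the caller (mirrors the try/catch in ChatSDKAdapter.notify).
function bestEffort(
  send: Notifier,
  onError: (err: Error, message: string) => void,
): Notifier {
  return async (message) => {
    try {
      await send(message);
    } catch (err) {
      onError(err instanceof Error ? err : new Error(String(err)), message);
    }
  };
}

// Usage: a sender that always fails, as in the "Slack API down" test case.
const failures: string[] = [];
const notify = bestEffort(
  async () => {
    throw new Error("Slack API down");
  },
  (err, message) => {
    failures.push(`${message}: ${err.message}`);
  },
);
```

Keeping the error handling in one wrapper means each concrete adapter only has to implement the raw send.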
- -- [ ] **Step 5: Commit** - -```bash -git add src/adapters/messaging/chatsdk.ts src/adapters/messaging/chatsdk.test.ts -git commit -m "feat: implement Chat SDK messaging adapter for Slack/Teams" -``` - ---- - -### Task 8: Adapter Factory - -**Files:** -- Create: `src/lib/adapters.ts` - -- [ ] **Step 1: Create adapter factory** - -```ts -// src/lib/adapters.ts -import { parseEnv } from "./env.js"; -import { JiraAdapter } from "../adapters/issue-tracker/jira.js"; -import { GitHubAdapter } from "../adapters/vcs/github.js"; -import { ChatSDKAdapter } from "../adapters/messaging/chatsdk.js"; -import type { IssueTrackerAdapter } from "../adapters/issue-tracker/types.js"; -import type { VCSAdapter } from "../adapters/vcs/types.js"; -import type { MessagingAdapter } from "../adapters/messaging/types.js"; - -export interface Adapters { - issueTracker: IssueTrackerAdapter; - vcs: VCSAdapter; - messaging: MessagingAdapter; -} - -export function createAdapters(): Adapters { - const env = parseEnv(); - - return { - issueTracker: new JiraAdapter({ - baseUrl: env.JIRA_BASE_URL, - email: env.JIRA_EMAIL, - apiToken: env.JIRA_API_TOKEN, - projectKey: env.JIRA_PROJECT_KEY, - }), - vcs: new GitHubAdapter({ - token: env.GITHUB_TOKEN, - owner: env.GITHUB_OWNER, - repo: env.GITHUB_REPO, - baseBranch: env.GITHUB_BASE_BRANCH, - }), - messaging: new ChatSDKAdapter({ - slackToken: env.CHAT_SDK_SLACK_TOKEN, - channelId: env.CHAT_SDK_CHANNEL_ID, - botName: env.CHAT_SDK_BOT_NAME, - }), - }; -} -``` - -- [ ] **Step 2: Commit** - -```bash -git add src/lib/adapters.ts -git commit -m "feat: add adapter factory to instantiate adapters from env config" -``` - ---- - -### Task 9: Context Assembly - -**Files:** -- Create: `src/sandbox/context.ts` -- Create: `src/sandbox/context.test.ts` - -- [ ] **Step 1: Write failing tests** - -```ts -// src/sandbox/context.test.ts -import { describe, it, expect } from "vitest"; -import { assembleImplementationContext, assembleFixingFeedbackContext } from 
"./context.js"; - -describe("assembleImplementationContext", () => { - it("assembles requirements.md for implementation", () => { - const result = assembleImplementationContext({ - ticket: { - title: "Add login page", - description: "Build a login page with OAuth", - acceptanceCriteria: "- User can log in\n- User can log out", - comments: [ - { author: "Alice", body: "Use OAuth2", createdAt: "2026-03-20T10:00:00Z" }, - ], - }, - prompt: "You are an implementation agent...", - }); - - expect(result).toContain("# Requirements"); - expect(result).toContain("Add login page"); - expect(result).toContain("Build a login page with OAuth"); - expect(result).toContain("User can log in"); - expect(result).toContain("Alice: Use OAuth2"); - expect(result).toContain("You are an implementation agent..."); - }); -}); - -describe("assembleFixingFeedbackContext", () => { - it("assembles requirements.md for fixing feedback", () => { - const result = assembleFixingFeedbackContext({ - ticket: { - title: "Add login page", - description: "Build a login page", - acceptanceCriteria: "", - comments: [], - }, - prompt: "You are a review-fix agent...", - prComments: [ - { author: "Bob", body: "Fix the typo on line 5", liked: true }, - ], - hasConflicts: true, - }); - - expect(result).toContain("# Requirements"); - expect(result).toContain("## PR Review Feedback"); - expect(result).toContain("Fix the typo on line 5"); - expect(result).toContain("## Merge Conflicts"); - expect(result).toContain("You are a review-fix agent..."); - }); -}); -``` - -- [ ] **Step 2: Run tests to verify they fail** - -Run: `pnpm test src/sandbox/context.test.ts` -Expected: FAIL — module not found. 
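Both assembly functions in the next step emit the same `## Heading` plus body layout and differ only in which sections they include. One way to keep the shared ticket sections from drifting apart is to render an ordered list of sections. `Section`, `renderSections`, and `ticketSections` below are hypothetical names for this sketch, not existing code; the output layout is copied from the templates in the next step.

```typescript
// Hypothetical helper: renders requirements.md from ordered (heading, body) pairs.
interface Section {
  heading: string;
  body: string;
}

function renderSections(sections: Section[], prompt: string): string {
  const parts = sections.map((s) => `## ${s.heading}\n\n${s.body}`);
  // Same layout as the template strings: "# Requirements", the sections,
  // a horizontal rule, then the prompt text.
  return `# Requirements\n\n${parts.join("\n\n")}\n\n---\n\n${prompt}\n`;
}

// The ticket sections shared by the implementation and review-fix contexts.
function ticketSections(ticket: {
  title: string;
  description: string;
  acceptanceCriteria: string;
}): Section[] {
  return [
    { heading: "Ticket", body: ticket.title },
    { heading: "Description", body: ticket.description },
    {
      heading: "Acceptance Criteria",
      body: ticket.acceptanceCriteria || "None specified.",
    },
  ];
}

// Usage: review-fix context = shared ticket sections + feedback-specific ones.
const md = renderSections(
  [
    ...ticketSections({
      title: "Add login page",
      description: "Build a login page",
      acceptanceCriteria: "",
    }),
    { heading: "Merge Conflicts", body: "No merge conflicts." },
  ],
  "You are a review-fix agent...",
);
```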
- -- [ ] **Step 3: Implement context assembly** - -```ts -// src/sandbox/context.ts -import type { PRComment } from "../adapters/vcs/types.js"; - -interface TicketData { - title: string; - description: string; - acceptanceCriteria: string; - comments: Array<{ author: string; body: string; createdAt: string }>; -} - -export interface ImplementationContextInput { - ticket: TicketData; - prompt: string; -} - -export interface FixingFeedbackContextInput { - ticket: TicketData; - prompt: string; - prComments: PRComment[]; - hasConflicts: boolean; -} - -export function assembleImplementationContext( - input: ImplementationContextInput, -): string { - const { ticket, prompt } = input; - - return `# Requirements - -## Ticket - -${ticket.title} - -## Description - -${ticket.description} - -## Acceptance Criteria - -${ticket.acceptanceCriteria || "None specified."} - -## Comments - -${formatComments(ticket.comments)} - ---- - -${prompt} -`; -} - -export function assembleFixingFeedbackContext( - input: FixingFeedbackContextInput, -): string { - const { ticket, prompt, prComments, hasConflicts } = input; - - return `# Requirements - -## Ticket - -${ticket.title} - -## Description - -${ticket.description} - -## Acceptance Criteria - -${ticket.acceptanceCriteria || "None specified."} - -## Comments - -${formatComments(ticket.comments)} - -## PR Review Feedback - -${formatPRComments(prComments)} - -## Merge Conflicts - -${hasConflicts ? "This PR has merge conflicts that must be resolved." : "No merge conflicts."} - ---- - -${prompt} -`; -} - -function formatComments( - comments: Array<{ author: string; body: string; createdAt: string }>, -): string { - if (comments.length === 0) return "No comments."; - return comments - .map((c) => `${c.author}: ${c.body}`) - .join("\n\n"); -} - -function formatPRComments(comments: PRComment[]): string { - if (comments.length === 0) return "No review feedback."; - return comments - .map((c) => `${c.author}${c.liked ? 
" (liked)" : ""}: ${c.body}`) - .join("\n\n"); -} -``` - -- [ ] **Step 4: Run tests to verify they pass** - -Run: `pnpm test src/sandbox/context.test.ts` -Expected: PASS. - -- [ ] **Step 5: Commit** - -```bash -git add src/sandbox/context.ts src/sandbox/context.test.ts -git commit -m "feat: implement context assembly for requirements.md generation" -``` - ---- - -### Task 10: Agent Runner - -**Files:** -- Create: `src/sandbox/agent-runner.ts` -- Create: `src/sandbox/agent-runner.test.ts` - -- [ ] **Step 1: Write failing tests** - -```ts -// src/sandbox/agent-runner.test.ts -import { describe, it, expect, vi } from "vitest"; -import { - parseAgentOutput, - AGENT_SCHEMA, - type AgentOutput, -} from "./agent-runner.js"; - -describe("parseAgentOutput", () => { - it("parses implemented result", () => { - const raw = JSON.stringify({ - result: "implemented", - summary: "Added login page with OAuth", - }); - const output = parseAgentOutput(raw); - expect(output.result).toBe("implemented"); - expect(output.summary).toBe("Added login page with OAuth"); - }); - - it("parses clarification_needed result", () => { - const raw = JSON.stringify({ - result: "clarification_needed", - questions: ["What OAuth provider?", "Should we support SSO?"], - }); - const output = parseAgentOutput(raw); - expect(output.result).toBe("clarification_needed"); - expect(output.questions).toHaveLength(2); - }); - - it("parses failed result", () => { - const raw = JSON.stringify({ - result: "failed", - error: "Tests do not pass", - }); - const output = parseAgentOutput(raw); - expect(output.result).toBe("failed"); - expect(output.error).toBe("Tests do not pass"); - }); - - it("throws on invalid JSON", () => { - expect(() => parseAgentOutput("not json")).toThrow(); - }); - - it("throws on missing result field", () => { - expect(() => parseAgentOutput(JSON.stringify({ summary: "oops" }))).toThrow(); - }); -}); - -describe("AGENT_SCHEMA", () => { - it("is valid JSON", () => { - expect(() => 
JSON.parse(AGENT_SCHEMA)).not.toThrow(); - }); -}); -``` - -- [ ] **Step 2: Run tests to verify they fail** - -Run: `pnpm test src/sandbox/agent-runner.test.ts` -Expected: FAIL — module not found. - -- [ ] **Step 3: Implement agent runner** - -```ts -// src/sandbox/agent-runner.ts -import { z } from "zod"; - -const agentOutputSchema = z.object({ - result: z.enum(["implemented", "clarification_needed", "failed"]), - summary: z.string().optional(), - questions: z.array(z.string()).optional(), - error: z.string().optional(), -}); - -export type AgentOutput = z.infer<typeof agentOutputSchema>; - -export const AGENT_SCHEMA = JSON.stringify({ - type: "object", - properties: { - result: { - type: "string", - enum: ["implemented", "clarification_needed", "failed"], - }, - summary: { type: "string" }, - questions: { type: "array", items: { type: "string" } }, - error: { type: "string" }, - }, - required: ["result"], -}); - -export function parseAgentOutput(raw: string): AgentOutput { - const parsed = JSON.parse(raw); - return agentOutputSchema.parse(parsed); -} - -export function buildAgentCommand(model: string): { - cmd: string; - args: string[]; -} { - return { - cmd: "bash", - args: [ - "-c", - `cat /vercel/sandbox/requirements.md | claude --print --output-format json --json-schema '${AGENT_SCHEMA}' --model "${model}" --dangerously-skip-permissions`, - ], - }; -} -``` - -- [ ] **Step 4: Run tests to verify they pass** - -Run: `pnpm test src/sandbox/agent-runner.test.ts` -Expected: PASS.
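`buildAgentCommand` wraps `AGENT_SCHEMA` in single quotes inside a `bash -c` string. That is safe only while the schema contains no `'` character. The current schema has none, but a future `description` field could. A minimal POSIX-shell quoting helper, sketched below, closes that gap; `shellQuote` is a hypothetical name, and the same helper could replace the double quotes around `model`.

```typescript
// Hypothetical helper: quote a string for safe interpolation into `bash -c`.
// Inside single quotes nothing is special except ' itself, which is emitted
// as '\'' (close quote, escaped literal quote, reopen quote).
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

// Usage: a schema string that contains an apostrophe.
const schema = JSON.stringify({ description: "the agent's result" });
const command = `claude --json-schema ${shellQuote(schema)}`;
```

Because the quoted value is passed to the shell as a single word, the schema reaches the CLI unchanged whatever characters it contains.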
- -- [ ] **Step 5: Commit** - -```bash -git add src/sandbox/agent-runner.ts src/sandbox/agent-runner.test.ts -git commit -m "feat: implement agent runner with structured output parsing and schema" -``` - ---- - -### Task 11: Sandbox Manager - -**Files:** -- Create: `src/sandbox/manager.ts` -- Create: `src/sandbox/manager.test.ts` - -- [ ] **Step 1: Write failing tests** - -```ts -// src/sandbox/manager.test.ts -import { describe, it, expect, vi, beforeEach } from "vitest"; - -const mockRunCommand = vi.fn(); -const mockWriteFiles = vi.fn(); -const mockStop = vi.fn(); -const mockStdout = vi.fn(); -const mockReadFileToBuffer = vi.fn(); - -vi.mock("@vercel/sandbox", () => ({ - Sandbox: { - create: vi.fn(() => ({ - sandboxId: "sbx-test-123", - runCommand: mockRunCommand, - writeFiles: mockWriteFiles, - readFileToBuffer: mockReadFileToBuffer, - stop: mockStop, - })), - }, -})); - -import { SandboxManager } from "./manager.js"; - -describe("SandboxManager", () => { - beforeEach(() => { - vi.clearAllMocks(); - mockRunCommand.mockResolvedValue({ - exitCode: 0, - stdout: mockStdout, - }); - mockStdout.mockResolvedValue(""); - mockWriteFiles.mockResolvedValue(undefined); - mockStop.mockResolvedValue(undefined); - }); - - it("provisions sandbox with git source and env vars", async () => { - const { Sandbox } = await import("@vercel/sandbox"); - - const manager = new SandboxManager({ - githubToken: "ghp_test", - owner: "test-org", - repo: "test-repo", - anthropicApiKey: "sk-ant-test", - claudeModel: "claude-opus-4-6", - commitAuthor: "ai-workflow-blazity", - commitEmail: "bot@blazity.com", - jobTimeoutMs: 1_800_000, - }); - - const sandbox = await manager.provision("feat/test-branch", "# Requirements\n..."); - - expect(Sandbox.create).toHaveBeenCalledWith( - expect.objectContaining({ - source: expect.objectContaining({ - type: "git", - revision: "feat/test-branch", - }), - env: expect.objectContaining({ - ANTHROPIC_API_KEY: "sk-ant-test", - }), - }), - ); - 
expect(mockWriteFiles).toHaveBeenCalled(); - expect(sandbox.sandboxId).toBe("sbx-test-123"); - }); - - it("runs end hook and detects clean state", async () => { - mockStdout.mockResolvedValueOnce(""); // git status --porcelain returns empty - - const manager = new SandboxManager({ - githubToken: "ghp_test", - owner: "test-org", - repo: "test-repo", - anthropicApiKey: "sk-ant-test", - claudeModel: "claude-opus-4-6", - commitAuthor: "ai-workflow-blazity", - commitEmail: "bot@blazity.com", - jobTimeoutMs: 1_800_000, - }); - - const sandbox = await manager.provision("feat/test", "# Req"); - const result = await manager.runEndHook(sandbox); - - expect(result).toBe("clean"); - }); - - it("commits uncommitted changes in end hook", async () => { - const manager = new SandboxManager({ - githubToken: "ghp_test", - owner: "test-org", - repo: "test-repo", - anthropicApiKey: "sk-ant-test", - claudeModel: "claude-opus-4-6", - commitAuthor: "ai-workflow-blazity", - commitEmail: "bot@blazity.com", - jobTimeoutMs: 1_800_000, - }); - - const sandbox = await manager.provision("feat/test", "# Req"); - - // Queue git output only after provision(): its own runCommand calls - // (git config, npm install) would otherwise consume one-shot responses - mockStdout.mockResolvedValueOnce(" M src/index.ts"); // git status --porcelain - const result = await manager.runEndHook(sandbox); - - expect(result).toBe("committed"); - expect(mockRunCommand).toHaveBeenCalledWith("git", ["add", "-A"]); - }); -}); -``` - -- [ ] **Step 2: Run tests to verify they fail** - -Run: `pnpm test src/sandbox/manager.test.ts` -Expected: FAIL — `SandboxManager` not found.
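The `mockResolvedValueOnce` chains in these tests are consumed in call order, so every `runCommand` call that `provision()` makes (git identity, the Claude Code install) shifts which response each later call receives. A fake keyed by the command line avoids that coupling. The sketch below assumes only the `runCommand(cmd, args)` and `stdout()` shape that `SandboxManager` uses; `FakeSandbox` and `endHook` are hypothetical, and `endHook` merely restates the `runEndHook` decision logic for illustration.

```typescript
// Hypothetical test double: responses are looked up by the full command line,
// so the order of unrelated calls (e.g. during provisioning) does not matter.
interface CommandResult {
  exitCode: number;
  stdout: () => Promise<string>;
}

class FakeSandbox {
  readonly sandboxId = "sbx-fake";
  readonly calls: string[] = [];

  constructor(private readonly responses: Record<string, string> = {}) {}

  async runCommand(cmd: string, args: string[] = []): Promise<CommandResult> {
    const key = [cmd, ...args].join(" ");
    this.calls.push(key);
    const out = this.responses[key] ?? "";
    return { exitCode: 0, stdout: async () => out };
  }
}

// Restates runEndHook's decision logic against the fake (illustration only).
async function endHook(sandbox: FakeSandbox): Promise<"clean" | "committed"> {
  const status = await sandbox.runCommand("git", ["status", "--porcelain"]);
  if (!(await status.stdout()).trim()) return "clean";
  await sandbox.runCommand("git", ["add", "-A"]);
  await sandbox.runCommand("git", ["commit", "-m", "wip: auto-commit"]);
  return "committed";
}
```

A provisioning call such as `npm install` made before `endHook` no longer changes what `git status --porcelain` returns.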
- - [ ] **Step 3: Implement sandbox manager** - -```ts -// src/sandbox/manager.ts -import { Sandbox } from "@vercel/sandbox"; -import { logger } from "../lib/logger.js"; - -export interface SandboxConfig { - githubToken: string; - owner: string; - repo: string; - anthropicApiKey: string; - claudeModel: string; - commitAuthor: string; - commitEmail: string; - jobTimeoutMs: number; - vercelToken?: string; - vercelTeamId?: string; - vercelProjectId?: string; -} - -type SandboxInstance = Awaited<ReturnType<typeof Sandbox.create>>; - -export type EndHookResult = "clean" | "committed" | "error"; - -export class SandboxManager { - constructor(private config: SandboxConfig) {} - - async provision( - branch: string, - requirementsMd: string, - ): Promise<SandboxInstance> { - const credentials: Record<string, string> = {}; - if (this.config.vercelToken) credentials.token = this.config.vercelToken; - if (this.config.vercelTeamId) credentials.teamId = this.config.vercelTeamId; - if (this.config.vercelProjectId) credentials.projectId = this.config.vercelProjectId; - - const sandbox = await Sandbox.create({ - ...credentials, - source: { - type: "git", - url: `https://github.com/${this.config.owner}/${this.config.repo}.git`, - username: "x-access-token", - password: this.config.githubToken, - revision: branch, - depth: 1, - }, - runtime: "node24", - timeout: this.config.jobTimeoutMs, - env: { - ANTHROPIC_API_KEY: this.config.anthropicApiKey, - CLAUDE_MODEL: this.config.claudeModel, - }, - }); - - // Configure git identity - await sandbox.runCommand("bash", [ - "-c", - `git config user.name "${this.config.commitAuthor}" && git config user.email "${this.config.commitEmail}"`, - ]); - - // Install Claude Code - await sandbox.runCommand("npm", ["install", "-g", "@anthropic-ai/claude-code"]); - - // Write requirements.md - await sandbox.writeFiles([ - { path: "requirements.md", content: Buffer.from(requirementsMd) }, - ]); - - logger.info( - { sandboxId: sandbox.sandboxId, branch }, - "sandbox_provisioned", - ); - - return sandbox; - } - - async
runEndHook(sandbox: SandboxInstance): Promise<EndHookResult> { - try { - const statusResult = await sandbox.runCommand("git", [ - "status", - "--porcelain", - ]); - const status = (await statusResult.stdout()).trim(); - - if (!status) return "clean"; - - // Uncommitted changes exist — force commit - await sandbox.runCommand("git", ["add", "-A"]); - await sandbox.runCommand("git", [ - "commit", - "-m", - "wip: auto-commit uncommitted changes before sandbox teardown", - ]); - - logger.info( - { sandboxId: sandbox.sandboxId }, - "sandbox_end_hook_committed", - ); - return "committed"; - } catch (err) { - logger.warn( - { sandboxId: sandbox.sandboxId, error: (err as Error).message }, - "sandbox_end_hook_error", - ); - return "error"; - } - } - - async extractChanges( - sandbox: SandboxInstance, - ): Promise<Array<{ path: string; content: string }>> { - // List files changed by the most recent commit only (HEAD~1..HEAD) - const diffResult = await sandbox.runCommand("git", [ - "diff", - "--name-only", - "HEAD~1", - "HEAD", - ]); - const diffOutput = (await diffResult.stdout()).trim(); - if (!diffOutput) return []; - - const filePaths = diffOutput.split("\n").filter(Boolean); - const files: Array<{ path: string; content: string }> = []; - - for (const filePath of filePaths) { - const buf = await sandbox.readFileToBuffer({ - path: filePath, - cwd: "/vercel/sandbox", - }); - if (buf) { - files.push({ path: filePath, content: buf.toString("utf-8") }); - } - } - return files; - } - - async teardown(sandbox: SandboxInstance): Promise<void> { - try { - await sandbox.stop(); - logger.info( - { sandboxId: sandbox.sandboxId }, - "sandbox_torn_down", - ); - } catch (err) { - logger.warn( - { sandboxId: sandbox.sandboxId, error: (err as Error).message }, - "sandbox_teardown_failed", - ); - } - } -} -``` - -- [ ] **Step 4: Run tests to verify they pass** - -Run: `pnpm test src/sandbox/manager.test.ts` -Expected: PASS.
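`extractChanges` lists files with `git diff --name-only HEAD~1 HEAD`, which covers only the most recent commit and skips deleted paths, because there is no file left to read for them. If the agent may make several commits, one option is to record `git rev-parse HEAD` before the agent runs and then diff `<base> HEAD` with `--name-status`, which tags each path with its change type. The parser below is a hypothetical sketch of that approach. It assumes plain paths without tabs or newlines (for arbitrary paths, `-z` output is the safer input), and acting on `deleted` would also need support in the VCS adapter's `push`, which currently takes only file contents.

```typescript
// Hypothetical parser for `git diff --name-status <base> HEAD` output.
// Each line is "<status>\t<path>" or, for renames and copies,
// "<status><score>\t<old path>\t<new path>".
interface ChangeSet {
  changed: string[]; // paths whose current contents must be pushed
  deleted: string[]; // paths that no longer exist at HEAD
}

function parseNameStatus(output: string): ChangeSet {
  const changed: string[] = [];
  const deleted: string[] = [];
  for (const line of output.split("\n")) {
    if (!line.trim()) continue;
    const [status, ...paths] = line.split("\t");
    switch (status[0]) {
      case "D":
        deleted.push(paths[0]);
        break;
      case "R": // rename: the old path disappears, the new path appears
        deleted.push(paths[0]);
        changed.push(paths[1]);
        break;
      default: // A, M, T, and C (for a copy the destination is the last path)
        changed.push(paths[paths.length - 1]);
    }
  }
  return { changed, deleted };
}

// Usage with sample output covering modify, add, delete, and rename.
const sample = "M\tsrc/a.ts\nA\tsrc/b.ts\nD\told.ts\nR100\tx.ts\ty.ts\n";
const changes = parseNameStatus(sample);
```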
- -- [ ] **Step 5: Commit** - -```bash -git add src/sandbox/manager.ts src/sandbox/manager.test.ts -git commit -m "feat: implement sandbox manager with provision, end hook, and teardown" -``` - ---- - -### Task 12: Implementation Workflow - -**Files:** -- Create: `src/workflows/implementation.ts` - -- [ ] **Step 1: Implement the workflow** - -```ts -// src/workflows/implementation.ts -import { logger } from "../lib/logger.js"; -import type { AgentOutput } from "../sandbox/agent-runner.js"; -import type { TicketContent } from "../adapters/issue-tracker/types.js"; - -// --- Step Functions (full Node.js access, auto-retry) --- - -async function fetchAndValidateTicket(ticketId: string, columnAi: string) { - "use step"; - const { createAdapters } = await import("../lib/adapters.js"); - const { issueTracker } = createAdapters(); - const ticket = await issueTracker.fetchTicket(ticketId); - - if (ticket.trackerStatus.toLowerCase() !== columnAi.toLowerCase()) { - return null; // stale — ticket no longer in AI column - } - return ticket; -} - -async function createFeatureBranch(branchName: string, baseBranch: string) { - "use step"; - const { createAdapters } = await import("../lib/adapters.js"); - const { vcs } = createAdapters(); - await vcs.createBranch(branchName, baseBranch); -} - -async function assembleImplementationRequirements(ticket: TicketContent) { - "use step"; - const { assembleImplementationContext } = await import("../sandbox/context.js"); - const { readFile } = await import("fs/promises"); - const prompt = await readFile(".blazebot/prompts/implement.md", "utf-8"); - return assembleImplementationContext({ - ticket: { - title: ticket.title, - description: ticket.description, - acceptanceCriteria: ticket.acceptanceCriteria, - comments: ticket.comments, - }, - prompt, - }); -} - -async function runAgentInSandbox( - branchName: string, - requirementsMd: string, -): Promise<{ output: AgentOutput; files: Array<{ path: string; content: string }> }> { - "use step"; - 
const { parseEnv } = await import("../lib/env.js"); - const { SandboxManager } = await import("../sandbox/manager.js"); - const { buildAgentCommand, parseAgentOutput } = await import( - "../sandbox/agent-runner.js" - ); - - const env = parseEnv(); - const manager = new SandboxManager({ - githubToken: env.GITHUB_TOKEN, - owner: env.GITHUB_OWNER, - repo: env.GITHUB_REPO, - anthropicApiKey: env.ANTHROPIC_API_KEY, - claudeModel: env.CLAUDE_MODEL, - commitAuthor: env.COMMIT_AUTHOR, - commitEmail: env.COMMIT_EMAIL, - jobTimeoutMs: env.JOB_TIMEOUT_MS, - vercelToken: env.VERCEL_TOKEN, - vercelTeamId: env.VERCEL_TEAM_ID, - vercelProjectId: env.VERCEL_PROJECT_ID, - }); - - const sandbox = await manager.provision(branchName, requirementsMd); - - try { - const { cmd, args } = buildAgentCommand(env.CLAUDE_MODEL); - const result = await sandbox.runCommand({ cmd, args, cwd: "/vercel/sandbox" }); - const stdout = await result.stdout(); - - // Run end hook — force commit/discard uncommitted changes - await manager.runEndHook(sandbox); - - // Extract changed files from sandbox (push happens outside, via VCS adapter) - const files = await manager.extractChanges(sandbox); - - const output = parseAgentOutput(stdout); - return { output, files }; - } catch (err) { - await manager.runEndHook(sandbox).catch(() => {}); - const files = await manager.extractChanges(sandbox).catch(() => []); - throw Object.assign(err as Error, { files }); - } finally { - await manager.teardown(sandbox); - } -} - -async function pushChanges( - branchName: string, - files: Array<{ path: string; content: string }>, -) { - "use step"; - if (files.length === 0) return; - const { createAdapters } = await import("../lib/adapters.js"); - const { vcs } = createAdapters(); - await vcs.push(branchName, files); -} - -async function createPullRequest( - branchName: string, - title: string, - summary: string, -) { - "use step"; - const { createAdapters } = await import("../lib/adapters.js"); - const { vcs } = 
createAdapters(); - return vcs.createPR(branchName, title, summary); -} - -async function moveTicketAndNotify( - ticketId: string, - column: string, - message: string, -) { - "use step"; - const { createAdapters } = await import("../lib/adapters.js"); - const { issueTracker, messaging } = createAdapters(); - await issueTracker.moveTicket(ticketId, column); - await messaging.notify(message); -} - -async function postClarificationAndNotify( - ticketId: string, - questions: string[], - identifier: string, - backlogColumn: string, -) { - "use step"; - const { createAdapters } = await import("../lib/adapters.js"); - const { issueTracker, messaging } = createAdapters(); - const comment = questions.map((q, i) => `${i + 1}. ${q}`).join("\n"); - await issueTracker.postComment(ticketId, comment); - await issueTracker.moveTicket(ticketId, backlogColumn); - await messaging.notify(`Task ${identifier} needs clarification`); -} - -// --- Workflow (durable orchestration — no I/O directly here) --- - -export async function implementationWorkflow(ticketId: string) { - "use workflow"; - - const { parseEnv } = await import("../lib/env.js"); - const env = parseEnv(); - - // Step 1: Validate ticket is still in AI column - const ticket = await fetchAndValidateTicket(ticketId, env.COLUMN_AI); - if (!ticket) return; - - // Step 2: Create feature branch - const branchName = `blazebot/${ticket.identifier.toLowerCase()}`; - await createFeatureBranch(branchName, env.GITHUB_BASE_BRANCH); - - // Step 3: Assemble context (in step — reads filesystem) - const requirementsMd = await assembleImplementationRequirements(ticket); - - // Step 4: Run agent in sandbox - const { output, files } = await runAgentInSandbox(branchName, requirementsMd); - - // Step 5: Push changes from outside the sandbox (spec Section 15.2) - await pushChanges(branchName, files); - - // Step 6: Handle result - if (output.result === "implemented") { - await createPullRequest(branchName, ticket.title, output.summary ?? 
""); - await moveTicketAndNotify( - ticketId, - env.COLUMN_AI_REVIEW, - `Task ${ticket.identifier} PR ready for review`, - ); - return; - } - - if (output.result === "clarification_needed") { - await postClarificationAndNotify( - ticketId, - output.questions ?? [], - ticket.identifier, - env.COLUMN_BACKLOG, - ); - return; - } - - // Failed — let workflow retry - throw new Error(`Agent failed for ${ticketId}: ${output.error}`); -} -``` - -- [ ] **Step 2: Verify build** - -Run: `npx nitro build` -Expected: Build succeeds. - -- [ ] **Step 3: Commit** - -```bash -git add src/workflows/implementation.ts -git commit -m "feat: implement durable implementation workflow with all steps" -``` - ---- - -### Task 13: Review-Fix Workflow - -**Files:** -- Create: `src/workflows/review-fix.ts` - -- [ ] **Step 1: Implement the workflow** - -```ts -// src/workflows/review-fix.ts -import { FatalError } from "workflow"; -import { logger } from "../lib/logger.js"; -import type { AgentOutput } from "../sandbox/agent-runner.js"; -import type { TicketContent } from "../adapters/issue-tracker/types.js"; -import type { PRComment } from "../adapters/vcs/types.js"; - -// --- Step Functions --- - -async function fetchAndValidateTicket(ticketId: string, columnAi: string) { - "use step"; - const { createAdapters } = await import("../lib/adapters.js"); - const { issueTracker } = createAdapters(); - const ticket = await issueTracker.fetchTicket(ticketId); - - if (ticket.trackerStatus.toLowerCase() !== columnAi.toLowerCase()) { - return null; - } - return ticket; -} - -async function fetchPRContext(branchName: string) { - "use step"; - const { createAdapters } = await import("../lib/adapters.js"); - const { vcs } = createAdapters(); - const pr = await vcs.findPR(branchName); - if (!pr) throw new FatalError(`No open PR found for branch ${branchName}`); - - const comments = await vcs.getPRComments(pr.id); - const hasConflicts = await vcs.getPRConflictStatus(pr.id); - return { pr, comments, 
hasConflicts }; -} - -async function assembleReviewFixRequirements( - ticket: TicketContent, - prComments: PRComment[], - hasConflicts: boolean, -) { - "use step"; - const { assembleFixingFeedbackContext } = await import("../sandbox/context.js"); - const { readFile } = await import("fs/promises"); - const prompt = await readFile(".blazebot/prompts/review-fix.md", "utf-8"); - return assembleFixingFeedbackContext({ - ticket: { - title: ticket.title, - description: ticket.description, - acceptanceCriteria: ticket.acceptanceCriteria, - comments: ticket.comments, - }, - prompt, - prComments, - hasConflicts, - }); -} - -async function runFixingAgentInSandbox( - branchName: string, - requirementsMd: string, -): Promise<{ output: AgentOutput; files: Array<{ path: string; content: string }> }> { - "use step"; - const { parseEnv } = await import("../lib/env.js"); - const { SandboxManager } = await import("../sandbox/manager.js"); - const { buildAgentCommand, parseAgentOutput } = await import( - "../sandbox/agent-runner.js" - ); - - const env = parseEnv(); - const manager = new SandboxManager({ - githubToken: env.GITHUB_TOKEN, - owner: env.GITHUB_OWNER, - repo: env.GITHUB_REPO, - anthropicApiKey: env.ANTHROPIC_API_KEY, - claudeModel: env.CLAUDE_MODEL, - commitAuthor: env.COMMIT_AUTHOR, - commitEmail: env.COMMIT_EMAIL, - jobTimeoutMs: env.JOB_TIMEOUT_MS, - vercelToken: env.VERCEL_TOKEN, - vercelTeamId: env.VERCEL_TEAM_ID, - vercelProjectId: env.VERCEL_PROJECT_ID, - }); - - const sandbox = await manager.provision(branchName, requirementsMd); - - try { - const { cmd, args } = buildAgentCommand(env.CLAUDE_MODEL); - const result = await sandbox.runCommand({ cmd, args, cwd: "/vercel/sandbox" }); - const stdout = await result.stdout(); - - await manager.runEndHook(sandbox); - const files = await manager.extractChanges(sandbox); - const output = parseAgentOutput(stdout); - return { output, files }; - } catch (err) { - await manager.runEndHook(sandbox).catch(() => {}); - const files = 
await manager.extractChanges(sandbox).catch(() => []); - throw Object.assign(err as Error, { files }); - } finally { - await manager.teardown(sandbox); - } -} - -async function pushChanges( - branchName: string, - files: Array<{ path: string; content: string }>, -) { - "use step"; - if (files.length === 0) return; - const { createAdapters } = await import("../lib/adapters.js"); - const { vcs } = createAdapters(); - await vcs.push(branchName, files); -} - -async function moveTicketAndNotify( - ticketId: string, - column: string, - message: string, -) { - "use step"; - const { createAdapters } = await import("../lib/adapters.js"); - const { issueTracker, messaging } = createAdapters(); - await issueTracker.moveTicket(ticketId, column); - await messaging.notify(message); -} - -// --- Workflow --- - -export async function reviewFixWorkflow( - ticketId: string, - branchName: string, -) { - "use workflow"; - - const { parseEnv } = await import("../lib/env.js"); - const env = parseEnv(); - - // Step 1: Validate ticket - const ticket = await fetchAndValidateTicket(ticketId, env.COLUMN_AI); - if (!ticket) return; - - // Step 2: Fetch PR context - const { pr, comments, hasConflicts } = await fetchPRContext(branchName); - - // Step 3: Assemble context (in step — reads filesystem) - const requirementsMd = await assembleReviewFixRequirements( - ticket, - comments, - hasConflicts, - ); - - // Step 4: Run agent in sandbox - const { output, files } = await runFixingAgentInSandbox(branchName, requirementsMd); - - // Step 5: Push changes from outside sandbox - await pushChanges(branchName, files); - - // Step 6: Handle result - if (output.result === "implemented") { - await moveTicketAndNotify( - ticketId, - env.COLUMN_AI_REVIEW, - `Task ${ticket.identifier} fixes applied, ready for re-review`, - ); - return; - } - - // Failed — let workflow retry - throw new Error(`Agent failed for ${ticketId}: ${output.error}`); -} -``` - -- [ ] **Step 2: Commit** - -```bash -git add 
src/workflows/review-fix.ts -git commit -m "feat: implement durable review-fix workflow" -``` - ---- - -### Task 14: Poller (Cron Route) - -**Files:** -- Create: `src/routes/cron/poll.get.ts` - -- [ ] **Step 1: Implement the poller** - -```ts -// src/routes/cron/poll.get.ts -import { createError, defineEventHandler, getHeader } from "h3"; -import { start } from "workflow/api"; -import { Sandbox } from "@vercel/sandbox"; -import { parseEnv } from "../../lib/env.js"; -import { createAdapters } from "../../lib/adapters.js"; -import { implementationWorkflow } from "../../workflows/implementation.js"; -import { reviewFixWorkflow } from "../../workflows/review-fix.js"; -import { logger } from "../../lib/logger.js"; - -async function getActiveSandboxCount(): Promise<number> { - try { - const { json } = await Sandbox.list({ limit: 100 }); - return json.sandboxes.filter((s: any) => s.status === "running").length; - } catch { - // Lookup failed: assume no sandboxes are running (fails open) - return 0; - } -} - -export default defineEventHandler(async (event) => { - const env = parseEnv(); - - // Verify Vercel Cron auth - if (env.CRON_SECRET) { - const auth = getHeader(event, "authorization"); - if (auth !== `Bearer ${env.CRON_SECRET}`) { - // Throwing sets the real HTTP status; returning an object would respond 200 - throw createError({ statusCode: 401, statusMessage: "Unauthorized" }); - } - } - - const { issueTracker, vcs } = createAdapters(); - - // Search for tickets in AI column - const jql = `project = ${env.JIRA_PROJECT_KEY} AND status = "${env.COLUMN_AI}"`; - const ticketKeys = await issueTracker.searchTickets(jql); - - logger.info({ ticketCount: ticketKeys.length }, "poll_discovered_tickets"); - - // Concurrency control (spec Section 8.2) - const activeSandboxes = await getActiveSandboxCount(); - const availableSlots = Math.max(0, env.MAX_CONCURRENT_AGENTS - activeSandboxes); - if (availableSlots === 0) { - logger.info({ active: activeSandboxes, max: env.MAX_CONCURRENT_AGENTS }, "poll_at_capacity"); - return { status: "ok", discovered: ticketKeys.length, started: 0, reason: "at_capacity" }; - } - - const started: string[] = []; - - for (const key of
ticketKeys) { - if (started.length >= availableSlots) break; // respect concurrency limit - - try { - const ticket = await issueTracker.fetchTicket(key); - const branchName = `blazebot/${ticket.identifier.toLowerCase()}`; - const existingPR = await vcs.findPR(branchName); - - // Deterministic dedup ID — start() is idempotent if a run with this ID is active - if (existingPR) { - const handle = await start(reviewFixWorkflow, [ticket.id, branchName], { - id: `review-fix-${ticket.id}`, - }); - logger.info( - { ticketId: ticket.id, identifier: ticket.identifier, runId: handle.runId }, - "workflow_started_review_fix", - ); - } else { - const handle = await start(implementationWorkflow, [ticket.id], { - id: `implementation-${ticket.id}`, - }); - logger.info( - { ticketId: ticket.id, identifier: ticket.identifier, runId: handle.runId }, - "workflow_started_implementation", - ); - } - - started.push(ticket.identifier); - } catch (err) { - logger.warn( - { ticketKey: key, error: (err as Error).message }, - "poll_ticket_dispatch_error", - ); - } - } - - return { status: "ok", discovered: ticketKeys.length, started: started.length }; -}); -``` - -- [ ] **Step 2: Commit** - -```bash -git add src/routes/cron/poll.get.ts -git commit -m "feat: implement Vercel Cron poller that discovers tickets and dispatches workflows" -``` - ---- - -### Task 15: Health Route + Workflow World Plugin - -**Files:** -- Create: `src/routes/health.get.ts` -- Create: `src/plugins/workflow-world.ts` - -- [ ] **Step 1: Create health route** - -```ts -// src/routes/health.get.ts -import { defineEventHandler } from "h3"; - -export default defineEventHandler(() => { - return { status: "ok", timestamp: new Date().toISOString() }; -}); -``` - -- [ ] **Step 2: Create workflow world plugin** - -```ts -// src/plugins/workflow-world.ts -import { defineNitroPlugin } from "nitropack/runtime"; - -export default defineNitroPlugin(async () => { - // Skip in serverless — Vercel handles the workflow runtime 
automatically - if (process.env.VERCEL || process.env.SERVERLESS) return; - - // For local dev: boot the workflow world (requires WORKFLOW_POSTGRES_URL) - try { - const { getWorld } = await import("workflow/runtime"); - await getWorld().start?.(); - } catch (err) { - console.warn("Workflow world not started:", (err as Error).message); - } -}); -``` - -- [ ] **Step 3: Commit** - -```bash -git add src/routes/health.get.ts src/plugins/workflow-world.ts -git commit -m "feat: add health route and workflow world plugin" -``` - ---- - -### Task 16: Prompt Files - -**Files:** -- Create: `.blazebot/prompts/implement.md` -- Create: `.blazebot/prompts/review-fix.md` - -- [ ] **Step 1: Create implementation prompt** - -```markdown - -# Instructions - -You are an AI coding agent implementing a feature based on the requirements above. - -## Constraints - -- Only modify files relevant to the ticket requirements. -- Do not refactor code outside the scope of the acceptance criteria. -- Do not make architectural changes unless explicitly requested. -- Follow existing code conventions in the repository (check CLAUDE.md, AGENTS.md if present). - -## Process - -1. Read and understand the requirements, description, and acceptance criteria. -2. Review existing code to understand the codebase structure. -3. Write tests first (TDD) — integration and e2e tests are required. -4. Implement the feature to make tests pass. -5. Run all tests to ensure nothing is broken. -6. Self-review your changes for quality, correctness, and completeness. -7. Commit your work with descriptive commit messages. - -## Comment Overrides - -If a ticket comment is prefixed with `[OVERRIDE]`, treat it as authoritative over any -prior conflicting instructions. The latest `[OVERRIDE]` comment takes precedence. - -## Output - -Return a JSON object with: -- `result`: "implemented" if done, "clarification_needed" if you have questions, "failed" if stuck. -- `summary`: Description of work done (when implemented). 
-- `questions`: List of questions (when clarification_needed). -- `error`: Failure details (when failed). -``` - -- [ ] **Step 2: Create review-fix prompt** - -```markdown - -# Instructions - -You are an AI coding agent fixing review feedback and resolving merge conflicts. - -## Constraints - -- Only address the specific review comments listed in PR Review Feedback. -- Do not refactor code outside the scope of the feedback. -- Do not make changes beyond what reviewers requested. -- Follow existing code conventions in the repository (check CLAUDE.md, AGENTS.md if present). - -## Process - -1. Read the review feedback carefully. -2. If merge conflicts exist, merge the target branch and resolve conflicts first. -3. Address each review comment — implement the requested changes. -4. Run all tests to ensure nothing is broken. -5. Self-review your changes. -6. Commit your work with descriptive commit messages. - -## Comment Overrides - -If a ticket comment is prefixed with `[OVERRIDE]`, treat it as authoritative over any -prior conflicting instructions. The latest `[OVERRIDE]` comment takes precedence. - -## Output - -Return a JSON object with: -- `result`: "implemented" if all feedback addressed, "failed" if stuck. -- `summary`: Description of fixes applied (when implemented). -- `error`: Failure details (when failed). -``` - -- [ ] **Step 3: Commit** - -```bash -git add .blazebot/prompts/implement.md .blazebot/prompts/review-fix.md -git commit -m "feat: add agent prompt files for implementation and review-fix" -``` - ---- - -### Task 17: Final Verification - -- [ ] **Step 1: Run all tests** - -Run: `pnpm test` -Expected: All tests pass. - -- [ ] **Step 2: Run type check** - -Run: `pnpm typecheck` -Expected: No type errors. - -- [ ] **Step 3: Build** - -Run: `pnpm build` -Expected: Build completes with no errors. 
- -- [ ] **Step 4: Final commit** - -```bash -git add -A -git commit -m "chore: final verification — all tests pass, types check, build succeeds" -``` diff --git a/docs/superpowers/plans/2026-03-20-redis-dedup.md b/docs/superpowers/plans/2026-03-20-redis-dedup.md deleted file mode 100644 index b4285c3..0000000 --- a/docs/superpowers/plans/2026-03-20-redis-dedup.md +++ /dev/null @@ -1,488 +0,0 @@ -# Redis-Based Workflow Deduplication Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Prevent duplicate workflow runs for the same ticket using @upstash/redis, and cancel active runs when tickets are moved out of the AI column. - -**Architecture:** A `RunRegistryAdapter` backed by an Upstash Redis hash (`blazebot:active-runs`) maps `ticketKey → runId`. The poll handler checks the registry before starting workflows and, after the start loop, reconciles the registry against the current AI column — cancelling and unregistering any runs whose tickets have left. 
- **Tech Stack:** @upstash/redis, vitest, Vercel Workflow `getRun(runId).cancel()` - --- - ## File Structure - | Action | Path | Responsibility | -|--------|------|----------------| -| Create | `src/adapters/run-registry/types.ts` | `RunRegistryAdapter` interface | -| Create | `src/adapters/run-registry/upstash.ts` | Upstash Redis implementation | -| Create | `src/adapters/run-registry/upstash.test.ts` | Unit tests for the adapter | -| Modify | `env.ts` | Add `UPSTASH_REDIS_REST_URL` + `UPSTASH_REDIS_REST_TOKEN` | -| Modify | `src/lib/adapters.ts` | Add `runRegistry` to `Adapters` | -| Modify | `src/lib/step-adapters.ts` | Add `runRegistry` to `StepAdapters` | -| Modify | `src/routes/cron/poll.get.ts` | Dedup check + stale-run cancellation | - ---- - -### Task 1: Install @upstash/redis - -**Files:** -- Modify: `package.json` - -- [ ] **Step 1: Install the package** - -```bash -pnpm add @upstash/redis -``` - -- [ ] **Step 2: Verify installation** - -```bash -pnpm ls @upstash/redis -``` - -Expected: package listed with version - ---- - -### Task 2: Add env vars for Upstash Redis - -**Files:** -- Modify: `env.ts` - -- [ ] **Step 1: Add UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN to env.ts** - -In the `server` object inside `createEnv`, add after the `CRON_SECRET` entry: - -```typescript -// Redis (run registry) -UPSTASH_REDIS_REST_URL: z.string().url(), -UPSTASH_REDIS_REST_TOKEN: z.string().min(1), -``` - -- [ ] **Step 2: Verify typecheck passes** - -```bash -pnpm typecheck -``` - -Expected: no new errors (existing env usage unchanged) - ---- - -### Task 3: Create RunRegistryAdapter interface - -**Files:** -- Create: `src/adapters/run-registry/types.ts` - -- [ ] **Step 1: Write the interface** - -```typescript -export interface RunRegistryAdapter { - /** Record that a workflow run is active for this ticket. */ - register(ticketKey: string, runId: string): Promise<void>; - /** Get the runId for a ticket, or null if none registered.
*/ - getRunId(ticketKey: string): Promise<string | null>; - /** Remove the ticket -> runId mapping. */ - unregister(ticketKey: string): Promise<void>; - /** Get all tracked ticket -> runId pairs. */ - listAll(): Promise<Array<{ ticketKey: string; runId: string }>>; -} -``` - -- [ ] **Step 2: Verify typecheck** - -```bash -pnpm typecheck -``` - ---- - -### Task 4: Implement UpstashRunRegistry (TDD) - -**Files:** -- Create: `src/adapters/run-registry/upstash.test.ts` -- Create: `src/adapters/run-registry/upstash.ts` - -- [ ] **Step 1: Write the failing tests** - -```typescript -import { describe, it, expect, vi, beforeEach } from "vitest"; -import { UpstashRunRegistry } from "./upstash.js"; - -const HASH_KEY = "blazebot:active-runs"; - -const mockRedis = { - hset: vi.fn(), - hget: vi.fn(), - hdel: vi.fn(), - hgetall: vi.fn(), -}; - -vi.mock("@upstash/redis", () => ({ - Redis: vi.fn(() => mockRedis), -})); - -function createRegistry() { - return new UpstashRunRegistry({ - url: "https://fake.upstash.io", - token: "fake-token", - }); -} - -describe("UpstashRunRegistry", () => { - beforeEach(() => { - vi.clearAllMocks(); - }); - - describe("register", () => { - it("stores ticketKey -> runId in the hash", async () => { - const registry = createRegistry(); - await registry.register("PROJ-1", "run_abc"); - expect(mockRedis.hset).toHaveBeenCalledWith(HASH_KEY, { "PROJ-1": "run_abc" }); - }); - }); - - describe("getRunId", () => { - it("returns runId when ticket is registered", async () => { - mockRedis.hget.mockResolvedValueOnce("run_abc"); - const registry = createRegistry(); - const result = await registry.getRunId("PROJ-1"); - expect(result).toBe("run_abc"); - expect(mockRedis.hget).toHaveBeenCalledWith(HASH_KEY, "PROJ-1"); - }); - - it("returns null when ticket is not registered", async () => { - mockRedis.hget.mockResolvedValueOnce(null); - const registry = createRegistry(); - const result = await registry.getRunId("PROJ-99"); - expect(result).toBeNull(); - }); - }); - - describe("unregister", () => { - it("removes the ticketKey from
the hash", async () => { - const registry = createRegistry(); - await registry.unregister("PROJ-1"); - expect(mockRedis.hdel).toHaveBeenCalledWith(HASH_KEY, "PROJ-1"); - }); - }); - - describe("listAll", () => { - it("returns all registered ticket -> runId pairs", async () => { - mockRedis.hgetall.mockResolvedValueOnce({ - "PROJ-1": "run_abc", - "PROJ-2": "run_def", - }); - const registry = createRegistry(); - const result = await registry.listAll(); - expect(result).toEqual([ - { ticketKey: "PROJ-1", runId: "run_abc" }, - { ticketKey: "PROJ-2", runId: "run_def" }, - ]); - }); - - it("returns empty array when no runs are registered", async () => { - mockRedis.hgetall.mockResolvedValueOnce(null); - const registry = createRegistry(); - const result = await registry.listAll(); - expect(result).toEqual([]); - }); - }); -}); -``` - -- [ ] **Step 2: Run tests to verify they fail** - -```bash -pnpm test -- src/adapters/run-registry/upstash.test.ts -``` - -Expected: FAIL — module `./upstash.js` not found - -- [ ] **Step 3: Write the implementation** - -```typescript -import { Redis } from "@upstash/redis"; -import type { RunRegistryAdapter } from "./types.js"; - -const HASH_KEY = "blazebot:active-runs"; - -export class UpstashRunRegistry implements RunRegistryAdapter { - private redis: Redis; - - constructor(opts: { url: string; token: string }) { - this.redis = new Redis({ url: opts.url, token: opts.token }); - } - - async register(ticketKey: string, runId: string): Promise<void> { - await this.redis.hset(HASH_KEY, { [ticketKey]: runId }); - } - - async getRunId(ticketKey: string): Promise<string | null> { - return this.redis.hget(HASH_KEY, ticketKey); - } - - async unregister(ticketKey: string): Promise<void> { - await this.redis.hdel(HASH_KEY, ticketKey); - } - - async listAll(): Promise<Array<{ ticketKey: string; runId: string }>> { - const all = await this.redis.hgetall<Record<string, string>>(HASH_KEY); - if (!all) return []; - return Object.entries(all).map(([ticketKey, runId]) => ({ ticketKey, runId })); - } -} -``` - -- [ ] **Step 4: Run tests to verify
they pass** - -```bash -pnpm test -- src/adapters/run-registry/upstash.test.ts -``` - -Expected: all 5 tests PASS - -- [ ] **Step 5: Commit** - -```bash -git add src/adapters/run-registry/ -git commit -m "feat: add UpstashRunRegistry adapter for workflow dedup" -``` - ---- - -### Task 5: Wire RunRegistry into adapter factories - -**Files:** -- Modify: `src/lib/adapters.ts` -- Modify: `src/lib/step-adapters.ts` - -- [ ] **Step 1: Update `src/lib/adapters.ts`** - -Add import: -```typescript -import { UpstashRunRegistry } from "../adapters/run-registry/upstash.js"; -import type { RunRegistryAdapter } from "../adapters/run-registry/types.js"; -``` - -Add `runRegistry` to the `Adapters` interface: -```typescript -export interface Adapters { - issueTracker: IssueTrackerAdapter; - vcs: VCSAdapter; - messaging: MessagingAdapter; - runRegistry: RunRegistryAdapter; -} -``` - -Add to `createAdapters()` return: -```typescript -runRegistry: new UpstashRunRegistry({ - url: env.UPSTASH_REDIS_REST_URL, - token: env.UPSTASH_REDIS_REST_TOKEN, -}), -``` - -- [ ] **Step 2: Update `src/lib/step-adapters.ts`** - -Add imports: -```typescript -import { UpstashRunRegistry } from "../adapters/run-registry/upstash.js"; -import type { RunRegistryAdapter } from "../adapters/run-registry/types.js"; -``` - -Add `runRegistry` to `StepAdapters` interface: -```typescript -export interface StepAdapters { - issueTracker: IssueTrackerAdapter; - vcs: VCSAdapter; - messaging: MessagingAdapter; - runRegistry: RunRegistryAdapter; -} -``` - -Add to `createStepAdapters()` return: -```typescript -runRegistry: new UpstashRunRegistry({ - url: env.UPSTASH_REDIS_REST_URL, - token: env.UPSTASH_REDIS_REST_TOKEN, -}), -``` - -- [ ] **Step 3: Verify typecheck** - -```bash -pnpm typecheck -``` - -- [ ] **Step 4: Commit** - -```bash -git add src/lib/adapters.ts src/lib/step-adapters.ts -git commit -m "feat: wire RunRegistry into adapter factories" -``` - ---- - -### Task 6: Update poll handler — dedup + stale-run 
cancellation - -**Files:** -- Modify: `src/routes/cron/poll.get.ts` - -- [ ] **Step 1: Merge getRun into existing import and update destructure** - -Merge `getRun` into the existing `start` import (line 2): -```typescript -import { start, getRun } from "workflow/api"; -``` - -Update the destructure on the existing line: -```typescript -const { issueTracker, vcs, runRegistry } = createAdapters(); -``` - -- [ ] **Step 2: Add dedup check inside the ticket loop** - -At the top of the `for` loop body, *before* `fetchTicket(key)` (to avoid unnecessary Jira calls): - -```typescript -// Skip if a workflow is already running for this ticket -const existingRunId = await runRegistry.getRunId(key); -if (existingRunId) { - logger.info({ ticketKey: key, runId: existingRunId }, "poll_ticket_already_running"); - continue; -} -``` - -- [ ] **Step 3: Register runId after starting each workflow** - -After each `start()` call, register the run. If `register` fails (Redis down), log a warning but don't block — the stale-run reconciliation will handle cleanup. 
Replace the two branches: - -```typescript -if (existingPR) { - const handle = await start(reviewFixWorkflow, [ticket.id, branchName]); - await runRegistry.register(ticket.identifier, handle.runId).catch((err) => - logger.warn({ ticketKey: key, runId: handle.runId, error: (err as Error).message }, "poll_register_failed"), - ); - logger.info( - { ticketId: ticket.id, identifier: ticket.identifier, runId: handle.runId }, - "workflow_started_review_fix", - ); -} else { - const handle = await start(implementationWorkflow, [ticket.id]); - await runRegistry.register(ticket.identifier, handle.runId).catch((err) => - logger.warn({ ticketKey: key, runId: handle.runId, error: (err as Error).message }, "poll_register_failed"), - ); - logger.info( - { ticketId: ticket.id, identifier: ticket.identifier, runId: handle.runId }, - "workflow_started_implementation", - ); -} -``` - -- [ ] **Step 4: Add stale-run cancellation after the start loop** - -After the `for` loop ends (after line 74), before the return statement, add. 
-Uses the already-fetched `ticketKeys` set (from the JQL query) to avoid redundant Jira API calls — any registered ticket NOT in that set has left the AI column: - -```typescript -// Cancel runs for tickets that have been moved out of the AI column -const aiColumnSet = new Set(ticketKeys); -const activeRuns = await runRegistry.listAll(); -let cancelled = 0; - -for (const { ticketKey, runId } of activeRuns) { - if (aiColumnSet.has(ticketKey)) continue; // still in AI column - - try { - const run = getRun(runId); - await run.cancel(); - await runRegistry.unregister(ticketKey); - logger.info({ ticketKey, runId }, "poll_cancelled_stale_run"); - cancelled++; - } catch (err) { - // Run may already be finished — unregister to clean up - await runRegistry.unregister(ticketKey).catch(() => {}); - logger.warn( - { ticketKey, runId, error: (err as Error).message }, - "poll_stale_run_cleanup_error", - ); - } -} -``` - -Update the return statement to include cancelled count: -```typescript -return { status: "ok", discovered: ticketKeys.length, started: started.length, cancelled }; -``` - -- [ ] **Step 5: Verify typecheck** - -```bash -pnpm typecheck -``` - -- [ ] **Step 6: Run all tests** - -```bash -pnpm test -``` - -- [ ] **Step 7: Commit** - -```bash -git add src/routes/cron/poll.get.ts -git commit -m "feat: add dedup check and stale-run cancellation to poll handler" -``` - ---- - -### Task 7: Unregister runs on workflow completion - -**Files:** -- Modify: `src/workflows/implementation.ts` -- Modify: `src/workflows/review-fix.ts` - -Both workflows should unregister their ticket from the run registry when they complete (success or clarification), so that finished runs don't accumulate in Redis. 
- -- [ ] **Step 1: Add unregister step function to `implementation.ts`** - -Add a new step function: -```typescript -async function unregisterRun(ticketIdentifier: string) { - "use step"; - const { createStepAdapters } = await import("../lib/step-adapters.js"); - const { runRegistry } = createStepAdapters(); - await runRegistry.unregister(ticketIdentifier); -} -``` - -- [ ] **Step 2: Call unregisterRun at every exit point in `implementationWorkflow`** - -Add `await unregisterRun(ticket.identifier);` before every `return` AND before the `throw` at the end: - -- Before `return` in the `implemented` branch -- Before `return` in the `clarification_needed` branch -- Before `throw new Error(...)` at the end (so error-path runs also get cleaned up) - -Skip the early `if (!ticket) return;` — we don't have the identifier there and the stale-run reconciliation in poll handles that case. - -- [ ] **Step 3: Add same unregister step to `review-fix.ts`** - -Add the same `unregisterRun` step function. Call it: -- Before `return` in the `implemented` branch -- Before `throw new Error(...)` at the end - -- [ ] **Step 4: Verify typecheck and run tests** - -```bash -pnpm typecheck && pnpm test -``` - -- [ ] **Step 5: Commit** - -```bash -git add src/workflows/implementation.ts src/workflows/review-fix.ts -git commit -m "feat: unregister runs from registry on workflow completion" -``` diff --git a/docs/superpowers/plans/2026-03-23-agent-session-memory.md b/docs/superpowers/plans/2026-03-23-agent-session-memory.md deleted file mode 100644 index 5be8f41..0000000 --- a/docs/superpowers/plans/2026-03-23-agent-session-memory.md +++ /dev/null @@ -1,445 +0,0 @@ -# Agent Session Memory Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. 
- -**Goal:** Give blazebot agents persistent session memory so they can restore context across runs instead of starting from scratch. - -**Architecture:** Pure prompt changes to `implement.md` and `review-fix.md` instruct the agent to read/write a `blazebot/memory/[TASK_ID].md` file on the feature branch. A small code change to `context.ts` adds the ticket identifier to `requirements.md` so the agent knows the task ID. Callers in both workflow files pass the identifier through. - -**Tech Stack:** TypeScript (Nitropack), Markdown prompts - -**Spec:** `docs/superpowers/specs/2026-03-23-agent-session-memory-design.md` - ---- - -## File Map - -| File | Action | Responsibility | -|------|--------|----------------| -| `src/sandbox/context.ts` | Modify | Add `identifier` to interfaces and rendered context | -| `src/sandbox/context.test.ts` | Modify | Add `identifier` to test data, assert `## Ticket ID` renders | -| `src/workflows/implementation.ts` | Modify | Pass `ticket.identifier` to context assembly | -| `src/workflows/review-fix.ts` | Modify | Pass `ticket.identifier` to context assembly | -| `.blazebot/prompts/implement.md` | Modify | Add memory read (step 0) + memory write (Session Memory section) | -| `.blazebot/prompts/review-fix.md` | Modify | Add memory read (step 0) + memory write (Session Memory section) | - ---- - -### Task 1: Add ticket identifier to context assembly - -**Files:** -- Modify: `src/sandbox/context.ts:3-8` (TicketData interface) -- Modify: `src/sandbox/context.ts:22-48` (assembleImplementationContext) -- Modify: `src/sandbox/context.ts:51-86` (assembleFixingFeedbackContext) - -- [ ] **Step 1: Add `identifier` to `TicketData` interface** - -In `src/sandbox/context.ts`, add `identifier` to the `TicketData` interface: - -```typescript -interface TicketData { - identifier: string; - title: string; - description: string; - acceptanceCriteria: string; - comments: Array<{ author: string; body: string; createdAt: string }>; -} -``` - -- [ ] **Step 2: 
Add `## Ticket ID` section to `assembleImplementationContext`** - -In the template string returned by `assembleImplementationContext`, add a `## Ticket ID` section right after `# Requirements`: - -```typescript -export function assembleImplementationContext( - input: ImplementationContextInput, -): string { - const { ticket, prompt } = input; - - return `# Requirements - -## Ticket ID - -${ticket.identifier} - -## Ticket - -${ticket.title} - -## Description - -${ticket.description} - -## Acceptance Criteria - -${ticket.acceptanceCriteria || "None specified."} - -## Comments - -${formatComments(ticket.comments)} - ---- - -${prompt} -`; -} -``` - -- [ ] **Step 3: Add `## Ticket ID` section to `assembleFixingFeedbackContext`** - -Same change — add `## Ticket ID\n\n${ticket.identifier}` right after `# Requirements` in the `assembleFixingFeedbackContext` template string: - -```typescript -export function assembleFixingFeedbackContext( - input: FixingFeedbackContextInput, -): string { - const { ticket, prompt, prComments, hasConflicts } = input; - - return `# Requirements - -## Ticket ID - -${ticket.identifier} - -## Ticket - -${ticket.title} - -## Description - -${ticket.description} - -## Acceptance Criteria - -${ticket.acceptanceCriteria || "None specified."} - -## Comments - -${formatComments(ticket.comments)} - -## PR Review Feedback - -${formatPRComments(prComments)} - -## Merge Conflicts - -${hasConflicts ? "This PR has merge conflicts that must be resolved." 
: "No merge conflicts."} - ---- - -${prompt} -`; -} -``` - -- [ ] **Step 4: Update `context.test.ts` — add `identifier` to test data** - -In `src/sandbox/context.test.ts`, add `identifier` to both test ticket objects and add assertions for the new section: - -```typescript -import { describe, it, expect } from "vitest"; -import { assembleImplementationContext, assembleFixingFeedbackContext } from "./context.js"; - -describe("assembleImplementationContext", () => { - it("assembles requirements.md for implementation", () => { - const result = assembleImplementationContext({ - ticket: { - identifier: "TEST-1", - title: "Add login page", - description: "Build a login page with OAuth", - acceptanceCriteria: "- User can log in\n- User can log out", - comments: [ - { author: "Alice", body: "Use OAuth2", createdAt: "2026-03-20T10:00:00Z" }, - ], - }, - prompt: "You are an implementation agent...", - }); - - expect(result).toContain("# Requirements"); - expect(result).toContain("## Ticket ID"); - expect(result).toContain("TEST-1"); - expect(result).toContain("Add login page"); - expect(result).toContain("Build a login page with OAuth"); - expect(result).toContain("User can log in"); - expect(result).toContain("Alice: Use OAuth2"); - expect(result).toContain("You are an implementation agent..."); - }); -}); - -describe("assembleFixingFeedbackContext", () => { - it("assembles requirements.md for fixing feedback", () => { - const result = assembleFixingFeedbackContext({ - ticket: { - identifier: "TEST-2", - title: "Add login page", - description: "Build a login page", - acceptanceCriteria: "", - comments: [], - }, - prompt: "You are a review-fix agent...", - prComments: [ - { author: "Bob", body: "Fix the typo on line 5", liked: true }, - ], - hasConflicts: true, - }); - - expect(result).toContain("# Requirements"); - expect(result).toContain("## Ticket ID"); - expect(result).toContain("TEST-2"); - expect(result).toContain("## PR Review Feedback"); - 
expect(result).toContain("Fix the typo on line 5"); - expect(result).toContain("## Merge Conflicts"); - expect(result).toContain("You are a review-fix agent..."); - }); -}); -``` - -- [ ] **Step 5: Run tests to verify** - -Run: `npx vitest run src/sandbox/context.test.ts` -Expected: Both tests pass - -- [ ] **Step 6: Commit** - -```bash -git add src/sandbox/context.ts src/sandbox/context.test.ts -git commit -m "feat: add ticket identifier to context assembly" -``` - ---- - -### Task 2: Pass ticket identifier from workflow callers - -**Files:** -- Modify: `src/workflows/implementation.ts:25-41` (assembleImplementationRequirements) -- Modify: `src/workflows/review-fix.ts:32-53` (assembleReviewFixRequirements) - -- [ ] **Step 1: Update `assembleImplementationRequirements` in `implementation.ts`** - -Add `identifier: ticket.identifier` to the ticket object passed to `assembleImplementationContext`: - -```typescript -async function assembleImplementationRequirements(ticket: TicketContent) { - "use step"; - const { assembleImplementationContext } = await import("../sandbox/context.js"); - const { env } = await import("../../env.js"); - - const prompt = env.IMPLEMENTATION_PROMPT ?? ""; - return assembleImplementationContext({ - ticket: { - identifier: ticket.identifier, - title: ticket.title, - description: ticket.description, - acceptanceCriteria: ticket.acceptanceCriteria, - comments: ticket.comments, - }, - prompt, - }); -} -``` - -- [ ] **Step 2: Update `assembleReviewFixRequirements` in `review-fix.ts`** - -Same change — add `identifier: ticket.identifier`: - -```typescript -async function assembleReviewFixRequirements( - ticket: TicketContent, - prComments: PRComment[], - hasConflicts: boolean, -) { - "use step"; - const { assembleFixingFeedbackContext } = await import("../sandbox/context.js"); - const { env } = await import("../../env.js"); - - const prompt = env.REVIEW_FIX_PROMPT ?? 
""; - return assembleFixingFeedbackContext({ - ticket: { - identifier: ticket.identifier, - title: ticket.title, - description: ticket.description, - acceptanceCriteria: ticket.acceptanceCriteria, - comments: ticket.comments, - }, - prompt, - prComments, - hasConflicts, - }); -} -``` - -- [ ] **Step 3: Verify TypeScript compiles** - -Run: `npx tsc --noEmit` -Expected: No errors - -- [ ] **Step 4: Commit** - -```bash -git add src/workflows/implementation.ts src/workflows/review-fix.ts -git commit -m "feat: pass ticket identifier to context assembly" -``` - ---- - -### Task 3: Add session memory instructions to implement.md - -**Files:** -- Modify: `.blazebot/prompts/implement.md` - -- [ ] **Step 1: Add memory restore as step 0 in Process section** - -Replace the current `## Process` section with: - -```markdown -## Process - -0. **Restore session memory** — Check if `blazebot/memory/[TASK_ID].md` exists (where `[TASK_ID]` is the Ticket ID from above, e.g. `AIW-123`). If it exists, read it immediately. Use the progress, decisions, and file list to skip redundant analysis and pick up where the previous session left off. -1. Read and understand the requirements, description, and acceptance criteria. -2. Review existing code to understand the codebase structure. -3. **Assess ticket clarity** — before writing any code, evaluate whether the ticket provides enough information to implement correctly. If not, return `clarification_needed` (see below). -4. Write tests first (TDD) — integration and e2e tests are required. -5. Implement the feature to make tests pass. -6. Run all tests to ensure nothing is broken. -7. Self-review your changes for quality, correctness, and completeness. -8. **Update session memory** — before returning your result, write/update `blazebot/memory/[TASK_ID].md` (see Session Memory below). -9. Commit your work with descriptive commit messages. 
-``` - -- [ ] **Step 2: Add Session Memory section before the Output section** - -Insert this new section between "Comment Overrides" and "Output": - -```markdown -## Session Memory - -Before returning your result — **regardless of outcome** (`implemented`, `clarification_needed`, or `failed`) — write or update `blazebot/memory/[TASK_ID].md` where `[TASK_ID]` is the Ticket ID (e.g. `AIW-123`). Create the `blazebot/memory/` directory if it does not exist. - -Use this format: - -``` -# Session Memory — [TASK_ID] - -## Progress -- What was analyzed, understood, and attempted this session - -## Decisions Made -- Technical choices and reasoning (e.g. "Using existing Zod pattern from src/db/schema.ts") - -## Blockers -- What is blocking progress (if clarification_needed or failed) -- Specific questions that need answers -- "None" if implemented successfully - -## Files Touched -- List of files created or modified with brief notes -``` - -Keep the memory concise and factual. This file will be read by future agent sessions (including review-fix agents) to restore context. -``` - -- [ ] **Step 3: Commit** - -```bash -git add .blazebot/prompts/implement.md -git commit -m "feat: add session memory instructions to implement prompt" -``` - ---- - -### Task 4: Add session memory instructions to review-fix.md - -**Files:** -- Modify: `.blazebot/prompts/review-fix.md` - -- [ ] **Step 1: Add memory restore as step 0 in Process section** - -Replace the current `## Process` section with: - -```markdown -## Process - -0. **Restore session memory** — Check if `blazebot/memory/[TASK_ID].md` exists (where `[TASK_ID]` is the Ticket ID from above, e.g. `AIW-123`). If it exists, read it immediately. Use the progress, decisions, and file list to understand prior implementation context and any previous fix attempts. -1. Read the review feedback carefully. -2. If merge conflicts exist, merge the target branch and resolve conflicts first. -3. 
Address each review comment — implement the requested changes. -4. Run all tests to ensure nothing is broken. -5. Self-review your changes. -6. **Update session memory** — before returning your result, write/update `blazebot/memory/[TASK_ID].md` (see Session Memory below). -7. Commit your work with descriptive commit messages. -``` - -- [ ] **Step 2: Add Session Memory section before the Output section** - -Insert this new section between "Comment Overrides" and "Output": - -```markdown -## Session Memory - -Before returning your result — **regardless of outcome** (`implemented` or `failed`) — write or update `blazebot/memory/[TASK_ID].md` where `[TASK_ID]` is the Ticket ID (e.g. `AIW-123`). Create the `blazebot/memory/` directory if it does not exist. - -Use this format: - -``` -# Session Memory — [TASK_ID] - -## Progress -- What was analyzed, understood, and attempted this session - -## Decisions Made -- Technical choices and reasoning - -## Blockers -- What is blocking progress (if failed) -- "None" if implemented successfully - -## Files Touched -- List of files created or modified with brief notes -``` - -Keep the memory concise and factual. This file persists across sessions and serves as context for future runs. 
-``` - -- [ ] **Step 3: Commit** - -```bash -git add .blazebot/prompts/review-fix.md -git commit -m "feat: add session memory instructions to review-fix prompt" -``` - ---- - -### Task 5: Verify end-to-end - -- [ ] **Step 1: Verify TypeScript compiles** - -Run: `npx tsc --noEmit` -Expected: No errors - -- [ ] **Step 2: Run full test suite** - -Run: `npx vitest run` -Expected: All tests pass - -- [ ] **Step 3: Review all changes together** - -Run: `git diff main --stat` - -Verify exactly 6 files changed: -- `src/sandbox/context.ts` -- `src/sandbox/context.test.ts` -- `src/workflows/implementation.ts` -- `src/workflows/review-fix.ts` -- `.blazebot/prompts/implement.md` -- `.blazebot/prompts/review-fix.md` - -- [ ] **Step 4: Spot-check the rendered context includes Ticket ID** - -Read `src/sandbox/context.ts` and confirm the `## Ticket ID` section appears right after `# Requirements` in both functions. - -- [ ] **Step 5: Spot-check prompts reference the memory file path correctly** - -Read both `.blazebot/prompts/implement.md` and `.blazebot/prompts/review-fix.md` and confirm: -- Step 0 references `blazebot/memory/[TASK_ID].md` -- Session Memory section references `blazebot/memory/[TASK_ID].md` -- The memory format template is included diff --git a/docs/superpowers/plans/2026-03-26-e2e-tests.md b/docs/superpowers/plans/2026-03-26-e2e-tests.md deleted file mode 100644 index 981940c..0000000 --- a/docs/superpowers/plans/2026-03-26-e2e-tests.md +++ /dev/null @@ -1,1515 +0,0 @@ -# E2E Test Suite Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Build a tiered e2e test suite that validates the full ticket-to-PR pipeline against real Jira, GitHub, Upstash Redis, Vercel Sandbox, and Claude Code agent. 
- -**Architecture:** Tier 1 tests (~5 min) validate webhook handling, cron polling, reconciliation, and deduplication by hitting HTTP endpoints and asserting side effects in external services. Tier 2 tests (~1-2 hours) run the real Claude agent in Vercel Sandbox and validate the full implementation, clarification, and review-fix flows via structural assertions. Tests are self-contained: each creates its own Jira tickets and cleans up after. - -**Tech Stack:** Vitest, native fetch, @octokit/rest, @upstash/redis, node:crypto (HMAC signing) - -**Spec:** `docs/superpowers/specs/2026-03-26-e2e-tests-design.md` - ---- - -## File Map - -``` -Files to create: - e2e/vitest.e2e.config.ts — Vitest config with per-tier timeouts - e2e/env.ts — Zod-validated e2e env vars - e2e/helpers/jira.ts — Jira ticket CRUD for test setup/teardown - e2e/helpers/github.ts — GitHub PR/branch assertions and cleanup - e2e/helpers/redis.ts — Upstash Redis reads and cleanup - e2e/helpers/webhook.ts — HMAC-signed webhook sender - e2e/helpers/wait.ts — Polling utilities - e2e/tier1/webhook-signature.test.ts - e2e/tier1/webhook-dispatch.test.ts - e2e/tier1/webhook-cancel.test.ts - e2e/tier1/webhook-ignore.test.ts - e2e/tier1/cron-poll.test.ts - e2e/tier1/cron-reconciliation.test.ts - e2e/tier1/duplicate-dispatch.test.ts - e2e/tier2/implementation-and-review.test.ts — Happy path + review-fix (sequenced) - e2e/tier2/clarification-flow.test.ts - .env.e2e.example — Template for e2e env vars - .github/workflows/e2e.yml — CI workflow_dispatch - -Files to modify: - package.json — Add test:e2e, test:e2e:tier1, test:e2e:tier2 scripts - .gitignore — Add .env.e2e -``` - ---- - -### Task 1: E2E Environment Configuration - -**Files:** -- Create: `e2e/env.ts` -- Create: `.env.e2e.example` -- Modify: `.gitignore` - -- [ ] **Step 1: Create the e2e env schema** - -```ts -// e2e/env.ts -import { z } from "zod"; - -const schema = z.object({ - E2E_BASE_URL: z.string().url(), - - JIRA_BASE_URL: z.string().url(), - 
JIRA_EMAIL: z.string().email(), - JIRA_API_TOKEN: z.string().min(1), - JIRA_PROJECT_KEY: z.string().min(1), - JIRA_WEBHOOK_SECRET: z.string().min(1), - COLUMN_AI: z.string().min(1), - COLUMN_AI_REVIEW: z.string().min(1), - COLUMN_BACKLOG: z.string().min(1), - - GITHUB_TOKEN: z.string().min(1), - GITHUB_OWNER: z.string().min(1), - GITHUB_REPO: z.string().min(1), - - CRON_SECRET: z.string().min(1), - - AI_WORKFLOW_KV_REST_API_URL: z.string().url(), - AI_WORKFLOW_KV_REST_API_TOKEN: z.string().min(1), -}); - -export const e2eEnv = schema.parse(process.env); -export type E2EEnv = z.infer<typeof schema>; -``` - -- [ ] **Step 2: Create `.env.e2e.example`** - -``` -# Target server -E2E_BASE_URL=https://your-staging.vercel.app - -# Jira -JIRA_BASE_URL=https://your-domain.atlassian.net -JIRA_EMAIL=your-email@example.com -JIRA_API_TOKEN= -JIRA_PROJECT_KEY=PROJ -JIRA_WEBHOOK_SECRET= -COLUMN_AI=AI -COLUMN_AI_REVIEW=AI Review -COLUMN_BACKLOG=Backlog - -# GitHub -GITHUB_TOKEN= -GITHUB_OWNER= -GITHUB_REPO= - -# Cron auth -CRON_SECRET= - -# Upstash Redis -AI_WORKFLOW_KV_REST_API_URL= -AI_WORKFLOW_KV_REST_API_TOKEN= -``` - -- [ ] **Step 3: Add `.env.e2e` to `.gitignore`** - -Add this line to `.gitignore`: -``` -.env.e2e -``` - -- [ ] **Step 4: Commit** - -```bash -git add e2e/env.ts .env.e2e.example .gitignore -git commit -m "feat(e2e): add environment configuration and .env.e2e.example" -``` - ---- - -### Task 2: Vitest E2E Config and npm Scripts - -**Files:** -- Create: `e2e/vitest.e2e.config.ts` -- Modify: `package.json` - -- [ ] **Step 1: Create the vitest e2e config** - -```ts -// e2e/vitest.e2e.config.ts -import { defineConfig } from "vitest/config"; -import { config } from "dotenv"; -import { resolve } from "node:path"; - -// Load .env.e2e if it exists (CI uses environment secrets instead) -config({ path: resolve(import.meta.dirname, "../.env.e2e") }); - -export default defineConfig({ - test: { - globals: true, - environment: "node", - include: ["e2e/**/*.test.ts"], - sequence: { 
concurrent: false }, - projects: [ - { - test: { - name: "tier1", - include: ["e2e/tier1/**/*.test.ts"], - testTimeout: 120_000, - }, - }, - { - test: { - name: "tier2", - include: ["e2e/tier2/**/*.test.ts"], - testTimeout: 2_100_000, - }, - }, - ], - }, -}); -``` - -- [ ] **Step 2: Add npm scripts to `package.json`** - -Add these to the `"scripts"` object in `package.json`: - -```json -"test:e2e": "vitest run --config e2e/vitest.e2e.config.ts", -"test:e2e:tier1": "vitest run --config e2e/vitest.e2e.config.ts --project tier1", -"test:e2e:tier2": "vitest run --config e2e/vitest.e2e.config.ts --project tier2" -``` - -- [ ] **Step 3: Install dotenv as a dev dependency** - -```bash -npm install --save-dev dotenv -``` - -- [ ] **Step 4: Verify config loads without errors** - -Run: `npx vitest run --config e2e/vitest.e2e.config.ts --passWithNoTests 2>&1 | head -20` - -Expected: exits cleanly (no test files found, but no config errors). -Note: this will fail if `.env.e2e` is not present and env vars are not set. That's expected — it validates the config file structure is correct. - -- [ ] **Step 5: Commit** - -```bash -git add e2e/vitest.e2e.config.ts package.json package-lock.json -git commit -m "feat(e2e): add vitest e2e config with tiered timeouts and npm scripts" -``` - ---- - -### Task 3: Wait Helper - -**Files:** -- Create: `e2e/helpers/wait.ts` - -This is a dependency for other helpers, so build it first. 
- - [ ] **Step 1: Create the wait utility** - -```ts -// e2e/helpers/wait.ts - -export class WaitTimeoutError extends Error { - constructor(description: string, timeoutMs: number) { - super(`Timed out after ${timeoutMs}ms waiting for: ${description}`); - this.name = "WaitTimeoutError"; - } -} - -export async function waitFor<T>( - fn: () => Promise<T | null | undefined>, - opts: { description: string; timeoutMs: number; intervalMs?: number }, -): Promise<T> { - const { description, timeoutMs, intervalMs = 5_000 } = opts; - const deadline = Date.now() + timeoutMs; - - while (Date.now() < deadline) { - const result = await fn(); - if (result) return result; - await new Promise((r) => setTimeout(r, intervalMs)); - } - - throw new WaitTimeoutError(description, timeoutMs); -} -``` - -- [ ] **Step 2: Commit** - -```bash -git add e2e/helpers/wait.ts -git commit -m "feat(e2e): add generic waitFor polling utility" -``` - ---- - -### Task 4: Jira Helper - -**Files:** -- Create: `e2e/helpers/jira.ts` - -Uses the same Jira REST API v3 patterns as `src/adapters/issue-tracker/jira.ts` but with test-specific operations (create, delete).
- - [ ] **Step 1: Create the Jira helper** - -```ts -// e2e/helpers/jira.ts -import { e2eEnv } from "../env.js"; - -const authHeader = - "Basic " + - Buffer.from(`${e2eEnv.JIRA_EMAIL}:${e2eEnv.JIRA_API_TOKEN}`).toString( - "base64", - ); - -async function jiraRequest(path: string, options?: RequestInit) { - const res = await fetch(`${e2eEnv.JIRA_BASE_URL}${path}`, { - ...options, - headers: { - Authorization: authHeader, - "Content-Type": "application/json", - ...options?.headers, - }, - }); - if (!res.ok) { - const text = await res.text().catch(() => ""); - throw new Error( - `Jira API error: ${res.status} ${res.statusText} on ${path} — ${text}`, - ); - } - if (res.status === 204) return null; - try { - return await res.json(); - } catch { - return null; - } -} - -export async function createTestTicket( - overrides: { summary?: string; description?: string } = {}, -): Promise<{ ticketKey: string; ticketId: string }> { - const summary = - overrides.summary ?? `[E2E] test-${crypto.randomUUID().slice(0, 8)}`; - const description = overrides.description ?? "Automated e2e test ticket"; - - const data = await jiraRequest("/rest/api/3/issue", { - method: "POST", - body: JSON.stringify({ - fields: { - project: { key: e2eEnv.JIRA_PROJECT_KEY }, - summary, - description: { - type: "doc", - version: 1, - content: [ - { - type: "paragraph", - content: [{ type: "text", text: description }], - }, - ], - }, - issuetype: { name: "Task" }, - }, - }), - }); - - return { ticketKey: data.key, ticketId: data.id }; -} - -export async function moveTicketToColumn( - ticketKey: string, - column: string, -): Promise<void> { - const data = await jiraRequest( - `/rest/api/3/issue/${ticketKey}/transitions`, - ); - const transition = data.transitions.find( - (t: any) => t.name.toLowerCase() === column.toLowerCase(), - ); - if (!transition) { - throw new Error( - `No transition to "${column}" for ${ticketKey}.
Available: ${data.transitions.map((t: any) => t.name).join(", ")}`, - ); - } - await jiraRequest(`/rest/api/3/issue/${ticketKey}/transitions`, { - method: "POST", - body: JSON.stringify({ transition: { id: transition.id } }), - }); -} - -export async function getTicketStatus(ticketKey: string): Promise<string> { - const data = await jiraRequest( - `/rest/api/3/issue/${ticketKey}?fields=status`, - ); - return data.fields.status.name; -} - -export async function getTicketComments( - ticketKey: string, -): Promise<Array<{ author: string; body: string }>> { - const data = await jiraRequest( - `/rest/api/3/issue/${ticketKey}?fields=comment`, - ); - return (data.fields.comment?.comments ?? []).map((c: any) => ({ - author: c.author?.displayName ?? "unknown", - body: extractAdfText(c.body), - })); -} - -export async function deleteTicket(ticketKey: string): Promise<void> { - await jiraRequest(`/rest/api/3/issue/${ticketKey}`, { - method: "DELETE", - }).catch(() => { - // Best-effort cleanup - }); -} - -function extractAdfText(adf: any): string { - if (!adf) return ""; - if (typeof adf === "string") return adf; - if (adf.text) return adf.text; - if (adf.content) return adf.content.map(extractAdfText).join("\n"); - return ""; -} -``` - -- [ ] **Step 2: Commit** - -```bash -git add e2e/helpers/jira.ts -git commit -m "feat(e2e): add Jira helper for ticket lifecycle management" -``` - ---- - -### Task 5: GitHub Helper - -**Files:** -- Create: `e2e/helpers/github.ts` - -- [ ] **Step 1: Create the GitHub helper** - -```ts -// e2e/helpers/github.ts -import { Octokit } from "@octokit/rest"; -import { e2eEnv } from "../env.js"; - -const octokit = new Octokit({ auth: e2eEnv.GITHUB_TOKEN }); -const ownerRepo = { owner: e2eEnv.GITHUB_OWNER, repo: e2eEnv.GITHUB_REPO }; - -export async function findPR( - branchName: string, -): Promise<{ number: number; url: string } | null> { - const { data } = await octokit.pulls.list({ - ...ownerRepo, - head: `${e2eEnv.GITHUB_OWNER}:${branchName}`, - state: "open", - }); - if (data.length === 0) return
null; - return { number: data[0].number, url: data[0].html_url }; -} - -export async function getPRCommits( - prNumber: number, -): Promise<Array<{ sha: string; message: string }>> { - const { data } = await octokit.pulls.listCommits({ - ...ownerRepo, - pull_number: prNumber, - }); - return data.map((c) => ({ - sha: c.sha, - message: c.commit.message, - })); -} - -export async function addPRComment( - prNumber: number, - body: string, -): Promise<void> { - await octokit.issues.createComment({ - ...ownerRepo, - issue_number: prNumber, - body, - }); -} - -export async function closePR(prNumber: number): Promise<void> { - await octokit.pulls - .update({ - ...ownerRepo, - pull_number: prNumber, - state: "closed", - }) - .catch(() => {}); -} - -export async function deleteBranch(branchName: string): Promise<void> { - await octokit.git - .deleteRef({ - ...ownerRepo, - ref: `heads/${branchName}`, - }) - .catch(() => {}); -} -``` - -- [ ] **Step 2: Commit** - -```bash -git add e2e/helpers/github.ts -git commit -m "feat(e2e): add GitHub helper for PR/branch assertions and cleanup" -``` - ---- - -### Task 6: Redis Helper - -**Files:** -- Create: `e2e/helpers/redis.ts` - -- [ ] **Step 1: Create the Redis helper** - -```ts -// e2e/helpers/redis.ts -import { Redis } from "@upstash/redis"; -import { e2eEnv } from "../env.js"; - -const HASH_KEY = "blazebot:active-runs"; - -const redis = new Redis({ - url: e2eEnv.AI_WORKFLOW_KV_REST_API_URL, - token: e2eEnv.AI_WORKFLOW_KV_REST_API_TOKEN, -}); - -export async function getRunId(ticketKey: string): Promise<string | null> { - return redis.hget<string>(HASH_KEY, ticketKey); -} - -export async function listAll(): Promise< - Array<{ ticketKey: string; runId: string }> -> { - const all = await redis.hgetall<Record<string, string>>(HASH_KEY); - if (!all) return []; - return Object.entries(all).map(([ticketKey, runId]) => ({ - ticketKey, - runId, - })); -} - -export async function setEntry( - ticketKey: string, - runId: string, -): Promise<void> { - await redis.hset(HASH_KEY, { [ticketKey]: runId }); -} - -export async function cleanup(ticketKey:
string): Promise<void> { - await redis.hdel(HASH_KEY, ticketKey).catch(() => {}); -} -``` - -- [ ] **Step 2: Commit** - -```bash -git add e2e/helpers/redis.ts -git commit -m "feat(e2e): add Redis helper for registry assertions and cleanup" -``` - ---- - -### Task 7: Webhook Helper - -**Files:** -- Create: `e2e/helpers/webhook.ts` - -This helper crafts Jira-style webhook payloads, signs them with HMAC-SHA256, and sends them to the deployed app. - -- [ ] **Step 1: Create the webhook helper** - -```ts -// e2e/helpers/webhook.ts -import { createHmac } from "node:crypto"; -import { e2eEnv } from "../env.js"; - -function sign(body: string, secret: string): string { - return "sha256=" + createHmac("sha256", secret).update(body).digest("hex"); -} - -export interface WebhookOptions { - invalidSignature?: boolean; - omitSignature?: boolean; -} - -export function makeDispatchPayload(ticketKey: string) { - return { - webhookEvent: "jira:issue_updated", - issue: { key: ticketKey }, - changelog: { - items: [ - { - field: "status", - fromString: "To Do", - toString: e2eEnv.COLUMN_AI, - }, - ], - }, - }; -} - -export function makeCancelPayload(ticketKey: string) { - return { - webhookEvent: "jira:issue_updated", - issue: { key: ticketKey }, - changelog: { - items: [ - { - field: "status", - fromString: e2eEnv.COLUMN_AI, - toString: "In Progress", - }, - ], - }, - }; -} - -export function makeIgnorePayload(ticketKey: string) { - return { - webhookEvent: "jira:issue_updated", - issue: { key: ticketKey }, - changelog: { - items: [ - { - field: "summary", - fromString: "Old title", - toString: "New title", - }, - ], - }, - }; -} - -export async function sendJiraWebhook( - payload: Record<string, unknown>, - options: WebhookOptions = {}, -): Promise<{ status: number; body: any }> { - const rawBody = JSON.stringify(payload); - const headers: Record<string, string> = { - "Content-Type": "application/json", - }; - - if (!options.omitSignature) { - if (options.invalidSignature) { - headers["x-hub-signature"] = "sha256=invalid"; 
- } else { - headers["x-hub-signature"] = sign(rawBody, e2eEnv.JIRA_WEBHOOK_SECRET); - } - } - - const res = await fetch(`${e2eEnv.E2E_BASE_URL}/webhooks/jira`, { - method: "POST", - headers, - body: rawBody, - }); - - let body: any; - try { - body = await res.json(); - } catch { - body = null; - } - - return { status: res.status, body }; -} - -export async function callCronPoll(opts?: { - omitAuth?: boolean; -}): Promise<{ status: number; body: any }> { - const headers: Record<string, string> = {}; - if (!opts?.omitAuth) { - headers["Authorization"] = `Bearer ${e2eEnv.CRON_SECRET}`; - } - - const res = await fetch(`${e2eEnv.E2E_BASE_URL}/cron/poll`, { - method: "GET", - headers, - }); - - let body: any; - try { - body = await res.json(); - } catch { - body = null; - } - - return { status: res.status, body }; -} -``` - -- [ ] **Step 2: Commit** - -```bash -git add e2e/helpers/webhook.ts -git commit -m "feat(e2e): add webhook helper with HMAC signing and cron poll caller" -``` - ---- - -### Task 8: Tier 1 — Webhook Signature Tests - -**Files:** -- Create: `e2e/tier1/webhook-signature.test.ts` - -- [ ] **Step 1: Write the test** - -```ts -// e2e/tier1/webhook-signature.test.ts -import { describe, it, expect } from "vitest"; -import { sendJiraWebhook, makeDispatchPayload } from "../helpers/webhook.js"; - -describe("webhook signature validation", () => { - const payload = makeDispatchPayload("FAKE-1"); - - it("accepts a valid signature", async () => { - const { status, body } = await sendJiraWebhook(payload); - // The ticket doesn't exist in Jira, so dispatch may fail, - // but the signature was accepted (not 401) - expect(status).not.toBe(401); - }); - - it("rejects an invalid signature", async () => { - const { status } = await sendJiraWebhook(payload, { - invalidSignature: true, - }); - expect(status).toBe(401); - }); - - it("rejects a missing signature", async () => { - const { status } = await sendJiraWebhook(payload, { - omitSignature: true, - }); - expect(status).toBe(401); - 
}); - - it("rejects an empty body", async () => { - const res = await fetch( - `${(await import("../env.js")).e2eEnv.E2E_BASE_URL}/webhooks/jira`, - { - method: "POST", - headers: { "Content-Type": "application/json" }, - }, - ); - expect(res.status).toBe(400); - }); -}); -``` - -- [ ] **Step 2: Run test to verify it works against the deployed app** - -Run: `npm run test:e2e:tier1 -- --testPathPattern webhook-signature` - -Expected: 4 tests pass (assuming the app is deployed and `.env.e2e` is configured). - -- [ ] **Step 3: Commit** - -```bash -git add e2e/tier1/webhook-signature.test.ts -git commit -m "feat(e2e): add webhook signature validation tests" -``` - ---- - -### Task 9: Tier 1 — Webhook Dispatch Test - -**Files:** -- Create: `e2e/tier1/webhook-dispatch.test.ts` - -- [ ] **Step 1: Write the test** - -```ts -// e2e/tier1/webhook-dispatch.test.ts -import { describe, it, expect, afterAll } from "vitest"; -import { createTestTicket, moveTicketToColumn, deleteTicket } from "../helpers/jira.js"; -import { sendJiraWebhook, makeDispatchPayload } from "../helpers/webhook.js"; -import { getRunId, cleanup as redisCleanup } from "../helpers/redis.js"; -import { e2eEnv } from "../env.js"; - -describe("webhook dispatch", () => { - let ticketKey: string; - - afterAll(async () => { - if (ticketKey) { - await redisCleanup(ticketKey); - await deleteTicket(ticketKey); - } - }); - - it("dispatches a ticket when moved to AI column", async () => { - const ticket = await createTestTicket(); - ticketKey = ticket.ticketKey; - - await moveTicketToColumn(ticketKey, e2eEnv.COLUMN_AI); - - const payload = makeDispatchPayload(ticketKey); - const { status, body } = await sendJiraWebhook(payload); - - expect(status).toBe(200); - expect(body.ok).toBe(true); - expect(body.action).toBe("dispatch"); - expect(body.dispatched).toBe(true); - - // Give the server a moment to write to Redis - await new Promise((r) => setTimeout(r, 2_000)); - - const runId = await getRunId(ticketKey); - 
expect(runId).toBeTruthy(); - }); -}); -``` - -- [ ] **Step 2: Commit** - -```bash -git add e2e/tier1/webhook-dispatch.test.ts -git commit -m "feat(e2e): add webhook dispatch test" -``` - ---- - -### Task 10: Tier 1 — Webhook Cancel Test - -**Files:** -- Create: `e2e/tier1/webhook-cancel.test.ts` - -- [ ] **Step 1: Write the test** - -```ts -// e2e/tier1/webhook-cancel.test.ts -import { describe, it, expect, afterAll } from "vitest"; -import { createTestTicket, moveTicketToColumn, deleteTicket } from "../helpers/jira.js"; -import { - sendJiraWebhook, - makeDispatchPayload, - makeCancelPayload, -} from "../helpers/webhook.js"; -import { getRunId, cleanup as redisCleanup } from "../helpers/redis.js"; -import { e2eEnv } from "../env.js"; - -describe("webhook cancel", () => { - let ticketKey: string; - - afterAll(async () => { - if (ticketKey) { - await redisCleanup(ticketKey); - await deleteTicket(ticketKey); - } - }); - - it("cancels a dispatched ticket when moved away from AI column", async () => { - const ticket = await createTestTicket(); - ticketKey = ticket.ticketKey; - - // Dispatch first - await moveTicketToColumn(ticketKey, e2eEnv.COLUMN_AI); - const dispatchPayload = makeDispatchPayload(ticketKey); - const dispatchRes = await sendJiraWebhook(dispatchPayload); - expect(dispatchRes.body.dispatched).toBe(true); - - // Wait for Redis entry - await new Promise((r) => setTimeout(r, 2_000)); - - // Move away and send cancel webhook - await moveTicketToColumn(ticketKey, e2eEnv.COLUMN_BACKLOG); - const cancelPayload = makeCancelPayload(ticketKey); - const { status, body } = await sendJiraWebhook(cancelPayload); - - expect(status).toBe(200); - expect(body.ok).toBe(true); - expect(body.action).toBe("cancel"); - - // Verify Redis entry is cleaned up - await new Promise((r) => setTimeout(r, 2_000)); - const runId = await getRunId(ticketKey); - expect(runId).toBeNull(); - }); -}); -``` - -- [ ] **Step 2: Commit** - -```bash -git add e2e/tier1/webhook-cancel.test.ts -git 
commit -m "feat(e2e): add webhook cancel test" -``` - ---- - -### Task 11: Tier 1 — Webhook Ignore Test - -**Files:** -- Create: `e2e/tier1/webhook-ignore.test.ts` - -- [ ] **Step 1: Write the test** - -```ts -// e2e/tier1/webhook-ignore.test.ts -import { describe, it, expect } from "vitest"; -import { sendJiraWebhook, makeIgnorePayload } from "../helpers/webhook.js"; - -describe("webhook ignore", () => { - it("ignores non-status-change events", async () => { - const payload = makeIgnorePayload("FAKE-999"); - const { status, body } = await sendJiraWebhook(payload); - - expect(status).toBe(200); - expect(body.ok).toBe(true); - expect(body.action).toBe("ignored"); - }); -}); -``` - -- [ ] **Step 2: Commit** - -```bash -git add e2e/tier1/webhook-ignore.test.ts -git commit -m "feat(e2e): add webhook ignore test" -``` - ---- - -### Task 12: Tier 1 — Cron Poll Test - -**Files:** -- Create: `e2e/tier1/cron-poll.test.ts` - -- [ ] **Step 1: Write the test** - -```ts -// e2e/tier1/cron-poll.test.ts -import { describe, it, expect, afterAll } from "vitest"; -import { createTestTicket, moveTicketToColumn, deleteTicket } from "../helpers/jira.js"; -import { callCronPoll } from "../helpers/webhook.js"; -import { cleanup as redisCleanup } from "../helpers/redis.js"; -import { e2eEnv } from "../env.js"; - -describe("cron poll", () => { - let ticketKey: string; - - afterAll(async () => { - if (ticketKey) { - await redisCleanup(ticketKey); - await deleteTicket(ticketKey); - } - }); - - it("discovers tickets in the AI column", async () => { - const ticket = await createTestTicket(); - ticketKey = ticket.ticketKey; - await moveTicketToColumn(ticketKey, e2eEnv.COLUMN_AI); - - const { status, body } = await callCronPoll(); - - expect(status).toBe(200); - expect(body.status).toBe("ok"); - expect(body.discovered).toBeGreaterThanOrEqual(1); - }); - - it("rejects unauthenticated requests", async () => { - const { status } = await callCronPoll({ omitAuth: true }); - expect(status).toBe(401); 
- }); -}); -``` - -- [ ] **Step 2: Commit** - -```bash -git add e2e/tier1/cron-poll.test.ts -git commit -m "feat(e2e): add cron poll tests" -``` - ---- - -### Task 13: Tier 1 — Cron Reconciliation Test - -**Files:** -- Create: `e2e/tier1/cron-reconciliation.test.ts` - -- [ ] **Step 1: Write the test** - -```ts -// e2e/tier1/cron-reconciliation.test.ts -import { describe, it, expect, afterAll } from "vitest"; -import { callCronPoll } from "../helpers/webhook.js"; -import { setEntry, getRunId, cleanup as redisCleanup } from "../helpers/redis.js"; - -describe("cron reconciliation", () => { - // Use a ticket key that definitely doesn't exist in the AI column - const fakeTicketKey = `E2E-STALE-${Date.now()}`; - const fakeRunId = "stale-run-id-for-reconciliation"; - - afterAll(async () => { - await redisCleanup(fakeTicketKey); - }); - - it("cleans up stale Redis entries for tickets not in AI column", async () => { - // Insert a fake stale entry directly into Redis - await setEntry(fakeTicketKey, fakeRunId); - - // Verify it's there - const before = await getRunId(fakeTicketKey); - expect(before).toBe(fakeRunId); - - // Trigger poll — it should reconcile and clean up the stale entry - const { status, body } = await callCronPoll(); - - expect(status).toBe(200); - // The stale run should have been cancelled or cleaned - expect(body.cancelled + body.cleaned).toBeGreaterThanOrEqual(1); - - // Verify Redis entry is gone - const after = await getRunId(fakeTicketKey); - expect(after).toBeNull(); - }); -}); -``` - -- [ ] **Step 2: Commit** - -```bash -git add e2e/tier1/cron-reconciliation.test.ts -git commit -m "feat(e2e): add cron reconciliation test" -``` - ---- - -### Task 14: Tier 1 — Duplicate Dispatch Test - -**Files:** -- Create: `e2e/tier1/duplicate-dispatch.test.ts` - -- [ ] **Step 1: Write the test** - -```ts -// e2e/tier1/duplicate-dispatch.test.ts -import { describe, it, expect, afterAll } from "vitest"; -import { createTestTicket, moveTicketToColumn, deleteTicket } 
from "../helpers/jira.js"; -import { sendJiraWebhook, makeDispatchPayload } from "../helpers/webhook.js"; -import { cleanup as redisCleanup } from "../helpers/redis.js"; -import { e2eEnv } from "../env.js"; - -describe("duplicate dispatch", () => { - let ticketKey: string; - - afterAll(async () => { - if (ticketKey) { - await redisCleanup(ticketKey); - await deleteTicket(ticketKey); - } - }); - - it("rejects a second dispatch for the same ticket", async () => { - const ticket = await createTestTicket(); - ticketKey = ticket.ticketKey; - - await moveTicketToColumn(ticketKey, e2eEnv.COLUMN_AI); - - const payload = makeDispatchPayload(ticketKey); - - // First dispatch — should succeed - const first = await sendJiraWebhook(payload); - expect(first.body.dispatched).toBe(true); - - // Second dispatch — should be rejected as already claimed - const second = await sendJiraWebhook(payload); - expect(second.status).toBe(200); - expect(second.body.dispatched).toBe(false); - expect(second.body.reason).toBe("already_claimed"); - }); -}); -``` - -- [ ] **Step 2: Commit** - -```bash -git add e2e/tier1/duplicate-dispatch.test.ts -git commit -m "feat(e2e): add duplicate dispatch test" -``` - ---- - -### Task 15: Tier 2 — Implementation Happy Path + Review-Fix Flow - -**Files:** -- Create: `e2e/tier2/implementation-and-review.test.ts` - -These two flows share a ticket/PR lifecycle: implementation creates the PR, review-fix adds to it. They run in sequence within a single describe block. 
- -- [ ] **Step 1: Write the test** - -```ts -// e2e/tier2/implementation-and-review.test.ts -import { describe, it, expect, afterAll } from "vitest"; -import { - createTestTicket, - moveTicketToColumn, - getTicketStatus, - deleteTicket, -} from "../helpers/jira.js"; -import { - findPR, - getPRCommits, - addPRComment, - closePR, - deleteBranch, -} from "../helpers/github.js"; -import { getRunId, cleanup as redisCleanup } from "../helpers/redis.js"; -import { sendJiraWebhook, makeDispatchPayload } from "../helpers/webhook.js"; -import { waitFor } from "../helpers/wait.js"; -import { e2eEnv } from "../env.js"; - -describe("implementation happy path → review-fix flow", () => { - let ticketKey: string; - let branchName: string; - let prNumber: number | undefined; - - afterAll(async () => { - if (prNumber) await closePR(prNumber); - if (branchName) await deleteBranch(branchName); - if (ticketKey) { - await redisCleanup(ticketKey); - await deleteTicket(ticketKey); - } - }); - - it("implements a ticket and creates a PR", async () => { - // Create a ticket with a simple, concrete task - const ticket = await createTestTicket({ - summary: `[E2E] Add GET /ping endpoint`, - description: - "Add a GET /ping endpoint that returns { ping: 'pong' } with status 200.", - }); - ticketKey = ticket.ticketKey; - branchName = `blazebot/${ticketKey.toLowerCase()}`; - - // Move to AI column and dispatch - await moveTicketToColumn(ticketKey, e2eEnv.COLUMN_AI); - const payload = makeDispatchPayload(ticketKey); - const { body } = await sendJiraWebhook(payload); - expect(body.dispatched).toBe(true); - - // Wait for PR to appear (up to 35 min) - const pr = await waitFor(() => findPR(branchName), { - description: `PR for branch ${branchName}`, - timeoutMs: 2_100_000, - }); - prNumber = pr.number; - - // Verify PR has commits - const commits = await getPRCommits(prNumber); - expect(commits.length).toBeGreaterThan(0); - - // Verify ticket moved to AI Review - await waitFor( - async () => { - const 
status = await getTicketStatus(ticketKey); - return status === e2eEnv.COLUMN_AI_REVIEW ? status : null; - }, - { - description: `ticket ${ticketKey} moved to ${e2eEnv.COLUMN_AI_REVIEW}`, - timeoutMs: 60_000, - }, - ); - - // Verify Redis entry is cleaned up - await waitFor( - async () => { - const runId = await getRunId(ticketKey); - return runId === null ? true : null; - }, - { - description: `Redis entry cleaned for ${ticketKey}`, - timeoutMs: 30_000, - }, - ); - }); - - it("fixes PR based on review feedback", async () => { - // This test depends on the previous test having created a PR - expect(prNumber).toBeDefined(); - - // Record commit count before review-fix - const commitsBefore = await getPRCommits(prNumber!); - const commitCountBefore = commitsBefore.length; - - // Add a review comment - await addPRComment( - prNumber!, - "Please rename the endpoint to `/healthcheck` instead of `/ping`.", - ); - - // Move ticket back to AI column and dispatch review-fix - await moveTicketToColumn(ticketKey, e2eEnv.COLUMN_AI); - const payload = makeDispatchPayload(ticketKey); - const { body } = await sendJiraWebhook(payload); - expect(body.dispatched).toBe(true); - - // Wait for ticket to move back to AI Review (review-fix completed) - await waitFor( - async () => { - const status = await getTicketStatus(ticketKey); - return status === e2eEnv.COLUMN_AI_REVIEW ? status : null; - }, - { - description: `ticket ${ticketKey} moved back to ${e2eEnv.COLUMN_AI_REVIEW} after review-fix`, - timeoutMs: 2_100_000, - }, - ); - - // Verify PR has new commits - const commitsAfter = await getPRCommits(prNumber!); - expect(commitsAfter.length).toBeGreaterThan(commitCountBefore); - - // Verify Redis entry is cleaned up - await waitFor( - async () => { - const runId = await getRunId(ticketKey); - return runId === null ? 
true : null; - }, - { - description: `Redis entry cleaned for ${ticketKey} after review-fix`, - timeoutMs: 30_000, - }, - ); - }); -}); -``` - -- [ ] **Step 2: Commit** - -```bash -git add e2e/tier2/implementation-and-review.test.ts -git commit -m "feat(e2e): add implementation happy path and review-fix flow tests" -``` - ---- - -### Task 16: Tier 2 — Clarification Flow Test - -**Files:** -- Create: `e2e/tier2/clarification-flow.test.ts` - -- [ ] **Step 1: Write the test** - -```ts -// e2e/tier2/clarification-flow.test.ts -import { describe, it, expect, afterAll } from "vitest"; -import { - createTestTicket, - moveTicketToColumn, - getTicketStatus, - getTicketComments, - deleteTicket, -} from "../helpers/jira.js"; -import { getRunId, cleanup as redisCleanup } from "../helpers/redis.js"; -import { deleteBranch } from "../helpers/github.js"; -import { sendJiraWebhook, makeDispatchPayload } from "../helpers/webhook.js"; -import { waitFor } from "../helpers/wait.js"; -import { e2eEnv } from "../env.js"; - -describe("clarification flow", () => { - let ticketKey: string; - let branchName: string; - - afterAll(async () => { - if (branchName) await deleteBranch(branchName); - if (ticketKey) { - await redisCleanup(ticketKey); - await deleteTicket(ticketKey); - } - }); - - it("moves a vague ticket to Backlog with clarification questions", async () => { - const ticket = await createTestTicket({ - summary: `[E2E] Do the thing`, - description: "Do the thing", - }); - ticketKey = ticket.ticketKey; - branchName = `blazebot/${ticketKey.toLowerCase()}`; - - // Move to AI column and dispatch - await moveTicketToColumn(ticketKey, e2eEnv.COLUMN_AI); - const payload = makeDispatchPayload(ticketKey); - const { body } = await sendJiraWebhook(payload); - expect(body.dispatched).toBe(true); - - // Wait for ticket to move to Backlog (clarification needed) - await waitFor( - async () => { - const status = await getTicketStatus(ticketKey); - return status === e2eEnv.COLUMN_BACKLOG ? 
status : null; - }, - { - description: `ticket ${ticketKey} moved to ${e2eEnv.COLUMN_BACKLOG}`, - timeoutMs: 2_100_000, - }, - ); - - // Verify ticket has a comment with questions - const comments = await getTicketComments(ticketKey); - const clarificationComment = comments.find( - (c) => /\d+\./.test(c.body), // Contains numbered items like "1. ..." - ); - expect(clarificationComment).toBeDefined(); - - // Verify Redis entry is cleaned up - await waitFor( - async () => { - const runId = await getRunId(ticketKey); - return runId === null ? true : null; - }, - { - description: `Redis entry cleaned for ${ticketKey}`, - timeoutMs: 30_000, - }, - ); - }); -}); -``` - -- [ ] **Step 2: Commit** - -```bash -git add e2e/tier2/clarification-flow.test.ts -git commit -m "feat(e2e): add clarification flow test" -``` - ---- - -### Task 17: GitHub Actions Workflow - -**Files:** -- Create: `.github/workflows/e2e.yml` - -- [ ] **Step 1: Create the workflows directory and file** - -```bash -mkdir -p .github/workflows -``` - -- [ ] **Step 2: Write the workflow file** - -```yaml -# .github/workflows/e2e.yml -name: E2E Tests - -on: - workflow_dispatch: - inputs: - tier: - description: "Which tier to run" - type: choice - options: - - tier1 - - tier2 - - all - default: all - -jobs: - e2e-tier1: - if: inputs.tier == 'tier1' || inputs.tier == 'all' - runs-on: ubuntu-latest - timeout-minutes: 15 - environment: e2e - env: - E2E_BASE_URL: ${{ secrets.E2E_BASE_URL }} - JIRA_BASE_URL: ${{ secrets.JIRA_BASE_URL }} - JIRA_EMAIL: ${{ secrets.JIRA_EMAIL }} - JIRA_API_TOKEN: ${{ secrets.JIRA_API_TOKEN }} - JIRA_PROJECT_KEY: ${{ secrets.JIRA_PROJECT_KEY }} - JIRA_WEBHOOK_SECRET: ${{ secrets.JIRA_WEBHOOK_SECRET }} - COLUMN_AI: ${{ secrets.COLUMN_AI }} - COLUMN_AI_REVIEW: ${{ secrets.COLUMN_AI_REVIEW }} - COLUMN_BACKLOG: ${{ secrets.COLUMN_BACKLOG }} - GITHUB_TOKEN: ${{ secrets.E2E_GITHUB_TOKEN }} - GITHUB_OWNER: ${{ secrets.E2E_GITHUB_OWNER }} - GITHUB_REPO: ${{ secrets.E2E_GITHUB_REPO }} - 
CRON_SECRET: ${{ secrets.CRON_SECRET }} - AI_WORKFLOW_KV_REST_API_URL: ${{ secrets.AI_WORKFLOW_KV_REST_API_URL }} - AI_WORKFLOW_KV_REST_API_TOKEN: ${{ secrets.AI_WORKFLOW_KV_REST_API_TOKEN }} - steps: - - uses: actions/checkout@v4 - - uses: actions/setup-node@v4 - with: - node-version: 20 - cache: npm - - run: npm ci - - run: npm run test:e2e:tier1 - - if: failure() - uses: actions/upload-artifact@v4 - with: - name: e2e-tier1-results - path: | - e2e/**/*.log - retention-days: 7 - - e2e-tier2: - if: inputs.tier == 'tier2' || inputs.tier == 'all' - needs: ${{ inputs.tier == 'all' && 'e2e-tier1' || '' }} - runs-on: ubuntu-latest - timeout-minutes: 150 - environment: e2e - env: - E2E_BASE_URL: ${{ secrets.E2E_BASE_URL }} - JIRA_BASE_URL: ${{ secrets.JIRA_BASE_URL }} - JIRA_EMAIL: ${{ secrets.JIRA_EMAIL }} - JIRA_API_TOKEN: ${{ secrets.JIRA_API_TOKEN }} - JIRA_PROJECT_KEY: ${{ secrets.JIRA_PROJECT_KEY }} - JIRA_WEBHOOK_SECRET: ${{ secrets.JIRA_WEBHOOK_SECRET }} - COLUMN_AI: ${{ secrets.COLUMN_AI }} - COLUMN_AI_REVIEW: ${{ secrets.COLUMN_AI_REVIEW }} - COLUMN_BACKLOG: ${{ secrets.COLUMN_BACKLOG }} - GITHUB_TOKEN: ${{ secrets.E2E_GITHUB_TOKEN }} - GITHUB_OWNER: ${{ secrets.E2E_GITHUB_OWNER }} - GITHUB_REPO: ${{ secrets.E2E_GITHUB_REPO }} - CRON_SECRET: ${{ secrets.CRON_SECRET }} - AI_WORKFLOW_KV_REST_API_URL: ${{ secrets.AI_WORKFLOW_KV_REST_API_URL }} - AI_WORKFLOW_KV_REST_API_TOKEN: ${{ secrets.AI_WORKFLOW_KV_REST_API_TOKEN }} - steps: - - uses: actions/checkout@v4 - - uses: actions/setup-node@v4 - with: - node-version: 20 - cache: npm - - run: npm ci - - run: npm run test:e2e:tier2 - - if: failure() - uses: actions/upload-artifact@v4 - with: - name: e2e-tier2-results - path: | - e2e/**/*.log - retention-days: 7 -``` - -Note: `GITHUB_TOKEN` and `GITHUB_OWNER` / `GITHUB_REPO` are named `E2E_GITHUB_TOKEN`, `E2E_GITHUB_OWNER`, `E2E_GITHUB_REPO` in secrets to avoid collision with the built-in `GITHUB_TOKEN` that GitHub Actions provides automatically. 
- -- [ ] **Step 3: Commit** - -```bash -git add .github/workflows/e2e.yml -git commit -m "feat(e2e): add GitHub Actions workflow with tier selection" -``` - ---- - -### Task 18: Fix CI Workflow `needs` Conditional - -The `needs` field in GitHub Actions doesn't support conditional expressions inline. Instead, give the single `e2e-tier2` job a static `needs` list and move the conditional logic into its `if`: - -- [ ] **Step 1: Update the workflow to handle the conditional dependency correctly** - -Replace the `e2e-tier2` job definition in `.github/workflows/e2e.yml` with: - -```yaml - e2e-tier2: - if: ${{ !cancelled() && (inputs.tier == 'tier2' || inputs.tier == 'all') && (needs.e2e-tier1.result == 'success' || needs.e2e-tier1.result == 'skipped') }} - needs: [e2e-tier1] - runs-on: ubuntu-latest - timeout-minutes: 150 - environment: e2e - env: - E2E_BASE_URL: ${{ secrets.E2E_BASE_URL }} - JIRA_BASE_URL: ${{ secrets.JIRA_BASE_URL }} - JIRA_EMAIL: ${{ secrets.JIRA_EMAIL }} - JIRA_API_TOKEN: ${{ secrets.JIRA_API_TOKEN }} - JIRA_PROJECT_KEY: ${{ secrets.JIRA_PROJECT_KEY }} - JIRA_WEBHOOK_SECRET: ${{ secrets.JIRA_WEBHOOK_SECRET }} - COLUMN_AI: ${{ secrets.COLUMN_AI }} - COLUMN_AI_REVIEW: ${{ secrets.COLUMN_AI_REVIEW }} - COLUMN_BACKLOG: ${{ secrets.COLUMN_BACKLOG }} - GITHUB_TOKEN: ${{ secrets.E2E_GITHUB_TOKEN }} - GITHUB_OWNER: ${{ secrets.E2E_GITHUB_OWNER }} - GITHUB_REPO: ${{ secrets.E2E_GITHUB_REPO }} - CRON_SECRET: ${{ secrets.CRON_SECRET }} - AI_WORKFLOW_KV_REST_API_URL: ${{ secrets.AI_WORKFLOW_KV_REST_API_URL }} - AI_WORKFLOW_KV_REST_API_TOKEN: ${{ secrets.AI_WORKFLOW_KV_REST_API_TOKEN }} - steps: - - uses: actions/checkout@v4 - - uses: actions/setup-node@v4 - with: - node-version: 20 - cache: npm - - run: npm ci - - run: npm run test:e2e:tier2 - - if: failure() - uses: actions/upload-artifact@v4 - with: - name: e2e-tier2-results - path: | - e2e/**/*.log - retention-days: 7 -``` - -The `needs: [e2e-tier1]` dependency combined with this `if` means: - -- When `tier=all`: tier1 runs first; tier2 waits for it and runs only if tier1 succeeded.
-- When `tier=tier2`: tier1 is skipped (its `if` fails). By default, GitHub Actions also skips any job whose `needs` dependency was skipped, so the `if` opens with the status-check function `!cancelled()` (which replaces the implicit `success()` check) and explicitly accepts a `skipped` result for tier1; tier2 then runs immediately. -- When `tier=tier1`: tier2's own `if` fails, so it's skipped. - -- [ ] **Step 2: Commit** - -```bash -git add .github/workflows/e2e.yml -git commit -m "fix(e2e): correct tier2 job dependency handling in CI workflow" -``` - ---- - -### Task 19: Final Verification - -- [ ] **Step 1: Verify all files exist** - -Run: -```bash -find e2e -type f | sort && ls .github/workflows/e2e.yml && cat .env.e2e.example -``` - -Expected: all files from the file map are present. - -- [ ] **Step 2: Verify TypeScript compiles** - -Run: -```bash -npx tsc --noEmit --project tsconfig.json 2>&1 | head -30 -``` - -Note: If the e2e directory isn't included in tsconfig.json, you may need to check that vitest handles its own TS compilation (it does via esbuild). This step just confirms no import errors in the main project. - -- [ ] **Step 3: Verify the unit test suite still passes** - -Run: -```bash -npm run test -``` - -Expected: existing 50 tests pass (the e2e tests are excluded by the main vitest config since they're in `e2e/`, not `src/`). - -- [ ] **Step 4: Verify e2e config loads** - -Run: -```bash -npx vitest run --config e2e/vitest.e2e.config.ts --passWithNoTests 2>&1 | tail -5 -``` - -Expected: no config errors (tests may fail due to missing env vars, which is fine).
- -- [ ] **Step 5: Commit any fixes** - -If any issues were found in steps 1-4, fix them and commit: - -```bash -git add -A -git commit -m "fix(e2e): address verification issues" -``` diff --git a/docs/superpowers/plans/2026-04-01-sandbox-polling-suspension.md b/docs/superpowers/plans/2026-04-01-sandbox-polling-suspension.md deleted file mode 100644 index 5f50ee5..0000000 --- a/docs/superpowers/plans/2026-04-01-sandbox-polling-suspension.md +++ /dev/null @@ -1,889 +0,0 @@ -# Sandbox Polling Suspension Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Suspend the workflow while the sandbox runs the Claude Code agent, using a sleep+poll pattern so the workflow consumes zero resources during the 10-30 min agent execution. - -**Architecture:** Split the blocking `runAgentInSandbox` step into three phases: (1) provision sandbox + start agent detached, (2) poll for completion with `sleep("30s")` intervals (workflow truly suspends between polls), (3) collect results + teardown. A bash wrapper script inside the sandbox runs claude, does cleanup, and writes sentinel files. Debug mode (live log streaming via `getWritable`) is removed — the polling approach replaces it entirely. 
- -**Tech Stack:** Vercel Workflow DevKit (`sleep` from `"workflow"`), `@vercel/sandbox` (`Sandbox.get()` for reconnection), Nitro (h3 routes) - ---- - -## File Structure - -| Action | File | Responsibility | -|--------|------|---------------| -| Create | `src/sandbox/wrapper-script.ts` | Generates the bash wrapper script for detached agent execution | -| Create | `src/sandbox/poll-agent.ts` | Step functions: `checkAgentDone`, `collectAgentResults`, `teardownSandbox` | -| Modify | `src/sandbox/manager.ts` | Extract `getSandboxCredentials()`, add wrapper script installation to `provision()` | -| Modify | `src/sandbox/run-agent.ts` | Replace `runAgent()` with `startAgentDetached()`; remove debug streaming code | -| Modify | `src/workflows/implementation.ts` | Replace single blocking step with provision → poll loop → collect pattern | -| Modify | `src/workflows/review-fix.ts` | Same poll pattern as implementation workflow | -| Create | `src/sandbox/wrapper-script.test.ts` | Test wrapper script generation | -| Create | `src/sandbox/poll-agent.test.ts` | Test polling and result collection | - ---- - -### Task 1: Extract sandbox credentials helper - -Move credential resolution from inline in `SandboxManager.provision()` into a reusable function, so both provisioning and reconnection steps can authenticate with the Sandbox API. - -**Files:** -- Create: `src/sandbox/credentials.ts` -- Modify: `src/sandbox/manager.ts:49-59` (use the new helper) - -- [ ] **Step 1: Create `src/sandbox/credentials.ts`** - -```ts -// src/sandbox/credentials.ts -import type { Sandbox as SandboxType } from "@vercel/sandbox"; - -type Credentials = { - token: string; - teamId: string; - projectId: string; -}; - -/** - * Returns explicit Sandbox credentials when all three env vars are set (local dev). - * On Vercel, returns empty object — the SDK authenticates via OIDC automatically. 
*/ -export function getSandboxCredentials(): Partial<Credentials> { - const token = process.env.VERCEL_TOKEN; - const teamId = process.env.VERCEL_TEAM_ID; - const projectId = process.env.VERCEL_PROJECT_ID; - - if (token && teamId && projectId) { - return { token, teamId, projectId }; - } - return {}; -} -``` - -- [ ] **Step 2: Update `SandboxManager.provision()` to use the helper** - -In `src/sandbox/manager.ts`, replace the inline credential logic with the helper. - -Replace lines 43-59 (inside `provision()`): -```ts -// Before: -if (!this.config.claudeCodeOauthToken && !this.config.anthropicApiKey) { - throw new Error("Either anthropicApiKey or claudeCodeOauthToken must be provided"); -} - -const hasExplicitCredentials = - this.config.vercelToken && this.config.vercelTeamId && this.config.vercelProjectId; - -const sandbox = await Sandbox.create({ - ...(hasExplicitCredentials - ? { - token: this.config.vercelToken, - teamId: this.config.vercelTeamId, - projectId: this.config.vercelProjectId, - } - : {}), - // ... -}); -``` - -With: -```ts -// After: -import { getSandboxCredentials } from "./credentials.js"; - -if (!this.config.claudeCodeOauthToken && !this.config.anthropicApiKey) { - throw new Error("Either anthropicApiKey or claudeCodeOauthToken must be provided"); -} - -const sandbox = await Sandbox.create({ - ...getSandboxCredentials(), - // ... rest stays the same -}); -``` - -- [ ] **Step 3: Remove `vercelToken`, `vercelTeamId`, `vercelProjectId` from `SandboxConfig`** - -These are now read from `process.env` by `getSandboxCredentials()`. Remove them from the `SandboxConfig` interface and all call sites that pass them.
- -In `src/sandbox/manager.ts`: -```ts -// Remove these three fields from SandboxConfig: -export interface SandboxConfig { - githubToken: string; - owner: string; - repo: string; - anthropicApiKey?: string; - claudeCodeOauthToken?: string; - claudeModel: string; - commitAuthor: string; - commitEmail: string; - jobTimeoutMs: number; - // REMOVE: vercelToken, vercelTeamId, vercelProjectId -} -``` - -In `src/workflows/implementation.ts` (`runAgentInSandbox` step), remove these three lines from the `SandboxManager` constructor: -```ts -// REMOVE: -vercelToken: env.VERCEL_TOKEN, -vercelTeamId: env.VERCEL_TEAM_ID, -vercelProjectId: env.VERCEL_PROJECT_ID, -``` - -Same removal in `src/workflows/review-fix.ts` (`runFixingAgentInSandbox` step). - -- [ ] **Step 4: Run typecheck** - -Run: `pnpm typecheck` -Expected: PASS (no type errors) - -- [ ] **Step 5: Run existing tests** - -Run: `pnpm test` -Expected: PASS (all existing tests pass) - -- [ ] **Step 6: Commit** - -```bash -git add src/sandbox/credentials.ts src/sandbox/manager.ts src/workflows/implementation.ts src/workflows/review-fix.ts -git commit -m "refactor: extract getSandboxCredentials into reusable helper" -``` - ---- - -### Task 2: Build the wrapper script generator - -Create a function that generates a bash script to run inside the sandbox. The script: runs claude (which commits via the stop hook), does artifact cleanup, and writes sentinel files signaling completion. The agent is responsible for committing — the wrapper does NOT auto-commit. 
- -**Files:** -- Create: `src/sandbox/wrapper-script.ts` -- Create: `src/sandbox/wrapper-script.test.ts` - -- [ ] **Step 1: Write failing test for `buildWrapperScript`** - -```ts -// src/sandbox/wrapper-script.test.ts -import { describe, it, expect } from "vitest"; -import { buildWrapperScript } from "./wrapper-script.js"; - -describe("buildWrapperScript", () => { - it("generates a bash script that runs claude and writes sentinel", () => { - const script = buildWrapperScript({ model: "claude-opus-4-6" }); - - expect(script).toContain("#!/bin/bash"); - expect(script).toContain("claude"); - expect(script).toContain("claude-opus-4-6"); - expect(script).toContain("/tmp/agent-done"); - expect(script).toContain("/tmp/agent-stdout.txt"); - expect(script).toContain("/tmp/agent-stderr.txt"); - expect(script).not.toContain("git commit"); // agent commits via stop hook, not wrapper - }); - - it("includes json-schema flag", () => { - const script = buildWrapperScript({ model: "claude-opus-4-6" }); - expect(script).toContain("--json-schema"); - expect(script).toContain("--output-format json"); - }); -}); -``` - -- [ ] **Step 2: Run test to verify it fails** - -Run: `pnpm test src/sandbox/wrapper-script.test.ts` -Expected: FAIL (module not found) - -- [ ] **Step 3: Implement `buildWrapperScript`** - -```ts -// src/sandbox/wrapper-script.ts -import { AGENT_SCHEMA } from "./agent-runner.js"; - -interface WrapperScriptOptions { - model: string; -} - -/** - * Generates a bash wrapper script that: - * 1. Runs claude --print with the given model (agent commits via stop hook) - * 2. Does cleanup (removes .claude/, requirements.md artifacts) - * 3. Writes stdout/stderr to /tmp/ files - * 4. Touches /tmp/agent-done as sentinel - * - * Designed to run detached inside a Vercel Sandbox. - * The agent is responsible for committing — this script does NOT auto-commit. 
- */ -export function buildWrapperScript(opts: WrapperScriptOptions): string { - const { model } = opts; - - // Escape single quotes in the schema for safe embedding in bash - const escapedSchema = AGENT_SCHEMA.replace(/'/g, "'\\''"); - - return `#!/bin/bash - -# --- Phase 1: Run Claude Code agent --- -cat /vercel/sandbox/requirements.md | claude \\ - --print \\ - --model "${model}" \\ - --dangerously-skip-permissions \\ - --output-format json \\ - --json-schema '${escapedSchema}' \\ - > /tmp/agent-stdout.txt 2>/tmp/agent-stderr.txt || true - -# --- Phase 2: Cleanup --- -cd /vercel/sandbox - -# Remove repo-level .claude/ artifacts that Claude Code auto-creates. -# git checkout restores any that were already committed. -rm -rf .claude/ requirements.md -git checkout -- .claude/ 2>/dev/null || true -git checkout -- requirements.md 2>/dev/null || true - -# --- Phase 3: Signal completion --- -touch /tmp/agent-done -`; -} -``` - -- [ ] **Step 4: Run test to verify it passes** - -Run: `pnpm test src/sandbox/wrapper-script.test.ts` -Expected: PASS - -- [ ] **Step 5: Commit** - -```bash -git add src/sandbox/wrapper-script.ts src/sandbox/wrapper-script.test.ts -git commit -m "feat: add wrapper script generator for detached sandbox agent execution" -``` - ---- - -### Task 3: Create polling and result collection step functions - -These `"use step"` functions reconnect to a sandbox by ID, check for the sentinel file, and collect results. 
- -**Files:** -- Create: `src/sandbox/poll-agent.ts` -- Create: `src/sandbox/poll-agent.test.ts` - -- [ ] **Step 1: Write failing tests for `checkAgentDone`** - -```ts -// src/sandbox/poll-agent.test.ts -import { describe, it, expect, vi, beforeEach } from "vitest"; - -const mockRunCommand = vi.fn(); -const mockReadFileToBuffer = vi.fn(); -const mockStop = vi.fn(); - -vi.mock("@vercel/sandbox", () => ({ - Sandbox: { - get: vi.fn(() => ({ - sandboxId: "sbx-test-123", - status: "running", - runCommand: mockRunCommand, - readFileToBuffer: mockReadFileToBuffer, - stop: mockStop, - })), - }, -})); - -// Must mock the module before importing -vi.mock("./credentials.js", () => ({ - getSandboxCredentials: () => ({}), -})); - -import { checkAgentDone, collectAgentResults, teardownSandbox } from "./poll-agent.js"; - -describe("checkAgentDone", () => { - beforeEach(() => vi.clearAllMocks()); - - it("returns false when sentinel file does not exist", async () => { - mockRunCommand.mockResolvedValue({ exitCode: 1 }); - - const result = await checkAgentDone("sbx-test-123"); - expect(result).toBe(false); - }); - - it("returns true when sentinel file exists", async () => { - mockRunCommand.mockResolvedValue({ exitCode: 0 }); - - const result = await checkAgentDone("sbx-test-123"); - expect(result).toBe(true); - }); - - it("returns 'stopped' when sandbox is not running and no sentinel", async () => { - const { Sandbox } = await import("@vercel/sandbox"); - (Sandbox.get as ReturnType).mockResolvedValueOnce({ - sandboxId: "sbx-test-123", - status: "stopped", - runCommand: mockRunCommand, - }); - // No sentinel check needed — sandbox is dead - - const result = await checkAgentDone("sbx-test-123"); - expect(result).toBe("stopped"); - }); -}); -``` - -- [ ] **Step 2: Run test to verify it fails** - -Run: `pnpm test src/sandbox/poll-agent.test.ts` -Expected: FAIL (module not found) - -- [ ] **Step 3: Implement `checkAgentDone`** - -```ts -// src/sandbox/poll-agent.ts -import { 
getSandboxCredentials } from "./credentials.js"; - -/** - * Reconnects to a sandbox and checks whether the agent has finished. - * Returns: - * - `true` if /tmp/agent-done sentinel exists - * - `false` if sandbox is running but agent not done yet - * - `"stopped"` if sandbox is no longer running (timeout/crash) - */ -export async function checkAgentDone( - sandboxId: string, -): Promise<boolean | "stopped"> { - "use step"; - const { Sandbox } = await import("@vercel/sandbox"); - const sandbox = await Sandbox.get({ sandboxId, ...getSandboxCredentials() }); - - if (sandbox.status !== "running") { - return "stopped"; - } - - const result = await sandbox.runCommand("test", ["-f", "/tmp/agent-done"]); - return result.exitCode === 0; -} -``` - -- [ ] **Step 4: Run test to verify it passes** - -Run: `pnpm test src/sandbox/poll-agent.test.ts` -Expected: PASS - -- [ ] **Step 5: Write failing tests for `collectAgentResults`** - -Add to `src/sandbox/poll-agent.test.ts`: - -```ts -describe("collectAgentResults", () => { - beforeEach(() => vi.clearAllMocks()); - - it("reads stdout, stderr and extracts changed files", async () => { - const mockStdout = vi.fn(); - mockRunCommand.mockImplementation((...args: unknown[]) => { - const cmdArgs = (args[0] as string) === "bash" ?
args[1] : args; - // Respond to different commands based on arguments - return { - exitCode: 0, - stdout: mockStdout, - }; - }); - - // cat /tmp/agent-stdout.txt - mockStdout - .mockResolvedValueOnce(JSON.stringify({ result: "implemented", summary: "Done" })) // stdout - .mockResolvedValueOnce("") // stderr - .mockResolvedValueOnce("abc123") // pre-agent sha - .mockResolvedValueOnce("src/index.ts"); // git diff --name-only - - mockReadFileToBuffer.mockResolvedValue(Buffer.from("console.log('hello')")); - - const result = await collectAgentResults("sbx-test-123"); - - expect(result.output.result).toBe("implemented"); - expect(result.files).toHaveLength(1); - expect(result.files[0].path).toBe("src/index.ts"); - expect(result.files[0].content).toBe("console.log('hello')"); - }); -}); -``` - -- [ ] **Step 6: Implement `collectAgentResults`** - -Add to `src/sandbox/poll-agent.ts`: - -```ts -import { parseAgentOutput } from "./agent-runner.js"; -import type { AgentOutput } from "./agent-runner.js"; - -/** - * Reconnects to the sandbox, reads agent stdout/stderr, extracts changed files, - * and returns the parsed result. 
- */ -export async function collectAgentResults( - sandboxId: string, -): Promise<{ output: AgentOutput; files: Array<{ path: string; content: string }> }> { - "use step"; - const { Sandbox } = await import("@vercel/sandbox"); - const sandbox = await Sandbox.get({ sandboxId, ...getSandboxCredentials() }); - - // Read agent output files - const stdoutResult = await sandbox.runCommand("cat", ["/tmp/agent-stdout.txt"]); - const stdout = (await stdoutResult.stdout()).trim(); - - const stderrResult = await sandbox.runCommand("cat", ["/tmp/agent-stderr.txt"]); - const stderr = (await stderrResult.stdout()).trim(); - - const raw = stdout || stderr; - const output = parseAgentOutput(raw); - - // Extract changed files (same logic as SandboxManager.extractChanges) - const baseResult = await sandbox.runCommand("bash", [ - "-c", - "cat /tmp/.pre-agent-sha 2>/dev/null || git rev-list --max-parents=0 HEAD", - ]); - const baseSha = (await baseResult.stdout()).trim(); - - let files: Array<{ path: string; content: string }> = []; - - if (baseSha) { - const diffResult = await sandbox.runCommand("git", [ - "diff", "--name-only", baseSha, "HEAD", - ]); - const diffOutput = (await diffResult.stdout()).trim(); - - if (diffOutput) { - const filePaths = diffOutput - .split("\n") - .filter(Boolean) - .filter((p) => p !== "requirements.md") - .filter((p) => !p.startsWith(".claude/")); - - for (const filePath of filePaths) { - const buf = await sandbox.readFileToBuffer({ - path: filePath, - cwd: "/vercel/sandbox", - }); - if (buf) { - files.push({ path: filePath, content: buf.toString("utf-8") }); - } - } - } - } - - return { output, files }; -} - -/** - * Reconnects to a sandbox and stops it. 
*/ -export async function teardownSandbox(sandboxId: string): Promise<void> { - "use step"; - const { Sandbox } = await import("@vercel/sandbox"); - try { - const sandbox = await Sandbox.get({ sandboxId, ...getSandboxCredentials() }); - await sandbox.stop(); - } catch { - // Teardown failures are non-critical (sandbox may have already stopped) - } -} -``` - -- [ ] **Step 7: Run tests to verify they pass** - -Run: `pnpm test src/sandbox/poll-agent.test.ts` -Expected: PASS - -- [ ] **Step 8: Commit** - -```bash -git add src/sandbox/poll-agent.ts src/sandbox/poll-agent.test.ts -git commit -m "feat: add polling step functions for sandbox agent completion" -``` - ---- - -### Task 4: Add wrapper script installation to sandbox provisioning - -Write the wrapper script to the sandbox during `provision()` so it's available for detached execution. - -**Files:** -- Modify: `src/sandbox/manager.ts:151-156` (add wrapper script writing) - -- [ ] **Step 1: Import and write wrapper script in `provision()`** - -In `src/sandbox/manager.ts`, after the line that writes `requirements.md` (line ~152), also write the wrapper script: - -```ts -// In provision(), after writeFiles for requirements.md: - -import { buildWrapperScript } from "./wrapper-script.js"; - -// ... inside provision(): - -// Write wrapper script for detached execution -const wrapperScript = buildWrapperScript({ model: this.config.claudeModel }); -await sandbox.writeFiles([ - { path: "requirements.md", content: Buffer.from(requirementsMd) }, - { path: "/tmp/agent-wrapper.sh", content: Buffer.from(wrapperScript) }, -]); -await sandbox.runCommand("chmod", ["+x", "/tmp/agent-wrapper.sh"]); -``` - -Replace the existing `writeFiles` call (which only writes `requirements.md`) with the combined call above.
- -- [ ] **Step 2: Run existing manager tests** - -Run: `pnpm test src/sandbox/manager.test.ts` -Expected: PASS (mock handles writeFiles with any args) - -- [ ] **Step 3: Commit** - -```bash -git add src/sandbox/manager.ts -git commit -m "feat: write agent wrapper script to sandbox during provisioning" -``` - ---- - -### Task 5: Replace `run-agent.ts` with `startAgentDetached` - -Remove the old blocking `runAgent` (including debug streaming code) and replace with a single `startAgentDetached` function. Debug mode (`DEBUG_AGENT` env var) is removed entirely — observability is handled via the WDK workflow dashboard and step logs. - -**Files:** -- Rewrite: `src/sandbox/run-agent.ts` -- Modify: `env.ts` (remove `DEBUG_AGENT`) - -- [ ] **Step 1: Rewrite `src/sandbox/run-agent.ts`** - -Replace the entire file contents with: - -```ts -// src/sandbox/run-agent.ts -import type { Sandbox as SandboxType } from "@vercel/sandbox"; - -type SandboxInstance = Awaited<ReturnType<typeof SandboxType.create>>; - -/** - * Starts the agent wrapper script in detached mode. - * Returns immediately — the agent runs in the background. - * Use `checkAgentDone` / `collectAgentResults` from poll-agent.ts to poll for completion. - */ -export async function startAgentDetached( - sandbox: SandboxInstance, -): Promise<void> { - await sandbox.runCommand({ - cmd: "bash", - args: ["/tmp/agent-wrapper.sh"], - cwd: "/vercel/sandbox", - detached: true, - }); -} -``` - -- [ ] **Step 2: Remove `DEBUG_AGENT` from `env.ts`** - -Remove the `DEBUG_AGENT` field from the env schema in `env.ts`: - -```ts -// REMOVE these lines from env.ts: -DEBUG_AGENT: z - .string() - .transform((v) => v === "true" || v === "1") - .default("false"), -``` - -- [ ] **Step 3: Remove `debug` references from workflow steps** - -In `src/workflows/implementation.ts`, remove `debug: env.DEBUG_AGENT` from the `SandboxManager` constructor call (if present in the new `provisionAndStartAgent` step — it should not be needed since the wrapper script handles everything).
- -Same for `src/workflows/review-fix.ts`. - -- [ ] **Step 4: Run typecheck** - -Run: `pnpm typecheck` -Expected: PASS (any remaining references to `DEBUG_AGENT` or `runAgent` will surface as type errors — fix them) - -- [ ] **Step 5: Commit** - -```bash -git add src/sandbox/run-agent.ts env.ts src/workflows/implementation.ts src/workflows/review-fix.ts -git commit -m "feat: replace runAgent with startAgentDetached, remove debug mode" -``` - ---- - -### Task 6: Update `implementationWorkflow` to use polling pattern - -Replace the single blocking `runAgentInSandbox` step with the provision → poll → collect pattern. - -**Files:** -- Modify: `src/workflows/implementation.ts` - -- [ ] **Step 1: Replace the `runAgentInSandbox` step** - -Remove the existing `runAgentInSandbox` function (lines 43-71). Replace with two new steps: - -```ts -async function provisionAndStartAgent( - branchName: string, - requirementsMd: string, -): Promise<string> { - "use step"; - const { env } = await import("../../env.js"); - const { SandboxManager } = await import("../sandbox/manager.js"); - const { startAgentDetached } = await import("../sandbox/run-agent.js"); - - const manager = new SandboxManager({ - githubToken: env.GITHUB_TOKEN, - owner: env.GITHUB_OWNER, - repo: env.GITHUB_REPO, - anthropicApiKey: env.ANTHROPIC_API_KEY, - claudeCodeOauthToken: env.CLAUDE_CODE_OAUTH_TOKEN, - claudeModel: env.CLAUDE_MODEL, - commitAuthor: env.COMMIT_AUTHOR, - commitEmail: env.COMMIT_EMAIL, - jobTimeoutMs: env.JOB_TIMEOUT_MS, - }); - - const sandbox = await manager.provision(branchName, requirementsMd); - await startAgentDetached(sandbox); - return sandbox.sandboxId; -} -provisionAndStartAgent.maxRetries = 0; // Don't retry expensive provisioning -``` - -- [ ] **Step 2: Update the workflow orchestration** - -Replace the workflow body (inside the try block, after `assembleImplementationRequirements`) with the poll pattern.
Add imports for `sleep` from `"workflow"`: - -```ts -import { sleep } from "workflow"; - -// ... inside implementationWorkflow, in the try block: - - const requirementsMd = await assembleImplementationRequirements(ticket); - - // --- Detached execution with polling --- - const { checkAgentDone, collectAgentResults, teardownSandbox } = - await import("../sandbox/poll-agent.js"); - - const sandboxId = await provisionAndStartAgent(branchName, requirementsMd); - - // Poll until agent finishes — workflow truly suspends between polls. - // Use Date.now() for timeout instead of Promise.race with two sleeps - // (racing two WDK sleep calls is unsafe for deterministic replay). - const POLL_INTERVAL = "30s"; - const TIMEOUT_MS = 35 * 60 * 1000; // 35 min — slightly above JOB_TIMEOUT_MS default (30m) - const startedAt = Date.now(); - let agentDone = false; - - try { - while (!agentDone) { - await sleep(POLL_INTERVAL); - - if (Date.now() - startedAt > TIMEOUT_MS) break; - - const status = await checkAgentDone(sandboxId); - if (status === true) { - agentDone = true; - } else if (status === "stopped") { - // Sandbox died before agent finished - break; - } - // status === false → keep polling - } - - let output: AgentOutput; - let files: Array<{ path: string; content: string }>; - - if (agentDone) { - ({ output, files } = await collectAgentResults(sandboxId)); - } else { - output = { result: "failed", error: "Agent timed out or sandbox stopped unexpectedly" }; - files = []; - } - - // --- Rest of workflow continues unchanged --- - await pushChanges(branchName, files); - } finally { - await teardownSandbox(sandboxId); - } -``` - -- [ ] **Step 3: Clean up unused imports** - -Remove `runAgent` import and the old `SandboxManager` usage from the removed step. 
Add `sleep` import: - -```ts -import { sleep } from "workflow"; -import type { AgentOutput } from "../sandbox/agent-runner.js"; -``` - -- [ ] **Step 4: Run typecheck** - -Run: `pnpm typecheck` -Expected: PASS - -- [ ] **Step 5: Commit** - -```bash -git add src/workflows/implementation.ts -git commit -m "feat: implement polling-based sandbox suspension in implementation workflow" -``` - ---- - -### Task 7: Update `reviewFixWorkflow` to use polling pattern - -Apply the same polling pattern to the review-fix workflow. - -**Files:** -- Modify: `src/workflows/review-fix.ts` - -- [ ] **Step 1: Replace `runFixingAgentInSandbox` step** - -Remove the existing `runFixingAgentInSandbox` function (lines 66-100). Replace with: - -```ts -async function provisionAndStartFixingAgent( - branchName: string, - requirementsMd: string, - mergeBase: string, -): Promise<string> { - "use step"; - const { env } = await import("../../env.js"); - const { SandboxManager } = await import("../sandbox/manager.js"); - const { startAgentDetached } = await import("../sandbox/run-agent.js"); - - const manager = new SandboxManager({ - githubToken: env.GITHUB_TOKEN, - owner: env.GITHUB_OWNER, - repo: env.GITHUB_REPO, - anthropicApiKey: env.ANTHROPIC_API_KEY, - claudeCodeOauthToken: env.CLAUDE_CODE_OAUTH_TOKEN, - claudeModel: env.CLAUDE_MODEL, - commitAuthor: env.COMMIT_AUTHOR, - commitEmail: env.COMMIT_EMAIL, - jobTimeoutMs: env.JOB_TIMEOUT_MS, - }); - - const sandbox = await manager.provision(branchName, requirementsMd, mergeBase); - await startAgentDetached(sandbox); - return sandbox.sandboxId; -} -provisionAndStartFixingAgent.maxRetries = 0; -``` - -- [ ] **Step 2: Update workflow orchestration** - -Replace the workflow body (after `assembleReviewFixRequirements`) with the poll pattern — same as Task 6 Step 2 but calling `provisionAndStartFixingAgent(branchName, requirementsMd, env.GITHUB_BASE_BRANCH)` instead: - -```ts -import { sleep } from "workflow"; - -// ...
inside reviewFixWorkflow, in the try block, after assembling requirements: - - const { checkAgentDone, collectAgentResults, teardownSandbox } = - await import("../sandbox/poll-agent.js"); - - const sandboxId = await provisionAndStartFixingAgent( - branchName, - requirementsMd, - env.GITHUB_BASE_BRANCH, - ); - - // Same Date.now() elapsed-time pattern as implementation workflow - // (racing two WDK sleep calls is unsafe for deterministic replay). - const POLL_INTERVAL = "30s"; - const TIMEOUT_MS = 35 * 60 * 1000; // 35 min - const startedAt = Date.now(); - let agentDone = false; - - try { - while (!agentDone) { - await sleep(POLL_INTERVAL); - - if (Date.now() - startedAt > TIMEOUT_MS) break; - - const status = await checkAgentDone(sandboxId); - if (status === true) { - agentDone = true; - } else if (status === "stopped") { - break; - } - } - - let output: AgentOutput; - let files: Array<{ path: string; content: string }>; - - if (agentDone) { - ({ output, files } = await collectAgentResults(sandboxId)); - } else { - output = { result: "failed", error: "Agent timed out or sandbox stopped unexpectedly" }; - files = []; - } - - await pushChanges(branchName, files, baseSha); - } finally { - await teardownSandbox(sandboxId); - } -``` - -- [ ] **Step 3: Clean up imports** - -Add `sleep` import, add `AgentOutput` type import, remove unused imports. - -- [ ] **Step 4: Run typecheck** - -Run: `pnpm typecheck` -Expected: PASS - -- [ ] **Step 5: Commit** - -```bash -git add src/workflows/review-fix.ts -git commit -m "feat: implement polling-based sandbox suspension in review-fix workflow" -``` - ---- - -### Task 8: Run full test suite and fix issues - -- [ ] **Step 1: Run all unit tests** - -Run: `pnpm test` -Expected: PASS — all existing tests should still pass. The `manager.test.ts` mock includes `writeFiles` which accepts any args, and we only added to the existing `provision()` flow. 
- -- [ ] **Step 2: Run typecheck** - -Run: `pnpm typecheck` -Expected: PASS - -- [ ] **Step 3: Fix any failures** - -Address any test or type failures found. Common issues: -- `manager.test.ts` may need an extra `mockRunCommand` call for the `chmod` on the wrapper script -- Import paths may need `.js` extension for ESM - -- [ ] **Step 4: Final commit** - -```bash -git add -A -git commit -m "fix: resolve test/type issues from sandbox polling refactor" -``` - ---- - -## Summary of Changes - -| Before | After | -|--------|-------| -| Single blocking step runs agent for 10-30 min | Detached start → workflow suspends → polls every 30s | -| Workflow consumes resources entire time | Workflow at zero resources during agent execution | -| No timeout handling | `Date.now()` elapsed-time check (35 min) per poll iteration | -| Sandbox teardown in same step | Separate teardown step in `finally` block (always runs) | -| Debug mode: live streaming via `getWritable` | Debug mode: removed entirely | -| Wrapper auto-commits uncommitted changes | Agent commits via stop hook with descriptive message | - -## Not In Scope (Future Work) - -- **Hook-based callback**: If 30s polling latency is unacceptable, switch to `createHook`/`resumeHook` with a callback route. -- **Progress streaming**: The wrapper script could write progress to a file that the poll step reads and streams via `getWritable()`. diff --git a/docs/superpowers/plans/2026-04-06-three-phase-agent-workflow.md b/docs/superpowers/plans/2026-04-06-three-phase-agent-workflow.md deleted file mode 100644 index a0c10c2..0000000 --- a/docs/superpowers/plans/2026-04-06-three-phase-agent-workflow.md +++ /dev/null @@ -1,1942 +0,0 @@ -# Three-Phase Agent Workflow Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. 
- -**Goal:** Replace both `implementationWorkflow` and `reviewFixWorkflow` with a single unified `agentWorkflow` that splits work into three phases (research & plan → implementation → review) within one sandbox. - -**Architecture:** The workflow provisions a single Vercel Sandbox and runs three sequential `claude --print` invocations. Between each phase, the workflow checks the result and decides whether to proceed, retry, or fail fast. Phase 1 outputs free-form markdown; Phases 2 and 3 use structured JSON schemas. The review phase can loop back to implementation up to 2 times. - -**Tech Stack:** TypeScript, Nitro, Vercel Workflow SDK (`"use workflow"` / `"use step"`), Vercel Sandbox, Zod, Vitest - ---- - -## File Structure - -| File | Action | Responsibility | -|------|--------|----------------| -| `src/sandbox/agent-runner.ts` | Modify | Add `REVIEW_SCHEMA`, `ReviewOutput`, `parseReviewOutput()`, `parseResearchStatus()` | -| `src/sandbox/agent-runner.test.ts` | Modify | Tests for new parsers | -| `src/sandbox/wrapper-script.ts` | Rewrite | `buildPhaseScript(opts)` replacing `buildWrapperScript(opts)` | -| `src/sandbox/wrapper-script.test.ts` | Rewrite | Tests for parameterized script builder | -| `src/sandbox/context.ts` | Rewrite | New assembly functions for all three phases | -| `src/sandbox/context.test.ts` | Rewrite | Tests for all new context assemblers | -| `src/sandbox/run-agent.ts` | Modify | Generalize to accept phase script path | -| `src/sandbox/poll-agent.ts` | Modify | Generalize sentinel/output file paths | -| `src/sandbox/poll-agent.test.ts` | Modify | Update tests for generalized functions | -| `src/sandbox/manager.ts` | Modify | Extract stop-hook toggling, remove wrapper script writing from provision | -| `src/sandbox/manager.test.ts` | Modify | Update tests | -| `src/lib/prompts.ts` | Rewrite | Three new prompts, remove old two | -| `src/workflows/agent.ts` | Create | Unified three-phase workflow | -| `src/workflows/implementation.ts` | Delete | 
Replaced by agent.ts | -| `src/workflows/review-fix.ts` | Delete | Absorbed into agent.ts | -| `src/lib/dispatch.ts` | Modify | Always start `agentWorkflow`, remove branching | -| `src/lib/dispatch.test.ts` | Modify | Update tests for unified workflow | - ---- - -### Task 1: Add review output schema and research status parser to agent-runner - -**Files:** -- Modify: `src/sandbox/agent-runner.ts` -- Modify: `src/sandbox/agent-runner.test.ts` - -- [ ] **Step 1: Write failing tests for `parseResearchStatus`** - -Add to `src/sandbox/agent-runner.test.ts`: - -```typescript -describe("parseResearchStatus", () => { - it("extracts completed status", () => { - const raw = "STATUS: completed\n\n# Implementation Plan\n1. Create foo.ts"; - const { status, body } = parseResearchStatus(raw); - expect(status).toBe("completed"); - expect(body).toContain("# Implementation Plan"); - }); - - it("extracts clarification_needed status", () => { - const raw = "STATUS: clarification_needed\n\n1. What database?\n2. 
Which auth?"; - const { status, body } = parseResearchStatus(raw); - expect(status).toBe("clarification_needed"); - expect(body).toContain("What database?"); - }); - - it("extracts failed status", () => { - const raw = "STATUS: failed\n\nCould not access repository"; - const { status, body } = parseResearchStatus(raw); - expect(status).toBe("failed"); - }); - - it("defaults to failed when no STATUS line", () => { - const raw = "Here is my analysis of the codebase..."; - const { status, body } = parseResearchStatus(raw); - expect(status).toBe("failed"); - expect(body).toContain("analysis"); - }); - - it("handles STATUS line with extra whitespace", () => { - const raw = " STATUS: completed \n\nPlan here"; - const { status } = parseResearchStatus(raw); - expect(status).toBe("completed"); - }); -}); -``` - -- [ ] **Step 2: Run tests to verify they fail** - -Run: `npx vitest run src/sandbox/agent-runner.test.ts` -Expected: FAIL — `parseResearchStatus` is not exported - -- [ ] **Step 3: Implement `parseResearchStatus`** - -Add to `src/sandbox/agent-runner.ts`: - -```typescript -export type ResearchStatus = "completed" | "clarification_needed" | "failed"; - -export interface ResearchResult { - status: ResearchStatus; - body: string; -} - -const VALID_RESEARCH_STATUSES: ResearchStatus[] = ["completed", "clarification_needed", "failed"]; - -export function parseResearchStatus(raw: string): ResearchResult { - const lines = raw.split("\n"); - const firstLine = lines[0]?.trim() ?? 
""; - const match = firstLine.match(/^STATUS:\s*(\S+)/i); - - if (match && VALID_RESEARCH_STATUSES.includes(match[1] as ResearchStatus)) { - const body = lines.slice(1).join("\n").trim(); - return { status: match[1] as ResearchStatus, body }; - } - - return { status: "failed", body: raw }; -} -``` - -- [ ] **Step 4: Run tests to verify they pass** - -Run: `npx vitest run src/sandbox/agent-runner.test.ts` -Expected: All `parseResearchStatus` tests PASS - -- [ ] **Step 5: Write failing tests for `parseReviewOutput` and `REVIEW_SCHEMA`** - -Add to `src/sandbox/agent-runner.test.ts`: - -```typescript -describe("parseReviewOutput", () => { - it("parses approved result", () => { - const raw = JSON.stringify({ - result: "approved", - feedback: "Looks good", - issues: [], - }); - const output = parseReviewOutput(raw); - expect(output.result).toBe("approved"); - expect(output.feedback).toBe("Looks good"); - }); - - it("parses changes_requested result with issues", () => { - const raw = JSON.stringify({ - result: "changes_requested", - feedback: "Several issues found", - issues: [ - { file: "src/foo.ts", description: "Missing null check", severity: "critical" }, - ], - }); - const output = parseReviewOutput(raw); - expect(output.result).toBe("changes_requested"); - expect(output.issues).toHaveLength(1); - expect(output.issues[0].severity).toBe("critical"); - }); - - it("returns failed on unparseable output", () => { - const output = parseReviewOutput("not json"); - expect(output.result).toBe("failed"); - expect(output.error).toBeDefined(); - }); - - it("returns failed on empty output", () => { - const output = parseReviewOutput(""); - expect(output.result).toBe("failed"); - }); - - it("extracts from result envelope", () => { - const envelope = JSON.stringify({ - type: "result", - subtype: "success", - is_error: false, - structured_output: { - result: "approved", - feedback: "All good", - issues: [], - }, - }); - const output = parseReviewOutput(envelope); - 
expect(output.result).toBe("approved"); - }); -}); - -describe("REVIEW_SCHEMA", () => { - it("is valid JSON", () => { - expect(() => JSON.parse(REVIEW_SCHEMA)).not.toThrow(); - }); -}); -``` - -- [ ] **Step 6: Run tests to verify they fail** - -Run: `npx vitest run src/sandbox/agent-runner.test.ts` -Expected: FAIL — `parseReviewOutput` and `REVIEW_SCHEMA` not exported - -- [ ] **Step 7: Implement `ReviewOutput`, `REVIEW_SCHEMA`, and `parseReviewOutput`** - -Add to `src/sandbox/agent-runner.ts`: - -```typescript -const reviewOutputSchema = z.object({ - result: z.enum(["approved", "changes_requested", "failed"]), - feedback: z.string().optional(), - issues: z.array(z.object({ - file: z.string(), - description: z.string(), - severity: z.enum(["critical", "suggestion"]), - })).optional(), - error: z.string().optional(), -}); - -export type ReviewOutput = z.infer<typeof reviewOutputSchema>; - -export const REVIEW_SCHEMA = JSON.stringify({ - type: "object", - properties: { - result: { - type: "string", - enum: ["approved", "changes_requested", "failed"], - }, - feedback: { type: "string" }, - issues: { - type: "array", - items: { - type: "object", - properties: { - file: { type: "string" }, - description: { type: "string" }, - severity: { type: "string", enum: ["critical", "suggestion"] }, - }, - required: ["file", "description", "severity"], - }, - }, - error: { type: "string" }, - }, - required: ["result"], -}); - -export function parseReviewOutput(raw: string): ReviewOutput { - if (!raw.trim()) { - return { result: "failed", error: "Review agent produced no output" }; - } - - // Direct parse - try { - const direct = reviewOutputSchema.safeParse(JSON.parse(raw)); - if (direct.success) return direct.data; - } catch {} - - // Stream-json / result-envelope format - const lines = raw.split("\n").filter(Boolean); - for (let i = lines.length - 1; i >= 0; i--) { - try { - const event = JSON.parse(lines[i]); - - if (event.type === "result" && event.structured_output != null) { - const parsed = 
reviewOutputSchema.safeParse(event.structured_output); - if (parsed.success) return parsed.data; - } - - const direct = reviewOutputSchema.safeParse(event); - if (direct.success) return direct.data; - } catch {} - } - - // Fallback: extract JSON objects - const objects = raw.matchAll(/\{[^{}]*(?:\{[^{}]*\}[^{}]*)*\}/g); - for (const [candidate] of objects) { - try { - const result = reviewOutputSchema.safeParse(JSON.parse(candidate)); - if (result.success) return result.data; - } catch {} - } - - return { - result: "failed", - error: `Review output was not structured JSON. Output starts with: ${raw.slice(0, 500)}`, - }; -} -``` - -- [ ] **Step 8: Run tests to verify they pass** - -Run: `npx vitest run src/sandbox/agent-runner.test.ts` -Expected: All tests PASS - -- [ ] **Step 9: Commit** - -```bash -git add src/sandbox/agent-runner.ts src/sandbox/agent-runner.test.ts -git commit -m "feat: add review output schema and research status parser" -``` - ---- - -### Task 2: Parameterize wrapper script builder - -**Files:** -- Rewrite: `src/sandbox/wrapper-script.ts` -- Rewrite: `src/sandbox/wrapper-script.test.ts` - -- [ ] **Step 1: Write failing tests for `buildPhaseScript`** - -Replace `src/sandbox/wrapper-script.test.ts` with: - -```typescript -import { describe, it, expect } from "vitest"; -import { buildPhaseScript } from "./wrapper-script.js"; - -describe("buildPhaseScript", () => { - it("generates research phase script without json-schema", () => { - const script = buildPhaseScript({ - model: "claude-opus-4-6", - phase: "research", - inputFile: "/tmp/research-requirements.md", - outputFile: "/tmp/research-stdout.txt", - stderrFile: "/tmp/research-stderr.txt", - sentinelFile: "/tmp/research-done", - }); - - expect(script).toContain("#!/bin/bash"); - expect(script).toContain("claude"); - expect(script).toContain("claude-opus-4-6"); - expect(script).toContain("/tmp/research-requirements.md"); - expect(script).toContain("/tmp/research-stdout.txt"); - 
expect(script).toContain("/tmp/research-stderr.txt"); - expect(script).toContain("/tmp/research-done"); - expect(script).not.toContain("--json-schema"); - expect(script).not.toContain("--output-format"); - }); - - it("generates impl phase script with json-schema", () => { - const script = buildPhaseScript({ - model: "claude-opus-4-6", - phase: "impl", - inputFile: "/tmp/impl-requirements.md", - outputFile: "/tmp/impl-stdout.txt", - stderrFile: "/tmp/impl-stderr.txt", - sentinelFile: "/tmp/impl-done", - jsonSchema: '{"type":"object"}', - }); - - expect(script).toContain("--json-schema"); - expect(script).toContain("--output-format json"); - expect(script).toContain("/tmp/impl-requirements.md"); - expect(script).toContain("/tmp/impl-done"); - }); - - it("generates review phase script with json-schema", () => { - const script = buildPhaseScript({ - model: "claude-opus-4-6", - phase: "review", - inputFile: "/tmp/review-requirements.md", - outputFile: "/tmp/review-stdout.txt", - stderrFile: "/tmp/review-stderr.txt", - sentinelFile: "/tmp/review-done", - jsonSchema: '{"type":"object"}', - }); - - expect(script).toContain("--json-schema"); - expect(script).toContain("/tmp/review-requirements.md"); - expect(script).toContain("/tmp/review-done"); - }); - - it("includes cleanup and sentinel touch", () => { - const script = buildPhaseScript({ - model: "claude-opus-4-6", - phase: "research", - inputFile: "/tmp/research-requirements.md", - outputFile: "/tmp/research-stdout.txt", - stderrFile: "/tmp/research-stderr.txt", - sentinelFile: "/tmp/research-done", - }); - - expect(script).toContain("rm -rf .claude/"); - expect(script).toContain("touch /tmp/research-done"); - }); - - it("escapes single quotes in json schema", () => { - const script = buildPhaseScript({ - model: "claude-opus-4-6", - phase: "impl", - inputFile: "/tmp/impl-requirements.md", - outputFile: "/tmp/impl-stdout.txt", - stderrFile: "/tmp/impl-stderr.txt", - sentinelFile: "/tmp/impl-done", - jsonSchema: 
`{"type":"object","desc":"it's"}`, - }); - - expect(script).not.toContain("it's"); - expect(script).toContain("it'\\''s"); - }); -}); -``` - -- [ ] **Step 2: Run tests to verify they fail** - -Run: `npx vitest run src/sandbox/wrapper-script.test.ts` -Expected: FAIL — `buildPhaseScript` not exported - -- [ ] **Step 3: Implement `buildPhaseScript`** - -Replace `src/sandbox/wrapper-script.ts` with: - -```typescript -export interface PhaseScriptOptions { - model: string; - phase: "research" | "impl" | "review"; - inputFile: string; - outputFile: string; - stderrFile: string; - sentinelFile: string; - jsonSchema?: string; -} - -/** - * Generates a bash script for a single agent phase. - * Designed to run detached inside a Vercel Sandbox. - */ -export function buildPhaseScript(opts: PhaseScriptOptions): string { - const { model, inputFile, outputFile, stderrFile, sentinelFile, jsonSchema } = opts; - - let claudeFlags = `--print --model '${model}' --dangerously-skip-permissions`; - - if (jsonSchema) { - const escapedSchema = jsonSchema.replace(/'/g, "'\\''"); - claudeFlags += ` --output-format json --json-schema '${escapedSchema}'`; - } - - return `#!/bin/bash - -# --- Phase: ${opts.phase} --- -cat ${inputFile} | claude \\ - ${claudeFlags} \\ - > ${outputFile} 2>${stderrFile}; echo $? > /tmp/${opts.phase}-exit-code || true - -# --- Cleanup --- -cd /vercel/sandbox - -# Remove repo-level .claude/ artifacts that Claude Code auto-creates. -# git checkout restores any that were already committed. 
-rm -rf .claude/ -git checkout -- .claude/ 2>/dev/null || true - -# --- Signal completion --- -touch ${sentinelFile} -`; -} -``` - -- [ ] **Step 4: Run tests to verify they pass** - -Run: `npx vitest run src/sandbox/wrapper-script.test.ts` -Expected: All tests PASS - -- [ ] **Step 5: Commit** - -```bash -git add src/sandbox/wrapper-script.ts src/sandbox/wrapper-script.test.ts -git commit -m "feat: parameterize wrapper script for multi-phase execution" -``` - ---- - -### Task 3: Rewrite context assembly functions - -**Files:** -- Rewrite: `src/sandbox/context.ts` -- Rewrite: `src/sandbox/context.test.ts` - -- [ ] **Step 1: Write failing tests for `assembleResearchPlanContext`** - -Add to `src/sandbox/context.test.ts` (keep existing `formatCheckResults` tests, replace `assembleImplementationContext` and `assembleFixingFeedbackContext` tests): - -```typescript -describe("assembleResearchPlanContext", () => { - it("assembles context for new ticket (no PR feedback)", () => { - const result = assembleResearchPlanContext({ - ticket: { - identifier: "TEST-1", - title: "Add login page", - description: "Build a login page", - acceptanceCriteria: "User can log in", - comments: [], - }, - prompt: "You are a research agent...", - branchName: "blazebot/test-1", - }); - - expect(result).toContain("## Ticket ID"); - expect(result).toContain("TEST-1"); - expect(result).toContain("## Branch"); - expect(result).toContain("blazebot/test-1"); - expect(result).toContain("You are a research agent..."); - expect(result).not.toContain("## PR Review Feedback"); - }); - - it("assembles context with PR feedback for review-fix scenario", () => { - const result = assembleResearchPlanContext({ - ticket: { - identifier: "TEST-2", - title: "Fix auth", - description: "Fix auth module", - acceptanceCriteria: "", - comments: [], - }, - prompt: "prompt", - branchName: "blazebot/test-2", - prComments: [ - { author: "Bob", body: "Fix the null check", liked: false }, - ], - checkResults: [ - { name: 
"test", status: "completed", conclusion: "failure", logs: "FAIL" }, - ], - hasConflicts: true, - }); - - expect(result).toContain("## PR Review Feedback"); - expect(result).toContain("Fix the null check"); - expect(result).toContain("## CI/CD Check Results"); - expect(result).toContain("### Failed: test"); - expect(result).toContain("## Merge Conflicts"); - }); -}); -``` - -- [ ] **Step 2: Write failing tests for `assembleImplementationContext` (new signature)** - -```typescript -describe("assembleImplementationContext (new)", () => { - it("assembles context with research plan markdown", () => { - const result = assembleImplementationContext({ - ticket: { - identifier: "TEST-1", - title: "Add login page", - description: "Build a login page", - acceptanceCriteria: "User can log in", - comments: [], - }, - prompt: "You are an implementation agent...", - researchPlanMarkdown: "# Plan\n1. Create LoginForm component\n2. Add route handler", - }); - - expect(result).toContain("## Ticket ID"); - expect(result).toContain("TEST-1"); - expect(result).toContain("## Research & Plan"); - expect(result).toContain("# Plan"); - expect(result).toContain("Create LoginForm component"); - expect(result).toContain("You are an implementation agent..."); - }); -}); -``` - -- [ ] **Step 3: Write failing tests for `assembleImplementationRetryContext`** - -```typescript -describe("assembleImplementationRetryContext", () => { - it("includes plan and review feedback", () => { - const result = assembleImplementationRetryContext({ - ticket: { - identifier: "TEST-1", - title: "Add login page", - description: "Build a login page", - acceptanceCriteria: "User can log in", - comments: [], - }, - prompt: "prompt", - researchPlanMarkdown: "# Plan\n1. 
Create LoginForm", - reviewFeedback: { - result: "changes_requested", - feedback: "Missing error handling", - issues: [ - { file: "src/LoginForm.tsx", description: "No null check", severity: "critical" }, - ], - }, - }); - - expect(result).toContain("## Research & Plan"); - expect(result).toContain("Create LoginForm"); - expect(result).toContain("## Review Feedback"); - expect(result).toContain("Missing error handling"); - expect(result).toContain("src/LoginForm.tsx"); - expect(result).toContain("No null check"); - expect(result).toContain("critical"); - }); -}); -``` - -- [ ] **Step 4: Write failing tests for `assembleReviewContext`** - -```typescript -describe("assembleReviewContext", () => { - it("includes plan and git diff", () => { - const result = assembleReviewContext({ - ticket: { - identifier: "TEST-1", - title: "Add login page", - description: "Build a login page", - acceptanceCriteria: "User can log in", - comments: [], - }, - prompt: "You are a review agent...", - researchPlanMarkdown: "# Plan\n1. 
Create LoginForm", - gitDiff: "diff --git a/src/LoginForm.tsx b/src/LoginForm.tsx\n+export function LoginForm() {}", - }); - - expect(result).toContain("## Research & Plan"); - expect(result).toContain("## Git Diff"); - expect(result).toContain("+export function LoginForm()"); - expect(result).toContain("You are a review agent..."); - }); -}); -``` - -- [ ] **Step 5: Run all new tests to verify they fail** - -Run: `npx vitest run src/sandbox/context.test.ts` -Expected: FAIL — new functions not exported - -- [ ] **Step 6: Implement all new context assembly functions** - -Rewrite `src/sandbox/context.ts` — keep `formatCheckResults` and the helper functions (`formatComments`, `formatPRComments`), replace the main assembly functions: - -```typescript -import type { PRComment, CheckRunResult } from "../adapters/vcs/types.js"; -import type { ReviewOutput } from "./agent-runner.js"; - -interface TicketData { - identifier: string; - title: string; - description: string; - acceptanceCriteria: string; - comments: Array<{ author: string; body: string; createdAt: string }>; -} - -export interface ResearchPlanContextInput { - ticket: TicketData; - prompt: string; - branchName: string; - prComments?: PRComment[]; - checkResults?: CheckRunResult[]; - hasConflicts?: boolean; -} - -export interface ImplementationContextInput { - ticket: TicketData; - prompt: string; - researchPlanMarkdown: string; -} - -export interface ImplementationRetryContextInput { - ticket: TicketData; - prompt: string; - researchPlanMarkdown: string; - reviewFeedback: ReviewOutput; -} - -export interface ReviewContextInput { - ticket: TicketData; - prompt: string; - researchPlanMarkdown: string; - gitDiff: string; -} - -export function assembleResearchPlanContext(input: ResearchPlanContextInput): string { - const { ticket, prompt, branchName, prComments, checkResults, hasConflicts } = input; - - let md = `# Requirements - -## Ticket ID - -${ticket.identifier} - -## Ticket - -${ticket.title} - -## Description 
- -${ticket.description} - -## Acceptance Criteria - -${ticket.acceptanceCriteria || "None specified."} - -## Comments - -${formatComments(ticket.comments)} - -## Branch - -${branchName} -`; - - if (prComments && prComments.length > 0) { - md += `\n## PR Review Feedback\n\n${formatPRComments(prComments)}\n`; - } - - if (checkResults && checkResults.length > 0) { - md += `\n## CI/CD Check Results\n\n${formatCheckResults(checkResults)}\n`; - } - - if (hasConflicts) { - md += `\n## Merge Conflicts\n\nThis PR has merge conflicts. The base branch has already been merged — the repo is in a MERGING state with conflict markers in the affected files. Resolve the markers, \`git add\` the files, and run \`git merge --continue\`.\n`; - } - - md += `\n---\n\n${prompt}\n`; - return md; -} - -export function assembleImplementationContext(input: ImplementationContextInput): string { - const { ticket, prompt, researchPlanMarkdown } = input; - return `# Requirements - -## Ticket ID - -${ticket.identifier} - -## Ticket - -${ticket.title} - -## Acceptance Criteria - -${ticket.acceptanceCriteria || "None specified."} - -## Research & Plan - -${researchPlanMarkdown} - ---- - -${prompt} -`; -} - -export function assembleImplementationRetryContext(input: ImplementationRetryContextInput): string { - const { ticket, prompt, researchPlanMarkdown, reviewFeedback } = input; - return `# Requirements - -## Ticket ID - -${ticket.identifier} - -## Ticket - -${ticket.title} - -## Acceptance Criteria - -${ticket.acceptanceCriteria || "None specified."} - -## Research & Plan - -${researchPlanMarkdown} - -## Review Feedback - -${reviewFeedback.feedback ?? "No feedback provided."} - -### Issues - -${formatReviewIssues(reviewFeedback.issues ?? 
[])} - ---- - -${prompt} -`; -} - -export function assembleReviewContext(input: ReviewContextInput): string { - const { ticket, prompt, researchPlanMarkdown, gitDiff } = input; - return `# Requirements - -## Ticket ID - -${ticket.identifier} - -## Ticket - -${ticket.title} - -## Acceptance Criteria - -${ticket.acceptanceCriteria || "None specified."} - -## Research & Plan - -${researchPlanMarkdown} - -## Git Diff - -\`\`\`diff -${gitDiff} -\`\`\` - ---- - -${prompt} -`; -} - -function formatReviewIssues(issues: Array<{ file: string; description: string; severity: string }>): string { - if (issues.length === 0) return "No specific issues listed."; - return issues - .map((i) => `- **[${i.severity}]** ${i.file}: ${i.description}`) - .join("\n"); -} - -// Keep existing helpers below unchanged -``` - -Note: Keep the existing `formatComments`, `formatPRComments`, and `formatCheckResults` functions exactly as they are. - -- [ ] **Step 7: Run tests to verify they pass** - -Run: `npx vitest run src/sandbox/context.test.ts` -Expected: All tests PASS - -- [ ] **Step 8: Commit** - -```bash -git add src/sandbox/context.ts src/sandbox/context.test.ts -git commit -m "feat: rewrite context assembly for three-phase workflow" -``` - ---- - -### Task 4: Write three new prompts - -**Files:** -- Rewrite: `src/lib/prompts.ts` - -- [ ] **Step 1: Replace prompts.ts with three new prompts** - -Rewrite `src/lib/prompts.ts`. Remove `implement.md` (old) and `review-fix.md`. Add `research-plan.md`, `implement.md` (new), and `review.md`: - -```typescript -const researchPlanPrompt = `# Instructions - -You are an AI research agent. Your job is to explore the repository, understand the ticket, and produce a precise implementation plan. 
- -## Output Format - -Your output MUST start with a STATUS line on the very first line: - -\`\`\` -STATUS: completed -\`\`\` - -Valid statuses: \`completed\`, \`clarification_needed\`, \`failed\` - -Everything after the STATUS line is your research findings and plan. This output will be passed as-is to the implementation agent — keep it clean and actionable. - -## Superpowers - -You have access to **superpowers skills** installed globally. Use them. - -- **Always check for applicable skills before starting work.** The \`using-superpowers\` skill is loaded — follow its guidance. -- **Use \`brainstorming\` to think through the approach** — explore alternatives, consider trade-offs, then settle on the best path. - -## Process - -1. **Restore session memory** — Check if \`blazebot/memory/[TASK_ID].md\` exists (where \`[TASK_ID]\` is the Ticket ID from above, e.g. \`AIW-123\`). If it exists, read it immediately. -2. Explore the repository structure. Read \`CLAUDE.md\`, \`AGENTS.md\` if present. -3. Check \`git log\` and \`git diff\` against the base branch to identify what's already been done on this branch. -4. If PR review feedback or CI/CD failures are included above, understand what needs to be fixed. -5. Identify what's already implemented vs. what remains. -6. Analyze relevant files, code patterns, test setup. -7. **Use the \`brainstorming\` skill** to think through the approach. -8. Produce a precise implementation plan for the remaining work. -9. **Write/update session memory** — overwrite \`blazebot/memory/[TASK_ID].md\`. 
- -## Plan Output Constraints - -Your plan MUST be: -- **Actionable only** — each step must be directly executable ("Create file X with Y" not "Consider how to...") -- **Minimal** — no preamble, rationale, or context noise that would confuse the implementation agent -- **Concrete** — file paths must be specific ("src/components/Foo.tsx" not "the relevant component") -- **Structured for top-to-bottom execution** — the implementation agent reads and executes sequentially - -## When to Ask for Clarification - -Return \`STATUS: clarification_needed\` if: -- No clear definition of done in the ticket -- Ambiguous scope -- Missing technical context -- Contradictory requirements -- Multiple valid interpretations -- Missing design/UX details for UI work - -When you need clarification, list your questions as numbered lines after the STATUS line. Batch ALL questions — never return with just one. - -## Constraints - -- **NO coding** — do not write implementation code -- **NO commits** — do not create any git commits -- Only analyze and plan - -## Session Memory - -**MANDATORY** — before returning, overwrite \`blazebot/memory/[TASK_ID].md\`: - -\`\`\`markdown -# Session Memory — [TASK_ID] - -## Progress -- What was analyzed and planned this session - -## Decisions Made -- Technical choices and reasoning - -## Blockers -- What is blocking progress (if clarification_needed or failed) -- "None" if completed successfully - -## Files Touched -- "None — research phase only" - -## Prior Sessions -- Brief summary of prior sessions (if memory file existed) -\`\`\``; - -const implementPrompt = `# Instructions - -You are an AI coding agent executing an implementation plan. The plan was created by a research agent and is included above under "Research & Plan". - -## Superpowers - -You have access to **superpowers skills** installed globally. Use them. - -- **Use \`executing-plans\` to systematically work through the plan** — it structures execution correctly. 
-- **Use \`systematic-debugging\` when encountering bugs or test failures** — do not guess at fixes. -- **Use \`verification-before-completion\` before claiming work is done** — verify, don't assume. - -## Process - -1. **Restore session memory** — Check if \`blazebot/memory/[TASK_ID].md\` exists. If it exists, read it. -2. Read the plan from the "Research & Plan" section above. -3. If review feedback is included (retry scenario): focus on fixing the flagged issues. Do not redo work that was approved. -4. Execute each step in the plan, in order. -5. If the repo has tests: run them to ensure nothing is broken. -6. **Update session memory** — overwrite \`blazebot/memory/[TASK_ID].md\`. -7. Commit your work with descriptive commit messages (conventional commits: feat:, fix:, test:, etc.). -8. Run all quality checks (tests, linting, type checking, formatting). - -## Constraints - -- Follow the plan — do not explore or re-research (already done). -- Do not refactor code outside the scope of the plan. -- Do not install new dependencies unless the plan specifies them. -- Follow existing code conventions (check CLAUDE.md, AGENTS.md if present). -- Do NOT invoke \`requesting-code-review\` — that happens in a separate review phase. - -## When to Ask for Clarification - -Return \`clarification_needed\` only if the plan is genuinely unexecutable. Exhaust code-level investigation first. 
- -## Session Memory - -**MANDATORY** — before returning, overwrite \`blazebot/memory/[TASK_ID].md\`: - -\`\`\`markdown -# Session Memory — [TASK_ID] - -## Progress -- What was implemented this session - -## Decisions Made -- Technical choices and reasoning - -## Blockers -- What is blocking progress (if clarification_needed or failed) -- "None" if implemented successfully - -## Files Touched -- List of files created or modified - -## Prior Sessions -- Brief summary of prior sessions (if memory file existed) -\`\`\` - -## Output - -Return a JSON object with: -- \`result\`: "implemented" if done, "clarification_needed" if you have questions, "failed" if stuck. -- \`summary\`: Description of work done (when implemented). -- \`questions\`: List of questions (when clarification_needed). -- \`error\`: Failure details (when failed).`; - -const reviewPrompt = `# Instructions - -You are an AI code review agent. Your job is to review the implementation diff against the plan and acceptance criteria. - -## Superpowers - -You have access to **superpowers skills** installed globally. Use them. - -- **Use \`requesting-code-review\` to dispatch a code-reviewer subagent** — this is your primary tool. - -## Process - -1. Read the plan from the "Research & Plan" section above. -2. Read the acceptance criteria. -3. Review the git diff against the plan — did the implementation agent follow it? -4. Check code quality, test coverage, edge cases. -5. Invoke \`requesting-code-review\` skill to dispatch a code-reviewer subagent. -6. Combine your findings with the subagent's findings. -7. Output your verdict. - -## Review Criteria - -- Does the implementation match the plan? -- Does it satisfy the acceptance criteria? -- Are there test gaps? -- Are there obvious bugs or edge cases? -- Does the code follow existing conventions? 
- -## Constraints - -- **NO coding** — do not write or modify any code -- **NO commits** — do not create any git commits -- Only review and report - -## Output - -Return a JSON object with: -- \`result\`: "approved" if the implementation is ready, "changes_requested" if issues need fixing, "failed" if review itself failed. -- \`feedback\`: Detailed review notes. -- \`issues\`: Array of specific issues — each with \`file\`, \`description\`, \`severity\` ("critical" or "suggestion"). Only include issues that MUST be fixed for \`changes_requested\`. -- \`error\`: Failure details (when failed).`; - -const prompts: Record<string, string> = { - "research-plan.md": researchPlanPrompt, - "implement.md": implementPrompt, - "review.md": reviewPrompt, -}; - -export function getPrompt(name: string): string { - const content = prompts[name]; - if (!content) throw new Error(`Unknown prompt: ${name}`); - return content; -} -``` - -- [ ] **Step 2: Run existing tests to check for breakage** - -Run: `npx vitest run` -Expected: Some tests may fail if they reference old prompt names. Note failures for next step.
- -- [ ] **Step 3: Commit** - -```bash -git add src/lib/prompts.ts -git commit -m "feat: replace monolithic prompts with three phase-specific prompts" -``` - ---- - -### Task 5: Generalize poll-agent and run-agent for multi-phase - -**Files:** -- Modify: `src/sandbox/poll-agent.ts` -- Modify: `src/sandbox/poll-agent.test.ts` -- Modify: `src/sandbox/run-agent.ts` - -- [ ] **Step 1: Write failing tests for generalized `checkPhaseDone` and `collectPhaseOutput`** - -Update `src/sandbox/poll-agent.test.ts` — add tests for the new generalized versions alongside existing tests: - -```typescript -describe("checkPhaseDone", () => { - beforeEach(() => vi.clearAllMocks()); - - it("checks a custom sentinel file", async () => { - mockRunCommand.mockResolvedValue({ exitCode: 0 }); - - const { checkPhaseDone } = await import("./poll-agent.js"); - const result = await checkPhaseDone("sbx-test-123", "/tmp/research-done"); - expect(result).toBe(true); - expect(mockRunCommand).toHaveBeenCalledWith("test", ["-f", "/tmp/research-done"]); - }); -}); - -describe("collectPhaseOutput", () => { - beforeEach(() => vi.clearAllMocks()); - - it("reads from a custom output file", async () => { - const mockStdout = vi.fn(); - mockRunCommand.mockImplementation((...args: any[]) => ({ - exitCode: 0, - stdout: mockStdout, - })); - - mockStdout - .mockResolvedValueOnce(JSON.stringify({ result: "implemented", summary: "Done" })) - .mockResolvedValueOnce(""); - - const { collectPhaseOutput } = await import("./poll-agent.js"); - const result = await collectPhaseOutput("sbx-test-123", "/tmp/impl-stdout.txt", "/tmp/impl-stderr.txt"); - expect(result).toBe(JSON.stringify({ result: "implemented", summary: "Done" })); - }); -}); -``` - -- [ ] **Step 2: Run tests to verify they fail** - -Run: `npx vitest run src/sandbox/poll-agent.test.ts` -Expected: FAIL — `checkPhaseDone` and `collectPhaseOutput` not exported - -- [ ] **Step 3: Add `checkPhaseDone` and `collectPhaseOutput` to poll-agent.ts** - -Add to 
`src/sandbox/poll-agent.ts` (keep existing functions as-is for backward compat during transition): - -```typescript -/** - * Generalized sentinel check — works with any sentinel file path. - */ -export async function checkPhaseDone( - sandboxId: string, - sentinelFile: string, -): Promise<boolean | "stopped"> { - "use step"; - const { Sandbox } = await import("@vercel/sandbox"); - try { - const sandbox = await Sandbox.get({ sandboxId, ...getSandboxCredentials() }); - - if (sandbox.status !== "running") { - return "stopped"; - } - - const result = await sandbox.runCommand("test", ["-f", sentinelFile]); - return result.exitCode === 0; - } catch { - return "stopped"; - } -} - -/** - * Generalized output collector — reads from any stdout/stderr file paths. - * Returns raw string. Caller is responsible for parsing. - */ -export async function collectPhaseOutput( - sandboxId: string, - outputFile: string, - stderrFile: string, -): Promise<string> { - "use step"; - const { Sandbox } = await import("@vercel/sandbox"); - - const sandbox = await Sandbox.get({ sandboxId, ...getSandboxCredentials() }); - - const stdoutResult = await sandbox.runCommand("cat", [outputFile]); - const stdout = (await stdoutResult.stdout()).trim(); - - const stderrResult = await sandbox.runCommand("cat", [stderrFile]); - const stderr = (await stderrResult.stdout()).trim(); - - return stdout || stderr; -} -``` - -- [ ] **Step 4: Generalize `startAgentDetached` in run-agent.ts** - -Update `src/sandbox/run-agent.ts`: - -```typescript -import type { Sandbox as SandboxType } from "@vercel/sandbox"; - -type SandboxInstance = Awaited>; - -/** - * Starts a phase script in detached mode. - * Returns immediately — the agent runs in the background.
- */ -export async function startPhaseDetached( - sandbox: SandboxInstance, - scriptPath: string, -): Promise<void> { - await sandbox.runCommand({ - cmd: "bash", - args: [scriptPath], - cwd: "/vercel/sandbox", - detached: true, - }); -} -``` - -- [ ] **Step 5: Run tests to verify they pass** - -Run: `npx vitest run src/sandbox/poll-agent.test.ts` -Expected: All tests PASS (both old and new) - -- [ ] **Step 6: Commit** - -```bash -git add src/sandbox/poll-agent.ts src/sandbox/poll-agent.test.ts src/sandbox/run-agent.ts -git commit -m "feat: generalize poll-agent and run-agent for multi-phase execution" -``` - ---- - -### Task 6: Refactor sandbox manager for phase-based execution - -**Files:** -- Modify: `src/sandbox/manager.ts` -- Modify: `src/sandbox/manager.test.ts` - -- [ ] **Step 1: Refactor `provision` — remove wrapper script writing, add stop-hook toggle** - -The manager should provision the sandbox (clone, git config, install claude, install skills) but NOT write the wrapper script or requirements. Those are now per-phase and handled by the workflow. - -Update `src/sandbox/manager.ts`: - -1. Remove `import { buildWrapperScript }` and the wrapper script writing from `provision()` -2. Remove `requirementsMd` parameter from `provision()` — it now only takes `branch` and optional `mergeBase` -3. Add a new method `configureStopHook(sandbox, enabled)` that writes or clears `~/.claude/settings.json` -4. Add a new method `writePhaseFiles(sandbox, files)` that writes an array of `{ path, content }` entries and marks any `.sh` file executable - -```typescript -async provision( - branch: string, - mergeBase?: string, -): Promise<SandboxInstance> { - // ... same as before up through installGlobalSkills ...
- // REMOVE the wrapper script and requirements.md writing - return sandbox; -} - -async configureStopHook(sandbox: SandboxInstance, enabled: boolean): Promise<void> { - if (enabled) { - await sandbox.runCommand("bash", [ - "-c", - [ - `mkdir -p ~/.claude`, - `cat > ~/.claude/commit-guard.sh << 'SCRIPT'`, - `#!/bin/bash`, - `input=$(cat)`, - `if echo "$input" | grep -q '"stop_hook_active":true'; then exit 0; fi`, - `changes=$(git status --porcelain | grep -v '^.. \\.claude/' | grep -v '^?? \\.claude/' | grep -v 'requirements\\.md')`, - `if [ -n "$changes" ]; then`, - ` echo '{"decision":"block","reason":"You have uncommitted changes. You MUST either commit all changes with a descriptive message or revert them before stopping."}' >&2`, - ` exit 2`, - `fi`, - `SCRIPT`, - `chmod +x ~/.claude/commit-guard.sh`, - `cat > ~/.claude/settings.json << 'JSON'`, - `{"hooks":{"Stop":[{"matcher":"","hooks":[{"type":"command","command":"bash ~/.claude/commit-guard.sh"}]}]}}`, - `JSON`, - ].join("\n"), - ]); - } else { - await sandbox.runCommand("bash", [ - "-c", - `mkdir -p ~/.claude && echo '{}' > ~/.claude/settings.json`, - ]); - } -} - -async writePhaseFiles( - sandbox: SandboxInstance, - files: Array<{ path: string; content: string }>, -): Promise<void> { - await sandbox.writeFiles( - files.map((f) => ({ path: f.path, content: Buffer.from(f.content) })), - ); - // Make scripts executable - for (const f of files) { - if (f.path.endsWith(".sh")) { - await sandbox.runCommand("chmod", ["+x", f.path]); - } - } -} -``` - -- [ ] **Step 2: Update manager.test.ts** - -Update the tests to reflect the new `provision()` signature (no `requirementsMd`). Add tests for `configureStopHook` and `writePhaseFiles`.
- -- [ ] **Step 3: Run tests** - -Run: `npx vitest run src/sandbox/manager.test.ts` -Expected: All tests PASS - -- [ ] **Step 4: Commit** - -```bash -git add src/sandbox/manager.ts src/sandbox/manager.test.ts -git commit -m "refactor: extract stop-hook config, remove wrapper script from provision" -``` - ---- - -### Task 7: Create the unified `agentWorkflow` - -**Files:** -- Create: `src/workflows/agent.ts` - -- [ ] **Step 1: Create `src/workflows/agent.ts`** - -This is the core of the change. The workflow orchestrates three phases: - -```typescript -import { sleep } from "workflow"; -import type { AgentOutput } from "../sandbox/agent-runner.js"; -import type { ReviewOutput } from "../sandbox/agent-runner.js"; -import type { TicketContent } from "../adapters/issue-tracker/types.js"; -import type { PRComment, CheckRunResult } from "../adapters/vcs/types.js"; - -// --- Step Functions --- - -async function fetchAndValidateTicket(ticketId: string, columnAi: string) { - "use step"; - const { createStepAdapters } = await import("../lib/step-adapters.js"); - const { issueTracker } = createStepAdapters(); - const ticket = await issueTracker.fetchTicket(ticketId); - if (ticket.trackerStatus.toLowerCase() !== columnAi.toLowerCase()) return null; - return ticket; -} - -async function createFeatureBranch(branchName: string, baseBranch: string) { - "use step"; - const { createStepAdapters } = await import("../lib/step-adapters.js"); - const { vcs } = createStepAdapters(); - await vcs.createBranch(branchName, baseBranch); -} - -async function fetchPRContext(branchName: string): Promise<{ - prComments: PRComment[]; - checkResults: CheckRunResult[]; - hasConflicts: boolean; -} | null> { - "use step"; - const { createStepAdapters } = await import("../lib/step-adapters.js"); - const { vcs } = createStepAdapters(); - const pr = await vcs.findPR(branchName); - if (!pr) return null; - - const prComments = await vcs.getPRComments(pr.id); - const hasConflicts = await 
vcs.getPRConflictStatus(pr.id); - const checkResults = await vcs.getCheckRunResults(pr.id); - return { prComments, hasConflicts, checkResults }; -} - -async function provisionSandbox( - branchName: string, - mergeBase?: string, -): Promise<string> { - "use step"; - const { env } = await import("../../env.js"); - const { SandboxManager } = await import("../sandbox/manager.js"); - - const manager = new SandboxManager({ - githubToken: env.GITHUB_TOKEN, - owner: env.GITHUB_OWNER, - repo: env.GITHUB_REPO, - anthropicApiKey: env.ANTHROPIC_API_KEY, - claudeCodeOauthToken: env.CLAUDE_CODE_OAUTH_TOKEN, - claudeModel: env.CLAUDE_MODEL, - commitAuthor: env.COMMIT_AUTHOR, - commitEmail: env.COMMIT_EMAIL, - jobTimeoutMs: env.JOB_TIMEOUT_MS, - }); - - const sandbox = await manager.provision(branchName, mergeBase); - return sandbox.sandboxId; -} -provisionSandbox.maxRetries = 0; - -async function writeAndStartPhase( - sandboxId: string, - inputFilePath: string, - inputContent: string, - scriptPath: string, - scriptContent: string, -): Promise<void> { - "use step"; - const { Sandbox } = await import("@vercel/sandbox"); - const { getSandboxCredentials } = await import("../sandbox/credentials.js"); - - const sandbox = await Sandbox.get({ sandboxId, ...getSandboxCredentials() }); - - await sandbox.writeFiles([ - { path: inputFilePath, content: Buffer.from(inputContent) }, - { path: scriptPath, content: Buffer.from(scriptContent) }, - ]); - await sandbox.runCommand("chmod", ["+x", scriptPath]); - - await sandbox.runCommand({ - cmd: "bash", - args: [scriptPath], - cwd: "/vercel/sandbox", - detached: true, - }); -} -writeAndStartPhase.maxRetries = 0; - -async function configureStopHook(sandboxId: string, enabled: boolean): Promise<void> { - "use step"; - const { Sandbox } = await import("@vercel/sandbox"); - const { getSandboxCredentials } = await import("../sandbox/credentials.js"); - const { SandboxManager } = await import("../sandbox/manager.js"); - const { env } = await import("../../env.js"); -
const sandbox = await Sandbox.get({ sandboxId, ...getSandboxCredentials() }); - const manager = new SandboxManager({ - githubToken: env.GITHUB_TOKEN, - owner: env.GITHUB_OWNER, - repo: env.GITHUB_REPO, - claudeModel: env.CLAUDE_MODEL, - commitAuthor: env.COMMIT_AUTHOR, - commitEmail: env.COMMIT_EMAIL, - jobTimeoutMs: env.JOB_TIMEOUT_MS, - }); - await manager.configureStopHook(sandbox, enabled); -} - -async function captureGitDiff(sandboxId: string): Promise<string> { - "use step"; - const { Sandbox } = await import("@vercel/sandbox"); - const { getSandboxCredentials } = await import("../sandbox/credentials.js"); - - const sandbox = await Sandbox.get({ sandboxId, ...getSandboxCredentials() }); - const baseShaResult = await sandbox.runCommand("bash", [ - "-c", "cat /tmp/.pre-agent-sha 2>/dev/null || echo ''", - ]); - const baseSha = (await baseShaResult.stdout()).trim(); - - const diffCmd = baseSha - ? `git diff ${baseSha}..HEAD` - : "git diff HEAD"; - const diffResult = await sandbox.runCommand("bash", ["-c", diffCmd]); - return (await diffResult.stdout()).trim(); -} - -// Reuse existing step functions from implementation.ts for: -// createPullRequest, moveTicket, notifySlack, postClarificationAndMoveBack, -// unregisterRun, markTicketFailed -// (Copy them here — they're identical) - -async function createPullRequest(branchName: string, title: string, summary: string) { - "use step"; - const { createStepAdapters } = await import("../lib/step-adapters.js"); - const { vcs } = createStepAdapters(); - return vcs.createPR(branchName, title, summary); -} - -async function moveTicket(ticketId: string, column: string) { - "use step"; - const { createStepAdapters } = await import("../lib/step-adapters.js"); - const { issueTracker } = createStepAdapters(); - await issueTracker.moveTicket(ticketId, column); -} - -async function notifySlack(message: string) { - "use step"; - const { createStepAdapters } = await import("../lib/step-adapters.js"); - const { messaging } =
createStepAdapters(); - await messaging.notify(message); -} - -async function postClarificationAndMoveBack( - ticketId: string, questions: string[], identifier: string, backlogColumn: string, -) { - "use step"; - const { createStepAdapters } = await import("../lib/step-adapters.js"); - const { issueTracker } = createStepAdapters(); - const comment = questions.map((q, i) => `${i + 1}. ${q}`).join("\n"); - await issueTracker.postComment(ticketId, comment); - await issueTracker.moveTicket(ticketId, backlogColumn); -} - -async function unregisterRun(ticketIdentifier: string) { - "use step"; - const { createStepAdapters } = await import("../lib/step-adapters.js"); - const { runRegistry } = createStepAdapters(); - await runRegistry.unregister(ticketIdentifier); -} - -async function markTicketFailed(ticketIdentifier: string, error: string) { - "use step"; - const { createStepAdapters } = await import("../lib/step-adapters.js"); - const { runRegistry } = createStepAdapters(); - const runId = await runRegistry.getRunId(ticketIdentifier) ?? 
"unknown"; - await runRegistry.markFailed(ticketIdentifier, { - runId, error, failedAt: new Date().toISOString(), - }); -} - -// --- Polling helper (not a step — called within the workflow) --- - -async function pollUntilDone( - sandboxId: string, - sentinelFile: string, - maxPollMinutes: number, -): Promise<boolean> { - const { checkPhaseDone } = await import("../sandbox/poll-agent.js"); - const POLL_INTERVAL = "30s"; - const MAX_POLLS = Math.ceil((maxPollMinutes * 60) / 30); - let pollCount = 0; - - while (pollCount < MAX_POLLS) { - await sleep(POLL_INTERVAL); - pollCount++; - const status = await checkPhaseDone(sandboxId, sentinelFile); - if (status === true) return true; - if (status === "stopped") return false; - } - return false; -} - -// --- Main Workflow --- - -const MAX_REVIEW_RETRIES = 2; - -export async function agentWorkflow(ticketId: string) { - "use workflow"; - - const { env } = await import("../../env.js"); - const { getPrompt } = await import("../lib/prompts.js"); - const { buildPhaseScript } = await import("../sandbox/wrapper-script.js"); - const { parseResearchStatus } = await import("../sandbox/agent-runner.js"); - const { parseAgentOutput } = await import("../sandbox/agent-runner.js"); - const { parseReviewOutput, REVIEW_SCHEMA, AGENT_SCHEMA } = await import("../sandbox/agent-runner.js"); - const { assembleResearchPlanContext, assembleImplementationContext, assembleImplementationRetryContext, assembleReviewContext } = - await import("../sandbox/context.js"); - const { collectPhaseOutput } = await import("../sandbox/poll-agent.js"); - const { pushFromSandbox, fixAndRetryPush, teardownSandbox } = await import("../sandbox/poll-agent.js"); - - const ticket = await fetchAndValidateTicket(ticketId, env.COLUMN_AI); - if (!ticket) return; - - try { - await notifySlack(`Task ${ticket.identifier} started`); - - const branchName = `blazebot/${ticket.identifier.toLowerCase()}`; - await createFeatureBranch(branchName, env.GITHUB_BASE_BRANCH); - - // Check for
existing PR (review-fix scenario) - const prContext = await fetchPRContext(branchName); - const mergeBase = prContext?.hasConflicts ? env.GITHUB_BASE_BRANCH : undefined; - - // Provision sandbox once for all phases - const sandboxId = await provisionSandbox(branchName, mergeBase); - - try { - // ========== PHASE 1: Research & Plan ========== - await configureStopHook(sandboxId, false); - - const researchInput = assembleResearchPlanContext({ - ticket: { - identifier: ticket.identifier, - title: ticket.title, - description: ticket.description, - acceptanceCriteria: ticket.acceptanceCriteria, - comments: ticket.comments, - }, - prompt: getPrompt("research-plan.md"), - branchName, - prComments: prContext?.prComments, - checkResults: prContext?.checkResults, - hasConflicts: prContext?.hasConflicts, - }); - - const researchScript = buildPhaseScript({ - model: env.CLAUDE_MODEL, - phase: "research", - inputFile: "/tmp/research-requirements.md", - outputFile: "/tmp/research-stdout.txt", - stderrFile: "/tmp/research-stderr.txt", - sentinelFile: "/tmp/research-done", - }); - - await writeAndStartPhase( - sandboxId, - "/tmp/research-requirements.md", researchInput, - "/tmp/research-wrapper.sh", researchScript, - ); - - const researchDone = await pollUntilDone(sandboxId, "/tmp/research-done", 20); - if (!researchDone) { - await moveTicket(ticketId, env.COLUMN_BACKLOG); - await notifySlack(`Task ${ticket.identifier} failed: research phase timed out`); - await unregisterRun(ticket.identifier); - return; - } - - const researchRaw = await collectPhaseOutput(sandboxId, "/tmp/research-stdout.txt", "/tmp/research-stderr.txt"); - const research = parseResearchStatus(researchRaw); - - if (research.status === "clarification_needed") { - const questions = research.body.split("\n").filter((l) => /^\d+\./.test(l.trim())); - await postClarificationAndMoveBack(ticketId, questions.length > 0 ? 
questions : [research.body], ticket.identifier, env.COLUMN_BACKLOG); - await notifySlack(`Task ${ticket.identifier} needs clarification`); - await unregisterRun(ticket.identifier); - return; - } - - if (research.status === "failed") { - await moveTicket(ticketId, env.COLUMN_BACKLOG); - await notifySlack(`Task ${ticket.identifier} failed: research — ${research.body.slice(0, 200)}`); - await unregisterRun(ticket.identifier); - return; - } - - const researchPlanMarkdown = research.body; - - // ========== PHASE 2 & 3 LOOP ========== - let reviewRetries = 0; - let lastReviewFeedback: ReviewOutput | undefined; - - while (true) { - // ========== PHASE 2: Implementation ========== - await configureStopHook(sandboxId, true); - - const implInput = lastReviewFeedback - ? assembleImplementationRetryContext({ - ticket: { identifier: ticket.identifier, title: ticket.title, description: ticket.description, acceptanceCriteria: ticket.acceptanceCriteria, comments: ticket.comments }, - prompt: getPrompt("implement.md"), - researchPlanMarkdown, - reviewFeedback: lastReviewFeedback, - }) - : assembleImplementationContext({ - ticket: { identifier: ticket.identifier, title: ticket.title, description: ticket.description, acceptanceCriteria: ticket.acceptanceCriteria, comments: ticket.comments }, - prompt: getPrompt("implement.md"), - researchPlanMarkdown, - }); - - const implScript = buildPhaseScript({ - model: env.CLAUDE_MODEL, - phase: "impl", - inputFile: "/tmp/impl-requirements.md", - outputFile: "/tmp/impl-stdout.txt", - stderrFile: "/tmp/impl-stderr.txt", - sentinelFile: "/tmp/impl-done", - jsonSchema: AGENT_SCHEMA, - }); - - await writeAndStartPhase( - sandboxId, - "/tmp/impl-requirements.md", implInput, - "/tmp/impl-wrapper.sh", implScript, - ); - - const implDone = await pollUntilDone(sandboxId, "/tmp/impl-done", 35); - let implOutput: AgentOutput; - - if (implDone) { - const implRaw = await collectPhaseOutput(sandboxId, "/tmp/impl-stdout.txt", "/tmp/impl-stderr.txt"); - 
implOutput = parseAgentOutput(implRaw); - } else { - implOutput = { result: "failed", error: "Implementation phase timed out" }; - } - - if (implOutput.result === "clarification_needed") { - await postClarificationAndMoveBack(ticketId, implOutput.questions ?? [], ticket.identifier, env.COLUMN_BACKLOG); - await notifySlack(`Task ${ticket.identifier} needs clarification`); - await unregisterRun(ticket.identifier); - return; - } - - if (implOutput.result === "failed") { - await moveTicket(ticketId, env.COLUMN_BACKLOG); - await notifySlack(`Task ${ticket.identifier} failed: implementation — ${implOutput.error ?? "unknown"}`); - await unregisterRun(ticket.identifier); - return; - } - - // ========== PHASE 3: Review ========== - await configureStopHook(sandboxId, false); - - const gitDiff = await captureGitDiff(sandboxId); - - const reviewInput = assembleReviewContext({ - ticket: { identifier: ticket.identifier, title: ticket.title, description: ticket.description, acceptanceCriteria: ticket.acceptanceCriteria, comments: ticket.comments }, - prompt: getPrompt("review.md"), - researchPlanMarkdown, - gitDiff, - }); - - const reviewScript = buildPhaseScript({ - model: env.CLAUDE_MODEL, - phase: "review", - inputFile: "/tmp/review-requirements.md", - outputFile: "/tmp/review-stdout.txt", - stderrFile: "/tmp/review-stderr.txt", - sentinelFile: "/tmp/review-done", - jsonSchema: REVIEW_SCHEMA, - }); - - await writeAndStartPhase( - sandboxId, - "/tmp/review-requirements.md", reviewInput, - "/tmp/review-wrapper.sh", reviewScript, - ); - - const reviewDone = await pollUntilDone(sandboxId, "/tmp/review-done", 15); - let reviewOutput: ReviewOutput; - - if (reviewDone) { - const reviewRaw = await collectPhaseOutput(sandboxId, "/tmp/review-stdout.txt", "/tmp/review-stderr.txt"); - reviewOutput = parseReviewOutput(reviewRaw); - } else { - reviewOutput = { result: "failed", error: "Review phase timed out" }; - } - - if (reviewOutput.result === "approved") { - break; // Exit loop → push 
- } - - if (reviewOutput.result === "changes_requested") { - reviewRetries++; - if (reviewRetries > MAX_REVIEW_RETRIES) { - await moveTicket(ticketId, env.COLUMN_BACKLOG); - await notifySlack(`Task ${ticket.identifier} failed: review rejected after ${MAX_REVIEW_RETRIES} retries`); - await unregisterRun(ticket.identifier); - return; - } - lastReviewFeedback = reviewOutput; - continue; // Loop back to Phase 2 - } - - // result === "failed" - await moveTicket(ticketId, env.COLUMN_BACKLOG); - await notifySlack(`Task ${ticket.identifier} failed: review — ${reviewOutput.error ?? "unknown"}`); - await unregisterRun(ticket.identifier); - return; - } - - // ========== POST-PHASES: Push & PR ========== - let pushResult = await pushFromSandbox(sandboxId, branchName); - if (!pushResult.pushed && pushResult.error) { - pushResult = await fixAndRetryPush(sandboxId, branchName, pushResult.error); - } - - if (!pushResult.pushed) { - await moveTicket(ticketId, env.COLUMN_BACKLOG); - await notifySlack(`Task ${ticket.identifier} failed: push failed — ${pushResult.error ?? "unknown"}`); - await unregisterRun(ticket.identifier); - return; - } - - await createPullRequest(branchName, ticket.title, ""); - await moveTicket(ticketId, env.COLUMN_AI_REVIEW); - await notifySlack(`Task ${ticket.identifier} PR ready for review`); - await unregisterRun(ticket.identifier); - } finally { - await teardownSandbox(sandboxId); - } - } catch (err) { - console.error(`Workflow failed for ${ticket.identifier}:`, err); - const moved = await moveTicket(ticketId, env.COLUMN_BACKLOG).then(() => true).catch(() => false); - await notifySlack(`Task ${ticket.identifier} failed: ${(err as Error).message ?? "unknown"}`).catch(() => {}); - if (moved) { - await unregisterRun(ticket.identifier).catch(() => {}); - } else { - await markTicketFailed(ticket.identifier, `Failed to move ticket to backlog: ${(err as Error).message ?? 
"unknown"}`).catch(() => {}); - } - throw err; - } -} -``` - -- [ ] **Step 2: Verify TypeScript compiles** - -Run: `npx tsc --noEmit` -Expected: No type errors - -- [ ] **Step 3: Commit** - -```bash -git add src/workflows/agent.ts -git commit -m "feat: create unified three-phase agentWorkflow" -``` - ---- - -### Task 8: Update dispatch and delete old workflows - -**Files:** -- Modify: `src/lib/dispatch.ts` -- Modify: `src/lib/dispatch.test.ts` -- Delete: `src/workflows/implementation.ts` -- Delete: `src/workflows/review-fix.ts` - -- [ ] **Step 1: Update dispatch.ts to use `agentWorkflow`** - -Replace the workflow imports and `startWorkflow` function: - -```typescript -import { start, getRun } from "workflow/api"; -import { agentWorkflow } from "../workflows/agent.js"; -import { logger } from "./logger.js"; -import type { Adapters } from "./adapters.js"; - -// ... keep CLAIMING_PREFIX, isClaimingSentinel, getClaimTimestamp, DispatchResult, isAtCapacity, getActiveSandboxCount, verifyClaimNotCancelled, abortWorkflow ... 
- -export async function dispatchTicket( - ticketKey: string, - adapters: Adapters, - maxConcurrentAgents: number, -): Promise<DispatchResult> { - const { issueTracker, runRegistry } = adapters; - - if (await runRegistry.isTicketFailed(ticketKey)) { - logger.info({ ticketKey }, "dispatch_skipped_previously_failed"); - return { started: false, reason: "previously_failed" }; - } - - if (await isAtCapacity(maxConcurrentAgents)) { - return { started: false, reason: "at_capacity" }; - } - - const claimValue = `${CLAIMING_PREFIX}${Date.now()}`; - const claimed = await runRegistry.claim(ticketKey, claimValue); - if (!claimed) { - logger.info({ ticketKey }, "dispatch_already_claimed"); - return { started: false, reason: "already_claimed" }; - } - - try { - const ticket = await issueTracker.fetchTicket(ticketKey); - - const handle = await start(agentWorkflow, [ticket.id]); - logger.info( - { ticketId: ticket.id, identifier: ticket.identifier, runId: handle.runId }, - "workflow_started", - ); - - const claimStillHeld = await verifyClaimNotCancelled(ticketKey, claimValue, runRegistry); - if (!claimStillHeld) { - await abortWorkflow(handle.runId, ticketKey); - return { started: false, reason: "already_claimed" }; - } - - await runRegistry.register(ticketKey, handle.runId); - return { started: true, runId: handle.runId }; - } catch (err) { - await runRegistry.unregister(ticketKey).catch(() => {}); - logger.warn({ ticketKey, error: (err as Error).message }, "dispatch_error"); - return { started: false, reason: "error" }; - } -} -``` - -Key changes: removed `vcs` from destructure (no longer needed for PR check), removed `branchName` computation, removed `startWorkflow` helper, always `start(agentWorkflow, [ticket.id])`.
- -- [ ] **Step 2: Update dispatch.test.ts** - -Replace the mock and test assertions: - -```typescript -// Replace the old workflow mocks with: -vi.mock("../workflows/agent.js", () => ({ - agentWorkflow: "agentWorkflow_sentinel", -})); - -// Remove the reviewFixWorkflow mock entirely - -// In "dispatches implementation workflow when no PR exists" test: -// Remove the findPR assertion -// Change: expect(mockStart).toHaveBeenCalledWith("agentWorkflow_sentinel", ["ticket-001"]); - -// Remove "dispatches review-fix workflow when PR exists" test entirely -// (or change it to verify agentWorkflow is still called regardless of PR) - -// In makeAdapters: findPR is no longer needed in dispatch, remove it from overrides -``` - -- [ ] **Step 3: Run tests** - -Run: `npx vitest run src/lib/dispatch.test.ts` -Expected: All tests PASS - -- [ ] **Step 4: Delete old workflow files** - -```bash -rm src/workflows/implementation.ts src/workflows/review-fix.ts -``` - -- [ ] **Step 5: Check for remaining imports of deleted files** - -Run: `npx vitest run` -If any test imports the old workflows, update those imports. 
- -- [ ] **Step 6: Commit** - -```bash -git add src/lib/dispatch.ts src/lib/dispatch.test.ts -git rm src/workflows/implementation.ts src/workflows/review-fix.ts -git commit -m "feat: unify dispatch to single agentWorkflow, delete old workflows" -``` - ---- - -### Task 9: Full test suite and type check - -**Files:** -- All modified files - -- [ ] **Step 1: Run the full test suite** - -Run: `npx vitest run` -Expected: All tests PASS - -- [ ] **Step 2: Run TypeScript type check** - -Run: `npx tsc --noEmit` -Expected: No errors - -- [ ] **Step 3: Run linting if configured** - -Run: `npx eslint src/` (or whatever lint command exists in package.json) -Expected: No errors - -- [ ] **Step 4: Fix any remaining issues** - -If any tests fail or types don't check, fix them and commit: - -```bash -git add -A -git commit -m "fix: resolve test and type issues from workflow migration" -``` - ---- - -### Task 10: Verify reconcile.ts still works with new workflow - -**Files:** -- Read: `src/lib/reconcile.ts` - -- [ ] **Step 1: Check reconcile.ts for references to old workflows** - -`reconcile.ts` uses `getRun()` from the workflow SDK and doesn't import workflow functions directly. Verify it still works: - -Run: `npx vitest run src/lib/reconcile.test.ts` -Expected: PASS — reconcile doesn't care which workflow type was started, only that a `runId` exists. - -- [ ] **Step 2: Commit if any changes were needed** - -```bash -git add src/lib/reconcile.ts src/lib/reconcile.test.ts -git commit -m "fix: update reconcile for unified workflow" -``` - -(Skip if no changes needed.) 
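The `pollUntilDone` helper in the removed plan bounds each phase by converting a minute budget into a fixed number of 30-second polls, and it treats a stopped sandbox as a failure rather than waiting out the budget. Below is a minimal, dependency-free TypeScript sketch of that loop. `pollSentinel`, `PollOptions`, and the injected `sleep`/`check` callbacks are hypothetical names introduced only so the logic can run outside the workflow runtime; they are not part of the plan's API.

```typescript
// Outcome of one sentinel check, mirroring checkPhaseDone in the plan:
// true = sentinel file exists, false = still running, "stopped" = sandbox is gone.
type PhaseStatus = boolean | "stopped";

// Hypothetical options bag; sleep and check are injected so the loop is testable.
interface PollOptions {
  maxPollMinutes: number;
  intervalSeconds: number;
  sleep: (ms: number) => Promise<void>;
  check: () => Promise<PhaseStatus>;
}

// Resolves true if the phase finished within the budget, false on timeout or stop.
async function pollSentinel(opts: PollOptions): Promise<boolean> {
  // For example, a 20-minute budget at 30 s per poll allows ceil(1200 / 30) = 40 polls.
  const maxPolls = Math.ceil((opts.maxPollMinutes * 60) / opts.intervalSeconds);
  for (let poll = 0; poll < maxPolls; poll++) {
    await opts.sleep(opts.intervalSeconds * 1000); // wait before each check
    const status = await opts.check();
    if (status === true) return true; // sentinel written: phase done
    if (status === "stopped") return false; // sandbox gone: stop early
  }
  return false; // budget exhausted
}
```

Injecting `sleep` lets a unit test pass a no-op timer. Inside the workflow, the plan instead calls `sleep` from the `workflow` package with a `"30s"` string. The sketch keeps the same ordering as the plan: it sleeps before the first check, so even a phase that finishes instantly is detected only after one interval.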
diff --git a/docs/superpowers/plans/2026-04-09-gitlab-vcs-adapter.md b/docs/superpowers/plans/2026-04-09-gitlab-vcs-adapter.md deleted file mode 100644 index efc9005..0000000 --- a/docs/superpowers/plans/2026-04-09-gitlab-vcs-adapter.md +++ /dev/null @@ -1,1016 +0,0 @@ -# GitLab VCS Adapter Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Add a `GitLabAdapter` implementing the `VCSAdapter` interface so the system can target GitLab.com repos via a `VCS_KIND=gitlab` env switch. - -**Architecture:** Direct mirror of the existing `GitHubAdapter` — new `gitlab.ts` file alongside `github.ts`, same interface, same error-handling patterns. Factory functions in `adapters.ts` and `step-adapters.ts` branch on `VCS_KIND`. Env schema updated so GitHub vars are optional when `VCS_KIND=gitlab` and vice versa. - -**Tech Stack:** `@gitbeaker/rest` (GitLab API client), Zod (env validation), Vitest (tests) - ---- - -## File Structure - -| File | Action | Responsibility | -|------|--------|---------------| -| `src/adapters/vcs/gitlab.ts` | **Create** | `GitLabAdapter` class (~250 lines) implementing all 8 `VCSAdapter` methods | -| `src/adapters/vcs/gitlab.test.ts` | **Create** | Unit tests with mocked gitbeaker (~10 test cases) | -| `env.ts` | **Modify** | Add `"gitlab"` to `VCS_KIND` enum, add `GITLAB_*` vars, make `GITHUB_*` vars optional | -| `src/lib/adapters.ts` | **Modify** | Branch VCS creation on `VCS_KIND` | -| `src/lib/step-adapters.ts` | **Modify** | Same branching as `adapters.ts` | -| `package.json` | **Modify** | Add `@gitbeaker/rest` dependency | - -No changes to `types.ts`, `workflows/agent.ts`, `sandbox/poll-agent.ts`, or any other consumer. 
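The adapter reports GitLab merge requests through the same `PullRequest` shape used for GitHub PRs. GitLab MRs carry two numbers: a global `id` and a project-scoped `iid`, and MR endpoints such as `/projects/:id/merge_requests/:merge_request_iid` take the `iid`. Below is a minimal sketch of the mapping that the plan's `createPR` and `findPR` perform. It assumes `PullRequest` is `{ id, url, branch }`, as the plan's return values suggest; `toPullRequest` is a hypothetical name.

```typescript
// Subset of the GitLab MR JSON the adapter reads (snake_case, as returned by the API).
interface GitLabMergeRequest {
  id: number; // global ID, unique across the whole GitLab instance
  iid: number; // project-scoped number, shown in the UI as !42
  web_url: string;
  source_branch: string;
}

// Assumed to match the adapter-facing PullRequest type used by GitHubAdapter.
interface PullRequest {
  id: number;
  url: string;
  branch: string;
}

function toPullRequest(mr: GitLabMergeRequest): PullRequest {
  // Store iid, not id, so getPRComments(prId) and friends can pass it
  // straight back to the project-scoped MR endpoints.
  return { id: mr.iid, url: mr.web_url, branch: mr.source_branch };
}
```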
- ---- - -## Task 1: Add `@gitbeaker/rest` dependency - -**Files:** -- Modify: `package.json` - -- [ ] **Step 1: Install the dependency** - -Run: -```bash -npm install @gitbeaker/rest -``` - -- [ ] **Step 2: Verify installation** - -Run: -```bash -node -e "import('@gitbeaker/rest').then(m => console.log('OK:', Object.keys(m).slice(0,3)))" -``` -Expected: prints `OK:` followed by exported names (e.g. `Gitlab`, `Projects`, etc.) - ---- - -## Task 2: Update env schema for GitLab support - -**Files:** -- Modify: `env.ts:17-22` - -The `VCS_KIND` enum expands to `["github", "gitlab"]`. All `GITHUB_*` vars become optional (only required when `VCS_KIND=github`). New `GITLAB_*` vars are added as optional (only required when `VCS_KIND=gitlab`). Runtime validation happens in the factory — not in the schema. - -- [ ] **Step 1: Write a baseline test** - -Create file `src/adapters/vcs/gitlab.test.ts` with a minimal placeholder test. It does not import `env.ts` yet; the schema change itself is checked by the typecheck in Step 5: - -```typescript -import { describe, it, expect } from "vitest"; - -describe("GitLabAdapter env", () => { - it("VCS_KIND enum includes gitlab (placeholder)", () => { - // Placeholder only: this does not parse the env schema. - // The actual env parsing is handled by @t3-oss/env-core at startup. - // The adapter import test comes in Task 3.
- expect(["github", "gitlab"]).toContain("gitlab"); - }); -}); -``` - -Run: -```bash -npx vitest run src/adapters/vcs/gitlab.test.ts -``` -Expected: PASS (this is a baseline test; the real validation is compile-time) - -- [ ] **Step 2: Update the VCS_KIND enum** - -In `env.ts`, change line 18: - -```typescript -// Before: -VCS_KIND: z.enum(["github"]), -``` - -```typescript -// After: -VCS_KIND: z.enum(["github", "gitlab"]), -``` - -- [ ] **Step 3: Make GITHUB_* vars optional** - -In `env.ts`, change lines 19-22: - -```typescript -// Before: -GITHUB_TOKEN: z.string().min(1), -GITHUB_OWNER: z.string().min(1), -GITHUB_REPO: z.string().min(1), -GITHUB_BASE_BRANCH: z.string().default("main"), -``` - -```typescript -// After: -GITHUB_TOKEN: z.string().min(1).optional(), -GITHUB_OWNER: z.string().min(1).optional(), -GITHUB_REPO: z.string().min(1).optional(), -GITHUB_BASE_BRANCH: z.string().default("main"), -``` - -Note: `GITHUB_BASE_BRANCH` keeps its `.default("main")` — it's already effectively optional. - -- [ ] **Step 4: Add GITLAB_* vars** - -In `env.ts`, add after the GitHub vars block (after `GITHUB_BASE_BRANCH`): - -```typescript - // GitLab VCS - GITLAB_TOKEN: z.string().min(1).optional(), - GITLAB_PROJECT_ID: z.string().min(1).optional(), - GITLAB_BASE_BRANCH: z.string().default("main"), -``` - -- [ ] **Step 5: Run typecheck** - -Run: -```bash -npx tsc --noEmit -``` -Expected: the only errors, if any, come from existing code that reads `env.GITHUB_TOKEN`, `env.GITHUB_OWNER`, or `env.GITHUB_REPO`, which now have type `string | undefined`. Task 6 (factory update) fixes those; any other error should be fixed before moving on.
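The "only required when" rule above can be expressed as a small lookup from `VCS_KIND` to the vars it needs. This is a sketch under assumptions: `VcsEnv` and `missingVcsVars` are hypothetical names, and the base-branch vars are left out because the schema already defaults them to `"main"`.

```typescript
// Hypothetical shape: only the parsed-env fields this check reads.
type VcsEnv = {
  VCS_KIND: "github" | "gitlab";
  GITHUB_TOKEN?: string;
  GITHUB_OWNER?: string;
  GITHUB_REPO?: string;
  GITLAB_TOKEN?: string;
  GITLAB_PROJECT_ID?: string;
};

// Vars that become mandatory once a provider is selected.
const REQUIRED_BY_KIND: Record<VcsEnv["VCS_KIND"], Array<keyof VcsEnv>> = {
  github: ["GITHUB_TOKEN", "GITHUB_OWNER", "GITHUB_REPO"],
  gitlab: ["GITLAB_TOKEN", "GITLAB_PROJECT_ID"],
};

// Returns the names of required vars that are unset or empty for the active provider.
function missingVcsVars(env: VcsEnv): string[] {
  return REQUIRED_BY_KIND[env.VCS_KIND].filter((key) => !env[key]);
}
```

A factory could call `missingVcsVars(env)` once at startup and throw if the list is non-empty, reporting every missing var in one error instead of failing on the first API call.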
- ---- - -## Task 3: Implement `GitLabAdapter` — `createBranch` - -**Files:** -- Create: `src/adapters/vcs/gitlab.ts` - -- [ ] **Step 1: Write the failing tests for createBranch** - -Replace the contents of `src/adapters/vcs/gitlab.test.ts` with the following. The Task 2 placeholder can be dropped; appending instead would import `describe`, `it`, and `expect` from `vitest` twice, which is a duplicate-declaration error: - -```typescript -import { describe, it, expect, vi, beforeEach } from "vitest"; -import { GitLabAdapter } from "./gitlab.js"; - -const mockBranches = { - create: vi.fn(), - remove: vi.fn(), - show: vi.fn(), -}; - -const mockRepositoryFiles = { - create: vi.fn(), -}; - -const mockCommits = { - create: vi.fn(), -}; - -const mockMergeRequests = { - create: vi.fn(), - all: vi.fn(), - show: vi.fn(), - allPipelines: vi.fn(), -}; - -const mockMergeRequestNotes = { - all: vi.fn(), -}; - -const mockMergeRequestDiscussions = { - all: vi.fn(), -}; - -const mockJobs = { - all: vi.fn(), - showLog: vi.fn(), -}; - -vi.mock("@gitbeaker/rest", () => ({ - Gitlab: vi.fn(() => ({ - Branches: mockBranches, - RepositoryFiles: mockRepositoryFiles, - Commits: mockCommits, - MergeRequests: mockMergeRequests, - MergeRequestNotes: mockMergeRequestNotes, - MergeRequestDiscussions: mockMergeRequestDiscussions, - Jobs: mockJobs, - })), -})); - -function glAdapter() { - return new GitLabAdapter({ - token: "glpat-xxxxxxxxxxxx", - projectId: "blazity/demo-app", - baseBranch: "main", - }); -} - -describe("GitLabAdapter", () => { - beforeEach(() => { - vi.clearAllMocks(); - }); - - describe("createBranch", () => { - it("creates branch from base ref", async () => { - mockBranches.create.mockResolvedValueOnce({}); - - const adapter = glAdapter(); - await adapter.createBranch("feat/test", "main"); - - expect(mockBranches.create).toHaveBeenCalledWith( - "blazity/demo-app", - "feat/test", - "main", - ); - }); - - it("seeds empty repo on 404 then creates branch", async () => { - const error = new Error("404 Branch Not Found") as any; - error.cause = { response: { status: 404 } }; - mockBranches.create.mockRejectedValueOnce(error); - mockRepositoryFiles.create.mockResolvedValueOnce({
- branch: "main", - }); - // Second create call succeeds after seeding - mockBranches.create.mockResolvedValueOnce({}); - - const adapter = glAdapter(); - await adapter.createBranch("feat/test", "main"); - - expect(mockRepositoryFiles.create).toHaveBeenCalledWith( - "blazity/demo-app", - "README.md", - "main", - "# Repository\n", - "Initial commit", - ); - expect(mockBranches.create).toHaveBeenCalledTimes(2); - }); - - it("force-resets existing branch by deleting and recreating on 400", async () => { - const error = new Error("Branch already exists") as any; - error.cause = { response: { status: 400 } }; - mockBranches.create.mockRejectedValueOnce(error); - mockBranches.remove.mockResolvedValueOnce({}); - // Recreate succeeds - mockBranches.create.mockResolvedValueOnce({}); - - const adapter = glAdapter(); - await adapter.createBranch("feat/test", "main"); - - expect(mockBranches.remove).toHaveBeenCalledWith( - "blazity/demo-app", - "feat/test", - ); - expect(mockBranches.create).toHaveBeenCalledTimes(2); - }); - }); -}); -``` - -Run: -```bash -npx vitest run src/adapters/vcs/gitlab.test.ts -``` -Expected: FAIL — `./gitlab.js` does not exist - -- [ ] **Step 2: Create `gitlab.ts` with config and createBranch** - -Create `src/adapters/vcs/gitlab.ts`: - -```typescript -import { Gitlab } from "@gitbeaker/rest"; -import { FatalError } from "workflow"; -import type { - VCSAdapter, - PullRequest, - PRComment, - CheckRunResult, -} from "./types.js"; - -export interface GitLabConfig { - token: string; - projectId: string; - baseBranch: string; -} - -export class GitLabAdapter implements VCSAdapter { - private gl: InstanceType<typeof Gitlab>; - private projectId: string; - private baseBranch: string; - - constructor(private config: GitLabConfig) { - this.gl = new Gitlab({ token: config.token }); - this.projectId = config.projectId; - this.baseBranch = config.baseBranch; - } - - async createBranch(name: string, base: string): Promise<void> { - try { - await this.gl.Branches.create(this.projectId,
- name, base); - } catch (err: any) { - const status = this.getStatusCode(err); - - if (status === 404) { - // Empty repo — seed with a README, then retry - await this.seedEmptyRepo(base); - await this.gl.Branches.create(this.projectId, name, base); - return; - } - - if (status === 400) { - // Branch already exists — delete and recreate - await this.gl.Branches.remove(this.projectId, name); - await this.gl.Branches.create(this.projectId, name, base); - return; - } - - throw err; - } - } - - private async seedEmptyRepo(branch: string): Promise<void> { - try { - await this.gl.RepositoryFiles.create( - this.projectId, - "README.md", - branch, - "# Repository\n", // file content - "Initial commit", // commit message - ); - } catch (err: any) { - throw new Error( - `Failed to seed empty repository ${this.projectId}: ${err.message}`, - ); - } - } - - private getStatusCode(err: any): number | undefined { - return err?.cause?.response?.status ?? err?.status ?? err?.statusCode; - } - - // Stub methods — implemented in subsequent tasks - async createPR( - _branch: string, - _title: string, - _body: string, - ): Promise<PullRequest> { - throw new Error("Not implemented"); - } - - async push( - _branch: string, - _files: Array<{ path: string; content: string }>, - _options?: { mergeParentSha?: string }, - ): Promise<void> { - throw new Error("Not implemented"); - } - - async getBranchSha(_branch: string): Promise<string> { - throw new Error("Not implemented"); - } - - async getPRComments(_prId: number): Promise<PRComment[]> { - throw new Error("Not implemented"); - } - - async getCheckRunResults(_prId: number): Promise<CheckRunResult[]> { - throw new Error("Not implemented"); - } - - async getPRConflictStatus(_prId: number): Promise<boolean> { - throw new Error("Not implemented"); - } - - async findPR(_branch: string): Promise<PullRequest | null> { - throw new Error("Not implemented"); - } -} -``` - -- [ ] **Step 3: Run tests to verify they pass** - -Run: -```bash -npx vitest run src/adapters/vcs/gitlab.test.ts -``` -Expected: PASS — all 3 createBranch tests pass - ---- - -## Task 4: Implement
`createPR`, `push`, `getBranchSha`, `findPR` - -**Files:** -- Modify: `src/adapters/vcs/gitlab.ts` -- Modify: `src/adapters/vcs/gitlab.test.ts` - -- [ ] **Step 1: Write failing tests for createPR, push, getBranchSha, findPR** - -Add these test blocks inside the `describe("GitLabAdapter", ...)` block in `gitlab.test.ts`: - -```typescript - describe("createPR", () => { - it("creates a merge request", async () => { - mockMergeRequests.create.mockResolvedValueOnce({ - iid: 42, - web_url: "https://gitlab.com/blazity/demo-app/-/merge_requests/42", - }); - - const adapter = glAdapter(); - const pr = await adapter.createPR("feat/test", "Add feature", "Description"); - - expect(pr.id).toBe(42); - expect(pr.url).toContain("/merge_requests/42"); - expect(pr.branch).toBe("feat/test"); - expect(mockMergeRequests.create).toHaveBeenCalledWith( - "blazity/demo-app", - "feat/test", - "main", - "Add feature", - { description: "Description" }, - ); - }); - - it("throws FatalError on 409", async () => { - const error = new Error("MR already exists") as any; - error.cause = { response: { status: 409 } }; - mockMergeRequests.create.mockRejectedValueOnce(error); - - const adapter = glAdapter(); - await expect( - adapter.createPR("feat/test", "Title", "Body"), - ).rejects.toThrow("MR already exists"); - }); - - it("throws FatalError on 404", async () => { - const error = new Error("Project not found") as any; - error.cause = { response: { status: 404 } }; - mockMergeRequests.create.mockRejectedValueOnce(error); - - const adapter = glAdapter(); - await expect( - adapter.createPR("feat/test", "Title", "Body"), - ).rejects.toThrow("Project not found"); - }); - }); - - describe("push", () => { - it("creates a commit with file actions", async () => { - mockCommits.create.mockResolvedValueOnce({}); - - const adapter = glAdapter(); - await adapter.push("feat/test", [ - { path: "src/index.ts", content: "console.log('hello');" }, - { path: "src/utils.ts", content: "export const add = (a: number, 
- b: number) => a + b;" }, - ]); - - expect(mockCommits.create).toHaveBeenCalledWith( - "blazity/demo-app", - "feat/test", - "feat: agent implementation", - [ - { action: "update", filePath: "src/index.ts", content: "console.log('hello');" }, - { action: "update", filePath: "src/utils.ts", content: "export const add = (a: number, b: number) => a + b;" }, - ], - ); - }); - }); - - describe("getBranchSha", () => { - it("returns the commit SHA of a branch", async () => { - mockBranches.show.mockResolvedValueOnce({ - commit: { id: "abc123def456" }, - }); - - const adapter = glAdapter(); - const sha = await adapter.getBranchSha("feat/test"); - - expect(sha).toBe("abc123def456"); - expect(mockBranches.show).toHaveBeenCalledWith( - "blazity/demo-app", - "feat/test", - ); - }); - }); - - describe("findPR", () => { - it("returns null when no MR exists", async () => { - mockMergeRequests.all.mockResolvedValueOnce([]); - - const adapter = glAdapter(); - const pr = await adapter.findPR("feat/test"); - expect(pr).toBeNull(); - }); - - it("returns MR when one exists", async () => { - mockMergeRequests.all.mockResolvedValueOnce([ - { - iid: 42, - web_url: "https://gitlab.com/blazity/demo-app/-/merge_requests/42", - source_branch: "feat/test", - }, - ]); - - const adapter = glAdapter(); - const pr = await adapter.findPR("feat/test"); - expect(pr).not.toBeNull(); - expect(pr!.id).toBe(42); - expect(pr!.branch).toBe("feat/test"); - }); - }); -``` - -Run: -```bash -npx vitest run src/adapters/vcs/gitlab.test.ts -``` -Expected: FAIL — "Not implemented" errors from stub methods - -- [ ] **Step 2: Implement createPR** - -In `gitlab.ts`, replace the `createPR` stub with: - -```typescript - async createPR( - branch: string, - title: string, - body: string, - ): Promise<PullRequest> { - try { - const mr = await this.gl.MergeRequests.create( - this.projectId, - branch, - this.baseBranch, - title, - { description: body }, - ); - return { id: mr.iid, url: mr.web_url, branch }; - } catch (err: any) { - const
- status = this.getStatusCode(err); - if (status === 409 || status === 404) { - throw new FatalError(err.message); - } - throw err; - } - } -``` - -- [ ] **Step 3: Implement push** - -In `gitlab.ts`, replace the `push` stub with: - -```typescript - async push( - branch: string, - files: Array<{ path: string; content: string }>, - _options?: { mergeParentSha?: string }, - ): Promise<void> { - const actions = files.map((f) => ({ - action: "update" as const, - filePath: f.path, - content: f.content, - })); - - await this.gl.Commits.create( - this.projectId, - branch, - "feat: agent implementation", - actions, - ); - } -``` - -Note: `mergeParentSha` is intentionally ignored per the spec — GitLab's Commits API doesn't support multi-parent commits. The workflow's conflict resolution flow handles this by recreating the branch from base when conflicts are detected. GitLab also rejects an `update` action for a file that does not yet exist on the branch (new files need `create`), so pushes that add files will fail as written. - -- [ ] **Step 4: Implement getBranchSha** - -In `gitlab.ts`, replace the `getBranchSha` stub with: - -```typescript - async getBranchSha(branch: string): Promise<string> { - const data = await this.gl.Branches.show(this.projectId, branch); - return data.commit.id; - } -``` - -- [ ] **Step 5: Implement findPR** - -In `gitlab.ts`, replace the `findPR` stub with: - -```typescript - async findPR(branch: string): Promise<PullRequest | null> { - const mrs = await this.gl.MergeRequests.all({ - projectId: this.projectId, - sourceBranch: branch, - state: "opened", - }); - if (mrs.length === 0) return null; - const mr = mrs[0]; - return { id: mr.iid, url: mr.web_url, branch: mr.source_branch }; - } -``` - -- [ ] **Step 6: Run tests to verify they pass** - -Run: -```bash -npx vitest run src/adapters/vcs/gitlab.test.ts -``` -Expected: PASS — all tests pass including new ones - ---- - -## Task 5: Implement `getPRComments`, `getCheckRunResults`, `getPRConflictStatus` - -**Files:** -- Modify: `src/adapters/vcs/gitlab.ts` -- Modify: `src/adapters/vcs/gitlab.test.ts` - -- [ ] **Step 1: Write failing tests** - -Add these test blocks inside
`describe("GitLabAdapter", ...)` in `gitlab.test.ts`: - -```typescript - describe("getPRComments", () => { - it("combines discussion notes and general notes", async () => { - mockMergeRequestDiscussions.all.mockResolvedValueOnce([ - { - notes: [ - { - author: { username: "reviewer1" }, - body: "Inline comment on line 10", - system: false, - type: "DiffNote", - position: { new_path: "src/index.ts", new_line: 10 }, - }, - ], - }, - ]); - mockMergeRequestNotes.all.mockResolvedValueOnce([ - { - author: { username: "reviewer2" }, - body: "General comment", - system: false, - type: null, - }, - ]); - - const adapter = glAdapter(); - const comments = await adapter.getPRComments(42); - - expect(comments).toHaveLength(2); - expect(comments[0]).toEqual({ - author: "reviewer1", - body: "Inline comment on line 10", - liked: false, - filePath: "src/index.ts", - startLine: 10, - endLine: 10, - }); - expect(comments[1]).toEqual({ - author: "reviewer2", - body: "General comment", - liked: false, - }); - }); - }); - - describe("getCheckRunResults", () => { - it("maps GitLab CI job statuses to CheckRunResult", async () => { - mockMergeRequests.show.mockResolvedValueOnce({ sha: "head-sha-123" }); - mockMergeRequests.allPipelines.mockResolvedValueOnce([ - { id: 100, status: "failed" }, - ]); - mockJobs.all.mockResolvedValueOnce([ - { id: 1, name: "lint", status: "success" }, - { id: 2, name: "test", status: "failed" }, - { id: 3, name: "build", status: "running" }, - ]); - mockJobs.showLog.mockResolvedValueOnce("Error: test failed on line 42"); - - const adapter = glAdapter(); - const results = await adapter.getCheckRunResults(42); - - expect(results).toHaveLength(3); - expect(results[0]).toEqual({ - name: "lint", - status: "completed", - conclusion: "success", - }); - expect(results[1]).toEqual({ - name: "test", - status: "completed", - conclusion: "failure", - logs: "Error: test failed on line 42", - }); - expect(results[2]).toEqual({ - name: "build", - status: "in_progress", - 
- conclusion: null, - }); - }); - }); - - describe("getPRConflictStatus", () => { - it("returns true when MR has conflicts", async () => { - mockMergeRequests.show.mockResolvedValueOnce({ has_conflicts: true }); - - const adapter = glAdapter(); - const hasConflicts = await adapter.getPRConflictStatus(42); - expect(hasConflicts).toBe(true); - }); - - it("returns false when MR has no conflicts", async () => { - mockMergeRequests.show.mockResolvedValueOnce({ has_conflicts: false }); - - const adapter = glAdapter(); - const hasConflicts = await adapter.getPRConflictStatus(42); - expect(hasConflicts).toBe(false); - }); - }); -``` - -Run: -```bash -npx vitest run src/adapters/vcs/gitlab.test.ts -``` -Expected: FAIL — "Not implemented" errors from stub methods - -- [ ] **Step 2: Implement getPRComments** - -In `gitlab.ts`, replace the `getPRComments` stub with: - -```typescript - async getPRComments(prId: number): Promise<PRComment[]> { - const comments: PRComment[] = []; - - // Fetch inline/diff comments from discussions - const discussions = await this.gl.MergeRequestDiscussions.all( - this.projectId, - prId, - ); - for (const discussion of discussions) { - for (const note of discussion.notes ?? []) { - if (note.system) continue; - if (note.type !== "DiffNote") continue; - comments.push({ - author: note.author?.username ?? "unknown", - body: note.body ?? "", - liked: false, - filePath: note.position?.new_path, - startLine: note.position?.new_line, - endLine: note.position?.new_line, - }); - } - } - - // Fetch general (non-inline, non-system) notes - const notes = await this.gl.MergeRequestNotes.all(this.projectId, prId); - for (const note of notes) { - if (note.system) continue; - if (note.type === "DiffNote") continue; // already captured above - comments.push({ - author: note.author?.username ?? "unknown", - body: note.body ??
- "", - liked: false, - }); - } - - return comments; - } -``` - -- [ ] **Step 3: Implement getCheckRunResults** - -In `gitlab.ts`, replace the `getCheckRunResults` stub with: - -```typescript - async getCheckRunResults(prId: number): Promise<CheckRunResult[]> { - const mr = await this.gl.MergeRequests.show(this.projectId, prId); - const pipelines = await this.gl.MergeRequests.allPipelines( - this.projectId, - prId, - ); - - if (pipelines.length === 0) return []; - - // Use the most recent pipeline - const latestPipeline = pipelines[0]; - const jobs = await this.gl.Jobs.all(this.projectId, { pipelineId: latestPipeline.id }); - - const results: CheckRunResult[] = []; - for (const job of jobs) { - const mapped = this.mapJobStatus(job.status); - const entry: CheckRunResult = { - name: job.name, - status: mapped.status, - conclusion: mapped.conclusion, - }; - - // Fetch logs for failed jobs - if ( - mapped.status === "completed" && - mapped.conclusion !== "success" && - mapped.conclusion !== null && - mapped.conclusion !== "skipped" && - mapped.conclusion !== "cancelled" - ) { - try { - const log = await this.gl.Jobs.showLog(this.projectId, job.id); - entry.logs = String(log); - } catch { - // Log fetching is best-effort - } - } - - results.push(entry); - } - - return results; - } - - private mapJobStatus( - status: string, - ): Pick<CheckRunResult, "status" | "conclusion"> { - switch (status) { - case "success": - return { status: "completed", conclusion: "success" }; - case "failed": - return { status: "completed", conclusion: "failure" }; - case "running": - return { status: "in_progress", conclusion: null }; - case "pending": - case "created": - return { status: "queued", conclusion: null }; - case "canceled": - return { status: "completed", conclusion: "cancelled" }; - case "skipped": - return { status: "completed", conclusion: "skipped" }; - default: - return { status: "queued", conclusion: null }; - } - } -``` - -- [ ] **Step 4: Implement getPRConflictStatus** - -In `gitlab.ts`, replace the `getPRConflictStatus` stub with: -
- -```typescript - async getPRConflictStatus(prId: number): Promise<boolean> { - const mr = await this.gl.MergeRequests.show(this.projectId, prId); - return mr.has_conflicts === true; - } -``` - -- [ ] **Step 5: Run tests to verify they pass** - -Run: -```bash -npx vitest run src/adapters/vcs/gitlab.test.ts -``` -Expected: PASS — all tests pass - ---- - -## Task 6: Update factory functions in adapters.ts and step-adapters.ts - -**Files:** -- Modify: `src/lib/adapters.ts:1-42` -- Modify: `src/lib/step-adapters.ts:1-42` - -- [ ] **Step 1: Update `adapters.ts`** - -Add the import at the top of `src/lib/adapters.ts`: - -```typescript -import { GitLabAdapter } from "../adapters/vcs/gitlab.js"; -``` - -Then extract VCS creation into a `createVCS()` helper (importing the `VCSAdapter` type if the file does not already): - -```typescript -function createVCS(): VCSAdapter { - if (env.VCS_KIND === "gitlab") { - return new GitLabAdapter({ - token: env.GITLAB_TOKEN!, - projectId: env.GITLAB_PROJECT_ID!, - baseBranch: env.GITLAB_BASE_BRANCH ?? "main", - }); - } - return new GitHubAdapter({ - token: env.GITHUB_TOKEN!, - owner: env.GITHUB_OWNER!, - repo: env.GITHUB_REPO!, - baseBranch: env.GITHUB_BASE_BRANCH ??
- "main", - }); -} -``` - -And update the `createAdapters` return to use it: - -```typescript -export function createAdapters(): Adapters { - return { - issueTracker: new JiraAdapter({ - baseUrl: env.JIRA_BASE_URL, - email: env.JIRA_EMAIL, - apiToken: env.JIRA_API_TOKEN, - projectKey: env.JIRA_PROJECT_KEY, - }), - vcs: createVCS(), - messaging: new ChatSDKAdapter({ - slackToken: env.CHAT_SDK_SLACK_TOKEN, - channelId: env.CHAT_SDK_CHANNEL_ID, - botName: env.CHAT_SDK_BOT_NAME, - }), - runRegistry: new UpstashRunRegistry({ - url: env.AI_WORKFLOW_KV_REST_API_URL, - token: env.AI_WORKFLOW_KV_REST_API_TOKEN, - }), - }; -} -``` - -- [ ] **Step 2: Update `step-adapters.ts`** - -Apply the identical change to `src/lib/step-adapters.ts`: - -Add the import: -```typescript -import { GitLabAdapter } from "../adapters/vcs/gitlab.js"; -``` - -Add the same `createVCS()` helper (duplicate is fine — these files are independent entry points): - -```typescript -function createVCS(): VCSAdapter { - if (env.VCS_KIND === "gitlab") { - return new GitLabAdapter({ - token: env.GITLAB_TOKEN!, - projectId: env.GITLAB_PROJECT_ID!, - baseBranch: env.GITLAB_BASE_BRANCH ?? "main", - }); - } - return new GitHubAdapter({ - token: env.GITHUB_TOKEN!, - owner: env.GITHUB_OWNER!, - repo: env.GITHUB_REPO!, - baseBranch: env.GITHUB_BASE_BRANCH ?? "main", - }); -} -``` - -Update `createStepAdapters()` to use `vcs: createVCS()`. - -- [ ] **Step 3: Run typecheck** - -Run: -```bash -npx tsc --noEmit -``` -Expected: PASS — no type errors. The `!` non-null assertions follow the spec's choice of not validating provider-specific vars with a Zod refine. Note that `!` only silences the compiler and performs no runtime check, so a missing `GITLAB_TOKEN` surfaces later as an API error rather than at startup.
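Because `env.GITLAB_TOKEN!` only changes the static type, an unset variable reaches the GitLab client as `undefined`. If a startup failure is preferred, a helper that narrows and throws can stand in for `!`. This is a minimal sketch: `requireEnv` and `gitLabConfigFrom` are hypothetical names, while `GitLabConfig` mirrors the interface from Task 3.

```typescript
// Mirrors GitLabConfig from src/adapters/vcs/gitlab.ts (Task 3).
interface GitLabConfig {
  token: string;
  projectId: string;
  baseBranch: string;
}

// Hypothetical helper: narrows string | undefined to string and fails loudly
// at runtime, unlike `!`, which only affects type checking.
function requireEnv(name: string, value: string | undefined): string {
  if (value === undefined || value === "") {
    throw new Error(`${name} is required for the selected VCS_KIND`);
  }
  return value;
}

// Hypothetical: builds the GitLab config from the parsed env object.
function gitLabConfigFrom(env: {
  GITLAB_TOKEN?: string;
  GITLAB_PROJECT_ID?: string;
  GITLAB_BASE_BRANCH?: string;
}): GitLabConfig {
  return {
    token: requireEnv("GITLAB_TOKEN", env.GITLAB_TOKEN),
    projectId: requireEnv("GITLAB_PROJECT_ID", env.GITLAB_PROJECT_ID),
    baseBranch: env.GITLAB_BASE_BRANCH ?? "main",
  };
}
```

Inside `createVCS()`, `new GitLabAdapter(gitLabConfigFrom(env))` would then fail on the first adapter construction with a message naming the missing var.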
- -- [ ] **Step 4: Run all unit tests** - -Run: -```bash -npx vitest run -``` -Expected: PASS — all existing tests plus new GitLab tests pass - ---- - -## Task 7: Final verification - -**Files:** (none — read-only checks) - -- [ ] **Step 1: Run full test suite** - -Run: -```bash -npx vitest run -``` -Expected: All tests PASS - -- [ ] **Step 2: Run typecheck** - -Run: -```bash -npx tsc --noEmit -``` -Expected: PASS - -- [ ] **Step 3: Verify file inventory matches spec** - -Confirm these files were created/modified: - -| File | Expected | -|------|----------| -| `src/adapters/vcs/gitlab.ts` | New — ~250 lines | -| `src/adapters/vcs/gitlab.test.ts` | New — ~10 test cases | -| `env.ts` | Modified — `VCS_KIND` enum, `GITLAB_*` vars, `GITHUB_*` optional | -| `src/lib/adapters.ts` | Modified — `createVCS()` helper | -| `src/lib/step-adapters.ts` | Modified — `createVCS()` helper | -| `package.json` | Modified — `@gitbeaker/rest` added | - -Run: -```bash -git diff --stat main -``` diff --git a/docs/superpowers/plans/2026-04-13-jira-ticket-attachments.md b/docs/superpowers/plans/2026-04-13-jira-ticket-attachments.md deleted file mode 100644 index 89d2b30..0000000 --- a/docs/superpowers/plans/2026-04-13-jira-ticket-attachments.md +++ /dev/null @@ -1,1841 +0,0 @@ -# Jira Ticket Attachments Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Make Jira ticket file attachments available to the agent inside the sandbox at `/tmp/attachments/` and advertise them via an index in `requirements.md`, so the agent can read mockups, spec PDFs, sample fixtures, and screenshots during research, implementation, and review. - -**Architecture:** Extend `JiraAdapter` to surface attachment metadata and download bytes. 
Add a pure helper module `src/sandbox/attachments.ts` containing the retry loop, filename sanitizer, index formatter, and byte formatter. Wire two new workflow steps into `agentWorkflow`: `fetchAttachments` runs before `provisionSandbox`, and `writeAttachments` runs as the first action inside the sandbox, before Phase 1. Thread a `DownloadedAttachment[]` parameter through the four `assembleXContext` functions so an `## Attachments` section is rendered once per phase. Safety caps and timeouts come from new env vars. - -**Tech Stack:** TypeScript, Vitest, `@vercel/sandbox` (writeFiles), `@t3-oss/env-core` + Zod (env), pino (logging), native `fetch` + `AbortSignal` (downloads). - ---- - -## File Structure - -| File | Action | Responsibility | -|------|--------|---------------| -| `src/adapters/issue-tracker/types.ts` | **Modify** | Add `TicketAttachment` interface; add `attachments: TicketAttachment[]` to `TicketContent`. | -| `src/adapters/issue-tracker/jira.ts` | **Modify** | Request `attachment` field in `fetchTicket`; map `data.fields.attachment` to `TicketAttachment[]`; add `downloadAttachment(url)` with manual-redirect auth-stripping and timeout. | -| `src/adapters/issue-tracker/jira.test.ts` | **Modify** | Extend tests: attachment parsing (present + absent), `downloadAttachment` follows one 302 and drops `Authorization` on the redirect. | -| `src/sandbox/attachments.ts` | **Create** | Pure helpers: `sanitizeFilename`, `formatBytes`, `formatAttachmentsIndex`, and the retry/caps loop `fetchAttachmentsWithRetry`. Exports `DownloadedAttachment` type. | -| `src/sandbox/attachments.test.ts` | **Create** | Unit tests for all four exports. | -| `src/sandbox/context.ts` | **Modify** | Accept optional `attachments` on all four `assembleXContext` functions; inject `## Attachments` section after the Ticket header block. | -| `src/sandbox/context.test.ts` | **Modify** | Add cases for attachment index rendering (present / empty / all-failed / mixed).
| -| `src/workflows/agent.ts` | **Modify** | Two new `"use step"` functions (`fetchAttachments` before `provisionSandbox`, `writeAttachments` as the first action inside the sandbox `try {}`); forward the result into all four `assembleXContext` calls. | -| `env.ts` | **Modify** | Four new server vars: `ATTACHMENT_MAX_FILE_SIZE_MB`, `ATTACHMENT_MAX_TOTAL_SIZE_MB`, `ATTACHMENT_MAX_COUNT`, `ATTACHMENT_DOWNLOAD_TIMEOUT_MS`. | - -No changes to VCS adapters, run registry, Slack messaging, or sandbox manager. - ---- - -## Shared Types (referenced by multiple tasks) - -Defined in Task 2 (`TicketAttachment`) and Task 4 (`DownloadedAttachment`). Reproduced here so steps that use them later don't have to repeat the shape: - -```ts -// src/adapters/issue-tracker/types.ts -export interface TicketAttachment { - id: string; - filename: string; - mimeType: string; - size: number; - contentUrl: string; -} -``` - -```ts -// src/sandbox/attachments.ts -export interface DownloadedAttachment { - filename: string; // sanitized, collision-resolved - originalFilename: string; - mimeType: string; - size: number; - content?: Buffer; // present only on success - failed?: { reason: string; attempts: number }; // present only on failure -} - -export interface AttachmentCaps { - maxFileSizeBytes: number; - maxTotalSizeBytes: number; - maxCount: number; - downloadTimeoutMs: number; -} -``` - ---- - -## Task 1: Add safety-cap env vars - -**Files:** -- Modify: `env.ts` - -- [ ] **Step 1: Add the four new vars to the `server` block** - -In `env.ts`, inside the `server: { ... 
}` object in `createEnv(...)`, add the following entries (place them after the existing `Sandbox` group, before `POLL_INTERVAL_MS`): - -```ts - // Attachments - ATTACHMENT_MAX_FILE_SIZE_MB: z.coerce.number().int().positive().default(25), - ATTACHMENT_MAX_TOTAL_SIZE_MB: z.coerce.number().int().positive().default(100), - ATTACHMENT_MAX_COUNT: z.coerce.number().int().positive().default(20), - ATTACHMENT_DOWNLOAD_TIMEOUT_MS: z.coerce.number().int().positive().default(30_000), -``` - -- [ ] **Step 2: Verify typecheck passes** - -Run: `npm run typecheck` -Expected: exits 0 with no errors. - -- [ ] **Step 3: Verify env loads with defaults** - -Run: -```bash -node -e "import('./env.ts').then(m => console.log({ - file: m.env.ATTACHMENT_MAX_FILE_SIZE_MB, - total: m.env.ATTACHMENT_MAX_TOTAL_SIZE_MB, - count: m.env.ATTACHMENT_MAX_COUNT, - timeout: m.env.ATTACHMENT_DOWNLOAD_TIMEOUT_MS, -}))" -``` -Expected: prints `{ file: 25, total: 100, count: 20, timeout: 30000 }` (or the overrides if already set in the environment). This assumes a Node version that can import `.ts` files directly and that the other required env vars are set, since `createEnv` validates the whole schema on import. - ---- - -## Task 2: Add `TicketAttachment` type and extend `TicketContent` - -**Files:** -- Modify: `src/adapters/issue-tracker/types.ts` - -Two targeted edits — add the `TicketAttachment` interface after `TicketComment`, and add a new `attachments` field to `TicketContent`. Do **not** replace the whole file; targeted edits are safer against concurrent changes.
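For orientation, Task 3 maps each entry of Jira's `fields.attachment` array onto this type. Below is a minimal sketch of that mapping. It assumes the Jira REST attachment object exposes `id`, `filename`, `mimeType`, `size`, and a `content` download URL, the same shape the Task 3 test fixtures use; `mapJiraAttachments` is a hypothetical name.

```typescript
// Same shape as the TicketAttachment interface added in this task.
interface TicketAttachment {
  id: string;
  filename: string;
  mimeType: string;
  size: number;
  contentUrl: string;
}

// Subset of a Jira attachment object under fields.attachment;
// `content` holds the download URL.
interface JiraAttachmentJson {
  id: string;
  filename: string;
  mimeType: string;
  size: number;
  content: string;
}

// Hypothetical helper: an absent field (no attachments) maps to [].
function mapJiraAttachments(raw: JiraAttachmentJson[] | undefined): TicketAttachment[] {
  return (raw ?? []).map((a) => ({
    id: a.id,
    filename: a.filename,
    mimeType: a.mimeType,
    size: a.size,
    contentUrl: a.content, // renamed to make the field's purpose explicit
  }));
}
```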
- - - [ ] **Step 1: Add `attachments: TicketAttachment[]` to `TicketContent`** - - In `src/adapters/issue-tracker/types.ts`, in the `TicketContent` interface, add a final field immediately after `trackerStatus: string;`: - -```ts - attachments: TicketAttachment[]; -``` - - The interface now reads (for reference): - -```ts -export interface TicketContent { - id: string; - identifier: string; - title: string; - description: string; - acceptanceCriteria: string; - comments: TicketComment[]; - labels: string[]; - trackerStatus: string; - attachments: TicketAttachment[]; -} -``` - - - [ ] **Step 2: Add the `TicketAttachment` interface** - - In the same file, insert the following immediately after the `TicketComment` interface (and before `IssueTrackerAdapter`): - -```ts -export interface TicketAttachment { - id: string; - filename: string; - mimeType: string; - size: number; - contentUrl: string; -} -``` - - - [ ] **Step 3: Run typecheck** - -Run: `npm run typecheck` -Expected: FAIL — the existing `JiraAdapter.fetchTicket` does not populate `attachments`, so TypeScript will flag it. This is expected. Task 3 fixes it. - - --- - - ## Task 3: Parse attachment metadata in `JiraAdapter.fetchTicket` - - **Files:** - - Modify: `src/adapters/issue-tracker/jira.ts:41-61` - - Modify: `src/adapters/issue-tracker/jira.test.ts` - - - [ ] **Step 1: Write the failing test** - - Add this `describe` block inside `describe("JiraAdapter", () => { ... })` in `src/adapters/issue-tracker/jira.test.ts`, immediately after the existing `describe("fetchTicket", () => { ...
})`: - -```ts - describe("fetchTicket attachments", () => { - it("parses attachment metadata into TicketAttachment[]", async () => { - mockFetch.mockResolvedValueOnce({ - ok: true, - json: async () => ({ - id: "10001", - key: "PROJ-1", - fields: { - summary: "Has attachments", - description: null, - comment: { comments: [] }, - labels: [], - status: { name: "AI" }, - attachment: [ - { - id: "att-1", - filename: "mockup.png", - mimeType: "image/png", - size: 348192, - content: "https://test.atlassian.net/secure/attachment/att-1/mockup.png", - }, - { - id: "att-2", - filename: "spec.pdf", - mimeType: "application/pdf", - size: 52100, - content: "https://test.atlassian.net/secure/attachment/att-2/spec.pdf", - }, - ], - }, - }), - }); - - const adapter = jiraAdapter(); - const ticket = await adapter.fetchTicket("10001"); - - expect(ticket.attachments).toHaveLength(2); - expect(ticket.attachments[0]).toEqual({ - id: "att-1", - filename: "mockup.png", - mimeType: "image/png", - size: 348192, - contentUrl: "https://test.atlassian.net/secure/attachment/att-1/mockup.png", - }); - }); - - it("returns empty attachments array when field is absent", async () => { - mockFetch.mockResolvedValueOnce({ - ok: true, - json: async () => ({ - id: "10002", - key: "PROJ-2", - fields: { - summary: "No attachments", - description: null, - comment: { comments: [] }, - labels: [], - status: { name: "AI" }, - // attachment field intentionally omitted - }, - }), - }); - - const adapter = jiraAdapter(); - const ticket = await adapter.fetchTicket("10002"); - expect(ticket.attachments).toEqual([]); - }); - - it("requests attachment field in the fields query", async () => { - mockFetch.mockResolvedValueOnce({ - ok: true, - json: async () => ({ - id: "10003", - key: "PROJ-3", - fields: { - summary: "x", - description: null, - comment: { comments: [] }, - labels: [], - status: { name: "AI" }, - attachment: [], - }, - }), - }); - - const adapter = jiraAdapter(); - await adapter.fetchTicket("10003"); 
- const url = mockFetch.mock.calls[0][0] as string; - expect(url).toContain("fields="); - expect(url).toContain("attachment"); - }); - }); -``` - -Also extend the existing "returns normalized ticket content" test so it doesn't break — the `TicketContent` type now requires `attachments`. Add `attachment: []` to the `fields` object in that test's mock response, and add: - -```ts - expect(ticket.attachments).toEqual([]); -``` - -before the closing `});` of the `it` block. - - - [ ] **Step 2: Run the tests to verify they fail** - -Run: `npx vitest run src/adapters/issue-tracker/jira.test.ts` -Expected: FAIL — the new tests report `ticket.attachments` is `undefined`. - - - [ ] **Step 3: Update `fetchTicket` in `jira.ts`** - - In `src/adapters/issue-tracker/jira.ts`, replace the `fetchTicket` method (currently at lines 41-61) with: - -```ts - async fetchTicket(id: string): Promise<TicketContent> { - const data = await this.request( - `/rest/api/3/issue/${id}?fields=summary,description,comment,labels,status,attachment`, - ); - return { - id: data.id, - identifier: data.key, - title: data.fields.summary ?? "", - description: extractAdfText(data.fields.description), - acceptanceCriteria: extractAcceptanceCriteria(data.fields.description), - comments: (data.fields.comment?.comments ?? []).map( - (c: any): TicketComment => ({ - author: c.author?.displayName ?? "unknown", - body: extractAdfText(c.body), - createdAt: c.created, - }), - ), - labels: data.fields.labels ?? [], - trackerStatus: data.fields.status?.name ?? "", - attachments: (data.fields.attachment ?? []).map( - (a: any): TicketAttachment => ({ - id: String(a.id), - filename: a.filename ?? "", - mimeType: a.mimeType ?? "application/octet-stream", - size: Number(a.size ?? 0), - contentUrl: a.content ??
"", - }), - ), - }; - } -``` - - Update the existing type import at the top of the file to: - -```ts -import type { IssueTrackerAdapter, TicketContent, TicketComment, TicketAttachment } from "./types.js"; -``` - - - [ ] **Step 4: Run the tests to verify they pass** - -Run: `npx vitest run src/adapters/issue-tracker/jira.test.ts` -Expected: PASS — all attachment tests and the updated existing test pass. - - - [ ] **Step 5: Run typecheck** - -Run: `npm run typecheck` -Expected: exits 0. - - --- - - ## Task 4: Add `downloadAttachment` to `JiraAdapter` - - **Files:** - - Modify: `src/adapters/issue-tracker/jira.ts` - - Modify: `src/adapters/issue-tracker/jira.test.ts` - - This task adds a raw-bytes downloader with two quirks: -1. Atlassian's attachment URL returns a 302 to a signed CDN URL. If we re-send `Authorization: Basic ...` on the redirect, the CDN rejects the request because its signed URL is the auth. We must follow one redirect **manually** with a fresh request that omits the `Authorization` header. -2. We bound the whole operation (both hops) with a timeout via `AbortSignal.timeout(ms)`, defaulting to 30s and overridable through `opts.timeoutMs`. - - - [ ] **Step 1: Write the failing tests** - - Add this `describe` block in `src/adapters/issue-tracker/jira.test.ts`, after the `describe("fetchTicket attachments", ...)` block: - -```ts - describe("downloadAttachment", () => { - it("follows one 302 redirect without Authorization header", async () => { - const redirectUrl = "https://atlassian-cdn.example/signed?x=1"; - mockFetch - .mockResolvedValueOnce({ - ok: false, - status: 302, - statusText: "Found", - headers: { get: (n: string) => (n.toLowerCase() === "location" ?
redirectUrl : null) }, - }) - .mockResolvedValueOnce({ - ok: true, - status: 200, - statusText: "OK", - arrayBuffer: async () => new Uint8Array([0x89, 0x50, 0x4e, 0x47]).buffer, - }); - - const adapter = jiraAdapter(); - const buf = await adapter.downloadAttachment( - "https://test.atlassian.net/secure/attachment/att-1/mockup.png", - ); - - expect(buf).toBeInstanceOf(Buffer); - expect(buf.length).toBe(4); - expect(mockFetch).toHaveBeenCalledTimes(2); - - // First call: to Jira, with Authorization. - const firstInit = mockFetch.mock.calls[0][1] as RequestInit; - expect((firstInit.headers as Record<string, string>).Authorization).toMatch(/^Basic /); - expect(firstInit.redirect).toBe("manual"); - - // Second call: to the CDN, WITHOUT Authorization. - const secondInit = mockFetch.mock.calls[1][1] as RequestInit; - const secondHeaders = (secondInit.headers ?? {}) as Record<string, string>; - expect(secondHeaders.Authorization).toBeUndefined(); - expect(mockFetch.mock.calls[1][0]).toBe(redirectUrl); - }); - - it("returns bytes directly on 200 (no redirect)", async () => { - mockFetch.mockResolvedValueOnce({ - ok: true, - status: 200, - statusText: "OK", - arrayBuffer: async () => new Uint8Array([1, 2, 3]).buffer, - }); - - const adapter = jiraAdapter(); - const buf = await adapter.downloadAttachment( - "https://test.atlassian.net/secure/attachment/att-1/data.bin", - ); - expect(Array.from(buf)).toEqual([1, 2, 3]); - expect(mockFetch).toHaveBeenCalledTimes(1); - }); - - it("throws on non-2xx, non-302 responses", async () => { - mockFetch.mockResolvedValueOnce({ - ok: false, - status: 500, - statusText: "Internal Server Error", - headers: { get: () => null }, - }); - - const adapter = jiraAdapter(); - await expect( - adapter.downloadAttachment("https://test.atlassian.net/secure/attachment/att-1/x"), - ).rejects.toThrow(/500/); - }); - }); -``` - - - [ ] **Step 2: Run the tests to verify they fail** - 
-Run: `npx vitest run src/adapters/issue-tracker/jira.test.ts` -Expected: FAIL —
`adapter.downloadAttachment is not a function`. - - - [ ] **Step 3: Implement `downloadAttachment`** - - In `src/adapters/issue-tracker/jira.ts`, add the following method to the `JiraAdapter` class (place it right after `postComment`, before `searchTickets`): - -```ts - async downloadAttachment( - url: string, - opts: { timeoutMs?: number } = {}, - ): Promise<Buffer> { - const timeoutMs = opts.timeoutMs ?? 30_000; - const signal = AbortSignal.timeout(timeoutMs); - - // First request: authenticated, manual redirect handling. - const first = await fetch(url, { - method: "GET", - headers: { Authorization: this.authHeader }, - redirect: "manual", - signal, - }); - - // Treat any redirect status as the hand-off to the signed CDN URL (the tests exercise 302). - if ([301, 302, 303, 307, 308].includes(first.status)) { - const location = first.headers.get("location"); - if (!location) { - throw new Error( - `Jira attachment redirect (${first.status}) missing Location header for ${url}`, - ); - } - // Re-fetch the signed CDN URL WITHOUT Authorization (its signature IS the auth). - // Use redirect: "follow" so CDN-internal redirects (e.g. S3 region redirects) work. - const second = await fetch(location, { - method: "GET", - redirect: "follow", - signal, - }); - if (!second.ok) { - throw new Error( - `Jira attachment CDN error: ${second.status} ${second.statusText} on ${location}`, - ); - } - return Buffer.from(await second.arrayBuffer()); - } - - if (!first.ok) { - throw new Error( - `Jira attachment error: ${first.status} ${first.statusText} on ${url}`, - ); - } - return Buffer.from(await first.arrayBuffer()); - } -``` - - - [ ] **Step 4: Run the tests to verify they pass** - -Run: `npx vitest run src/adapters/issue-tracker/jira.test.ts` -Expected: PASS — all three new cases pass, existing tests still pass. - - - [ ] **Step 5: Run typecheck** - -Run: `npm run typecheck` -Expected: exits 0.
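The two-hop flow can be checked without a network by injecting the fetch function. The sketch below is built on three assumptions, none of them part of the plan:

- a hypothetical `FetchLike` parameter replaces the global `fetch`;
- a hypothetical `scriptedFetch` test double supplies the responses;
- the function returns a `Uint8Array` rather than a Node `Buffer`, so it runs anywhere the WHATWG `fetch` classes exist (Node 18+).

It treats 301, 302, 303, 307 and 308 alike as the hand-off to the signed URL. Which status Jira actually sends is worth confirming against real responses.

```ts
// Hypothetical injectable fetch signature; a real adapter would call the global fetch.
type FetchLike = (url: string, init: RequestInit) => Promise<Response>;

// Statuses treated as "go fetch the signed URL" (an assumption; see the lead-in).
const REDIRECT_STATUSES = new Set([301, 302, 303, 307, 308]);

async function downloadFollowingOneRedirect(
  fetchFn: FetchLike,
  url: string,
  authHeader: string,
  timeoutMs = 30_000,
): Promise<Uint8Array> {
  // One signal bounds both hops, so the timeout covers the whole operation.
  const signal = AbortSignal.timeout(timeoutMs);

  // Hop 1: authenticated; redirects are surfaced to us instead of being followed.
  const first = await fetchFn(url, {
    method: "GET",
    headers: { Authorization: authHeader },
    redirect: "manual",
    signal,
  });

  if (REDIRECT_STATUSES.has(first.status)) {
    const location = first.headers.get("location");
    if (!location) throw new Error(`redirect ${first.status} missing Location for ${url}`);
    // Hop 2: no Authorization header; the signed URL carries its own credentials.
    const second = await fetchFn(location, { method: "GET", redirect: "follow", signal });
    if (!second.ok) throw new Error(`CDN error: ${second.status} on ${location}`);
    return new Uint8Array(await second.arrayBuffer());
  }

  if (!first.ok) throw new Error(`attachment error: ${first.status} on ${url}`);
  return new Uint8Array(await first.arrayBuffer());
}

// Hypothetical test double: records every call and replays canned responses in order.
function scriptedFetch(responses: Response[]) {
  const calls: { url: string; init: RequestInit }[] = [];
  const fetchFn: FetchLike = async (url, init) => {
    calls.push({ url, init });
    const next = responses.shift();
    if (!next) throw new Error(`unexpected fetch to ${url}`);
    return next;
  };
  return { fetchFn, calls };
}
```

For example, fed `new Response(null, { status: 302, headers: { location: cdnUrl } })` followed by `new Response(new Uint8Array([1, 2, 3]))`, the function returns the three bytes. The recorder shows two calls:

1. The first carries `Authorization` and `redirect: "manual"`.
2. The second carries no `headers` at all.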
- --- - - ## Task 5: Create `src/sandbox/attachments.ts` scaffold and pure helpers - - **Files:** - - Create: `src/sandbox/attachments.ts` - - Create: `src/sandbox/attachments.test.ts` - - This task adds the pure utilities (`sanitizeFilename`, `formatBytes`, `formatAttachmentsIndex`) and the `DownloadedAttachment` / `AttachmentCaps` types. The retry loop comes in Task 6. - - - [ ] **Step 1: Write the failing tests** - - Create `src/sandbox/attachments.test.ts` with: - -```ts -import { describe, it, expect } from "vitest"; -import { - sanitizeFilename, - formatBytes, - formatAttachmentsIndex, - type DownloadedAttachment, -} from "./attachments.js"; - -describe("sanitizeFilename", () => { - it("preserves simple names", () => { - expect(sanitizeFilename("mockup.png", "att-1")).toBe("mockup.png"); - }); - - it("strips path separators", () => { - expect(sanitizeFilename("a/b/c.png", "att-1")).toBe("abc.png"); - expect(sanitizeFilename("a\\b\\c.png", "att-1")).toBe("abc.png"); - }); - - it("strips null bytes", () => { - expect(sanitizeFilename("a\u0000b.png", "att-1")).toBe("ab.png"); - }); - - it("strips leading dots (no hidden files)", () => { - expect(sanitizeFilename(".env", "att-1")).toBe("env"); - expect(sanitizeFilename("...weird", "att-1")).toBe("weird"); - }); - - it("falls back to attachment-<id> when result is empty", () => { - expect(sanitizeFilename("", "att-9")).toBe("attachment-att-9"); - expect(sanitizeFilename("///", "att-9")).toBe("attachment-att-9"); - expect(sanitizeFilename("....", "att-9")).toBe("attachment-att-9"); - }); - - // Note: after stripping path separators and leading dots, ".pdf" becomes "pdf" - // (non-empty), so the fallback does NOT fire. This matches the spec's literal - // rules ("strip leading dots; fall back only if empty"). Documented explicitly - // so implementers don't get confused.
- it("does NOT invoke fallback when stripping leaves a non-empty extension-only name", () => { - expect(sanitizeFilename(".pdf", "att-9")).toBe("pdf"); - expect(sanitizeFilename("/.png", "att-9")).toBe("png"); - }); -}); - -describe("formatBytes", () => { - it("formats bytes under 1KB as B", () => { - expect(formatBytes(0)).toBe("0 B"); - expect(formatBytes(512)).toBe("512 B"); - }); - - it("formats KB with no decimals for whole numbers", () => { - expect(formatBytes(1024)).toBe("1 KB"); - expect(formatBytes(2048)).toBe("2 KB"); - }); - - it("formats KB with one decimal for fractions", () => { - expect(formatBytes(1536)).toBe("1.5 KB"); - expect(formatBytes(348_192)).toBe("340 KB"); - }); - - it("formats MB with one decimal", () => { - expect(formatBytes(1_258_291)).toBe("1.2 MB"); - expect(formatBytes(10 * 1024 * 1024)).toBe("10 MB"); - }); -}); - -describe("formatAttachmentsIndex", () => { - const ok = (filename: string, mimeType: string, size: number): DownloadedAttachment => ({ - filename, - originalFilename: filename, - mimeType, - size, - content: Buffer.from([]), - }); - const fail = ( - filename: string, - reason: string, - attempts = 1, - ): DownloadedAttachment => ({ - filename, - originalFilename: filename, - mimeType: "application/octet-stream", - size: 0, - failed: { reason, attempts }, - }); - - it("returns empty string when no attachments", () => { - expect(formatAttachmentsIndex([])).toBe(""); - }); - - it("lists successful downloads with path and size", () => { - const out = formatAttachmentsIndex([ - ok("mockup.png", "image/png", 348_192), - ok("api-sample.json", "application/json", 2048), - ]); - expect(out).toContain("## Attachments"); - expect(out).toContain("/tmp/attachments/"); - expect(out).toContain("`/tmp/attachments/mockup.png` — image/png, 340 KB"); - expect(out).toContain("`/tmp/attachments/api-sample.json` — application/json, 2 KB"); - }); - - it("marks failed downloads with a warning prefix and reason", () => { - const out = 
formatAttachmentsIndex([ - fail("spec.pdf", "HTTP 500", 3), - ]); - expect(out).toContain("⚠️"); - expect(out).toContain("spec.pdf"); - expect(out).toContain("failed to download after 3 attempts"); - expect(out).toContain("HTTP 500"); - }); - - it("renders a mix of success and failure", () => { - const out = formatAttachmentsIndex([ - ok("mockup.png", "image/png", 340_000), - fail("broken.bin", "HTTP 404", 1), - ]); - expect(out).toContain("mockup.png"); - expect(out).toContain("⚠️"); - expect(out).toContain("broken.bin"); - }); - - it("renders the section even when all entries failed", () => { - const out = formatAttachmentsIndex([ - fail("a.pdf", "HTTP 500", 3), - fail("b.pdf", "HTTP 500", 3), - ]); - expect(out).toContain("## Attachments"); - expect(out.match(/⚠️/g)?.length).toBe(2); - }); -}); -``` - -- [ ] **Step 2: Run the tests to verify they fail** - -Run: `npx vitest run src/sandbox/attachments.test.ts` -Expected: FAIL — `Cannot find module './attachments.js'`. - -- [ ] **Step 3: Create `src/sandbox/attachments.ts` with the helpers** - -Create the file with: - -```ts -import type { TicketAttachment } from "../adapters/issue-tracker/types.js"; - -export interface DownloadedAttachment { - filename: string; - originalFilename: string; - mimeType: string; - size: number; - content?: Buffer; - failed?: { reason: string; attempts: number }; -} - -export interface AttachmentCaps { - maxFileSizeBytes: number; - maxTotalSizeBytes: number; - maxCount: number; - downloadTimeoutMs: number; -} - -export function sanitizeFilename(name: string, id: string): string { - // Strip path separators, null bytes, and leading dots (no hidden files). - const cleaned = (name ?? "") - .replace(/[\\/]/g, "") - .replace(/\u0000/g, "") - .replace(/^\.+/, ""); - - // Fallback to `attachment-{id}` only when the result is empty, per spec. - // An extension-only input like ".pdf" legitimately sanitizes to "pdf" and does - // NOT trigger the fallback. - return cleaned.length > 0 ? 
cleaned : `attachment-${id}`; -} - -export function formatBytes(n: number): string { - if (n < 1024) return `${n} B`; - const kb = n / 1024; - if (kb < 1024) { - return Number.isInteger(kb) ? `${kb} KB` : `${roundOne(kb)} KB`; - } - const mb = kb / 1024; - return Number.isInteger(mb) ? `${mb} MB` : `${roundOne(mb)} MB`; -} - -function roundOne(x: number): string { - // One decimal, but drop trailing ".0" (e.g. 340.0 -> "340"). - const rounded = Math.round(x * 10) / 10; - return Number.isInteger(rounded) ? String(rounded) : rounded.toFixed(1); -} - -export function formatAttachmentsIndex( - attachments: DownloadedAttachment[], -): string { - if (attachments.length === 0) return ""; - - const lines: string[] = []; - lines.push("## Attachments"); - lines.push(""); - lines.push( - "The following files from the Jira ticket are available in `/tmp/attachments/`.", - ); - lines.push("Read them when relevant to the task."); - lines.push(""); - - for (const a of attachments) { - if (a.failed) { - lines.push( - `- ⚠️ \`${a.originalFilename}\` — failed to download after ${a.failed.attempts} attempt${a.failed.attempts === 1 ? "" : "s"} (${a.failed.reason})`, - ); - } else { - lines.push( - `- \`/tmp/attachments/${a.filename}\` — ${a.mimeType}, ${formatBytes(a.size)}`, - ); - } - } - - return lines.join("\n"); -} - -// Placeholder export so the workflow step can import the name in Task 7; real -// implementation lands in Task 6. -export async function fetchAttachmentsWithRetry( - _downloader: { downloadAttachment(url: string, opts?: { timeoutMs?: number }): Promise<Buffer> }, - _attachments: TicketAttachment[], - _caps: AttachmentCaps, - _log: { info: (obj: unknown, msg?: string) => void; warn: (obj: unknown, msg?: string) => void }, -): Promise<DownloadedAttachment[]> { - throw new Error("fetchAttachmentsWithRetry: not implemented (Task 6)"); -} -``` - - - [ ] **Step 4: Run the tests to verify they pass** - 
-Run: `npx vitest run src/sandbox/attachments.test.ts` -Expected: PASS — all 15 cases pass (6 `sanitizeFilename`, 4 `formatBytes`, 5 `formatAttachmentsIndex`).
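The `formatBytes` expectations in Step 1 come from 1024-based units plus `roundOne`'s one-decimal rounding:

- 348,192 / 1024 ≈ 340.03, which renders as "340 KB".
- 1,258,291 / 1024² ≈ 1.19999, which renders as "1.2 MB".

The tests do not cover one edge case. Because the unit is chosen before rounding, 1,048,575 bytes (about 1023.999 KB) renders as "1024 KB". The sketch below copies the Step 3 `formatBytes` and `roundOne` next to a hypothetical `formatBytesRoundFirst` variant, which is not part of the plan. The variant rounds first and then picks the unit. It agrees with every Step 1 expectation and prints "1 MB" for that input.

```ts
// Copy of the plan's roundOne: one decimal, dropping a trailing ".0".
function roundOne(x: number): string {
  const rounded = Math.round(x * 10) / 10;
  return Number.isInteger(rounded) ? String(rounded) : rounded.toFixed(1);
}

// Copy of the plan's formatBytes: picks the unit first, then rounds.
function formatBytes(n: number): string {
  if (n < 1024) return `${n} B`;
  const kb = n / 1024;
  if (kb < 1024) {
    return Number.isInteger(kb) ? `${kb} KB` : `${roundOne(kb)} KB`;
  }
  const mb = kb / 1024;
  return Number.isInteger(mb) ? `${mb} MB` : `${roundOne(mb)} MB`;
}

// Hypothetical variant: round KB to one decimal first, and keep the KB unit
// only if the rounded value is still below 1024.
function formatBytesRoundFirst(n: number): string {
  if (n < 1024) return `${n} B`;
  const kb = Math.round((n / 1024) * 10) / 10;
  if (kb < 1024) return `${roundOne(kb)} KB`;
  return `${roundOne(n / 1024 / 1024)} MB`;
}

// Worked values:
//   348_192 / 1024        = 340.03...  -> "340 KB" (both functions)
//   1_258_291 / 1024 / 1024 = 1.19999... -> "1.2 MB" (both functions)
//   1_048_575 / 1024      = 1023.999... -> "1024 KB" (formatBytes), "1 MB" (variant)
```

The variant only changes output in the narrow band just below each 1024 boundary. Adopting it would not affect any Step 1 assertion.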
- - - [ ] **Step 5: Run typecheck** - -Run: `npm run typecheck` -Expected: exits 0. - - --- - - ## Task 6: Implement `fetchAttachmentsWithRetry` - - **Files:** - - Modify: `src/sandbox/attachments.ts` - - Modify: `src/sandbox/attachments.test.ts` - - Behavior (from the spec): - - Enforce caps **before** downloading (use metadata `size`): - - Count cap: skip any attachment whose index >= `maxCount`. - - Per-file cap: skip any whose `size > maxFileSizeBytes`. - - Total cap: track a running sum of bytes already downloaded; once `sum + size > maxTotalSizeBytes`, skip that attachment and every one after it. - - Retry loop per file: max 3 attempts. The spec lists backoffs of `500ms`, `2000ms`, `5000ms`, but with 3 attempts only the first two delays (500ms, then 2000ms) ever run. - - Retryable: network errors (`AbortError`, the `TimeoutError` that `AbortSignal.timeout` produces, `ECONNRESET`, `ETIMEDOUT`), HTTP 5xx, HTTP 429. - - Non-retryable: other 4xx (message already contains the code from `downloadAttachment`'s thrown error). - - 429 is retried using the normal backoff schedule. The spec mentions honoring a `Retry-After` header capped at 10s; this plan deliberately defers header-aware backoff to a follow-up because the current `downloadAttachment` throws a plain `Error` that discards headers. Refactoring to a richer error type is out of scope for v1 — document the gap and move on. A 429 from Jira's attachment CDN is rare enough that the normal 500/2000ms backoff is adequate for v1. - - Collisions: if the sanitized filename already exists in the accumulator, append `-{id}` before the extension. - - Return an array in the **original input order**, with skipped entries included as `failed` entries (reason `"skipped: per-file size cap"`, `"skipped: total size cap"`, or `"skipped: count cap"`) and zero attempts.
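The cap rules above reduce to a single pass over metadata sizes in input order: the count cap first, then the per-file size, then the latching total cap. The sketch below isolates that decision logic in a hypothetical `planDownloads` helper that is not part of the plan's module. It assumes every download succeeds. Step 3's loop, by contrast, adds to its byte counter only after a successful download, so there a failed file frees its budget.

```ts
interface CapConfig {
  maxFileSizeBytes: number;
  maxTotalSizeBytes: number;
  maxCount: number;
}

type Decision =
  | "download"
  | "skipped: count cap"
  | "skipped: per-file size cap"
  | "skipped: total size cap";

// Hypothetical helper: decides from metadata sizes alone which attachments
// would be fetched, assuming every download succeeds.
function planDownloads(sizes: number[], caps: CapConfig): Decision[] {
  const decisions: Decision[] = [];
  let committed = 0;
  let totalTripped = false;

  sizes.forEach((size, i) => {
    if (i >= caps.maxCount) {
      decisions.push("skipped: count cap");
    } else if (size > caps.maxFileSizeBytes) {
      // Per-file skips do not consume budget and do not trip the total latch.
      decisions.push("skipped: per-file size cap");
    } else if (totalTripped || committed + size > caps.maxTotalSizeBytes) {
      totalTripped = true; // once tripped, every later attachment is skipped
      decisions.push("skipped: total size cap");
    } else {
      committed += size;
      decisions.push("download");
    }
  });

  return decisions;
}
```

Take the total-cap scenario from Step 1 (`maxTotalSizeBytes: 150`, sizes `[100, 100, 40]`). `planDownloads` returns `["download", "skipped: total size cap", "skipped: total size cap"]`. The 40-byte file would fit on its own, but the latch skips it, which matches that test's expectation.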
- -- [ ] **Step 1: Write the failing tests** - -Append to `src/sandbox/attachments.test.ts`: - -```ts -import { fetchAttachmentsWithRetry, type AttachmentCaps } from "./attachments.js"; -import type { TicketAttachment } from "../adapters/issue-tracker/types.js"; -import { vi } from "vitest"; - -const defaultCaps: AttachmentCaps = { - maxFileSizeBytes: 25 * 1024 * 1024, - maxTotalSizeBytes: 100 * 1024 * 1024, - maxCount: 20, - downloadTimeoutMs: 30_000, -}; - -function noopLogger() { - return { info: vi.fn(), warn: vi.fn() }; -} - -function meta( - id: string, - filename: string, - size: number, - mimeType = "application/octet-stream", -): TicketAttachment { - return { - id, - filename, - mimeType, - size, - contentUrl: `https://jira.example/attachment/${id}`, - }; -} - -describe("fetchAttachmentsWithRetry", () => { - it("downloads all attachments when under caps", async () => { - const downloader = { - downloadAttachment: vi.fn(async () => Buffer.from([1, 2, 3])), - }; - const out = await fetchAttachmentsWithRetry( - downloader, - [meta("1", "a.png", 3), meta("2", "b.png", 3)], - defaultCaps, - noopLogger(), - ); - expect(out).toHaveLength(2); - expect(out[0].content).toBeInstanceOf(Buffer); - expect(out[0].failed).toBeUndefined(); - expect(downloader.downloadAttachment).toHaveBeenCalledTimes(2); - }); - - it("skips attachments over per-file cap without downloading", async () => { - const downloader = { - downloadAttachment: vi.fn(async () => Buffer.from([])), - }; - const caps: AttachmentCaps = { ...defaultCaps, maxFileSizeBytes: 100 }; - const out = await fetchAttachmentsWithRetry( - downloader, - [meta("1", "small.bin", 50), meta("2", "big.bin", 10_000)], - caps, - noopLogger(), - ); - expect(out[0].content).toBeDefined(); - expect(out[1].failed?.reason).toMatch(/per-file size cap/); - expect(downloader.downloadAttachment).toHaveBeenCalledTimes(1); - }); - - it("stops downloading once total cap is exceeded", async () => { - const downloader = { - 
downloadAttachment: vi.fn(async () => Buffer.from([])), - }; - const caps: AttachmentCaps = { ...defaultCaps, maxTotalSizeBytes: 150 }; - const out = await fetchAttachmentsWithRetry( - downloader, - [ - meta("1", "a.bin", 100), - meta("2", "b.bin", 100), // 100+100 = 200 > 150 → skipped - meta("3", "c.bin", 40), // 100+40 = 140 ≤ 150 → would fit, but spec says - // "once exceeded, remaining attachments are skipped" - ], - caps, - noopLogger(), - ); - expect(out[0].failed).toBeUndefined(); - expect(out[1].failed?.reason).toMatch(/total size cap/); - expect(out[2].failed?.reason).toMatch(/total size cap/); - expect(downloader.downloadAttachment).toHaveBeenCalledTimes(1); - }); - - it("skips attachments beyond count cap", async () => { - const downloader = { - downloadAttachment: vi.fn(async () => Buffer.from([])), - }; - const caps: AttachmentCaps = { ...defaultCaps, maxCount: 2 }; - const out = await fetchAttachmentsWithRetry( - downloader, - [ - meta("1", "a.bin", 10), - meta("2", "b.bin", 10), - meta("3", "c.bin", 10), - ], - caps, - noopLogger(), - ); - expect(out).toHaveLength(3); - expect(out[0].failed).toBeUndefined(); - expect(out[1].failed).toBeUndefined(); - expect(out[2].failed?.reason).toMatch(/count cap/); - expect(downloader.downloadAttachment).toHaveBeenCalledTimes(2); - }); - - it("retries transient 5xx up to 3 times then marks failed", async () => { - const downloader = { - downloadAttachment: vi - .fn() - .mockRejectedValue(new Error("Jira attachment error: 500 Internal Server Error on url")), - }; - const out = await fetchAttachmentsWithRetry( - downloader, - [meta("1", "a.bin", 10)], - defaultCaps, - noopLogger(), - ); - expect(out[0].failed).toBeDefined(); - expect(out[0].failed?.attempts).toBe(3); - expect(downloader.downloadAttachment).toHaveBeenCalledTimes(3); - }); - - it("does not retry on 404", async () => { - const downloader = { - downloadAttachment: vi - .fn() - .mockRejectedValue(new Error("Jira attachment error: 404 Not Found on 
url")), - }; - const out = await fetchAttachmentsWithRetry( - downloader, - [meta("1", "a.bin", 10)], - defaultCaps, - noopLogger(), - ); - expect(out[0].failed).toBeDefined(); - expect(out[0].failed?.attempts).toBe(1); - expect(downloader.downloadAttachment).toHaveBeenCalledTimes(1); - }); - - it("succeeds on second attempt after transient failure", async () => { - const downloader = { - downloadAttachment: vi - .fn() - .mockRejectedValueOnce(new Error("Jira attachment error: 503 Service Unavailable on url")) - .mockResolvedValueOnce(Buffer.from([9])), - }; - const out = await fetchAttachmentsWithRetry( - downloader, - [meta("1", "a.bin", 1)], - defaultCaps, - noopLogger(), - ); - expect(out[0].content).toBeDefined(); - expect(out[0].failed).toBeUndefined(); - expect(downloader.downloadAttachment).toHaveBeenCalledTimes(2); - }); - - it("resolves collisions by appending -{id} before the extension", async () => { - const downloader = { - downloadAttachment: vi.fn(async () => Buffer.from([1])), - }; - const out = await fetchAttachmentsWithRetry( - downloader, - [meta("1", "report.pdf", 1), meta("2", "report.pdf", 1)], - defaultCaps, - noopLogger(), - ); - expect(out[0].filename).toBe("report.pdf"); - expect(out[1].filename).toBe("report-2.pdf"); - expect(out[1].originalFilename).toBe("report.pdf"); - }); - - it("retries on network abort errors", async () => { - const abortErr = Object.assign(new Error("The operation was aborted"), { - name: "AbortError", - }); - const downloader = { - downloadAttachment: vi - .fn() - .mockRejectedValueOnce(abortErr) - .mockResolvedValueOnce(Buffer.from([1])), - }; - const out = await fetchAttachmentsWithRetry( - downloader, - [meta("1", "a.bin", 1)], - defaultCaps, - noopLogger(), - ); - expect(out[0].content).toBeDefined(); - expect(downloader.downloadAttachment).toHaveBeenCalledTimes(2); - }); -}); -``` - -**Note:** the retry tests will take up to a few seconds because of the backoff. 
That's acceptable: with only 3 attempts, just the 500 ms and 2000 ms backoffs run, so the 3-attempt failure case waits ~2.5s, inside Vitest's default 5-second per-test timeout. If this feels too slow in CI, see "Speeding up retry tests" below. - - - [ ] **Step 2: Run the tests to verify they fail** - -Run: `npx vitest run src/sandbox/attachments.test.ts` -Expected: FAIL — all nine `fetchAttachmentsWithRetry` tests fail with "not implemented" from the placeholder (the Task 5 helper tests still pass). - - - [ ] **Step 3: Replace the placeholder with a real implementation** - - In `src/sandbox/attachments.ts`, replace the `fetchAttachmentsWithRetry` placeholder function (and anything after it) with: - -```ts -// MAX_ATTEMPTS = 3 means at most 2 sleeps between 3 tries. The spec phrases this -// as "500 → 2000 → 5000ms" but with only 3 attempts the 5000ms delay never fires -// (the 3rd failure exits the loop). We encode just the two delays that actually -// run to avoid confusing dead-code. -const MAX_ATTEMPTS = 3; -const BACKOFFS_MS = [500, 2000]; - -interface Downloader { - downloadAttachment(url: string, opts?: { timeoutMs?: number }): Promise<Buffer>; -} - -interface AttachmentsLogger { - info: (obj: unknown, msg?: string) => void; - warn: (obj: unknown, msg?: string) => void; -} - -export async function fetchAttachmentsWithRetry( - downloader: Downloader, - attachments: TicketAttachment[], - caps: AttachmentCaps, - log: AttachmentsLogger, -): Promise<DownloadedAttachment[]> { - const result: DownloadedAttachment[] = []; - const usedFilenames = new Set<string>(); - let bytesCommitted = 0; - let totalCapTripped = false; - - for (let i = 0; i < attachments.length; i++) { - const att = attachments[i]; - - // Cap: count - if (i >= caps.maxCount) { - result.push(skip(att, "skipped: count cap", log)); - continue; - } - // Cap: per-file size - if (att.size > caps.maxFileSizeBytes) { - result.push(skip(att, "skipped: per-file size cap", log)); - continue; - } - // Cap: total size — once exceeded, all remaining are skipped.
- if (totalCapTripped || bytesCommitted + att.size > caps.maxTotalSizeBytes) { - totalCapTripped = true; - result.push(skip(att, "skipped: total size cap", log)); - continue; - } - - const safeName = resolveFilename(att, usedFilenames); - usedFilenames.add(safeName); - - let attempts = 0; - let lastError: Error | undefined; - while (attempts < MAX_ATTEMPTS) { - attempts++; - try { - const content = await downloader.downloadAttachment(att.contentUrl, { - timeoutMs: caps.downloadTimeoutMs, - }); - bytesCommitted += att.size; - log.info( - { - filename: safeName, - originalFilename: att.filename, - mimeType: att.mimeType, - size: att.size, - attempts, - }, - "attachment downloaded", - ); - result.push({ - filename: safeName, - originalFilename: att.filename, - mimeType: att.mimeType, - size: att.size, - content, - }); - lastError = undefined; - break; - } catch (err) { - lastError = err as Error; - if (!isRetryable(lastError) || attempts >= MAX_ATTEMPTS) break; - const delay = Math.min(BACKOFFS_MS[attempts - 1] ?? 
5000, 10_000); - await new Promise((r) => setTimeout(r, delay)); - } - } - - if (lastError) { - log.warn( - { - filename: att.filename, - reason: lastError.message, - attempts, - }, - "attachment failed", - ); - result.push({ - filename: safeName, - originalFilename: att.filename, - mimeType: att.mimeType, - size: att.size, - failed: { reason: shortReason(lastError.message), attempts }, - }); - } - } - - return result; -} - -function skip( - att: TicketAttachment, - reason: string, - log: AttachmentsLogger, -): DownloadedAttachment { - log.warn({ filename: att.filename, reason }, "attachment skipped"); - return { - filename: sanitizeFilename(att.filename, att.id), - originalFilename: att.filename, - mimeType: att.mimeType, - size: att.size, - failed: { reason, attempts: 0 }, - }; -} - -function resolveFilename( - att: TicketAttachment, - used: Set<string>, -): string { - const safe = sanitizeFilename(att.filename, att.id); - if (!used.has(safe)) return safe; - const dot = safe.lastIndexOf("."); - if (dot <= 0) return `${safe}-${att.id}`; - return `${safe.slice(0, dot)}-${att.id}${safe.slice(dot)}`; -} - -function isRetryable(err: Error): boolean { - const msg = err.message ?? ""; - // AbortSignal.timeout() rejects with a DOMException named "TimeoutError", not "AbortError". - if (err.name === "AbortError" || err.name === "TimeoutError") return true; - // Node's built-in fetch reports socket failures as TypeError("fetch failed") with the errno on `cause.code`. - const code = (err as Error & { cause?: { code?: string } }).cause?.code ?? ""; - if (/ECONNRESET|ETIMEDOUT|ENETUNREACH|EAI_AGAIN/i.test(`${msg} ${code}`)) return true; - if (/\b5\d\d\b/.test(msg)) return true; - if (/\b429\b/.test(msg)) return true; - return false; -} - -function shortReason(msg: string): string { - // Strip the URL from thrown messages for cleaner index output. - const m = msg.match(/\b(\d{3})\b(.*?)(?: on https?:\/\/.*)?$/); - if (m) return `HTTP ${m[1]}${m[2] ?? 
""}`.trim(); - return msg; -} -``` - - **Speeding up retry tests:** if the 500ms/2000ms backoffs make Vitest slow enough to annoy you, add an env-gated override at the top of the retry loop: - -```ts -const backoffMultiplier = process.env.ATTACHMENTS_TEST_FAST_RETRY === "1" ? 0 : 1; -// ... -const delay = Math.min((BACKOFFS_MS[attempts - 1] ??
5000) * backoffMultiplier, 10_000); -``` - - Then run retry tests with `ATTACHMENTS_TEST_FAST_RETRY=1 npx vitest run src/sandbox/attachments.test.ts`. This is optional — if you don't add it, the failing-retry test takes ~2.5s, which is fine. - - - [ ] **Step 4: Run the tests to verify they pass** - -Run: `npx vitest run src/sandbox/attachments.test.ts` -Expected: PASS — all 9 `fetchAttachmentsWithRetry` tests plus the earlier helper tests pass. - - --- - - ## Task 7: Thread attachments through `assembleXContext` - - **Files:** - - Modify: `src/sandbox/context.ts` - - Modify: `src/sandbox/context.test.ts` - - Each context function gains an optional `attachments?: DownloadedAttachment[]` field on its input interface. When present and non-empty, the `## Attachments` section is inserted **after** the ticket identifier/title block and **before** `## Description` (for research) or `## Acceptance Criteria` (for the other three, which have no description). - - - [ ] **Step 1: Write the failing tests** - - Extend the four existing `describe` blocks in `src/sandbox/context.test.ts` (research plan, implementation, implementation retry, and review) so that each also exercises attachments; keep their existing cases.
Add the following test cases at the end of each of the four `describe` blocks: - -```ts - it("renders attachments index when attachments are provided", () => { - const result = assembleResearchPlanContext({ - ticket: { - identifier: "TEST-3", - title: "With files", - description: "desc", - acceptanceCriteria: "ac", - comments: [], - }, - prompt: "prompt", - branchName: "blazebot/test-3", - attachments: [ - { - filename: "mockup.png", - originalFilename: "mockup.png", - mimeType: "image/png", - size: 348_192, - content: Buffer.from([]), - }, - ], - }); - expect(result).toContain("## Attachments"); - expect(result).toContain("/tmp/attachments/mockup.png"); - expect(result).toContain("image/png"); - - // Attachments section appears before Description - const atIdx = result.indexOf("## Attachments"); - const descIdx = result.indexOf("## Description"); - expect(atIdx).toBeGreaterThan(-1); - expect(descIdx).toBeGreaterThan(atIdx); - }); - - it("omits attachments section when list is empty or absent", () => { - const withoutField = assembleResearchPlanContext({ - ticket: { identifier: "X", title: "t", description: "d", acceptanceCriteria: "a", comments: [] }, - prompt: "p", - branchName: "b", - }); - expect(withoutField).not.toContain("## Attachments"); - - const withEmpty = assembleResearchPlanContext({ - ticket: { identifier: "X", title: "t", description: "d", acceptanceCriteria: "a", comments: [] }, - prompt: "p", - branchName: "b", - attachments: [], - }); - expect(withEmpty).not.toContain("## Attachments"); - }); - - it("shows failed attachments in the index even when no bytes downloaded", () => { - const result = assembleResearchPlanContext({ - ticket: { identifier: "X", title: "t", description: "d", acceptanceCriteria: "a", comments: [] }, - prompt: "p", - branchName: "b", - attachments: [ - { - filename: "spec.pdf", - originalFilename: "spec.pdf", - mimeType: "application/pdf", - size: 0, - failed: { reason: "HTTP 500", attempts: 3 }, - }, - ], - }); - 
expect(result).toContain("## Attachments"); - expect(result).toContain("⚠️"); - expect(result).toContain("spec.pdf"); - }); -``` - -Add the same three tests (adapted) to the `describe("assembleImplementationContext ...")`, `describe("assembleImplementationRetryContext ...")`, and `describe("assembleReviewContext ...")` blocks. In the "before Description" ordering check for the three non-research functions, replace `## Description` with `## Acceptance Criteria`: - -```ts - const atIdx = result.indexOf("## Attachments"); - const acIdx = result.indexOf("## Acceptance Criteria"); - expect(atIdx).toBeGreaterThan(-1); - expect(acIdx).toBeGreaterThan(atIdx); -``` - -For `assembleImplementationRetryContext` and `assembleReviewContext`, include the required extra inputs (`researchPlanMarkdown`, `reviewFeedback`, `gitDiff`) as in the existing tests. - -- [ ] **Step 2: Run the tests to verify they fail** - -Run: `npx vitest run src/sandbox/context.test.ts` -Expected: FAIL — the `attachments` field is rejected by TypeScript on the four context input interfaces, and the assertions fail because no `## Attachments` text exists. 
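The ordering assertions in Step 1 compare only two headings at a time, and a plain `indexOf("## Ticket")` would also match `## Ticket ID`. If more sections are added later, a helper that checks a whole sequence of headings line by line keeps these tests readable. The sketch below is hypothetical: `expectHeadingsInOrder` is not part of the plan, and it assumes each heading sits on its own line in the assembled markdown.

```ts
/**
 * Hypothetical test helper (not part of the plan): asserts that every heading
 * in `headings` appears in `markdown` as a whole line, in the given order.
 * Returns the zero-based line number of each heading.
 */
function expectHeadingsInOrder(markdown: string, headings: string[]): number[] {
  const lines = markdown.split("\n");
  const found: number[] = [];
  let start = 0;
  for (const heading of headings) {
    // Compare whole lines so "## Ticket" does not match "## Ticket ID".
    const lineNo = lines.findIndex((line, i) => i >= start && line.trim() === heading);
    if (lineNo === -1) {
      throw new Error(`heading "${heading}" is missing or out of order`);
    }
    found.push(lineNo);
    // The next heading must appear strictly after this one.
    start = lineNo + 1;
  }
  return found;
}
```

In the research-context test this could replace the `indexOf` pair with `expectHeadingsInOrder(result, ["## Ticket", "## Attachments", "## Description"])`. For the other three builders, use `"## Acceptance Criteria"` as the last heading.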
- -- [ ] **Step 3: Update `src/sandbox/context.ts`** - -At the top of the file, add: - -```ts -import type { DownloadedAttachment } from "./attachments.js"; -import { formatAttachmentsIndex } from "./attachments.js"; -``` - -Add `attachments?: DownloadedAttachment[]` to each of the four input interfaces: - -```ts -export interface ResearchPlanContextInput { - ticket: TicketData; - prompt: string; - branchName: string; - prComments?: PRComment[]; - checkResults?: CheckRunResult[]; - hasConflicts?: boolean; - attachments?: DownloadedAttachment[]; -} - -export interface ImplementationContextInput { - ticket: TicketData; - prompt: string; - researchPlanMarkdown: string; - attachments?: DownloadedAttachment[]; -} - -export interface ImplementationRetryContextInput { - ticket: TicketData; - prompt: string; - researchPlanMarkdown: string; - reviewFeedback: ReviewOutput; - attachments?: DownloadedAttachment[]; -} - -export interface ReviewContextInput { - ticket: TicketData; - prompt: string; - researchPlanMarkdown: string; - gitDiff: string; - attachments?: DownloadedAttachment[]; -} -``` - -Then rewrite each of the four `assembleXContext` functions to insert the attachments index between the Ticket header and the next section. 
Use this pattern (apply to all four — shown for research here; adapt the specific sections for the other three): - -For `assembleResearchPlanContext`, replace the implementation with: - -```ts -export function assembleResearchPlanContext(input: ResearchPlanContextInput): string { - const { ticket, prompt, branchName, prComments, checkResults, hasConflicts, attachments } = input; - const attachmentsSection = renderAttachmentsSection(attachments); - - let md = `# Requirements - -## Ticket ID - -${ticket.identifier} - -## Ticket - -${ticket.title} -${attachmentsSection} -## Description - -${ticket.description} - -## Acceptance Criteria - -${ticket.acceptanceCriteria || "None specified."} - -## Comments - -${formatComments(ticket.comments)} - -## Branch - -${branchName} -`; - - if (prComments && prComments.length > 0) { - md += `\n## PR Review Feedback\n\n${formatPRComments(prComments)}\n`; - } - if (checkResults && checkResults.length > 0) { - md += `\n## CI/CD Check Results\n\n${formatCheckResults(checkResults)}\n`; - } - if (hasConflicts) { - md += `\n## Merge Conflicts\n\nThis PR has merge conflicts. The base branch has already been merged — the repo is in a MERGING state with conflict markers in the affected files. 
Resolve the markers, \`git add\` the files, and run \`git merge --continue\`.\n`; - } - - md += `\n---\n\n${prompt}\n`; - return md; -} -``` - -For `assembleImplementationContext`: - -```ts -export function assembleImplementationContext(input: ImplementationContextInput): string { - const { ticket, prompt, researchPlanMarkdown, attachments } = input; - const attachmentsSection = renderAttachmentsSection(attachments); - return `# Requirements - -## Ticket ID - -${ticket.identifier} - -## Ticket - -${ticket.title} -${attachmentsSection} -## Acceptance Criteria - -${ticket.acceptanceCriteria || "None specified."} - -## Research & Plan - -${researchPlanMarkdown} - ---- - -${prompt} -`; -} -``` - -For `assembleImplementationRetryContext`: - -```ts -export function assembleImplementationRetryContext(input: ImplementationRetryContextInput): string { - const { ticket, prompt, researchPlanMarkdown, reviewFeedback, attachments } = input; - const attachmentsSection = renderAttachmentsSection(attachments); - return `# Requirements - -## Ticket ID - -${ticket.identifier} - -## Ticket - -${ticket.title} -${attachmentsSection} -## Acceptance Criteria - -${ticket.acceptanceCriteria || "None specified."} - -## Research & Plan - -${researchPlanMarkdown} - -## Review Feedback - -${reviewFeedback.feedback} - -### Issues - -${formatReviewIssues(reviewFeedback.issues)} - ---- - -${prompt} -`; -} -``` - -For `assembleReviewContext`: - -```ts -export function assembleReviewContext(input: ReviewContextInput): string { - const { ticket, prompt, researchPlanMarkdown, gitDiff, attachments } = input; - const attachmentsSection = renderAttachmentsSection(attachments); - return `# Requirements - -## Ticket ID - -${ticket.identifier} - -## Ticket - -${ticket.title} -${attachmentsSection} -## Acceptance Criteria - -${ticket.acceptanceCriteria || "None specified."} - -## Research & Plan - -${researchPlanMarkdown} - -## Git Diff - -\`\`\`diff -${gitDiff} -\`\`\` - ---- - -${prompt} -`; -} -``` - 
-Finally, add this private helper at the bottom of the file (below the other `format*` helpers): - -```ts -function renderAttachmentsSection( - attachments: DownloadedAttachment[] | undefined, -): string { - if (!attachments || attachments.length === 0) return ""; - return `\n${formatAttachmentsIndex(attachments)}\n`; -} -``` - -- [ ] **Step 4: Run the tests to verify they pass** - -Run: `npx vitest run src/sandbox/context.test.ts` -Expected: PASS — all existing tests still pass, plus the four new sets of attachments tests. - -- [ ] **Step 5: Run typecheck** - -Run: `npm run typecheck` -Expected: exits 0. - ---- - -## Task 8: Wire workflow steps `fetchAttachments` and `writeAttachments` - -**Files:** -- Modify: `src/workflows/agent.ts` - -This is the integration task. Two new `"use step"` functions run in sequence between `provisionSandbox` (line ~256) and the Phase 1 research block. The downloaded-attachments array is captured in workflow-local state and passed to all four `assembleXContext` calls (research, impl, impl retry, review). - -Key points: -- `fetchAndValidateTicket` already returns a `ticket`; after Task 2 its shape includes `attachments: TicketAttachment[]`. Forward that array into the new step. -- `fetchAttachments` is pure-ish (HTTP + retry) and should **not** throw on per-file failures — that's the spec contract. Set `fetchAttachments.maxRetries = 0` so WDK doesn't re-run the whole step on a partial failure that was already handled. -- `writeAttachments` writes to the shared sandbox. Set `writeAttachments.maxRetries = 0` like `writeAndStartPhase` does; a persistent failure is a real sandbox issue and should fail fast. -- `DownloadedAttachment` includes `Buffer` — Buffers serialize across WDK step boundaries fine (they become `Uint8Array` under the hood; `sandbox.writeFiles` accepts `Buffer`). If any issue surfaces in practice, swap to `{ path, content }` tuples at the step boundary. 
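The Buffer caveat in the last key point can be contained in one pure conversion at the step boundary. Below is a minimal sketch that mirrors the mapping `writeAttachments` performs. The names `toWriteFilesPayload`, `DownloadedAttachmentLike`, and `WriteFileEntry` are hypothetical, and the attachment shape is redeclared locally so the snippet compiles on its own.

```ts
import { Buffer } from "node:buffer";

// Local copy of the attachment shape used in this plan (hypothetical name).
interface DownloadedAttachmentLike {
  filename: string;
  originalFilename: string;
  mimeType: string;
  size: number;
  content?: Buffer | Uint8Array;
  failed?: { reason: string; attempts: number };
}

interface WriteFileEntry {
  path: string;
  content: Buffer;
}

/**
 * Hypothetical helper: turns step output into `{ path, content }` entries.
 * Entries that failed or carry no content are dropped. A `Uint8Array` that
 * arrived in place of a `Buffer` is converted back to a Buffer.
 */
function toWriteFilesPayload(
  attachments: DownloadedAttachmentLike[],
  dir = "/tmp/attachments",
): WriteFileEntry[] {
  return attachments
    .filter((a) => a.content !== undefined && !a.failed)
    .map((a) => {
      const bytes = a.content as Buffer | Uint8Array;
      return {
        path: `${dir}/${a.filename}`,
        // Buffers pass through unchanged; Buffer.from(Uint8Array) copies the bytes.
        content: Buffer.isBuffer(bytes) ? bytes : Buffer.from(bytes),
      };
    });
}
```

Because the function does no I/O, it can be unit-tested without a sandbox. The step body then reduces to `mkdir -p` followed by `sandbox.writeFiles(toWriteFilesPayload(attachments))`.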
- -- [ ] **Step 1: Add the `fetchAttachments` step function** - -In `src/workflows/agent.ts`, immediately after the `fetchAndValidateTicket` function (around line 16), insert: - -```ts -async function fetchAttachments( - ticketIdentifier: string, - attachments: Array<{ - id: string; - filename: string; - mimeType: string; - size: number; - contentUrl: string; - }>, -) { - "use step"; - const { logger } = await import("../lib/logger.js"); - const log = logger.child({ ticket_identifier: ticketIdentifier, step: "fetchAttachments" }); - log.info({ count: attachments.length }, "fetchAttachments: start"); - - if (attachments.length === 0) { - log.info({}, "fetchAttachments: no attachments"); - return []; - } - - const { env } = await import("../../env.js"); - const { createStepAdapters } = await import("../lib/step-adapters.js"); - const { fetchAttachmentsWithRetry } = await import("../sandbox/attachments.js"); - const { issueTracker } = createStepAdapters(); - - // The JiraAdapter exposes downloadAttachment. Other issue-tracker adapters don't - // (yet), so guard it here — if we ever add more, the workflow will just skip - // attachments for trackers without a downloader. 
- const downloader = issueTracker as unknown as { - downloadAttachment?: (url: string, opts?: { timeoutMs?: number }) => Promise; - }; - if (typeof downloader.downloadAttachment !== "function") { - log.warn( - { tracker: issueTracker.constructor.name }, - "issue tracker does not support attachment downloads; skipping", - ); - return []; - } - - const result = await fetchAttachmentsWithRetry( - downloader as { downloadAttachment: (url: string, opts?: { timeoutMs?: number }) => Promise }, - attachments, - { - maxFileSizeBytes: env.ATTACHMENT_MAX_FILE_SIZE_MB * 1024 * 1024, - maxTotalSizeBytes: env.ATTACHMENT_MAX_TOTAL_SIZE_MB * 1024 * 1024, - maxCount: env.ATTACHMENT_MAX_COUNT, - downloadTimeoutMs: env.ATTACHMENT_DOWNLOAD_TIMEOUT_MS, - }, - log, - ); - log.info( - { - succeeded: result.filter((a) => !a.failed).length, - failed: result.filter((a) => a.failed).length, - }, - "fetchAttachments: done", - ); - return result; -} -fetchAttachments.maxRetries = 0; -``` - -- [ ] **Step 2: Add the `writeAttachments` step function** - -Immediately after `fetchAttachments`, add: - -```ts -async function writeAttachments( - sandboxId: string, - attachments: Array<{ - filename: string; - originalFilename: string; - mimeType: string; - size: number; - content?: Buffer | Uint8Array; - failed?: { reason: string; attempts: number }; - }>, -): Promise { - "use step"; - const { logger } = await import("../lib/logger.js"); - const log = logger.child({ sandboxId, step: "writeAttachments" }); - - const toWrite = attachments.filter((a) => a.content && !a.failed); - log.info( - { count: toWrite.length, totalReceived: attachments.length }, - "writeAttachments: start", - ); - if (toWrite.length === 0) { - log.info({}, "writeAttachments: nothing to write"); - return; - } - - const { Sandbox } = await import("@vercel/sandbox"); - const { getSandboxCredentials } = await import("../sandbox/credentials.js"); - - const sandbox = await Sandbox.get({ sandboxId, ...getSandboxCredentials() }); - - // 
Ensure target directory exists — writeFiles does not guarantee mkdir -p semantics. - await sandbox.runCommand("mkdir", ["-p", "/tmp/attachments"]); - - await sandbox.writeFiles( - toWrite.map((a) => ({ - path: `/tmp/attachments/${a.filename}`, - content: Buffer.isBuffer(a.content) - ? (a.content as Buffer) - : Buffer.from(a.content as Uint8Array), - })), - ); - log.info({ count: toWrite.length }, "writeAttachments: done"); -} -writeAttachments.maxRetries = 0; -``` - -- [ ] **Step 3: Call the two new steps in `agentWorkflow`** - -The spec's architecture diagram places `fetchAttachments` at workflow start (before `createFeatureBranch` / `provisionSandbox`) and `writeAttachments` after `provisionSandbox`. Placement matters: - -- Download **before** `provisionSandbox` so a slow/partial-failure download doesn't burn sandbox CPU hours while idle. -- Write **inside** the `try { ... }` block so a thrown `writeAttachments` always routes through `finally { teardownSandbox }` — no leaked sandbox. - -Inside `agentWorkflow`, locate this block (in the top half of the `try { ... }` outer body, after `createFeatureBranch` handling): - -```ts - const mergeBase = prContext?.hasConflicts ? baseBranch : undefined; - - // Provision sandbox once for all phases - const sandboxId = await provisionSandbox(branchName, mergeBase); - - try { -``` - -Insert the `fetchAttachments` call **immediately before** `const sandboxId = await provisionSandbox(...)`: - -```ts - const downloadedAttachments = await fetchAttachments(ticket.identifier, ticket.attachments); -``` - -So that block becomes: - -```ts - const mergeBase = prContext?.hasConflicts ? baseBranch : undefined; - - const downloadedAttachments = await fetchAttachments(ticket.identifier, ticket.attachments); - - // Provision sandbox once for all phases - const sandboxId = await provisionSandbox(branchName, mergeBase); - - try { -``` - -Then, as the **first** action inside `try { ... 
}` (above `// ========== PHASE 1: Research & Plan ==========`), insert: - -```ts - await writeAttachments(sandboxId, downloadedAttachments); -``` - -- [ ] **Step 4: Forward attachments to every `assembleXContext` call** - -Locate the four `assembleXContext` calls in `agentWorkflow` and add an `attachments: downloadedAttachments` property to each. - -Research (around line 270): - -```ts - const researchInput = assembleResearchPlanContext({ - ticket: ticketData, - prompt: getPrompt("research-plan.md"), - branchName, - prComments: prContext?.prComments, - checkResults: prContext?.checkResults, - hasConflicts: prContext?.hasConflicts, - attachments: downloadedAttachments, - }); -``` - -Implementation retry / first (around line 336): - -```ts - const implInput = lastReviewFeedback - ? assembleImplementationRetryContext({ - ticket: ticketData, - prompt: getPrompt("implement.md"), - researchPlanMarkdown, - reviewFeedback: lastReviewFeedback, - attachments: downloadedAttachments, - }) - : assembleImplementationContext({ - ticket: ticketData, - prompt: getPrompt("implement.md"), - researchPlanMarkdown, - attachments: downloadedAttachments, - }); -``` - -Review (around line 400): - -```ts - const reviewInput = assembleReviewContext({ - ticket: ticketData, - prompt: getPrompt("review.md"), - researchPlanMarkdown, - gitDiff, - attachments: downloadedAttachments, - }); -``` - -- [ ] **Step 5: Run typecheck** - -Run: `npm run typecheck` -Expected: exits 0. The `ticket.attachments` field is now required on `TicketContent`, so ensure any place that constructs a `TicketContent` (primarily the Jira adapter and fixtures) already includes it. If typecheck flags other call sites, fix them — most likely a test fixture that builds `TicketContent` manually. - -- [ ] **Step 6: Run the full unit test suite** - -Run: `npm run test` -Expected: all suites pass. 
Pay special attention to `src/adapters/issue-tracker/jira.test.ts`, `src/sandbox/context.test.ts`, and `src/sandbox/attachments.test.ts`. - ---- - -## Task 9: End-to-end integration sanity check (mocked sandbox) - -**Files:** -- Create: `src/sandbox/attachments.integration.test.ts` - -This is a light integration test that proves the three moving pieces line up: Jira mock → `fetchAttachmentsWithRetry` → a fake sandbox's `writeFiles`. It does **not** spin up a real sandbox or a real workflow — it exercises the shapes. - -- [ ] **Step 1: Write the test** - -Create `src/sandbox/attachments.integration.test.ts`: - -```ts -import { describe, it, expect, vi } from "vitest"; -import { fetchAttachmentsWithRetry, type AttachmentCaps } from "./attachments.js"; -import type { TicketAttachment } from "../adapters/issue-tracker/types.js"; - -describe("attachments → sandbox writeFiles shape", () => { - it("produces writeFiles payloads at /tmp/attachments/", async () => { - const downloader = { - downloadAttachment: vi - .fn() - .mockResolvedValueOnce(Buffer.from([0x89, 0x50, 0x4e, 0x47])) - .mockResolvedValueOnce(Buffer.from("{\"ok\":true}")), - }; - - const attachments: TicketAttachment[] = [ - { - id: "1", - filename: "mockup.png", - mimeType: "image/png", - size: 4, - contentUrl: "https://jira.example/1", - }, - { - id: "2", - filename: "sample.json", - mimeType: "application/json", - size: 11, - contentUrl: "https://jira.example/2", - }, - ]; - - const caps: AttachmentCaps = { - maxFileSizeBytes: 1_000_000, - maxTotalSizeBytes: 10_000_000, - maxCount: 10, - downloadTimeoutMs: 5_000, - }; - - const downloaded = await fetchAttachmentsWithRetry( - downloader, - attachments, - caps, - { info: vi.fn(), warn: vi.fn() }, - ); - - // Simulate the writeAttachments step's payload mapping. 
- const payload = downloaded - .filter((a) => a.content && !a.failed) - .map((a) => ({ - path: `/tmp/attachments/${a.filename}`, - content: a.content!, - })); - - expect(payload).toHaveLength(2); - expect(payload[0].path).toBe("/tmp/attachments/mockup.png"); - expect(payload[0].content).toBeInstanceOf(Buffer); - expect(payload[1].path).toBe("/tmp/attachments/sample.json"); - }); -}); -``` - -- [ ] **Step 2: Run the integration test** - -Run: `npx vitest run src/sandbox/attachments.integration.test.ts` -Expected: PASS. - ---- - -## Task 10: Final verification pass - -**Files:** none (verification only) - -- [ ] **Step 1: Run the full test suite** - -Run: `npm run test` -Expected: all tests pass. - -- [ ] **Step 2: Run typecheck** - -Run: `npm run typecheck` -Expected: exits 0. - -- [ ] **Step 3: Spot-check the generated requirements.md shape manually** - -Run: -```bash -node --input-type=module -e " -import { assembleResearchPlanContext } from './src/sandbox/context.ts'; -console.log(assembleResearchPlanContext({ - ticket: { - identifier: 'TEST-1', - title: 'Example', - description: 'desc', - acceptanceCriteria: 'ac', - comments: [], - }, - prompt: 'PROMPT', - branchName: 'blazebot/test-1', - attachments: [ - { filename: 'mockup.png', originalFilename: 'mockup.png', mimeType: 'image/png', size: 348192, content: Buffer.from([]) }, - { filename: 'spec.pdf', originalFilename: 'spec.pdf', mimeType: 'application/pdf', size: 0, failed: { reason: 'HTTP 500', attempts: 3 } }, - ], -})); -" -``` -Expected: output contains: -- A `## Ticket ID` header, followed shortly after by -- A `## Attachments` header -- A line like ``- `/tmp/attachments/mockup.png` — image/png, 340 KB`` -- A line like `- ⚠️ \`spec.pdf\` — failed to download after 3 attempts (HTTP 500)` -- `## Description` appearing **after** `## Attachments` - -- [ ] **Step 4: Review workflow wiring** - -Re-read `src/workflows/agent.ts` and confirm: -- `fetchAttachments(ticket.identifier, ticket.attachments)` is called 
**before** `provisionSandbox` (per spec architecture). -- `writeAttachments(sandboxId, downloadedAttachments)` is the **first** statement inside `try {`. -- All four `assembleXContext(...)` invocations pass `attachments: downloadedAttachments`. -- Neither step forwards errors that would crash the workflow — `fetchAttachments` always returns an array, `writeAttachments` no-ops when there is nothing to write. - ---- - -## Notes for the implementer - -- **Do not** add `.gitignore` entries for `/tmp/attachments/` — the files live outside the cloned repo and are never staged by `git`. The existing `requirements.md` convention at `/tmp/research-requirements.md` etc. already relies on this. -- **Do not** attempt to follow URLs embedded in the ticket description or comments. That is explicitly out-of-scope (SSRF risk, auth friction). The agent can decide to fetch external URLs itself inside the sandbox if needed. -- **Do not** add attachment counts to Slack messages — v1 keeps Slack unchanged. -- **Do not** dedup by content hash — v1 keeps one-off delivery per ticket. -- If you discover a call site that constructs a `TicketContent` literal (most likely a test fixture) and typecheck complains after Task 2, add `attachments: []` to it. Do not widen the type to make `attachments` optional — the spec is explicit that it is always present, even when empty. -- The `shortReason` regex in Task 6 is intentionally simple; if you find messages it can't parse cleanly, just return the raw message. The index is for the agent, not for humans. 
diff --git a/docs/superpowers/plans/2026-04-21-arthur-tracer-in-sandbox.md b/docs/superpowers/plans/2026-04-21-arthur-tracer-in-sandbox.md deleted file mode 100644 index a6b2865..0000000 --- a/docs/superpowers/plans/2026-04-21-arthur-tracer-in-sandbox.md +++ /dev/null @@ -1,773 +0,0 @@ -# Arthur Tracer In Sandbox Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Install the Arthur AI Engine Claude Code tracer inside every Vercel Sandbox the workflow provisions, so every in-sandbox Claude Code turn emits OpenInference spans to a configured Arthur instance. Credentials are optional — if any of the three Arthur env vars is missing, provisioning behaves exactly as today. - -**Architecture:** Bundle `arthur-engine/integrations/claude-code/claude_code_tracer.py` as a base64 string in a generated TS file so Nitro reliably includes it in the Vercel deployment. Extend `SandboxConfig` with an optional `arthur` block. In `SandboxManager.provision()`, after Claude Code is installed, pip-install two `opentelemetry` packages, write the tracer to `$HOME/.claude/hooks/claude_code_tracer.py`, write `$HOME/.claude/arthur_config.json`, and merge Arthur's five hook entries (`UserPromptSubmit`, `PreToolUse`, `PostToolUse`, `PostToolUseFailure`, `Stop`) into `$HOME/.claude/settings.json`. Centralise every write to `settings.json` in a single merge-aware helper so `configureStopHookInSandbox` no longer clobbers Arthur's hooks when it toggles the commit-guard Stop entry. - -**Tech Stack:** TypeScript, Vitest, `@vercel/sandbox` (`writeFiles`, `runCommand`), `@t3-oss/env-core` + Zod, Python 3 + pip (runs inside sandbox), Node 24 (runs inside sandbox for JSON merges). 
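The architecture depends on the Python source surviving a base64 round trip. Here is a minimal sketch of the runtime side. It assumes the sandbox home directory is passed in explicitly, because the plan writes to `$HOME/.claude/hooks/` without stating the resolved path. `tracerFileEntry` is a hypothetical helper name, not part of the plan.

```ts
import { Buffer } from "node:buffer";

/**
 * Hypothetical helper: decodes the embedded tracer source and returns the file
 * entry to write into the sandbox. `homeDir` is an assumption; the caller must
 * resolve $HOME inside the sandbox first.
 */
function tracerFileEntry(tracerBase64: string, homeDir: string): { path: string; content: Buffer } {
  // Buffer.from(..., "base64") skips invalid characters rather than throwing,
  // so the only cheap sanity check here is that something decoded at all.
  const content = Buffer.from(tracerBase64, "base64");
  if (content.length === 0) {
    throw new Error("embedded tracer is empty; re-run the generator script");
  }
  return { path: `${homeDir}/.claude/hooks/claude_code_tracer.py`, content };
}
```

The generator script encodes the file with `bytes.toString("base64")`, so decoding with `Buffer.from(..., "base64")` recovers the original bytes exactly.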
- --- - - ## File Structure - - | File | Action | Responsibility | -|------|--------|---------------| -| `env.ts` | **Modify** | Three optional server vars: `GENAI_ENGINE_API_KEY`, `GENAI_ENGINE_TASK_ID`, `GENAI_ENGINE_TRACE_ENDPOINT`. | -| `scripts/build-arthur-tracer.mjs` | **Create** | Generates `src/sandbox/arthur-tracer.ts` from `../arthur-engine/integrations/claude-code/claude_code_tracer.py`. | -| `src/sandbox/arthur-tracer.ts` | **Create (generated, checked in)** | Exports `ARTHUR_TRACER_PY_BASE64: string`. Regenerated via `pnpm build:arthur-tracer`. | -| `package.json` | **Modify** | Add script `"build:arthur-tracer": "node scripts/build-arthur-tracer.mjs"`. | -| `src/sandbox/manager.ts` | **Modify** | Extend `SandboxConfig.arthur`; add `installArthurTracer(sandbox, arthur)`; replace the heredoc writers in `configureStopHookInSandbox` with a single merge-aware helper `writeClaudeSettings(sandbox, opts)`. Call `installArthurTracer` from `provision()` after `installGlobalSkills`. | -| `src/sandbox/manager.test.ts` | **Modify** | Rewrite the two stop-hook tests to assert the new `node -e` merge call; add three Arthur tests (installs when configured, skipped when not, registers all five hook commands). | -| `src/workflows/agent.ts` | **Modify** | Build `arthur` config block from env once; pass into `SandboxManager`. `configureStopHook` signature unchanged. | -| `.gitignore` | **None** | No change needed; `src/sandbox/arthur-tracer.ts` is checked in. | - -No changes to `arthur-engine/` (read-only source), VCS adapters, run registry, Slack, cron. - ---- - -## Shared Types (referenced by multiple tasks) - -Defined in Task 3, reproduced here so later tasks don't have to repeat the shape: - -```ts -// src/sandbox/manager.ts -export interface ArthurConfig { - apiKey: string; // GENAI_ENGINE_API_KEY - taskId: string; // GENAI_ENGINE_TASK_ID (UUID) - endpoint: string; // GENAI_ENGINE_TRACE_ENDPOINT (full URL incl.
/api/v1/traces) -} - -export interface SandboxConfig { - // ...existing fields unchanged... - arthur?: ArthurConfig; -} -``` - ---- - -## Task 1: Add Arthur env vars - -**Files:** -- Modify: `env.ts` - -- [ ] **Step 1: Add the three new optional vars to the `server` block** - -In `env.ts`, inside the `server: { ... }` object in `createEnv(...)`, add the following entries directly after the `// Agent` group (after `COMMIT_EMAIL`, before `// Sandbox`): - -```ts - // Arthur AI Engine (optional — all three required together) - GENAI_ENGINE_API_KEY: z.string().min(1).optional(), - GENAI_ENGINE_TASK_ID: z.string().uuid().optional(), - GENAI_ENGINE_TRACE_ENDPOINT: z.string().url().optional(), -``` - -No cross-field validation needed in the `createEnv` call — the "either all three or none" rule is enforced at the only use site (`src/workflows/agent.ts`, see Task 6). - -- [ ] **Step 2: Typecheck** - -Run: `pnpm typecheck` -Expected: PASS (the optional fields won't break anything). - -- [ ] **Step 3: Commit** - -```bash -git add env.ts -git commit -m "feat(env): add optional Arthur AI Engine env vars" -``` - ---- - -## Task 2: Build script for the tracer bundle - -**Files:** -- Create: `scripts/build-arthur-tracer.mjs` -- Modify: `package.json` - -Nitro's Vercel preset does not reliably bundle arbitrary `.py` files that sit next to `.ts` sources, so we embed the Python tracer as a base64 string in a generated TS file that Nitro will treat as source. The build script is run manually (and can be re-run whenever Arthur's tracer is updated upstream). - -- [ ] **Step 1: Write the build script** - -Create `scripts/build-arthur-tracer.mjs` with this exact content: - -```js -#!/usr/bin/env node -// Generates src/sandbox/arthur-tracer.ts from the Arthur Engine tracer source. -// Regenerate whenever arthur-engine/integrations/claude-code/claude_code_tracer.py changes. 
-import fs from "node:fs"; -import path from "node:path"; -import { fileURLToPath } from "node:url"; - -const __dirname = path.dirname(fileURLToPath(import.meta.url)); -const repoRoot = path.resolve(__dirname, ".."); -const defaultSource = path.resolve( - repoRoot, - "..", - "arthur-engine", - "integrations", - "claude-code", - "claude_code_tracer.py", -); -const sourcePath = process.env.ARTHUR_TRACER_SRC - ? path.resolve(process.env.ARTHUR_TRACER_SRC) - : defaultSource; - -if (!fs.existsSync(sourcePath)) { - console.error(`Arthur tracer not found at ${sourcePath}.`); - console.error("Set ARTHUR_TRACER_SRC to override."); - process.exit(1); -} - -const bytes = fs.readFileSync(sourcePath); -const base64 = bytes.toString("base64"); -const outPath = path.resolve(repoRoot, "src", "sandbox", "arthur-tracer.ts"); - -const out = `// AUTO-GENERATED — do not edit by hand. -// Source: ${path.relative(repoRoot, sourcePath)} -// Regenerate: pnpm build:arthur-tracer -// -// Base64-encoded Python source of the Arthur Engine Claude Code tracer. -// Bundled so Nitro reliably ships it with the Vercel deployment; decoded at -// runtime and written into each provisioned sandbox under ~/.claude/hooks/. -export const ARTHUR_TRACER_PY_BASE64 = "${base64}"; -`; - -fs.writeFileSync(outPath, out); -console.log(`Wrote ${path.relative(repoRoot, outPath)} (${bytes.length} bytes -> ${base64.length} base64 chars)`); -``` - -- [ ] **Step 2: Add the npm script** - -Edit `package.json`, insert in the `"scripts"` block (after `"typecheck"` is fine): - -```json - "build:arthur-tracer": "node scripts/build-arthur-tracer.mjs", -``` - -- [ ] **Step 3: Run the script** - -Run: `pnpm build:arthur-tracer` -Expected output: `Wrote src/sandbox/arthur-tracer.ts (59174 bytes -> 78900 base64 chars)` (exact numbers will vary with tracer version). 
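You can check the numbers in that expected output by hand. Base64 emits 4 characters for every 3 input bytes and pads the final group, so the encoded length is `4 * Math.ceil(n / 3)`. For 59174 bytes that gives `4 * 19725 = 78900`. The sketch below uses a hypothetical `base64Length` helper and cross-checks the formula against Node's encoder.

```ts
import { Buffer } from "node:buffer";

/** Length of the padded base64 encoding of `byteCount` bytes: 4 chars per 3-byte group. */
function base64Length(byteCount: number): number {
  return 4 * Math.ceil(byteCount / 3);
}

// Cross-check the formula against Node's encoder, including sizes that need
// "=" or "==" padding (1 and 2 bytes) and the size quoted above.
for (const n of [0, 1, 2, 3, 4, 59174]) {
  const actual = Buffer.alloc(n).toString("base64").length;
  if (actual !== base64Length(n)) {
    throw new Error(`mismatch for ${n} bytes: ${actual} vs ${base64Length(n)}`);
  }
}

console.log(base64Length(59174)); // 78900
```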
- -- [ ] **Step 4: Sanity-check the generated file** - -Run: `head -c 200 src/sandbox/arthur-tracer.ts` -Expected: starts with `// AUTO-GENERATED` comment, then `export const ARTHUR_TRACER_PY_BASE64 = "...`. - -- [ ] **Step 5: Commit** - -```bash -git add scripts/build-arthur-tracer.mjs package.json src/sandbox/arthur-tracer.ts -git commit -m "feat(sandbox): bundle Arthur tracer source via build script" -``` - ---- - -## Task 3: Extend `SandboxConfig` with `arthur` block - -**Files:** -- Modify: `src/sandbox/manager.ts` - -Surgical type change; no runtime behaviour yet. Isolating the type change lets later tasks focus on logic. - -- [ ] **Step 1: Add the `ArthurConfig` interface and extend `SandboxConfig`** - -In `src/sandbox/manager.ts`, directly above the existing `export interface SandboxConfig {` block (~line 14), add: - -```ts -export interface ArthurConfig { - apiKey: string; - taskId: string; - endpoint: string; -} -``` - -Then inside `SandboxConfig`, append at the end (after `jobTimeoutMs`): - -```ts - /** Arthur AI Engine tracing config. If set, the tracer is installed into every provisioned sandbox. */ - arthur?: ArthurConfig; -``` - -- [ ] **Step 2: Typecheck** - -Run: `pnpm typecheck` -Expected: PASS (optional field, no call sites yet). - -- [ ] **Step 3: Commit** - -```bash -git add src/sandbox/manager.ts -git commit -m "feat(sandbox): add optional ArthurConfig to SandboxConfig" -``` - ---- - -## Task 4: Centralise `settings.json` writes - -**Files:** -- Modify: `src/sandbox/manager.ts` -- Modify: `src/sandbox/manager.test.ts` - -The existing `configureStopHookInSandbox` writes a full `~/.claude/settings.json` via a shell heredoc, which would wipe Arthur's hooks when toggled between phases. Replace it with `writeClaudeSettings(sandbox, opts)` — a single helper that always merges into the current file. - -The merge logic runs inside the sandbox via `node -e` (node 24 is the runtime — always available). 
It takes a single JSON argument describing what to mutate: - -- `{"commitGuard":"enable"}` — add the commit-guard Stop hook entry if absent -- `{"commitGuard":"disable"}` — remove the commit-guard Stop hook entry if present -- `{"arthur":"install"}` — append the five Arthur hook entries if absent (idempotent by exact command string) - -Multiple keys can be combined. The helper does **not** touch hook entries it doesn't own. - -- [ ] **Step 1: Write the failing tests** - -Replace the two existing stop-hook tests (`manager.test.ts:114-159`) with the following, and add two new ones. After the existing `"writes CLAUDE_CODE_OAUTH_TOKEN..."` test (~line 112), rewrite/extend this block: - -```ts - it("enabling the stop hook runs a node merge script that adds commit-guard", async () => { - const manager = new SandboxManager({ - kind: "github", - token: "ghp_test", - repoPath: "test-org/test-repo", - host: "https://github.com", - anthropicApiKey: "sk-ant-test", - claudeModel: "claude-opus-4-6", - commitAuthor: "ai-workflow-blazity", - commitEmail: "bot@blazity.com", - jobTimeoutMs: 1_800_000, - }); - const sandbox = await manager.provision("feat/test-branch"); - mockRunCommand.mockClear(); - - await manager.configureStopHook(sandbox, true); - - const mergeCall = mockRunCommand.mock.calls.find( - (c: any[]) => - c[0] === "node" && - Array.isArray(c[1]) && - c[1][0] === "--input-type=module" && - c[1][1] === "-e" && - typeof c[1][2] === "string" && - c[1][2].includes("commit-guard.sh") && - c[1][2].includes('"commitGuard":"enable"'), - ); - expect(mergeCall).toBeDefined(); - }); - - it("disabling the stop hook runs a node merge script with commitGuard=disable", async () => { - const manager = new SandboxManager({ - kind: "github", - token: "ghp_test", - repoPath: "test-org/test-repo", - host: "https://github.com", - anthropicApiKey: "sk-ant-test", - claudeModel: "claude-opus-4-6", - commitAuthor: "ai-workflow-blazity", - commitEmail: "bot@blazity.com", - jobTimeoutMs: 
1_800_000, - }); - const sandbox = await manager.provision("feat/test-branch"); - mockRunCommand.mockClear(); - - await manager.configureStopHook(sandbox, false); - - const mergeCall = mockRunCommand.mock.calls.find( - (c: any[]) => - c[0] === "node" && - Array.isArray(c[1]) && - c[1][0] === "--input-type=module" && - c[1][1] === "-e" && - typeof c[1][2] === "string" && - c[1][2].includes('"commitGuard":"disable"'), - ); - expect(mergeCall).toBeDefined(); - }); - - it("configureStopHookInSandbox works with any sandbox-like object", async () => { - const fakeSandbox = { runCommand: mockRunCommand }; - mockRunCommand.mockClear(); - - await configureStopHookInSandbox(fakeSandbox as any, true); - - const mergeCall = mockRunCommand.mock.calls.find( - (c: any[]) => - c[0] === "node" && - Array.isArray(c[1]) && - c[1][0] === "--input-type=module" && - c[1][1] === "-e" && - typeof c[1][2] === "string" && - c[1][2].includes('"commitGuard":"enable"'), - ); - expect(mergeCall).toBeDefined(); - }); -``` - -- [ ] **Step 2: Run the tests to confirm they fail** - -Run: `pnpm test -- manager.test.ts` -Expected: FAIL — the old heredoc writer doesn't call `node -e`. - -- [ ] **Step 3: Implement the merge helper** - -In `src/sandbox/manager.ts`, **replace** the entire body of `configureStopHookInSandbox` (lines 64-92) and **replace** the `cat > ~/.claude/settings.json << 'JSON' ... JSON` / `echo '{}' > ~/.claude/settings.json` writes with the new helper. Add the new helper directly above `configureStopHookInSandbox`: - -```ts -/** - * Merge-aware writer for ~/.claude/settings.json inside a sandbox. - * - * Accepts a partial "directive" — only the keys provided are mutated; existing - * hook entries (including those owned by other tools, e.g. Arthur's tracer) - * are preserved. The merge itself runs inside the sandbox via `node -e` - * because Node 24 is the sandbox runtime and we can't assume Python is - * available for stop-hook toggling. 
- */ -async function writeClaudeSettings( - sandbox: RunnableSandbox, - opts: { - commitGuard?: "enable" | "disable"; - arthur?: "install"; - }, -): Promise<void> { - const directive = JSON.stringify(opts); - const script = ` - import fs from 'node:fs'; - import path from 'node:path'; - const opts = ${JSON.stringify(opts)}; - const home = process.env.HOME; - const settingsPath = path.join(home, '.claude', 'settings.json'); - fs.mkdirSync(path.dirname(settingsPath), { recursive: true }); - let s = {}; - try { s = JSON.parse(fs.readFileSync(settingsPath, 'utf8')); } catch {} - s.hooks = s.hooks || {}; - - const upsertHook = (event, matcher, command) => { - const existing = s.hooks[event] || []; - const has = existing.some(e => (e && Array.isArray(e.hooks) ? e.hooks : []).some(h => h && h.command === command)); - if (!has) existing.push({ matcher, hooks: [{ type: 'command', command }] }); - s.hooks[event] = existing; - }; - const removeHook = (event, commandPredicate) => { - const existing = s.hooks[event] || []; - s.hooks[event] = existing - .map(e => ({ ...e, hooks: (e.hooks || []).filter(h => !commandPredicate(h.command || '')) })) - .filter(e => (e.hooks || []).length > 0); - }; - - if (opts.commitGuard === 'enable') { - upsertHook('Stop', '', 'bash ~/.claude/commit-guard.sh'); - } else if (opts.commitGuard === 'disable') { - removeHook('Stop', c => c.includes('commit-guard.sh')); - } - - if (opts.arthur === 'install') { - const events = [ - ['UserPromptSubmit', 'user_prompt_submit'], - ['PreToolUse', 'pre_tool'], - ['PostToolUse', 'post_tool'], - ['PostToolUseFailure', 'post_tool_failure'], - ['Stop', 'stop'], - ]; - for (const [event, arg] of events) { - upsertHook(event, '', 'python3 "$HOME/.claude/hooks/claude_code_tracer.py" ' + arg); - } - } - - fs.writeFileSync(settingsPath, JSON.stringify(s, null, 2)); - `; - // Note: we serialise opts into the script body twice — the JSON.stringify above - // injects the literal, which is what the test assertions look for.
The - // \`directive\` string is included below purely to make the intent grep-able - // when reading runtime logs. (It does not affect behaviour.) - void directive; - await sandbox.runCommand("node", ["--input-type=module", "-e", script]); -} - -export async function configureStopHookInSandbox(sandbox: RunnableSandbox, enabled: boolean): Promise<void> { - // Ensure the commit-guard script exists before toggling the hook (idempotent). - await sandbox.runCommand("bash", [ - "-c", - [ - `mkdir -p ~/.claude`, - `cat > ~/.claude/commit-guard.sh << 'SCRIPT'`, - `#!/bin/bash`, - `input=$(cat)`, - `if echo "$input" | grep -q '"stop_hook_active":true'; then exit 0; fi`, - `changes=$(git status --porcelain | grep -v '^.. \\.claude/' | grep -v '^?? \\.claude/')`, - `if [ -n "$changes" ]; then`, - ` echo '{"decision":"block","reason":"You have uncommitted changes. You MUST either commit all changes with a descriptive message or revert them before stopping."}' >&2`, - ` exit 2`, - `fi`, - `SCRIPT`, - `chmod +x ~/.claude/commit-guard.sh`, - ].join("\n"), - ]); - - await writeClaudeSettings(sandbox, { commitGuard: enabled ? "enable" : "disable" }); -} -``` - -Also **remove** the `SandboxManager.configureStopHook` method's body change is unnecessary — it already delegates. Leave `SandboxManager.configureStopHook` (lines ~229-231) as-is. - -- [ ] **Step 4: Export `writeClaudeSettings` for Task 5's use** - -At the bottom of `src/sandbox/manager.ts`, the helper lives inside the module scope. Task 5 will call it from `installArthurTracer` which also lives in the same module, so no export needed. Leave it as an internal helper. - -- [ ] **Step 5: Run the tests to confirm they pass** - -Run: `pnpm test -- manager.test.ts` -Expected: PASS — all five tests in `manager.test.ts` green. - -- [ ] **Step 6: Typecheck** - -Run: `pnpm typecheck` -Expected: PASS.
- -- [ ] **Step 7: Commit** - -```bash -git add src/sandbox/manager.ts src/sandbox/manager.test.ts -git commit -m "refactor(sandbox): merge-aware settings.json writer" -``` - ---- - -## Task 5: Install Arthur tracer inside `provision()` - -**Files:** -- Modify: `src/sandbox/manager.ts` -- Modify: `src/sandbox/manager.test.ts` - -Now wire the Arthur install into `provision()`. The install is a no-op when `config.arthur` is undefined, so existing tests keep passing without setting it. - -Install order inside `provision()`: - -1. (existing) `npm install -g @anthropic-ai/claude-code` -2. (existing) write `agent-env.sh` -3. (existing) onboarding `~/.claude.json` -4. (existing) `installGlobalSkills` -5. **(new)** `installArthurTracer` — only if `config.arthur` is set - -- [ ] **Step 1: Write the failing tests** - -In `src/sandbox/manager.test.ts`, append three new tests in the existing `describe` block (after the last one): - -```ts - it("installs Arthur tracer when config.arthur is set", async () => { - const manager = new SandboxManager({ - kind: "github", - token: "ghp_test", - repoPath: "test-org/test-repo", - host: "https://github.com", - anthropicApiKey: "sk-ant-test", - claudeModel: "claude-opus-4-6", - commitAuthor: "ai-workflow-blazity", - commitEmail: "bot@blazity.com", - jobTimeoutMs: 1_800_000, - arthur: { - apiKey: "test-key", - taskId: "00000000-0000-4000-8000-000000000000", - endpoint: "https://example.ngrok.app/api/v1/traces", - }, - }); - - await manager.provision("feat/test-branch"); - - const pipCall = mockRunCommand.mock.calls.find( - (c: any[]) => - c[0] === "bash" && - typeof c[1]?.[1] === "string" && - c[1][1].includes("pip3 install") && - c[1][1].includes("opentelemetry-sdk") && - c[1][1].includes("opentelemetry-exporter-otlp-proto-http"), - ); - expect(pipCall).toBeDefined(); - - const arthurMergeCall = mockRunCommand.mock.calls.find( - (c: any[]) => - c[0] === "node" && - Array.isArray(c[1]) && - c[1][0] === "--input-type=module" && - c[1][1] 
=== "-e" && - typeof c[1][2] === "string" && - c[1][2].includes('"arthur":"install"') && - c[1][2].includes("user_prompt_submit") && - c[1][2].includes("pre_tool") && - c[1][2].includes("post_tool") && - c[1][2].includes("post_tool_failure"), - ); - expect(arthurMergeCall).toBeDefined(); - }); - - it("skips Arthur install when config.arthur is undefined", async () => { - const manager = new SandboxManager({ - kind: "github", - token: "ghp_test", - repoPath: "test-org/test-repo", - host: "https://github.com", - anthropicApiKey: "sk-ant-test", - claudeModel: "claude-opus-4-6", - commitAuthor: "ai-workflow-blazity", - commitEmail: "bot@blazity.com", - jobTimeoutMs: 1_800_000, - }); - - await manager.provision("feat/test-branch"); - - const pipCall = mockRunCommand.mock.calls.find( - (c: any[]) => - c[0] === "bash" && typeof c[1]?.[1] === "string" && c[1][1].includes("pip3 install"), - ); - expect(pipCall).toBeUndefined(); - }); - - it("Arthur install writes arthur_config.json and the tracer script", async () => { - const manager = new SandboxManager({ - kind: "github", - token: "ghp_test", - repoPath: "test-org/test-repo", - host: "https://github.com", - anthropicApiKey: "sk-ant-test", - claudeModel: "claude-opus-4-6", - commitAuthor: "ai-workflow-blazity", - commitEmail: "bot@blazity.com", - jobTimeoutMs: 1_800_000, - arthur: { - apiKey: "test-key", - taskId: "00000000-0000-4000-8000-000000000000", - endpoint: "https://example.ngrok.app/api/v1/traces", - }, - }); - - await manager.provision("feat/test-branch"); - - // Every writeFiles call passes an array of { path, content }. Flatten them. 
- const written = mockWriteFiles.mock.calls.flatMap(([files]: any[]) => files); - const tracerFile = written.find((f: any) => f.path.endsWith("arthur-tracer.py")); - expect(tracerFile).toBeDefined(); - expect(Buffer.isBuffer(tracerFile.content)).toBe(true); - expect(tracerFile.content.length).toBeGreaterThan(1000); - - const configFile = written.find((f: any) => f.path.endsWith("arthur_config.json")); - expect(configFile).toBeDefined(); - const cfg = JSON.parse(Buffer.from(configFile.content).toString()); - expect(cfg).toEqual({ - api_key: "test-key", - task_id: "00000000-0000-4000-8000-000000000000", - endpoint: "https://example.ngrok.app/api/v1/traces", - }); - }); -``` - -- [ ] **Step 2: Run the tests to confirm they fail** - -Run: `pnpm test -- manager.test.ts` -Expected: FAIL — `installArthurTracer` doesn't exist yet. - -- [ ] **Step 3: Implement `installArthurTracer`** - -In `src/sandbox/manager.ts`, add this import at the top of the file (after existing imports): - -```ts -import { ARTHUR_TRACER_PY_BASE64 } from "./arthur-tracer.js"; -``` - -Then inside the `SandboxManager` class, directly below `installGlobalSkills`, add: - -```ts - /** - * Install the Arthur AI Engine Claude Code tracer into the sandbox. - * - * No-op if the three credentials are not all configured on the SandboxManager. - * The tracer hooks into every Claude Code turn and exports OpenInference spans - * via OTLP/HTTP to the configured endpoint. - * - * If pip install fails (e.g. missing python3, offline), we log and return - * without registering hooks — failing hooks would block Claude Code turns. 
- */ - private async installArthurTracer(sandbox: SandboxInstance): Promise<void> { - const arthur = this.config.arthur; - if (!arthur) return; - - const { logger } = await import("../lib/logger.js"); - - const pip = await sandbox.runCommand("bash", [ - "-c", - "python3 -m pip install --user --quiet opentelemetry-sdk>=1.20.0 opentelemetry-exporter-otlp-proto-http>=1.20.0", - ]); - if (pip.exitCode !== 0) { - const err = (await pip.stderr()).trim(); - logger.warn({ err: err.slice(0, 500) }, "arthur_pip_install_failed"); - return; - } - - // Stage tracer to /tmp, then relocate (writeFiles takes absolute paths; $HOME - // isn't expanded by the API, only by shell commands). - const tracerBytes = Buffer.from(ARTHUR_TRACER_PY_BASE64, "base64"); - await sandbox.writeFiles([ - { path: "/tmp/arthur-tracer.py", content: tracerBytes }, - ]); - await sandbox.runCommand("bash", [ - "-c", - "mkdir -p $HOME/.claude/hooks && mv /tmp/arthur-tracer.py $HOME/.claude/hooks/claude_code_tracer.py && chmod +x $HOME/.claude/hooks/claude_code_tracer.py", - ]); - - // Write the config file. Priority-2 location per Arthur's README. - const configJson = JSON.stringify( - { api_key: arthur.apiKey, task_id: arthur.taskId, endpoint: arthur.endpoint }, - null, - 2, - ); - await sandbox.writeFiles([ - { path: "/tmp/arthur_config.json", content: Buffer.from(configJson) }, - ]); - await sandbox.runCommand("bash", [ - "-c", - "mkdir -p $HOME/.claude && mv /tmp/arthur_config.json $HOME/.claude/arthur_config.json && chmod 600 $HOME/.claude/arthur_config.json", - ]); - - // Register all five Arthur hooks via the merge-aware writer.
- await writeClaudeSettings(sandbox, { arthur: "install" }); - } -``` - -Then in `provision()`, **after** the existing call `await this.installGlobalSkills(sandbox);` (near the end of the method, just before `return sandbox;`), add: - -```ts - await this.installArthurTracer(sandbox); -``` - -- [ ] **Step 4: Run the tests** - -Run: `pnpm test -- manager.test.ts` -Expected: PASS — all eight tests in `manager.test.ts` green. - -- [ ] **Step 5: Typecheck** - -Run: `pnpm typecheck` -Expected: PASS. - -- [ ] **Step 6: Commit** - -```bash -git add src/sandbox/manager.ts src/sandbox/manager.test.ts -git commit -m "feat(sandbox): install Arthur AI tracer during provision" -``` - ---- - -## Task 6: Wire env into the workflow - -**Files:** -- Modify: `src/workflows/agent.ts` - -`provisionSandbox` builds the `SandboxConfig`; this is where the "all three or none" rule lives. - -- [ ] **Step 1: Build the `arthur` block from env and pass it into `SandboxManager`** - -In `src/workflows/agent.ts`, find the `new SandboxManager({...})` call at line ~159 and replace it with: - -```ts - const arthur = - env.GENAI_ENGINE_API_KEY && env.GENAI_ENGINE_TASK_ID && env.GENAI_ENGINE_TRACE_ENDPOINT - ? { - apiKey: env.GENAI_ENGINE_API_KEY, - taskId: env.GENAI_ENGINE_TASK_ID, - endpoint: env.GENAI_ENGINE_TRACE_ENDPOINT, - } - : undefined; - - const manager = new SandboxManager({ - kind: vcs.kind, - token: vcs.token, - repoPath: vcs.repoPath, - host: vcs.host, - anthropicApiKey: env.ANTHROPIC_API_KEY, - claudeCodeOauthToken: env.CLAUDE_CODE_OAUTH_TOKEN, - claudeModel: env.CLAUDE_MODEL, - commitAuthor: env.COMMIT_AUTHOR, - commitEmail: env.COMMIT_EMAIL, - jobTimeoutMs: env.JOB_TIMEOUT_MS, - arthur, - }); -``` - -No changes needed to `configureStopHook` (the helper in `agent.ts`, line ~205) — `writeClaudeSettings` preserves Arthur's hooks automatically. - -- [ ] **Step 2: Typecheck** - -Run: `pnpm typecheck` -Expected: PASS. 
- -- [ ] **Step 3: Run the full test suite** - -Run: `pnpm test` -Expected: PASS across the board. `agent.test`-style workflow tests should not regress (they either mock `SandboxManager` or don't look at the arthur field). - -- [ ] **Step 4: Commit** - -```bash -git add src/workflows/agent.ts -git commit -m "feat(workflow): pass Arthur config from env into SandboxManager" -``` - ---- - -## Task 7: Local end-to-end smoke test - -**Files:** -- None (verification only) - -- [ ] **Step 1: Expose local Arthur via ngrok** - -Run in a separate terminal: `ngrok http 3030` -Copy the `https://...ngrok-free.app` URL. - -- [ ] **Step 2: Create an Arthur task and grab its UUID** - -Open `http://localhost:3030`, sign in with `changeme_genai_engine_admin_key`, create a task, copy its UUID. - -- [ ] **Step 3: Configure `.env`** - -Add to `.env` in the repo root: - -```env -GENAI_ENGINE_API_KEY=changeme_genai_engine_admin_key -GENAI_ENGINE_TASK_ID= -GENAI_ENGINE_TRACE_ENDPOINT=https://.ngrok-free.app/api/v1/traces -``` - -- [ ] **Step 4: Ensure the tracer bundle is fresh** - -Run: `pnpm build:arthur-tracer` - -- [ ] **Step 5: Start the dev server and dispatch one ticket** - -Run: `pnpm dev` -In Jira, transition one ticket to the AI column (or wait for the cron sweep). The workflow will provision a sandbox with Arthur wired in. - -- [ ] **Step 6: Verify in Arthur UI** - -Watch Arthur's task view. Within ~30s of the agent starting, you should see: -- One `claude-code-turn` trace per user prompt inside the sandbox -- Child spans: `LLM` (claude/claude-sonnet-*), `TOOL` (Read/Edit/Bash/etc.), `RETRIEVER` (WebSearch/WebFetch), `AGENT` (Task) -- The `arthur.session` resource attribute grouping spans from the same Claude Code session - -- [ ] **Step 7: Negative check** - -Unset one of the three `GENAI_ENGINE_*` vars in `.env`, restart `pnpm dev`, dispatch another ticket. 
Confirm the sandbox provisions and the ticket processes exactly as today — no Arthur HTTP calls (tail ngrok's request log to confirm zero traffic), no broken hooks. - ---- - -## Verification - -1. **Unit tests**: `pnpm test` — green across all suites. -2. **Typecheck**: `pnpm typecheck` — green. -3. **Manual smoke test**: Task 7 end-to-end — traces appear in Arthur UI, unset-credentials path is a clean no-op. - -## Risks / Open items - -- **Python availability in sandbox**: Vercel's `node24` runtime image includes `python3` + `pip3`, but if a future image change removes them, `pip3 install` fails, `installArthurTracer` logs a warning and returns early — provisioning continues unaffected. No hooks get registered, so no broken turns. -- **Tracer drift**: `src/sandbox/arthur-tracer.ts` is a snapshot. Re-run `pnpm build:arthur-tracer` and redeploy whenever Arthur ships a new tracer. -- **Bundle size**: +~80KB to the deployed JS artifact. Acceptable. -- **Networking**: The sandbox hits whatever URL is in `GENAI_ENGINE_TRACE_ENDPOINT`. For local dev that's ngrok; for prod, deploy Arthur somewhere with a stable public URL and swap the env var. diff --git a/docs/superpowers/plans/2026-04-22-arthur-hosted-prompts.md b/docs/superpowers/plans/2026-04-22-arthur-hosted-prompts.md deleted file mode 100644 index 7891276..0000000 --- a/docs/superpowers/plans/2026-04-22-arthur-hosted-prompts.md +++ /dev/null @@ -1,626 +0,0 @@ -# Arthur-Hosted Prompts With Codebase Fallback - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Let `research-plan`, `implement`, and `review` prompts be edited in Arthur without code changes, while keeping the current hardcoded strings as an automatic fallback when Arthur isn't configured or is unreachable. 
When Arthur *is* configured with a "prompt host" task, the workflow fetches the `production`-tagged version of each prompt at the start of every run. On any failure (missing env, 404, network error) it silently falls back to the hardcoded strings. - -**Architecture:** - -- New env var `GENAI_ENGINE_PROMPT_TASK_ID` — UUID of a dedicated Arthur task whose only job is to host the three prompts. Kept separate from the per-run trace tasks (`AWT-42`, `AWT-42.1`, …) so prompt edits don't require re-seeding per ticket. -- `ArthurClient` gains three prompt methods (`getPromptByTag`, `createPromptVersion`, `tagPromptVersion`). Each prompt is stored in Arthur as a single-message chat (`[{role: "user", content: ""}]`) and retrieved back via `messages[0].content`. -- A new workflow step `loadPrompts()` runs once per workflow run, returns `{research, implement, review}`, and logs the source (`arthur` or `fallback`) per prompt. Result is checkpointed in workflow history so replays reuse the same strings. -- Three `getPrompt("research-plan.md" | "implement.md" | "review.md")` call sites in `src/workflows/agent.ts` are replaced by indexing into the `loadPrompts()` return value. -- One-shot `pnpm setup:arthur-prompts` script **creates-or-finds** a task named `ai-workflow-prompts`, seeds the three prompts on it (each saved as a new version, tagged `production`), and prints the UUID in a paste-ready `GENAI_ENGINE_PROMPT_TASK_ID=` line. Idempotent — re-running finds the existing task and creates new versions, so it's safe after prompt edits. - -**Tech Stack:** TypeScript, Vitest, native `fetch` (same pattern as `src/adapters/issue-tracker/jira.ts`), `@t3-oss/env-core` + Zod, Workflow DevKit (`"use step"`). - ---- - -## File Structure - -| File | Action | Responsibility | -|------|--------|---------------| -| `env.ts` | **Modify** | Add optional `GENAI_ENGINE_PROMPT_TASK_ID: z.string().uuid().optional()`. 
| -| `src/sandbox/arthur-client.ts` | **Modify** | Add `getPromptByTag`, `createPromptVersion`, `tagPromptVersion`. Export a helper type for the agentic-prompt response. | -| `src/sandbox/arthur-client.test.ts` | **Modify** | Add 5 tests covering the three new methods (happy path, 404 returns null, auth + body shape for save + tag). | -| `src/lib/prompts.ts` | **Modify** | Export a new `PROMPT_NAMES` array + `PROMPT_FALLBACKS` record mapped by Arthur prompt name (no `.md`). Keep `getPrompt(name)` for backwards compat (now delegates to the fallbacks record keyed by `.md` filename). | -| `src/workflows/prompts-step.ts` | **Create** | New module housing the `loadPrompts()` step. Single export. Contains its own `"use step"` directive. Returns `{research: string, implement: string, review: string}` and per-prompt source logging. | -| `src/workflows/prompts-step.test.ts` | **Create** | 4 unit tests: (a) no Arthur env → fallback for all three; (b) Arthur returns all three → Arthur wins; (c) Arthur 404 on one → that one falls back, other two come from Arthur; (d) `PROMPT_TASK_ID` set but `API_KEY` missing → fallback (invalid config treated as disabled). | -| `src/workflows/agent.ts` | **Modify** | Call `loadPrompts()` once near the top of the workflow body (right after `fetchAndValidateTicket`). Replace the three inline `getPrompt(...)` calls with `prompts.research` / `.implement` / `.review`. | -| `scripts/setup-arthur-prompts.ts` | **Create** | Find-or-create the `ai-workflow-prompts` task, seed the three prompts on it, tag each version `production`, print the paste-ready `GENAI_ENGINE_PROMPT_TASK_ID=` line. Requires `GENAI_ENGINE_API_KEY` + `GENAI_ENGINE_TRACE_ENDPOINT` in `.env`. | -| `package.json` | **Modify** | Add script `"setup:arthur-prompts": "tsx scripts/setup-arthur-prompts.ts"`. | - -No changes to `configureStopHook`, Arthur tracer install, SandboxManager, or any VCS/issue-tracker adapter. 
The prompt-host task is auto-created on first setup run; re-runs find it by name and seed new versions — so there's at most one `ai-workflow-prompts` task per Arthur instance. - ---- - -## Shared Types - -```ts -// src/sandbox/arthur-client.ts -export interface AgenticPrompt { - name: string; - messages: Array<{ role: string; content: string; /* other OpenAI fields ignored */ }>; - version?: number | string; // Arthur returns this — used for tagging -} -``` - -```ts -// src/workflows/prompts-step.ts -export interface LoadedPrompts { - research: string; - implement: string; - review: string; -} -``` - -`PROMPT_NAMES` (defined in `src/lib/prompts.ts`) is the canonical list used by both the seed script and `loadPrompts()`: - -```ts -export const PROMPT_NAMES = ["research-plan", "implement", "review"] as const; -export type PromptName = typeof PROMPT_NAMES[number]; -``` - ---- - -## Task 1: Env var for the prompt-host task - -**Files:** `env.ts` - -- [ ] **Step 1:** In `env.ts`, add below the existing Arthur env vars (immediately after `GENAI_ENGINE_TRACE_ENDPOINT`): - -```ts - GENAI_ENGINE_PROMPT_TASK_ID: z.string().uuid().optional(), -``` - -- [ ] **Step 2:** Run `pnpm typecheck`. Expect PASS. - -- [ ] **Step 3:** Commit: `git add env.ts && git commit -m "feat(env): add optional GENAI_ENGINE_PROMPT_TASK_ID"`. - ---- - -## Task 2: Expose prompt names + fallbacks for shared use - -**Files:** `src/lib/prompts.ts` - -- [ ] **Step 1:** At the top of `src/lib/prompts.ts` (below the existing three `const ...Prompt = \`…\`` blocks, above `const prompts: Record`), add: - -```ts -export const PROMPT_NAMES = ["research-plan", "implement", "review"] as const; -export type PromptName = typeof PROMPT_NAMES[number]; - -/** Fallback strings keyed by Arthur prompt name (no `.md` suffix). 
*/ -export const PROMPT_FALLBACKS: Record<PromptName, string> = { - "research-plan": researchPlanPrompt, - "implement": implementPrompt, - "review": reviewPrompt, -}; -``` - -Leave the existing `prompts` record and `getPrompt()` export untouched — no caller is being moved in this task. - -- [ ] **Step 2:** `pnpm typecheck`. PASS. - -- [ ] **Step 3:** Commit: `git add src/lib/prompts.ts && git commit -m "refactor(prompts): expose PROMPT_NAMES and PROMPT_FALLBACKS"`. - ---- - -## Task 3: `ArthurClient` prompt methods - -**Files:** `src/sandbox/arthur-client.ts`, `src/sandbox/arthur-client.test.ts` - -- [ ] **Step 1:** Write failing tests. Append to `src/sandbox/arthur-client.test.ts` (inside the existing `describe("ArthurClient", ...)`): - -```ts - describe("prompts", () => { - it("getPromptByTag returns messages[0].content on 200", async () => { - mockFetch.mockResolvedValueOnce(jsonResponse({ - name: "research-plan", - version: 3, - messages: [{ role: "user", content: "the prompt body" }], - })); - const client = new ArthurClient("http://host", "k"); - const body = await client.getPromptByTag("task-uuid", "research-plan", "production"); - expect(body).toBe("the prompt body"); - const [url] = mockFetch.mock.calls[0]; - expect(url).toBe("http://host/api/v1/tasks/task-uuid/prompts/research-plan/versions/tags/production"); - }); - - it("getPromptByTag returns null on 404", async () => { - mockFetch.mockResolvedValueOnce(new Response("not found", { status: 404 })); - const client = new ArthurClient("http://host", "k"); - expect(await client.getPromptByTag("t", "research-plan", "production")).toBeNull(); - }); - - it("createPromptVersion POSTs single-message body with user role", async () => { - mockFetch.mockResolvedValueOnce(jsonResponse({ - name: "implement", - version: 5, - messages: [{ role: "user", content: "x" }], - })); - const client = new ArthurClient("http://host", "k"); - const result = await client.createPromptVersion("task-uuid", "implement", "x"); -
expect(result.version).toBe(5); - const [url, init] = mockFetch.mock.calls[0]; - expect(url).toBe("http://host/api/v1/tasks/task-uuid/prompts/implement"); - expect(init.method).toBe("POST"); - const body = JSON.parse(init.body); - expect(body.messages).toEqual([{ role: "user", content: "x" }]); - }); - - it("tagPromptVersion PUTs the tag", async () => { - mockFetch.mockResolvedValueOnce(jsonResponse({ name: "review", version: 2, messages: [] })); - const client = new ArthurClient("http://host", "k"); - await client.tagPromptVersion("t", "review", 2, "production"); - const [url, init] = mockFetch.mock.calls[0]; - expect(url).toBe("http://host/api/v1/tasks/t/prompts/review/versions/2/tags"); - expect(init.method).toBe("PUT"); - expect(JSON.parse(init.body)).toEqual({ tag: "production" }); - }); - - it("getPromptByTag throws on non-404 non-2xx (5xx)", async () => { - mockFetch.mockResolvedValueOnce(new Response("boom", { status: 500 })); - const client = new ArthurClient("http://host", "k"); - await expect(client.getPromptByTag("t", "x", "production")).rejects.toThrow(/500/); - }); - }); -``` - -- [ ] **Step 2:** Run `pnpm test -- arthur-client.test.ts`. Expect FAIL — methods don't exist. - -- [ ] **Step 3:** Add an interface export and three methods to `src/sandbox/arthur-client.ts`. Place the interface right below `ArthurTask`: - -```ts -export interface AgenticPrompt { - name: string; - version?: number | string; - messages: Array<{ role: string; content: string }>; -} -``` - -Then add these methods inside `ArthurClient`, right after `ensureTaskForTicket`: - -```ts - /** Fetch a tagged prompt version. Returns the first message's content, or null if 404. 
*/ - async getPromptByTag(taskId: string, name: string, tag: string): Promise<string | null> { - const path = `/api/v1/tasks/${encodeURIComponent(taskId)}/prompts/${encodeURIComponent(name)}/versions/tags/${encodeURIComponent(tag)}`; - const res = await fetch(`${this.baseUrl}${path}`, { - method: "GET", - headers: { - "Authorization": `Bearer ${this.apiKey}`, - "ngrok-skip-browser-warning": "true", - }, - }); - if (res.status === 404) return null; - if (!res.ok) { - const body = await res.text().catch(() => ""); - throw new Error(`Arthur GET ${path} → ${res.status}: ${body.slice(0, 300)}`); - } - const prompt = (await res.json()) as AgenticPrompt; - const first = prompt.messages?.[0]; - return first?.content ?? null; - } - - /** Create a new version of a named prompt on a task. Content is sent as a single user message. */ - async createPromptVersion(taskId: string, name: string, content: string): Promise<AgenticPrompt> { - return this.request( - `/api/v1/tasks/${encodeURIComponent(taskId)}/prompts/${encodeURIComponent(name)}`, - { - method: "POST", - body: JSON.stringify({ - messages: [{ role: "user", content }], - model_name: "claude-sonnet-4", - model_provider: "anthropic", - }), - }, - ); - } - - /** Add a tag (e.g. "production") to a specific version. */ - async tagPromptVersion(taskId: string, name: string, version: number | string, tag: string): Promise<void> { - await this.request( - `/api/v1/tasks/${encodeURIComponent(taskId)}/prompts/${encodeURIComponent(name)}/versions/${encodeURIComponent(String(version))}/tags`, - { - method: "PUT", - body: JSON.stringify({ tag }), - }, - ); - } -``` - -Note on `getPromptByTag`: we intentionally **don't** use `request()` for it because 404 is a valid "not found" signal that must not throw — it's the fallback trigger. The save/tag methods *do* use `request()` because any non-2xx there is a genuine failure. - -- [ ] **Step 4:** Run `pnpm test -- arthur-client.test.ts`. Expect PASS (15 tests total: 10 existing + 5 new). - -- [ ] **Step 5:** `pnpm typecheck`.
PASS. - -- [ ] **Step 6:** Commit: `git add src/sandbox/arthur-client.ts src/sandbox/arthur-client.test.ts && git commit -m "feat(arthur-client): add prompt get/create/tag methods"`. - ---- - -## Task 4: `loadPrompts()` step - -**Files:** `src/workflows/prompts-step.ts`, `src/workflows/prompts-step.test.ts` - -The step must be in its own file so Vitest can import it directly (importing from `agent.ts` pulls in the whole workflow DevKit). It is exported and called from `agent.ts` in Task 5. - -- [ ] **Step 1:** Write the failing tests in `src/workflows/prompts-step.test.ts`: - -```ts -import { describe, it, expect, vi, beforeEach } from "vitest"; - -vi.mock("../../env.js", () => ({ env: {} })); - -const mockGetPromptByTag = vi.fn(); -vi.mock("../sandbox/arthur-client.js", () => ({ - ArthurClient: { - fromTraceEndpoint: vi.fn(() => ({ getPromptByTag: mockGetPromptByTag })), - }, -})); - -import { loadPrompts } from "./prompts-step.js"; -import { PROMPT_FALLBACKS } from "../lib/prompts.js"; - -function setEnv(partial: Record<string, unknown>) { - const mod = require("../../env.js") as { env: Record<string, unknown> }; - mod.env = { ...mod.env, ...partial }; -} - -describe("loadPrompts", () => { - beforeEach(() => { - mockGetPromptByTag.mockReset(); - setEnv({ - GENAI_ENGINE_API_KEY: undefined, - GENAI_ENGINE_TRACE_ENDPOINT: undefined, - GENAI_ENGINE_PROMPT_TASK_ID: undefined, - }); - }); - - it("returns fallbacks when no Arthur env is set", async () => { - const result = await loadPrompts(); - expect(result.research).toBe(PROMPT_FALLBACKS["research-plan"]); - expect(result.implement).toBe(PROMPT_FALLBACKS["implement"]); - expect(result.review).toBe(PROMPT_FALLBACKS["review"]); - expect(mockGetPromptByTag).not.toHaveBeenCalled(); - }); - - it("returns fallbacks when PROMPT_TASK_ID is missing even if key+endpoint are set", async () => { - setEnv({ - GENAI_ENGINE_API_KEY: "k", - GENAI_ENGINE_TRACE_ENDPOINT: "https://host/api/v1/traces", - GENAI_ENGINE_PROMPT_TASK_ID: undefined, - }); - const result =
await loadPrompts(); - expect(result.research).toBe(PROMPT_FALLBACKS["research-plan"]); - expect(mockGetPromptByTag).not.toHaveBeenCalled(); - }); - - it("returns Arthur prompts when all three are present", async () => { - setEnv({ - GENAI_ENGINE_API_KEY: "k", - GENAI_ENGINE_TRACE_ENDPOINT: "https://host/api/v1/traces", - GENAI_ENGINE_PROMPT_TASK_ID: "prompt-task-uuid", - }); - mockGetPromptByTag - .mockResolvedValueOnce("arthur research") - .mockResolvedValueOnce("arthur implement") - .mockResolvedValueOnce("arthur review"); - const result = await loadPrompts(); - expect(result).toEqual({ - research: "arthur research", - implement: "arthur implement", - review: "arthur review", - }); - expect(mockGetPromptByTag).toHaveBeenCalledTimes(3); - const names = mockGetPromptByTag.mock.calls.map((c) => c[1]); - expect(names).toEqual(["research-plan", "implement", "review"]); - }); - - it("falls back per-prompt when Arthur returns null or throws", async () => { - setEnv({ - GENAI_ENGINE_API_KEY: "k", - GENAI_ENGINE_TRACE_ENDPOINT: "https://host/api/v1/traces", - GENAI_ENGINE_PROMPT_TASK_ID: "prompt-task-uuid", - }); - mockGetPromptByTag - .mockResolvedValueOnce("arthur research") - .mockResolvedValueOnce(null) // implement missing - .mockRejectedValueOnce(new Error("boom")); // review errors - - const result = await loadPrompts(); - expect(result.research).toBe("arthur research"); - expect(result.implement).toBe(PROMPT_FALLBACKS["implement"]); - expect(result.review).toBe(PROMPT_FALLBACKS["review"]); - }); -}); -``` - -- [ ] **Step 2:** Run `pnpm test -- prompts-step.test.ts`. Expect FAIL — the file doesn't exist. 
- -- [ ] **Step 3:** Create `src/workflows/prompts-step.ts`: - -```ts -export interface LoadedPrompts { - research: string; - implement: string; - review: string; -} - -export async function loadPrompts(): Promise<LoadedPrompts> { - "use step"; - const { env } = await import("../../env.js"); - const { logger } = await import("../lib/logger.js"); - const { PROMPT_FALLBACKS } = await import("../lib/prompts.js"); - - const arthurEnabled = - !!env.GENAI_ENGINE_API_KEY && - !!env.GENAI_ENGINE_TRACE_ENDPOINT && - !!env.GENAI_ENGINE_PROMPT_TASK_ID; - - if (!arthurEnabled) { - logger.info({ source: "fallback", reason: "arthur_prompts_disabled" }, "prompts_loaded"); - return { - research: PROMPT_FALLBACKS["research-plan"], - implement: PROMPT_FALLBACKS["implement"], - review: PROMPT_FALLBACKS["review"], - }; - } - - const { ArthurClient } = await import("../sandbox/arthur-client.js"); - const client = ArthurClient.fromTraceEndpoint( - env.GENAI_ENGINE_TRACE_ENDPOINT!, - env.GENAI_ENGINE_API_KEY!, - ); - const taskId = env.GENAI_ENGINE_PROMPT_TASK_ID!; - const TAG = "production"; - - async function one(name: "research-plan" | "implement" | "review"): Promise<string> { - try { - const body = await client.getPromptByTag(taskId, name, TAG); - if (body === null) { - logger.info({ name, source: "fallback", reason: "arthur_prompt_missing" }, "prompts_loaded"); - return PROMPT_FALLBACKS[name]; - } - logger.info({ name, source: "arthur" }, "prompts_loaded"); - return body; - } catch (err) { - logger.warn({ name, source: "fallback", err: (err as Error).message }, "prompts_loaded"); - return PROMPT_FALLBACKS[name]; - } - } - - const [research, implement, review] = await Promise.all([ - one("research-plan"), - one("implement"), - one("review"), - ]); - return { research, implement, review }; -} -loadPrompts.maxRetries = 0; -``` - -The `LoadedPrompts` interface is defined inline in this file; there is no separate types module. 
- -- [ ] **Step 4:** Run `pnpm test -- prompts-step.test.ts`. Expect PASS (4 tests). - -- [ ] **Step 5:** `pnpm typecheck`. PASS. - -- [ ] **Step 6:** Commit: `git add src/workflows/prompts-step.ts src/workflows/prompts-step.test.ts && git commit -m "feat(workflow): loadPrompts step with per-prompt Arthur→codebase fallback"`. - ---- - -## Task 5: Wire `loadPrompts()` into the workflow - -**Files:** `src/workflows/agent.ts` - -- [ ] **Step 1:** In `src/workflows/agent.ts`, right after `const ticket = await fetchAndValidateTicket(ticketId, env.COLUMN_AI); if (!ticket) return;`, add: - -```ts - const { loadPrompts } = await import("./prompts-step.js"); - const prompts = await loadPrompts(); -``` - -- [ ] **Step 2:** Replace the three `getPrompt(...)` call sites: - -| Before | After | -|---|---| -| `prompt: getPrompt("research-plan.md")` | `prompt: prompts.research` | -| `prompt: getPrompt("implement.md")` | `prompt: prompts.implement` | -| `prompt: getPrompt("review.md")` *(commented)* | `prompt: prompts.review` *(commented — leave commented same as today)* | - -- [ ] **Step 3:** Remove the now-unused `const { getPrompt } = await import("../lib/prompts.js");` import inside the workflow body. - -- [ ] **Step 4:** `pnpm typecheck`. PASS. - -- [ ] **Step 5:** `pnpm test`. All suites green (existing + new). - -- [ ] **Step 6:** Commit: `git add src/workflows/agent.ts && git commit -m "feat(workflow): use loadPrompts instead of getPrompt"`. - ---- - -## Task 6: One-shot setup script (find-or-create task + seed + print UUID) - -**Files:** `scripts/setup-arthur-prompts.ts`, `package.json` - -We need two supporting `ArthurClient` helpers that Task 3 didn't add. Rather than retro-editing Task 3, they're added here because they're only used by this script. - -- [ ] **Step 1:** Extend `ArthurClient` with `findTaskByName(name)` and `createPlainTask(name)`. In `src/sandbox/arthur-client.ts`, add these methods directly below `ensureTaskForTicket`: - -```ts - /** Exact-name lookup. 
Returns the task if found (non-archived), else null. */ - async findTaskByName(name: string): Promise<ArthurTask | null> { - const { tasks } = await this.request<{ count: number; tasks: ArthurTask[] }>( - "/api/v2/tasks/search", - { method: "POST", body: JSON.stringify({ task_name: name }) }, - ); - return tasks.find((t) => t.name === name && !t.is_archived) ?? null; - } - - /** Create a task by name only, without the ticket-specific agent metadata used by ensureTaskForTicket. */ - async createPlainTask(name: string): Promise<ArthurTask> { - return this.request<ArthurTask>("/api/v2/tasks", { - method: "POST", - body: JSON.stringify({ name, is_agentic: true }), - }); - } -``` - -*(Note: `createPlainTask` body is identical to `createTask` today. Kept as a separate method so its usage signals "for non-ticket tasks" — a semantic marker to prevent future code from assuming ticket-naming conventions on prompt-host tasks.)* - -- [ ] **Step 2:** Add unit tests for the two new methods in `src/sandbox/arthur-client.test.ts` (inside the existing describe): - -```ts - describe("findTaskByName", () => { - it("returns exact-name match, excluding archived", async () => { - mockFetch.mockResolvedValueOnce(jsonResponse({ - count: 3, - tasks: [ - { id: "a", name: "ai-workflow-prompts" }, - { id: "b", name: "ai-workflow-prompts-old", is_archived: true }, - { id: "c", name: "ai-workflow-prompts", is_archived: true }, - ], - })); - const client = new ArthurClient("http://host", "k"); - const t = await client.findTaskByName("ai-workflow-prompts"); - expect(t?.id).toBe("a"); - }); - - it("returns null on no match", async () => { - mockFetch.mockResolvedValueOnce(jsonResponse({ count: 0, tasks: [] })); - const client = new ArthurClient("http://host", "k"); - expect(await client.findTaskByName("nothing")).toBeNull(); - }); - }); -``` - -- [ ] **Step 3:** Run `pnpm test -- arthur-client.test.ts`. Expect PASS. 
- -- [ ] **Step 4:** Create `scripts/setup-arthur-prompts.ts`: - -```ts -/** - * One-shot setup: ensures the Arthur prompt-host task exists and has the three - * workflow prompts seeded with the `production` tag. - * - * npx tsx scripts/setup-arthur-prompts.ts - * - * Requires in .env: - * GENAI_ENGINE_API_KEY - * GENAI_ENGINE_TRACE_ENDPOINT - * - * Prints the task UUID as a paste-ready env line at the end. - */ -import "dotenv/config"; -import { ArthurClient } from "../src/sandbox/arthur-client.js"; -import { PROMPT_FALLBACKS, PROMPT_NAMES } from "../src/lib/prompts.js"; - -const TASK_NAME = "ai-workflow-prompts"; -const TAG = "production"; - -const apiKey = process.env.GENAI_ENGINE_API_KEY; -const endpoint = process.env.GENAI_ENGINE_TRACE_ENDPOINT; -if (!apiKey || !endpoint) { - console.error("Missing GENAI_ENGINE_{API_KEY,TRACE_ENDPOINT} in env/.env"); - process.exit(1); -} - -const client = ArthurClient.fromTraceEndpoint(endpoint, apiKey); - -async function main() { - let task = await client.findTaskByName(TASK_NAME); - if (task) { - console.log(`Found existing task "${TASK_NAME}" (${task.id}) — will overwrite prompts.`); - } else { - task = await client.createPlainTask(TASK_NAME); - console.log(`Created new task "${TASK_NAME}" (${task.id}).`); - } - - for (const name of PROMPT_NAMES) { - const body = PROMPT_FALLBACKS[name]; - console.log(`\n seeding ${name}…`); - const created = await client.createPromptVersion(task.id, name, body); - const version = created.version; - if (version === undefined) { - console.error(` no version returned; cannot tag. full response:`, created); - continue; - } - await client.tagPromptVersion(task.id, name, version, TAG); - console.log(` ✓ version ${version} tagged "${TAG}"`); - } - - console.log(`\nSetup complete. 
Add this to .env:\n GENAI_ENGINE_PROMPT_TASK_ID=${task.id}`); -} - -main().catch((e) => { console.error(e); process.exit(1); }); -``` - -- [ ] **Step 5:** In `package.json` add to `"scripts"`: - -```json - "setup:arthur-prompts": "tsx scripts/setup-arthur-prompts.ts", -``` - -- [ ] **Step 6:** `pnpm typecheck`. PASS. - -- [ ] **Step 7:** Commit: - -```bash -git add src/sandbox/arthur-client.ts src/sandbox/arthur-client.test.ts scripts/setup-arthur-prompts.ts package.json -git commit -m "feat(scripts): setup-arthur-prompts — find-or-create task and seed prompts" -``` - ---- - -## Task 7: Manual verification - -**Files:** None (runtime verification only). - -- [ ] **Step 1:** Ensure `GENAI_ENGINE_API_KEY` and `GENAI_ENGINE_TRACE_ENDPOINT` are set and uncommented in `.env`. - -- [ ] **Step 2:** Run setup: - -```bash -pnpm setup:arthur-prompts -``` - -Expected output: either `Created new task …` or `Found existing task …`, followed by three `✓ version N tagged "production"` lines, and a final: - -```text -Setup complete. Add this to .env: - GENAI_ENGINE_PROMPT_TASK_ID= -``` - -- [ ] **Step 3:** Copy that line into `.env`. - -- [ ] **Step 4:** In Arthur UI, verify the `ai-workflow-prompts` task exists with three prompts, each having a version tagged `production`. - -- [ ] **Step 5:** Start `pnpm dev`, trigger a fresh ticket. Grep dev-server output for `prompts_loaded` — expect three lines, each with `source: "arthur"`: - -```text -msg=prompts_loaded name=research-plan source=arthur -msg=prompts_loaded name=implement source=arthur -msg=prompts_loaded name=review source=arthur -``` - -- [ ] **Step 6:** Negative check — comment out `GENAI_ENGINE_PROMPT_TASK_ID` in `.env`, restart `pnpm dev`, trigger another ticket. Expect a single `prompts_loaded source=fallback reason=arthur_prompts_disabled` line and no per-prompt `arthur` source log. - -- [ ] **Step 7:** Per-prompt fallback check — temporarily delete one prompt (e.g. 
`review`) from the Arthur UI, restart `pnpm dev`, trigger another ticket. Expect two `source=arthur` lines and one `source=fallback reason=arthur_prompt_missing name=review`. Re-run `pnpm setup:arthur-prompts` to restore. - ---- - -## Verification - -1. `pnpm test` — all suites green. -2. `pnpm typecheck` — green. -3. Task 7 manual flow: the positive check (Step 5) and both negative checks (Steps 6 and 7) produce the expected log lines. - -## Risks / Open Items - -- **Race between seed and workflow start.** If a workflow run begins while the seed script is mid-flight, the workflow might see an incomplete prompt set and fall back for the missing ones. The per-prompt fallback makes this safe (no broken run), just visible in logs. Acceptable. -- **No automatic task creation.** We don't auto-create the prompt-host task because the prompts API needs a task ID *before* any prompt exists, and accidentally creating many such tasks would be confusing. Manual setup keeps the invariant "at most one prompt-host task" explicit. Documented in Task 7 Steps 1 to 3. -- **`model_name` / `model_provider` are required by `POST /prompts/{name}`** per the API schema. We send the current workflow's model (`claude-sonnet-4`, `anthropic`) as placeholders — Arthur's tracing doesn't consume these fields for hosted prompts, and we ignore them on read. If Arthur starts validating compatibility, we'd revisit. -- **Replay consistency.** `loadPrompts()` is a `"use step"`, so once the workflow records its result, replays reuse that result instead of fetching again (`maxRetries = 0` additionally disables retries of the step). Prompts therefore never change under a run that is already in flight. Tradeoff: an urgent prompt fix won't affect a workflow already past the `loadPrompts()` checkpoint — operators must dispatch a new run. -- **Bundle size / cold-start.** `prompts-step.ts` adds ~1KB to the deployed JS. Insignificant. -- **No extra cost for prompt storage.** Arthur charges per trace; hosted prompts are free. Confirmed with API docs — no additional env/billing concern. 
diff --git a/docs/superpowers/plans/2026-04-27-codex-integration.md b/docs/superpowers/plans/2026-04-27-codex-integration.md deleted file mode 100644 index bf89a7c..0000000 --- a/docs/superpowers/plans/2026-04-27-codex-integration.md +++ /dev/null @@ -1,2600 +0,0 @@ -# Codex Integration Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Add OpenAI's Codex CLI as a second agent runtime alongside Claude Code, env-switched via `AGENT_KIND=claude|codex`, with full feature parity (skills, commit-guard, Arthur tracing, structured output, usage reporting). - -**Architecture:** Introduce a thin `AgentAdapter` interface in `src/sandbox/agents/`. Refactor existing Claude logic into `ClaudeAgentAdapter`. Add `CodexAgentAdapter` that wraps `codex exec --json --output-schema`. `SandboxManager` becomes thin and orchestrator-only. Workflow code threads the adapter through phase steps without changing shape. - -**Tech Stack:** TypeScript / Node 24 / Vercel Sandbox / Vercel Workflow / Vitest / Zod / `@anthropic-ai/claude-code` / `@openai/codex` (new) / LiteLLM model pricing JSON (new). - -**Source spec:** `docs/superpowers/specs/2026-04-27-codex-integration-design.md`. - ---- - -## Phase 1 — Refactor (Claude only, no Codex yet) - -Goal: extract the Claude-specific bits behind an `AgentAdapter`. Existing tests + the e2e Claude path keep passing. Ship as one logical commit at the end of Phase 1. 
- -### Task 1: Scaffold `agents/types.ts` — interface + shared types - -**Files:** -- Create: `src/sandbox/agents/types.ts` - -- [ ] **Step 1: Write the types module** - -```ts -// src/sandbox/agents/types.ts -import type { Sandbox as SandboxType } from "@vercel/sandbox"; -import { z } from "zod"; - -export type PhaseKind = "research" | "impl" | "review"; - -type SandboxInstance = Awaited<ReturnType<typeof SandboxType.create>>; - -/** Minimal interface for sandbox objects that support runCommand and writeFiles. */ -export interface RunnableSandbox { - runCommand: SandboxInstance["runCommand"]; - writeFiles: SandboxInstance["writeFiles"]; -} - -// --- Schemas (moved from src/sandbox/agent-runner.ts) --- - -export const agentOutputSchema = z.object({ - result: z.enum(["implemented", "clarification_needed", "failed"]), - summary: z.string().optional(), - questions: z.array(z.string()).optional(), - error: z.string().optional(), -}); -export type AgentOutput = z.infer<typeof agentOutputSchema>; - -export const AGENT_SCHEMA = JSON.stringify({ - type: "object", - properties: { - result: { type: "string", enum: ["implemented", "clarification_needed", "failed"] }, - summary: { type: "string" }, - questions: { type: "array", items: { type: "string" } }, - error: { type: "string" }, - }, - required: ["result"], -}); - -export const reviewOutputSchema = z.object({ - result: z.enum(["approved", "failed"]), - feedback: z.string(), - issues: z.array(z.object({ - file: z.string(), - description: z.string(), - severity: z.enum(["critical", "suggestion"]), - })), - error: z.string().optional(), -}); -export type ReviewOutput = z.infer<typeof reviewOutputSchema>; - -export const REVIEW_SCHEMA = JSON.stringify({ - type: "object", - properties: { - result: { type: "string", enum: ["approved", "failed"] }, - feedback: { type: "string" }, - issues: { - type: "array", - items: { - type: "object", - properties: { - file: { type: "string" }, - description: { type: "string" }, - severity: { type: "string", enum: ["critical", "suggestion"] }, - }, - required: ["file", 
"description", "severity"], - }, - }, - error: { type: "string" }, - }, - required: ["result", "feedback", "issues"], -}); - -export type ResearchStatus = "completed" | "clarification_needed" | "failed"; -export interface ResearchResult { status: ResearchStatus; body: string; } - -// --- Usage (replaces shape in src/sandbox/usage.ts) --- - -export interface PhaseUsage { - /** Populated by Claude (CLI computes dollars itself). null for Codex (computed downstream from tokens). */ - cost_usd: number | null; - /** Populated by Codex from turn.completed. null for Claude. */ - tokens: { input: number; cached_input: number; output: number } | null; - duration_ms: number; - duration_api_ms: number; - num_turns: number; -} - -// --- Adapter contract --- - -export interface ArthurConfig { - apiKey: string; - taskId: string; - endpoint: string; -} - -export interface ConfigureOpts { - anthropicApiKey?: string; - claudeCodeOauthToken?: string; - codexApiKey?: string; - codexChatGptOauthToken?: string; - model: string; - arthur?: ArthurConfig; -} - -export interface PhaseArtifactPaths { - wrapper: string; - input: string; - stdout: string; - stderr: string; - sentinel: string; - /** Schema-validated JSON file (Codex --output-schema). null for Claude. */ - structuredOutput: string | null; -} - -export interface PhaseScriptOpts { - phase: PhaseKind; - model: string; - paths: PhaseArtifactPaths; - /** When set, the phase requests schema-validated structured output. 
*/ - jsonSchema?: string; -} - -export interface AgentAdapter { - kind: "claude" | "codex"; - install(sandbox: RunnableSandbox): Promise<void>; - configure(sandbox: RunnableSandbox, opts: ConfigureOpts): Promise<void>; - setCommitGuard(sandbox: RunnableSandbox, enabled: boolean): Promise<void>; - buildPhaseScript(opts: PhaseScriptOpts): string; - artifactPaths(phase: PhaseKind): PhaseArtifactPaths; - parseAgentOutput(raw: string, structured: string | null): AgentOutput; - parseReviewOutput(raw: string, structured: string | null): ReviewOutput; - parseResearchStatus(raw: string, structured: string | null): ResearchResult; - extractUsage(raw: string, structured: string | null): PhaseUsage | null; -} -``` - -- [ ] **Step 2: Verify it compiles** - -Run: `pnpm typecheck` -Expected: PASS (the file only declares types and schemas, and nothing imports it yet, so the new dir does not break anything). - -- [ ] **Step 3: Commit (deferred — see Phase 1 commit at the end)** - ---- - -### Task 2: Move shared install logic into `agents/shared.ts` - -**Files:** -- Create: `src/sandbox/agents/shared.ts` -- Create: `src/sandbox/agents/shared.test.ts` - -- [ ] **Step 1: Write the failing test** - -```ts -// src/sandbox/agents/shared.test.ts -import { describe, it, expect, vi } from "vitest"; -import { GLOBAL_SKILLS, installSkillsToAgentsDir } from "./shared.js"; - -describe("GLOBAL_SKILLS", () => { - it("contains the expected skill repos", () => { - const ids = GLOBAL_SKILLS.map((s) => `${s.repo}#${s.skill}`); - expect(ids).toContain("https://github.com/obra/superpowers#using-superpowers"); - expect(ids).toContain("https://github.com/obra/superpowers#requesting-code-review"); - expect(ids).toContain("https://github.com/anthropics/skills#frontend-design"); - }); -}); - -describe("installSkillsToAgentsDir", () => { - it("runs `npx skills add <repo> --skill <name> --target ~/.agents/skills` for each entry", async () => { - const runCommand = vi.fn().mockResolvedValue({ exitCode: 0 }); - const writeFiles = vi.fn().mockResolvedValue(undefined); - const 
sandbox = { runCommand, writeFiles } as any; - - await installSkillsToAgentsDir(sandbox); - - const calls = runCommand.mock.calls.filter((c) => c[0] === "npx"); - expect(calls).toHaveLength(GLOBAL_SKILLS.length); - for (const [_, args] of calls) { - expect(args).toContain("skills"); - expect(args).toContain("add"); - expect(args).toContain("--target"); - expect(args).toContain("$HOME/.agents/skills"); - } - }); -}); -``` - -- [ ] **Step 2: Run the test to verify it fails** - -Run: `pnpm vitest run src/sandbox/agents/shared.test.ts` -Expected: FAIL — module does not exist. - -- [ ] **Step 3: Implement the shared module** - -```ts -// src/sandbox/agents/shared.ts -import type { RunnableSandbox } from "./types.js"; - -/** - * Skills installed globally in every sandbox under ~/.agents/skills/. - * Both adapters read from this single path; Claude additionally symlinks - * ~/.claude/skills → ~/.agents/skills so its auto-discovery finds the same content. - */ -export const GLOBAL_SKILLS = [ - { repo: "https://github.com/obra/superpowers", skill: "using-superpowers" }, - { repo: "https://github.com/obra/superpowers", skill: "requesting-code-review" }, - { repo: "https://github.com/anthropics/skills", skill: "frontend-design" }, -] as const; - -/** - * Install every entry in GLOBAL_SKILLS into ~/.agents/skills inside a sandbox. - * - * Uses `--target` so both Claude (~/.claude/skills via symlink) and Codex - * (native ~/.agents/skills) read the same set without duplication. - */ -export async function installSkillsToAgentsDir(sandbox: RunnableSandbox): Promise<void> { - await sandbox.runCommand("bash", ["-c", "mkdir -p $HOME/.agents/skills"]); - for (const { repo, skill } of GLOBAL_SKILLS) { - await sandbox.runCommand("npx", [ - "-y", "skills", "add", repo, - "--skill", skill, - "--yes", - "--target", "$HOME/.agents/skills", - ]); - } -} - -/** Bash body for the commit-guard hook. The output protocol differs between agents, - * so each adapter wraps this differently. 
*/ -export const COMMIT_GUARD_CHECK_SH = [ - "input=$(cat)", - // Skip when re-entered (set by Claude as stop_hook_active, by us as already_blocked for Codex) - `if echo "$input" | grep -q -E '"stop_hook_active":true|"already_blocked":true'; then exit 0; fi`, - // Ignore changes inside ~/.claude/ or ~/.codex/ inside the workspace - `changes=$(git status --porcelain | grep -v -E '^.. \\.(claude|codex)/' | grep -v -E '^\\?\\? \\.(claude|codex)/' || true)`, -].join("\n"); -``` - -- [ ] **Step 4: Run the test to verify it passes** - -Run: `pnpm vitest run src/sandbox/agents/shared.test.ts` -Expected: PASS. - ---- - -### Task 3: Implement `agents/claude.ts` — Claude adapter - -**Files:** -- Create: `src/sandbox/agents/claude.ts` -- Create: `src/sandbox/agents/claude.test.ts` - -This task moves three families of code into the Claude adapter: - -1. The wrapper script body from `src/sandbox/wrapper-script.ts` → `buildPhaseScript()`. -2. The parsers (`parseAgentOutput`, `parseResearchStatus`, `parseReviewOutput`) and `extractUsage` from `src/sandbox/agent-runner.ts` and `src/sandbox/usage.ts` → adapter methods that ignore the `structured` argument. -3. Provisioning side effects (`installArthurTracer`, `configureStopHookInSandbox`, skill install) from `src/sandbox/manager.ts` → `install()`, `configure()`, `setCommitGuard()`. - -- [ ] **Step 1: Write `claude.test.ts` covering parsers + buildPhaseScript + setCommitGuard** - -Note: parser test cases are copied verbatim from `src/sandbox/agent-runner.test.ts` and `src/sandbox/usage.test.ts`. Recreate the same coverage so old behaviour is preserved. 
- -```ts -// src/sandbox/agents/claude.test.ts -import { describe, it, expect, vi, beforeEach } from "vitest"; -import { ClaudeAgentAdapter } from "./claude.js"; - -const adapter = new ClaudeAgentAdapter(); - -describe("ClaudeAgentAdapter.parseAgentOutput", () => { - it("parses implemented result", () => { - const raw = JSON.stringify({ result: "implemented", summary: "done" }); - expect(adapter.parseAgentOutput(raw, null).result).toBe("implemented"); - }); - - it("parses structured_output from result envelope", () => { - const envelope = JSON.stringify({ - type: "result", - subtype: "success", - is_error: false, - result: "freeform text", - structured_output: { result: "implemented", summary: "Renamed endpoint" }, - }); - expect(adapter.parseAgentOutput(envelope, null).summary).toBe("Renamed endpoint"); - }); - - it("returns failed on empty output", () => { - expect(adapter.parseAgentOutput("", null).result).toBe("failed"); - }); - - it("returns failed on garbage", () => { - const out = adapter.parseAgentOutput("not json at all", null); - expect(out.result).toBe("failed"); - expect(out.error).toContain("not structured JSON"); - }); - - it("ignores the structured argument (Claude embeds output in raw)", () => { - // Claude never receives a separate structured file; the structured arg is null in production. - const raw = JSON.stringify({ result: "implemented", summary: "via raw" }); - expect(adapter.parseAgentOutput(raw, "ignored payload").summary).toBe("via raw"); - }); -}); - -describe("ClaudeAgentAdapter.parseResearchStatus", () => { - it("parses a STATUS line and returns the body", () => { - // Claude wraps research output in a result envelope; parseResearchStatus must unwrap. 
- const envelope = JSON.stringify({ - type: "result", - subtype: "success", - result: "STATUS: completed\n\nPlan body here", - }); - const r = adapter.parseResearchStatus(envelope, null); - expect(r.status).toBe("completed"); - expect(r.body).toBe("Plan body here"); - }); - - it("falls back to failed when no STATUS line is present", () => { - expect(adapter.parseResearchStatus("no status here", null).status).toBe("failed"); - }); -}); - -describe("ClaudeAgentAdapter.parseReviewOutput", () => { - it("parses approved with empty issues", () => { - const raw = JSON.stringify({ result: "approved", feedback: "looks good", issues: [] }); - expect(adapter.parseReviewOutput(raw, null).result).toBe("approved"); - }); - - it("returns failed on empty input", () => { - expect(adapter.parseReviewOutput("", null).result).toBe("failed"); - }); -}); - -describe("ClaudeAgentAdapter.extractUsage", () => { - it("extracts cost_usd from a result envelope", () => { - const raw = JSON.stringify({ - type: "result", subtype: "success", - cost_usd: 0.42, duration_ms: 60_000, duration_api_ms: 30_000, num_turns: 3, - result: "ok", - }); - const u = adapter.extractUsage(raw, null); - expect(u).toEqual({ - cost_usd: 0.42, - tokens: null, - duration_ms: 60_000, - duration_api_ms: 30_000, - num_turns: 3, - }); - }); - - it("returns null when no envelope is present", () => { - expect(adapter.extractUsage("not json", null)).toBeNull(); - }); -}); - -describe("ClaudeAgentAdapter.buildPhaseScript", () => { - it("emits a script that sources agent-env.sh and invokes claude", () => { - const paths = adapter.artifactPaths("research"); - const s = adapter.buildPhaseScript({ phase: "research", model: "claude-opus-4-6", paths }); - expect(s).toContain("#!/bin/bash"); - expect(s).toContain("claude"); - expect(s).toContain("--model 'claude-opus-4-6'"); - expect(s).toContain("--output-format json"); - expect(s).toContain("/tmp/research-requirements.md"); - expect(s).toContain("/tmp/research-stdout.txt"); - 
expect(s).toContain("/tmp/research-stderr.txt"); - expect(s).toContain("/tmp/research-done"); - expect(s).not.toContain("--json-schema"); - }); - - it("includes --json-schema when jsonSchema is supplied", () => { - const paths = adapter.artifactPaths("impl"); - const s = adapter.buildPhaseScript({ - phase: "impl", - model: "claude-opus-4-6", - paths, - jsonSchema: '{"type":"object"}', - }); - expect(s).toContain("--json-schema"); - }); -}); - -describe("ClaudeAgentAdapter.artifactPaths", () => { - it("returns Claude paths with structuredOutput=null", () => { - expect(adapter.artifactPaths("research")).toEqual({ - wrapper: "/tmp/research-wrapper.sh", - input: "/tmp/research-requirements.md", - stdout: "/tmp/research-stdout.txt", - stderr: "/tmp/research-stderr.txt", - sentinel: "/tmp/research-done", - structuredOutput: null, - }); - }); -}); - -describe("ClaudeAgentAdapter.setCommitGuard", () => { - it("upserts the Stop hook when enabled and writes commit-guard.sh", async () => { - const runCommand = vi.fn().mockResolvedValue({ exitCode: 0 }); - const writeFiles = vi.fn().mockResolvedValue(undefined); - const sandbox = { runCommand, writeFiles } as any; - - await adapter.setCommitGuard(sandbox, true); - - const calls = runCommand.mock.calls; - // Writes the guard script - expect(calls.some(([cmd, args]) => cmd === "bash" && args[1].includes("commit-guard.sh"))).toBe(true); - // Toggles via node merge script - const mergeCall = calls.find(([cmd, args]) => - cmd === "node" && args[1] === "-e" && args[2].includes('"commitGuard":"enable"'), - ); - expect(mergeCall).toBeDefined(); - }); - - it("disables by writing commitGuard=disable directive", async () => { - const runCommand = vi.fn().mockResolvedValue({ exitCode: 0 }); - const sandbox = { runCommand, writeFiles: vi.fn() } as any; - - await adapter.setCommitGuard(sandbox, false); - - const mergeCall = runCommand.mock.calls.find(([cmd, args]) => - cmd === "node" && typeof args[2] === "string" && 
args[2].includes('"commitGuard":"disable"'), - ); - expect(mergeCall).toBeDefined(); - }); -}); -``` - -- [ ] **Step 2: Run tests to verify they fail** - -Run: `pnpm vitest run src/sandbox/agents/claude.test.ts` -Expected: FAIL — module does not exist. - -- [ ] **Step 3: Implement `claude.ts`** - -```ts -// src/sandbox/agents/claude.ts -import type { - AgentAdapter, AgentOutput, ConfigureOpts, PhaseArtifactPaths, PhaseKind, - PhaseScriptOpts, PhaseUsage, ResearchResult, ReviewOutput, RunnableSandbox, -} from "./types.js"; -import { agentOutputSchema, reviewOutputSchema } from "./types.js"; -import { installSkillsToAgentsDir } from "./shared.js"; -import { ARTHUR_TRACER_PY_BASE64 } from "../arthur-tracer.js"; - -const ARTHUR_HOOK_EVENTS: ReadonlyArray<readonly [string, string]> = [ - ["UserPromptSubmit", "user_prompt_submit"], - ["PreToolUse", "pre_tool"], - ["PostToolUse", "post_tool"], - ["PostToolUseFailure", "post_tool_failure"], - ["Stop", "stop"], -]; - -export class ClaudeAgentAdapter implements AgentAdapter { - readonly kind = "claude" as const; - - async install(sandbox: RunnableSandbox): Promise<void> { - await sandbox.runCommand("npm", ["install", "-g", "@anthropic-ai/claude-code"]); - // Skip interactive onboarding - await sandbox.runCommand("bash", [ - "-c", - `mkdir -p ~/.claude && echo '{"hasCompletedOnboarding":true}' > ~/.claude.json`, - ]); - } - - async configure(sandbox: RunnableSandbox, opts: ConfigureOpts): Promise<void> { - if (!opts.anthropicApiKey && !opts.claudeCodeOauthToken) { - throw new Error("ClaudeAgentAdapter.configure requires anthropicApiKey or claudeCodeOauthToken"); - } - const envLines: string[] = []; - if (opts.claudeCodeOauthToken) { - envLines.push(`export CLAUDE_CODE_OAUTH_TOKEN=${shellQuote(opts.claudeCodeOauthToken)}`); - } else if (opts.anthropicApiKey) { - envLines.push(`export ANTHROPIC_API_KEY=${shellQuote(opts.anthropicApiKey)}`); - } - await sandbox.writeFiles([ - { path: "/tmp/agent-env.sh", content: Buffer.from(envLines.join("\n") + "\n") }, - ]); - 
await sandbox.runCommand("chmod", ["600", "/tmp/agent-env.sh"]); - - // Skills: install into ~/.agents/skills, then symlink ~/.claude/skills → ~/.agents/skills - await installSkillsToAgentsDir(sandbox); - await sandbox.runCommand("bash", [ - "-c", - "mkdir -p $HOME/.claude && rm -rf $HOME/.claude/skills && ln -s $HOME/.agents/skills $HOME/.claude/skills", - ]); - - // Arthur tracer (no-op without config) - if (opts.arthur) { - await this.installArthurTracer(sandbox, opts.arthur); - } - } - - async setCommitGuard(sandbox: RunnableSandbox, enabled: boolean): Promise<void> { - // 1) Drop the guard script (idempotent) - await sandbox.runCommand("bash", [ - "-c", - [ - "mkdir -p ~/.claude", - "cat > ~/.claude/commit-guard.sh << 'SCRIPT'", - "#!/bin/bash", - "input=$(cat)", - `if echo "$input" | grep -q '"stop_hook_active":true'; then exit 0; fi`, - `changes=$(git status --porcelain | grep -v '^.. \\.claude/' | grep -v '^?? \\.claude/')`, - `if [ -n "$changes" ]; then`, - ` echo '{"decision":"block","reason":"You have uncommitted changes. You MUST either commit all changes with a descriptive message or revert them before stopping."}' >&2`, - " exit 2", - "fi", - "SCRIPT", - "chmod +x ~/.claude/commit-guard.sh", - ].join("\n"), - ]); - - // 2) Toggle the Stop hook entry via merge-aware settings.json writer - await this.mergeSettings(sandbox, { commitGuard: enabled ? 
"enable" : "disable" }); - } - - buildPhaseScript(opts: PhaseScriptOpts): string { - const { paths, jsonSchema, model, phase } = opts; - let claudeFlags = `--print --model '${model}' --dangerously-skip-permissions --output-format json`; - if (jsonSchema) { - const escapedSchema = jsonSchema.replace(/'/g, "'\\''"); - claudeFlags += ` --json-schema '${escapedSchema}'`; - } - return `#!/bin/bash - -# --- Cleanup stale files from prior runs --- -rm -f ${paths.sentinel} ${paths.stdout} ${paths.stderr} - -# --- Source auth env vars --- -[ -f /tmp/agent-env.sh ] && source /tmp/agent-env.sh - -# --- Phase: ${phase} --- -cat ${paths.input} | claude \\ - ${claudeFlags} \\ - > ${paths.stdout} 2>${paths.stderr}; echo $? > /tmp/${phase}-exit-code || true - -# --- Cleanup --- -cd /vercel/sandbox -rm -rf .claude/ -git checkout -- .claude/ 2>/dev/null || true - -# --- Signal completion --- -touch ${paths.sentinel} -`; - } - - artifactPaths(phase: PhaseKind): PhaseArtifactPaths { - return { - wrapper: `/tmp/${phase}-wrapper.sh`, - input: `/tmp/${phase}-requirements.md`, - stdout: `/tmp/${phase}-stdout.txt`, - stderr: `/tmp/${phase}-stderr.txt`, - sentinel: `/tmp/${phase}-done`, - structuredOutput: null, - }; - } - - parseAgentOutput(raw: string, _structured: string | null): AgentOutput { - if (!raw.trim()) return { result: "failed", error: "Agent produced no output" }; - - try { - const direct = agentOutputSchema.safeParse(JSON.parse(raw)); - if (direct.success) return direct.data; - } catch { /* not direct JSON */ } - - const lines = raw.split("\n").filter(Boolean); - for (let i = lines.length - 1; i >= 0; i--) { - try { - const event = JSON.parse(lines[i]); - if (event.type === "result") { - if (event.structured_output != null) { - const parsed = agentOutputSchema.safeParse(event.structured_output); - if (parsed.success) return parsed.data; - } - if (typeof event.result === "string") { - try { - const parsed = agentOutputSchema.safeParse(JSON.parse(event.result)); - if 
(parsed.success) return parsed.data; - } catch { /* not JSON */ } - } - if (event.subtype === "success" && !event.is_error) { - return { - result: "implemented", - summary: typeof event.result === "string" ? event.result.trim().slice(0, 500) : undefined, - }; - } - return { - result: "failed", - error: typeof event.result === "string" ? event.result.trim().slice(0, 500) : "Agent returned non-structured result", - }; - } - const direct = agentOutputSchema.safeParse(event); - if (direct.success) return direct.data; - } catch { /* try next line */ } - } - - const objects = raw.matchAll(/\{[^{}]*(?:\{[^{}]*\}[^{}]*)*\}/g); - for (const [candidate] of objects) { - try { - const result = agentOutputSchema.safeParse(JSON.parse(candidate)); - if (result.success) return result.data; - } catch { /* try next */ } - } - - return { result: "failed", error: `Agent output was not structured JSON. Output starts with: ${raw.slice(0, 500)}` }; - } - - parseReviewOutput(raw: string, _structured: string | null): ReviewOutput { - if (!raw.trim()) { - return { result: "failed", feedback: "", issues: [], error: "Review agent produced no output" }; - } - try { - const direct = reviewOutputSchema.safeParse(JSON.parse(raw)); - if (direct.success) return direct.data; - } catch { /* not direct JSON */ } - - const lines = raw.split("\n").filter(Boolean); - for (let i = lines.length - 1; i >= 0; i--) { - try { - const event = JSON.parse(lines[i]); - if (event.type === "result" && event.structured_output != null) { - const parsed = reviewOutputSchema.safeParse(event.structured_output); - if (parsed.success) return parsed.data; - } - const direct = reviewOutputSchema.safeParse(event); - if (direct.success) return direct.data; - } catch { /* try next */ } - } - - const objects = raw.matchAll(/\{[^{}]*(?:\{[^{}]*\}[^{}]*)*\}/g); - for (const [candidate] of objects) { - try { - const result = reviewOutputSchema.safeParse(JSON.parse(candidate)); - if (result.success) return result.data; - } catch { 
/* try next */ } - } - - return { - result: "failed", feedback: "", issues: [], - error: `Review output was not structured JSON. Output starts with: ${raw.slice(0, 500)}`, - }; - } - - parseResearchStatus(raw: string, _structured: string | null): ResearchResult { - const text = unwrapResearchEnvelope(raw); - const lines = text.split("\n"); - for (let i = 0; i < lines.length; i++) { - const line = lines[i]?.trim() ?? ""; - const m = line.match(/^STATUS:\s*([a-z_]+)/i); - if (!m) continue; - const status = m[1].toLowerCase(); - if (status === "completed" || status === "clarification_needed" || status === "failed") { - return { status, body: lines.slice(i + 1).join("\n").trim() }; - } - } - return { status: "failed", body: text }; - } - - extractUsage(raw: string, _structured: string | null): PhaseUsage | null { - if (!raw.trim()) return null; - const envelope = findResultEnvelope(raw); - if (!envelope) return null; - const cost = - typeof envelope.cost_usd === "number" ? envelope.cost_usd - : typeof envelope.total_cost_usd === "number" ? envelope.total_cost_usd - : null; - if (cost === null) return null; - return { - cost_usd: cost, - tokens: null, - duration_ms: typeof envelope.duration_ms === "number" ? envelope.duration_ms : 0, - duration_api_ms: typeof envelope.duration_api_ms === "number" ? envelope.duration_api_ms : 0, - num_turns: typeof envelope.num_turns === "number" ? 
envelope.num_turns : 0, - }; - } - - // --- private --- - - private async installArthurTracer( - sandbox: RunnableSandbox, - arthur: NonNullable, - ): Promise { - const { logger } = await import("../../lib/logger.js"); - logger.info({ endpoint: arthur.endpoint, taskId: arthur.taskId, agent: this.kind }, "agent_install_arthur_started"); - - const pip = await sandbox.runCommand("bash", [ - "-c", - "python3 -m ensurepip --user && python3 -m pip install --user --quiet 'opentelemetry-sdk>=1.20.0' 'opentelemetry-exporter-otlp-proto-http>=1.20.0'", - ]); - if (pip.exitCode !== 0) { - logger.warn({}, "arthur_pip_install_failed"); - return; - } - - const tracerBytes = Buffer.from(ARTHUR_TRACER_PY_BASE64, "base64"); - await sandbox.writeFiles([{ path: "/tmp/arthur-tracer.py", content: tracerBytes }]); - const mvTracer = await sandbox.runCommand("bash", [ - "-c", - "mkdir -p $HOME/.claude/hooks && mv /tmp/arthur-tracer.py $HOME/.claude/hooks/claude_code_tracer.py && chmod +x $HOME/.claude/hooks/claude_code_tracer.py", - ]); - if (mvTracer.exitCode !== 0) { - logger.warn({}, "arthur_tracer_install_failed"); - return; - } - - const configJson = JSON.stringify( - { api_key: arthur.apiKey, task_id: arthur.taskId, endpoint: arthur.endpoint }, - null, 2, - ); - await sandbox.writeFiles([{ path: "/tmp/arthur_config.json", content: Buffer.from(configJson) }]); - await sandbox.runCommand("bash", [ - "-c", - "mkdir -p $HOME/.claude && mv /tmp/arthur_config.json $HOME/.claude/arthur_config.json && chmod 600 $HOME/.claude/arthur_config.json", - ]); - - await this.mergeSettings(sandbox, { arthur: "install" }); - logger.info({ agent: this.kind }, "agent_install_arthur_complete"); - } - - /** Merge-aware writer for ~/.claude/settings.json. 
*/ - private async mergeSettings( - sandbox: RunnableSandbox, - opts: { commitGuard?: "enable" | "disable"; arthur?: "install" }, - ): Promise { - const arthurEvents = JSON.stringify(ARTHUR_HOOK_EVENTS); - const script = ` - import fs from 'node:fs'; - import path from 'node:path'; - const opts = ${JSON.stringify(opts)}; - const arthurEvents = ${arthurEvents}; - const home = process.env.HOME; - const settingsPath = path.join(home, '.claude', 'settings.json'); - fs.mkdirSync(path.dirname(settingsPath), { recursive: true }); - let s = {}; - try { s = JSON.parse(fs.readFileSync(settingsPath, 'utf8')); } catch {} - s.hooks = s.hooks || {}; - - const upsertHook = (event, matcher, command) => { - const existing = s.hooks[event] || []; - const has = existing.some(e => (e && Array.isArray(e.hooks) ? e.hooks : []).some(h => h && h.command === command)); - if (!has) existing.push({ matcher, hooks: [{ type: 'command', command }] }); - s.hooks[event] = existing; - }; - const removeHook = (event, predicate) => { - const existing = s.hooks[event] || []; - s.hooks[event] = existing - .map(e => ({ ...e, hooks: (e.hooks || []).filter(h => !predicate(h.command || '')) })) - .filter(e => (e.hooks || []).length > 0); - }; - - if (opts.commitGuard === 'enable') upsertHook('Stop', '', 'bash ~/.claude/commit-guard.sh'); - else if (opts.commitGuard === 'disable') removeHook('Stop', c => c.includes('commit-guard.sh')); - - if (opts.arthur === 'install') { - for (const [event, arg] of arthurEvents) { - upsertHook(event, '', 'python3 "$HOME/.claude/hooks/claude_code_tracer.py" ' + arg); - } - } - fs.writeFileSync(settingsPath, JSON.stringify(s, null, 2)); - `; - await sandbox.runCommand("node", ["--input-type=module", "-e", script]); - } -} - -// --- module-private helpers --- - -function shellQuote(val: string): string { - return `'${val.replace(/'/g, "'\\''")}'`; -} - -function findResultEnvelope(raw: string): Record | null { - try { - const obj = JSON.parse(raw); - if (obj && typeof obj 
=== "object" && (obj as any).type === "result") return obj as Record; - } catch { /* not single JSON */ } - const lines = raw.split("\n").filter(Boolean); - for (let i = lines.length - 1; i >= 0; i--) { - try { - const obj = JSON.parse(lines[i]); - if (obj && typeof obj === "object" && (obj as any).type === "result") return obj as Record; - } catch { /* try next */ } - } - return null; -} - -function unwrapResearchEnvelope(raw: string): string { - if (!raw.trim()) return raw; - const env = findResultEnvelope(raw); - if (!env) return raw; - return typeof env.result === "string" ? env.result : raw; -} -``` - -- [ ] **Step 4: Run tests to verify they pass** - -Run: `pnpm vitest run src/sandbox/agents/claude.test.ts` -Expected: PASS. - ---- - -### Task 4: Implement `agents/index.ts` — adapter factory - -**Files:** -- Create: `src/sandbox/agents/index.ts` -- Create: `src/sandbox/agents/index.test.ts` - -- [ ] **Step 1: Write the failing test** - -```ts -// src/sandbox/agents/index.test.ts -import { describe, it, expect } from "vitest"; -import { createAgentAdapter } from "./index.js"; - -describe("createAgentAdapter", () => { - it("returns ClaudeAgentAdapter for kind=claude", () => { - const a = createAgentAdapter("claude"); - expect(a.kind).toBe("claude"); - }); - - it("throws for unknown kinds (forces exhaustive switch updates)", () => { - // @ts-expect-error — runtime guard - expect(() => createAgentAdapter("bogus")).toThrow(); - }); -}); -``` - -Note: the `kind=codex` case is added in Task 14 once `CodexAgentAdapter` exists. - -- [ ] **Step 2: Run the test to verify it fails** - -Run: `pnpm vitest run src/sandbox/agents/index.test.ts` -Expected: FAIL — module does not exist. 
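Task 3's `mergeSettings` generates a Node script that edits `~/.claude/settings.json` in place. Its `upsertHook`/`removeHook` helpers are what make repeated `setCommitGuard(sandbox, true | false)` calls safe. The sketch below is a hypothetical, typed, standalone mirror of those two helpers. The `Settings`, `HookEntry` and `HookCommand` types are assumptions for illustration, not the repository's types. It shows that upserting the same command twice adds one entry, and that removal leaves unrelated hooks alone:

```ts
// Hypothetical typed mirror of the upsertHook/removeHook helpers inside
// mergeSettings. Settings/HookEntry/HookCommand are assumed shapes.
type HookCommand = { type: "command"; command: string };
type HookEntry = { matcher: string; hooks: HookCommand[] };
type Settings = { hooks: Record<string, HookEntry[]> };

// Append `command` under `event` unless some entry already runs it.
function upsertHook(s: Settings, event: string, matcher: string, command: string): void {
  const existing = s.hooks[event] ?? [];
  const has = existing.some((e) => e.hooks.some((h) => h.command === command));
  if (!has) existing.push({ matcher, hooks: [{ type: "command", command }] });
  s.hooks[event] = existing;
}

// Drop commands matching `predicate`, then drop entries left with no hooks.
function removeHook(s: Settings, event: string, predicate: (cmd: string) => boolean): void {
  const existing = s.hooks[event] ?? [];
  s.hooks[event] = existing
    .map((e) => ({ ...e, hooks: e.hooks.filter((h) => !predicate(h.command)) }))
    .filter((e) => e.hooks.length > 0);
}

// Start from a settings file that already has a user-defined Stop hook.
const settings: Settings = {
  hooks: { Stop: [{ matcher: "", hooks: [{ type: "command", command: "echo user-hook" }] }] },
};
const guard = "bash ~/.claude/commit-guard.sh";

upsertHook(settings, "Stop", "", guard);
upsertHook(settings, "Stop", "", guard); // second call is a no-op
console.log(settings.hooks["Stop"].length); // 2

removeHook(settings, "Stop", (c) => c.includes("commit-guard.sh"));
console.log(settings.hooks["Stop"].map((e) => e.hooks[0].command)); // [ 'echo user-hook' ]
```

The in-sandbox version also tolerates malformed entries (for example, a missing `hooks` array), because it has to cope with whatever file already exists. The typed sketch can skip those guards.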
-
-- [ ] **Step 3: Implement the factory**
-
-```ts
-// src/sandbox/agents/index.ts
-import { ClaudeAgentAdapter } from "./claude.js";
-import type { AgentAdapter } from "./types.js";
-
-export type AgentKind = "claude" | "codex";
-
-export function createAgentAdapter(kind: AgentKind): AgentAdapter {
-  switch (kind) {
-    case "claude": return new ClaudeAgentAdapter();
-    case "codex":
-      throw new Error("Codex adapter not yet wired (see Task 14)");
-    default: {
-      const _exhaustive: never = kind;
-      throw new Error(`Unknown AGENT_KIND: ${_exhaustive}`);
-    }
-  }
-}
-
-export type { AgentAdapter } from "./types.js";
-```
-
-- [ ] **Step 4: Run the test to verify it passes**
-
-Run: `pnpm vitest run src/sandbox/agents/index.test.ts`
-Expected: PASS.
-
----
-
-### Task 5: Add `collectPhase` helper to `poll-agent.ts`
-
-**Files:**
-- Modify: `src/sandbox/poll-agent.ts`
-- Modify: `src/sandbox/poll-agent.test.ts`
-
-- [ ] **Step 1: Write the failing test (append to `poll-agent.test.ts`)**
-
-```ts
-import { collectPhase } from "./poll-agent.js";
-
-describe("collectPhase", () => {
-  it("returns raw + structured when structuredOutput is set", async () => {
-    const stdoutText = "ndjson body";
-    const structuredText = '{"result":"implemented"}';
-    // Mock @vercel/sandbox so Sandbox.get returns a fake with cat
-    vi.doMock("@vercel/sandbox", () => ({
-      Sandbox: {
-        get: vi.fn().mockResolvedValue({
-          runCommand: vi.fn().mockImplementation(async (_, args) => {
-            const file = args[0];
-            const out =
-              file.includes("stdout") ? stdoutText :
-              file.includes("structured") || file.endsWith("result.json") ? structuredText :
-              "";
-            return { stdout: async () => out };
-          }),
-        }),
-      },
-    }));
-
-    const result = await collectPhase("sbx-1", {
-      stdout: "/tmp/impl-stdout.txt",
-      stderr: "/tmp/impl-stderr.txt",
-      structuredOutput: "/tmp/impl-result.json",
-    });
-    expect(result.raw).toBe(stdoutText);
-    expect(result.structured).toBe(structuredText);
-  });
-
-  it("returns structured=null when paths.structuredOutput is null", async () => {
-    vi.doMock("@vercel/sandbox", () => ({
-      Sandbox: {
-        get: vi.fn().mockResolvedValue({
-          runCommand: vi.fn().mockResolvedValue({ stdout: async () => "raw text" }),
-        }),
-      },
-    }));
-    const r = await collectPhase("sbx-1", {
-      stdout: "/tmp/impl-stdout.txt",
-      stderr: "/tmp/impl-stderr.txt",
-      structuredOutput: null,
-    });
-    expect(r.structured).toBeNull();
-    expect(r.raw).toBe("raw text");
-  });
-
-  it("falls back to stderr when stdout is empty", async () => {
-    vi.doMock("@vercel/sandbox", () => ({
-      Sandbox: {
-        get: vi.fn().mockResolvedValue({
-          runCommand: vi.fn().mockImplementation(async (_, args) => ({
-            stdout: async () => args[0].includes("stdout") ? "" : "stderr text",
-          })),
-        }),
-      },
-    }));
-    const r = await collectPhase("sbx-1", {
-      stdout: "/tmp/impl-stdout.txt",
-      stderr: "/tmp/impl-stderr.txt",
-      structuredOutput: null,
-    });
-    expect(r.raw).toBe("stderr text");
-  });
-});
-```
-
-- [ ] **Step 2: Run the test to verify it fails**
-
-Run: `pnpm vitest run src/sandbox/poll-agent.test.ts`
-Expected: FAIL — `collectPhase` not exported.
-
-- [ ] **Step 3: Implement `collectPhase` in `poll-agent.ts`**
-
-Append after `collectPhaseOutput`:
-
-```ts
-/**
- * Collect raw + (optional) structured phase output. Replaces collectPhaseOutput
- * in adapter-aware code paths.
- */
-export async function collectPhase(
-  sandboxId: string,
-  paths: { stdout: string; stderr: string; structuredOutput: string | null },
-): Promise<{ raw: string; structured: string | null }> {
-  "use step";
-  const { Sandbox } = await import("@vercel/sandbox");
-  const sandbox = await Sandbox.get({ sandboxId, ...getSandboxCredentials() });
-
-  const stdoutResult = await sandbox.runCommand("cat", [paths.stdout]);
-  const stdoutText = (await stdoutResult.stdout()).trim();
-  const stderrResult = await sandbox.runCommand("cat", [paths.stderr]);
-  const stderrText = (await stderrResult.stdout()).trim();
-  const raw = stdoutText || stderrText;
-
-  let structured: string | null = null;
-  if (paths.structuredOutput) {
-    const r = await sandbox.runCommand("cat", [paths.structuredOutput]);
-    const text = (await r.stdout()).trim();
-    structured = text || null;
-  }
-  return { raw, structured };
-}
-```
-
-- [ ] **Step 4: Run the tests to verify they pass**
-
-Run: `pnpm vitest run src/sandbox/poll-agent.test.ts`
-Expected: PASS.
-
----
-
-### Task 6: Slim `SandboxManager`
-
-**Files:**
-- Modify: `src/sandbox/manager.ts`
-- Modify: `src/sandbox/manager.test.ts`
-
-The manager keeps responsibility for the sandbox lifecycle (create, clone, identity, merge, push prep) but delegates anything agent-specific to an injected `AgentAdapter`. After this task: no `GLOBAL_SKILLS`, no `installArthurTracer`, no `configureStopHookInSandbox` exports.
-
-- [ ] **Step 1: Migrate `manager.test.ts` to the new signature**
-
-Keep the lifecycle assertions that still belong to the manager (Sandbox.create source, git identity, optional merge-base, pre-agent-sha capture). Move the agent-specific assertions (auth `agent-env.sh`, commit-guard, Arthur, skills install) — they belong to `claude.test.ts` (already added in Task 3) and aren't valid against the slim manager any more.
-
-Rewrite `src/sandbox/manager.test.ts` so it injects a fake adapter and asserts both the lifecycle and the delegation:
-
-```ts
-// src/sandbox/manager.test.ts (rewritten)
-import { describe, it, expect, vi, beforeEach } from "vitest";
-
-const mockRunCommand = vi.fn();
-const mockWriteFiles = vi.fn();
-const mockStop = vi.fn();
-const mockStdout = vi.fn();
-
-vi.mock("@vercel/sandbox", () => ({
-  Sandbox: {
-    create: vi.fn(() => ({
-      sandboxId: "sbx-test-123",
-      runCommand: mockRunCommand,
-      writeFiles: mockWriteFiles,
-      stop: mockStop,
-    })),
-  },
-}));
-
-import { SandboxManager } from "./manager.js";
-import type { AgentAdapter, ConfigureOpts } from "./agents/types.js";
-
-const makeFakeAgent = (): AgentAdapter & { calls: any[] } => {
-  const calls: any[] = [];
-  return {
-    kind: "claude",
-    install: vi.fn(async () => { calls.push({ op: "install" }); }),
-    configure: vi.fn(async (_, opts: ConfigureOpts) => { calls.push({ op: "configure", opts }); }),
-    setCommitGuard: vi.fn(async (_s, enabled) => { calls.push({ op: "guard", enabled }); }),
-    buildPhaseScript: () => "#!/bin/bash\necho noop",
-    artifactPaths: () => ({ wrapper: "", input: "", stdout: "", stderr: "", sentinel: "", structuredOutput: null }),
-    parseAgentOutput: () => ({ result: "implemented" }),
-    parseReviewOutput: () => ({ result: "approved", feedback: "", issues: [] }),
-    parseResearchStatus: () => ({ status: "completed", body: "" }),
-    extractUsage: () => null,
-    calls,
-  } as any;
-};
-
-describe("SandboxManager.provision", () => {
-  beforeEach(() => {
-    vi.clearAllMocks();
-    mockRunCommand.mockResolvedValue({ exitCode: 0, stdout: mockStdout });
-    mockStdout.mockResolvedValue("");
-    mockWriteFiles.mockResolvedValue(undefined);
-  });
-
-  const baseConfig = {
-    kind: "github" as const,
-    token: "ghp_test",
-    repoPath: "test-org/test-repo",
-    host: "https://github.com",
-    jobTimeoutMs: 1_800_000,
-    commitAuthor: "ai-workflow-blazity",
-    commitEmail: "bot@blazity.com",
-  };
-
-  it("creates the sandbox with a git source pointed at the branch", async () => {
-    const { Sandbox } = await import("@vercel/sandbox");
-    const manager = new SandboxManager(baseConfig);
-    await manager.provision("feat/test-branch", makeFakeAgent(), { model: "any", anthropicApiKey: "k" });
-    expect(Sandbox.create).toHaveBeenCalledWith(
-      expect.objectContaining({
-        source: expect.objectContaining({ type: "git", revision: "feat/test-branch" }),
-        runtime: "node24",
-      }),
-    );
-  });
-
-  it("sets git identity to commitAuthor / commitEmail", async () => {
-    const manager = new SandboxManager(baseConfig);
-    await manager.provision("feat/test-branch", makeFakeAgent(), { model: "any", anthropicApiKey: "k" });
-    const idCall = mockRunCommand.mock.calls.find(
-      ([cmd, args]) => cmd === "bash" && typeof args[1] === "string" && args[1].includes("git config user.name"),
-    );
-    expect(idCall).toBeDefined();
-    expect(idCall![1][1]).toContain("ai-workflow-blazity");
-    expect(idCall![1][1]).toContain("bot@blazity.com");
-  });
-
-  it("captures pre-agent HEAD SHA for the push step", async () => {
-    const manager = new SandboxManager(baseConfig);
-    await manager.provision("feat/test-branch", makeFakeAgent(), { model: "any", anthropicApiKey: "k" });
-    const shaCall = mockRunCommand.mock.calls.find(
-      ([cmd, args]) => cmd === "bash" && typeof args[1] === "string" && args[1].includes("/tmp/.pre-agent-sha"),
-    );
-    expect(shaCall).toBeDefined();
-  });
-
-  it("calls agent.install then agent.configure with the supplied opts", async () => {
-    const agent = makeFakeAgent();
-    const manager = new SandboxManager(baseConfig);
-    await manager.provision("feat/test-branch", agent, {
-      anthropicApiKey: "sk-ant-test",
-      model: "claude-opus-4-6",
-    });
-    const ops = (agent as any).calls.map((c: any) => c.op);
-    expect(ops).toEqual(["install", "configure"]);
-    expect((agent as any).calls[1].opts).toEqual(
-      expect.objectContaining({ anthropicApiKey: "sk-ant-test", model: "claude-opus-4-6" }),
-    );
-  });
-
-  it("fetches and merges mergeBase when supplied", async () => {
-    const manager = new SandboxManager(baseConfig);
-    await manager.provision("feat/test-branch", makeFakeAgent(), { model: "any", anthropicApiKey: "k" }, "main");
-    const fetchCall = mockRunCommand.mock.calls.find(
-      ([cmd, args]) => cmd === "bash" && typeof args[1] === "string" && args[1].includes("git fetch"),
-    );
-    expect(fetchCall).toBeDefined();
-    expect(fetchCall![1][1]).toContain("main");
-  });
-});
-```
-
-- [ ] **Step 2: Run the test to verify it fails**
-
-Run: `pnpm vitest run src/sandbox/manager.test.ts`
-Expected: FAIL — new signature not in place.
-
-- [ ] **Step 3: Rewrite `manager.ts` as a thin orchestrator**
-
-Replace the entire file. The new manager has only repo provisioning duties:
-
-```ts
-// src/sandbox/manager.ts
-import type { Sandbox as SandboxType } from "@vercel/sandbox";
-import { getSandboxCredentials } from "./credentials.js";
-import type { AgentAdapter, ConfigureOpts } from "./agents/types.js";
-
-export interface SandboxConfig {
-  kind: "github" | "gitlab";
-  token: string;
-  repoPath: string;
-  host: string;
-  jobTimeoutMs: number;
-  commitAuthor: string;
-  commitEmail: string;
-}
-
-/** Build clone/push URLs for the configured VCS. Unchanged from previous behaviour. */
-export function buildVcsUrls(config: { kind: "github" | "gitlab"; token: string; repoPath: string; host: string }) {
-  const host = config.host.replace(/\/+$/, "");
-  const scheme = host.match(/^https?:\/\//)?.[0] ?? "https://";
-  const hostNoScheme = host.replace(/^https?:\/\//, "");
-  const authUser = config.kind === "gitlab" ? "oauth2" : "x-access-token";
-  return {
-    cloneUrl: `${host}/${config.repoPath}.git`,
-    authUrl: `${scheme}${authUser}:${config.token}@${hostNoScheme}/${config.repoPath}.git`,
-    authUser,
-  };
-}
-
-type SandboxInstance = Awaited<ReturnType<typeof SandboxType.create>>;
-
-export class SandboxManager {
-  constructor(private config: SandboxConfig) {}
-
-  async provision(
-    branch: string,
-    agent: AgentAdapter,
-    configureOpts: ConfigureOpts,
-    mergeBase?: string,
-  ): Promise<SandboxInstance> {
-    const { Sandbox } = await import("@vercel/sandbox");
-    const urls = buildVcsUrls(this.config);
-
-    const sandbox = await Sandbox.create({
-      ...getSandboxCredentials(),
-      source: {
-        type: "git",
-        url: urls.cloneUrl,
-        username: urls.authUser,
-        password: this.config.token,
-        revision: branch,
-      },
-      runtime: "node24",
-      timeout: this.config.jobTimeoutMs,
-    });
-
-    // Strip auth from origin
-    await sandbox.runCommand("git", ["remote", "set-url", "origin", urls.cloneUrl]);
-    // Re-create the local branch (clone is detached HEAD on a revision)
-    await sandbox.runCommand("git", ["checkout", "-B", branch]);
-    // Identity
-    await sandbox.runCommand("bash", [
-      "-c",
-      `git config user.name "${this.config.commitAuthor}" && git config user.email "${this.config.commitEmail}"`,
-    ]);
-
-    if (mergeBase) {
-      const repoUrl = urls.authUrl;
-      await sandbox.runCommand("bash", ["-c", `git fetch "${repoUrl}" ${mergeBase} 2>&1`]);
-      await sandbox.runCommand("bash", ["-c", `git branch ${mergeBase} FETCH_HEAD 2>/dev/null || true`]);
-      const merge = await sandbox.runCommand("bash", ["-c", `git merge FETCH_HEAD --no-edit 2>&1`]);
-      if (merge.exitCode !== 0) {
-        const out = (await merge.stdout()).trim();
-        const { logger } = await import("../lib/logger.js");
-        logger.warn({ mergeBase, exitCode: merge.exitCode, output: out.slice(0, 500) }, "merge_conflicts_during_provision");
-      }
-    }
-
-    // Pre-agent SHA so push step can detect commits
-    await sandbox.runCommand("bash", ["-c", "git rev-parse HEAD > /tmp/.pre-agent-sha"]);
-
-    // --- Agent-specific work delegated to the adapter ---
-    await agent.install(sandbox);
-    await agent.configure(sandbox, configureOpts);
-
-    return sandbox;
-  }
-
-  async teardown(sandbox: SandboxInstance): Promise<void> {
-    try { await sandbox.stop(); } catch { /* non-critical */ }
-  }
-}
-```
-
-- [ ] **Step 4: Run the test to verify it passes**
-
-Run: `pnpm vitest run src/sandbox/manager.test.ts`
-Expected: PASS.
-
----
-
-### Task 7: Update `src/workflows/agent.ts` to use the adapter
-
-**Files:**
-- Modify: `src/workflows/agent.ts`
-
-Threaded changes:
-- `provisionSandbox` builds the adapter, returns `{ sandboxId, agentKind }` (persist `agentKind` so downstream steps reconstruct via `createAgentAdapter(agentKind)`).
-- `configureStopHook` step → `setCommitGuardStep(sandboxId, agentKind, enabled)`.
-- `writeAndStartPhase` callers swap to adapter paths + adapter `buildPhaseScript`.
-- `collectPhaseOutput` → `collectPhase`.
-- `extractUsage`, `parseResearchStatus`, `parseAgentOutput`, `parseReviewOutput`, `unwrapResearchText` calls move to `agent.X(...)`.
-- The `import buildPhaseScript from "../sandbox/wrapper-script"` line is deleted.
-- The `import { ... } from "../sandbox/agent-runner"` line is deleted.
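The first bullet returns `agentKind` rather than the adapter object itself. A small sketch shows why that distinction matters. It assumes, as is typical of durable workflow engines, that step arguments and return values cross step boundaries as serialized data. `FakeClaudeAdapter` and `rebuild` are hypothetical stand-ins for `ClaudeAgentAdapter` and `createAgentAdapter`, and a JSON round trip stands in for the step boundary:

```ts
// Hypothetical illustration only: FakeClaudeAdapter/rebuild stand in for
// ClaudeAgentAdapter/createAgentAdapter; JSON round-tripping stands in for
// whatever serialization the workflow runtime applies between steps.
type AgentKind = "claude" | "codex";

class FakeClaudeAdapter {
  readonly kind = "claude" as const; // own enumerable field: survives JSON
  parse(raw: string): string {       // prototype method: does not survive JSON
    return raw.trim();
  }
}

// Rebuild a live adapter from plain data, like createAgentAdapter(agentKind).
function rebuild(kind: AgentKind): FakeClaudeAdapter {
  if (kind === "claude") return new FakeClaudeAdapter();
  throw new Error(`No fake adapter for ${kind}`);
}

const original = new FakeClaudeAdapter();
const roundTripped = JSON.parse(JSON.stringify(original)) as Record<string, unknown>;
console.log(roundTripped.kind);         // claude
console.log(typeof roundTripped.parse); // undefined

const restored = rebuild(roundTripped.kind as AgentKind);
console.log(restored.parse("  ok  ")); // ok
```

The data (`kind`) survives the boundary, but the behaviour (`parse`) does not. Persisting the discriminant and calling the factory again in each step avoids depending on whether the runtime can revive class instances.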
-
-- [ ] **Step 1: Replace `provisionSandbox` and `configureStopHook` step bodies**
-
-```ts
-// near top — keep type imports updated
-import type { AgentOutput, ReviewOutput, PhaseUsage } from "../sandbox/agents/types.js";
-import type { AgentKind } from "../sandbox/agents/index.js";
-
-async function provisionSandbox(
-  branchName: string,
-  arthurTaskId: string | null,
-  mergeBase?: string,
-): Promise<{ sandboxId: string; agentKind: AgentKind }> {
-  "use step";
-  const { env, getVcsConfig } = await import("../../env.js");
-  const { SandboxManager } = await import("../sandbox/manager.js");
-  const { createAgentAdapter } = await import("../sandbox/agents/index.js");
-  const vcs = getVcsConfig();
-
-  if (vcs.kind === "gitlab" && /^\d+$/.test(vcs.repoPath)) {
-    throw new Error(
-      `GITLAB_PROJECT_ID must be a namespace/project path (e.g. "group/repo"), ` +
-      `not a numeric project ID ("${vcs.repoPath}").`,
-    );
-  }
-
-  const arthur =
-    env.GENAI_ENGINE_API_KEY && env.GENAI_ENGINE_TRACE_ENDPOINT && arthurTaskId
-      ? { apiKey: env.GENAI_ENGINE_API_KEY, taskId: arthurTaskId, endpoint: env.GENAI_ENGINE_TRACE_ENDPOINT }
-      : undefined;
-
-  const agentKind: AgentKind = env.AGENT_KIND; // Will be set by Task 11; default 'claude' until then
-  const agent = createAgentAdapter(agentKind);
-
-  const manager = new SandboxManager({
-    kind: vcs.kind,
-    token: vcs.token,
-    repoPath: vcs.repoPath,
-    host: vcs.host,
-    jobTimeoutMs: env.JOB_TIMEOUT_MS,
-    commitAuthor: env.COMMIT_AUTHOR,
-    commitEmail: env.COMMIT_EMAIL,
-  });
-
-  const sandbox = await manager.provision(branchName, agent, {
-    anthropicApiKey: env.ANTHROPIC_API_KEY,
-    claudeCodeOauthToken: env.CLAUDE_CODE_OAUTH_TOKEN,
-    codexApiKey: env.CODEX_API_KEY, // unset until Task 11
-    codexChatGptOauthToken: env.CODEX_CHATGPT_OAUTH_TOKEN, // unset until Task 11
-    model: agentKind === "codex" ? env.CODEX_MODEL : env.CLAUDE_MODEL,
-    arthur,
-  }, mergeBase);
-
-  return { sandboxId: sandbox.sandboxId, agentKind };
-}
-provisionSandbox.maxRetries = 0;
-
-async function setCommitGuardStep(sandboxId: string, agentKind: AgentKind, enabled: boolean): Promise<void> {
-  "use step";
-  const { Sandbox } = await import("@vercel/sandbox");
-  const { getSandboxCredentials } = await import("../sandbox/credentials.js");
-  const { createAgentAdapter } = await import("../sandbox/agents/index.js");
-
-  const sandbox = await Sandbox.get({ sandboxId, ...getSandboxCredentials() });
-  const agent = createAgentAdapter(agentKind);
-  await agent.setCommitGuard(sandbox, enabled);
-}
-```
-
-Delete the old `configureStopHook` step.
-
-- [ ] **Step 2: Replace per-phase wiring inside `agentWorkflow`**
-
-Update the imports at the top of the workflow:
-
-```ts
-const { collectPhase, pushFromSandbox, fixAndRetryPush, teardownSandbox } =
-  await import("../sandbox/poll-agent.js");
-const { formatUsageReport } = await import("../sandbox/usage.js");
-const { createAgentAdapter } = await import("../sandbox/agents/index.js");
-const { AGENT_SCHEMA, REVIEW_SCHEMA } = await import("../sandbox/agents/types.js");
-```
-
-Delete the old imports:
-- `buildPhaseScript` from `../sandbox/wrapper-script.js`
-- `parseResearchStatus, parseAgentOutput, parseReviewOutput` from `../sandbox/agent-runner.js`
-- `extractUsage, unwrapResearchText` from `../sandbox/usage.js`
-
-Inside `agentWorkflow`, after `provisionSandbox`:
-
-```ts
-const { sandboxId, agentKind } = await provisionSandbox(branchName, arthurTaskId, mergeBase);
-await registerTicketSandbox(ticket.identifier, sandboxId);
-
-const agent = createAgentAdapter(agentKind); // local handle for parsers + buildPhaseScript
-```
-
-Each phase block changes from "build script with `buildPhaseScript(...)`" to "build script with `agent.buildPhaseScript({ phase, model, paths, jsonSchema })`" and uses `agent.artifactPaths(phase)`. Example for research:
-
-```ts
-// ========== PHASE 1: Research & Plan ==========
-await setCommitGuardStep(sandboxId, agentKind, false);
-
-const researchPaths = agent.artifactPaths("research");
-const researchInput = assembleResearchPlanContext({ /* unchanged */ });
-const researchScript = agent.buildPhaseScript({
-  phase: "research",
-  model: agentKind === "codex" ? env.CODEX_MODEL : env.CLAUDE_MODEL,
-  paths: researchPaths,
-});
-
-await writeAndStartPhase(
-  sandboxId,
-  researchPaths.input, researchInput,
-  researchPaths.wrapper, researchScript,
-);
-
-const researchDone = await pollUntilDone(sandboxId, researchPaths.sentinel, 20);
-if (!researchDone) { /* same backlog/notify path as before */ }
-
-const { raw: researchRaw, structured: researchStructured } =
-  await collectPhase(sandboxId, researchPaths);
-phaseUsages["Research"] = agent.extractUsage(researchRaw, researchStructured);
-const research = agent.parseResearchStatus(researchRaw, researchStructured);
-```
-
-Implementation phase block (full replacement):
-
-```ts
-// ========== PHASE 2: Implementation ==========
-await setCommitGuardStep(sandboxId, agentKind, true);
-
-const implPaths = agent.artifactPaths("impl");
-const implInput = assembleImplementationContext({
-  ticket: ticketData,
-  prompt: prompts.implement,
-  researchPlanMarkdown,
-  attachments: downloadedAttachments,
-});
-const implScript = agent.buildPhaseScript({
-  phase: "impl",
-  model: agentKind === "codex" ? env.CODEX_MODEL : env.CLAUDE_MODEL,
-  paths: implPaths,
-  jsonSchema: AGENT_SCHEMA,
-});
-
-await writeAndStartPhase(
-  sandboxId,
-  implPaths.input, implInput,
-  implPaths.wrapper, implScript,
-);
-
-const implDone = await pollUntilDone(sandboxId, implPaths.sentinel, 35);
-let implOutput: AgentOutput;
-if (implDone) {
-  const { raw, structured } = await collectPhase(sandboxId, implPaths);
-  phaseUsages["Impl"] = agent.extractUsage(raw, structured);
-  implOutput = agent.parseAgentOutput(raw, structured);
-} else {
-  implOutput = { result: "failed", error: "Implementation phase timed out" };
-}
-// (existing branches on implOutput.result are untouched)
-```
-
-For the disabled review block (lines around `// ========== PHASE 3: Review ==========`), update its structure to mirror the impl block (`agent.artifactPaths("review")`, `agent.buildPhaseScript({ phase: "review", model, paths, jsonSchema: REVIEW_SCHEMA })`, `agent.parseReviewOutput(raw, structured)`) so re-enabling it later is a single comment toggle.
-
-Replace every `await configureStopHook(sandboxId, enabled)` call with `await setCommitGuardStep(sandboxId, agentKind, enabled)`, keeping the same `true`/`false` argument.
-
-- [ ] **Step 3: Update the usage suffix call site**
-
-`formatUsageReport` will gain an optional `priceLookup` argument in Task 8. For Phase 1 the workflow keeps the existing single-arg call: `formatUsageReport(phaseUsages)`.
-
-- [ ] **Step 4: Run the workflow's existing tests**
-
-Run: `pnpm typecheck && pnpm vitest run src/workflows/prompts-step.test.ts`
-Expected: PASS — `prompts-step.test.ts` is untouched and `agent.ts` only changed wiring.
-
----
-
-### Task 8: Update `usage.ts` for the new PhaseUsage shape
-
-**Files:**
-- Modify: `src/sandbox/usage.ts`
-- Modify: `src/sandbox/usage.test.ts`
-
-The old `extractUsage` and `unwrapResearchText` move into adapters (Task 3), so this file shrinks to `formatUsageReport`. The `PhaseUsage` interface re-exports from `agents/types.ts` for backward-compat imports.
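When a phase has no `cost_usd`, `formatUsageReport` prices it as a sum over token classes: each count multiplied by its per-token USD price. Below is a hypothetical standalone version of that arithmetic, using the figures from the "computes cost from tokens + priceLookup" test. `Tokens` and `TokenPrice` are assumed to match `PhaseUsage.tokens` and the Task 12 pricing type. Whether `input` already includes cached tokens depends on how the adapter reports usage. This formula, like the one in `usage.ts`, bills the three counts separately.

```ts
// Hypothetical standalone version of the token-pricing branch of
// formatUsageReport. Prices are USD per token; shapes are assumptions.
type Tokens = { input: number; cached_input: number; output: number };
type TokenPrice = { input: number; cached_input: number; output: number };

function tokenCost(tokens: Tokens, price: TokenPrice): number {
  // Each token class is billed at its own rate, then the three are summed.
  return tokens.input * price.input
    + tokens.cached_input * price.cached_input
    + tokens.output * price.output;
}

// 1000 × 0.000003 + 0 × 0 + 500 × 0.000015 = 0.003 + 0.0075 = 0.0105
const cost = tokenCost(
  { input: 1000, cached_input: 0, output: 500 },
  { input: 0.000003, cached_input: 0, output: 0.000015 },
);
console.log(`$${cost.toFixed(2)}`); // $0.01
```

Rounding to cents is why the Step 2 test only asserts `/\$0\.0[01]/` rather than an exact fractional-cent value.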
- -- [ ] **Step 1: Replace `usage.ts`** - -```ts -// src/sandbox/usage.ts -import type { PhaseUsage } from "./agents/types.js"; -import type { TokenPrice } from "./agents/pricing.js"; // forward-declare; Task 12 creates the file - -export type { PhaseUsage } from "./agents/types.js"; - -export type PriceLookup = (model: string) => TokenPrice | null; - -/** - * Slack-friendly usage line. Computes Codex costs from tokens when a price - * is available; falls back to "cost unknown" for Codex without pricing. - * - * For each phase: - * - cost_usd != null → use it directly (Claude path) - * - tokens != null + priceLookup yields a price → compute cost - * - else → tokens-only, marked "cost unknown" - */ -export function formatUsageReport( - phases: Record, - priceLookup?: PriceLookup, - model?: string, -): string { - const parts: string[] = []; - let totalCost = 0; - let anyUnknown = false; - - for (const [name, usage] of Object.entries(phases)) { - if (!usage) { parts.push(`${name}: n/a`); continue; } - const mins = Math.round(usage.duration_ms / 60_000); - let costLabel: string; - if (usage.cost_usd != null) { - totalCost += usage.cost_usd; - costLabel = `$${usage.cost_usd.toFixed(2)}`; - } else if (usage.tokens && priceLookup && model) { - const price = priceLookup(model); - if (price) { - const cost = usage.tokens.input * price.input - + usage.tokens.cached_input * price.cached_input - + usage.tokens.output * price.output; - totalCost += cost; - costLabel = `$${cost.toFixed(2)}`; - } else { - anyUnknown = true; - costLabel = `${usage.tokens.input}/${usage.tokens.output} tok (cost unknown)`; - } - } else if (usage.tokens) { - anyUnknown = true; - costLabel = `${usage.tokens.input}/${usage.tokens.output} tok (cost unknown)`; - } else { - anyUnknown = true; - costLabel = "cost unknown"; - } - parts.push(`${name}: ${costLabel} (${mins}m)`); - } - - const total = anyUnknown ? 
`$${totalCost.toFixed(2)}+ total` : `$${totalCost.toFixed(2)} total`;
-  return `Usage: ${total} | ${parts.join(" | ")}`;
-}
-```
-
-- [ ] **Step 2: Move `extractUsage` and `unwrapResearchText` tests to `agents/claude.test.ts`** (already done in Task 3).
-  Replace `src/sandbox/usage.test.ts` with `formatUsageReport`-only coverage:
-
-```ts
-// src/sandbox/usage.test.ts
-import { describe, it, expect } from "vitest";
-import { formatUsageReport, type PhaseUsage } from "./usage.js";
-
-const u = (over: Partial<PhaseUsage> = {}): PhaseUsage => ({
-  cost_usd: null, tokens: null, duration_ms: 60_000, duration_api_ms: 30_000, num_turns: 1, ...over,
-});
-
-describe("formatUsageReport", () => {
-  it("uses cost_usd when present", () => {
-    const out = formatUsageReport({ Impl: u({ cost_usd: 1.23 }) });
-    expect(out).toContain("$1.23");
-    expect(out).toContain("$1.23 total");
-  });
-
-  it("computes cost from tokens + priceLookup when cost_usd is null", () => {
-    const out = formatUsageReport(
-      { Impl: u({ tokens: { input: 1000, cached_input: 0, output: 500 } }) },
-      () => ({ input: 0.000003, cached_input: 0, output: 0.000015 }),
-      "gpt-5-codex",
-    );
-    expect(out).toMatch(/\$0\.0[01]/);
-    expect(out).not.toContain("cost unknown");
-  });
-
-  it("falls back to tokens-only when no price and tokens are present", () => {
-    const out = formatUsageReport(
-      { Impl: u({ tokens: { input: 100, cached_input: 0, output: 50 } }) },
-      () => null,
-      "unknown-model",
-    );
-    expect(out).toContain("100/50 tok (cost unknown)");
-    expect(out).toContain("+ total");
-  });
-
-  it("shows n/a for null phases", () => {
-    const out = formatUsageReport({ Impl: null });
-    expect(out).toContain("Impl: n/a");
-  });
-});
-```
-
-- [ ] **Step 3: Run tests**
-
-Note: `TokenPrice` import will fail until Task 12.
Stub it temporarily by adding this line to the top of `usage.ts` (and removing it during Task 12): - -```ts -// remove during Task 12 once agents/pricing.ts exists -type TokenPrice = { input: number; cached_input: number; output: number }; -``` - -Then drop the `import type { TokenPrice } from "./agents/pricing.js"` line. - -Run: `pnpm vitest run src/sandbox/usage.test.ts` -Expected: PASS. - ---- - -### Task 9: Delete `wrapper-script.ts` + tests - -**Files:** -- Delete: `src/sandbox/wrapper-script.ts` -- Delete: `src/sandbox/wrapper-script.test.ts` - -- [ ] **Step 1: Verify no remaining imports** - -Run: `grep -R "wrapper-script" src/ docs/` -Expected: zero matches in `src/` (the spec file is fine). - -- [ ] **Step 2: Delete the files** - -```bash -rm src/sandbox/wrapper-script.ts src/sandbox/wrapper-script.test.ts -``` - -- [ ] **Step 3: Run typecheck + full unit suite** - -Run: `pnpm typecheck && pnpm test` -Expected: PASS. - ---- - -### Task 10: Delete `agent-runner.ts` + tests - -**Files:** -- Delete: `src/sandbox/agent-runner.ts` -- Delete: `src/sandbox/agent-runner.test.ts` - -The schemas and parsers moved to `agents/types.ts` and `agents/claude.ts`; the test cases moved to `agents/claude.test.ts`. Nothing should still import from this file. - -- [ ] **Step 1: Verify no remaining imports** - -Run: `grep -R "from .*sandbox/agent-runner" src/` -Expected: zero matches. - -- [ ] **Step 2: Delete the files** - -```bash -rm src/sandbox/agent-runner.ts src/sandbox/agent-runner.test.ts -``` - -- [ ] **Step 3: Final Phase 1 build** - -Run: `pnpm typecheck && pnpm test` -Expected: PASS — every existing unit test still green; the suite now exercises the adapter abstraction for Claude. 
- -- [ ] **Step 4: Commit Phase 1** - -```bash -git add src/sandbox/agents src/sandbox/manager.ts src/sandbox/manager.test.ts \ - src/sandbox/poll-agent.ts src/sandbox/poll-agent.test.ts \ - src/sandbox/usage.ts src/sandbox/usage.test.ts \ - src/workflows/agent.ts -git rm src/sandbox/wrapper-script.ts src/sandbox/wrapper-script.test.ts \ - src/sandbox/agent-runner.ts src/sandbox/agent-runner.test.ts -git commit -m "refactor(sandbox): extract Claude logic behind AgentAdapter interface" -``` - ---- - -## Phase 2 — Add the Codex adapter - -### Task 11: Add Codex env vars + cross-field validation - -**Files:** -- Modify: `env.ts` -- Modify: `.env.example` - -- [ ] **Step 1: Extend the schema in `env.ts`** - -Add to the `server` block (place after the existing `CLAUDE_MODEL` line): - -```ts -// Agent kind selection (claude | codex). Defaults to claude for back-compat. -AGENT_KIND: z.enum(["claude", "codex"]).default("claude"), - -// Codex auth — at least one required when AGENT_KIND=codex. -CODEX_API_KEY: z.string().min(1).optional(), -CODEX_CHATGPT_OAUTH_TOKEN: z.string().min(1).optional(), - -// Codex model selection. -CODEX_MODEL: z.string().default("gpt-5-codex"), - -// LiteLLM community-maintained pricing JSON. Operator overridable. 
-CODEX_PRICING_URL: z - .string() - .url() - .default("https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json"), -CODEX_PRICING_TTL_MS: z.coerce.number().int().positive().default(3_600_000), -``` - -Inside the cross-field block at the bottom of `env.ts`, add the AGENT_KIND guards next to the VCS_KIND check: - -```ts -if (env.AGENT_KIND === "codex" && !env.CODEX_API_KEY && !env.CODEX_CHATGPT_OAUTH_TOKEN) { - throw new Error( - "Invalid environment variables:\n" + - " AGENT_KIND=codex requires CODEX_API_KEY or CODEX_CHATGPT_OAUTH_TOKEN", - ); -} -if (env.AGENT_KIND === "claude" && !env.ANTHROPIC_API_KEY && !env.CLAUDE_CODE_OAUTH_TOKEN) { - throw new Error( - "Invalid environment variables:\n" + - " AGENT_KIND=claude requires ANTHROPIC_API_KEY or CLAUDE_CODE_OAUTH_TOKEN", - ); -} -``` - -- [ ] **Step 2: Update `.env.example`** - -Append (after the existing `CLAUDE_MODEL` block): - -```bash -# Agent — choose runtime (claude | codex). Defaults to claude. -AGENT_KIND=claude - -# Codex (only when AGENT_KIND=codex) -# CODEX_API_KEY= -# CODEX_CHATGPT_OAUTH_TOKEN= # alternative to CODEX_API_KEY -# CODEX_MODEL=gpt-5-codex -# CODEX_PRICING_URL=https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json -# CODEX_PRICING_TTL_MS=3600000 -``` - -- [ ] **Step 3: Verify typecheck** - -Run: `pnpm typecheck` -Expected: PASS. 
- ---- - -### Task 12: Implement `agents/pricing.ts` - -**Files:** -- Create: `src/sandbox/agents/pricing.ts` -- Create: `src/sandbox/agents/pricing.test.ts` -- Modify: `src/sandbox/usage.ts` (drop the local `TokenPrice` stub from Task 8 and import from pricing) - -- [ ] **Step 1: Write the failing test** - -```ts -// src/sandbox/agents/pricing.test.ts -import { describe, it, expect, vi, beforeEach } from "vitest"; - -const SAMPLE = { - "gpt-5-codex": { - input_cost_per_token: 0.000003, - output_cost_per_token: 0.000015, - cache_read_input_token_cost: 0.0000007, - }, -}; - -describe("fetchModelPrice", () => { - beforeEach(() => { vi.resetModules(); }); - - it("normalises LiteLLM JSON to TokenPrice", async () => { - vi.stubGlobal("fetch", vi.fn().mockResolvedValue({ - ok: true, json: async () => SAMPLE, - })); - const { fetchModelPrice } = await import("./pricing.js"); - const p = await fetchModelPrice("gpt-5-codex"); - expect(p).toEqual({ input: 0.000003, cached_input: 0.0000007, output: 0.000015 }); - }); - - it("returns null on miss", async () => { - vi.stubGlobal("fetch", vi.fn().mockResolvedValue({ ok: true, json: async () => ({}) })); - const { fetchModelPrice } = await import("./pricing.js"); - expect(await fetchModelPrice("unknown")).toBeNull(); - }); - - it("returns null on fetch failure", async () => { - vi.stubGlobal("fetch", vi.fn().mockRejectedValue(new Error("network"))); - const { fetchModelPrice } = await import("./pricing.js"); - expect(await fetchModelPrice("any")).toBeNull(); - }); - - it("caches successful responses within TTL (one fetch for two calls)", async () => { - const fetchMock = vi.fn().mockResolvedValue({ ok: true, json: async () => SAMPLE }); - vi.stubGlobal("fetch", fetchMock); - const { fetchModelPrice } = await import("./pricing.js"); - await fetchModelPrice("gpt-5-codex"); - await fetchModelPrice("gpt-5-codex"); - expect(fetchMock).toHaveBeenCalledTimes(1); - }); -}); -``` - -- [ ] **Step 2: Run the test to verify it fails** - 
-
-Run: `pnpm vitest run src/sandbox/agents/pricing.test.ts`
-Expected: FAIL — module does not exist.
-
-- [ ] **Step 3: Implement `pricing.ts`**
-
-```ts
-// src/sandbox/agents/pricing.ts
-export interface TokenPrice {
-  input: number;
-  cached_input: number;
-  output: number;
-}
-
-interface CacheEntry {
-  fetchedAt: number;
-  data: Record<string, TokenPrice>;
-}
-let cache: CacheEntry | null = null;
-
-interface LiteLLMEntry {
-  input_cost_per_token?: number;
-  output_cost_per_token?: number;
-  cache_read_input_token_cost?: number;
-}
-
-async function loadAll(): Promise<Record<string, TokenPrice> | null> {
-  const { env } = await import("../../../env.js");
-  const ttl = env.CODEX_PRICING_TTL_MS;
-  if (cache && Date.now() - cache.fetchedAt < ttl) return cache.data;
-
-  try {
-    const r = await fetch(env.CODEX_PRICING_URL);
-    if (!r.ok) return null;
-    const json = await r.json();
-    const out: Record<string, TokenPrice> = {};
-    for (const [name, entry] of Object.entries(json as Record<string, LiteLLMEntry>)) {
-      if (typeof entry !== "object" || entry === null) continue;
-      const input = entry.input_cost_per_token;
-      const output = entry.output_cost_per_token;
-      if (typeof input !== "number" || typeof output !== "number") continue;
-      out[name] = {
-        input,
-        output,
-        cached_input: typeof entry.cache_read_input_token_cost === "number"
-          ? entry.cache_read_input_token_cost
-          : 0,
-      };
-    }
-    cache = { fetchedAt: Date.now(), data: out };
-    return out;
-  } catch {
-    return null;
-  }
-}
-
-export async function fetchModelPrice(model: string): Promise<TokenPrice | null> {
-  const all = await loadAll();
-  return all?.[model] ?? null;
-}
-
-/** Test-only: clear the in-memory cache. */
-export function _resetPricingCache(): void { cache = null; }
-```
-
-- [ ] **Step 4: Run tests to verify they pass**
-
-Run: `pnpm vitest run src/sandbox/agents/pricing.test.ts`
-Expected: PASS.
- -- [ ] **Step 5: Drop the local `TokenPrice` stub from `usage.ts`** - -In `src/sandbox/usage.ts`, replace the inline `type TokenPrice` with: - -```ts -import type { TokenPrice } from "./agents/pricing.js"; -export type { TokenPrice }; -``` - -Run: `pnpm typecheck && pnpm vitest run src/sandbox/usage.test.ts` -Expected: PASS. - ---- - -### Task 13: Implement `agents/codex.ts` — Codex adapter - -**Files:** -- Create: `src/sandbox/agents/codex.ts` -- Create: `src/sandbox/agents/codex.test.ts` - -- [ ] **Step 1: Write the failing test** - -```ts -// src/sandbox/agents/codex.test.ts -import { describe, it, expect, vi } from "vitest"; -import { CodexAgentAdapter } from "./codex.js"; - -const adapter = new CodexAgentAdapter(); - -describe("CodexAgentAdapter.parseAgentOutput", () => { - it("prefers structured JSON when valid", () => { - const structured = JSON.stringify({ result: "implemented", summary: "ok" }); - const out = adapter.parseAgentOutput("ignored ndjson", structured); - expect(out.result).toBe("implemented"); - expect(out.summary).toBe("ok"); - }); - - it("falls back to NDJSON item.completed when structured is missing", () => { - const ndjson = [ - JSON.stringify({ type: "thread.started", thread_id: "t" }), - JSON.stringify({ - type: "item.completed", - item: { type: "agent_message", text: '{"result":"implemented","summary":"foo"}' }, - }), - ].join("\n"); - const out = adapter.parseAgentOutput(ndjson, null); - expect(out.result).toBe("implemented"); - expect(out.summary).toBe("foo"); - }); - - it("returns failed when both sources are unparseable", () => { - expect(adapter.parseAgentOutput("not ndjson", null).result).toBe("failed"); - }); -}); - -describe("CodexAgentAdapter.parseResearchStatus", () => { - it("reads STATUS line from structured (free-form text)", () => { - const r = adapter.parseResearchStatus("ndjson irrelevant", "STATUS: completed\n\nbody"); - expect(r.status).toBe("completed"); - expect(r.body).toBe("body"); - }); - - it("falls back to last 
item.completed text when structured is null", () => { - const ndjson = [ - JSON.stringify({ type: "item.completed", item: { type: "agent_message", text: "STATUS: failed\n\nreason" } }), - ].join("\n"); - const r = adapter.parseResearchStatus(ndjson, null); - expect(r.status).toBe("failed"); - }); -}); - -describe("CodexAgentAdapter.extractUsage", () => { - it("sums usage across multiple turn.completed events", () => { - const ndjson = [ - JSON.stringify({ type: "turn.completed", usage: { input_tokens: 100, output_tokens: 200, cached_input_tokens: 10 } }), - JSON.stringify({ type: "turn.completed", usage: { input_tokens: 50, output_tokens: 75, cached_input_tokens: 5 } }), - ].join("\n"); - const u = adapter.extractUsage(ndjson, null); - expect(u).toEqual({ - cost_usd: null, - tokens: { input: 150, cached_input: 15, output: 275 }, - duration_ms: 0, - duration_api_ms: 0, - num_turns: 2, - }); - }); - - it("returns null when no turn.completed event is present", () => { - expect(adapter.extractUsage("\n", null)).toBeNull(); - }); -}); - -describe("CodexAgentAdapter.buildPhaseScript", () => { - it("research phase uses -o without --output-schema", () => { - const paths = adapter.artifactPaths("research"); - const s = adapter.buildPhaseScript({ phase: "research", model: "gpt-5-codex", paths }); - expect(s).toContain("codex exec"); - expect(s).toContain("--full-auto"); - expect(s).toContain("--skip-git-repo-check"); - expect(s).toContain("--json"); - expect(s).toContain("-o /tmp/research-result.json"); - expect(s).not.toContain("--output-schema"); - }); - - it("impl phase uses --output-schema with a heredoc", () => { - const paths = adapter.artifactPaths("impl"); - const s = adapter.buildPhaseScript({ - phase: "impl", - model: "gpt-5-codex", - paths, - jsonSchema: '{"type":"object"}', - }); - expect(s).toContain("--output-schema /tmp/impl-schema.json"); - expect(s).toContain("'SCHEMA_EOF'"); - }); -}); - -describe("CodexAgentAdapter.artifactPaths", () => { - it("includes 
structuredOutput pointing at -o file", () => {
-    expect(adapter.artifactPaths("impl").structuredOutput).toBe("/tmp/impl-result.json");
-  });
-});
-
-describe("CodexAgentAdapter.setCommitGuard", () => {
-  it("upserts the Stop hook in ~/.codex/hooks.json when enabled", async () => {
-    const runCommand = vi.fn().mockResolvedValue({ exitCode: 0 });
-    const sandbox = { runCommand, writeFiles: vi.fn() } as any;
-    await adapter.setCommitGuard(sandbox, true);
-    const merge = runCommand.mock.calls.find(([cmd, args]) =>
-      cmd === "node" && typeof args[2] === "string" && args[2].includes('"commitGuard":"enable"'),
-    );
-    expect(merge).toBeDefined();
-  });
-});
-```
-
-- [ ] **Step 2: Run the test to verify it fails**
-
-Run: `pnpm vitest run src/sandbox/agents/codex.test.ts`
-Expected: FAIL — module does not exist.
-
-- [ ] **Step 3: Implement `codex.ts`**
-
-```ts
-// src/sandbox/agents/codex.ts
-import type {
-  AgentAdapter, AgentOutput, ConfigureOpts, PhaseArtifactPaths, PhaseKind,
-  PhaseScriptOpts, PhaseUsage, ResearchResult, ReviewOutput, RunnableSandbox,
-} from "./types.js";
-import { agentOutputSchema, reviewOutputSchema } from "./types.js";
-import { installSkillsToAgentsDir } from "./shared.js";
-import { ARTHUR_TRACER_PY_BASE64 } from "../arthur-tracer.js";
-
-const ARTHUR_HOOK_EVENTS: ReadonlyArray<readonly [string, string]> = [
-  ["UserPromptSubmit", "user_prompt_submit"],
-  ["PreToolUse", "pre_tool"],
-  ["PostToolUse", "post_tool"],
-  ["Stop", "stop"],
-];
-
-export class CodexAgentAdapter implements AgentAdapter {
-  readonly kind = "codex" as const;
-
-  async install(sandbox: RunnableSandbox): Promise<void> {
-    await sandbox.runCommand("npm", ["install", "-g", "@openai/codex"]);
-  }
-
-  async configure(sandbox: RunnableSandbox, opts: ConfigureOpts): Promise<void> {
-    if (!opts.codexApiKey && !opts.codexChatGptOauthToken) {
-      throw new Error("CodexAgentAdapter.configure requires codexApiKey or codexChatGptOauthToken");
-    }
-
-    // 1) auth env file
-    const envLines: string[] = [];
-    if
(opts.codexApiKey) envLines.push(`export CODEX_API_KEY=${shellQuote(opts.codexApiKey)}`);
-    else if (opts.codexChatGptOauthToken) envLines.push(`export CODEX_CHATGPT_OAUTH_TOKEN=${shellQuote(opts.codexChatGptOauthToken)}`);
-    await sandbox.writeFiles([{ path: "/tmp/agent-env.sh", content: Buffer.from(envLines.join("\n") + "\n") }]);
-    await sandbox.runCommand("chmod", ["600", "/tmp/agent-env.sh"]);
-
-    // 2) ~/.codex/config.toml — minimal model + sandbox profile
-    const configToml = [
-      `model = "${opts.model}"`,
-      `approval_policy = "never"`,
-      `sandbox_mode = "workspace-write"`,
-    ].join("\n") + "\n";
-    await sandbox.writeFiles([{ path: "/tmp/config.toml", content: Buffer.from(configToml) }]);
-    await sandbox.runCommand("bash", ["-c", "mkdir -p $HOME/.codex && mv /tmp/config.toml $HOME/.codex/config.toml"]);
-
-    // 3) skills (~/.agents/skills is Codex's native scope)
-    await installSkillsToAgentsDir(sandbox);
-
-    // 4) commit-guard script (Codex flavour, JSON-on-stdout)
-    await this.writeCommitGuardScript(sandbox);
-
-    // 5) Arthur tracer (no-op if unconfigured)
-    if (opts.arthur) await this.installArthurTracer(sandbox, opts.arthur);
-  }
-
-  async setCommitGuard(sandbox: RunnableSandbox, enabled: boolean): Promise<void> {
-    await this.writeCommitGuardScript(sandbox); // idempotent
-    await this.mergeHooks(sandbox, { commitGuard: enabled ?
"enable" : "disable" }); - } - - buildPhaseScript(opts: PhaseScriptOpts): string { - const { paths, jsonSchema, model, phase } = opts; - - const flags: string[] = [ - `--model "${model}"`, - `--full-auto`, - `--skip-git-repo-check`, - `--json`, - `-o ${paths.structuredOutput}`, - ]; - - let schemaBlock = ""; - if (jsonSchema) { - const escapedSchema = jsonSchema.replace(/'/g, "'\\''"); - schemaBlock = [ - `cat > /tmp/${phase}-schema.json << 'SCHEMA_EOF'`, - escapedSchema, - "SCHEMA_EOF", - ].join("\n"); - flags.push(`--output-schema /tmp/${phase}-schema.json`); - } - - return `#!/bin/bash - -# --- Cleanup stale files --- -rm -f ${paths.sentinel} ${paths.stdout} ${paths.stderr} ${paths.structuredOutput} - -# --- Source auth env vars --- -[ -f /tmp/agent-env.sh ] && source /tmp/agent-env.sh - -${schemaBlock} - -# --- Phase: ${phase} --- -cat ${paths.input} | codex exec \\ - ${flags.join(" \\\n ")} \\ - - \\ - > ${paths.stdout} 2> ${paths.stderr}; echo $? > /tmp/${phase}-exit-code || true - -# --- Cleanup --- -cd /vercel/sandbox -rm -rf .codex/ -git checkout -- .codex/ 2>/dev/null || true - -touch ${paths.sentinel} -`; - } - - artifactPaths(phase: PhaseKind): PhaseArtifactPaths { - return { - wrapper: `/tmp/${phase}-wrapper.sh`, - input: `/tmp/${phase}-requirements.md`, - stdout: `/tmp/${phase}-stdout.txt`, - stderr: `/tmp/${phase}-stderr.txt`, - sentinel: `/tmp/${phase}-done`, - structuredOutput: `/tmp/${phase}-result.json`, - }; - } - - parseAgentOutput(raw: string, structured: string | null): AgentOutput { - if (structured) { - try { - const parsed = agentOutputSchema.safeParse(JSON.parse(structured)); - if (parsed.success) return parsed.data; - } catch { /* fall through */ } - } - const text = unwrapLastItemCompleted(raw); - if (text) { - try { - const parsed = agentOutputSchema.safeParse(JSON.parse(text)); - if (parsed.success) return parsed.data; - } catch { /* fall through */ } - } - if (!raw.trim() && !structured) { - return { result: "failed", error: "Codex 
produced no output" }; - } - return { - result: "failed", - error: `Codex output unparseable. First 500: ${(structured ?? raw).slice(0, 500)}`, - }; - } - - parseReviewOutput(raw: string, structured: string | null): ReviewOutput { - if (structured) { - try { - const parsed = reviewOutputSchema.safeParse(JSON.parse(structured)); - if (parsed.success) return parsed.data; - } catch { /* fall through */ } - } - const text = unwrapLastItemCompleted(raw); - if (text) { - try { - const parsed = reviewOutputSchema.safeParse(JSON.parse(text)); - if (parsed.success) return parsed.data; - } catch { /* fall through */ } - } - return { - result: "failed", feedback: "", issues: [], - error: `Codex review output unparseable. First 500: ${(structured ?? raw).slice(0, 500)}`, - }; - } - - parseResearchStatus(raw: string, structured: string | null): ResearchResult { - const text = (structured ?? unwrapLastItemCompleted(raw) ?? raw).trim(); - const lines = text.split("\n"); - for (let i = 0; i < lines.length; i++) { - const m = (lines[i] ?? 
"").trim().match(/^STATUS:\s*([a-z_]+)/i); - if (!m) continue; - const status = m[1].toLowerCase(); - if (status === "completed" || status === "clarification_needed" || status === "failed") { - return { status, body: lines.slice(i + 1).join("\n").trim() }; - } - } - return { status: "failed", body: text }; - } - - extractUsage(raw: string, _structured: string | null): PhaseUsage | null { - if (!raw.trim()) return null; - let input = 0, cached = 0, output = 0, turns = 0; - for (const line of raw.split("\n")) { - if (!line.trim()) continue; - try { - const event = JSON.parse(line); - if (event?.type === "turn.completed" && event.usage) { - input += numOr0(event.usage.input_tokens); - cached += numOr0(event.usage.cached_input_tokens); - output += numOr0(event.usage.output_tokens); - turns += 1; - } - } catch { /* ignore non-JSON lines */ } - } - if (turns === 0) return null; - return { - cost_usd: null, - tokens: { input, cached_input: cached, output }, - duration_ms: 0, - duration_api_ms: 0, - num_turns: turns, - }; - } - - // --- private helpers --- - - private async writeCommitGuardScript(sandbox: RunnableSandbox): Promise { - await sandbox.runCommand("bash", [ - "-c", - [ - "mkdir -p ~/.codex/hooks", - "cat > ~/.codex/hooks/commit-guard.sh << 'SCRIPT'", - "#!/bin/bash", - "input=$(cat)", - `if echo "$input" | grep -q '"already_blocked":true'; then echo '{"continue": true}'; exit 0; fi`, - `changes=$(git status --porcelain | grep -v '^.. \\.codex/' | grep -v '^?? \\.codex/')`, - `if [ -n "$changes" ]; then`, - ` printf '{"continue": false, "stopReason": "You have uncommitted changes. 
Commit them with a descriptive message or revert before stopping."}\\n'`,
-        "  exit 0",
-        "fi",
-        `echo '{"continue": true}'`,
-        "SCRIPT",
-        "chmod +x ~/.codex/hooks/commit-guard.sh",
-      ].join("\n"),
-    ]);
-  }
-
-  private async mergeHooks(
-    sandbox: RunnableSandbox,
-    opts: { commitGuard?: "enable" | "disable"; arthur?: "install" },
-  ): Promise<void> {
-    const arthurEvents = JSON.stringify(ARTHUR_HOOK_EVENTS);
-    const script = `
-      import fs from 'node:fs';
-      import path from 'node:path';
-      const opts = ${JSON.stringify(opts)};
-      const arthurEvents = ${arthurEvents};
-      const home = process.env.HOME;
-      const cfgPath = path.join(home, '.codex', 'hooks.json');
-      fs.mkdirSync(path.dirname(cfgPath), { recursive: true });
-      let s = {};
-      try { s = JSON.parse(fs.readFileSync(cfgPath, 'utf8')); } catch {}
-      s.hooks = s.hooks || {};
-
-      const upsert = (event, command) => {
-        const arr = s.hooks[event] || [];
-        const has = arr.some(e => e && e.command === command);
-        if (!has) arr.push({ type: 'command', command });
-        s.hooks[event] = arr;
-      };
-      const remove = (event, predicate) => {
-        const arr = s.hooks[event] || [];
-        s.hooks[event] = arr.filter(e => !predicate(e?.command || ''));
-      };
-
-      if (opts.commitGuard === 'enable') upsert('Stop', 'bash ~/.codex/hooks/commit-guard.sh');
-      else if (opts.commitGuard === 'disable') remove('Stop', c => c.includes('commit-guard.sh'));
-
-      if (opts.arthur === 'install') {
-        for (const [event, arg] of arthurEvents) {
-          upsert(event, 'python3 "$HOME/.codex/hooks/claude_code_tracer.py" ' + arg);
-        }
-      }
-      fs.writeFileSync(cfgPath, JSON.stringify(s, null, 2));
-    `;
-    await sandbox.runCommand("node", ["--input-type=module", "-e", script]);
-  }
-
-  private async installArthurTracer(
-    sandbox: RunnableSandbox,
-    arthur: NonNullable<ConfigureOpts["arthur"]>,
-  ): Promise<void> {
-    const { logger } = await import("../../lib/logger.js");
-    logger.info({ endpoint: arthur.endpoint, taskId: arthur.taskId, agent: this.kind }, "agent_install_arthur_started");
-
-    const pip = await
sandbox.runCommand("bash", [ - "-c", - "python3 -m ensurepip --user && python3 -m pip install --user --quiet 'opentelemetry-sdk>=1.20.0' 'opentelemetry-exporter-otlp-proto-http>=1.20.0'", - ]); - if (pip.exitCode !== 0) { logger.warn({}, "arthur_pip_install_failed"); return; } - - const tracerBytes = Buffer.from(ARTHUR_TRACER_PY_BASE64, "base64"); - await sandbox.writeFiles([{ path: "/tmp/arthur-tracer.py", content: tracerBytes }]); - const mvTracer = await sandbox.runCommand("bash", [ - "-c", - "mkdir -p $HOME/.codex/hooks && mv /tmp/arthur-tracer.py $HOME/.codex/hooks/claude_code_tracer.py && chmod +x $HOME/.codex/hooks/claude_code_tracer.py", - ]); - if (mvTracer.exitCode !== 0) { logger.warn({}, "arthur_tracer_install_failed"); return; } - - const configJson = JSON.stringify( - { api_key: arthur.apiKey, task_id: arthur.taskId, endpoint: arthur.endpoint }, null, 2, - ); - await sandbox.writeFiles([{ path: "/tmp/arthur_config.json", content: Buffer.from(configJson) }]); - await sandbox.runCommand("bash", [ - "-c", - "mkdir -p $HOME/.codex && mv /tmp/arthur_config.json $HOME/.codex/arthur_config.json && chmod 600 $HOME/.codex/arthur_config.json", - ]); - - await this.mergeHooks(sandbox, { arthur: "install" }); - logger.info({ agent: this.kind }, "agent_install_arthur_complete"); - } -} - -// --- module-private helpers --- - -function shellQuote(val: string): string { - return `'${val.replace(/'/g, "'\\''")}'`; -} - -function numOr0(x: unknown): number { return typeof x === "number" ? x : 0; } - -/** Walk Codex NDJSON in reverse for the last `item.completed` event with assistant text. 
*/ -function unwrapLastItemCompleted(raw: string): string | null { - if (!raw.trim()) return null; - const lines = raw.split("\n"); - for (let i = lines.length - 1; i >= 0; i--) { - const line = lines[i]; - if (!line || !line.trim()) continue; - try { - const event = JSON.parse(line); - if (event?.type === "item.completed" && event.item) { - if (typeof event.item.text === "string") return event.item.text; - if (typeof event.item.content === "string") return event.item.content; - } - } catch { /* not JSON */ } - } - return null; -} -``` - -- [ ] **Step 4: Run the tests to verify they pass** - -Run: `pnpm vitest run src/sandbox/agents/codex.test.ts` -Expected: PASS. - ---- - -### Task 14: Wire Codex into the factory - -**Files:** -- Modify: `src/sandbox/agents/index.ts` -- Modify: `src/sandbox/agents/index.test.ts` - -- [ ] **Step 1: Update the test** - -```ts -// add to src/sandbox/agents/index.test.ts -it("returns CodexAgentAdapter for kind=codex", () => { - const a = createAgentAdapter("codex"); - expect(a.kind).toBe("codex"); -}); -``` - -- [ ] **Step 2: Run to verify it fails** - -Run: `pnpm vitest run src/sandbox/agents/index.test.ts` -Expected: FAIL — current factory throws for codex. - -- [ ] **Step 3: Wire Codex** - -```ts -// src/sandbox/agents/index.ts -import { ClaudeAgentAdapter } from "./claude.js"; -import { CodexAgentAdapter } from "./codex.js"; -import type { AgentAdapter } from "./types.js"; - -export type AgentKind = "claude" | "codex"; - -export function createAgentAdapter(kind: AgentKind): AgentAdapter { - switch (kind) { - case "claude": return new ClaudeAgentAdapter(); - case "codex": return new CodexAgentAdapter(); - default: { - const _exhaustive: never = kind; - throw new Error(`Unknown AGENT_KIND: ${_exhaustive}`); - } - } -} - -export type { AgentAdapter } from "./types.js"; -``` - -- [ ] **Step 4: Run to verify both cases pass** - -Run: `pnpm vitest run src/sandbox/agents/index.test.ts` -Expected: PASS. 
- ---- - -### Task 15: Thread Codex pricing into `formatUsageReport` - -**Files:** -- Modify: `src/workflows/agent.ts` - -The workflow now resolves a price for the active model once per run and passes it as a closure to `formatUsageReport`. - -- [ ] **Step 1: Update the usage suffix construction** - -In `agentWorkflow`, replace `formatUsageReport(phaseUsages)` call sites with a helper that resolves price once per run: - -```ts -// add inside agentWorkflow, near the top of the try block -const activeModel = agentKind === "codex" ? env.CODEX_MODEL : env.CLAUDE_MODEL; -const priceCache = await (async () => { - if (agentKind !== "codex") return null; - const { fetchModelPrice } = await import("../sandbox/agents/pricing.js"); - try { - return await fetchModelPrice(activeModel); - } catch (err) { - const { logger } = await import("../lib/logger.js"); - logger.warn({ err: (err as Error).message, model: activeModel }, "pricing_fetch_failed"); - return null; - } -})(); - -const priceLookup = priceCache ? () => priceCache : undefined; - -const usageSuffix = () => - Object.keys(phaseUsages).length - ? `\n${formatUsageReport(phaseUsages, priceLookup, activeModel)}` - : ""; -``` - -Replace any other direct `formatUsageReport(phaseUsages)` calls in the file with `formatUsageReport(phaseUsages, priceLookup, activeModel)`. - -- [ ] **Step 2: Typecheck + run unit suite** - -Run: `pnpm typecheck && pnpm test` -Expected: PASS. 
- -- [ ] **Step 3: Commit Phase 2** - -```bash -git add env.ts .env.example src/sandbox/agents/codex.ts src/sandbox/agents/codex.test.ts \ - src/sandbox/agents/pricing.ts src/sandbox/agents/pricing.test.ts \ - src/sandbox/agents/index.ts src/sandbox/agents/index.test.ts \ - src/sandbox/usage.ts src/workflows/agent.ts -git commit -m "feat(sandbox): add Codex agent adapter with pricing-aware usage reports" -``` - ---- - -## Phase 3 — Codex E2E - -### Task 16: Add gated `e2e/codex-tier-1.test.ts` - -**Files:** -- Create: `e2e/codex-tier-1.test.ts` -- Modify: `e2e/vitest.e2e.config.ts` (new project entry) - -The e2e provisions a sandbox with `AGENT_KIND=codex`, runs the impl phase against a tiny seeded ticket, and asserts a commit + PR. It is skipped unless `CODEX_API_KEY` is set in the environment. - -- [ ] **Step 1: Add the test file** - -```ts -// e2e/codex-tier-1.test.ts -import { describe, it, expect, afterAll } from "vitest"; -import { - createTestTicket, - moveTicketToColumn, - getTicketStatus, - deleteTicket, -} from "./helpers/jira.js"; -import { findPR, deleteBranch } from "./helpers/github.js"; -import { cleanup as redisCleanup } from "./helpers/redis.js"; -import { stopSandboxesForTicket } from "./helpers/sandbox.js"; -import { waitFor } from "./helpers/wait.js"; -import { e2eEnv } from "./env.js"; - -const HAVE_CODEX = Boolean(process.env.CODEX_API_KEY); -const guard = HAVE_CODEX ? 
describe : describe.skip; - -guard("Codex Tier-1: clear ticket → PR via codex exec", () => { - let ticketKey: string; - let branchName: string; - - afterAll(async () => { - if (ticketKey) await stopSandboxesForTicket(ticketKey).catch(() => {}); - if (branchName) await deleteBranch(branchName).catch(() => {}); - if (ticketKey) { - await redisCleanup(ticketKey); - await deleteTicket(ticketKey); - } - }); - - it("provisions a Codex sandbox, commits, and opens a PR", async () => { - // Sanity — the harness must already have AGENT_KIND=codex set in process.env - expect(process.env.AGENT_KIND).toBe("codex"); - - const ticket = await createTestTicket({ - summary: "[E2E codex] Add GET /api/health endpoint", - description: [ - "Create a GET /api/health route that returns JSON { status: \"ok\" } with HTTP 200.", - "Acceptance:", - "- Route file at app/api/health/route.ts", - "- Returns { status: \"ok\" } with HTTP 200", - ].join("\n"), - }); - ticketKey = ticket.ticketKey; - branchName = `blazebot/${ticketKey.toLowerCase()}`; - - await moveTicketToColumn(ticketKey, e2eEnv.COLUMN_AI); - - // Wait for the workflow to push a commit and open the PR. - const pr = await waitFor(async () => findPR(branchName), { timeoutMs: 30 * 60_000, intervalMs: 30_000 }); - expect(pr).not.toBeNull(); - - // Ticket should land in AI Review. - await waitFor(async () => { - const s = await getTicketStatus(ticketKey); - return s === e2eEnv.COLUMN_AI_REVIEW ? 
s : null; - }, { timeoutMs: 5 * 60_000 }); - }); -}); -``` - -- [ ] **Step 2: Add a `codex` project entry to `e2e/vitest.e2e.config.ts`** - -Append inside `projects: [...]`: - -```ts -{ - test: { - name: "codex", - include: ["e2e/codex-tier-1.test.ts"], - testTimeout: 4_200_000, - hookTimeout: 4_200_000, - }, -}, -``` - -- [ ] **Step 3: Add a script to `package.json`** - -```json -"test:e2e:codex": "AGENT_KIND=codex vitest run --config e2e/vitest.e2e.config.ts --project codex" -``` - -- [ ] **Step 4: Validate manually** - -With `CODEX_API_KEY` set in `.env.e2e` (and `AGENT_KIND=codex` for that run): - -Run: `pnpm test:e2e:codex` -Expected: PASS — sandbox provisions, the impl phase commits a change, the PR is created. (Manual; do not gate the regular CI on it.) - -- [ ] **Step 5: Commit Phase 3** - -```bash -git add e2e/codex-tier-1.test.ts e2e/vitest.e2e.config.ts package.json -git commit -m "test(e2e): add gated Tier-1 Codex agent run" -``` - ---- - -## Phase 4 — Documentation - -### Task 17: README + .env.example final pass - -**Files:** -- Modify: `README.md` - -- [ ] **Step 1: Add an "Agent" subsection to README** - -Insert after the existing "Agent" block in `### 3. Configure environment variables`: - -```md -**Switching agents** — Blazebot supports two CLI runtimes. Set `AGENT_KIND` once per deployment: - -```bash -AGENT_KIND=claude # default — Anthropic Claude Code -# or -AGENT_KIND=codex # OpenAI Codex CLI -``` - -When `AGENT_KIND=codex`: - -```bash -CODEX_API_KEY=sk-codex-xxxxxxxxxxxx # or CODEX_CHATGPT_OAUTH_TOKEN -CODEX_MODEL=gpt-5-codex # default -``` - -Pricing is fetched from [LiteLLM's community-maintained JSON](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json) on each cold start (1h TTL by default). Override `CODEX_PRICING_URL` in airgapped environments. When pricing is unavailable, Slack reports show tokens-only with `cost unknown`. 
-``` - -- [ ] **Step 2: Update the Environment Variables Reference table** - -Add rows for `AGENT_KIND`, `CODEX_API_KEY`, `CODEX_CHATGPT_OAUTH_TOKEN`, `CODEX_MODEL`, `CODEX_PRICING_URL`, `CODEX_PRICING_TTL_MS` matching the existing table style. - -- [ ] **Step 3: Confirm `.env.example` already covers these** (set up in Task 11). If not, add the same block now. - -- [ ] **Step 4: Commit Phase 4** - -```bash -git add README.md -git commit -m "docs: document AGENT_KIND switching and Codex pricing source" -``` - ---- - -## Pre-implementation verifications (first 30 minutes) - -These three checks belong at the start of Phase 2 (before Task 11) — they're not blocking, but a fast-fail saves time downstream. Each check is its own discrete sub-task: - -1. **LiteLLM JSON reachable + has `gpt-5-codex`.** - ```bash - curl -fsSL https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json | jq '."gpt-5-codex"' - ``` - Expected: an object with `input_cost_per_token` and `output_cost_per_token`. If missing, log `gpt-5` and `gpt-5-mini` for fallback documentation in the README. - -2. **`skills` CLI accepts `--target`.** Inside a throwaway sandbox or local node: - ```bash - npx -y skills add https://github.com/obra/superpowers --skill using-superpowers --yes --target /tmp/skills-target-test - test -d /tmp/skills-target-test/using-superpowers - ``` - Expected: directory exists. If the flag is unsupported, switch the install in `agents/shared.ts` to `--global` for `~/.claude/skills` and rewrite Codex's symlink as `~/.agents/skills → ~/.claude/skills` instead. - -3. **`codex --output-schema` validation behaviour.** Inside a Codex sandbox: - ```bash - echo "say hi" | codex exec --json --output-schema /tmp/strict-schema.json -o /tmp/r.json - - ``` - With a deliberately strict schema. Confirm: validation failure does not crash the run — Codex emits `error` events and `r.json` is missing/empty. 
Adapter parsers already fall back to NDJSON `item.completed`; if Codex actually crashes, parsers must also handle the `error` event. Add a test for that path if observed. - ---- - -## Net Change Summary - -- **New files:** `src/sandbox/agents/{types,claude,codex,shared,index,pricing}.ts` plus tests, `e2e/codex-tier-1.test.ts`. -- **Modified:** `src/sandbox/manager.ts`, `src/sandbox/manager.test.ts`, `src/sandbox/poll-agent.ts`, `src/sandbox/poll-agent.test.ts`, `src/sandbox/usage.ts`, `src/sandbox/usage.test.ts`, `src/workflows/agent.ts`, `env.ts`, `.env.example`, `README.md`, `e2e/vitest.e2e.config.ts`, `package.json`. -- **Deleted:** `src/sandbox/wrapper-script.ts`, `src/sandbox/wrapper-script.test.ts`, `src/sandbox/agent-runner.ts`, `src/sandbox/agent-runner.test.ts`. -- **Untouched:** every adapter under `src/adapters/`, every helper under `src/lib/`, every route under `src/routes/`, `src/workflows/prompts-step.ts`, all run-registry / reconcile / dispatch / Jira webhook / cron / attachments / Arthur client code. diff --git a/docs/superpowers/plans/2026-04-30-slack-threaded-messages.md b/docs/superpowers/plans/2026-04-30-slack-threaded-messages.md deleted file mode 100644 index be35954..0000000 --- a/docs/superpowers/plans/2026-04-30-slack-threaded-messages.md +++ /dev/null @@ -1,1405 +0,0 @@ -# Slack Threaded Messages Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Group all per-ticket Slack notifications under one lifetime thread per ticket, and replace inline message strings with structured `TicketEvent`s rendered to Slack-mrkdwn with clickable Jira/PR links. - -**Architecture:** Replace `MessagingAdapter.notify(message)` with `notifyForTicket(ticketKey, event)`. 
The adapter becomes "smart" — it formats events to wire text, looks up the ticket's parent message id from a new `ThreadStore`, posts as a thread reply when one exists (top-level otherwise), and records the parent on the first `started` event. `UpstashRunRegistry` implements `ThreadStore` via a new redis hash `blazebot:thread-parents:{ENV}`. Threading is by-message via `ThreadImpl` from the `chat` package: top-level posts use the existing `chat.channel(...).post(...)` path; thread replies construct a `ThreadImpl({ id: "slack:CHANNEL:PARENT_TS", ... })` and call `.post(...)` on it. - -**Tech Stack:** TypeScript / Node 24 / Vercel Workflow / Upstash Redis (`@upstash/redis`) / `chat` + `@chat-adapter/slack` / Vitest - -**Source spec:** `docs/superpowers/specs/2026-04-30-slack-threaded-messages-design.md`. - ---- - -## File Structure - -```text -src/ -├── adapters/ -│ ├── messaging/ -│ │ ├── types.ts # MODIFIED — replace notify() with notifyForTicket(); add TicketEvent + ThreadStore -│ │ ├── format.ts # NEW — formatTicketEvent(event, ticketKey, jiraBaseUrl): string -│ │ ├── format.test.ts # NEW — pure-function tests for each event variant -│ │ ├── chatsdk.ts # MODIFIED — drop notify(), add notifyForTicket() with threading -│ │ └── chatsdk.test.ts # REWRITTEN — covers threading, fallback, formatter integration -│ └── run-registry/ -│ ├── types.ts # MODIFIED — add ThreadStore interface (separate from RunRegistryAdapter) -│ ├── upstash.ts # MODIFIED — implement ThreadStore methods + new THREAD_HASH_KEY -│ └── upstash.test.ts # MODIFIED — add ThreadStore round-trip tests -├── lib/ -│ ├── adapters.ts # MODIFIED — pass jiraBaseUrl + threadStore (= runRegistry) into ChatSDKAdapter -│ └── step-adapters.ts # MODIFIED — same change as adapters.ts -├── routes/ -│ ├── cron/poll.get.ts # MODIFIED — replace notify() call at L23 with notifyForTicket(...) -│ └── webhooks/jira.post.ts # MODIFIED — replace notify() call at L110 with notifyForTicket(...) 
-└── workflows/ - └── agent.ts # MODIFIED — replace notifySlack(string) step with notifyTicket(key, event) -``` - -**File-responsibility split:** - -- `format.ts` owns event-to-string conversion. It's pure (no side-effects, no config beyond inputs) so it tests in isolation and the adapter's tests don't need to assert on full text. -- `chatsdk.ts` owns Slack I/O: thread store reads, post, missing-parent recovery, parent-set on first `started`. -- `ThreadStore` is split from `RunRegistryAdapter` even though the same class implements both — the messaging adapter shouldn't see (or be able to call) `markFailed` etc. when all it needs is parent-message lookup. - ---- - -## Phase 1 — ThreadStore on the run registry - -Goal: ship the redis-backed parent-message store, fully tested, before touching messaging. This task block is mergeable on its own — nothing reads from the new hash yet. - -### Task 1: Add `ThreadStore` interface - -**Files:** -- Modify: `src/adapters/run-registry/types.ts` - -- [ ] **Step 1: Append the ThreadStore interface to the existing types file** - -Append this to the bottom of `src/adapters/run-registry/types.ts` (do not modify the existing `RunRegistryAdapter`): - -```ts -/** - * Per-ticket Slack thread parent store. Implemented alongside RunRegistryAdapter - * by UpstashRunRegistry, but exposed as a separate interface so the messaging - * adapter only depends on the slice it needs. - * - * Lifetime: an entry survives across multiple workflow runs for the same - * ticket. unregister(ticketKey) does NOT clear it — see clearParent(). - */ -export interface ThreadStore { - /** Returns the Slack message id (timestamp) anchoring this ticket's thread, or null. */ - getParent(ticketKey: string): Promise<string | null>; - /** Records the message id as the parent for this ticket. Overwrites any prior value. */ - setParent(ticketKey: string, messageId: string): Promise<void>; - /** Removes the entry. Used after Slack reports the parent message no longer exists.
*/ - clearParent(ticketKey: string): Promise<void>; -} -``` - -- [ ] **Step 2: Verify type-checks pass** - -Run: `pnpm tsc --noEmit` -Expected: no new type errors. (No callers exist yet, so this is just a syntax/types sanity check.) - -- [ ] **Step 3: Commit** - -```bash -git add src/adapters/run-registry/types.ts -git commit -m "feat(run-registry): add ThreadStore interface" -``` - ---- - -### Task 2: Implement `ThreadStore` on `UpstashRunRegistry` - -**Files:** -- Modify: `src/adapters/run-registry/upstash.ts` - -- [ ] **Step 1: Add the new hash key constant** - -In `src/adapters/run-registry/upstash.ts`, add a new constant alongside the existing `*_HASH_KEY` lines (immediately after `ENTRY_TS_HASH_KEY`): - -```ts -const THREAD_HASH_KEY = `blazebot:thread-parents:${ENV_PREFIX}`; -``` - -- [ ] **Step 2: Add `ThreadStore` to the class's implements clause and import** - -Update the class declaration: - -```ts -import type { RunRegistryAdapter, FailedTicketMeta, ThreadStore } from "./types.js"; -``` - -```ts -export class UpstashRunRegistry implements RunRegistryAdapter, ThreadStore { -``` - -- [ ] **Step 3: Implement the three ThreadStore methods** - -Append these methods to `UpstashRunRegistry` (after `clearFailedMark`, before the closing `}`): - -```ts - async getParent(ticketKey: string): Promise<string | null> { - return this.redis.hget(THREAD_HASH_KEY, ticketKey); - } - - async setParent(ticketKey: string, messageId: string): Promise<void> { - await this.redis.hset(THREAD_HASH_KEY, { [ticketKey]: messageId }); - // Defend against any external TTL — the thread mapping must outlive runs. - await this.redis.persist(THREAD_HASH_KEY); - } - - async clearParent(ticketKey: string): Promise<void> { - await this.redis.hdel(THREAD_HASH_KEY, ticketKey); - } -``` - -- [ ] **Step 4: Verify `unregister` does NOT touch the thread hash** - -Read `unregister` in the same file. It should still only delete from `HASH_KEY`, `SANDBOX_HASH_KEY`, and `ENTRY_TS_HASH_KEY`. Do not modify it.
- -- [ ] **Step 5: Type-check** - -Run: `pnpm tsc --noEmit` -Expected: no errors. - -- [ ] **Step 6: Commit** - -```bash -git add src/adapters/run-registry/upstash.ts -git commit -m "feat(run-registry): implement ThreadStore on UpstashRunRegistry" -``` - ---- - -### Task 3: Tests for `ThreadStore` on `UpstashRunRegistry` - -**Files:** -- Modify: `src/adapters/run-registry/upstash.test.ts` - -- [ ] **Step 1: Add the THREAD_HASH_KEY constant near the existing FAILED_HASH_KEY one** - -In `src/adapters/run-registry/upstash.test.ts`, near the bottom describe blocks (before `describe("markFailed", ...)` is fine), or near the top alongside `HASH_KEY`: - -```ts -const THREAD_HASH_KEY = `blazebot:thread-parents:${process.env.VERCEL_ENV ?? "development"}`; -``` - -- [ ] **Step 2: Add a `describe("ThreadStore methods", ...)` block at the bottom of the file (inside the outer describe, before its closing `})`)** - -```ts - describe("ThreadStore methods", () => { - it("setParent then getParent round-trips the message id", async () => { - // Phase 1: setParent writes - const registry = createRegistry(); - await registry.setParent("AWT-42", "1700000000.000123"); - expect(mockRedis.hset).toHaveBeenCalledWith(THREAD_HASH_KEY, { - "AWT-42": "1700000000.000123", - }); - expect(mockRedis.persist).toHaveBeenCalledWith(THREAD_HASH_KEY); - - // Phase 2: getParent reads - mockRedis.hget.mockResolvedValueOnce("1700000000.000123"); - const result = await registry.getParent("AWT-42"); - expect(result).toBe("1700000000.000123"); - expect(mockRedis.hget).toHaveBeenCalledWith(THREAD_HASH_KEY, "AWT-42"); - }); - - it("getParent returns null when no entry exists", async () => { - mockRedis.hget.mockResolvedValueOnce(null); - const registry = createRegistry(); - const result = await registry.getParent("AWT-99"); - expect(result).toBeNull(); - }); - - it("clearParent deletes the entry from the thread hash", async () => { - const registry = createRegistry(); - await registry.clearParent("AWT-42"); - 
expect(mockRedis.hdel).toHaveBeenCalledWith(THREAD_HASH_KEY, "AWT-42"); - }); - - it("unregister does not touch the thread hash", async () => { - const registry = createRegistry(); - await registry.unregister("AWT-42"); - // unregister deletes from HASH_KEY, SANDBOX_HASH_KEY, ENTRY_TS_HASH_KEY only. - const hdelCalls = mockRedis.hdel.mock.calls.map((c) => c[0]); - expect(hdelCalls).not.toContain(THREAD_HASH_KEY); - }); - }); -``` - -- [ ] **Step 3: Run the new tests in isolation to verify they pass** - -Run: `pnpm vitest run src/adapters/run-registry/upstash.test.ts` -Expected: PASS — all existing tests + the four new ones. - -- [ ] **Step 4: Commit** - -```bash -git add src/adapters/run-registry/upstash.test.ts -git commit -m "test(run-registry): cover ThreadStore methods on UpstashRunRegistry" -``` - ---- - -## Phase 2 — Event types and formatter - -Goal: a pure formatter that converts a `TicketEvent` to a Slack-mrkdwn string with the Jira link (and PR link for `pr_ready`). No I/O, no adapter — testable in isolation. - -### Task 4: Add `TicketEvent` type and update `MessagingAdapter` interface - -**Files:** -- Modify: `src/adapters/messaging/types.ts` - -- [ ] **Step 1: Replace the file contents** - -Replace the entire contents of `src/adapters/messaging/types.ts` with: - -```ts -export type TicketEvent = - | { kind: "started" } - | { kind: "needs_clarification"; usageReport?: string } - | { - kind: "pr_ready"; - pr: { url: string; number: number }; - usageReport: string; - } - | { - kind: "failed"; - phase?: "research" | "impl" | "push"; - reason?: string; - usageReport?: string; - } - | { kind: "canceled"; reason: string }; - -export interface MessagingAdapter { - /** - * Send a ticket-scoped notification to the configured channel. - * - * The first `started` event for a ticket posts top-level and records its - * Slack message id as the lifetime parent. Subsequent events post as - * thread replies under that parent. 
If the parent has been deleted, the - * adapter clears the mapping and retries top-level (without re-anchoring - * unless the new event is `started`). - * - * Never throws — failures are logged and swallowed so workflow runs are - * never broken by a notification error. - */ - notifyForTicket(ticketKey: string, event: TicketEvent): Promise<void>; -} -``` - -- [ ] **Step 2: Type-check** - -Run: `pnpm tsc --noEmit` -Expected: errors at every `notify(...)` and `messaging.notify(...)` call site (cron poll, jira webhook, agent workflow). These are addressed in later tasks. Confirm the messaging types file itself is clean (no errors *inside* the file). - -- [ ] **Step 3: Do NOT commit yet** - -The interface change makes the build red. Commit happens at the end of Phase 4 once all call sites are converted, so we don't ship a half-broken main. - ---- - -### Task 5: Implement and test the event formatter - -**Files:** -- Create: `src/adapters/messaging/format.ts` -- Create: `src/adapters/messaging/format.test.ts` - -- [ ] **Step 1: Write the failing test first** - -Create `src/adapters/messaging/format.test.ts`: - -```ts -import { describe, it, expect } from "vitest"; -import { formatTicketEvent } from "./format.js"; - -const JIRA = "https://example.atlassian.net"; -const KEY = "AWT-42"; -const LINK = `<${JIRA}/browse/${KEY}|${KEY}>`; - -describe("formatTicketEvent", () => { - it("started — links the ticket key", () => { - expect(formatTicketEvent({ kind: "started" }, KEY, JIRA)).toBe( - `Task ${LINK} started`, - ); - }); - - it("needs_clarification — without usage report", () => { - expect( - formatTicketEvent({ kind: "needs_clarification" }, KEY, JIRA), - ).toBe(`Task ${LINK} needs clarification`); - }); - - it("needs_clarification — appends usage report on a new line", () => { - expect( - formatTicketEvent( - { kind: "needs_clarification", usageReport: "Phase A: $0.10" }, - KEY, - JIRA, - ), - ).toBe(`Task ${LINK} needs clarification\nPhase A: $0.10`); - }); - -
it("needs_clarification — empty usage report is treated as absent", () => { - expect( - formatTicketEvent( - { kind: "needs_clarification", usageReport: "" }, - KEY, - JIRA, - ), - ).toBe(`Task ${LINK} needs clarification`); - }); - - it("pr_ready — includes PR link inline and usage report", () => { - const text = formatTicketEvent( - { - kind: "pr_ready", - pr: { url: "https://github.com/o/r/pull/123", number: 123 }, - usageReport: "Total: $0.42", - }, - KEY, - JIRA, - ); - expect(text).toBe( - `Task ${LINK} PR ready for review — <https://github.com/o/r/pull/123|#123>\nTotal: $0.42`, - ); - }); - - it("failed with phase and reason", () => { - expect( - formatTicketEvent( - { kind: "failed", phase: "research", reason: "phase timed out" }, - KEY, - JIRA, - ), - ).toBe(`Task ${LINK} failed: research — phase timed out`); - }); - - it("failed with reason but no phase", () => { - expect( - formatTicketEvent( - { kind: "failed", reason: "boom" }, - KEY, - JIRA, - ), - ).toBe(`Task ${LINK} failed: boom`); - }); - - it("failed with neither phase nor reason", () => { - expect( - formatTicketEvent({ kind: "failed" }, KEY, JIRA), - ).toBe(`Task ${LINK} failed`); - }); - - it("failed — appends usage report when present", () => { - expect( - formatTicketEvent( - { kind: "failed", phase: "impl", reason: "x", usageReport: "u" }, - KEY, - JIRA, - ), - ).toBe(`Task ${LINK} failed: impl — x\nu`); - }); - - it("canceled — includes reason", () => { - expect( - formatTicketEvent( - { kind: "canceled", reason: "left AI column" }, - KEY, - JIRA, - ), - ).toBe(`Task ${LINK} canceled: left AI column`); - }); - - it("trims a trailing slash on jiraBaseUrl", () => { - expect( - formatTicketEvent({ kind: "started" }, KEY, `${JIRA}/`), - ).toBe(`Task ${LINK} started`); - }); -}); -``` - -- [ ] **Step 2: Run the test — confirm it fails (module not found)** - -Run: `pnpm vitest run src/adapters/messaging/format.test.ts` -Expected: FAIL with "Cannot find module './format'" (or equivalent resolution error).
- - [ ] **Step 3: Implement `format.ts`** - -Create `src/adapters/messaging/format.ts`: - -```ts -import type { TicketEvent } from "./types.js"; - -/** - * Format a TicketEvent as Slack-mrkdwn text with embedded links. - * - * Output is intended for `chat.channel(...).post(text)` or `thread.post(text)`. - * Slack-native `<url|text>` syntax is used because remark/mdast escaping - * via PostableMarkdown can mangle the angle brackets. We pass it as a plain - * string; the chat package treats unmarked strings as PostableRaw on Slack. - */ -export function formatTicketEvent( - event: TicketEvent, - ticketKey: string, - jiraBaseUrl: string, -): string { - const link = jiraLink(ticketKey, jiraBaseUrl); - const head = `Task ${link}`; - - switch (event.kind) { - case "started": - return `${head} started`; - - case "needs_clarification": - return appendUsage(`${head} needs clarification`, event.usageReport); - - case "pr_ready": { - const prLink = `<${event.pr.url}|#${event.pr.number}>`; - return appendUsage( - `${head} PR ready for review — ${prLink}`, - event.usageReport, - ); - } - - case "failed": { - const body = formatFailedBody(event.phase, event.reason); - return appendUsage(`${head} failed${body}`, event.usageReport); - } - - case "canceled": - return `${head} canceled: ${event.reason}`; - } -} - -function jiraLink(ticketKey: string, jiraBaseUrl: string): string { - const base = jiraBaseUrl.replace(/\/$/, ""); - return `<${base}/browse/${ticketKey}|${ticketKey}>`; -} - -function formatFailedBody( - phase: "research" | "impl" | "push" | undefined, - reason: string | undefined, -): string { - if (phase && reason) return `: ${phase} — ${reason}`; - if (reason) return `: ${reason}`; - return ""; -} - -function appendUsage(base: string, usageReport: string | undefined): string { - if (!usageReport) return base; - return `${base}\n${usageReport}`; -} -``` - -- [ ] **Step 4: Run the test — confirm it passes** - -Run: `pnpm vitest run src/adapters/messaging/format.test.ts`
-Expected: PASS — all 11 cases. - -- [ ] **Step 5: Commit (formatter only — adapter and call-site changes ride together later)** - -```bash -git add src/adapters/messaging/format.ts src/adapters/messaging/format.test.ts -git commit -m "feat(messaging): add ticket event formatter with Jira/PR mrkdwn links" -``` - ---- - -## Phase 3 — Adapter rewrite - -Goal: replace `notify()` with `notifyForTicket()` in `ChatSDKAdapter`, including missing-parent recovery. Tests are rewritten to match. - -### Task 6: Rewrite `ChatSDKAdapter` around `notifyForTicket` - -**Files:** -- Modify: `src/adapters/messaging/chatsdk.ts` - -- [ ] **Step 1: Replace the file contents** - -Replace `src/adapters/messaging/chatsdk.ts` with: - -```ts -import { Chat, ThreadImpl } from "chat"; -import type { StateAdapter, Lock } from "chat"; -import { createSlackAdapter } from "@chat-adapter/slack"; -import { logger } from "../../lib/logger.js"; -import { formatTicketEvent } from "./format.js"; -import type { MessagingAdapter, TicketEvent } from "./types.js"; -import type { ThreadStore } from "../run-registry/types.js"; - -export interface ChatSDKConfig { - slackToken: string; - channelId: string; - botName: string; - jiraBaseUrl: string; - threadStore: ThreadStore; -} - -/** Minimal no-op StateAdapter for outbound-only notification use. */ -const noopState: StateAdapter = { - acquireLock: async () => null, - appendToList: async () => {}, - connect: async () => {}, - delete: async () => {}, - disconnect: async () => {}, - extendLock: async () => false, - forceReleaseLock: async () => {}, - get: async () => null, - getList: async () => [], - isSubscribed: async () => false, - releaseLock: async (_lock: Lock) => {}, - set: async () => {}, - setIfNotExists: async () => false, - subscribe: async () => {}, - unsubscribe: async () => {}, -}; - -/** - * Slack error codes that mean "the parent message is gone." - * When we see one of these on a thread reply, we clear the stored parent - * and retry top-level. 
List sourced from Slack's chat.postMessage docs; - * exact codes are confirmed during smoke-testing (deliberately delete a - * parent in a test channel). - */ -const MISSING_PARENT_ERROR_CODES = new Set([ - "thread_not_found", - "message_not_found", -]); - -export class ChatSDKAdapter implements MessagingAdapter { - private chat: InstanceType<typeof Chat>; - private slackAdapter: ReturnType<typeof createSlackAdapter>; - private channelId: string; - private jiraBaseUrl: string; - private threadStore: ThreadStore; - - constructor(config: ChatSDKConfig) { - this.channelId = config.channelId; - this.jiraBaseUrl = config.jiraBaseUrl; - this.threadStore = config.threadStore; - this.slackAdapter = createSlackAdapter({ botToken: config.slackToken }); - this.chat = new Chat({ - userName: config.botName, - state: noopState, - adapters: { slack: this.slackAdapter }, - }); - } - - async notifyForTicket( - ticketKey: string, - event: TicketEvent, - ): Promise<void> { - const text = formatTicketEvent(event, ticketKey, this.jiraBaseUrl); - let parent = await this.threadStore.getParent(ticketKey).catch(() => null); - - let sentId: string | null = null; - try { - sentId = parent - ?
await this.postReply(parent, text) - : await this.postTopLevel(text); - } catch (err) { - if (parent && isMissingParentError(err)) { - logger.debug( - { ticketKey, parent, eventKind: event.kind }, - "thread_parent_recovered", - ); - await this.threadStore.clearParent(ticketKey).catch(() => {}); - parent = null; - try { - sentId = await this.postTopLevel(text); - } catch (retryErr) { - this.logFailure(ticketKey, event.kind, retryErr); - return; - } - } else { - this.logFailure(ticketKey, event.kind, err); - return; - } - } - - if (event.kind === "started" && parent == null && sentId) { - await this.threadStore - .setParent(ticketKey, sentId) - .catch((err) => - logger.warn( - { ticketKey, error: (err as Error).message }, - "thread_parent_persist_failed", - ), - ); - } - - logger.info( - { - ticketKey, - eventKind: event.kind, - threadParentId: parent, - channelId: this.channelId, - }, - "notification_sent", - ); - } - - /** Top-level post to the configured channel. Returns the sent message id. */ - private async postTopLevel(text: string): Promise<string> { - const channel = this.chat.channel(`slack:${this.channelId}`); - const sent = await channel.post(text); - return sent.id; - } - - /** Thread reply under `parentTs`. Returns the sent message id.
*/ - private async postReply(parentTs: string, text: string): Promise<string> { - const thread = new ThreadImpl({ - id: `slack:${this.channelId}:${parentTs}`, - adapter: this.slackAdapter, - channelId: `slack:${this.channelId}`, - stateAdapter: noopState, - isDM: false, - }); - const sent = await thread.post(text); - return sent.id; - } - - private logFailure( - ticketKey: string, - eventKind: TicketEvent["kind"], - err: unknown, - ): void { - logger.warn( - { - ticketKey, - eventKind, - error: (err as Error).message, - slackErrorCode: extractSlackErrorCode(err), - }, - "notification_failed", - ); - } -} - -function isMissingParentError(err: unknown): boolean { - const code = extractSlackErrorCode(err); - return code != null && MISSING_PARENT_ERROR_CODES.has(code); -} - -/** - * Pull a Slack-style error code out of an unknown error. The chat package - * surfaces Slack errors as ChatError-derived objects with a `code` string; - * the underlying Web API error code may also live on `data.error` for raw - * errors. We check both locations defensively. - */ -function extractSlackErrorCode(err: unknown): string | null { - if (!err || typeof err !== "object") return null; - const e = err as { code?: unknown; data?: { error?: unknown } }; - if (typeof e.code === "string") return e.code; - if (e.data && typeof e.data.error === "string") return e.data.error; - return null; -} -``` - -- [ ] **Step 2: Type-check** - -Run: `pnpm tsc --noEmit` -Expected: errors only at the call sites in `src/lib/adapters.ts`, `src/lib/step-adapters.ts`, `src/workflows/agent.ts`, `src/routes/cron/poll.get.ts`, `src/routes/webhooks/jira.post.ts`. The adapter file itself is clean. - -- [ ] **Step 3: Do not commit yet** (call sites still broken — wired in Phase 4).
- --- - -### Task 7: Rewrite `chatsdk.test.ts` - -**Files:** -- Modify: `src/adapters/messaging/chatsdk.test.ts` - -- [ ] **Step 1: Replace the file contents** - -Replace `src/adapters/messaging/chatsdk.test.ts` with: - -```ts -import { describe, it, expect, vi, beforeEach } from "vitest"; -import { ChatSDKAdapter } from "./chatsdk.js"; -import type { ThreadStore } from "../run-registry/types.js"; - -const mockChannelPost = vi.fn(); -const mockThreadPost = vi.fn(); - -vi.mock("chat", () => { - return { - Chat: vi.fn(() => ({ - channel: vi.fn((id: string) => ({ - id, - post: mockChannelPost, - })), - })), - ThreadImpl: vi.fn(function (this: { id: string; post: typeof mockThreadPost }, cfg: { id: string }) { - this.id = cfg.id; - this.post = mockThreadPost; - }), - }; -}); - -vi.mock("@chat-adapter/slack", () => ({ - createSlackAdapter: vi.fn(() => ({ name: "slack" })), -})); - -function createThreadStore(): ThreadStore & { - getParent: ReturnType<typeof vi.fn>; - setParent: ReturnType<typeof vi.fn>; - clearParent: ReturnType<typeof vi.fn>; -} { - return { - getParent: vi.fn().mockResolvedValue(null), - setParent: vi.fn().mockResolvedValue(undefined), - clearParent: vi.fn().mockResolvedValue(undefined), - }; -} - -function createAdapter(threadStore: ThreadStore) { - return new ChatSDKAdapter({ - slackToken: "xoxb-test", - channelId: "C123", - botName: "blazebot", - jiraBaseUrl: "https://jira.example.com", - threadStore, - }); -} - -const JIRA_LINK = "<https://jira.example.com/browse/AWT-42|AWT-42>"; - -describe("ChatSDKAdapter.notifyForTicket", () => { - beforeEach(() => { - vi.clearAllMocks(); - mockChannelPost.mockResolvedValue({ id: "1700000000.000111" }); - mockThreadPost.mockResolvedValue({ id: "1700000000.000222" }); - }); - - it("started with no parent — posts top-level and records the parent", async () => { - const store = createThreadStore(); - const adapter = createAdapter(store); - - await adapter.notifyForTicket("AWT-42", { kind: "started" }); - - expect(mockChannelPost).toHaveBeenCalledWith(`Task ${JIRA_LINK} started`); -
expect(mockThreadPost).not.toHaveBeenCalled(); - expect(store.setParent).toHaveBeenCalledWith("AWT-42", "1700000000.000111"); - }); - - it("subsequent event with parent set — posts as thread reply, does not setParent", async () => { - const store = createThreadStore(); - store.getParent.mockResolvedValueOnce("1700000000.000111"); - const adapter = createAdapter(store); - - await adapter.notifyForTicket("AWT-42", { - kind: "needs_clarification", - usageReport: "u", - }); - - expect(mockChannelPost).not.toHaveBeenCalled(); - expect(mockThreadPost).toHaveBeenCalledWith( - `Task ${JIRA_LINK} needs clarification\nu`, - ); - expect(store.setParent).not.toHaveBeenCalled(); - }); - - it("non-started event with no parent — posts top-level, does NOT setParent (orphan stays orphan)", async () => { - const store = createThreadStore(); - const adapter = createAdapter(store); - - await adapter.notifyForTicket("AWT-42", { - kind: "canceled", - reason: "left AI column", - }); - - expect(mockChannelPost).toHaveBeenCalledWith( - `Task ${JIRA_LINK} canceled: left AI column`, - ); - expect(store.setParent).not.toHaveBeenCalled(); - }); - - it("parent deleted on Slack — clears mapping, retries top-level, re-records on started", async () => { - const store = createThreadStore(); - store.getParent.mockResolvedValueOnce("1700000000.000111"); - mockThreadPost.mockRejectedValueOnce( - Object.assign(new Error("thread gone"), { code: "thread_not_found" }), - ); - mockChannelPost.mockResolvedValueOnce({ id: "1700000000.000999" }); - const adapter = createAdapter(store); - - await adapter.notifyForTicket("AWT-42", { kind: "started" }); - - expect(mockThreadPost).toHaveBeenCalledTimes(1); - expect(store.clearParent).toHaveBeenCalledWith("AWT-42"); - expect(mockChannelPost).toHaveBeenCalledTimes(1); - // Because the event is `started`, the new top-level message becomes the parent. 
- expect(store.setParent).toHaveBeenCalledWith("AWT-42", "1700000000.000999"); - }); - - it("parent deleted on Slack for non-started event — clears + retries, does not re-anchor", async () => { - const store = createThreadStore(); - store.getParent.mockResolvedValueOnce("1700000000.000111"); - mockThreadPost.mockRejectedValueOnce( - Object.assign(new Error("not found"), { code: "message_not_found" }), - ); - const adapter = createAdapter(store); - - await adapter.notifyForTicket("AWT-42", { - kind: "failed", - phase: "impl", - reason: "boom", - }); - - expect(store.clearParent).toHaveBeenCalledWith("AWT-42"); - expect(mockChannelPost).toHaveBeenCalledWith( - `Task ${JIRA_LINK} failed: impl — boom`, - ); - expect(store.setParent).not.toHaveBeenCalled(); - }); - - it("non-missing-parent error — does not retry, does not throw", async () => { - const store = createThreadStore(); - store.getParent.mockResolvedValueOnce("1700000000.000111"); - mockThreadPost.mockRejectedValueOnce( - Object.assign(new Error("rate limited"), { code: "rate_limited" }), - ); - const adapter = createAdapter(store); - - await expect( - adapter.notifyForTicket("AWT-42", { kind: "started" }), - ).resolves.not.toThrow(); - expect(store.clearParent).not.toHaveBeenCalled(); - expect(mockChannelPost).not.toHaveBeenCalled(); - expect(store.setParent).not.toHaveBeenCalled(); - }); - - it("top-level post failure — swallows error, no parent recorded", async () => { - const store = createThreadStore(); - mockChannelPost.mockRejectedValueOnce(new Error("Slack API down")); - const adapter = createAdapter(store); - - await expect( - adapter.notifyForTicket("AWT-42", { kind: "started" }), - ).resolves.not.toThrow(); - expect(store.setParent).not.toHaveBeenCalled(); - }); - - it("pr_ready — formats with PR link inline", async () => { - const store = createThreadStore(); - store.getParent.mockResolvedValueOnce("1700000000.000111"); - const adapter = createAdapter(store); - - await 
adapter.notifyForTicket("AWT-42", { - kind: "pr_ready", - pr: { url: "https://github.com/o/r/pull/7", number: 7 }, - usageReport: "Total: $0.10", - }); - - expect(mockThreadPost).toHaveBeenCalledWith( - `Task ${JIRA_LINK} PR ready for review — <https://github.com/o/r/pull/7|#7>\nTotal: $0.10`, - ); - }); -}); -``` - -- [ ] **Step 2: Run only this test file — confirm it passes** - -Run: `pnpm vitest run src/adapters/messaging/chatsdk.test.ts` -Expected: PASS — 8 cases. - -- [ ] **Step 3: Commit the adapter + tests together** - -```bash -git add src/adapters/messaging/types.ts src/adapters/messaging/chatsdk.ts src/adapters/messaging/chatsdk.test.ts -git commit -m "feat(messaging): replace notify() with notifyForTicket() and lifetime threading" -``` - -(Note: this commit makes call-site files temporarily fail typecheck. They are fixed in Phase 4 within the same PR, so main is never red. If you are using subagent-driven development, the next subagent should pick up immediately.) - ---- - -## Phase 4 — Wire up call sites - -Goal: convert every `messaging.notify(...)` and `notifySlack(...)` call to `notifyForTicket(ticketKey, event)`, and wire the new constructor args.
- -### Task 8: Pass `jiraBaseUrl` and `threadStore` into `ChatSDKAdapter` - -**Files:** -- Modify: `src/lib/adapters.ts` -- Modify: `src/lib/step-adapters.ts` - -- [ ] **Step 1: Update `src/lib/adapters.ts` — instantiate the run registry first, then reuse it as the thread store** - -Replace the body of `createAdapters()` with: - -```ts -export function createAdapters(): Adapters { - const runRegistry = new UpstashRunRegistry({ - url: env.AI_WORKFLOW_KV_REST_API_URL, - token: env.AI_WORKFLOW_KV_REST_API_TOKEN, - }); - return { - issueTracker: new JiraAdapter({ - baseUrl: env.JIRA_BASE_URL, - email: env.JIRA_EMAIL, - apiToken: env.JIRA_API_TOKEN, - projectKey: env.JIRA_PROJECT_KEY, - }), - vcs: createVCS(), - messaging: new ChatSDKAdapter({ - slackToken: env.CHAT_SDK_SLACK_TOKEN, - channelId: env.CHAT_SDK_CHANNEL_ID, - botName: env.CHAT_SDK_BOT_NAME, - jiraBaseUrl: env.JIRA_BASE_URL, - threadStore: runRegistry, - }), - runRegistry, - }; -} -``` - -- [ ] **Step 2: Apply the same change to `src/lib/step-adapters.ts`** - -Replace the body of `createStepAdapters()` with the analogous version (same pattern, same fields — only the function name differs): - -```ts -export function createStepAdapters(): StepAdapters { - const runRegistry = new UpstashRunRegistry({ - url: env.AI_WORKFLOW_KV_REST_API_URL, - token: env.AI_WORKFLOW_KV_REST_API_TOKEN, - }); - return { - issueTracker: new JiraAdapter({ - baseUrl: env.JIRA_BASE_URL, - email: env.JIRA_EMAIL, - apiToken: env.JIRA_API_TOKEN, - projectKey: env.JIRA_PROJECT_KEY, - }), - vcs: createVCS(), - messaging: new ChatSDKAdapter({ - slackToken: env.CHAT_SDK_SLACK_TOKEN, - channelId: env.CHAT_SDK_CHANNEL_ID, - botName: env.CHAT_SDK_BOT_NAME, - jiraBaseUrl: env.JIRA_BASE_URL, - threadStore: runRegistry, - }), - runRegistry, - }; -} -``` - -- [ ] **Step 3: Type-check** - -Run: `pnpm tsc --noEmit` -Expected: errors only at the three remaining `notify(...)` and `notifySlack(...)` call sites — `agent.ts`, `cron/poll.get.ts`, 
`webhooks/jira.post.ts`. - ---- - -### Task 9: Convert `cron/poll.get.ts` and `webhooks/jira.post.ts` - -**Files:** -- Modify: `src/routes/cron/poll.get.ts` -- Modify: `src/routes/webhooks/jira.post.ts` - -- [ ] **Step 1: Update `src/routes/cron/poll.get.ts:23`** - -Find the existing call inside the reconcile callback: - -```ts - await adapters.messaging.notify( - `Task ${ticketKey} canceled: ${detail}.`, - ); -``` - -Replace with: - -```ts - await adapters.messaging.notifyForTicket(ticketKey, { - kind: "canceled", - reason: `${detail}.`, - }); -``` - -(Trailing period preserved to match today's text — the formatter does not add punctuation.) - -- [ ] **Step 2: Update `src/routes/webhooks/jira.post.ts:110`** - -Find the existing block: - -```ts - const cancelled = await cancelTrackedRun(ticketKey, adapters.runRegistry); - if (cancelled) { - await adapters.messaging.notify( - `Task ${ticketKey} canceled: webhook confirmed ticket is outside AI column.`, - ); - } -``` - -Replace the inner `notify` call (keep the surrounding `if (cancelled) { ... }`): - -```ts - const cancelled = await cancelTrackedRun(ticketKey, adapters.runRegistry); - if (cancelled) { - await adapters.messaging.notifyForTicket(ticketKey, { - kind: "canceled", - reason: "webhook confirmed ticket is outside AI column", - }); - } -``` - -- [ ] **Step 3: Type-check** - -Run: `pnpm tsc --noEmit` -Expected: errors only inside `src/workflows/agent.ts`. 
- ---- - -### Task 10: Convert `src/workflows/agent.ts` — replace the `notifySlack` step - -**Files:** -- Modify: `src/workflows/agent.ts` - -- [ ] **Step 1: Rename and retype the step function** - -Find the existing step (around line 338): - -```ts -async function notifySlack(message: string) { - "use step"; - const { createStepAdapters } = await import("../lib/step-adapters.js"); - const { messaging } = createStepAdapters(); - await messaging.notify(message); -} -``` - -Replace with: - -```ts -async function notifyTicket(ticketKey: string, event: TicketEvent) { - "use step"; - const { createStepAdapters } = await import("../lib/step-adapters.js"); - const { messaging } = createStepAdapters(); - await messaging.notifyForTicket(ticketKey, event); -} -``` - -- [ ] **Step 2: Add the `TicketEvent` type import at the top of the file** - -Locate the existing `import type { ... }` block from `../adapters/...` (the `PRComment, CheckRunResult` line). Add a sibling import: - -```ts -import type { TicketEvent } from "../adapters/messaging/types.js"; -``` - -- [ ] **Step 3: Replace the `started` notification at line 441** - -```ts - await notifySlack(`Task ${ticket.identifier} started`); -``` - -becomes: - -```ts - await notifyTicket(ticket.identifier, { kind: "started" }); -``` - -- [ ] **Step 4: Replace research-timeout notification (line ~518)** - -```ts - await notifySlack(`Task ${ticket.identifier} failed: research phase timed out${usageSuffix()}`); -``` - -becomes: - -```ts - await notifyTicket(ticket.identifier, { - kind: "failed", - phase: "research", - reason: "phase timed out", - usageReport: usageReportOrUndefined(), - }); -``` - -(Helper added in Step 11 below.) 
- -- [ ] **Step 5: Replace research-clarification (line ~536)** - -```ts - await notifySlack(`Task ${ticket.identifier} needs clarification${usageSuffix()}`); -``` - -becomes: - -```ts - await notifyTicket(ticket.identifier, { - kind: "needs_clarification", - usageReport: usageReportOrUndefined(), - }); -``` - -- [ ] **Step 6: Replace research-failure (line ~543)** - -```ts - await notifySlack(`Task ${ticket.identifier} failed: research — ${research.body.slice(0, 200)}${usageSuffix()}`); -``` - -becomes: - -```ts - await notifyTicket(ticket.identifier, { - kind: "failed", - phase: "research", - reason: research.body.slice(0, 200), - usageReport: usageReportOrUndefined(), - }); -``` - -- [ ] **Step 7: Replace impl-clarification (line ~587)** - -```ts - await notifySlack(`Task ${ticket.identifier} needs clarification${usageSuffix()}`); -``` - -becomes: - -```ts - await notifyTicket(ticket.identifier, { - kind: "needs_clarification", - usageReport: usageReportOrUndefined(), - }); -``` - -- [ ] **Step 8: Replace impl-failure (line ~594)** - -```ts - await notifySlack(`Task ${ticket.identifier} failed: implementation — ${implOutput.error ?? "unknown"}${usageSuffix()}`); -``` - -becomes: - -```ts - await notifyTicket(ticket.identifier, { - kind: "failed", - phase: "impl", - reason: implOutput.error ?? "unknown", - usageReport: usageReportOrUndefined(), - }); -``` - -- [ ] **Step 9: Replace push-failure (line ~653)** - -```ts - await notifySlack(`Task ${ticket.identifier} failed: push failed — ${pushResult.error ?? "unknown"}${usageSuffix()}`); -``` - -becomes: - -```ts - await notifyTicket(ticket.identifier, { - kind: "failed", - phase: "push", - reason: pushResult.error ?? 
"unknown", - usageReport: usageReportOrUndefined(), - }); -``` - -- [ ] **Step 10: Replace PR-ready notification (line ~659–665) and capture the PR result** - -Today the block reads: - -```ts - if (!prContext) { - await createPullRequest(branchName, ticket.title, ""); - } - // Notify Slack BEFORE moving the ticket out of the AI column. - // Reconcile cancels runs whose tickets have left AI column; racing - // that cancellation after moveTicket would skip the notification. - const usageReport = formatUsageReport(phaseUsages, priceLookup, activeModel); - await notifySlack(`Task ${ticket.identifier} PR ready for review\n${usageReport}`); - await moveTicket(ticketId, env.COLUMN_AI_REVIEW); - await unregisterRun(ticket.identifier); -``` - -Replace with (capturing the new-PR return; for the existing-PR branch, reuse `prContext.findPR`-equivalent data already present): - -```ts - // We need a {url, number} regardless of whether the PR is new or pre-existing. - // - New PR: createPullRequest returns the PullRequest ({ id, url, branch }) — capture it. - // - Existing PR: prContext was built from vcs.findPR(branch), but findPR's return - // is dropped today. Re-fetch via the VCS adapter step (cheap; same call already - // ran on this branch earlier in the workflow). - const pr = !prContext - ? 
await createPullRequest(branchName, ticket.title, "") - : await findPRForBranch(branchName); - - const usageReport = formatUsageReport(phaseUsages, priceLookup, activeModel); - await notifyTicket(ticket.identifier, { - kind: "pr_ready", - pr: { url: pr.url, number: pr.id }, - usageReport, - }); - await moveTicket(ticketId, env.COLUMN_AI_REVIEW); - await unregisterRun(ticket.identifier); -``` - -Add the supporting step function alongside the other VCS step wrappers (e.g., right after `createPullRequest`): - -```ts -async function findPRForBranch(branchName: string) { - "use step"; - const { createStepAdapters } = await import("../lib/step-adapters.js"); - const { vcs } = createStepAdapters(); - const pr = await vcs.findPR(branchName); - if (!pr) { - // The pr_ready branch only runs after a successful push to an existing - // PR's branch — the PR cannot have vanished. Treat as an invariant. - throw new Error(`Expected PR for branch ${branchName} but findPR returned null`); - } - return pr; -} -``` - -- [ ] **Step 11: Replace catch-all (line ~674)** - -Today: - -```ts - await notifySlack(`Task ${ticket.identifier} failed: ${(err as Error).message ?? "unknown"}${usageSuffix()}`).catch(() => {}); -``` - -Replace with: - -```ts - await notifyTicket(ticket.identifier, { - kind: "failed", - reason: (err as Error).message ?? "unknown", - usageReport: usageReportOrUndefined(), - }).catch(() => {}); -``` - -- [ ] **Step 12: Add the `usageReportOrUndefined` helper next to the existing `usageSuffix`** - -In `agentWorkflow`, find: - -```ts - const usageSuffix = () => - Object.keys(phaseUsages).length - ? `\n${formatUsageReport(phaseUsages, priceLookup, activeModel)}` - : ""; -``` - -Add immediately below it: - -```ts - // Variant of usageSuffix() that returns the bare report (no leading newline) - // or `undefined` so the messaging formatter can decide whether to render it. - const usageReportOrUndefined = (): string | undefined => - Object.keys(phaseUsages).length - ? 
formatUsageReport(phaseUsages, priceLookup, activeModel) - : undefined; -``` - -`usageSuffix` becomes unused after Step 11. Delete it (the closure's only callers are the lines we just rewrote). - -- [ ] **Step 13: Type-check** - -Run: `pnpm tsc --noEmit` -Expected: PASS, no errors. - -- [ ] **Step 14: Run the full test suite** - -Run: `pnpm vitest run` -Expected: PASS. The following test files are touched in this change and must all pass: -- `src/adapters/messaging/format.test.ts` -- `src/adapters/messaging/chatsdk.test.ts` -- `src/adapters/run-registry/upstash.test.ts` - -Existing tests for `agent.ts`, cron, and webhook handlers must continue to pass without modification — they don't assert on Slack call shapes. - -- [ ] **Step 15: Commit** - -```bash -git add src/lib/adapters.ts src/lib/step-adapters.ts src/routes/cron/poll.get.ts src/routes/webhooks/jira.post.ts src/workflows/agent.ts -git commit -m "feat(workflow): notify per-ticket events (start/clarify/pr/failed/cancel) into one Slack thread" -``` - ---- - -## Phase 5 — Verification - -Goal: confirm Slack rendering and threading behavior against a real channel before merging. Two pieces are not exercised by unit tests: - -1. Whether the Slack-mrkdwn `` syntax survives the chat package's render pipeline. -2. Whether `thread_not_found` / `message_not_found` are the actual error codes Slack returns for a deleted parent. - -### Task 11: Manual smoke verification against a real Slack channel - -**Files:** none (operational task). - -- [ ] **Step 1: Deploy the branch to a preview environment with a test Slack channel + Jira project configured** - -Use the standard preview deploy flow for this repo (deploy via `pnpm dlx vercel deploy` or the team's preview pipeline — the user runs this). - -- [ ] **Step 2: Trigger one ticket through to PR-ready** - -Move a Jira ticket into the AI column. Watch the Slack channel: - -Expected sequence in the channel: -1. 
Top-level message: `Task started` — the `AWT-XX` text must be a clickable hyperlink. -2. Thread reply on the same message: `Task PR ready for review — ` — both the Jira key and `#NN` are clickable. - -- [ ] **Step 3: Verify link rendering** - -Click the Jira link and the PR link. Both must navigate to the expected destinations. If either renders as raw `<…|…>` text instead of a hyperlink, the chat package is escaping the angle brackets — fall back to the alternate rendering option: - -Edit `formatTicketEvent` callers in `chatsdk.ts` to pass the formatted string as `PostableRaw`: - -```ts -// In ChatSDKAdapter, change: -const sent = await channel.post(text); -// to: -const sent = await channel.post({ raw: text }); -``` - -(Same change for the `thread.post` call.) Re-run the smoke test. If links now render correctly, commit the change as a follow-up: - -```bash -git add src/adapters/messaging/chatsdk.ts -git commit -m "fix(messaging): post raw mrkdwn so Slack hyperlinks render" -``` - -If they still don't render, document the failure in `.claude/learnings.md` and stop — the remediation is out of scope for this plan. - -- [ ] **Step 4: Verify deleted-parent recovery** - -In Slack, delete the original `started` message in the test channel. Trigger the same ticket through another full run (e.g., move it back into AI column). Observe: - -Expected: a fresh top-level `Task ... started` message appears (new parent), followed by threaded subsequent messages. - -Check the deployed logs for the line `thread_parent_recovered`. If the recovery path did not run (i.e., the second `started` posted as a reply under the deleted message), the Slack error code is not in `MISSING_PARENT_ERROR_CODES`. 
Capture the actual error code from the log line `notification_failed → slackErrorCode`, add it to the `MISSING_PARENT_ERROR_CODES` set in `chatsdk.ts`, and commit: - -```bash -git add src/adapters/messaging/chatsdk.ts -git commit -m "fix(messaging): include in missing-parent error code set" -``` - -- [ ] **Step 5: Update `.claude/learnings.md`** - -Append a learning entry capturing whichever path the smoke test confirmed: - -- Whether `` renders correctly via plain string post or required `PostableRaw`. -- The exact Slack error codes seen for deleted parents. - -This is a small file edit. The existing CLAUDE.md instruction (`Whenever you are corrected or discover something unexpected about this codebase, append the learning to .claude/learnings.md`) makes this mandatory regardless of outcome. - -- [ ] **Step 6: Final report-out** - -State to the user: - -- The events posted (which ones, in which order). -- Whether links rendered correctly (and which option won — plain string vs `PostableRaw`). -- The actual Slack error code(s) seen for deleted parents (if encountered). -- Any out-of-scope follow-ups discovered. - -The user reviews the smoke results and merges. - ---- - -## Self-Review Checklist (verified) - -- **Spec coverage:** every section of `2026-04-30-slack-threaded-messages-design.md` is mapped: - - Solution / Threading Policy → Task 6 (adapter logic) + Task 5 (formatter). - - Architecture / `ThreadStore` interface → Tasks 1–3. - - Event Types and Formatting → Tasks 4–5. - - Adapter Behavior pseudocode → Task 6. - - Call-Site Changes (agent / cron / webhook) → Tasks 9–10. - - Adapter wiring → Task 8. - - Redis Data Model (THREAD_HASH_KEY, no TTL, untouched by `unregister`) → Task 2 + Task 3 (the "unregister leaves it alone" assertion). - - Testing → Tasks 3, 5, 7. - - Migration / Rollout → covered implicitly (no DB seeding required; revert is a normal git revert). - - Observability (`thread_parent_recovered`, structured fields) → Task 6. 
- - Risks (mrkdwn rendering, error codes) → Task 11 verification. -- **Placeholders:** none. Every step contains the exact code or command needed. -- **Type consistency:** `TicketEvent`, `ThreadStore`, `notifyForTicket`, `formatTicketEvent`, `findPRForBranch`, `usageReportOrUndefined`, `MISSING_PARENT_ERROR_CODES`, `THREAD_HASH_KEY` are spelled identically across every task that mentions them. -- **No half-state on main:** Phase 3 and Phase 4 commits both make the build green individually (commit boundaries align with green typecheck + green tests). diff --git a/docs/superpowers/plans/2026-05-01-slack-slash-commands.md b/docs/superpowers/plans/2026-05-01-slack-slash-commands.md deleted file mode 100644 index d3d5183..0000000 --- a/docs/superpowers/plans/2026-05-01-slack-slash-commands.md +++ /dev/null @@ -1,141 +0,0 @@ -# Slack Slash Commands Implementation Plan - -**Goal:** Add inbound Slack slash commands (`/ai-workflow list | status | cancel `) so operators can inspect and control workflow runs from Slack. - -**Architecture:** One Nitro POST route at `/webhooks/slack` verifies Slack's HMAC signature, parses the slash command payload, ack's within 3s, and dispatches the subcommand to async handlers that read the existing `RunRegistryAdapter` and reuse `cancelRun()`. Results are posted back via Slack's `response_url`. - -**Tech Stack:** Nitro (h3), `@chat-adapter/slack` (already wired for outbound), Node `crypto` for HMAC, existing Upstash run registry, `workflow/api` for run cancel. - -**Out of scope (deferred):** interactive buttons, Events API / `app_mention`, multi-tenant routing, audit log persistence. 
- ---- - -## File Structure - -| Path | Responsibility | -|---|---| -| `src/routes/webhooks/slack.post.ts` | Route entry: verify signature, parse payload, ack, dispatch | -| `src/lib/slack/verify.ts` | HMAC-SHA256 signature verification + timestamp drift check | -| `src/lib/slack/commands.ts` | Subcommand parser + dispatcher (`list`, `status`, `cancel`) | -| `src/lib/slack/respond.ts` | Helper to POST formatted text to Slack `response_url` | -| `src/lib/slack/format.ts` | Format registry rows + status into Slack mrkdwn | -| `env.ts` | Add `SLACK_SIGNING_SECRET` (required); optional `SLACK_ALLOWED_USER_IDS` | -| `*.test.ts` siblings | Unit tests for verify, commands, format | - -`cancelRun` (`src/lib/cancel-run.ts`) and `RunRegistryAdapter.listAll/getRunId/getSandboxId` are reused as-is — no changes. - ---- - -## Phase 1 — Signature verification (foundational, pure function) - -**Task 1.1:** Add `SLACK_SIGNING_SECRET: z.string().min(1)` to `env.ts` server schema. Add optional `SLACK_ALLOWED_USER_IDS: z.string().optional()` (comma-separated). - -**Task 1.2:** Implement `src/lib/slack/verify.ts`: -- `verifySlackSignature({ rawBody, timestamp, signature, signingSecret })` → boolean -- Compute `v0=` + HMAC-SHA256 over `v0:${timestamp}:${rawBody}` with `signingSecret` -- Compare with `timingSafeEqual` -- Reject if `Math.abs(now - timestamp) > 300` (5 min replay window) - -**Task 1.3:** TDD: cover (a) valid sig passes, (b) tampered body fails, (c) old timestamp fails, (d) length-mismatched signature fails without throwing. 
- ---- - -## Phase 2 — Command parsing (pure) - -**Task 2.1:** Implement `src/lib/slack/commands.ts` with: -```ts -type ParsedCommand = - | { kind: "list" } - | { kind: "status"; ticketKey: string } - | { kind: "cancel"; ticketKey: string } - | { kind: "help" } - | { kind: "unknown"; raw: string }; - -export function parseCommand(text: string): ParsedCommand; -``` -- Trim, split on whitespace, lowercase verb -- Validate ticket key matches `/^[A-Z][A-Z0-9]+-\d+$/` (uppercase first), else return `unknown` -- Empty / `help` → help - -**Task 2.2:** TDD each branch including malformed keys (`abc`, `AWT`, `AWT-`, `awt-1` lowercased before validation). - ---- - -## Phase 3 — Subcommand handlers - -Each returns a `string` (Slack mrkdwn) — no Slack I/O inside, so they're trivial to test. - -**Task 3.1:** `handleList(runRegistry)`: -- `runRegistry.listAll()` → filter out claiming sentinels (use existing `isClaimingSentinel`) -- Format each row: `• — runId: \`xxx\`` (link via `JIRA_BASE_URL`) -- Empty list → "No active workflows." - -**Task 3.2:** `handleStatus(runRegistry, ticketKey)`: -- Look up `getRunId` + `getSandboxId` -- Return `Not tracked.` / `TICKET → runId, sandbox: yes/no` -- Out of scope: live workflow status from `workflow/api` (add only if registry-only is insufficient in practice) - -**Task 3.3:** `handleCancel(runRegistry, ticketKey)`: -- `getRunId(ticketKey)` — if null, return `No active run for TICKET.` -- If runId is a claiming sentinel, return `TICKET is mid-dispatch; try again in a moment.` -- Otherwise call `cancelRun(ticketKey, runId, runRegistry)` and return result message. - -**Task 3.4:** TDD with stubbed `RunRegistryAdapter` — assert exact return strings. Don't mock `workflow/api`; instead inject a fake `cancelRun` via parameter so the handler stays a pure function over its dependencies. - ---- - -## Phase 4 — Route wiring (the only place with side effects) - -**Task 4.1:** `src/routes/webhooks/slack.post.ts`: -1. 
`readRawBody` (mirrors Jira webhook). -2. Read headers `x-slack-request-timestamp`, `x-slack-signature`. Verify via Phase 1; on failure `throw createError({ statusCode: 401 })`. -3. Parse `application/x-www-form-urlencoded` → `{ command, text, response_url, user_id, channel_id }`. -4. Optional allowlist: if `SLACK_ALLOWED_USER_IDS` set and `user_id` not in it, reply 200 with ephemeral "Not authorized." -5. `parseCommand(text)`. For `unknown`/`help`, respond synchronously with usage. -6. For real commands: respond **immediately** with `{ response_type: "ephemeral", text: "Working on \`${command} ${text}\`…" }` (Slack's 3s budget). -7. Schedule the handler in the background: - ```ts - event.waitUntil(runHandler(parsed, response_url, adapters)); - ``` - `runHandler` calls the appropriate `handle*`, then POSTs `{ response_type: "in_channel", text: result }` to `response_url`. -8. `createAdapters()` is reused — same shape as Jira webhook. - -**Task 4.2:** `src/lib/slack/respond.ts`: -- `postToResponseUrl(url, payload)` — `fetch(url, { method: "POST", body: JSON.stringify(payload), headers: { "content-type": "application/json" } })` -- Log + swallow on failure (matches existing messaging adapter philosophy: notifications never break flows). - -**Task 4.3:** Integration test (vitest) that boots the route via Nitro test util or by calling the handler directly with a hand-crafted h3 event: -- Valid signature + `list` → 200, ack body shape correct, `response_url` POSTed with formatted list. -- Invalid signature → 401. -- Disallowed user → 200 + "Not authorized." -- `cancel AWT-42` with no entry → "No active run." -- `cancel AWT-42` with entry → `cancelRun` invoked once with the right args. - ---- - -## Phase 5 — Slack app config + docs (no code, but blocks shipping) - -**Task 5.1:** In api.slack.com app settings: -- Slash Commands → add `/ai-workflow` → request URL `https:///webhooks/slack`. 
-- Reinstall app (`commands` scope is already granted on most chat-adapter installs; verify). -- Copy the Signing Secret into Vercel env (`SLACK_SIGNING_SECRET`) for Production + Preview. - -**Task 5.2:** Update `init-slack` skill (`.claude/skills/init-slack`) to also prompt for `SLACK_SIGNING_SECRET` and mention the slash-command URL. Add a one-paragraph operator note in `README.md` under the existing Slack section. - ---- - -## Verification checklist - -- [ ] `pnpm test` passes (new unit + integration tests) -- [ ] Locally: `vercel dev` + `ngrok` → run `/ai-workflow list` from Slack, see ack <3s and final list message -- [ ] Bad-signature curl returns 401 -- [ ] `/ai-workflow cancel AWT-` cancels the run and posts confirmation -- [ ] Workflow-side: registry entry gone, sandbox stopped, Jira thread shows the existing cancel notification (already handled by `cancelRun`'s downstream) - ---- - -## Risks / open questions - -1. **`event.waitUntil` on Nitro/Vercel preset** — confirm h3 exposes it (or use Nitro's `event.context.waitUntil` if applicable). Fallback: do the work synchronously and rely on Slack's 3s being usually achievable for a single Redis read; cancel uses two extra ops which is borderline. Safer to confirm waitUntil first. -2. **Multi-channel installs** — current outbound adapter is single-channel via `CHAT_SDK_CHANNEL_ID`. Slash commands can come from any channel the bot is in; `response_url` makes that fine for replies, but if you want to *restrict* commands to one channel, add a channel allowlist alongside the user one. -3. **Concurrency** — `cancel` racing the dispatch claim is already handled by `dispatch.ts`'s post-claim verification, so no new logic needed. 
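Phase 1 of the slash-commands plan removed above specifies the request-signature check precisely enough to sketch. The sketch below assumes Node's built-in `node:crypto` and follows the plan's described shape: a `v0=` prefix, HMAC-SHA256 over `v0:${timestamp}:${rawBody}`, a `timingSafeEqual` comparison, and a 300-second replay window. The `nowSeconds` field is a hypothetical addition for testability and is not part of the plan's signature. Treat this as an illustration under those assumptions, not the removed `src/lib/slack/verify.ts`.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

export interface VerifySlackSignatureInput {
  rawBody: string; // exact request body bytes as received, before form parsing
  timestamp: string; // x-slack-request-timestamp header (Unix seconds)
  signature: string; // x-slack-signature header, e.g. "v0=ab12..."
  signingSecret: string;
  // Hypothetical: injectable clock (Unix seconds) so tests can pin "now".
  nowSeconds?: number;
}

const MAX_DRIFT_SECONDS = 300; // 5-minute replay window from the plan

export function verifySlackSignature(input: VerifySlackSignatureInput): boolean {
  const { rawBody, timestamp, signature, signingSecret } = input;
  const now = input.nowSeconds ?? Math.floor(Date.now() / 1000);

  // Reject non-numeric timestamps and anything outside the replay window.
  const ts = Number(timestamp);
  if (!Number.isInteger(ts) || Math.abs(now - ts) > MAX_DRIFT_SECONDS) {
    return false;
  }

  // Recompute the expected signature over the versioned base string.
  const expected =
    "v0=" +
    createHmac("sha256", signingSecret)
      .update(`v0:${timestamp}:${rawBody}`)
      .digest("hex");

  const expectedBuf = Buffer.from(expected, "utf8");
  const actualBuf = Buffer.from(signature, "utf8");

  // timingSafeEqual throws on unequal lengths, so a length mismatch is
  // rejected up front instead of surfacing as an exception.
  if (expectedBuf.length !== actualBuf.length) {
    return false;
  }
  return timingSafeEqual(expectedBuf, actualBuf);
}
```

The length check before `timingSafeEqual` covers case (d) of the plan's Task 1.3. A malformed or truncated header yields `false` rather than a thrown `RangeError`, so the route can map every failure to a single 401.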
diff --git a/docs/superpowers/specs/2026-03-23-agent-session-memory-design.md b/docs/superpowers/specs/2026-03-23-agent-session-memory-design.md deleted file mode 100644 index 5283d1f..0000000 --- a/docs/superpowers/specs/2026-03-23-agent-session-memory-design.md +++ /dev/null @@ -1,105 +0,0 @@ -# Agent Session Memory - -## Problem - -When blazebot's agent hits a blocker (needs clarification, incomplete data), it returns `clarification_needed` and the ticket moves to backlog. When the ticket returns to the AI column, a fresh agent session starts with zero context about what was already done — wasting tokens re-analyzing the codebase and potentially making different decisions. - -## Solution - -Prompt changes to `implement.md` and `review-fix.md`, plus a one-line code change to `context.ts` to include the ticket identifier in `requirements.md`. - -The agent writes a structured memory file to `blazebot/memory/[TASK_ID].md` on the feature branch. On subsequent runs, it checks for this file first and restores context before doing any work. The memory file is **always updated before completing** — regardless of outcome — so it serves as a cumulative audit trail and context source for future sessions (including review-fix). - -## Code Change - -**`src/sandbox/context.ts`** — Add `ticket.identifier` to both `assembleImplementationContext` and `assembleFixingFeedbackContext` so the agent knows the Jira key (e.g., `AIW-123`) for naming the memory file. - -The `ImplementationContextInput` and `FixingFeedbackContextInput` interfaces gain an `identifier: string` field. The rendered requirements.md gets a `## Ticket ID` section. - -Callers in `implementation.ts` and `review-fix.ts` pass `ticket.identifier` through. - -## Memory File - -**Location:** `blazebot/memory/[TASK_ID].md` (e.g., `blazebot/memory/AIW-123.md`) - -The agent derives `TASK_ID` from the ticket identifier in requirements.md. 
- -**Format:** - -```markdown -# Session Memory — [TASK_ID] - -## Progress -- What was analyzed, understood, attempted so far - -## Decisions Made -- Technical choices and reasoning - -## Blockers -- What's blocking progress (if clarification_needed) -- Specific questions that need answers - -## Files Touched -- List of files created/modified with brief notes -``` - -## Agent Flow - -``` -Agent starts - → Check if blazebot/memory/[TASK_ID].md exists - → If yes: read it, restore context, skip re-analysis - → If no: proceed normally - → Work on task - → ALWAYS update memory file before returning (any outcome) - → Return result (implemented / clarification_needed / failed) -``` - -The memory file is updated on **every** outcome — not just blockers. This ensures: -- Review-fix agents always have implementation context available -- Multiple sessions build a cumulative record of progress and decisions -- The file serves as an audit trail even for successful runs - -## Prompt Changes - -### implement.md - -1. **New step 0 (before all other steps):** Check for `blazebot/memory/[TASK_ID].md`. If it exists, read it immediately to restore context from a previous session. Use the progress, decisions, and file list to avoid redundant analysis. - -2. **New "Session Memory" section:** Before returning your result (any outcome — `implemented`, `clarification_needed`, or `failed`), write/update your session progress to `blazebot/memory/[TASK_ID].md` using the prescribed format. Include what you analyzed, decisions you made, current status, and which files you touched. Commit this file with your other changes. - -### review-fix.md - -1. **New step 0 (before all other steps):** Same as above — check for and read `blazebot/memory/[TASK_ID].md` to restore context from the implementation phase or prior fix attempts. - -2. **New "Session Memory" section:** Before returning your result, update the memory file with your review-fix progress. 
The review-fix agent reads memory but also writes to it — appending its own progress, decisions, and files touched. Note: review-fix does not support `clarification_needed` as a result, so memory is written on `implemented` or `failed` outcomes only. - -## Changes Summary - -| File | Change | -|------|--------| -| `src/sandbox/context.ts` | Add `identifier` field to context interfaces, render `## Ticket ID` section | -| `src/workflows/implementation.ts` | Pass `ticket.identifier` to context assembly | -| `src/workflows/review-fix.ts` | Pass `ticket.identifier` to context assembly | -| `.blazebot/prompts/implement.md` | Add step 0 (read memory) + Session Memory section | -| `.blazebot/prompts/review-fix.md` | Add step 0 (read memory) + Session Memory section | - -## Why This Works - -- **File persistence:** The sandbox clones the feature branch. Memory files committed on previous runs are already on disk. -- **File extraction:** `extractChanges` picks up all files changed in the last commit, including `blazebot/memory/*.md`. -- **Push:** `pushChanges` pushes all extracted files to the branch, including memory files. -- **End hook:** `runEndHook` force-commits any uncommitted changes before teardown, ensuring memory files are never lost even if the agent forgets to commit. - -## Memory File Lifecycle - -1. **First run:** Agent works normally. Before returning (any outcome), it creates `blazebot/memory/AIW-123.md` with session progress. -2. **Subsequent runs (memory exists):** Agent reads memory file as step 0, skips redundant analysis, picks up where it left off. Updates the memory file before returning. -3. **Review-fix runs:** Agent reads existing memory to understand implementation context and prior decisions. Updates memory with review-fix progress. -4. **Completion:** Memory file stays on the branch as an audit trail. - -## Edge Cases - -- **Multiple clarification rounds:** Memory file is updated/overwritten each time with cumulative progress. 
-- **Review-fix after implementation:** Review-fix agent reads implementation memory for context, then appends its own progress. -- **First run succeeds:** Memory file is still created — captures decisions and progress for potential review-fix sessions. diff --git a/docs/superpowers/specs/2026-03-26-e2e-tests-design.md b/docs/superpowers/specs/2026-03-26-e2e-tests-design.md deleted file mode 100644 index 93d227b..0000000 --- a/docs/superpowers/specs/2026-03-26-e2e-tests-design.md +++ /dev/null @@ -1,259 +0,0 @@ -# E2E Test Suite Design - -## Overview - -End-to-end test suite for the ai-workflow service that validates the full ticket-to-PR pipeline against real external services: Jira, GitHub, Upstash Redis, Vercel Sandbox, and Claude Code agent. - -Tests are split into two tiers: -- **Tier 1 (fast, ~5 min):** Integration plumbing — webhooks, polling, capacity, deduplication. No agent. -- **Tier 2 (slow, ~1-2 hours):** Full agent flows — implementation, clarification, review-fix. Real Claude agent in real Vercel Sandbox. - -## Decisions - -- **Full external integration** — all tests hit real Jira, GitHub, Upstash Redis, Vercel Sandbox. -- **Real Claude agent** — Tier 2 runs the actual agent, not mocks. -- **Structural assertions only** — verify side effects (PR created, ticket moved, comments posted, Redis cleaned) without inspecting agent-generated code. -- **Self-contained tests** — each test creates its own Jira tickets and cleans up after itself. -- **No Slack assertions** — Slack notifications are a side effect, not a core flow. Can be added later. -- **No agent-failure test** — unreliable to force with a real agent. Already covered by unit tests. -- **Trigger:** manual CLI (`npm run test:e2e`) and CI via GitHub Actions `workflow_dispatch`. -- **CI secrets:** GitHub `e2e` environment with optional approval gate. 
- -## File Structure - -``` -e2e/ -├── vitest.e2e.config.ts # Separate vitest config (long timeouts) -├── helpers/ -│ ├── jira.ts # Create/delete test tickets, move columns, read comments -│ ├── github.ts # Check PRs/branches, cleanup -│ ├── redis.ts # Direct Upstash KV reads for assertion + cleanup -│ ├── webhook.ts # Craft and send signed Jira webhooks -│ └── wait.ts # Polling utilities (waitForPR, waitForTicketStatus, etc.) -├── tier1/ # Fast tests -│ ├── webhook-signature.test.ts -│ ├── webhook-dispatch.test.ts -│ ├── webhook-cancel.test.ts -│ ├── webhook-ignore.test.ts -│ ├── cron-poll.test.ts -│ ├── cron-reconciliation.test.ts -│ └── duplicate-dispatch.test.ts -└── tier2/ # Slow tests (sequential) - ├── implementation-happy.test.ts - ├── clarification-flow.test.ts - └── review-fix-flow.test.ts -``` - -## Environment & Configuration - -Tests run against a deployed instance. The target URL and service credentials are provided via `.env.e2e` (local) or GitHub environment secrets (CI). - -### E2E env vars - -``` -# Target server -E2E_BASE_URL=https://your-staging.vercel.app - -# Jira (for creating test tickets + crafting signed webhooks) -JIRA_BASE_URL= -JIRA_EMAIL= -JIRA_API_TOKEN= -JIRA_PROJECT_KEY= -JIRA_WEBHOOK_SECRET= -COLUMN_AI=AI -COLUMN_AI_REVIEW=AI Review -COLUMN_BACKLOG=Backlog - -# GitHub (for PR/branch assertions + cleanup) -GITHUB_TOKEN= -GITHUB_OWNER= -GITHUB_REPO= - -# Cron auth -CRON_SECRET= - -# Upstash Redis (for registry assertions + cleanup) -AI_WORKFLOW_KV_REST_API_URL= -AI_WORKFLOW_KV_REST_API_TOKEN= -``` - -`.env.e2e` is gitignored. `.env.e2e.example` is committed. 
- -### npm scripts - -```json -{ - "test:e2e": "vitest run --config e2e/vitest.e2e.config.ts", - "test:e2e:tier1": "vitest run --config e2e/vitest.e2e.config.ts e2e/tier1/", - "test:e2e:tier2": "vitest run --config e2e/vitest.e2e.config.ts e2e/tier2/" -} -``` - -## Vitest E2E Config - -Separate config at `e2e/vitest.e2e.config.ts`: -- Environment: `node` -- Globals: `true` -- Include: `e2e/**/*.test.ts` -- Sequence: serial (no parallelism between test files) -- Setup file loads `.env.e2e` -- Uses vitest [projects](https://vitest.dev/guide/workspace) to set different timeouts per tier: - - `e2e/tier1/**` → `testTimeout: 120_000` (2 min) - - `e2e/tier2/**` → `testTimeout: 2_100_000` (35 min) - -## Helper Utilities - -### `e2e/helpers/jira.ts` - -| Function | Purpose | -|----------|---------| -| `createTestTicket(overrides?)` | Creates ticket with title `[E2E] test-`, returns `{ ticketKey, ticketId }` | -| `moveTicketToColumn(ticketKey, column)` | Transitions ticket to target status column | -| `getTicketStatus(ticketKey)` | Reads current ticket status | -| `getTicketComments(ticketKey)` | Reads ticket comments (for clarification assertion) | -| `deleteTicket(ticketKey)` | Deletes ticket (cleanup) | - -### `e2e/helpers/github.ts` - -| Function | Purpose | -|----------|---------| -| `findPR(branchName)` | Returns PR data or null | -| `getPRCommits(prNumber)` | Returns commit list for assertion | -| `deleteBranch(branchName)` | Cleanup | -| `closePR(prNumber)` | Cleanup | - -### `e2e/helpers/redis.ts` - -| Function | Purpose | -|----------|---------| -| `getRunId(ticketKey)` | Check if a run is registered | -| `listAll()` | List all active entries | -| `cleanup(ticketKey)` | Force-remove entry (cleanup) | - -### `e2e/helpers/webhook.ts` - -| Function | Purpose | -|----------|---------| -| `sendJiraWebhook(payload, options?)` | Signs payload with HMAC-SHA256, POSTs to `E2E_BASE_URL/webhooks/jira`. 
Options: `{ invalidSignature?: boolean, omitSignature?: boolean }` | - -### `e2e/helpers/wait.ts` - -| Function | Purpose | -|----------|---------| -| `waitFor(fn, { timeout, interval })` | Generic poller: calls `fn` every `interval` until truthy or timeout. Default interval: 5s | -| `waitForPR(branchName, timeout?)` | Polls GitHub until PR appears. Default timeout: 35 min | -| `waitForTicketStatus(ticketKey, status, timeout?)` | Polls Jira until target column. Default timeout: 35 min | -| `waitForRegistryClean(ticketKey, timeout?)` | Polls Redis until entry gone. Default timeout: 35 min | - -Tier 1 helper defaults: 30s timeout. Tier 2 helper defaults: 35 min timeout. - -## Test Cases - -### Tier 1 — Fast Tests - -#### `webhook-signature.test.ts` -- Valid signature → 200 OK -- Invalid signature → 401 -- Missing signature → 401 -- Empty body → 400 - -#### `webhook-dispatch.test.ts` -- Create ticket in AI column, send signed webhook -- Assert: response `{ action: "dispatch", dispatched: true }` -- Assert: Redis has entry for ticket -- Cleanup: delete ticket, clean Redis - -#### `webhook-cancel.test.ts` -- Create ticket, dispatch via webhook -- Move ticket away from AI column, send another webhook -- Assert: response `{ action: "cancel" }` -- Assert: Redis entry removed -- Cleanup: delete ticket - -#### `webhook-ignore.test.ts` -- Send webhook for non-status-change event -- Assert: response `{ action: "ignored" }` - -#### `cron-poll.test.ts` -- Create ticket in AI column, call `GET /cron/poll` with Bearer token -- Assert: response `discovered >= 1` -- Call without auth → 401 -- Cleanup: delete ticket, clean Redis - -#### `cron-reconciliation.test.ts` -- Manually insert stale Redis entry (ticketKey not in AI column) -- Call poll endpoint -- Assert: response `cleaned >= 1`, Redis entry removed - -#### `duplicate-dispatch.test.ts` -- Create ticket, dispatch via webhook -- Immediately send same webhook again -- Assert: second response `{ reason: "already_claimed" }` -- 
Cleanup: delete ticket, clean Redis - -### Tier 2 — Slow Tests - -#### `implementation-happy.test.ts` -- Create ticket: "Add a `GET /ping` endpoint that returns `{ ping: 'pong' }`" -- Move to AI column, send webhook -- `waitForPR("blazebot/", 35min)` — PR appears -- `waitForTicketStatus(ticketKey, "AI Review")` — ticket moved -- Assert: PR has commits, branch exists, Redis entry cleaned -- Cleanup: close PR, delete branch, delete ticket - -#### `clarification-flow.test.ts` -- Create ticket with vague description: "Do the thing" -- Move to AI column, send webhook -- `waitForTicketStatus(ticketKey, "Backlog", 35min)` — ticket moved back -- Assert: ticket has comment with numbered questions, Redis entry cleaned -- Cleanup: delete ticket - -#### `review-fix-flow.test.ts` -- Depends on implementation happy path completing first (needs existing PR) -- Add review comment on the PR: "Please rename the endpoint to `/healthcheck`" -- Move ticket back to AI column, send webhook -- `waitForTicketStatus(ticketKey, "AI Review", 35min)` — ticket moved after fix -- Assert: PR has new commits since the review comment, Redis entry cleaned -- Cleanup: close PR, delete branch, delete ticket - -**Sequencing:** `implementation-happy` and `review-fix-flow` share a ticket/PR lifecycle. They run in sequence within a single describe block or ordered test files. `clarification-flow` is independent. - -## Test Lifecycle & Cleanup - -Each test follows: -``` -beforeAll → create test resources (tickets, Redis entries, etc.) 
-test → trigger flow, wait, assert -afterAll → cleanup ALL created resources (always runs, even on failure) -``` - -- Every helper that creates a resource returns an ID pushed to a cleanup array -- `afterAll` iterates in reverse and deletes: tickets, branches, PRs, Redis entries -- Cleanup is best-effort — individual failures are logged but don't fail the test -- Ticket titles prefixed with `[E2E]` so leaked resources are identifiable for manual cleanup - -## CI — GitHub Actions - -### Workflow file: `.github/workflows/e2e.yml` - -```yaml -name: E2E Tests -on: - workflow_dispatch: - inputs: - tier: - description: "Which tier to run" - type: choice - options: - - tier1 - - tier2 - - all - default: all -``` - -- Uses `environment: e2e` to pull secrets from the dedicated GitHub environment -- Runs on `ubuntu-latest` -- Steps: checkout → install deps → build → run selected tier(s) -- When `tier: all`, Tier 1 runs first. Tier 2 runs only if Tier 1 passes. -- Job timeout: 15 min for Tier 1, 2.5 hours for Tier 2 -- On failure: upload vitest output as GitHub Actions artifact for debugging diff --git a/docs/superpowers/specs/2026-04-02-failed-ticket-safeguard-design.md b/docs/superpowers/specs/2026-04-02-failed-ticket-safeguard-design.md deleted file mode 100644 index 0ece110..0000000 --- a/docs/superpowers/specs/2026-04-02-failed-ticket-safeguard-design.md +++ /dev/null @@ -1,147 +0,0 @@ -# Failed Ticket Safeguard Design - -## Problem - -When a workflow fails and the catch block tries to move the ticket to backlog via Jira, that move can also fail (e.g., Jira outage, permission error, network timeout). When this happens: - -1. The ticket remains in the AI column in Jira -2. The Redis run entry is preserved (by design — `unregisterRun` is skipped when move fails) -3. The WDK run is marked as `failed` -4. Reconciliation detects the `failed` run and unregisters it from Redis -5. Next poll cycle rediscovers the ticket in the AI column and dispatches it again -6. 
The workflow fails again for the same reason — **infinite loop** - -## Solution - -Add a "failed ticket" marker in Redis. Before dispatching a ticket, check if it's marked as failed. If so, skip it. The marker is cleared when the ticket leaves the AI column (detected by the existing reconciliation loop), meaning a human just needs to move the ticket out and back in to retry. - -## Scope - -The failure marker is **only** written when `moveTicket` to backlog fails in the workflow catch block. If `moveTicket` succeeds, the ticket is safely in backlog and won't be rediscovered by the poll — no marker needed. - -## Redis Data Model - -**Hash key:** `blazebot:failed-tickets:{ENV_PREFIX}` - -Follows the same pattern as the existing `blazebot:active-runs:{ENV_PREFIX}` hash. - -**Field:** Ticket key (e.g., `AWT-42`) - -**Value:** JSON string with error context: - -```json -{ - "runId": "run_abc123", - "error": "Failed to move ticket to backlog: 403 Forbidden", - "failedAt": "2026-04-02T12:34:56.000Z" -} -``` - -No TTL on the hash — entries are explicitly removed during reconciliation. - -## Write Path — Marking Failures - -In `src/workflows/implementation.ts`, the existing catch block is modified: - -```typescript -catch (err) { - const moved = await moveTicket(ticketId, env.COLUMN_BACKLOG) - .then(() => true) - .catch(() => false); - if (moved) { - await unregisterRun(ticket.identifier).catch(() => {}); - } else { - await markTicketFailed(ticket.identifier, runId, err).catch(() => {}); - } - throw err; -} -``` - -`markTicketFailed` writes to the `blazebot:failed-tickets` hash. It is `.catch(() => {})`-guarded because if even this Redis write fails, we still want to re-throw the original error. Reconciliation will eventually handle the stale run. - -The same pattern is applied to `src/workflows/review-fix.ts` if it has an equivalent catch block. 
- -## Read Path — Skipping Failed Tickets - -In `src/lib/dispatch.ts`, before the atomic `claim` call: - -```typescript -const isFailed = await runRegistry.isTicketFailed(ticketKey); -if (isFailed) { - return { started: false, reason: "previously_failed" }; -} -``` - -This is a single `hget` call before `claim`, avoiding wasted claim attempts on tickets known to be stuck. The `"previously_failed"` reason surfaces in poll response logs. - -## Clear Path — Reconciliation Cleanup - -In `src/lib/reconcile.ts`, after the existing reconciliation logic, iterate the `blazebot:failed-tickets` hash: - -```typescript -const failedTickets = await runRegistry.listAllFailed(); -for (const { ticketKey } of failedTickets) { - if (!aiColumnTicketKeys.has(ticketKey)) { - await runRegistry.clearFailedMark(ticketKey); - } -} -``` - -When a ticket leaves the AI column (moved by a human), the next reconciliation pass removes the failure marker. If the ticket is later moved back to AI, it gets a fresh dispatch attempt. - -## RunRegistryAdapter Interface Changes - -Three new methods added to the existing interface in `src/adapters/run-registry/types.ts` (or wherever the interface is defined): - -```typescript -interface RunRegistryAdapter { - // ... existing methods (claim, register, getRunId, unregister, listAll) ... 
- - markFailed(ticketKey: string, meta: { runId: string; error: string; failedAt: string }): Promise; - isTicketFailed(ticketKey: string): Promise; - listAllFailed(): Promise>; - clearFailedMark(ticketKey: string): Promise; -} -``` - -## Upstash Implementation - -In `src/adapters/run-registry/upstash.ts`: - -| Method | Redis Operation | -|--------|----------------| -| `markFailed` | `hset("blazebot:failed-tickets:{ENV}", ticketKey, JSON.stringify(meta))` | -| `isTicketFailed` | `hget(...)` returns truthy/falsy | -| `listAllFailed` | `hgetall(...)` with JSON.parse on values | -| `clearFailedMark` | `hdel(...)` | - -All follow the exact same Redis patterns already used for active-runs. - -## Full Flow - -1. **Workflow fails** — catch block tries `moveTicket` to backlog -2. **If move fails** — `markTicketFailed()` writes to `blazebot:failed-tickets` hash -3. **Next poll** — `dispatchDiscoveredTickets` calls `isTicketFailed()` — skips with `"previously_failed"` -4. **Human moves ticket out of AI column** — reconciliation calls `clearFailedMark()` — marker removed -5. 
**Human moves ticket back to AI** — dispatched fresh, no marker blocking it - -## Testing - -- Unit test: `markFailed` writes correct JSON to Redis hash -- Unit test: `isTicketFailed` returns `true` when marker exists, `false` when absent -- Unit test: `clearFailedMark` removes the entry -- Integration test: dispatch skips a ticket with a failure marker (returns `"previously_failed"`) -- Integration test: reconciliation clears failure marker when ticket leaves AI column -- Integration test: full loop — fail + move fails → marked → skipped → moved out → cleared → redispatched - -## Files to Modify - -| File | Change | -|------|--------| -| `src/adapters/run-registry/upstash.ts` | Add `markFailed`, `isTicketFailed`, `listAllFailed`, `clearFailedMark` | -| `src/adapters/run-registry/types.ts` | Extend `RunRegistryAdapter` interface | -| `src/workflows/implementation.ts` | Add `markTicketFailed` call in catch block | -| `src/workflows/review-fix.ts` | Same catch block change (if applicable) | -| `src/lib/dispatch.ts` | Add `isTicketFailed` check before `claim` | -| `src/lib/reconcile.ts` | Add failed-ticket cleanup pass | -| Tests for each of the above | New test cases | diff --git a/docs/superpowers/specs/2026-04-02-review-fix-cicd-and-line-comments-design.md b/docs/superpowers/specs/2026-04-02-review-fix-cicd-and-line-comments-design.md deleted file mode 100644 index d826485..0000000 --- a/docs/superpowers/specs/2026-04-02-review-fix-cicd-and-line-comments-design.md +++ /dev/null @@ -1,170 +0,0 @@ -# Review-Fix Flow: CI/CD Check Awareness & Structured Line Comments - -## Problem - -When the review-fix workflow runs, the agent is missing two key pieces of information: - -1. **CI/CD check results** — the agent has no visibility into whether checks (lint, build, tests, e2e) passed or failed. It cannot act on failures it doesn't know about. -2. **Line-coupled comment location** — review comments attached to specific lines lose their file path and line numbers. 
The agent sees flat text like `"Bob: Fix the typo"` with no indication of where in the code the comment refers to. - -## Solution - -### 1. Enrich `PRComment` with Location Fields - -Extend the existing `PRComment` interface with optional location fields. GitHub's `pulls.listReviewComments()` already returns `path`, `start_line`, and `line` — currently discarded. - -**Updated `PRComment` in `src/adapters/vcs/types.ts`:** - -```typescript -export interface PRComment { - author: string; - body: string; - liked: boolean; - filePath?: string; // only on review comments (not issue comments) - startLine?: number; // start of the comment range - endLine?: number; // end of the comment range (same as startLine for single-line) -} -``` - -**GitHub adapter mapping in `getPRComments()`:** - -Review comments map `c.path` -> `filePath`, `c.start_line` -> `startLine`, `c.line` -> `endLine`. -Issue comments have no location and remain as-is. - -**Context formatting in `formatPRComments()`:** - -Line-coupled comments render with a structured header: - -``` -### src/lib/auth.ts (lines 42-45) -Bob: Use a constant instead of a magic number - -### src/components/Form.tsx (line 12) -Alice (liked): Looks good but add error handling -``` - -General comments (no `filePath`) render as before: - -``` -Bob: Overall looks good, just a few nits -``` - -Comments are grouped: line-coupled comments first (sorted by file path), then general comments. - -### 2. CI/CD Check Results with Logs for Failures - -Add a new type and method to fetch check run results, including full log output for failed checks. - -**New type in `src/adapters/vcs/types.ts`:** - -```typescript -export interface CheckRunResult { - name: string; - status: "completed" | "in_progress" | "queued"; - conclusion: string | null; // "success", "failure", "cancelled", "timed_out", etc. 
- logs?: string; // full output, only populated for non-success conclusions -} -``` - -**New method on `VCSAdapter` interface:** - -```typescript -getCheckRunResults(prId: number): Promise; -``` - -**GitHub adapter implementation:** - -1. Get the PR's head SHA via `pulls.get(prId)` -2. Call `checks.listForRef({ ref: headSha })` to get all check runs -3. For each check where `conclusion !== 'success'` and status is `'completed'`: fetch logs via `actions.listJobsForWorkflowRun()` to find the matching job, then `actions.downloadJobLogsForWorkflowRun()` for the log content -4. Return all checks; only failures have `logs` populated - -**Context formatting — new `formatCheckResults()` function:** - -All checks passed: -``` -All CI/CD checks passed. -``` - -No checks found: -``` -No CI/CD checks found. -``` - -Mixed results: -``` -Passed: lint, build, type-check - -### Failed: test - - -### Failed: e2e - -``` - -### 3. Thread Data Through the Workflow - -**`FixingFeedbackContextInput` in `src/sandbox/context.ts`:** - -```typescript -export interface FixingFeedbackContextInput { - ticket: TicketData; - prompt: string; - skills?: string; - prComments: PRComment[]; - hasConflicts: boolean; - checkResults: CheckRunResult[]; -} -``` - -**`assembleFixingFeedbackContext()` adds a new section** between "PR Review Feedback" and "Merge Conflicts": - -``` -## CI/CD Check Results - - -``` - -**`fetchPRContext()` in `src/workflows/review-fix.ts`:** - -Add `vcs.getCheckRunResults(pr.id)` call alongside existing comment and conflict fetches. Return `checkResults` in the result object. - -**`assembleReviewFixRequirements()`** passes `checkResults` through to `assembleFixingFeedbackContext()`. - -### 4. Prompt Update - -In the review-fix prompt (`src/lib/prompts.ts`), add a new step after merge conflict resolution and before addressing review comments: - -``` -3. 
If CI/CD checks failed, read the failure logs in "CI/CD Check Results" and fix the underlying issues (test failures, lint errors, build errors, etc.). -``` - -Update the constraints section to acknowledge CI failures as actionable: - -``` -- Address CI/CD check failures in addition to review comments. -``` - -## Files Changed - -| File | Change | -|------|--------| -| `src/adapters/vcs/types.ts` | Add fields to `PRComment`, add `CheckRunResult`, add `getCheckRunResults` to `VCSAdapter` | -| `src/adapters/vcs/github.ts` | Map location fields in `getPRComments()`, implement `getCheckRunResults()` | -| `src/sandbox/context.ts` | Update `FixingFeedbackContextInput`, add `formatCheckResults()`, update `formatPRComments()` grouping, add CI/CD section to template | -| `src/workflows/review-fix.ts` | Fetch check results in `fetchPRContext()`, pass to context assembly | -| `src/lib/prompts.ts` | Add CI/CD step to review-fix prompt | -| `src/sandbox/context.test.ts` | Update tests for new fields and formatting | - -## Edge Cases - -- **Non-GitHub-Actions checks** (external CI like CircleCI, Jenkins): `checks.listForRef()` returns the check run but `actions.downloadJobLogsForWorkflowRun()` won't work. For these, populate `logs` as `null` and show the check name + conclusion without logs. -- **Very large logs**: GitHub Actions logs can be large. No truncation — the agent needs the full output to diagnose failures. If this becomes a token problem in practice, we can add truncation later. -- **No check runs at all**: Some repos may not have CI configured. Show "No CI/CD checks found." and proceed normally. 
- -## Out of Scope - -- Fetching logs for in-progress or queued checks (only completed failures) -- Re-running failed checks from the agent -- Fetching review approval/request status -- Threaded comment conversations (parent/reply chains) diff --git a/docs/superpowers/specs/2026-04-02-sandbox-push-design.md b/docs/superpowers/specs/2026-04-02-sandbox-push-design.md deleted file mode 100644 index 257ec97..0000000 --- a/docs/superpowers/specs/2026-04-02-sandbox-push-design.md +++ /dev/null @@ -1,313 +0,0 @@ -# Sandbox Push — Push from Sandbox with Real Commit Messages - -**Date:** 2026-04-02 -**Status:** Draft (v2 — full clone approach) - -## Problem - -The original GitHub API push flow (blob → tree → commit → updateRef) lost all agent commit -messages and created a single flat commit. We replaced it with sandbox-side `git push`, but -shallow clones (`depth: 1`) cause "no history in common with main" errors on PR creation — -the unshallow + remote-swap flow is fragile and hard to debug. - -Root cause: shallow clones combined with remote removal/re-addition create edge cases where -git's object graph becomes disconnected. GitHub then rejects the PR because the branch -commits don't share ancestry with main. - -## Solution - -**Full clone, no bare repo, agent commits only, server pushes.** - -1. Remove `depth: 1` — clone the full repo. The ~10-30s overhead is negligible vs. the - agent's 5-35 min execution time. -2. Strip auth from origin URL (instead of swapping to a local bare repo). The agent can - see the remote URL but can't push without a token. -3. Agent only commits — remove push from the Quality Gate prompt. -4. After the agent exits, server injects the token and pushes to GitHub. - -This eliminates all shallow-clone edge cases, the unshallow step, the local bare repo, -and the commit chain verification — ~40 lines of complexity that existed solely to work -around depth-1 limitations. - -## Detailed Flow - -### 1. 
Branch Creation (unchanged — server-side) - -The server creates the branch via GitHub API before the sandbox is provisioned. - -``` -Server: vcs.createBranch("blazebot/awt-123", "main") // Octokit API -``` - -**Files:** `src/workflows/implementation.ts:19-24`, `src/adapters/vcs/github.ts:23-51` -**Change:** None. - -### 2. Sandbox Provisioning (modified) - -Remove `depth: 1` from the clone. After clone, strip the token from the origin URL -so the agent never has push access. - -```typescript -// src/sandbox/manager.ts — Sandbox.create source -source: { - type: "git", - url: `https://github.com/${owner}/${repo}.git`, - username: "x-access-token", - password: githubToken, - revision: branch, - // No depth — full clone -}, -``` - -```bash -# After clone: strip auth from origin, replace with unauthenticated URL -git remote set-url origin https://github.com//.git -``` - -**File:** `src/sandbox/manager.ts` -**Changes:** -- Remove `depth: 1` from `Sandbox.create` source config. -- Replace the 3-command bare-repo setup with a single `git remote set-url` to strip auth. - -### 3. Git Identity + Optional Merge (simplified) - -```bash -git config user.name "ai-workflow-blazity" -git config user.email "ai-workflow@blazity.com" -``` - -For review-fix workflow, the merge fetch uses an authenticated URL passed as a CLI -argument. With a full clone, we no longer need `--unshallow` during the merge fetch — -just a normal `git fetch `. - -**File:** `src/sandbox/manager.ts` -**Change:** Remove `--unshallow` from the merge fetch command (no longer needed with -full clone). Use plain `git fetch "" `. - -### 4. Pre-Agent SHA Recording (unchanged) - -```bash -git rev-parse HEAD > /tmp/.pre-agent-sha -``` - -**File:** `src/sandbox/manager.ts` -**Change:** None. - -### 5. Agent Execution (modified prompt) - -The Quality Gate no longer includes push instructions. The agent commits only. 
- -**Quality Gate (both prompts):** - -``` -## Quality Gate - -Before finishing, you MUST: -- Find and run ALL quality checks in the project: tests, linting, type checking, - formatting, and any other validation scripts. -- Fix all failures and commit your fixes with descriptive messages. -``` - -The push instruction (`git push origin `) is removed. The agent should NOT push. - -**Files:** `src/lib/prompts.ts` (both `implementPrompt` and `reviewFixPrompt`) - -### 6. Agent Works - -The agent implements the feature, committing with real messages: - -``` -git commit -m "feat: add user validation schema" -git commit -m "feat: implement registration endpoint" -git commit -m "fix: handle duplicate email edge case" -git commit -m "test: add registration tests" -``` - -No push. The agent exits, wrapper script touches `/tmp/agent-done`. - -### 7. Collect Agent Output (unchanged from v1) - -Reads agent stdout/stderr and parses JSON. No file extraction. - -**File:** `src/sandbox/poll-agent.ts` — `collectAgentOutput()` -**Change:** None (already simplified in v1). - -### 8. Push from Sandbox (simplified) - -After the agent exits and output is collected, inject the token and push. -No unshallow needed — the full clone has complete history. 
- -```typescript -async function pushFromSandbox( - sandboxId: string, - branch: string, -): Promise<{ pushed: boolean; error?: string }> { - "use step"; - const { Sandbox } = await import("@vercel/sandbox"); - const { env } = await import("../../env.js"); - const sandbox = await Sandbox.get({ sandboxId, ...getSandboxCredentials() }); - - // Check if agent made any commits - const baseShaResult = await sandbox.runCommand("bash", [ - "-c", "cat /tmp/.pre-agent-sha 2>/dev/null || echo ''", - ]); - const headShaResult = await sandbox.runCommand("bash", ["-c", "git rev-parse HEAD"]); - const baseSha = (await baseShaResult.stdout()).trim(); - const headSha = (await headShaResult.stdout()).trim(); - - if (baseSha && baseSha === headSha) { - return { pushed: false, error: "Agent reported success but made no commits" }; - } - - // Inject token — agent process is dead - const pushUrl = `https://x-access-token:${env.GITHUB_TOKEN}@github.com/${env.GITHUB_OWNER}/${env.GITHUB_REPO}.git`; - await sandbox.runCommand("git", ["remote", "set-url", "origin", pushUrl]); - - // Push to GitHub — use HEAD: so it works even if the local branch name - // doesn't match. Use --force-with-lease so retries on an existing branch - // succeed without risking concurrent-push data loss. - const result = await sandbox.runCommand("git", [ - "push", "--force-with-lease", "origin", `HEAD:refs/heads/${branch}`, - ]); - - if (result.exitCode !== 0) { - const stdout = (await result.stdout()).trim(); - const stderr = (await result.stderr()).trim(); - return { pushed: false, error: stderr || stdout }; - } - - return { pushed: true }; -} -``` - -**What's removed vs. v1:** -- Shallow check (`git rev-parse --is-shallow-repository`) -- Unshallow step (`git fetch --unshallow origin`) -- Fallback fetch (`git fetch origin`) -- Commit chain verification (`git rev-list --count HEAD`) - -**File:** `src/sandbox/poll-agent.ts` - -### 9. 
Fix Agent on Push Failure (unchanged from v1) - -If `pushFromSandbox` fails, spawn a lightweight fix agent in the same sandbox. - -```typescript -async function fixAndRetryPush( - sandboxId: string, - branch: string, - pushError: string, -): Promise<{ pushed: boolean; error?: string }> { - "use step"; - const { Sandbox } = await import("@vercel/sandbox"); - const { env } = await import("../../env.js"); - const sandbox = await Sandbox.get({ sandboxId, ...getSandboxCredentials() }); - - // Write prompt to a file to avoid shell injection via pushError content - const fixPrompt = `The git push failed with this error:\n\n${pushError}\n\nFix the issues, commit your fixes, then push to origin.`; - await sandbox.writeFiles([ - { path: "/tmp/fix-prompt.txt", content: Buffer.from(fixPrompt) }, - ]); - - await sandbox.runCommand("bash", [ - "-c", - `cat /tmp/fix-prompt.txt | claude --print --model '${env.CLAUDE_MODEL}' --dangerously-skip-permissions > /tmp/fix-stdout.txt 2>/tmp/fix-stderr.txt || true`, - ]); - - // Retry push (token is still in remote URL from previous step) - const result = await sandbox.runCommand("git", [ - "push", "--force-with-lease", "origin", `HEAD:refs/heads/${branch}`, - ]); - - if (result.exitCode !== 0) { - const stdout = (await result.stdout()).trim(); - const stderr = (await result.stderr()).trim(); - return { pushed: false, error: stderr || stdout }; - } - return { pushed: true }; -} -``` - -**File:** `src/sandbox/poll-agent.ts` - -### 10. 
Workflow Integration (unchanged from v1) - -Both workflows use the same pattern: - -```typescript -const { output } = await collectAgentOutput(sandboxId); - -if (output.result === "implemented") { - let pushResult = await pushFromSandbox(sandboxId, branchName); - - if (!pushResult.pushed && pushResult.error) { - pushResult = await fixAndRetryPush(sandboxId, branchName, pushResult.error); - } - - if (!pushResult.pushed) { - await moveTicket(ticketId, env.COLUMN_BACKLOG); - await notifySlack(`Task ${ticket.identifier} failed: push failed — ${pushResult.error ?? "unknown"}`); - await unregisterRun(ticket.identifier); - return; - } - - await createPullRequest(branchName, ticket.title, output.summary ?? ""); - // ... rest unchanged -} -``` - -**Files:** `src/workflows/implementation.ts`, `src/workflows/review-fix.ts` -**Change:** None (already using this pattern). - -### 11. PR Creation, Ticket Update, Teardown (unchanged) - -PR creation still uses Octokit API. Ticket moves and Slack notifications unchanged. - -## Files Changed - -| File | Change | -|------|--------| -| `src/sandbox/manager.ts` | Remove `depth: 1`. Replace bare-repo setup with `git remote set-url` to strip auth. Remove `--unshallow` from merge fetch. | -| `src/lib/prompts.ts` | Remove push instruction from Quality Gate in both prompts. | -| `src/sandbox/poll-agent.ts` | Remove unshallow/shallow-check/chain-verify logic from `pushFromSandbox`. | -| `src/sandbox/poll-agent.test.ts` | Remove shallow/unshallow test cases. Simplify push tests. | -| `e2e/tier2/shallow-push.test.ts` | Delete — no longer relevant (no shallow clones). | -| `src/adapters/vcs/github.ts` | `push()` method no longer called for agent work (keep for other uses or remove). 
| - -## What's Preserved - -- All agent commits with their original messages -- Full commit history on the PR (not squashed) -- Token security — agent never has push access -- Merge commits in review-fix flow (natural git merge) -- Fix agent for push failures - -## What's Removed (vs. current code) - -- `depth: 1` shallow cloning -- Local bare repo (`/tmp/push-target.git`) setup -- Unshallow step (`git fetch --unshallow origin`) -- Shallow repository detection -- Commit chain verification (`git rev-list --count HEAD`) -- Push instruction in agent prompts (agent commits only) -- E2E shallow push test - -## Edge Cases - -| Case | Handling | -|------|----------| -| Agent doesn't commit | `pushFromSandbox` detects baseSha == HEAD, returns error. Workflow moves ticket to backlog. | -| Push fails (pre-push hook on GitHub, network) | `fixAndRetryPush` spawns fix agent, retries once. | -| Fix agent also fails | Move ticket to backlog with error details. | -| Sandbox dies between agent exit and push | Existing `"stopped"` detection catches this. | -| Large repository (slow full clone) | Accepted trade-off. Clone overhead is negligible vs. 5-35 min agent runtime. | -| Token in .git/config from clone | Stripped immediately via `git remote set-url` to unauthenticated URL. | - -## Security - -- **Agent never sees the GitHub token.** Origin URL stripped of auth immediately after clone. -- **Token injected only after agent process exits.** The sentinel file `/tmp/agent-done` - confirms the agent is dead before any token enters the sandbox. -- **Token exists briefly** in the sandbox git config during push, then sandbox is torn down. -- **Fix agent (step 9)** runs with the token in git config, but this is a controlled, - short-lived session with a narrow prompt. 
diff --git a/docs/superpowers/specs/2026-04-06-three-phase-workflow-design.md b/docs/superpowers/specs/2026-04-06-three-phase-workflow-design.md deleted file mode 100644 index 7031d7f..0000000 --- a/docs/superpowers/specs/2026-04-06-three-phase-workflow-design.md +++ /dev/null @@ -1,504 +0,0 @@ -# Three-Phase Agent Workflow - -**Date**: 2026-04-06 -**Status**: Draft - -## Problem - -The current system has two separate workflows (`implementationWorkflow` and `reviewFixWorkflow`) that each dump all context into a single agent invocation. This has several issues: - -- The agent receives a large, undifferentiated context blob and must figure out what to do -- No workflow-level control between logical phases — if research reveals clarification is needed, the agent has already started coding -- The implementation prompt is overloaded with instructions for exploration, planning, coding, testing, and review -- Two separate workflows with duplicated orchestration logic (polling, push, teardown) -- No separation of concerns — one agent failure means the entire ticket fails with no intermediate artifacts - -## Solution - -Replace both workflows with a **single unified `agentWorkflow`** that handles all ticket scenarios (new implementation, review-fix, partial work). The workflow splits work into **three sequential phases** within the same sandbox, each a separate `claude --print` call. The **workflow** orchestrates transitions, checks results between phases, and decides whether to proceed or fail fast. - -``` -┌─────────────────────────────────────────────────────────────────────┐ -│ SANDBOX (single, alive for entire flow) │ -│ │ -│ ┌──────────────┐ ┌──────────────────┐ ┌────────┐ │ -│ │ Research & │───▶│ Implementation │───▶│ Review │ │ -│ │ Plan (claude) │ │ (claude) │ │(claude)│ │ -│ └──────┬───────┘ └────────┬─────────┘ └───┬────┘ │ -│ │ │ │ │ -│ fail fast fail fast ┌─────┴──────┐ │ -│ on error/ on error/ │ approved? 
│ │ -│ clarification clarification └─────┬──────┘ │ -│ yes │ no (max 2) │ -│ │ └──▶ back │ -│ │ to impl │ -│ ▼ │ -│ push │ -└─────────────────────────────────────────────────────────────────────┘ -``` - -### Unified Flow - -The `agentWorkflow` replaces both `implementationWorkflow` and `reviewFixWorkflow`. It handles all scenarios because Phase 1 (Research & Plan) adapts to the current state: - -| Scenario | What Phase 1 sees | What it plans | -|----------|-------------------|---------------| -| **New ticket** (no branch/PR) | Empty branch, full requirements | Full implementation | -| **Ticket with existing PR + review feedback** | Existing commits + PR comments + CI failures | Only the fixes needed | -| **Ticket with partial work** | Some commits, incomplete work | Remaining steps | - -The workflow always receives PR feedback and CI results **when they exist**. The research agent sees this context and plans accordingly. - -### Dispatch Simplification - -`dispatch.ts` no longer branches between two workflow types. It always starts `agentWorkflow(ticketId)`. The workflow itself handles the branch check, PR context fetching, and merge-base logic internally. - -## Phase 1 — Research & Plan - -### Purpose - -Explore the repository, understand the ticket, check for existing work on the branch, and produce a **precise, minimal implementation plan** with only actionable steps. - -### Input - -Written to `/tmp/research-requirements.md`: - -- Ticket ID, title, description, acceptance criteria, comments -- Branch name (for checking existing changes via `git log`/`git diff`) -- **PR review feedback** (if an existing PR has comments — fetched by workflow) -- **CI/CD check results** (if an existing PR has failed checks — fetched by workflow) -- **Merge conflict status** (if applicable) -- Research & planning prompt (see Prompts section) - -### Agent Behavior - -1. **Read session memory** — check `blazebot/memory/[TASK_ID].md` for context from prior runs -2. 
Explore repo structure, read `CLAUDE.md`/`AGENTS.md` if present -3. Check `git log` / `git diff` against base branch to identify existing changes -4. If PR feedback/CI failures are present: understand what needs to be fixed -5. Identify what's already implemented vs. what remains -6. Analyze relevant files, code patterns, test setup -7. **Use the `brainstorming` skill from superpowers** to think through the approach -8. Produce a clean implementation plan with only actionable steps for the remaining work -9. Write/update session memory - -### Output Format - -The research agent output is **free-form markdown**, not structured JSON. This gives the agent flexibility to express findings naturally and organize the plan in the way that best fits the specific ticket. - -The only structured requirement is a **status line** at the very top of the output: - -``` -STATUS: completed | clarification_needed | failed -``` - -The workflow parses only this status line to decide next steps. The rest of the output (the plan, research findings, etc.) is passed as-is to Phase 2 as context. - -#### Output Constraints - -The plan portion must be **minimal and precise**: -- Each step must be directly actionable ("Create file X with Y" not "Consider how to...") -- No preamble, rationale, or noise that would confuse the implementation agent -- File paths must be concrete, not vague ("src/components/Foo.tsx" not "the relevant component") -- The output should be structured so the implementation agent can read it top-to-bottom and execute - -If `STATUS: clarification_needed`, the output should contain the questions (one per line, numbered). The workflow will extract and post them to Jira. - -If `STATUS: failed`, the output should contain the error description. 
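The status-line contract above needs only a few lines to parse. This is a minimal sketch of the `parseResearchStatus()` helper named in the file-changes table, plus a question extractor for the `clarification_needed` case. Several details are assumptions: the return shape, the tolerance for leading blank lines, strict matching of the status line, and `extractQuestions` itself.

```typescript
export type ResearchStatus = "completed" | "clarification_needed" | "failed";

export interface ParsedResearch {
  status: ResearchStatus;
  body: string; // everything after the STATUS line, passed as-is to later phases
}

const STATUS_RE = /^STATUS:\s*(completed|clarification_needed|failed)\s*$/;

// Reads only the first non-empty line. Anything unrecognized is treated as a failure,
// so the workflow fails fast instead of handing a malformed plan to Phase 2.
export function parseResearchStatus(raw: string): ParsedResearch {
  const lines = raw.replace(/\r\n/g, "\n").split("\n");
  const first = lines.findIndex((l) => l.trim() !== "");
  const match = first === -1 ? null : STATUS_RE.exec((lines[first] ?? "").trim());
  if (!match) {
    return { status: "failed", body: raw.trim() };
  }
  return {
    status: match[1] as ResearchStatus,
    body: lines.slice(first + 1).join("\n").trim(),
  };
}

// Collects numbered lines ("1. ..." or "2) ...") as questions to post on Jira.
export function extractQuestions(body: string): string[] {
  return body
    .split("\n")
    .map((l) => /^\s*\d+[.)]\s+(.*\S)\s*$/.exec(l))
    .filter((m): m is RegExpExecArray => m !== null)
    .map((m) => m[1] ?? "");
}
```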
- -### Workflow Decision After Phase 1 - -| Status line | Action | -|-------------|--------| -| `completed` | Save full output to `/tmp/research-plan-output.md`, proceed to Phase 2 | -| `clarification_needed` | Extract questions from output, post on Jira, move to backlog, teardown | -| `failed` | Notify Slack with error, move to backlog, teardown | - -### Sentinel & Output Files - -- Stdout: `/tmp/research-stdout.txt` -- Stderr: `/tmp/research-stderr.txt` -- Sentinel: `/tmp/research-done` -- Plan (written by workflow after parsing): `/tmp/research-plan-output.md` - -## Phase 2 — Implementation - -### Purpose - -Execute the plan from Phase 1. The agent receives precise instructions and focuses solely on coding, testing, and committing. - -### Input - -Written to `/tmp/impl-requirements.md`: - -- Ticket ID, title, acceptance criteria (for reference, kept brief) -- **Full research & plan output from Phase 1** (free-form markdown — passed as-is) -- If this is a **retry after review feedback**: also includes the review issues and feedback -- Implementation prompt (see Prompts section) - -### Agent Behavior - -1. Read the plan from Phase 1 output -2. **Use the `executing-plans` skill from superpowers** to execute the plan systematically -3. If retry: read review feedback, focus on fixing flagged issues -4. Execute each step in order -5. Run tests and quality checks -6. 
Commit all changes with descriptive messages - -### Output Schema - -Same as today's `AgentOutput`: - -```typescript -const implOutputSchema = z.object({ - result: z.enum(["implemented", "clarification_needed", "failed"]), - summary: z.string().optional(), - questions: z.array(z.string()).optional(), - error: z.string().optional(), -}); -``` - -### Workflow Decision After Phase 2 - -| Result | Action | -|--------|--------| -| `implemented` | Proceed to Phase 3 (Review) | -| `clarification_needed` | Post questions on Jira, move to backlog, teardown | -| `failed` | Notify Slack, move to backlog, teardown | - -### Sentinel & Output Files - -- Stdout: `/tmp/impl-stdout.txt` -- Stderr: `/tmp/impl-stderr.txt` -- Sentinel: `/tmp/impl-done` - -## Phase 3 — Review - -### Purpose - -Review the implementation diff against the plan and acceptance criteria. Check code quality, test coverage, and completeness. Use the `requesting-code-review` skill. - -### Input - -Written to `/tmp/review-requirements.md`: - -- Ticket ID, title, acceptance criteria -- Plan output from Phase 1 (what was supposed to happen) -- Git diff of all changes (`git diff ..HEAD` — captured by workflow via sandbox command) -- Review prompt (see Prompts section) - -### Agent Behavior - -1. Read the plan and acceptance criteria -2. Review the diff against the plan — did the agent follow it? -3. Check code quality, test coverage, edge cases -4. Invoke `requesting-code-review` skill to dispatch a code-reviewer subagent -5. 
Output approval or specific issues to fix - -### Output Schema - -```typescript -const reviewOutputSchema = z.object({ - result: z.enum(["approved", "changes_requested", "failed"]), - feedback: z.string().describe("Detailed review notes"), - issues: z.array(z.object({ - file: z.string(), - description: z.string(), - severity: z.enum(["critical", "suggestion"]), - })).describe("Specific issues found"), - error: z.string().optional(), -}); - -type ReviewOutput = z.infer<typeof reviewOutputSchema>; -``` - -### Workflow Decision After Phase 3 - -| Result | Action | -|--------|--------| -| `approved` | Push, create PR (or update existing), move to AI Review, notify Slack | -| `changes_requested` | If retries < MAX_REVIEW_RETRIES (2): loop back to Phase 2 with feedback. Otherwise: fail fast | -| `failed` | Notify Slack, move to backlog, teardown | - -### Review → Implementation Loop - -When review returns `changes_requested`: -1. Workflow increments a retry counter -2. Workflow writes a new `/tmp/impl-requirements.md` that includes: - - Original plan from Phase 1 - - Review feedback (`issues` + `feedback` from review output) - - Instruction: "Fix the issues listed below. Do not redo work that was approved." -3. Re-runs Phase 2 (implementation) -4. Re-runs Phase 3 (review) -5. Maximum 2 retries (3 total implementation attempts) - -### Sentinel & Output Files - -- Stdout: `/tmp/review-stdout.txt` -- Stderr: `/tmp/review-stderr.txt` -- Sentinel: `/tmp/review-done` - -## Wrapper Script Changes - -The current `buildWrapperScript` generates a single hardcoded script. It needs to become parameterized to support multiple phases. - -### New Signature - -```typescript -interface PhaseScriptOptions { - model: string; - phase: "research" | "impl" | "review"; - inputFile: string; // e.g. "/tmp/research-requirements.md" - outputFile: string; // e.g. "/tmp/research-stdout.txt" - stderrFile: string; // e.g. "/tmp/research-stderr.txt" - sentinelFile: string; // e.g.
"/tmp/research-done" - jsonSchema?: string; // phase-specific JSON schema (only for impl and review phases) -} - -function buildPhaseScript(opts: PhaseScriptOptions): string; -``` - -The generated script follows the same pattern as today: -1. `cat | claude --print --model X --dangerously-skip-permissions [--output-format json --json-schema ''] > 2>` - - Research phase: NO `--output-format json` or `--json-schema` (free-form markdown output) - - Implementation and review phases: include `--output-format json --json-schema` for structured output -2. Cleanup `.claude/` artifacts -3. `touch ` - -### Stop Hook Behavior - -- **Research & Plan phase**: Stop hook should NOT enforce commits (research agent doesn't write code) -- **Implementation phase**: Stop hook enforces commits (same as today) -- **Review phase**: Stop hook should NOT enforce commits (review agent only reads) - -**Decision**: Simplest approach — only install the stop hook before Phase 2 (implementation), remove it before Phase 1 and Phase 3. Since the sandbox is provisioned once, the workflow can run a command to toggle the hook between phases. 
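The parameterized script described above could be generated roughly as follows. This is a sketch, and three parts of it are assumptions: the POSIX single-quote escaping, the unconditional `touch` of the sentinel, and the placeholder comment standing in for the `.claude/` cleanup, whose exact paths this spec does not name. The CLI flags are the ones listed in the spec.

```typescript
export interface PhaseScriptOptions {
  model: string;
  phase: "research" | "impl" | "review";
  inputFile: string;
  outputFile: string;
  stderrFile: string;
  sentinelFile: string;
  jsonSchema?: string;
}

// Single-quote a value for POSIX sh: wrap it in '...' and turn each ' into '\''.
function shQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

export function buildPhaseScript(opts: PhaseScriptOptions): string {
  const args = [
    "claude",
    "--print",
    "--model",
    shQuote(opts.model),
    "--dangerously-skip-permissions",
  ];
  // Research emits free-form markdown; impl and review request structured JSON.
  if (opts.phase !== "research" && opts.jsonSchema) {
    args.push("--output-format", "json", "--json-schema", shQuote(opts.jsonSchema));
  }
  return [
    "#!/bin/sh",
    `cat ${shQuote(opts.inputFile)} | ${args.join(" ")} > ${shQuote(opts.outputFile)} 2> ${shQuote(opts.stderrFile)}`,
    "# cleanup of .claude/ artifacts would go here (exact paths not specified)",
    // No `set -e`: the sentinel is touched even if claude exits non-zero.
    `touch ${shQuote(opts.sentinelFile)}`,
    "",
  ].join("\n");
}
```

Touching the sentinel regardless of the agent's exit status keeps the poller from waiting forever. The phase outcome is then judged from the stdout and stderr files.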
- -## Context Assembly Changes - -### Current - -- `assembleImplementationContext(ticket, prompt)` — one function, one context -- `assembleFixingFeedbackContext(ticket, prompt, prComments, hasConflicts, checkResults)` — for review-fix - -### New - -Three new context assembly functions in `src/sandbox/context.ts`: - -```typescript -// Phase 1 input — includes optional PR feedback for review-fix scenarios -interface ResearchPlanContextInput { - ticket: TicketData; - prompt: string; - branchName: string; - prComments?: PRComment[]; // present when PR exists - checkResults?: CheckRunResult[]; // present when PR exists - hasConflicts?: boolean; // present when PR exists -} -function assembleResearchPlanContext(input: ResearchPlanContextInput): string; - -// Phase 2 input (first run) -interface ImplementationContextInput { - ticket: TicketData; // kept minimal — ID, title, acceptance criteria only - prompt: string; - researchPlanMarkdown: string; // free-form output from Phase 1, passed as-is -} -function assembleImplementationContext(input: ImplementationContextInput): string; - -// Phase 2 input (retry after review) -interface ImplementationRetryContextInput { - ticket: TicketData; - prompt: string; - researchPlanMarkdown: string; // free-form output from Phase 1, passed as-is - reviewFeedback: ReviewOutput; -} -function assembleImplementationRetryContext(input: ImplementationRetryContextInput): string; - -// Phase 3 input -interface ReviewContextInput { - ticket: TicketData; // kept minimal — ID, title, acceptance criteria only - prompt: string; - researchPlanMarkdown: string; // free-form output from Phase 1, passed as-is - gitDiff: string; -} -function assembleReviewContext(input: ReviewContextInput): string; -``` - -The old `assembleFixingFeedbackContext` is removed — its PR feedback/CI data is now fed into `assembleResearchPlanContext` instead. 
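Here is a minimal sketch of the retry variant, showing one way the review output could be rendered for the implementation agent. The `TicketData` fields, the heading layout, and the critical-first ordering are assumptions. The instruction sentence comes from the review loop described earlier.

```typescript
// Minimal stand-ins for the project's types; TicketData field names are assumptions.
interface TicketData {
  identifier: string;
  title: string;
  acceptanceCriteria?: string;
}

interface ReviewIssue {
  file: string;
  description: string;
  severity: "critical" | "suggestion";
}

interface ReviewOutput {
  result: "approved" | "changes_requested" | "failed";
  feedback: string;
  issues: ReviewIssue[];
  error?: string;
}

interface ImplementationRetryContextInput {
  ticket: TicketData;
  prompt: string;
  researchPlanMarkdown: string;
  reviewFeedback: ReviewOutput;
}

export function assembleImplementationRetryContext(input: ImplementationRetryContextInput): string {
  const { ticket, prompt, researchPlanMarkdown, reviewFeedback } = input;
  // Critical issues first (stable sort), so blockers are addressed before suggestions.
  const ordered = [...reviewFeedback.issues].sort(
    (a, b) => Number(b.severity === "critical") - Number(a.severity === "critical"),
  );
  const issueLines = ordered.map(
    (issue, n) => `${n + 1}. [${issue.severity}] \`${issue.file}\`: ${issue.description}`,
  );
  return [
    prompt,
    `# Ticket ${ticket.identifier}: ${ticket.title}`,
    ticket.acceptanceCriteria ? `## Acceptance Criteria\n\n${ticket.acceptanceCriteria}` : "",
    `## Plan (from Phase 1)\n\n${researchPlanMarkdown}`,
    "## Review Feedback",
    "Fix the issues listed below. Do not redo work that was approved.",
    reviewFeedback.feedback,
    issueLines.length > 0 ? issueLines.join("\n") : "_No specific issues listed._",
  ]
    .filter((section) => section !== "") // drop empty optional sections
    .join("\n\n");
}
```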
- -## Prompt Changes - -### Current - -- `implement.md` — single monolithic prompt (exploration + planning + coding + testing + review) -- `review-fix.md` — for fixing PR feedback - -### New - -Three new prompts in `src/lib/prompts.ts`. The old `implement.md` and `review-fix.md` are removed. - -#### `research-plan.md` - -Focused prompt for Phase 1: -- **Read session memory** (`blazebot/memory/[TASK_ID].md`) first if it exists -- Explore the repo, check existing changes on the branch -- If PR feedback/CI failures are present: factor them into the plan -- **Use the `brainstorming` skill** to think through the approach -- Produce a precise implementation plan — actionable steps only, no noise -- Output starts with `STATUS: completed|clarification_needed|failed` on the first line -- Check for clarification needs (same criteria as today) -- Write/update session memory -- NO coding, NO commits -- NO `--output-format json` or `--json-schema` for this phase (free-form markdown output) - -#### `implement.md` (rewritten) - -Focused prompt for Phase 2: -- **Use the `executing-plans` skill** to systematically execute the plan from Phase 1 -- If retrying: fix the review feedback, do not redo approved work -- Run tests and quality checks -- Commit with descriptive messages -- Write/update session memory -- NO exploration (already done), NO planning (already done), NO code review (separate phase) - -#### `review.md` (new) - -Focused prompt for Phase 3: -- Review the diff against the plan and acceptance criteria -- Check code quality, test coverage, edge cases -- Use `requesting-code-review` skill to dispatch code-reviewer subagent -- Output approval or specific, actionable issues -- NO coding, NO commits - -## Workflow Changes - -### `src/workflows/implementation.ts` → `src/workflows/agent.ts` - -Renamed and rewritten. The exported function becomes `agentWorkflow(ticketId: string)`. 
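The review → implementation loop and the `MAX_REVIEW_RETRIES` cap described earlier reduce to a small control-flow helper. The sketch below assumes hypothetical `implement`/`review` callbacks that would wrap the real `runPhase` calls. Teardown, Jira moves, and Slack notifications are left to the caller.

```typescript
const MAX_REVIEW_RETRIES = 2; // per the spec: 2 retries, i.e. 3 implementation attempts

type ImplResult = { result: "implemented" | "clarification_needed" | "failed"; error?: string };
type ReviewResult = { result: "approved" | "changes_requested" | "failed"; feedback: string };

// Hypothetical phase runners; in agentWorkflow these would wrap runPhase(...).
interface PhaseRunners {
  implement(previousReview?: ReviewResult): Promise<ImplResult>;
  review(): Promise<ReviewResult>;
}

type LoopOutcome =
  | { kind: "approved"; attempts: number }
  | { kind: "stopped"; reason: string; attempts: number };

export async function runImplementReviewLoop(phases: PhaseRunners): Promise<LoopOutcome> {
  let lastReview: ReviewResult | undefined;
  for (let attempt = 1; attempt <= MAX_REVIEW_RETRIES + 1; attempt++) {
    // First attempt gets no review; retries get the previous review's feedback.
    const impl = await phases.implement(lastReview);
    if (impl.result !== "implemented") {
      return { kind: "stopped", reason: `implementation ${impl.result}`, attempts: attempt };
    }
    const review = await phases.review();
    if (review.result === "approved") return { kind: "approved", attempts: attempt };
    if (review.result === "failed") {
      return { kind: "stopped", reason: "review failed", attempts: attempt };
    }
    lastReview = review; // changes_requested: loop back to Phase 2
  }
  return { kind: "stopped", reason: "review retries exhausted", attempts: MAX_REVIEW_RETRIES + 1 };
}
```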
- -The workflow changes from: - -``` -fetchTicket → createBranch → assembleContext → provision → startAgent → poll → collect → handle result → push → PR -``` - -To: - -``` -fetchTicket → createBranch → fetchPRContext (if PR exists) → provision sandbox (with mergeBase if PR exists) - → Phase 1: writeResearchInput (includes PR feedback if any) → startResearchAgent → poll → collect → check - → Phase 2: writeImplInput → configureStopHook(on) → startImplAgent → poll → collect → check - → Phase 3: captureGitDiff → writeReviewInput → configureStopHook(off) → startReviewAgent → poll → collect → check - → if changes_requested and retries < MAX: goto Phase 2 - → push → createOrUpdatePR → cleanup -``` - -### `src/workflows/review-fix.ts` — DELETED - -All review-fix logic is absorbed into `agentWorkflow`. The research agent handles PR feedback as part of its context. - -### `src/lib/dispatch.ts` — Simplified - -```typescript -// Before: branching between two workflows -const existingPR = await vcs.findPR(branchName); -const handle = existingPR - ? await start(reviewFixWorkflow, [ticket.id, branchName]) - : await start(implementationWorkflow, [ticket.id]); - -// After: always the same workflow -const handle = await start(agentWorkflow, [ticket.id]); -``` - -The workflow internally checks for existing PRs and fetches context as needed. - -### Key Implementation Details - -1. **Sandbox provisioned once** — `SandboxManager.provision()` called once at the start. If a PR exists with merge conflicts, `mergeBase` is passed to provision (same as review-fix did). The three phase scripts are written and executed sequentially within the same sandbox. - -2. **Phase execution is a reusable function** — extract a `runPhase(sandboxId, phaseConfig)` helper that handles: write input file → write wrapper script → start detached → poll → collect → parse output. This avoids duplicating the polling loop three times. - -3. 
**Pre-agent SHA recorded once** — `/tmp/.pre-agent-sha` is written during provisioning (before any agent runs). The push step compares against this to detect commits. - -4. **Git diff for review** — before Phase 3, the workflow runs `git diff ..HEAD` inside the sandbox (via `sandbox.runCommand`) and passes the output to the review context. - -5. **Stop hook toggling** — the workflow writes `~/.claude/settings.json` with/without the stop hook before each phase. Research and review phases get an empty hooks config; implementation gets the commit-guard hook. - -6. **Retry counter** — tracked as a workflow-level variable, not persisted to disk. Incremented when review returns `changes_requested`. - -7. **PR handling** — if a PR already exists, the workflow pushes to the same branch (force push, same as today) and does NOT create a new PR. If no PR exists, it creates one. - -### New Step Functions - -```typescript -// Generic phase runner — handles write input, start agent, poll, collect -async function runPhase( - sandboxId: string, - phase: PhaseConfig, -): Promise<{ raw: string }>; - -// Write phase input file to sandbox -async function writePhaseInput( - sandboxId: string, - inputFile: string, - content: string, -): Promise<void>; - -// Toggle stop hook on/off -async function configureStopHook( - sandboxId: string, - enabled: boolean, -): Promise<void>; - -// Capture git diff for review phase -async function captureGitDiff( - sandboxId: string, -): Promise<string>; - -// Fetch PR context (comments, checks, conflicts) — returns null if no PR exists -async function fetchPRContext( - branchName: string, -): Promise; -``` - -## Session Memory - -Session memory (`blazebot/memory/[TASK_ID].md`) behavior: - -- **Phase 1 (Research & Plan)**: Reads memory first (for context from prior runs), then writes updated memory with research findings and plan -- **Phase 2 (Implementation)**: Reads and updates session memory with implementation progress -- **Phase 3 (Review)**: Reads session memory for
context, writes review findings - -Each phase overwrites the memory file (same as today). The memory serves as additional context across phases and across workflow runs, but is NOT the primary handoff mechanism — the Phase 1 free-form output is the primary handoff to Phase 2 and 3. - -## File Changes Summary - -| File | Change | -|------|--------| -| `src/workflows/implementation.ts` | **Delete** — replaced by `agent.ts` | -| `src/workflows/review-fix.ts` | **Delete** — absorbed into `agent.ts` | -| `src/workflows/agent.ts` | **New** — unified three-phase `agentWorkflow` | -| `src/lib/dispatch.ts` | Simplify: remove workflow branching, always start `agentWorkflow` | -| `src/sandbox/wrapper-script.ts` | Parameterize: `buildPhaseScript(opts)` replacing `buildWrapperScript` | -| `src/sandbox/context.ts` | Add: `assembleResearchPlanContext`, rewrite `assembleImplementationContext`, add `assembleImplementationRetryContext`, add `assembleReviewContext`. Remove: `assembleFixingFeedbackContext` | -| `src/sandbox/agent-runner.ts` | Add: `ReviewOutput` schema + parser. Add: `parseResearchStatus()` (extracts STATUS line from free-form output) | -| `src/sandbox/manager.ts` | Refactor: extract stop-hook config, support phase-based execution | -| `src/sandbox/poll-agent.ts` | Generalize: `checkPhaseDone(sandboxId, sentinelFile)`, `collectPhaseOutput(sandboxId, outputFile)` | -| `src/lib/prompts.ts` | Add: `research-plan.md`, `review.md`. Rewrite: `implement.md`. 
Remove: `review-fix.md` | -| `src/sandbox/run-agent.ts` | Generalize: accept phase config instead of hardcoded paths | - -## Skills Installation - -The current `GLOBAL_SKILLS` in `manager.ts` installs: -- `using-superpowers` (from `superpowers` repo) -- `requesting-code-review` (from `superpowers` repo) -- `frontend-design` (from `anthropics/skills` repo) - -The `brainstorming` and `executing-plans` skills (required by Phase 1 and Phase 2) are part of the `superpowers` repo and are discoverable via the `using-superpowers` skill — they do NOT need separate installs. The phase prompts will explicitly instruct the agent to use them. - -No changes to `GLOBAL_SKILLS` are needed. - -## Non-Goals - -- **Parallel phase execution** — phases are sequential by design -- **Multiple sandboxes** — single sandbox for the entire flow -- **Agent-to-agent communication** — phases communicate only through files orchestrated by the workflow diff --git a/docs/superpowers/specs/2026-04-09-gitlab-vcs-adapter-design.md b/docs/superpowers/specs/2026-04-09-gitlab-vcs-adapter-design.md deleted file mode 100644 index 0410a04..0000000 --- a/docs/superpowers/specs/2026-04-09-gitlab-vcs-adapter-design.md +++ /dev/null @@ -1,195 +0,0 @@ -# GitLab VCS Adapter Design - -**Date:** 2026-04-09 -**Approach:** Direct Mirror (Approach A) - -## Goal - -Add a `GitLabAdapter` that implements the existing `VCSAdapter` interface, supporting all 8 methods currently provided by `GitHubAdapter`. The adapter targets GitLab.com only, uses `@gitbeaker/rest` as the API client, and fetches CI logs from GitLab CI/CD pipelines. 
- -## Decisions - -| Question | Decision | -|----------|----------| -| GitLab.com vs self-hosted | GitLab.com only | -| API client | `@gitbeaker/rest` | -| CI log source | GitLab CI pipelines only (no external commit statuses) | -| Architecture | Direct mirror — new file alongside `github.ts`, factory switch on `VCS_KIND` | - -## Files Changed - -| File | Change | -|------|--------| -| `src/adapters/vcs/gitlab.ts` | **New** — `GitLabAdapter` class (~250 lines) | -| `src/adapters/vcs/gitlab.test.ts` | **New** — unit tests with mocked gitbeaker (~150 lines) | -| `env.ts` | Add `"gitlab"` to `VCS_KIND` enum, add `GITLAB_*` env vars | -| `src/lib/adapters.ts` | Conditional VCS adapter creation based on `VCS_KIND` | -| `src/lib/step-adapters.ts` | Same conditional as `adapters.ts` | -| `package.json` | Add `@gitbeaker/rest` dependency | - -No changes to `types.ts`, `workflows/agent.ts`, `sandbox/poll-agent.ts`, or any other consumer — the `VCSAdapter` interface is unchanged. - -## GitLabAdapter Configuration - -```typescript -export interface GitLabConfig { - token: string; // GitLab personal access token (glpat-...) - projectId: string; // "owner/repo" path or numeric project ID - baseBranch: string; // Target branch for MRs (default: "main") -} -``` - -### Environment Variables - -``` -VCS_KIND=gitlab -GITLAB_TOKEN=glpat-xxxxxxxxxxxx -GITLAB_PROJECT_ID=blazity/demo-app -GITLAB_BASE_BRANCH=main -``` - -All `GITLAB_*` vars are optional at the schema level (only required when `VCS_KIND=gitlab`). All `GITHUB_*` vars become optional too (only required when `VCS_KIND=github`). 
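The factory later in this spec relies on `env.GITLAB_TOKEN!` to fail when a provider variable is missing. A TypeScript `!` non-null assertion, however, is erased at compile time and performs no runtime check. Below is a minimal sketch of an explicit guard that does throw. The helper name `requireEnv` and its error message are hypothetical.

```typescript
// Hypothetical helper: read a provider-specific variable and fail loudly if it is unset or blank.
export function requireEnv(
  source: Record<string, string | undefined>,
  name: string,
  context: string,
): string {
  const value = source[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`${name} is required when ${context}`);
  }
  return value;
}
```

In the factory this would read as `token: requireEnv(process.env, "GITLAB_TOKEN", "VCS_KIND=gitlab")`, so a misconfiguration surfaces when the adapter is built rather than on the first API call.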
- -## API Mapping - -Each `VCSAdapter` method maps to GitLab REST API equivalents via `@gitbeaker/rest`: - -### `createBranch(name, base)` - -| Step | GitHub (`@octokit/rest`) | GitLab (`@gitbeaker/rest`) | -|------|--------------------------|---------------------------| -| Get base ref | `git.getRef(heads/{base})` | Not needed — GitLab accepts branch name directly | -| Create branch | `git.createRef(refs/heads/{name}, sha)` | `Branches.create(projectId, name, base)` | -| Handle empty repo (409) | Seed README via `repos.createOrUpdateFileContents` | Seed README via `RepositoryFiles.create` | -| Handle existing branch (422/400) | `git.updateRef(force: true)` | `Branches.remove` + `Branches.create` | - -### `createPR(branch, title, body)` - -| Step | GitHub | GitLab | -|------|--------|--------| -| Create | `pulls.create(head, base, title, body)` | `MergeRequests.create(projectId, source, target, title, {description})` | -| Return value | `{id: data.number, url: data.html_url}` | `{id: mr.iid, url: mr.web_url}` | -| Fatal errors | 422, 404 | 409, 404 | - -**Note:** GitLab uses `iid` (project-scoped ID) not `id` (global ID). The `iid` is the MR number visible in the UI (e.g., `!42`), analogous to GitHub's PR number. - -### `push(branch, files, options?)` - -| Step | GitHub | GitLab | -|------|--------|--------| -| Push files | `getRef` → `getCommit` → `createBlob` (per file) → `createTree` → `createCommit` → `updateRef` | `Commits.create(projectId, branch, message, actions)` | - -GitLab's Commits API is significantly simpler — a single call replaces 5-6 GitHub API calls. Each file becomes an action: - -```typescript -const actions = files.map(f => ({ - action: "update" as const, // or "create" for new files - filePath: f.path, - content: f.content, -})); -``` - -**Merge commit handling:** When `mergeParentSha` is provided, the GitHub adapter creates a two-parent commit. GitLab's Commits API does not support multi-parent commits directly. 
Instead, we use the MergeRequests rebase API or handle conflict resolution at the MR level. For the initial implementation, we skip the merge-parent optimization and use a regular commit — the workflow's conflict resolution flow already recreates the branch from base when conflicts are detected. - -### `getBranchSha(branch)` - -| GitHub | GitLab | -|--------|--------| -| `git.getRef(heads/{branch})` → `data.object.sha` | `Branches.show(projectId, branch)` → `commit.id` | - -### `getPRComments(prId)` - -| Comment type | GitHub | GitLab | -|-------------|--------|--------| -| Review/inline comments | `pulls.listReviewComments` | `MergeRequestDiscussions.all` (filter for diff notes) | -| General comments | `issues.listComments` | `MergeRequestNotes.all` (filter for non-system notes) | -| Liked detection | `reactions.total_count > 0` | Note has award emoji (or simplified: skip, default `false`) | - -GitLab distinguishes between "notes" (general comments) and "discussions" (threaded diff comments). Both map to `PRComment[]`. - -For inline comments, GitLab notes include `position.new_path` and `position.new_line` which map to `filePath` and `startLine`/`endLine`. 
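The sketch below flattens GitLab notes into `PRComment` entries under the rules above. The note shape is a hand-written subset of GitLab's REST note object. Every `PRComment` field except `filePath`, `startLine`, `endLine`, and `liked` is an assumption, and `liked` stays `false`, as the out-of-scope list specifies.

```typescript
// Minimal subset of a GitLab note object (REST field names).
interface GitLabNote {
  id: number;
  body: string;
  system: boolean;
  author: { username: string };
  position?: { new_path?: string | null; new_line?: number | null };
}

// Hypothetical PRComment shape; only filePath/startLine/endLine/liked come from the spec.
interface PRComment {
  author: string;
  body: string;
  filePath?: string;
  startLine?: number;
  endLine?: number;
  liked: boolean;
}

export function notesToComments(notes: GitLabNote[]): PRComment[] {
  return notes
    .filter((n) => !n.system) // drop system notes such as "added 1 commit"
    .map((n) => {
      const comment: PRComment = { author: n.author.username, body: n.body, liked: false };
      const path = n.position?.new_path;
      const line = n.position?.new_line;
      if (path) comment.filePath = path;
      if (typeof line === "number") {
        // A diff note anchors to a single new-side line.
        comment.startLine = line;
        comment.endLine = line;
      }
      return comment;
    });
}
```

Discussions carry their notes in a `notes` array, so `discussions.flatMap((d) => d.notes)` could feed the same function.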
- -### `getCheckRunResults(prId)` - -| Step | GitHub | GitLab | -|------|--------|--------| -| Get head SHA | `pulls.get` → `head.sha` | `MergeRequests.show` → `sha` | -| List CI results | `checks.listForRef(sha)` | `MergeRequests.allPipelines` → `Jobs.all(pipelineId)` | -| Fetch failed logs | `actions.downloadJobLogsForWorkflowRun` | `Jobs.showLog(projectId, jobId)` | - -**Status mapping:** - -| GitLab job status | `CheckRunResult.status` | `CheckRunResult.conclusion` | -|-------------------|------------------------|-----------------------------| -| `success` | `"completed"` | `"success"` | -| `failed` | `"completed"` | `"failure"` | -| `running` | `"in_progress"` | `null` | -| `pending`, `created` | `"queued"` | `null` | -| `canceled` | `"completed"` | `"cancelled"` | -| `skipped` | `"completed"` | `"skipped"` | - -### `getPRConflictStatus(prId)` - -| GitHub | GitLab | -|--------|--------| -| `pulls.get` → `mergeable === false` | `MergeRequests.show` → `has_conflicts === true` | - -### `findPR(branch)` - -| GitHub | GitLab | -|--------|--------| -| `pulls.list({head: "owner:branch", state: "open"})` | `MergeRequests.all({projectId, sourceBranch: branch, state: "opened"})` | - -## Error Handling - -Follow the same pattern as `GitHubAdapter`: - -- **Fatal (non-retryable):** 404 (project not found), 409 (MR already exists for this branch pair). Throw `FatalError`. -- **Transient (retryable):** 401 (token expired), 403 (rate limit), 429 (too many requests), 5xx. Let the error propagate for workflow retry. -- **Branch conflicts:** 400 on branch create → delete + recreate. - -## Testing Strategy - -Mirror `github.test.ts` structure: - -1. Mock `@gitbeaker/rest` with `vi.mock` -2. Test all 8 methods with happy path -3. Test error handling: empty repo seed (createBranch), existing branch reset, fatal errors on createPR -4. Test status mapping for CI jobs - -Target: ~8-10 test cases matching the GitHub adapter's coverage plus GitLab-specific edge cases (status mapping). 
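The job-status table above maps directly onto a lookup. The fallback in this sketch is an assumption: statuses the table does not list are treated as not yet finished (GitLab also reports states such as `manual` and `scheduled`). Note the spelling change from GitLab's `canceled` to the check-run conclusion `cancelled`.

```typescript
type CheckStatus = "queued" | "in_progress" | "completed";
type CheckConclusion = "success" | "failure" | "cancelled" | "skipped" | null;

interface MappedCheck {
  status: CheckStatus;
  conclusion: CheckConclusion;
}

// One entry per row of the mapping table.
const JOB_STATUS_MAP = new Map<string, MappedCheck>([
  ["success", { status: "completed", conclusion: "success" }],
  ["failed", { status: "completed", conclusion: "failure" }],
  ["running", { status: "in_progress", conclusion: null }],
  ["pending", { status: "queued", conclusion: null }],
  ["created", { status: "queued", conclusion: null }],
  ["canceled", { status: "completed", conclusion: "cancelled" }],
  ["skipped", { status: "completed", conclusion: "skipped" }],
]);

export function mapGitLabJobStatus(jobStatus: string): MappedCheck {
  // Assumption: unlisted statuses are reported as queued with no conclusion.
  return JOB_STATUS_MAP.get(jobStatus) ?? { status: "queued", conclusion: null };
}
```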
- -## Factory Update - -Both `adapters.ts` and `step-adapters.ts` get a `createVCS()` helper: - -```typescript -function createVCS(): VCSAdapter { - if (env.VCS_KIND === "gitlab") { - return new GitLabAdapter({ - token: env.GITLAB_TOKEN!, - projectId: env.GITLAB_PROJECT_ID!, - baseBranch: env.GITLAB_BASE_BRANCH ?? "main", - }); - } - return new GitHubAdapter({ - token: env.GITHUB_TOKEN!, - owner: env.GITHUB_OWNER!, - repo: env.GITHUB_REPO!, - baseBranch: env.GITHUB_BASE_BRANCH ?? "main", - }); -} -``` - -## Env Validation - -The `env.ts` schema makes all provider-specific vars optional. Runtime validation is meant to happen in the factory, but the `!` non-null assertion is erased at compile time and checks nothing: if `VCS_KIND=gitlab` and `GITLAB_TOKEN` is missing, `undefined` reaches the adapter and the failure surfaces wherever the token is first used, not at the assertion. This matches how the project handles other conditional adapters. - -A future improvement could add Zod `.refine()` for cross-field validation, but that's out of scope. - -## Out of Scope - -- Self-hosted GitLab support (configurable base URL) -- GitLab-specific features beyond VCSAdapter (e.g., GitLab-specific CI features) -- Merge commit with multiple parents via Commits API (use branch reset flow instead) -- Award emoji counting for `liked` field (default to `false` initially) diff --git a/docs/superpowers/specs/2026-04-13-jira-ticket-attachments-design.md b/docs/superpowers/specs/2026-04-13-jira-ticket-attachments-design.md deleted file mode 100644 index 17a6701..0000000 --- a/docs/superpowers/specs/2026-04-13-jira-ticket-attachments-design.md +++ /dev/null @@ -1,232 +0,0 @@ -# Jira Ticket Attachments in the Sandbox — Design - -**Date:** 2026-04-13 -**Status:** Design approved, ready for implementation plan - -## Problem - -Today, the agent receives a Jira ticket as a `requirements.md` blob containing only text: title, description, acceptance criteria, comments. Any files attached to the ticket in Jira (mockups, PDFs of specs, sample JSON fixtures, screenshots) are invisible to the agent.
- -We want the agent to have access to those files inside the sandbox so it can read them during research, implementation, and review. - -## Scope - -**In scope** -- Jira attachments (files uploaded to the issue via the Jira UI). -- All three phases on the same sandbox run (research, implement, review) see the same attachments. -- Best-effort delivery: a single broken attachment does not fail the workflow. - -**Out of scope (v1)** -- Following external URLs found in the ticket description/comments. Links stay as text in `requirements.md` for the agent to decide whether to fetch manually inside the sandbox. -- Cross-ticket attachment reuse ("knowledge pack" of files that outlive a single ticket). -- Attachment previews in Slack notifications. -- Content-hash dedup across tickets. - -## Key decisions - -1. **Jira attachments only.** URL-following was considered and rejected — it introduces SSRF risk, auth headaches (OAuth for Figma/Drive), unpredictable size/content, and is not needed for the v1 use case. -2. **Stage into `/tmp/attachments/` in the sandbox, not into the repo.** `requirements.md` already lives in `/tmp/`, outside the cloned repo. Placing attachments alongside means they are never in `git diff` and therefore never accidentally committed or pushed. No `.gitignore` plumbing needed. -3. **One sandbox per workflow, one staging pass.** The sandbox is provisioned once per ticket (`src/workflows/agent.ts:256`) and reused across all phases. Attachments are fetched once at workflow start and written once after `provisionSandbox`. -4. **Generated index in `requirements.md`.** The ticket description does not always reference attachments by name (someone may just drag a PNG onto the ticket). A short index at the top of `requirements.md` guarantees the agent knows what exists and where to find it. -5. **Per-file retries with skip-on-failure.** A broken attachment is logged, marked in the index, and does not block the workflow. 
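Decision 5 can be expressed as a retry wrapper that never throws, so one bad file becomes a `failed` record instead of a workflow failure. In this sketch the attempt count, the exponential backoff schedule, and the name `downloadWithRetries` are all assumptions.

```typescript
export type RetryOutcome<T> =
  | { ok: true; value: T }
  | { ok: false; failed: { reason: string; attempts: number } };

// Hypothetical wrapper: retries `download` and reports failure as data instead of throwing.
export async function downloadWithRetries<T>(
  download: () => Promise<T>,
  maxAttempts = 3, // assumption
  baseDelayMs = 500, // assumption: 500 ms, 1 s, 2 s, ...
): Promise<RetryOutcome<T>> {
  let lastError = "unknown error";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { ok: true, value: await download() };
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
      if (attempt < maxAttempts && baseDelayMs > 0) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
      }
    }
  }
  return { ok: false, failed: { reason: lastError, attempts: maxAttempts } };
}
```

A stricter version might skip retries on 4xx responses such as 404, which will not succeed on a second try.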
-
-## Architecture
-
-```text
-Workflow start (src/workflows/agent.ts)
-  ├─ fetchAndValidateTicket (existing)
-  ├─ fetchAttachments (NEW step — downloads bytes from Jira)
-  ├─ createFeatureBranch (existing)
-  ├─ provisionSandbox (existing — one sandbox for the whole workflow)
-  ├─ writeAttachments (NEW step — writeFiles to /tmp/attachments/)
-  │
-  ├─ Phase 1: Research (requirements.md contains the attachments index)
-  ├─ Phase 2: Implement (same index, same files on disk)
-  ├─ Phase 3: Review (same)
-  │
-  └─ teardownSandbox (existing — kills sandbox, attachments die with it)
-```
-
-Attachments live at `/tmp/attachments/{sanitized-filename}` for the full workflow lifetime.
-
-## Components
-
-### 1. `JiraAdapter` — metadata + download
-
-**File:** `src/adapters/issue-tracker/jira.ts`
-
-Changes:
-- Add `attachment` to the `fields=` query in `fetchTicket`.
-- Parse `data.fields.attachment` into a new `attachments: JiraAttachmentMeta[]` field on `TicketContent`.
-- New method `downloadAttachment(url: string): Promise`:
-  - GET with `redirect: "manual"`. On 302, read `Location` and re-GET **without** the `Authorization` header (Atlassian's CDN uses signed URLs; re-sending Basic auth breaks them).
-  - Timeout: 30s (AbortSignal).
-  - Max redirects: 1.
-
-**New type:** `TicketAttachment` added to `src/adapters/issue-tracker/types.ts`:
-
-```ts
-export interface TicketAttachment {
-  id: string;
-  filename: string;
-  mimeType: string;
-  size: number;
-  contentUrl: string;
-}
-```
-
-Added to `TicketContent`:
-
-```ts
-export interface TicketContent {
-  // ...existing fields
-  attachments: TicketAttachment[];
-}
-```
-
-### 2. Workflow step — `fetchAttachments`
-
-**File:** `src/workflows/agent.ts` (new step) and a helper in `src/sandbox/attachments.ts` (new file).
-
-Signature:
-
-```ts
-async function fetchAttachments(
-  attachments: TicketAttachment[]
-): Promise
-```
-
-`DownloadedAttachment`:
-
-```ts
-interface DownloadedAttachment {
-  filename: string; // sanitized, collision-resolved
-  originalFilename: string;
-  mimeType: string;
-  size: number;
-  content: Buffer; // present only on success
-  failed?: { reason: string; attempts: number }; // present only on failure
-}
-```
-
-Behavior:
-- Iterate attachments in Jira-returned order.
-- Enforce caps (see "Safety caps" below) before calling download.
-- Call `JiraAdapter.downloadAttachment(url)` with a per-file retry loop (see "Retries").
-- Sanitize filename: strip path separators (`/`, `\`), null bytes, leading dots; fall back to `attachment-{id}{ext}` if result is empty.
-- Collision handling: if the sanitized filename already exists in the accumulator, append `-{id}` before the extension.
-- On download failure after retries, include a `failed` entry (no `content`) so the index can reflect it.
-
-### 3. Workflow step — `writeAttachments`
-
-**File:** `src/workflows/agent.ts` (new step).
-
-```ts
-async function writeAttachments(
-  sandboxId: string,
-  attachments: DownloadedAttachment[]
-): Promise
-```
-
-- `Sandbox.get({ sandboxId })` then `sandbox.writeFiles(...)` for every entry with `content` defined.
-- Path: `/tmp/attachments/{filename}`.
-- Skip failed entries (no bytes to write).
-
-### 4. `context.ts` — attachments index
-
-**File:** `src/sandbox/context.ts`
-
-Add an `attachments?: DownloadedAttachment[]` parameter to all four `assembleXContext` functions (`assembleResearchPlanContext`, `assembleImplementationContext`, `assembleImplementationRetryContext`, `assembleReviewContext`).
-
-New helper `formatAttachmentsIndex(attachments)`:
-
-```md
-## Attachments
-
-The following files from the Jira ticket are available in `/tmp/attachments/`.
-Read them when relevant to the task.
-
-- `/tmp/attachments/mockup.png` — image/png, 340 KB
-- `/tmp/attachments/api-sample.json` — application/json, 2 KB
-- ⚠️ `spec.pdf` — failed to download after 3 attempts (HTTP 500)
-```
-
-- Section inserted **once**, right after the `## Ticket ID` / `## Ticket` header block and before `## Description`.
-- Omitted entirely only when the ticket had **zero attachments** in Jira. If attachments existed but all failed to download, the section still appears with every entry marked as failed.
-- Human-readable size (`340 KB`, `1.2 MB`) via a small `formatBytes` helper.
-
-### 5. `src/sandbox/attachments.ts` (new file)
-
-Exports:
-- `fetchAttachmentsWithRetry(jiraAdapter, attachments, caps, logger)` — the core loop used by the workflow step.
-- `sanitizeFilename(name, id)` — pure utility.
-- `formatAttachmentsIndex(attachments)` — pure formatter.
-- `formatBytes(n)` — pure utility.
-
-Kept out of `jira.ts` because retry/caps/sanitize logic is Blazebot-specific, not part of the adapter contract.
-
-## Safety caps
-
-Env-configurable with sane defaults. Declared in `env.ts`:
-
-| Variable | Default | Meaning |
-|----------|---------|---------|
-| `ATTACHMENT_MAX_FILE_SIZE_MB` | 25 | Per-file cap. Oversize files are skipped and noted in the index. |
-| `ATTACHMENT_MAX_TOTAL_SIZE_MB` | 100 | Cumulative cap. Once exceeded, remaining attachments are skipped. |
-| `ATTACHMENT_MAX_COUNT` | 20 | Hard cap on number of attachments. |
-| `ATTACHMENT_DOWNLOAD_TIMEOUT_MS` | 30000 | Per-download timeout. |
-
-All caps are applied **before** downloading — cap decisions use the metadata `size` field returned by Jira, so we never fetch bytes we'll throw away.
-
-## Retries
-
-Two layers:
-
-1. **WDK step-level retries are disabled.** `fetchAttachments.maxRetries = 0` and `writeAttachments.maxRetries = 0`.
-2. **Per-file retry loop (inside the step).** Implemented in `fetchAttachmentsWithRetry` (called by `fetchAttachments`):
-   - Max 3 attempts.
-   - Exponential backoff: 500ms → 2000ms → 5000ms.
-   - Retryable errors: network errors (`ECONNRESET`, `ETIMEDOUT`, `AbortError`), HTTP 5xx, HTTP 429 (honors `Retry-After` if present, capped at 10s).
-   - Non-retryable: 4xx other than 429 (401/403/404 typically mean auth/missing, not transient).
-   - After max attempts: mark the file as failed in the returned array. Do **not** throw from `fetchAttachments` — other attachments and the workflow continue.
-
-## Observability
-
-- `pino` logs at `info` for each successfully downloaded attachment: `{ ticketId, filename, mimeType, size, attempts }`.
-- `pino` logs at `warn` for each failed or skipped attachment with reason: `{ ticketId, filename, reason, attempts? }`.
-- Slack notification text unchanged in v1. (Future: add attachment count to the "started" message.)
-
-## Testing
-
-Unit:
-- `JiraAdapter.fetchTicket` parses `attachment` field into `TicketAttachment[]` correctly (including empty array when absent).
-- `JiraAdapter.downloadAttachment` follows one 302 and drops `Authorization` on the redirect.
-- `sanitizeFilename` — path separators, null bytes, empty-after-sanitize fallback, extension preservation.
-- `formatAttachmentsIndex` — happy path, all-failed path, empty path (omitted), mixed.
-- `formatBytes` — KB/MB rounding.
-- `fetchAttachmentsWithRetry` — enforces size/total/count caps without downloading; retries transient errors; gives up on 404; surfaces `failed` entries after exhausting attempts.
-- `assembleResearchPlanContext` / implementation / retry / review — emit index when attachments present; omit section when empty.
-
-Integration:
-- End-to-end with a `fetch`-mocked Jira returning 2 attachments (one image, one JSON) → `writeAttachments` called with both → sandbox receives both at expected paths. (Uses existing `@vercel/sandbox` test patterns from `manager.test.ts`.)
-
-## Failure modes and how we handle them
-
-| Failure | Behavior |
-|---------|----------|
-| Jira metadata fetch fails | Existing `fetchAndValidateTicket` step retry handles it (unchanged path). |
-| One file 500s | Retry 3×, then mark failed in index, continue. |
-| One file 404s | No retry, mark failed in index, continue. |
-| File exceeds `ATTACHMENT_MAX_FILE_SIZE_MB` | Skip, mark in index, continue. |
-| Total bytes exceeds `ATTACHMENT_MAX_TOTAL_SIZE_MB` | Skip remaining, mark in index, continue. |
-| Count exceeds `ATTACHMENT_MAX_COUNT` | Skip overflow, mark in index, continue. |
-| All downloads fail | Step still returns an array (all with `failed` set). Index shows all as failed. Workflow continues. |
-| `writeAttachments` fails | WDK step retry. If still failing, workflow fails — this is the correct behavior (sandbox is broken). |
-
-## Migration
-
-No data migration. New steps are additive. Existing tickets without attachments simply get an empty `attachments` array and no index section.
-
-## Open questions
-
-None at design time. All caps are env-configurable so they can be tuned without code changes after v1 ships.
diff --git a/docs/superpowers/specs/2026-04-27-codex-integration-design.md b/docs/superpowers/specs/2026-04-27-codex-integration-design.md
deleted file mode 100644
index a4f0dfc..0000000
--- a/docs/superpowers/specs/2026-04-27-codex-integration-design.md
+++ /dev/null
@@ -1,423 +0,0 @@
-# Codex Integration — Design
-
-**Date:** 2026-04-27
-**Status:** Draft
-**Branch:** AIW-1-codex
-
-## Goal
-
-Add OpenAI's Codex CLI (`@openai/codex`) as a second agent runtime alongside Claude Code. Operators choose at deploy time via `AGENT_KIND=claude|codex`. Both agents reach full feature parity for the existing three-phase workflow (research → impl → review): same skills, same commit-guard, same Arthur tracing, same structured output, same usage reporting. The change introduces a thin `AgentAdapter` abstraction and refactors the sandbox layer to use it; everything else (workflow orchestration, VCS, issue tracker, messaging, run registry, reconcile, dispatch) is untouched.
-
-## Decisions
-
-| Question | Decision |
-|----------|----------|
-| Replace Claude or add Codex alongside? | **Add alongside**, env-switched |
-| Switching mechanism | `AGENT_KIND=claude\|codex` env var (single, deploy-scoped) |
-| Gap-fill strategy | **Full parity** — skills, hooks, structured output, tracing all reach Codex |
-| Codex variant | `@openai/codex` CLI |
-| Phase parity | Same three phases for both agents |
-| Default Codex model | `gpt-5-codex` |
-| Architecture | Thin `AgentAdapter` interface in `src/sandbox/agents/` |
-| Skills location | `~/.agents/skills/` only — never in the repo |
-| Pricing | Fetched dynamically from LiteLLM's maintained JSON; tokens-only fallback |
-
-## Architecture
-
-A new `AgentAdapter` interface owns everything CLI-specific. `SandboxManager` becomes thin and orchestrator-only. Workflow code (`src/workflows/agent.ts`) threads the adapter through phase steps but otherwise keeps its current shape.
-
-```ts
-// src/sandbox/agents/types.ts
-export type PhaseKind = "research" | "impl" | "review";
-
-export interface AgentAdapter {
-  kind: "claude" | "codex";
-  install(sandbox: RunnableSandbox): Promise;
-  configure(sandbox: RunnableSandbox, opts: ConfigureOpts): Promise;
-  setCommitGuard(sandbox: RunnableSandbox, enabled: boolean): Promise;
-  buildPhaseScript(opts: PhaseScriptOpts): string;
-  artifactPaths(phase: PhaseKind): {
-    wrapper: string;
-    input: string;
-    stdout: string;
-    stderr: string;
-    sentinel: string;
-    /** Schema-validated JSON file (Codex --output-schema). null for Claude. */
-    structuredOutput: string | null;
-  };
-  parseAgentOutput(raw: string, structured: string | null): AgentOutput;
-  parseReviewOutput(raw: string, structured: string | null): ReviewOutput;
-  parseResearchStatus(raw: string, structured: string | null): ResearchResult;
-  extractUsage(raw: string, structured: string | null): PhaseUsage | null;
-}
-```
-
-The Claude adapter ignores the `structured` argument in every parser — Claude embeds its schema-validated output directly in the NDJSON stream, so `paths.structuredOutput` is `null` and only `raw` matters. The Codex adapter prefers `structured` when present and falls back to `raw` (NDJSON `item.completed` scan) when the schema file is missing. The unified signature lets the workflow stay agent-agnostic.
-
-`createAgentAdapter(env)` picks the implementation at startup based on `AGENT_KIND`. Required credentials are validated by `env.ts` cross-field rules — if `AGENT_KIND=codex` is set without `CODEX_API_KEY` (or `CODEX_CHATGPT_OAUTH_TOKEN`), the server fails fast at startup.
-
-## File Layout
-
-```text
-src/sandbox/
-  agents/
-    types.ts # AgentAdapter interface, shared types
-    claude.ts # Existing Claude logic, refactored into the adapter
-    codex.ts # New: codex exec wrapper, hooks.json, --output-schema parsing
-    shared.ts # GLOBAL_SKILLS, commit-guard script body, hook helpers
-    pricing.ts # fetchModelPrice(model) — LiteLLM-backed, TTL-cached
-    index.ts # createAgentAdapter(env) factory
-    claude.test.ts
-    codex.test.ts
-    pricing.test.ts
-    index.test.ts
-  manager.ts # Slimmed: provision() calls agent.install + agent.configure
-  poll-agent.ts # Adds collectPhase helper (raw + structured)
-  context.ts # Unchanged
-  attachments.ts # Unchanged
-  usage.ts # PhaseUsage type moves to agents/types.ts; formatUsageReport accepts a price lookup
-```
-
-Files **deleted** as part of the refactor:
-- `src/sandbox/wrapper-script.ts` — body moves into each adapter's `buildPhaseScript`
-- `src/sandbox/agent-runner.ts` — schema constants and parsers move into the adapters; if the file ends up empty it is deleted
-
-Functions **replaced** as part of the refactor:
-- `configureStopHookInSandbox` (currently in `manager.ts`) → becomes `agent.setCommitGuard(sandbox, enabled)` on the adapter. The standalone export is removed; `workflows/agent.ts` calls it through the adapter.
-- `installArthurTracer` (currently in `manager.ts`, Claude-shaped) → moves into each adapter's `configure()` step. Claude installs to `~/.claude/`; Codex installs to `~/.codex/` with a Codex-shaped `hooks.json`.
-- The free `buildPhaseScript` import in `workflows/agent.ts` → becomes `agent.buildPhaseScript(...)` calls.
-
-Files **untouched**: `src/adapters/**`, `src/lib/**`, `src/routes/**`, `src/workflows/prompts-step.ts`, `src/workflows/prompts-step.test.ts`, all of the issue-tracker / VCS / messaging / run-registry code.
-
-## Data Flow per Ticket (Codex)
-
-```text
-1. Cron poll → dispatch (unchanged)
-2. agentWorkflow(ticketId)
-   a. fetchAndValidateTicket / fetchPRContext / fetchAttachments / ensureArthurTaskForTicket (unchanged)
-   b. provisionSandbox:
-      - SandboxManager.provision(branch, mergeBase) clones the repo (unchanged)
-      - Constructs the adapter via createAgentAdapter(env) → CodexAgentAdapter
-      - agent.install(sandbox) → npm i -g @openai/codex
-      - agent.configure(sandbox, { auth, model, arthur, arthurTaskId }):
-        · Writes /tmp/agent-env.sh exporting CODEX_API_KEY (or OAuth token)
-        · Writes ~/.codex/config.toml (model, sandbox profile, fallback file names)
-        · Installs the global skill set into ~/.agents/skills/
-        · Writes ~/.codex/hooks.json with Arthur PreToolUse/PostToolUse/UserPromptSubmit/Stop entries
-        · Drops ~/.codex/hooks/commit-guard.sh on disk (Codex-flavored JSON output)
-   c. registerTicketSandbox (unchanged)
-3. PHASE 1 (Research):
-   - agent.setCommitGuard(sandbox, false)
-   - paths = agent.artifactPaths("research")
-   - script = agent.buildPhaseScript({ phase: "research", model, ...paths })
-   - writeAndStartPhase(sandboxId, paths.input, researchInput, paths.wrapper, script)
-   - pollUntilDone(sandboxId, paths.sentinel, 20)
-   - { raw, structured } = collectPhase(sandboxId, paths)
-   - phaseUsages.Research = agent.extractUsage(raw, structured)
-   - research = agent.parseResearchStatus(raw, structured)
-4. PHASE 2 (Impl):
-   - agent.setCommitGuard(sandbox, true)
-   - script = agent.buildPhaseScript({ phase: "impl", ..., jsonSchema: AGENT_SCHEMA })
-   - same write/poll/collect flow
-   - implOutput = agent.parseAgentOutput(raw, structured)
-5. PHASE 3 (Review): same wiring as Phase 2 (currently disabled in workflow)
-6. Push + PR (unchanged)
-7. Teardown (unchanged)
-```
-
-The Claude flow is the same with `paths.structuredOutput === null` and the existing parsers; behavior is bit-compatible with what Blazebot does today.
-
-## Auth, Models, Env Config
-
-**New env vars (added to `env.ts`):**
-
-```ts
-AGENT_KIND: z.enum(["claude", "codex"]).default("claude"),
-
-// Codex auth — at least one required when AGENT_KIND=codex.
-CODEX_API_KEY: z.string().min(1).optional(),
-CODEX_CHATGPT_OAUTH_TOKEN: z.string().min(1).optional(),
-
-// Codex model selection.
-CODEX_MODEL: z.string().default("gpt-5-codex"),
-
-// Pricing — LiteLLM's community-maintained JSON. Operators in airgapped
-// environments override; default works for the common case.
-CODEX_PRICING_URL: z.string().url().default(
-  "https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json"
-),
-CODEX_PRICING_TTL_MS: z.coerce.number().int().positive().default(3_600_000),
-```
-
-**Cross-field validation** (next to the existing `VCS_KIND` check in `env.ts`):
-
-```ts
-if (env.AGENT_KIND === "codex" && !env.CODEX_API_KEY && !env.CODEX_CHATGPT_OAUTH_TOKEN) {
-  throw new Error("AGENT_KIND=codex requires CODEX_API_KEY or CODEX_CHATGPT_OAUTH_TOKEN");
-}
-if (env.AGENT_KIND === "claude" && !env.ANTHROPIC_API_KEY && !env.CLAUDE_CODE_OAUTH_TOKEN) {
-  throw new Error("AGENT_KIND=claude requires ANTHROPIC_API_KEY or CLAUDE_CODE_OAUTH_TOKEN");
-}
-```
-
-Existing `CLAUDE_MODEL` / `ANTHROPIC_API_KEY` / `CLAUDE_CODE_OAUTH_TOKEN` keep their Claude-specific names — they are a public contract for current operators. We do not rename to a generic `AGENT_MODEL` / `AGENT_API_KEY`; agent-scoped names are clearer.
-
-**`.env.example` additions:**
-
-```bash
-# Agent (claude | codex)
-AGENT_KIND=claude
-
-# Codex (only when AGENT_KIND=codex)
-CODEX_API_KEY=
-CODEX_CHATGPT_OAUTH_TOKEN= # alternative to CODEX_API_KEY
-CODEX_MODEL=gpt-5-codex
-```
-
-**README** gets a short "Agent" subsection covering: how to switch, which envs are required for each kind, and a pointer to the LiteLLM JSON for the model pricing data we use.
-
-## Skills Strategy
-
-**Single source of truth: `~/.agents/skills/` inside the sandbox.** Both agents read from there. We never write to or read from the repo's own `.agents/skills/` directory.
-
-**Adapter steps in `configure()`:**
-
-- Both adapters call a shared helper `installSkillsToAgentsDir(sandbox)` which runs `npx -y skills add --skill --yes --target ~/.agents/skills` for each entry in `GLOBAL_SKILLS`.
-- The Claude adapter additionally creates the symlink `~/.claude/skills → ~/.agents/skills` so Claude's auto-discovery finds the same content.
-- The Codex adapter does nothing extra — `~/.agents/skills/` is its native user-scope path.
-
-**`shared.ts` exports:**
-
-```ts
-export const GLOBAL_SKILLS = [
-  { repo: "https://github.com/obra/superpowers", skill: "using-superpowers" },
-  { repo: "https://github.com/obra/superpowers", skill: "requesting-code-review" },
-  { repo: "https://github.com/anthropics/skills", skill: "frontend-design" },
-] as const;
-
-export async function installSkillsToAgentsDir(sandbox: RunnableSandbox): Promise;
-```
-
-The skill-frontmatter format (`name`, `description`) is identical for both agents. Existing prompts in `src/lib/prompts.ts` reference skills by name only and never mention agent-specific paths or Claude/Codex by name — they work as-is.
-
-**Verification before shipping:** confirm the `skills` CLI accepts the `--target` flag against the version Blazebot installs. If not, fall back to installing into `~/.claude/skills/` and symlinking `~/.agents/skills → ~/.claude/skills` for Codex. Same outcome either direction.
-
-## Hooks: Commit-guard + Arthur Tracing for Codex
-
-Codex's hook system is shaped almost identically to Claude's: same event names (`PreToolUse`, `PostToolUse`, `UserPromptSubmit`, `Stop`), same `command`-type entries, JSON in/out protocol. The differences:
-
-| | Claude | Codex |
-|---|---|---|
-| Config file | `~/.claude/settings.json` | `~/.codex/hooks.json` |
-| Stop signal | `{"decision":"block","reason":"..."}` to stderr + exit 2 | `{"continue":false,"stopReason":"..."}` to stdout + exit 0 |
-| Continue signal | exit 0 / no output | `{"continue":true}` or exit 0 / no output |
-
-**Commit-guard for Codex:**
-
-```bash
-# ~/.codex/hooks/commit-guard.sh
-#!/bin/bash
-input=$(cat)
-if echo "$input" | grep -q '"already_blocked":true'; then echo '{"continue": true}'; exit 0; fi
-changes=$(git status --porcelain | grep -v '^.. \.codex/' | grep -v '^?? \.codex/')
-if [ -n "$changes" ]; then
-  printf '{"continue": false, "stopReason": "You have uncommitted changes. Commit them with a descriptive message or revert before stopping."}\n'
-  exit 0
-fi
-echo '{"continue": true}'
-```
-
-Registered in `~/.codex/hooks.json` under `Stop`. `agent.setCommitGuard(sandbox, true|false)` upserts/removes the entry — keyed on the script path so other tools' hooks (Arthur) are not disturbed.
-
-**Phase toggle semantics (same as Claude):**
-- Off during research (must allow exit without commits)
-- On during impl + review (forces the agent to commit before claiming done)
-
-**Arthur tracing:** the existing tracer Python script and config file are agent-agnostic. The Codex adapter installs the same `~/.claude/hooks/claude_code_tracer.py` content (renamed to `~/.codex/hooks/claude_code_tracer.py`) plus the same `arthur_config.json`, then writes Codex-format hook entries pointing at `python3 "$HOME/.codex/hooks/claude_code_tracer.py" ` for `UserPromptSubmit` / `PreToolUse` / `PostToolUse` / `Stop`. The Arthur OTLP/HTTP exporter doesn't care which CLI emitted the events.
-
-## Output Parsing — Codex Specifics
-
-**Two artifacts per phase:**
-
-1. **`/tmp/-stdout.txt`** — NDJSON event stream from `codex exec --json`. One JSON object per line. Relevant types: `thread.started`, `turn.started`, `turn.completed` (carries `usage`), `item.completed` (carries assistant text), `error`.
-2. **`/tmp/-result.json`** — final assistant message, schema-validated when `--output-schema` is supplied. Written by `-o`. For research (no schema) this contains free-form markdown with the `STATUS:` line on top; for impl/review it contains the JSON object matching `AGENT_SCHEMA` / `REVIEW_SCHEMA`.
-
-**Codex `buildPhaseScript`:** the function returns a bash script string (same shape as today's `buildPhaseScript`). Two variants depending on whether `jsonSchema` is supplied:
-
-Research phase (no schema, free-form markdown output):
-
-```bash
-#!/bin/bash
-rm -f /tmp/research-done /tmp/research-stdout.txt /tmp/research-stderr.txt /tmp/research-result.json
-[ -f /tmp/agent-env.sh ] && source /tmp/agent-env.sh
-
-cat /tmp/research-requirements.md | codex exec \
-  --model "${model}" \
-  --full-auto \
-  --skip-git-repo-check \
-  --json \
-  -o /tmp/research-result.json \
-  - \
-  > /tmp/research-stdout.txt 2> /tmp/research-stderr.txt; echo $? > /tmp/research-exit-code || true
-
-cd /vercel/sandbox
-rm -rf .codex/
-git checkout -- .codex/ 2>/dev/null || true
-touch /tmp/research-done
-```
-
-Impl/review phase (schema-validated JSON output):
-
-```bash
-#!/bin/bash
-rm -f /tmp/impl-done /tmp/impl-stdout.txt /tmp/impl-stderr.txt /tmp/impl-result.json
-[ -f /tmp/agent-env.sh ] && source /tmp/agent-env.sh
-
-cat > /tmp/impl-schema.json << 'SCHEMA_EOF'
-${jsonSchema}
-SCHEMA_EOF
-
-cat /tmp/impl-requirements.md | codex exec \
-  --model "${model}" \
-  --full-auto \
-  --skip-git-repo-check \
-  --json \
-  --output-schema /tmp/impl-schema.json \
-  -o /tmp/impl-result.json \
-  - \
-  > /tmp/impl-stdout.txt 2> /tmp/impl-stderr.txt; echo $? > /tmp/impl-exit-code || true
-
-cd /vercel/sandbox
-rm -rf .codex/
-git checkout -- .codex/ 2>/dev/null || true
-touch /tmp/impl-done
-```
-
-The schema heredoc uses `'SCHEMA_EOF'` (quoted) so the body is not subject to shell expansion. Schema content with embedded single quotes is escaped at TS-template level — same approach as today's Claude wrapper for `--json-schema`.
-
-`--full-auto` is the documented happy path for non-interactive automation: upgrades to `workspace-write` and grants approval-less execution. We do **not** use `--yolo` / `--dangerously-bypass-approvals-and-sandbox`. `--skip-git-repo-check` is defensive (the sandbox can be in MERGING state during review-fix). `--ephemeral` is **not** used — session files help debug failed runs from a still-running sandbox before teardown.
-
-**Parsers in `codex.ts`:**
-
-```ts
-parseAgentOutput(raw, structured) {
-  if (structured) {
-    try { const parsed = agentOutputSchema.safeParse(JSON.parse(structured));
-          if (parsed.success) return parsed.data; } catch {}
-  }
-  return scanItemCompletedAsAgentOutput(raw)
-    ?? { result: "failed", error: `Codex output unparseable. First 500: ${raw.slice(0, 500)}` };
-}
-
-parseReviewOutput(raw, structured) { /* same shape, reviewOutputSchema */ }
-
-parseResearchStatus(raw, structured) {
-  // Research has no schema. Prefer structured (the -o file holds the assistant message).
-  // Fallback: scan NDJSON for the last item.completed text.
-  const text = structured ?? unwrapLastItemCompleted(raw);
-  return parseStatusLine(text);
-}
-
-extractUsage(raw, _structured) {
-  // Walk NDJSON in reverse for type === "turn.completed"; sum usage across turns.
-  // Returns { cost_usd: null, tokens: { input, cached_input, output }, duration_ms, num_turns }
-}
-```
-
-**`collectPhase` helper** (new, in `poll-agent.ts`):
-
-```ts
-export async function collectPhase(
-  sandboxId: string,
-  paths: { stdout: string; stderr: string; structuredOutput: string | null },
-): Promise<{ raw: string; structured: string | null }>;
-```
-
-Reads stdout (with stderr fallback when stdout is empty, mirroring existing `collectPhaseOutput`), reads `structuredOutput` if non-null. Workflow swaps `collectPhaseOutput` calls for this.
-
-## Pricing
-
-**`PhaseUsage` shape — agent-agnostic:**
-
-```ts
-export interface PhaseUsage {
-  cost_usd: number | null; // populated by Claude directly; computed from tokens for Codex
-  tokens: { input: number; cached_input: number; output: number } | null;
-  duration_ms: number;
-  duration_api_ms: number;
-  num_turns: number;
-}
-```
-
-- Claude's `extractUsage` returns `cost_usd` from its envelope (Claude CLI computes the dollars itself).
-- Codex's `extractUsage` returns `cost_usd: null` and `tokens` from `turn.completed`.
-
-**`pricing.ts`:**
-
-```ts
-export interface TokenPrice { input: number; cached_input: number; output: number }
-
-/** TTL-cached fetch from CODEX_PRICING_URL. Returns null on miss/failure. */
-export async function fetchModelPrice(model: string): Promise;
-```
-
-LiteLLM's JSON keys models by canonical name with per-token costs (`input_cost_per_token`, `output_cost_per_token`, `cache_read_input_token_cost`). The module normalizes them to `TokenPrice`. Cache TTL is 1h by default (`CODEX_PRICING_TTL_MS`).
-
-**`formatUsageReport(phases, priceLookup)`:** for each phase, if `cost_usd != null` use it; else if tokens + price are available, compute `cost = (tokens.input * price.input + tokens.cached_input * price.cached_input + tokens.output * price.output)`; else show tokens-only with the `cost unknown` marker. Always informative, never fabricated.
-
-**Verification before shipping:** fetch the LiteLLM JSON once, confirm `gpt-5-codex` (and the most likely operator alternatives — `gpt-5`, `gpt-5-mini`) are listed with the expected fields. If a model is missing, the tokens-only fallback handles it gracefully and the operator can override `CODEX_PRICING_URL`.
-
-## Error Handling & Edge Cases
-
-| Failure | Handling |
-|---|---|
-| Schema validation failure (no `result.json` written) | Parser falls back to NDJSON `item.completed` scan; if that fails, returns `failed` and the workflow moves the ticket to BACKLOG via the existing path |
-| Hook script missing or unexecutable | `agent.configure` does `chmod +x` and asserts `test -f`; throws if missing — `provisionSandbox` (`maxRetries=0`) propagates to the workflow's top-level catch |
-| Codex CLI install fails | Same as above — surface via `agent.install` throw |
-| Pricing fetch fails | Workflow continues; tokens-only Slack output; logged at WARN |
-| `turn.completed` missing | `extractUsage` returns null; Slack shows `Phase: n/a` (existing behavior) |
-| Sentinel never written | Existing `pollUntilDone` timeout path; ticket → BACKLOG |
-| Commit-guard infinite loop | Hook checks `already_blocked` flag and returns `continue: true` on the second invocation; `JOB_TIMEOUT_MS` bounds worst case |
-| `AGENT_KIND` changes mid-flight | `provisionSandbox` returns `{ sandboxId, agentKind }`; downstream steps reconstruct adapter from the persisted value, not from the live env |
-
-**Logging:**
-- `agent_install_started` / `agent_install_complete` — tagged with `kind`
-- `phase_started` / `phase_completed` — tagged with `kind`
-- `pricing_fetch_failed` — WARN with URL, model
-- `commit_guard_triggered` — INFO when the hook blocks
-
-## Testing
-
-**Unit:**
-- `src/sandbox/agents/codex.test.ts` — research status from `result.json`, agent output from `result.json`, fallback to NDJSON `item.completed`, `extractUsage` from `turn.completed` (single + multi-turn), commit-guard JSON shape
-- `src/sandbox/agents/claude.test.ts` — relocates the existing parser tests; same coverage as today
-- `src/sandbox/agents/index.test.ts` — `createAgentAdapter` selection by `AGENT_KIND`; throws on missing creds
-- `src/sandbox/agents/pricing.test.ts` — fetch + cache + fallback, mocked HTTP
-- `src/sandbox/manager.test.ts` — refactored to assert delegation to a fake adapter
-
-**E2E:**
-- New `e2e/codex-tier-1.test.ts` — provisions a sandbox with `AGENT_KIND=codex`, runs the impl phase against a tiny seeded ticket, asserts a commit and PR. **Skipped by default**; gated on `CODEX_API_KEY` being set in CI
-- Existing Tier-1 / Tier-2 e2e (Claude path) untouched — must pass after the refactor
-
-## Rollout
-
-1. **Refactor only** — extract Claude logic into `claude.ts`, introduce `AgentAdapter`, slim `SandboxManager`. Existing tests + Tier-1 e2e must pass. Ship as one commit.
-2. **Add Codex adapter** — `codex.ts`, `pricing.ts`, env vars, factory selection. Unit tests pass. No Codex e2e yet.
-3. **Codex e2e** — add the gated tier-1 test. Validate manually against a sandbox project, then add a CI job that runs only when `CODEX_API_KEY` is configured.
-4. **Documentation** — update README + `.env.example`. Add a short "Switching agents" section.
-
-## Open Verifications (Pre-Implementation)
-
-These are first-30-minutes-of-implementation checks, not spec-blocking risks:
-
-1. LiteLLM JSON URL is reachable and `gpt-5-codex` is listed with the expected fields. Operator override (`CODEX_PRICING_URL`) is the escape hatch if the source moves.
-2. The `skills` CLI accepts `--target` against the version Blazebot installs. Fallback: install into `~/.claude/skills/` and symlink `~/.agents/skills → ~/.claude/skills`.
-3. Codex's `--output-schema` behavior on validation failure (does it crash the run or surface errors and continue?). Affects how aggressively the parser falls back to the NDJSON scan.
-
-## Net Change Summary
-
-- **New files:** `src/sandbox/agents/{types,claude,codex,shared,index,pricing}.ts` + tests, `e2e/codex-tier-1.test.ts`
-- **Deleted:** `src/sandbox/wrapper-script.ts`, possibly `src/sandbox/agent-runner.ts` (if it ends up empty)
-- **Modified:** `src/sandbox/manager.ts`, `src/sandbox/poll-agent.ts`, `src/sandbox/usage.ts`, `src/workflows/agent.ts`, `env.ts`, `.env.example`, `README.md`
-- **Untouched:** all VCS adapters, issue-tracker adapters, messaging adapters, run registry, reconcile, dispatch, Jira webhook, cron, attachments, Arthur client
-- **Estimated size:** ~700–900 LOC net add, ~250–350 LOC moved between files
diff --git a/docs/superpowers/specs/2026-04-27-setup-onboarding-research.md b/docs/superpowers/specs/2026-04-27-setup-onboarding-research.md
deleted file mode 100644
index 13b1768..0000000
--- a/docs/superpowers/specs/2026-04-27-setup-onboarding-research.md
+++ /dev/null
@@ -1,169 +0,0 @@
-# Setup Onboarding — What We Can and Can't Automate
-
-**Date**: 2026-04-27 · **Status**: Research
-
-## TL;DR
-
-The biggest single UX win is a **Vercel Deploy Button** that imports the repo, provisions Upstash
-Redis via Marketplace, prompts for credentials, and deploys — all in one click. The remaining
-manual work is per-provider token minting (the operator owns the accounts, not us). Sections
-below are ordered by install flow: host first, then everything fed into it.
-
-> Script names below (`pnpm setup`, `pnpm setup:check`) are proposed. Today only
-> `pnpm setup:arthur-prompts` exists.
-
----
-
-## Vercel
-
-**Can't:** create the Vercel team, run `vercel login` for the operator.
- -**Can:** - -- **Deploy Button + Marketplace stores** — one URL imports the repo, creates the project, - provisions Upstash Redis (and optionally Postgres), prompts for env vars, deploys: - ``` - https://vercel.com/new/clone - ?repository-url=&project-name=blazebot - &env=&envDescription= - &stores=[{"type":"integration","integrationSlug":"upstash","productSlug":"upstash-kv-redis"}] - ``` -- `vercel link` wrapper, `vercel env add` / `pull` automation, env-drift diff between local and - prod. - ---- - -## Upstash Redis - -**Can't** (outside Vercel): account creation, dashboard provisioning. Inside Vercel Marketplace, -the operator never touches Upstash directly. - -**Can:** - -- Marketplace install during project creation auto-injects connection env vars. -- Connection + round-trip tests, namespace prefix derived from project name. - -> **Precondition:** rename `AI_WORKFLOW_KV_REST_API_URL` / `_TOKEN` to the Marketplace defaults -> (`KV_REST_API_URL` / `_TOKEN` or `UPSTASH_REDIS_REST_URL` / `_TOKEN`). ~30-min change. - ---- - -## Jira - -**Can't:** Atlassian account, API token, project selection, the operator's intent of which status -maps to which role. - -**Can:** - -- Token + project-access validation. -- **Status → role mapping** — fetch project statuses, show three dropdowns (Active / Review / - Backlog), write `COLUMN_AI` / `COLUMN_AI_REVIEW` / `COLUMN_BACKLOG`. -- **Missing-status helper** — team-managed projects: create via REST; company-managed (most - enterprises): print exact UI steps to take, then re-run. -- Auto-generate `JIRA_WEBHOOK_SECRET`. - -**Can't (without a Connect/Forge app):** programmatic webhook registration. Operator clicks -through Jira's Webhooks UI by hand. - -**Setup splits in two:** pre-deploy (token, statuses, secret) → post-deploy (webhook URL needs -the deployed domain). Cron polling works without the webhook, so the post-deploy step is optional. - ---- - -## GitHub / GitLab - -**Can't:** account, PAT minting, repo selection. 
- -**Can:** token + repo validation, push-permission probe (create + delete throwaway branch), -base-branch auto-discovery, PAT-creation deep-link. - ---- - -## Slack - -**Can't:** workspace creation, app-install consent. - -**Can:** - -- **App manifest install** — one URL with scopes pre-set. ~6 manual clicks → 2. -- Token validation, channel pick by name, bot-membership probe with `/invite` instructions. - ---- - -## Anthropic / Claude Code - -**Can't:** account, key issuance. - -**Can:** key validation against `/v1/models`, accept either API key or `claude setup-token` OAuth, -**model selection** from the live `/v1/models` list (defaults to `CLAUDE_MODEL` in `env.ts`). - ---- - -## Secrets - -`CRON_SECRET`, `JIRA_WEBHOOK_SECRET` — auto-generate via `openssl rand -hex 32`. Operator never -sees them. - ---- - -## Arthur (optional) - -**Can't:** account, key issuance. - -**Can:** skip-if-unconfigured, key validation, idempotent prompt-task creation -(`pnpm setup:arthur-prompts` already does this). - -> Future: the wizard could call this script automatically to fully scaffold Arthur tracing + -> hosted prompts in one step. Out of scope for the first cut — left as a follow-up. - ---- - -## Happy path (if everything above ships) - -A new operator's full setup, end to end: - -1. **One-time accounts.** Operator has Vercel, Atlassian, GitHub, Slack, Anthropic accounts and - mints tokens for each. ~5 min, outside our control. - -2. **`pnpm setup`** — interactive wizard, ~3 min: - - Auto-generates `CRON_SECRET` and `JIRA_WEBHOOK_SECRET`. - - Validates each token live as it's pasted. - - Fetches Jira statuses → three dropdowns for Active / Review / Backlog. - - Opens Slack manifest install URL → bot token returned → channel picked by name. - - Anthropic model picker from live `/v1/models`. - - Writes a complete `.env.local`. - -3. 
**One-click deploy** — wizard ends with a `[Deploy]` step (or operator clicks the README's - Deploy Button): - - Repo imported, project created, Upstash Redis provisioned via Marketplace (env vars - auto-injected). - - Wizard pushes the rest of `.env` to Vercel. - - Cron registered. First poll within 60s. - - Deployment URL returned to the wizard. - -4. **`pnpm setup:webhook`** — post-deploy continuation, ~30 sec: - - Prints the Jira webhook URL + secret to paste. - - Operator opens Jira → Webhooks → Add → Save. - - Wizard verifies the first delivery. - -5. **Done.** Drop a Jira ticket into the AI column; first PR comes back within minutes. - -**Operator time after the one-time account setup: ~5 minutes.** Compare to today's manual flow — -~30 minutes of typing 18 env vars across five service dashboards. - -What still requires the operator: account creation, token minting (we can never do these), and -three consent screens (Vercel project, Slack install, Jira webhook). - -### Rough estimate - -| Step | Operator-active | Wall-clock | -| --- | --- | --- | -| 1 — One-time accounts + tokens | ~5 min | ~5 min | -| 2 — `pnpm setup` wizard | ~3 min | ~3 min | -| 3 — Deploy click | ~30 sec | ~1–2 min (Vercel build) | -| 4 — `pnpm setup:webhook` | ~30 sec | ~30 sec | -| 5 — First ticket → first PR | — | ~few min (cron + agent run) | - -**Totals:** ~9 min operator-active for a fresh operator, ~4 min if accounts already exist; -~10–12 min wall-clock to a working install. Today: ~30 min operator-active, ~35 min wall-clock — -roughly a 3× speedup end-to-end and 6–7× less typing. 
diff --git a/docs/superpowers/specs/2026-04-30-slack-threaded-messages-design.md b/docs/superpowers/specs/2026-04-30-slack-threaded-messages-design.md deleted file mode 100644 index f089e27..0000000 --- a/docs/superpowers/specs/2026-04-30-slack-threaded-messages-design.md +++ /dev/null @@ -1,287 +0,0 @@ -# Slack Threaded Messages Design - -## Problem - -Today, every per-ticket notification posts as a top-level message in the configured Slack channel. The `MessagingAdapter.notify(message)` interface has no concept of conversation grouping, so a single ticket can produce 3–5 unrelated-looking messages scattered across the channel: - -```text -[10:01] Task AWT-42 started -[10:14] Task AWT-42 needs clarification -[14:02] Task AWT-42 PR ready for review -[16:30] Task AWT-42 canceled: webhook confirmed ticket is outside AI column. -``` - -Two consequences: - -1. The channel becomes hard to read — messages from different tickets interleave. -2. There is no clickable path from a Slack notification to the underlying Jira ticket or the GitHub PR. Readers have to copy the identifier and search. - -## Solution - -Make the `MessagingAdapter` ticket-aware. The first message about a ticket — `Task X started` — posts at top level and its Slack message timestamp is recorded as the **lifetime parent** for that ticket. Every subsequent message about the same ticket is posted as a thread reply under that parent. Clarification, PR-ready, failure, and cancellation messages all reply into the same thread. - -In addition, the ticket identifier in every message becomes a clickable Jira link, and the `pr_ready` event includes a clickable GitHub PR reference. - -## Scope - -In scope: - -- A new `notifyForTicket(ticketKey, event)` method on `MessagingAdapter` that replaces the existing `notify(message)`. -- A new `ThreadStore` interface (`getParent`, `setParent`, `clearParent`) implemented on `UpstashRunRegistry`. 
-- A new Redis hash `blazebot:thread-parents:{ENV_PREFIX}` for ticket → message-id mappings. -- Structured `TicketEvent` types replacing inline-string concatenation in `agent.ts`, `cron/poll.get.ts`, and `webhooks/jira.post.ts`. -- Slack-mrkdwn link formatting for the Jira ticket identifier and the PR reference. - -Out of scope: - -- Adding new notification events. The set of events stays exactly as today (start, clarification, PR-ready, failure, cancel). -- Threading for non-ticket notifications. There are none in the current code. -- A `notify(message)` escape hatch for non-ticket-scoped messages. All call sites are ticket-scoped; we drop the method entirely. -- Cross-ticket grouping (e.g., one thread per cron tick). - -## Threading Policy - -**Lifetime threading.** One Slack thread per ticket, indefinitely. If a ticket cycles through the AI column multiple times (initial run, then a fix-up after PR review feedback), every message lands in the same thread the original `Task X started` message established. - -Trade-off accepted: very long-lived tickets (months) may eventually have threads that scroll off Slack's reasonable retrieval window. If this becomes a problem in practice, switching to "thread per run" is a localized change — key the `ThreadStore` lookup by `ticketKey + runStartedAt` and add a `clearParent` call at the start of every run. - -**Top-level fallback when no parent exists.** Three cases collapse into one rule: - -| Case | Behavior | -|---|---| -| No mapping for this ticket yet | Post top-level. Record parent **only if** the event is `started`. | -| Mapping exists, but Slack returns "thread/message not found" (parent deleted) | Catch the error, clear the mapping, retry top-level. Re-establish parent only if the event is `started`. | -| Mapping exists and parent is alive | Post as thread reply. 
| - -Implication: out-of-band events that arrive before any `started` (e.g., a webhook cancellation racing the workflow's first message) post as standalone top-level messages and **do not** establish a parent. Only `started` is allowed to anchor a thread, because only `started` carries the implicit promise that more updates are coming. - -## Architecture - -```text - UpstashRunRegistry - ├── HASH_KEY (ticket → runId) - ├── SANDBOX_HASH_KEY (ticket → sandboxId) - ├── ENTRY_TS_HASH_KEY (ticket → createdAt) - ├── FAILED_HASH_KEY (ticket → failure meta) - └── THREAD_HASH_KEY (ticket → slack message ts) ← NEW - -Workflow / cron / webhook - │ - ▼ -ChatSDKAdapter.notifyForTicket(ticketKey, event) - │ - ├── threadStore.getParent(ticketKey) - ├── format(event, ticketKey, jiraBaseUrl) → Slack-mrkdwn string - ├── chat.channel(...).post(text, { thread_ts? }) - └── if event.kind === "started" && no parent existed: - threadStore.setParent(ticketKey, sentMessage.id) -``` - -`MessagingAdapter` is the "smart" layer (this is the deliberate choice from approach 1 of brainstorming). It knows how to: - -1. Post Slack messages. -2. Format `TicketEvent`s into wire text with embedded Jira/PR links. -3. Read and write parent-message-id mappings via an injected `ThreadStore`. - -The `ThreadStore` interface is bounded to three methods so the adapter only depends on the slice of run-registry behavior it needs: - -```ts -export interface ThreadStore { - getParent(ticketKey: string): Promise; - setParent(ticketKey: string, messageId: string): Promise; - clearParent(ticketKey: string): Promise; -} -``` - -`UpstashRunRegistry` implements this interface in addition to `RunRegistryAdapter`. Both interfaces are satisfied by the same class instance — the existing Redis client is reused. 
- -## Event Types and Formatting - -```ts -export type TicketEvent = - | { kind: "started" } - | { kind: "needs_clarification"; usageReport?: string } - | { kind: "pr_ready"; pr: { url: string; number: number }; usageReport: string } - | { - kind: "failed"; - phase?: "research" | "impl" | "push"; - reason?: string; - usageReport?: string; - } - | { kind: "canceled"; reason: string }; -``` - -Formatter output (using Slack-native `` mrkdwn syntax): - -| Event | Rendered text | -|---|---| -| `started` | `Task started` | -| `needs_clarification` | `Task needs clarification` (+ `\n` if present) | -| `pr_ready` | `Task PR ready for review — \n` | -| `failed` | With phase: `Task failed: `. Without phase (catch-all): `Task <…\|AWT-42> failed: `. Without reason or phase (extreme edge case): `Task <…\|AWT-42> failed`. (+ `\n` appended in all variants if non-empty.) | -| `canceled` | `Task canceled: ` | - -`ChatSDKConfig` grows by one field, `jiraBaseUrl: string`, supplied from `env.JIRA_BASE_URL`. The link is built as `${jiraBaseUrl.replace(/\/$/, '')}/browse/${ticketKey}` — defensive trim on a trailing slash because the env value is user-configured. - -The `` syntax is not standard markdown. Two implementation options will be evaluated during build: - -1. Use the chat package's `link` AST node (`link("AWT-42", "https://...")`) inside a `PostableMessage` and let the Slack adapter render it. Preferred if it produces correct mrkdwn. -2. Pass a `PostableRaw` Slack-formatted string to bypass mdast escaping. Fallback if option 1 escapes the angle brackets. - -Verified during implementation by posting one of each event type to a real Slack channel. - -## Adapter Behavior (Pseudocode) - -```pseudo -notifyForTicket(ticketKey, event): - parent = threadStore.getParent(ticketKey) - text = format(event, ticketKey, jiraBaseUrl) - - try: - sent = post(text, threadParentId: parent ?? 
undefined) - catch e if isMissingParentError(e): - threadStore.clearParent(ticketKey) - sent = post(text) // top-level retry - parent = null - - if event.kind === "started" && parent == null && sent != null: - threadStore.setParent(ticketKey, sent.id) -``` - -Failure semantics match today's `notify(message)`: any error from the post (after the missing-parent retry) is caught and logged at `warn`. `notifyForTicket` never throws — workflow runs are never broken by a notification failure. - -`isMissingParentError` discriminates on the Slack error code surfaced by the chat package. Likely candidates: `thread_not_found`, `message_not_found`. The exact discriminator is finalized during implementation by deliberately deleting a parent message in a test channel. - -## Call-Site Changes - -### `src/workflows/agent.ts` - -Replace the existing `notifySlack(message: string)` step: - -```ts -async function notifyTicket(ticketKey: string, event: TicketEvent) { - "use step"; - const { createStepAdapters } = await import("../lib/step-adapters.js"); - const { messaging } = createStepAdapters(); - await messaging.notifyForTicket(ticketKey, event); -} -``` - -Each existing call site converts: - -| Line | Today | After | -|---|---|---| -| 441 | `notifySlack(\`Task ${id} started\`)` | `notifyTicket(id, { kind: "started" })` | -| 518 | research-timeout `notifySlack(...)` | `notifyTicket(id, { kind: "failed", phase: "research", reason: "phase timed out", usageReport })` | -| 536 | research clarification | `notifyTicket(id, { kind: "needs_clarification", usageReport })` | -| 543 | research failure | `notifyTicket(id, { kind: "failed", phase: "research", reason: research.body.slice(0,200), usageReport })` | -| 587 | impl clarification | `notifyTicket(id, { kind: "needs_clarification", usageReport })` | -| 594 | impl failure | `notifyTicket(id, { kind: "failed", phase: "impl", reason: implOutput.error, usageReport })` | -| 653 | push failure | `notifyTicket(id, { kind: "failed", phase: 
"push", reason: pushResult.error, usageReport })` | -| 665 | PR ready | `notifyTicket(id, { kind: "pr_ready", pr: { url: pr.url, number: pr.id }, usageReport })` | -| 674 | catch-all | `notifyTicket(id, { kind: "failed", reason: err.message ?? "unknown", usageReport })` | - -Two extra wiring details for line 665: - -- `createPullRequest(...)` already returns the `PullRequest`, but the result is currently discarded on line 659. Capture it. Both branches (new PR path and existing-PR path) provide `{ url, id (number), branch }` — `prContext` is reusable directly. -- The trailing `${usageReport}` newline-prefix moves into the formatter; pass `usageReport` as a structured field, not concatenated. - -`usageReport` is computed at the call site exactly as today (`formatUsageReport(...)`) and passed in as a string field. The formatter prepends `\n` when emitting it. An empty string is treated as absent (no trailing newline emitted) — equivalent to `undefined`. This matches the current behavior of `usageSuffix()` returning `""` when `phaseUsages` is empty. - -### `src/routes/cron/poll.get.ts:23` - -```ts -await adapters.messaging.notifyForTicket(ticketKey, { kind: "canceled", reason: detail }); -``` - -### `src/routes/webhooks/jira.post.ts:110` - -```ts -await adapters.messaging.notifyForTicket(ticketKey, { - kind: "canceled", - reason: "webhook confirmed ticket is outside AI column", -}); -``` - -### Adapter wiring - -`src/lib/adapters.ts` and `src/lib/step-adapters.ts` both pass three new ingredients into `ChatSDKAdapter`: - -- `jiraBaseUrl: env.JIRA_BASE_URL` -- `threadStore: runRegistry` (the `UpstashRunRegistry` instance now satisfies both `RunRegistryAdapter` and `ThreadStore`) - -Order: instantiate the run registry first, then pass it into the messaging adapter. Both factories already construct the registry, so this is a one-line reorder. 
- -## Redis Data Model - -**Hash key:** `blazebot:thread-parents:{ENV_PREFIX}` - -Follows the same pattern as the existing `blazebot:active-runs:{ENV_PREFIX}` hash. - -**Field:** Ticket key (e.g., `AWT-42`). - -**Value:** Slack message timestamp (the `id` of `SentMessage` returned by `channel.post()`), e.g. `"1700000000.000123"`. - -**TTL:** None. Entries are bounded by the number of distinct tickets ever processed; at ~50 bytes per entry and 100k tickets, total cost is ~5 MB. `clearParent` removes individual entries when a parent is detected as deleted on Slack. - -**Lifecycle vs `unregister(ticketKey)`:** This hash is **not** touched by `unregister`. The thread mapping outlives a single workflow run, which is the whole point of lifetime threading. - -## Testing - -### `src/adapters/messaging/chatsdk.test.ts` - -Rewrite around `notifyForTicket`. Existing two cases (channel routing, no-throw on failure) port over directly. Add: - -- `started` with no parent → posts top-level, calls `threadStore.setParent` with the returned message id. -- Subsequent event with parent set → posts with `thread_ts` equal to the stored parent id; does **not** call `setParent`. -- Non-`started` event with no parent → posts top-level, does **not** call `setParent` (orphan stays orphan). -- Parent deleted on Slack (mock returns a `thread_not_found`-shaped error) → calls `clearParent`, retries top-level, and if the event is `started`, records the new parent. -- Each event variant produces the expected formatted string. Assert on the substring containing the Jira link (and PR link for `pr_ready`). -- `notifyForTicket` swallows post failures (existing no-throw guarantee preserved). - -The mock for `chat.channel().post()` returns `{ id: "1700000000.000123" }` so we can assert it lands in the thread store unmodified. 
- -### `src/adapters/run-registry/upstash.test.ts` - -Three new cases for the `ThreadStore` methods on `UpstashRunRegistry`: - -- `setParent` then `getParent` round-trips the message id. -- `getParent` returns `null` when no entry exists. -- `clearParent` removes the entry; `getParent` then returns `null`. -- `unregister(ticketKey)` does **not** touch the thread hash. - -### Workflow integration - -`agent.ts` is not directly unit-tested — steps run through Vercel WDK. The existing `e2e/` suite exercises the full flow. No new e2e is added specifically for threading; manual verification on a real Slack channel during PR review is the realistic check (post a `started`, then a `needs_clarification`, confirm the second appears as a reply under the first). - -## Migration and Rollout - -**In-flight tickets at deploy.** `THREAD_HASH_KEY` starts empty. Tickets currently mid-run have no recorded parent — their next message (which is by construction a non-`started` event such as PR-ready, clarification, or failure) posts top-level and does **not** establish a parent. The first cycle after deploy is therefore a single standalone message; every cycle after that threads correctly. No manual seeding. - -**Re-runs of pre-existing tickets.** When a ticket created before deploy re-enters the AI column, `notifyForTicket(id, { kind: "started" })` runs against an empty entry, posts top-level, records the new parent. Lifetime threading then proceeds normally. - -**Backwards compatibility.** None preserved. `MessagingAdapter.notify(message)` is removed. The change is internal to this repo (no external callers). Tests are rewritten in the same PR. - -**Rollback.** A revert PR restores the prior interface. The `THREAD_HASH_KEY` data left behind in Redis is harmless and ignored by old code. No data cleanup required. 
- -## Observability - -The two existing log lines in `chatsdk.ts` get extra structured fields: - -- `notification_sent` → adds `ticketKey`, `eventKind`, `threadParentId` (null on top-level posts). -- `notification_failed` → adds `ticketKey`, `eventKind`, and the Slack error code if extractable. - -A new debug log line, `thread_parent_recovered`, is emitted when the missing-parent recovery path runs (parent existed, Slack rejected with thread-not-found, retry top-level). This makes the recovery branch visible in production without needing to instrument it later. - -## Risks - -- **Slack rate limits on `chat.postMessage`.** Threading does not change call frequency — same number of posts as today, just with `thread_ts` sometimes set. No new risk. -- **Bot loses access to a private channel containing the parent.** Surfaces as `channel_not_found` / `not_in_channel`. Same failure mode as today, just with extra context in logs. Operator action (re-invite) unchanged. -- **`` mrkdwn rendering.** Not standard markdown. Resolved during implementation by testing one event of each kind against a real Slack channel and choosing between the chat package's `link` AST node and `PostableRaw`. Captured as a learning in `.claude/learnings.md` once verified. -- **Lifetime threads on very long-lived tickets.** A ticket that lives for months may eventually have a parent that's hard to reach in Slack search. If this becomes a real complaint, switching to "thread per run" is a small follow-up (key `ThreadStore` by `ticketKey + runStartedAt`). - -## Future Work (not in this change) - -- "Thread per run" mode behind a config flag, if lifetime threading proves unwieldy. -- Richer message blocks (Slack Block Kit) for `pr_ready` — buttons for "Open PR" / "Open ticket" instead of inline links. -- Reaction-driven workflow controls (e.g., react with ✅ on the `pr_ready` thread to re-trigger CI). Out of scope here; design only mentions for context.