1 change: 1 addition & 0 deletions change-logs/2026/06/10/feature-agent-completion-request.md
@@ -0,0 +1 @@
AI agents can now request task completion via `dev3 task move --status completed`. Instead of moving the task directly, the CLI blocks (up to 10 minutes) while the app shows a visually distinct AI-initiated approval dialog — approving completes the task and destroys the session, declining returns exit code 6 to the agent so it knows the user said no. `cancelled` remains fully forbidden via CLI.
21 changes: 21 additions & 0 deletions decisions/067-agent-completion-request.md
@@ -0,0 +1,21 @@
# 067 — Agent-initiated task completion via blocking CLI approval

## Context

Only the user could move a task to `completed` (it destroys the worktree + tmux session); the CLI blocked it client-side in `DESTRUCTIVE_STATUSES`. Agents needed a way to signal "I'm fully done" — without being able to silently kill their own session. The agent must also learn from the CLI response whether the user approved or declined.

## Decision

`dev3 task move --status completed` now sends `task.requestCompletion` over the CLI socket and **blocks up to 10 minutes** (client-side timeout, `src/cli/commands/task.ts` → `requestCompletion`). The bun handler (`src/bun/cli-socket-server.ts`) registers a pending request in `src/bun/completion-requests.ts` and pushes `agentCompletionRequested` to the renderer, which shows a danger-styled `confirm()` with an "AI agent request" badge and accent border (`agentInitiated` option in `src/mainview/confirm.tsx`, listener in `App.tsx`). The renderer answers via the `respondToAgentCompletionRequest` RPC; on approve the handler runs the normal `moveTask` → completed, on decline the CLI exits with the new documented code 6 (`CLI_EXIT_CODE_COMPLETION_DECLINED`). `cancelled` stays fully forbidden via CLI.

## Risks

- Pending requests are in-memory only (user chose no persistence): an app restart drops the dialog and the CLI times out. Acceptable — the agent can simply retry.
- No server-side timeout: if the CLI dies, the dialog stays; a later user approval still completes the task (the write to the dead socket fails harmlessly). This is intentional — an AFK user's approval must not be lost.
- A repeat request for the same task joins the existing pending decision (`isNew` flag) instead of spawning a duplicate dialog.

## Alternatives considered

- New command `dev3 task request-completion` — rejected; agents already know `task move`, reuse keeps the surface minimal.
- Fire-and-forget push + persisted board badge — rejected by the user; live blocking dialog only, agent gets the verdict in the same invocation.
- Auto-adding a note on decline — rejected by the user; the exit-code-6 message tells the agent to keep working.
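
The join behaviour described under Risks (a repeat request reuses the pending decision) can be sketched as a small in-memory registry. This is only an illustration under assumptions: the function names and the `{ requestId, decision, isNew }` shape mirror what the tests in this PR exercise, but the body of `src/bun/completion-requests.ts` is not shown here, so the counter-based ids and the choice to key the map on `taskId` alone are hypothetical.

```typescript
// Sketch only: not the actual contents of src/bun/completion-requests.ts.
interface PendingRequest {
  requestId: string;
  projectId: string;
  decision: Promise<boolean>;
  resolve: (approved: boolean) => void;
}

interface CompletionRequestHandle {
  requestId: string;
  decision: Promise<boolean>;
  isNew: boolean;
}

// Assumption: one pending request per task, keyed by taskId, held in memory only.
const pendingByTask = new Map<string, PendingRequest>();
let nextId = 0; // hypothetical id scheme; the real module may use UUIDs

function createCompletionRequest(taskId: string, projectId: string): CompletionRequestHandle {
  const existing = pendingByTask.get(taskId);
  if (existing) {
    // Join: the second caller waits on the same promise, so no second dialog is needed.
    return { requestId: existing.requestId, decision: existing.decision, isNew: false };
  }
  let resolve!: (approved: boolean) => void;
  const decision = new Promise<boolean>((r) => {
    resolve = r;
  });
  const requestId = `req-${++nextId}`;
  pendingByTask.set(taskId, { requestId, projectId, decision, resolve });
  return { requestId, decision, isNew: true };
}

function resolveCompletionRequest(requestId: string, approved: boolean): boolean {
  const taskId = Array.from(pendingByTask.keys()).find(
    (key) => pendingByTask.get(key)?.requestId === requestId,
  );
  if (taskId === undefined) return false; // unknown or already resolved
  const entry = pendingByTask.get(taskId)!;
  pendingByTask.delete(taskId); // a later request for this task starts fresh
  entry.resolve(approved); // every joined waiter sees the same verdict
  return true;
}
```

Because nothing is persisted, restarting the process empties the map, which matches the first risk listed above.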
1 change: 1 addition & 0 deletions docs/cli-exit-codes.md
@@ -10,6 +10,7 @@ Public `dev3` CLI exit codes are defined in `src/shared/cli-exit-codes.ts`.
| `3` | `CLI_EXIT_CODE_USAGE_ERROR` | The CLI invocation was invalid: bad command, bad subcommand, or missing required args. |
| `4` | `CLI_EXIT_CODE_INTERNAL_ERROR` | An unexpected internal CLI failure escaped normal command handling. |
| `5` | `CLI_EXIT_CODE_GUI_DEPS_MISSING` | `dev3 gui` cannot launch because system libraries (GTK, WebKit, etc.) are missing. The CLI prints the install command for the detected distro and exits with this code so wrappers can detect it. |
| `6` | `CLI_EXIT_CODE_COMPLETION_DECLINED` | `dev3 task move --status completed` asked the user for approval and the user declined. The task keeps its current status and the session stays alive. |
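
As a rough illustration of how a wrapper script might consume these codes, the sketch below maps the four codes visible in this hunk to their constant names and singles out code 6. The helper names `describeExitCode` and `isCompletionDeclined` are hypothetical and are not part of `src/shared/cli-exit-codes.ts`; codes outside this excerpt are deliberately left out.

```typescript
// Codes and names copied from the table rows shown in this hunk only.
const CLI_EXIT_CODE_NAMES = new Map<number, string>([
  [3, "CLI_EXIT_CODE_USAGE_ERROR"],
  [4, "CLI_EXIT_CODE_INTERNAL_ERROR"],
  [5, "CLI_EXIT_CODE_GUI_DEPS_MISSING"],
  [6, "CLI_EXIT_CODE_COMPLETION_DECLINED"],
]);

const CLI_EXIT_CODE_COMPLETION_DECLINED = 6;

// Hypothetical helper: true when the user declined the completion request,
// meaning the task keeps its status and the session stays alive.
function isCompletionDeclined(exitCode: number): boolean {
  return exitCode === CLI_EXIT_CODE_COMPLETION_DECLINED;
}

// Hypothetical helper: human-readable name for logging in a wrapper.
function describeExitCode(exitCode: number): string {
  return CLI_EXIT_CODE_NAMES.get(exitCode) ?? `unlisted exit code ${exitCode}`;
}
```

An agent wrapper could treat `isCompletionDeclined(code)` as the signal to keep working rather than as a failure.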

Rules:

6 changes: 6 additions & 0 deletions docs/ux/UX_DECISIONS.md
@@ -61,3 +61,9 @@ Append-only log of UX architecture decisions. Each entry: date, decision, ration
- **Decision:** Below a `matchMedia("(max-width: 1600px)")` breakpoint (new `useCompact()` hook), the `GlobalHeader` action cluster and the `TaskInfoPanel` toolbar switch to a compact layout: text labels collapse to icon-only (tooltips kept), and the header's three low-frequency external actions (Website, Report, Change Log) fold into a single "More" (`⋯`) overflow dropdown. Diff badge and status stay labelled. Above the breakpoint the layout is unchanged.
- **Rationale:** On a 14" MacBook (≤1512pt) the labelled, `flex-shrink-0` button rows overflowed and overlapped; on 16" (1728pt) they fit. 1600px cleanly separates the two at default scaling and also fires on window resize. Per the action taxonomy, the rare external links are the correct overflow candidates; frequent/destination controls stay visible as icons. Viewport-based v1; a content-aware (ResizeObserver) upgrade is the planned v2 since a long breadcrumb title can still crowd the header near the boundary. No flex-wrap (vertical space is scarce in a terminal-centric app).
- **Status:** `Observed` (implemented: `useCompact.ts`, `GlobalHeader.tsx`, `TaskInfoPanel.tsx`, `PreventSleepToggle.tsx`, `GitPullButton.tsx`; keys `header.moreActions`/`header.githubLabel` in en/ru/es). See decision record 063.

## 2026-06-10 — AI-initiated task completion uses a blocking, visually distinct confirm dialog

- **Decision:** `dev3 task move --status completed` from an agent does not move the task; it opens the existing imperative `confirm()` modal with a new `agentInitiated` treatment — accent border (`border-accent/40`) plus a badge pill (robot glyph + "AI agent request") — and `danger`-role confirm button labelled "Complete task", cancel labelled "Keep session" with autofocus. The CLI blocks (≤10 min) for the verdict; decline returns documented exit code 6 so the agent learns the user said no. No persistence, no board badge — ephemeral live dialog only (explicit user choice). `cancelled` stays CLI-forbidden.
- **Rationale:** Completing destroys the worktree + tmux session, so the action is destructive and must keep human approval; the AI-identity badge prevents the user from mistaking it for a routine self-initiated confirm and accidentally approving. Reusing the Modal surface and the `task move` verb adds zero new UI chrome and zero new CLI surface.
- **Status:** `Observed` (implemented: `confirm.tsx` `agentInitiated`, `App.tsx` listener, `completion-requests.ts`, `cli-socket-server.ts` `task.requestCompletion`, `task.ts` `requestCompletion`, exit code 6). See decision record 067 and `feature-plans/agent-completion-request.md`.
4 changes: 4 additions & 0 deletions docs/ux/UX_MANIFEST_CHANGELOG.md
@@ -26,3 +26,7 @@ Documented the inspector header as a 2×2 quickbar grid (Context / Session-Agent
## 2026-06-03 — macOS dock-persistence + unified quit-confirmation modal

Added a UX decision documenting `exitOnLastWindowClosed: false` (closing the last window keeps the app in the dock, reopened on dock-click) and the React quit-confirmation modal driven by the main-process `before-quit` gate, covering Cmd+Q (via `requestQuit`), menu Quit, and dock Quit. A window-less quit reopens a window that pulls the pending flag on mount to show the dialog reliably. Plus the Cmd+Shift+N New Window shortcut. No new visible buttons or tokens — conforms to the Modal surface and destructive-button-role policy. Decision records 044, 060, 061.

## 2026-06-10 — Agent completion request (AI-initiated destructive confirm)

Documented the agent-initiated task-completion flow: CLI-triggered blocking approval via the existing `confirm()` Modal with a new `agentInitiated` visual treatment (accent border + robot badge), danger-role approve, autofocused safe cancel, CLI exit code 6 on decline. New feature plan `feature-plans/agent-completion-request.md`, UX decision appended, decision record 067. No new surfaces, nav items, or budget changes.
31 changes: 31 additions & 0 deletions docs/ux/feature-plans/agent-completion-request.md
@@ -0,0 +1,31 @@
# Feature plan — Agent-initiated task completion request

## Feature classification

- **User job:** decide whether an agent that claims to be done may complete its task (destroying worktree + tmux session).
- **Owning object:** Task. **Workflow:** task lifecycle (status transitions).
- **Feature class:** destructive action with mandatory human approval, AI-initiated.
- **Scope:** single task. **Frequency:** occasional (end of each agent task). **Risk:** destructive (session + worktree loss).

## Placement

- **Trigger:** CLI only — `dev3 task move --status completed`. No new UI entry point; the user-side trigger remains the existing drag-to-Completed / UI flows.
- **Surface:** the existing imperative `confirm()` Modal (`src/mainview/confirm.tsx`), same surface as the branch-merged prompt. No new surface, no nav change, no toolbar buttons — zero impact on complexity budgets.
- **Rejected placements:** persistent board badge / inbox (user explicitly chose ephemeral live dialog); toast (not blocking, too easy to miss for a destructive decision); native dialog (banned — remote/browser mode).

## Action hierarchy & tokens

- **Approve ("Complete task"):** semantic role `destructive`, concrete variant — the dialog's `danger` confirm button (`bg-danger`). Never primary-styled.
- **Cancel ("Keep session"):** semantic role `secondary`; receives `autoFocus` so Enter defaults to the safe choice.
- **AI identity treatment:** `agentInitiated` option renders an accent badge pill (robot glyph `\u{F06A9}` + "AI agent request") and `border-accent/40` dialog border — visually distinct from user-initiated confirms.
- Backdrop click / Esc = cancel. Triple protection against accidental approval: danger styling, cancel autofocus, explicit badge.

## Interaction

- Agent runs the CLI → blocking socket request (10-min client timeout) → push `agentCompletionRequested` → dialog. Approve → task moves to completed (normal `moveTask` path, `taskUpdated` push updates the board, navigation leaves the doomed task screen). Decline → CLI exit code 6 with guidance text for the agent.
- Duplicate requests for the same task join the pending decision — only one dialog ever shows.
- States: app window absent → CLI gets an immediate error; task already completed/cancelled → error; CLI timeout → dialog may remain, late approval still completes.
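
The client-side wait described above (block for the verdict, give up after a timeout) can be sketched as a race between the decision and a timer. This is a minimal sketch, not the code in `src/cli/commands/task.ts`: `waitForVerdict` and the `Verdict` type are hypothetical names, and the real CLI waits on a socket response rather than on a bare promise.

```typescript
// Hypothetical result type for the three outcomes the plan describes.
type Verdict = { kind: "approved" } | { kind: "declined" } | { kind: "timeout" };

// Resolves with the user's verdict, or with "timeout" if none arrives in time.
function waitForVerdict(decision: Promise<boolean>, timeoutMs: number): Promise<Verdict> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<Verdict>((resolve) => {
    timer = setTimeout(() => resolve({ kind: "timeout" }), timeoutMs);
  });
  const verdict = decision.then<Verdict>((approved) =>
    approved ? { kind: "approved" } : { kind: "declined" },
  );
  // Clear the timer once either side wins so the process can exit promptly.
  return Promise.race([verdict, timeout]).finally(() => clearTimeout(timer));
}
```

With the 10-minute limit from this plan, a caller would pass `10 * 60 * 1000` as `timeoutMs`. The timeout only ends the caller's wait; it does not cancel the underlying decision, which is consistent with the note that a late approval still completes the task.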

## i18n

Keys `app.agentCompletion*` + `confirmDialog.agentBadge` in en/ru/es `common.ts`.
218 changes: 218 additions & 0 deletions src/bun/__tests__/cli-socket-completion-request.test.ts
@@ -0,0 +1,218 @@
import { describe, it, expect, vi, beforeEach } from "vitest";
import type { Project, Task, CliRequest } from "../../shared/types";

// ---- Mocks (same boundary set as cli-socket-handlers.test.ts) ----

vi.mock("../data", () => ({
  loadProjects: vi.fn(),
  getProject: vi.fn(),
  loadTasks: vi.fn(),
  getTask: vi.fn(),
  addTask: vi.fn(),
  updateTask: vi.fn(),
  updateProject: vi.fn(),
}));

vi.mock("../git", () => ({
  createWorktree: vi.fn(),
  removeWorktree: vi.fn(),
}));

vi.mock("../pty-server", () => ({
  destroySession: vi.fn(),
}));

vi.mock("../rpc-handlers/tmux-pty", () => ({
  runDevServer: vi.fn(),
  stopDevServer: vi.fn(),
  restartDevServer: vi.fn(),
  getDevServerStatus: vi.fn(),
}));

vi.mock("../rpc-handlers", () => {
  const ACTIVE = ["in-progress", "user-questions", "review-by-user", "review-by-ai"];
  return {
    isActive: vi.fn((status: string) => ACTIVE.includes(status)),
    activateTask: vi.fn(),
    moveTask: vi.fn(),
    runCleanupScript: vi.fn(),
    emitTaskSound: vi.fn(),
    getPushMessage: vi.fn(() => null),
    triggerColumnAgentIfNeeded: vi.fn(),
    notifyWatchedTaskStatusChange: vi.fn(),
  };
});

vi.mock("../logger", () => ({
  createLogger: () => ({
    debug: vi.fn(),
    info: vi.fn(),
    warn: vi.fn(),
    error: vi.fn(),
  }),
}));

vi.mock("../paths", () => ({
  DEV3_HOME: "/tmp/test-dev3",
}));

vi.mock("../socket-backpressure", () => ({
  flushAndEnd: vi.fn(),
  drainSocket: vi.fn(),
  pendingWrites: new Map(),
}));

vi.mock("../settings", () => ({
  loadSettings: vi.fn(() => ({ updateChannel: "stable", taskDropPosition: "top" })),
  saveSettings: vi.fn(),
}));

vi.mock("node:fs", () => ({
  existsSync: vi.fn(() => false),
  readdirSync: vi.fn(() => []),
  unlinkSync: vi.fn(),
  mkdirSync: vi.fn(),
}));

import * as data from "../data";
import { moveTask, getPushMessage } from "../rpc-handlers";
import { resolveCompletionRequest, _resetCompletionRequestsForTests } from "../completion-requests";

const { handleRequest } = await import("../cli-socket-server");

// ---- Helpers ----

function makeProject(overrides?: Partial<Project>): Project {
  return {
    id: "proj-1",
    name: "Test Project",
    path: "/tmp/test-project",
    setupScript: "",
    devScript: "",
    cleanupScript: "",
    defaultBaseBranch: "main",
    createdAt: new Date().toISOString(),
    ...overrides,
  };
}

function makeTask(overrides?: Partial<Task>): Task {
  return {
    id: "task-abc12345-1111-2222-3333-444444444444",
    seq: 1,
    projectId: "proj-1",
    title: "Test task",
    description: "A test task",
    status: "in-progress",
    baseBranch: "main",
    worktreePath: "/tmp/wt",
    branchName: "dev3/task-test",
    groupId: null,
    variantIndex: null,
    agentId: null,
    configId: null,
    createdAt: new Date().toISOString(),
    updatedAt: new Date().toISOString(),
    ...overrides,
  };
}

function makeRequest(params: Record<string, unknown>): CliRequest {
  return { id: "req-1", method: "task.requestCompletion", params };
}

function setupTask(task: Task): void {
  vi.mocked(data.getProject).mockResolvedValue(makeProject());
  vi.mocked(data.loadTasks).mockResolvedValue([task]);
}

beforeEach(() => {
  vi.clearAllMocks();
  _resetCompletionRequestsForTests();
});

describe("task.requestCompletion", () => {
  it("errors when the task is already completed", async () => {
    setupTask(makeTask({ status: "completed" }));

    const resp = await handleRequest(makeRequest({ taskId: "task-abc12345", projectId: "proj-1" }));
    expect(resp.ok).toBe(false);
    expect(resp.error).toContain("already completed");
  });

  it("errors when no app window is connected", async () => {
    setupTask(makeTask());
    vi.mocked(getPushMessage).mockReturnValue(null);

    const resp = await handleRequest(makeRequest({ taskId: "task-abc12345", projectId: "proj-1" }));
    expect(resp.ok).toBe(false);
    expect(resp.error).toContain("No app window is connected");
  });

  it("pushes agentCompletionRequested and completes the task on approval", async () => {
    const task = makeTask({ overview: "Agent overview", userOverview: "User overview wins" });
    setupTask(task);
    const pushFn = vi.fn();
    vi.mocked(getPushMessage).mockReturnValue(pushFn);
    const completedTask = { ...task, status: "completed" as const };
    vi.mocked(moveTask).mockResolvedValue(completedTask);

    const respPromise = handleRequest(makeRequest({ taskId: "task-abc12345", projectId: "proj-1" }));
    await vi.waitFor(() => expect(pushFn).toHaveBeenCalled());

    const [event, payload] = pushFn.mock.calls[0] as [
      string,
      { requestId: string; taskId: string; projectId: string; taskTitle: string; taskOverview?: string },
    ];
    expect(event).toBe("agentCompletionRequested");
    expect(payload.taskId).toBe(task.id);
    expect(payload.projectId).toBe("proj-1");
    expect(payload.taskTitle).toBe("Test task");
    expect(payload.taskOverview).toBe("User overview wins");

    resolveCompletionRequest(payload.requestId, true);

    const resp = await respPromise;
    expect(resp.ok).toBe(true);
    expect(resp.data).toEqual({ approved: true, task: completedTask });
    expect(moveTask).toHaveBeenCalledWith({ taskId: task.id, projectId: "proj-1", newStatus: "completed" });
  });

  it("returns approved:false without moving the task when declined", async () => {
    setupTask(makeTask());
    const pushFn = vi.fn();
    vi.mocked(getPushMessage).mockReturnValue(pushFn);

    const respPromise = handleRequest(makeRequest({ taskId: "task-abc12345", projectId: "proj-1" }));
    await vi.waitFor(() => expect(pushFn).toHaveBeenCalled());

    const payload = pushFn.mock.calls[0][1] as { requestId: string };
    resolveCompletionRequest(payload.requestId, false);

    const resp = await respPromise;
    expect(resp.ok).toBe(true);
    expect(resp.data).toEqual({ approved: false });
    expect(moveTask).not.toHaveBeenCalled();
  });

  it("joins an existing pending request instead of pushing a second dialog", async () => {
    setupTask(makeTask());
    const pushFn = vi.fn();
    vi.mocked(getPushMessage).mockReturnValue(pushFn);

    const first = handleRequest(makeRequest({ taskId: "task-abc12345", projectId: "proj-1" }));
    await vi.waitFor(() => expect(pushFn).toHaveBeenCalledTimes(1));
    const second = handleRequest(makeRequest({ taskId: "task-abc12345", projectId: "proj-1" }));
    // Let the second handler reach createCompletionRequest (and join) before resolving.
    await new Promise((r) => setTimeout(r, 10));
    expect(pushFn).toHaveBeenCalledTimes(1);

    const payload = pushFn.mock.calls[0][1] as { requestId: string };
    resolveCompletionRequest(payload.requestId, false);

    const [respA, respB] = await Promise.all([first, second]);
    expect(respA.data).toEqual({ approved: false });
    expect(respB.data).toEqual({ approved: false });
    expect(pushFn).toHaveBeenCalledTimes(1);
  });
});
84 changes: 84 additions & 0 deletions src/bun/__tests__/completion-requests.test.ts
@@ -0,0 +1,84 @@
import { describe, it, expect, beforeEach, vi } from "vitest";

vi.mock("../logger", () => ({
  createLogger: () => ({
    debug: vi.fn(),
    info: vi.fn(),
    warn: vi.fn(),
    error: vi.fn(),
  }),
}));

import {
  createCompletionRequest,
  resolveCompletionRequest,
  _resetCompletionRequestsForTests,
} from "../completion-requests";

beforeEach(() => {
  _resetCompletionRequestsForTests();
});

describe("createCompletionRequest", () => {
  it("creates a new pending request with a unique id", () => {
    const a = createCompletionRequest("task-1", "proj-1");
    const b = createCompletionRequest("task-2", "proj-1");

    expect(a.isNew).toBe(true);
    expect(b.isNew).toBe(true);
    expect(a.requestId).not.toBe(b.requestId);
  });

  it("joins the existing request for the same task instead of duplicating", () => {
    const first = createCompletionRequest("task-1", "proj-1");
    const second = createCompletionRequest("task-1", "proj-1");

    expect(second.isNew).toBe(false);
    expect(second.requestId).toBe(first.requestId);
    expect(second.decision).toBe(first.decision);
  });

  it("creates a fresh request after the previous one was resolved", () => {
    const first = createCompletionRequest("task-1", "proj-1");
    resolveCompletionRequest(first.requestId, false);

    const second = createCompletionRequest("task-1", "proj-1");
    expect(second.isNew).toBe(true);
    expect(second.requestId).not.toBe(first.requestId);
  });
});

describe("resolveCompletionRequest", () => {
  it("resolves the decision promise with true on approval", async () => {
    const { requestId, decision } = createCompletionRequest("task-1", "proj-1");

    expect(resolveCompletionRequest(requestId, true)).toBe(true);
    await expect(decision).resolves.toBe(true);
  });

  it("resolves the decision promise with false on decline", async () => {
    const { requestId, decision } = createCompletionRequest("task-1", "proj-1");

    expect(resolveCompletionRequest(requestId, false)).toBe(true);
    await expect(decision).resolves.toBe(false);
  });

  it("returns false for an unknown requestId", () => {
    expect(resolveCompletionRequest("nope", true)).toBe(false);
  });

  it("returns false when resolving the same request twice", () => {
    const { requestId } = createCompletionRequest("task-1", "proj-1");
    expect(resolveCompletionRequest(requestId, true)).toBe(true);
    expect(resolveCompletionRequest(requestId, true)).toBe(false);
  });

  it("resolves every joined waiter with the same decision", async () => {
    const first = createCompletionRequest("task-1", "proj-1");
    const second = createCompletionRequest("task-1", "proj-1");

    resolveCompletionRequest(first.requestId, true);
    await expect(first.decision).resolves.toBe(true);
    await expect(second.decision).resolves.toBe(true);
  });
});