98 changes: 73 additions & 25 deletions README.md
@@ -11,21 +11,31 @@ Local-first “mission control” for AI coding agent CLIs (and any shell comman
![AgentFleet: Stop Runaway AI Agents with Local Mission Control](screenshots/AgentFleet_Local_AI_Mission_Control.png)

## ✨ Recently Shipped
- **PR #5: Screenshot refresh + dashboard history updates**
- Updated the README demo images to match the latest screenshot set
- Added refreshed history and dashboard visuals for the current UI and spend analytics views
- Kept the product docs aligned with the current MVP surface
- **PR #4: Spend analytics dashboard + session lifecycle improvements**
- Added a spend dashboard with breakdowns by day, repo, command, and model
- Improved server-side session stopping and timeout handling for chat sessions
- Refreshed product docs/screenshots to match the dashboard UI
- **PR #3: LiteLLM chat + terminal replay improvements**
- Added LiteLLM-backed chat support with model selection and the same approve/reject tool flow used by Claude SDK
- Improved persisted PTY replay by stripping alternate-screen escape sequences more reliably
- Refreshed the README screenshots and demo assets to match the current MVP
- **PR #2: Real-time usage tracking**
- Parse Claude Code status lines for more accurate token/cost counting instead of estimates
- Budget enforcement that actually works
- **LiteLLM Spend Analytics tab** (latest)
- Real spend data pulled from your LiteLLM proxy (`/spend/logs`, `/user/daily/activity`)
- Matches Agents Fleet layout: header stats, This Week chart, Weekly Budget strip, By Model and Daily tabs
- Weekly budget resets Sunday; projects spend and flags over-budget
- **Budget 80% warning notifications**
- Native browser notification + in-app toast when a session reaches 80% of its USD or token budget
- Toast auto-dismisses after 8s; works even when browser notifications are blocked
- **One-click session resume** (latest)
- `claude --resume <uuid>` and `codex resume <uuid>` commands are captured automatically on session exit and shown in the Artifacts tab
- **Resume** button spawns a new shell session instantly, with no copy-paste needed
- Backfilled across all historical sessions in the database
- **Graceful session exit for Claude and Codex** (latest)
- Stop button sends Ctrl+C → `/exit` instead of hard-killing, giving Claude/Codex time to save state and print the resume command before exiting
- **Interactive Git Diff Viewer**
- Side-by-side diff display with line-by-line numbering
- Paired removed/added lines render adjacent for easy comparison
- File-level grouping with syntax coloring
- **Spend Analytics Dashboard**
- View total spend by month, week, or day
- Drill down by repo, command, or model
- Real-time cost tracking with USD budgets
- **Budget Tracking in Session Header**
- Display token budgets (input + output combined) and USD budgets side-by-side with current usage
- Shows on all tabs: Shell, Claude (SDK), LiteLLM
- Example: `total 81,772 / 100,000 budget $1.23 / $5.00`
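
The weekly budget behaviour listed above (resets on Sunday, projects spend, flags over-budget) can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names are hypothetical, the week boundary is assumed to be Sunday 00:00 UTC, and the projection is assumed to be a simple linear run-rate.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Start of the current budget week: the most recent Sunday at 00:00 UTC.
// Assumption: UTC boundaries; the real dashboard may use local time.
function weekStartUtc(now: Date): Date {
  const midnight = Date.UTC(
    now.getUTCFullYear(),
    now.getUTCMonth(),
    now.getUTCDate(),
  );
  // getUTCDay() is 0 on Sunday, so this steps back to the latest Sunday.
  return new Date(midnight - now.getUTCDay() * MS_PER_DAY);
}

interface WeeklyProjection {
  projectedUsd: number;
  overBudget: boolean;
}

// Assumption: linear run-rate projection over a 7-day week.
function projectWeeklySpend(
  spentUsd: number,
  weeklyBudgetUsd: number,
  now: Date,
): WeeklyProjection {
  const elapsedDays =
    (now.getTime() - weekStartUtc(now).getTime()) / MS_PER_DAY;
  // Right at the reset there is no elapsed time to extrapolate from.
  const projectedUsd =
    elapsedDays > 0 ? (spentUsd * 7) / elapsedDays : spentUsd;
  return { projectedUsd, overBudget: projectedUsd > weeklyBudgetUsd };
}
```

With $10 spent by Wednesday noon (3.5 days into the week), this sketch projects about $20 for the week, so a $15 weekly budget would be flagged.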
This repository contains a **working MVP**:
- pnpm workspace monorepo
- React + Vite + TypeScript “Mission Control” web app
@@ -43,7 +53,7 @@ This repository contains a **working MVP**:

**Mission control overview**

![Mission control overview](screenshots/AI_Agent_Mission_Control_System.png)
![Mission control overview](screenshots/AI_Agent_Mission_Control_Overview.png)

**Local-first architecture**

@@ -53,7 +63,7 @@ This repository contains a **working MVP**:

- Shell session

![New shell session](screenshots/New_Session_Shell.jpg)
![New shell session](screenshots/New_Session_Shell.png)

- Claude (SDK) session

@@ -81,7 +91,7 @@ This repository contains a **working MVP**:

- Chat conversation view

![Claude SDK chat](screenshots/claude_sdk_Chat.jpg)
![Claude SDK chat](screenshots/claude_sdk_approval_gate.jpg.png)

- Command approval gate

@@ -101,15 +111,15 @@ This repository contains a **working MVP**:

**Per-session artifacts (git diff snapshots)**

- Git diff snapshot
- Git diff viewer (side-by-side, file tabs)

![Git diff snapshot](screenshots/new_git_diff_long.png)
![Git diff viewer](screenshots/git_diff_viewer.png)

**Spend dashboards / budget tracking**

- Default spend dashboard

![Spend dashboard default view](screenshots/Spend_Dashboard_Default_View.png)
![Spend dashboard default view](screenshots/Spend_Dashboard_Month.png)

- Spend dashboard today

@@ -131,6 +141,40 @@ This repository contains a **working MVP**:

![Spend dashboard by model](screenshots/Spend_Dashboard_By_Model.png)

**Session resume**

- Resume artifact in Artifacts tab with Copy + Resume button

![Session resume artifact](screenshots/Session_Resume_Artifact.png)

- Resumed session live terminal

![Session resume terminal](screenshots/Session_Resume_Terminal.png)

**LiteLLM Spend Analytics**

- By Model tab: real spend breakdown from proxy

![LiteLLM spend by model](screenshots/LiteLLM_Spend_By_Model.png)

- Daily tab: per-day requests, tokens, and spend

![LiteLLM spend daily](screenshots/LiteLLM_Spend_Daily.png)

**Budget warnings**

- In-app toast (shown when browser notifications are blocked or as a persistent overlay)

![Budget warning toast](screenshots/Budget_Warning_Toast.png)

- Native browser notification

![Budget warning notification](screenshots/Budget_Warning_Notification.png)

- Browser notification permission prompt

![Budget warning permission](screenshots/Budget_Warning_Permission.png)

**SQLite persistence / debug views**

- Sessions table
@@ -143,7 +187,7 @@ This repository contains a **working MVP**:

### Videos

- `screenshots/AgentFleet_Mission_Control.mp4`
- `screenshots/AgentFleet__AI_Mission_Control.mp4`

The MVP persists several tables in `data/agents_fleet.sqlite`:

@@ -159,7 +203,7 @@ The MVP persists several tables in `data/agents_fleet.sqlite`:

> Tip: GitHub renders MP4 previews nicely in README. `.mov` files are ignored by default in `.gitignore` to avoid bloating git history.

- `screenshots/Agents_Fleet__Mission_Control_for_Your_Local_AI_Workers.mp4`
- `screenshots/AgentFleet__AI_Mission_Control.mp4`

## Architecture
See `ARCHITECTURE.md`.
@@ -292,7 +336,7 @@ Notes:
Screenshots:
- Claude SDK session stopped by budget

![Claude SDK budget stop](screenshots/claude_sdk_Chat.jpg)
![Claude SDK budget stop](screenshots/claude_sdk_approval_gate.jpg.png)

- Claude SDK tool call + output

@@ -332,10 +376,13 @@ If you're running a local or enterprise LiteLLM proxy, Agents Fleet will route a

## Budgets (estimated)
- Optional `Budget USD` and/or `Budget tokens` apply to the entire session lifetime.
- **Token budget:** counts **input + output tokens combined**. When `total_tokens >= budget_tokens`, the session stops.
- **USD budget:** when `estimated_cost >= budget_usd`, the session stops.
- Token estimation: `ceil(text.length / 4)`.
- Cost estimation:
- shell/PTY sessions use the default rates in `apps/server/src/budget.ts`
- shell/PTY sessions use the default rates in `apps/server/src/budget.ts` ($3.00 per 1M input, $15.00 per 1M output by default)
- Claude SDK sessions use a model-based pricing table (`computeModelCostUsd`) and SDK-reported usage when available.
- LiteLLM sessions use model-specific pricing from your LiteLLM proxy configuration.
- If a budget is exceeded, the session is stopped automatically and `stop_reason` becomes `budget_exceeded`.
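
The budget rules above can be sketched in a few lines. This is an illustrative sketch only: the function and type names are hypothetical and are not taken from `apps/server/src/budget.ts`; the rates are the documented shell/PTY defaults, and the 80% warning threshold comes from the budget warning feature described in this README.

```typescript
// Documented defaults for shell/PTY sessions (USD per 1M tokens).
const DEFAULT_INPUT_USD_PER_1M = 3.0;
const DEFAULT_OUTPUT_USD_PER_1M = 15.0;
// Warning threshold from the "Budget 80% warning" feature.
const WARNING_FRACTION = 0.8;

type BudgetStatus = "ok" | "warning" | "budget_exceeded";

interface Usage {
  inputTokens: number;
  outputTokens: number;
}

interface Budget {
  budgetTokens?: number;
  budgetUsd?: number;
}

// Character-count heuristic from the README: ceil(text.length / 4).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function estimateCostUsd(usage: Usage): number {
  return (
    (usage.inputTokens / 1e6) * DEFAULT_INPUT_USD_PER_1M +
    (usage.outputTokens / 1e6) * DEFAULT_OUTPUT_USD_PER_1M
  );
}

// Hypothetical helper: reports the worst state across both budgets.
function checkBudget(usage: Usage, budget: Budget): BudgetStatus {
  const totalTokens = usage.inputTokens + usage.outputTokens;
  const costUsd = estimateCostUsd(usage);
  // Fraction of each configured budget consumed; unset (or zero) budgets
  // are treated as "no limit" in this sketch.
  const tokenFraction = budget.budgetTokens
    ? totalTokens / budget.budgetTokens
    : 0;
  const usdFraction = budget.budgetUsd ? costUsd / budget.budgetUsd : 0;
  const worst = Math.max(tokenFraction, usdFraction);
  if (worst >= 1) return "budget_exceeded";
  if (worst >= WARNING_FRACTION) return "warning";
  return "ok";
}
```

For the session header example `total 81,772 / 100,000`, this sketch would report `warning`, since about 81.8% of the token budget is used.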

> Note: USD cost is still an estimate unless you configure model pricing to match your account/contract.
@@ -375,3 +422,4 @@ COREPACK_HOME="$PWD/.corepack" pnpm -C apps/server test
- PTY sessions do not preserve stdout/stderr separation.
- Token/cost is an estimate unless the CLI provides actual usage.
- Some TUIs (notably Claude) may clear/restore the alternate screen on exit. The persisted replay is a faithful stream replay, so end-of-session scrollback may differ from what you remember seeing just before exit.
- **No multi-line input in the terminal pane.** The xterm.js terminal forwards keystrokes directly to the PTY; the shell owns the line and executes on Enter. Shift+Enter, Ctrl+Enter, etc. all behave the same as plain Enter: at the terminal protocol level, there is no way to insert a newline without submitting. Workarounds inside the shell: end a line with `\` for continuation, or use `$'line1\nline2'` quoting. Inside Claude Code's TUI specifically, `Option+Enter` inserts a newline in the prompt. For free-form multi-line composition, use the Claude (SDK) or LiteLLM chat tabs instead, where Shift+Enter works as expected.
39 changes: 12 additions & 27 deletions ROADMAP.md
@@ -1,36 +1,21 @@
# Agents Fleet Roadmap

## Now (MVP hardening)
- CI for build/lint/typecheck/test (done: GitHub Actions runs `pnpm lint`, `pnpm typecheck`, `pnpm -r test`, `pnpm build`).
- Improve terminal fit/focus reliability across reloads/session switches.
- Harden crash recovery: detect orphaned running sessions and mark them ended on server start.
- Better error surfaces in UI (spawn failures, invalid repo paths, budget stops).
- Budget accuracy hardening: strip ANSI escape sequences before token estimation (done).
- Capture git diff + changed files per session (optional on stop / on exit) and store as a per-session artifact. (done: stored in `session_artifacts`, viewable in UI)

### Claude SDK chat (done)
- Chat-style UI backed by Anthropic SDK (done).
- Per-session transcript persisted as artifacts (done).
- WS streaming for assistant output (done).
- Tool-calling: `run_command` (any shell command) executed in repo, gated by Approve/Reject (done).
- Tool output capped (100KB) to protect context/budgets (done).
- Budget enforcement for Claude SDK sessions, including within tool loops (done; model-aware cost estimate).
- Display session id and token usage (input/output; thinking/cache when present via usage artifacts) (done).

### LiteLLM chat (done)
- Chat-style UI backed by LiteLLM proxy support (done).
- Model selection in the UI for proxy-backed chat sessions (done).
- Tool-calling with the same Approve/Reject workflow as Claude SDK (done).
- Persisted usage/session artifacts for budget enforcement and replay (done).
- Support enterprise/custom proxy endpoints via `LITELLM_BASE_URL` + `LITELLM_API_KEY` (done).
## Done
- Budget 80% warning: native browser notification + in-app toast when a session hits 80% of its USD or token budget.
- One-click session resume: Resume button in Artifacts tab spawns a new shell session instantly. `claude --resume` / `codex resume` captured on graceful exit, backfilled across all historical sessions.
- Graceful exit for Claude and Codex: Stop button sends Ctrl+C → `/exit` before force-killing, so state is saved and the resume command is printed.
- Crash recovery: sessions stuck in `running` on server start are automatically marked `stopped` with `stop_reason: crash_recovery`.
- LiteLLM Spend Analytics: dedicated tab in Spend Analytics pulling real cost data from the LiteLLM proxy, with header stats, This Week chart, Weekly Budget strip (Sunday reset), and By Model and Daily breakdown tabs.
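
The graceful-exit item above (Ctrl+C, then `/exit`, then a force-kill as a last resort) might look roughly like the sketch below. It is not the project's code: `PtyLike` is a hypothetical minimal interface, the grace period is an assumed parameter, and the real implementation may pause between the two writes.

```typescript
// Hypothetical minimal PTY surface; real PTY libraries expose richer APIs.
interface PtyLike {
  write(data: string): void;
  kill(): void;
  onExit(listener: () => void): void;
}

const CTRL_C = "\x03";

// Sends Ctrl+C followed by "/exit", then force-kills only if the process
// has not exited within graceMs. Resolves with how the session ended.
function stopGracefully(
  pty: PtyLike,
  graceMs: number,
): Promise<"exited" | "killed"> {
  return new Promise((resolve) => {
    const timer = setTimeout(() => {
      pty.kill();
      resolve("killed");
    }, graceMs);
    pty.onExit(() => {
      // A clean exit cancels the pending force-kill.
      clearTimeout(timer);
      resolve("exited");
    });
    pty.write(CTRL_C);
    pty.write("/exit\r");
  });
}
```

Resolving with `"exited"` or `"killed"` lets the caller decide whether it is worth scanning the output for a resume command, which is only printed on a clean exit.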

## Next (Agent mission control)
- One-click rerun: restart a historical session (repoPath + command) in a fresh process.
- Per-session artifacts UX: view/export bundle (diff, changed files list, PTY replay export). (in progress: artifacts tab + JSON diff view)
- Make model pricing configurable (env/JSON) instead of hardcoded table.
- Improve budget accuracy: use model-specific pricing and SDK usage everywhere; add tests.
1. **Multiple sessions management**: Batch-stop, group by repo, launch parallel sessions from the UI. Core to the "fleet" value prop.
2. **Per-session artifacts UX**: View/export bundle (diff, changed files list, PTY replay export).
3. **Model pricing configurable**: env/JSON override instead of hardcoded table.
4. **Budget accuracy hardening**: Model-specific pricing and SDK-reported usage everywhere; add tests.

## Later
- Paste/attachments in Claude (SDK) and LiteLLM chat (images, files via Anthropic Files API).
- UI polish pass: budget progress animations, spinners, improved session status indicators.
- Exact token/cost via a local tokenizer (e.g. WASM tiktoken) for closer budget enforcement.
- Team/shared workspaces (still local-first, optional sync later).
- Optional local proxy mode for exact budget enforcement and richer telemetry.
31 changes: 30 additions & 1 deletion apps/server/src/db.ts
@@ -1,3 +1,4 @@
import crypto from "node:crypto";
import fs from "node:fs";
import path from "node:path";
import Database from "better-sqlite3";
@@ -125,5 +126,33 @@ export function getDb(): Db {
}

export async function bootstrapDb(): Promise<void> {
  getDb();
  const db = getDb();

  // Recover orphaned sessions: any session still marked 'running' at startup
  // was never cleaned up (server crash, forced kill, etc.). Mark them stopped.
  const now = new Date().toISOString();
  const orphans = db
    .prepare("SELECT id FROM sessions WHERE status = 'running'")
    .all() as { id: string }[];

  if (orphans.length > 0) {
    const markStopped = db.prepare(
      `UPDATE sessions SET status = 'stopped', ended_at = ?, stop_reason = 'crash_recovery'
       WHERE id = ?`,
    );
    const insertMarker = db.prepare(
      `INSERT INTO session_markers (id, session_id, timestamp, kind)
       VALUES (?, ?, ?, 'crash_recovery')`,
    );
    const tx = db.transaction(() => {
      for (const { id } of orphans) {
        markStopped.run(now, id);
        insertMarker.run(crypto.randomUUID(), id, now);
      }
    });
    tx();
    console.log(
      `[crash-recovery] Marked ${orphans.length} orphaned session(s) as stopped.`,
    );
  }
}
28 changes: 28 additions & 0 deletions apps/server/src/gitArtifacts.ts
@@ -114,6 +114,34 @@ export function storeSessionArtifact(args: {
  } as const;
}

export function captureResumeArtifact(sessionId: string, command: string) {
  const db = getDb();
  const rows = db
    .prepare(
      "SELECT data FROM pty_chunks WHERE session_id = ? ORDER BY timestamp ASC",
    )
    .all(sessionId) as { data: string }[];

  const fullText = rows.map((r) => r.data).join("");

  // Strip ANSI escape sequences before matching.
  // eslint-disable-next-line no-control-regex
  const clean = fullText.replace(/\x1b\[[0-9;]*[a-zA-Z]/g, "").replace(/\x1b\][^\x07]*\x07/g, "");

  const prefix = command === "codex" ? "codex" : "claude";
  const pattern = command === "codex"
    ? `codex\\s+resume\\s+([a-f0-9-]{36})`
    : `claude\\s+--resume\\s+([a-f0-9-]{36})`;
  const m = clean.match(new RegExp(pattern, "i"));
  if (!m) return null;

  const resumeCommand = command === "codex"
    ? `codex resume ${m[1]}`
    : `claude --resume ${m[1]}`;
  storeSessionArtifact({ sessionId, kind: `${prefix}_resume`, content: resumeCommand });
  return resumeCommand;
}

export function buildGitArtifactContent(snapshot: GitSnapshot): string {
  const payload: GitArtifactV1 = {
    v: 1,
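
To see what the matching step in `captureResumeArtifact` does with real terminal output, the sketch below isolates it as a pure function so it can be exercised without a database. `extractResumeCommand` is a hypothetical name used only for illustration; the regular expressions are the ones from the hunk above, and the sample UUID is made up.

```typescript
// Hypothetical pure-function variant of the matching step shown in the diff.
function extractResumeCommand(
  ptyText: string,
  command: string,
): string | null {
  // Strip CSI sequences and BEL-terminated OSC sequences before matching,
  // so colour codes inside the printed command do not break the regex.
  const clean = ptyText
    .replace(/\x1b\[[0-9;]*[a-zA-Z]/g, "")
    .replace(/\x1b\][^\x07]*\x07/g, "");

  const isCodex = command === "codex";
  const pattern = isCodex
    ? /codex\s+resume\s+([a-f0-9-]{36})/i
    : /claude\s+--resume\s+([a-f0-9-]{36})/i;

  // String.prototype.match without the g flag returns the first match only.
  const m = clean.match(pattern);
  if (!m) return null;
  return isCodex ? `codex resume ${m[1]}` : `claude --resume ${m[1]}`;
}

// Sample output with colour codes wrapped around part of the command.
const sample =
  "\x1b[1mTo continue:\x1b[0m claude \x1b[36m--resume\x1b[0m " +
  "123e4567-e89b-12d3-a456-426614174000\r\n";
console.log(extractResumeCommand(sample, "claude"));
```

Because only the first match is used, a transcript that prints more than one resume command would yield the earliest one under this sketch.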