Skip to content
52 changes: 52 additions & 0 deletions docs/zeus-prompt-cache-hud.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
# Prompt cache HUD

A status bar item that shows AI cost and prompt-cache state in real time. The point is transparency: developers should be able to feel each AI call's cost so they can change behavior, not be surprised at the end of the month.

This is positioned directly against Cursor's credit model, which most users describe as opaque. We show the raw numbers; users can hide it if they don't care.

## Status bar layout

```text
⚡ 2 agents · 92% cache · $0.003 / $0.41 today
Comment thread
kanywst marked this conversation as resolved.
```

- **`⚡ N agents`**: number of currently-running subagents. Clickable → opens parallel-agents view
- **`X% cache`**: rolling cache hit ratio over the last 100 requests. State (window + day totals) lives on the singleton **main-process** service `IAiCostService`, not directly in `IStorageService`, so multiple workbench windows opened against different workspaces stay consistent. The main process persists snapshots to `IStorageService.APPLICATION` (key `zeus.ai.cache.window`) on a 1s debounce so a hard kill doesn't lose more than ~1s of data. The debounce is durability-only — renderer windows stay in sync via IPC subscriptions, not by re-reading storage, so write frequency does not affect UI latency or cross-window consistency.
- **`$X.XXX`**: cost of the most recent AI call
- **`$X.XX today`**: cumulative cost for the local day (resets midnight)

All visible strings (`agents`, `cache`, `today`) ship through `nls.localize` so translations land alongside other UI localisation. The currency symbol and decimal separator are formatted via `Intl.NumberFormat(locale, { style: 'currency', currency: 'USD' })`, with a setting `zeus.ai.hud.currency` to override the display currency for users whose Anthropic billing is in another currency.

Hovering over each segment shows a tooltip with the breakdown (input tokens / output tokens / cached tokens / cost per million).

## Source of data

- Live agent count: `IAgentRuntime` event stream (`feat/agent-sdk`)
- Per-call cost and cache state: Anthropic SDK `usage` field, mapped to current model pricing
- Today's cumulative: owned by the `IAiCostService` singleton in the **main process** (ensures atomic updates across all open workbench windows — `IStorageService` writes from concurrent renderers would race and silently lose updates). The main process snapshots the state to `IStorageService.APPLICATION` (per-user, cross-workspace) under key `zeus.ai.cost` holding `{ date: "YYYY-MM-DD", total: number }`. Application scope (not workspace) because the user's per-day spend should not reset when switching between workspaces — the goal is to surface real cost, not per-project cost. When local midnight passes, the record is reset before the next write, so storage doesn't grow per day. A future `zeus.ai.hud.scope` setting can flip it to `WORKSPACE` if a user wants per-project tracking.

## Configuration

- `zeus.ai.hud.enabled` (default: `true`) — show the HUD at all
- `zeus.ai.hud.detail` (`"compact" | "verbose"`) — controls the format
- `zeus.ai.hud.todayLimit` (number | null) — soft cap in USD (matches the units shown in the status bar); turns the cost segment red when exceeded, no enforcement. `null` disables the colouring.
- `zeus.ai.hud.stalePricingDays` (number | null, default `30`) — show the `⚠` stale-pricing glyph once the bundled pricing file is older than this many days. `null` disables the warning entirely for users in restricted environments (corporate-locked editor versions, offline installs) where update cadence is out of their control.

The HUD is implemented as **multiple adjacent `StatusBarItem`s** (agents, cache, cost, today). VS Code's `StatusBarItem` API does not support per-segment coloring inside a single item, so the colored "over limit" treatment lives on its own item.

## Why no enforcement

Hard credit caps are what makes Cursor frustrating. Zeus shows the number; the user decides whether to stop. If you want enforcement, the local LLM path or a custom MCP proxy can give you that.

## Acceptance criteria (real impl)

- [ ] Status bar item appears when an Anthropic AI feature is configured
- [ ] Real-time updates within ~200ms of each request completing
- [ ] Hover tooltip shows token / cost breakdown
- [ ] `today` value persists across editor restarts in the same local day
- [ ] Setting `zeus.ai.hud.enabled = false` hides the item entirely
- [ ] Pricing table lives in a bundled JSON file (`src/vs/workbench/contrib/aiHud/common/anthropicPricing.json`) shipped with the build. Updated by a dependabot-style PR when the upstream price page changes — see `script/refresh-pricing.mjs`. The HUD never makes a live network call for pricing on a hot path (latency + offline). If the file is older than `zeus.ai.hud.stalePricingDays` (default `30`) the HUD shows a small `⚠` glyph next to the cost segment and the tooltip says `"Pricing data from {date}; estimates may be stale — update Zeus"`. The user-visible numbers continue to use the bundled table; the warning is visible because the transparency goal of this feature is broken if users silently look at outdated estimates. Setting the threshold to `null` suppresses the glyph for users whose editor version cadence is out of their control.

## Status

Slot reserved at `src/vs/workbench/contrib/aiHud/`. Depends on `IAgentRuntime` (`feat/agent-sdk`).
5 changes: 5 additions & 0 deletions src/vs/workbench/contrib/aiHud/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
# `aiHud` contribution

Slot for the status bar HUD that shows AI cost and prompt-cache state. Design at [`docs/zeus-prompt-cache-hud.md`](../../../../../docs/zeus-prompt-cache-hud.md).

Reads from `IAgentRuntime` (`feat/agent-sdk`). No enforcement, just transparency.
Original file line number Diff line number Diff line change
Expand Up @@ -9,9 +9,18 @@ import * as assert from 'assert';
import { McpStdioStateHandler } from '../../node/mcpStdioStateHandler.js';
import { isWindows } from '../../../../../base/common/platform.js';

const GRACE_TIME = 100;
// 1000ms gives the child enough time on slow CI runners to handle
// SIGTERM and flush stdout before the parent escalates to SIGKILL. 100ms
// was racy on Linux containers; 250ms still tripped under load. The
// test's `delay >= GRACE_TIME` assertion still scales correctly.
const GRACE_TIME = 1000;

suite('McpStdioStateHandler', () => {
// `sigkill after grace` waits at least GRACE_TIME * 2 = 2000ms, which
// collides with mocha's default 2000ms test timeout. Raise the suite
// timeout so the SIGTERM-then-SIGKILL path has room without becoming
// flaky again.
suite('McpStdioStateHandler', function () {
this.timeout(10_000);
const store = ensureNoDisposablesAreLeakedInTestSuite();

function run(code: string) {
Expand Down Expand Up @@ -53,11 +62,7 @@ suite('McpStdioStateHandler', () => {
assert.strictEqual(result.trim(), 'Data received: Hello MCP!');
});

// Flaky on shared CI: the child process can exit before its
// post-SIGTERM stdout flush lands, so the test sees 'stdin ended'
// only and not 'stdin ended\nSIGTERM received'. Skip on CI until
// the upstream subprocess flush race is properly fixed.
if (!isWindows && !process.env.CI && !process.env.GITHUB_ACTIONS) {
if (!isWindows) {
test('sigterm after grace', async () => {
const { handler, output } = run(`
setInterval(() => {}, 1000);
Expand Down