garrytan · May 21, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,6 +2,93 @@
 
 All notable changes to GBrain will be documented in this file.
 
+## [0.39.0.0] - 2026-05-21
+
+**You can finally cap the cost of `gbrain brainstorm` and `gbrain lsd`, AND if the cap fires mid-run, you can resume right where you left off without losing the ideas you already paid for.**
+
+The 13K-page brain incident that started this wave is real and was expensive. A `gbrain lsd` run was estimated at $0.96, actually billed $50.71, and generated zero usable ideas. The fix wave already merged (PR #1234) capped the prefix sampling that caused the explosion. This release goes one cathedral further: every LLM call that any `gbrain` command makes is now accounted for at the gateway layer, so the same cap that protects brainstorm also protects `doctor --remediate`, `eval suspected-contradictions`, the dream cycle, and any future LLM-calling command. The plumbing is shared.
+
+What that means in practice: pass `--max-cost N` to `brainstorm`, `lsd`, or `doctor --remediate`, and the first overflow throws a typed error before any extra dollars are spent. The throw fires from inside the gateway's reserve check, so a budget exhaustion never even acquires a rate-lease slot or makes a provider HTTP call. The cap is a real ceiling, not a suggestion.
+
+When brainstorm IS exhausted mid-run, the orchestrator persists what's been done to `~/.gbrain/brainstorm/<run_id>.json` with the FULL idea bodies (not just counts), then re-throws. The user paste-runs the suggested `gbrain brainstorm --resume <run_id>` and the second run skips the already-completed crosses, runs only the missing ones, then merges everything before the judge runs. The final BrainstormResult contains the pre-crash ideas AND the post-resume ideas. (Codex's outside-voice review was the one that caught this — a resume that produces only the second-run's ideas would be silent partial output, which is worse than no resume at all.)
+
+### How to turn it on
+
+```bash
+# Cap brainstorm cost at $2 (default $5). Throws BudgetExhausted if exceeded.
+gbrain brainstorm "what story should I write next" --max-cost 2
+
+# Crash recovery — list saved runs, resume the one you want.
+gbrain brainstorm --list-runs
+gbrain brainstorm --resume 1a2b3c4d5e6f7890
+
+# Bypass the 7-day staleness gate if you really mean it.
+gbrain brainstorm --resume 1a2b3c4d5e6f7890 --force-resume
+
+# Same cap, different command — doctor's autonomous remediation now resumes too.
+gbrain doctor --remediate --max-cost 5
+# (on BudgetExhausted, the run persists a checkpoint at
+#  ~/.gbrain/remediation/<plan_hash>.json and tells you the --resume command)
+gbrain doctor --remediate --resume
+```
+
+### What's safe to know about
+
+A4 amended is a semantic shift: `gbrain doctor --remediate --max-usd` used to be a pre-flight estimate check ("refuse if est > cap"); it's now ALSO a mid-run hard ceiling backed by BudgetTracker via the gateway's AsyncLocalStorage scope. If you cron-schedule `--remediate`, the worst case used to be "the run starts despite the under-estimate"; now the worst case is "the run aborts mid-step and writes a resumable checkpoint." The first failure mode is gone; the second is recoverable via `--resume`. `--max-cost` is a new alias for `--max-usd` for symmetry with brainstorm.
+
+The brainstorm checkpoint identity intentionally uses NO embedding bits: `run_id = sha256(question + profile + sort(close_slugs) + sort(far_slugs)).slice(0,16)`. Swap your embedding model between runs and the resume still finds the checkpoint. Conversely, change the question by even one word and you get a different run_id (the previous checkpoint is left alone; the cycle purge phase GCs anything older than 7 days).
+
+The dream cycle's `~/.gbrain/audit/dream-budget-YYYY-Www.jsonl` grew one new field on every line: `schema_version: 1`. Reorderings are tolerated (downstream consumers should index by field name, not position); renames or removals are breaking. The same schema-stable contract holds for the new `~/.gbrain/audit/budget-YYYY-Www.jsonl` produced by the unified `BudgetTracker`.
+
+If you wrote integration code against `BudgetExhausted` in the brainstorm orchestrator before this release: that class moved to `src/core/budget/budget-tracker.ts`. The orchestrator re-exports the old name for back-compat, so existing imports keep working.
+
+### Itemized changes
+
+- **`BudgetTracker` is the new canonical primitive** at `src/core/budget/budget-tracker.ts`. One class, one typed error (`BudgetExhausted` with `reason: 'cost' | 'runtime' | 'no_pricing'`), one schema-stable audit JSONL. Pinned by 18 unit cases covering TX1 (record throws when cumulative exceeds cap), TX2 (no_pricing hard-fails when cap is set + pricing missing), A3 amended (pessimistic fallback when `err.usage` is absent), the onExhausted-fires-once-before-throw contract, and the schema-stable audit format.
+- **`withBudgetTracker(tracker, fn)` at the gateway layer (TX5)** installs the tracker on a module-internal `AsyncLocalStorage<BudgetTracker>`. Every `gateway.chat / embed / rerank` call inside the scope auto-composes. Outside-scope calls are budget no-ops (existing behavior preserved). Nested scopes restore the outer on exit. Parallel `Promise.all` scopes do not bleed trackers across each other.
+- **Subagent rate-lease ordering pinned (A1)**: the gateway's `reserve()` runs BEFORE `acquireLease()` in `src/core/minions/handlers/subagent.ts`. A budget throw must NOT consume a rate-lease slot. The handler body itself no longer needs explicit budget threading; the AsyncLocalStorage composition handles it.
+- **`payload-fitter.ts` (P6)** lands at `src/core/diarize/payload-fitter.ts` with two strategies. `'batch'` is deterministic token-budgeted chunking, no LLM calls. `'summarize'` embed-clusters then Haiku-summarizes each cluster in parallel via `Promise.allSettled` at parallelism=4. The quality gate flags `degraded: true` when success ratio drops below the configured `min_success_ratio` (default 0.75) — caller decides whether to surface or abort.
+- **Brainstorm checkpoint (P7)** at `src/core/brainstorm/checkpoint.ts`. Atomic .tmp+rename writes. Full idea bodies persisted (TX3). One-flag resume (TX4). 7-day mtime-based GC wired into the cycle purge phase.
+- **`doctor --remediate --resume`** loads `~/.gbrain/remediation/<plan_hash>.json` and continues from the next un-completed step. Refuses on mismatched plan_hash with a paste-ready message.
+- **`gbrain brainstorm --list-runs`** prints saved run_ids + ISO dates + question stems so the user can pick which to resume.
+- **ISO-week audit filenames consolidated** into `src/core/audit-week-file.ts`. Four call sites migrated (shell-jobs, phantoms, slug-fallback, dream-budget). Year-boundary cases (2020-W53, 2024-12-30 belongs to 2025-W01) pinned by tests.
+- **eval-contradictions** routes through `withBudgetTracker` for telemetry without changing the CLI surface. `--budget-usd` semantics + `PreFlightBudgetError` shape are byte-identical.
+
+### For contributors
+
+- `bun test` adds 73 new tests across 9 new files (`test/core/budget/`, `test/core/audit-week-file.test.ts`, `test/core/diarize/`, `test/brainstorm/checkpoint.test.ts`, `test/e2e/brainstorm-resume.test.ts`, `test/core/remediation-checkpoint.test.ts`). F1 also closes the pre-existing PGLite `page_links` schema gap (the brainstorm domain-bank queries `page_links` but the embedded schema only defined `links`). Brainstorm now works against PGLite brains in production via the new `page_links` view alias shipped in both the embedded schema bundle and migration v81. F2 adds an E2E pinning the user-facing `--max-cost` pre-flight refusal path. F3 adds `--max-cost` to `gbrain reindex --code`. All previous brainstorm + doctor + eval-contradictions tests still pass.
+
+## To take advantage of v0.39.0.0
+
+`gbrain upgrade` should do this automatically. If it didn't, or if `gbrain doctor`
+warns about a partial migration:
+
+1. **Run the orchestrator manually:**
+   ```bash
+   gbrain apply-migrations --yes
+   ```
+   This applies migration v81 (`page_links_view_alias`) on PGLite + Postgres brains. The alias is required for `gbrain brainstorm` and `gbrain lsd` to work against the domain-bank tiebreaker; without it, the brainstorm domain-bank queries fail with `relation "page_links" does not exist`.
+2. **Set a cost cap on the commands you care about:**
+   ```bash
+   # Sets a per-run dollar ceiling. Throws BudgetExhausted before any LLM call
+   # if the pre-run estimate exceeds the cap, AND mid-run if cumulative spend
+   # blows past it.
+   gbrain brainstorm "test" --max-cost 1
+   gbrain doctor --remediate --max-cost 5
+   gbrain reindex --code --max-cost 10
+   ```
+3. **Verify the outcome:**
+   ```bash
+   gbrain doctor             # schema_version should be 81
+   gbrain brainstorm --list-runs   # confirms the new checkpoint directory exists
+   ```
+4. **If any step fails or the numbers look wrong,** please file an issue at
+   https://github.com/garrytan/gbrain/issues and include:
+   - output of `gbrain doctor`
+   - contents of `~/.gbrain/upgrade-errors.jsonl` if it exists
+   - which step broke
+
+   This feedback loop is how the gbrain maintainers find fragile upgrade paths. Thank you.
 ## [0.37.11.0] - 2026-05-21
 
 **Fresh `gbrain init --pglite` works out of the box now.**

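The reserve-before-call and scoped-tracker behavior described in the CHANGELOG entry above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/core/budget/budget-tracker.ts` or `src/core/ai/gateway.ts`. Only the names `BudgetTracker`, `BudgetExhausted`, `withBudgetTracker`, `getCurrentBudgetTracker`, `reserve`, `record`, and `onExhausted` come from the diff; the class bodies, the `spent` getter, and the `chat(estimatedUsd, actualUsd)` stand-in are assumptions made for the example.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical, simplified stand-ins; the real shapes may differ.
type ExhaustedReason = "cost" | "runtime" | "no_pricing";

class BudgetExhausted extends Error {
  readonly reason: ExhaustedReason;
  constructor(reason: ExhaustedReason, spentUsd: number) {
    super(`budget exhausted (${reason}) after $${spentUsd.toFixed(2)}`);
    this.name = "BudgetExhausted";
    this.reason = reason;
  }
}

class BudgetTracker {
  private readonly maxCostUsd: number;
  private spentUsd = 0;
  private fired = false;
  private callbacks: Array<() => void> = [];

  constructor(maxCostUsd: number) {
    this.maxCostUsd = maxCostUsd;
  }

  // Register a hook (e.g. "persist a checkpoint") that runs before the throw.
  onExhausted(cb: () => void): void {
    this.callbacks.push(cb);
  }

  // Before a provider call: refuse if the estimate would overflow the cap.
  reserve(estimatedUsd: number): void {
    if (this.spentUsd + estimatedUsd > this.maxCostUsd) this.exhaust();
  }

  // After a provider call: account for the actual cost.
  record(actualUsd: number): void {
    this.spentUsd += actualUsd;
    if (this.spentUsd > this.maxCostUsd) this.exhaust();
  }

  get spent(): number {
    return this.spentUsd;
  }

  private exhaust(): never {
    if (!this.fired) {
      this.fired = true; // callbacks fire once, synchronously, before the throw
      for (const cb of this.callbacks) cb();
    }
    throw new BudgetExhausted("cost", this.spentUsd);
  }
}

// Module-internal store: each async scope sees only its own tracker.
const store = new AsyncLocalStorage<BudgetTracker>();

function withBudgetTracker<T>(tracker: BudgetTracker, fn: () => Promise<T>): Promise<T> {
  return store.run(tracker, fn);
}

function getCurrentBudgetTracker(): BudgetTracker | undefined {
  return store.getStore();
}

// Hypothetical gateway call; costs are passed in so the sketch stays offline.
async function chat(estimatedUsd: number, actualUsd: number): Promise<string> {
  const tracker = getCurrentBudgetTracker();
  tracker?.reserve(estimatedUsd); // throws before any provider work happens
  await Promise.resolve();        // placeholder for the provider HTTP call
  tracker?.record(actualUsd);
  return "ok";
}
```

Under these assumptions, a caller would wrap a whole run in `withBudgetTracker(new BudgetTracker(2), () => runBrainstorm())` (with `runBrainstorm` being a hypothetical entry point), and code outside any scope would see `getCurrentBudgetTracker()` return `undefined`, which makes every budget step a no-op.
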
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -107,6 +107,12 @@ strict behavior when unset.
 - `src/core/ai/recipes/voyage.ts` — Voyage AI openai-compatible recipe. **v0.28.7 (#680):** declares `chars_per_token=1` + `safety_factor=0.5` so the gateway pre-splits Voyage batches at a 60K-character budget (50% of 120K-token cap with the dense-tokenizer ratio). Closes the v0.27 backfill loop where ~26% of the corpus stayed un-embedded because tiktoken-grounded budgeting silently undercounted Voyage's actual token usage. **v0.28.11 (#719):** declares `multimodal_models: ['voyage-multimodal-3']` so the gateway rejects text-only Voyage models pointed at the multimodal endpoint with a clear `AIConfigError` instead of waiting for Voyage's HTTP 400. **v0.33.1.1 (#962, fixup):** recipe docstring at `:7-16` tightened to name the seven hosted flexible-dim models that accept `output_dimension` explicitly (`voyage-4-large`, `voyage-4`, `voyage-4-lite`, `voyage-3-large`, `voyage-3.5`, `voyage-3.5-lite`, `voyage-code-3`) and call out that `voyage-4-nano` is the open-weight variant listed separately by Voyage as fixed 1024-dim — does NOT accept the parameter. The "all v4 variants are flexible" misread is what caused the original PR to include nano in `VOYAGE_OUTPUT_DIMENSION_MODELS`; the negative regression assertion in `test/ai/gateway.test.ts` (`dimsProviderOptions` returns `undefined` for `voyage-4-nano`) pins the contract. **v0.37.3.0:** `voyage-code-3` is the recommended embedding model for gstack per-worktree code brains (Topology 3 in `docs/architecture/topologies.md`). Registration was already in the `models` list since pre-v0.33; the v0.37.3.0 wave adds discoverability surfaces — decision-tree branch in `docs/integrations/embedding-providers.md`, Topology 3 "Recommended embedding model" subsection, runtime nudge from `gbrain reindex --code` against non-code-tuned models. Recipe-shape regression pinned by `test/ai/voyage-code-3-recipe.test.ts`.
 - `src/core/ai/recipes/anthropic.ts` — Anthropic recipe (chat + expansion touchpoints). **v0.31.12:** chat and expansion `models:` lists drop the v0.31.6 phantom `claude-sonnet-4-6-20250929` date suffix — canonical id is `claude-sonnet-4-6`. The wrong-direction alias `claude-sonnet-4-6 → claude-sonnet-4-6-20250929` is removed; a reverse alias `claude-sonnet-4-6-20250929 → claude-sonnet-4-6` keeps stale user configs working (rescues `facts.extraction_model` and `models.dream.synthesize` set by v0.31.6 installs). Recipe-shape regression pinned by `test/anthropic-model-ids.test.ts` (6 cases, verbatim cherry-pick of PR #830 plus the reverse-alias rescue case).
 - `src/core/anthropic-pricing.ts` — Single source of truth for Anthropic model pricing (per-MTok input/output). **v0.31.12:** Opus 4.7 corrected from `$15/$75` to `$5/$25` (the old number was from Opus 4 generation, never refreshed when 4.7 shipped); Opus 4.6 also corrected. Consumed by `src/core/budget-meter.ts` and `src/core/cross-modal-eval/runner.ts` — the cross-modal estimator now reads `ANTHROPIC_PRICING` for Anthropic models instead of duplicating the table, killing the v0.31.6 drift bug class.
+- `src/core/budget/budget-tracker.ts` (v0.37.x) — keystone primitive for the brainstorm cost-cathedral wave. One typed error (`BudgetExhausted` with `reason: 'cost' | 'runtime' | 'no_pricing'`), one schema-stable audit JSONL at `~/.gbrain/audit/budget-YYYY-Www.jsonl`. Contracts pinned by 18 unit cases: **TX1** — `record()` throws when cumulative spend exceeds cap (the cap is a real ceiling, not a suggestion); **TX2** — `reserve()` hard-fails with `reason: 'no_pricing'` when `maxCostUsd` is set AND the model is missing from pricing maps (warn-once preserved when cap is unset); **A3 amended** — `extractUsageFromError(err, fallback)` returns `err.usage` when SDK provides it, else the pessimistic fallback (caller passes `maxOutputTokens`, not the optimistic pre-call estimate). `onExhausted(cb)` callback fires once synchronously BEFORE the throw propagates so callers can persist checkpoints. Replaces three parallel copies (inline brainstorm class, cycle/budget-meter, eval-contradictions). Adapts the old `BudgetMeter` via T5 (public shape preserved + `schema_version: 1` stamped on every dream-budget audit line).
+- `src/core/audit-week-file.ts` (v0.37.x, Q1) — single source of truth for ISO-week audit JSONL filename math. Exports `isoWeek(d)`, `isoWeekFilename(prefix, now?)`, `resolveAuditDir()` (honors `GBRAIN_AUDIT_DIR`). Year-boundary correctness pinned by tests at 2020-W53 (the 53-week year), 2025-W01 rolling in from 2024-12-30 (Monday), 2026-W01. Four call sites migrated in T4: `src/core/minions/handlers/shell-audit.ts`, `src/core/facts/phantom-audit.ts`, `src/core/audit-slug-fallback.ts`, `src/core/cycle/budget-meter.ts`. Each call site keeps its `compute<X>AuditFilename` thin wrapper for back-compat with existing tests.
+- `src/core/ai/gateway.ts:withBudgetTracker` (v0.37.x, T3 / TX5) — gateway-layer enforcement via `AsyncLocalStorage<BudgetTracker>`. `withBudgetTracker(tracker, fn)` installs the tracker on the module-internal store; every `gateway.chat / embed / rerank` call inside the scope auto-composes (reserve before, record in try/finally). Outside-scope calls are budget no-ops (current behavior preserved). Nested scopes restore the outer tracker on exit. `getCurrentBudgetTracker()` is the test seam. The chat path uses A3-amended pessimistic fallback on error paths; the embed path estimates input tokens from char count × recipe's `chars_per_token` because the AI SDK doesn't surface per-batch embed token usage; the rerank path estimates char count of query+docs. 6 unit cases pin the contract.
+- `src/core/diarize/payload-fitter.ts` (v0.37.x, P6 / Q3) — generic fit-arbitrarily-large-items-into-per-call-token-budget utility. `'batch'` strategy is deterministic token-budgeted chunking with no LLM calls. `'summarize'` strategy embed-clusters into ceil(items/4) groups via cheap deterministic nearest-neighbor on cosine, Haiku-summarizes each cluster via `Promise.allSettled` at parallelism=4 (Perf1). Each Haiku call composes the active BudgetTracker via T3's AsyncLocalStorage. The quality gate (codex outside-voice finding #4): when `success_ratio < min_success_ratio` (default 0.75), result is flagged `degraded: true` — the fitter preserves the successful subset; the caller decides whether to surface a partial result or abort.
+- `src/core/brainstorm/checkpoint.ts` (v0.37.x, P7 / TX3+TX4+A5 amended) — crash-resilient checkpoint for `gbrain brainstorm` and `gbrain lsd`. Persists FULL idea bodies (~50KB per run) so resume can MERGE the pre-crash ideas with the post-resume ideas before the judge runs (codex's load-bearing finding — a resume that produces only second-run output is silent partial output). `run_id = sha256(question + profile + sort(close_slugs) + sort(far_slugs)).slice(0,16)` — NO embedding bits, stable across embedding-model swaps. Atomic write via `.tmp + rename`. ONE resume flag (`--resume <run_id>` — the proposed `--retry-failed` was dropped per TX4: failed AND never-attempted crosses both go through `--resume`). `--list-runs` prints saved run_ids mtime-newest-first. `--force-resume` bypasses the 7-day staleness gate. The cycle purge phase (`gbrain dream --phase purge`) GCs checkpoints older than 7 days via `gcStaleCheckpoints(7)`. Pinned by 20 unit cases + 3 E2E cases in `test/e2e/brainstorm-resume.test.ts` including the load-bearing merge contract.
+- `src/core/remediation-checkpoint.ts` (v0.37.x, T7 / A4 amended) — `doctor --remediate` checkpoint at `~/.gbrain/remediation/<plan_hash>.json`. `plan_hash = sha256(JSON.stringify(sorted recommendation ids)).slice(0,16)`. Schema-versioned. Atomic write via `.tmp + rename`. `gbrain doctor --remediate --resume <plan_hash>` (or with no arg — picks the newest matching checkpoint) loads it and skips already-completed steps. Mismatched plan_hash refuses with a paste-ready message. Cleared on clean completion. Pinned by 13 unit cases.
 - `src/core/model-config.ts` — Model-string resolution (the seam every internal LLM call walks through). **v0.31.12:** four-tier system (`ModelTier = 'utility' | 'reasoning' | 'deep' | 'subagent'`) with `TIER_DEFAULTS` (utility→haiku-4-5, reasoning→sonnet-4-6, deep→opus-4-7, subagent→sonnet-4-6) and `tier?: ModelTier` on `ResolveModelOpts`. Resolution chain is now 8 steps: cliFlag → deprecated key → config key → `models.default` → `models.tier.<tier>` → env var → `TIER_DEFAULTS[tier]` → caller fallback. Two new exports — `isAnthropicProvider(modelString)` checks `provider:model` prefix OR `claude-` bare-id pattern, and `enforceSubagentAnthropic()` is the layer-2 runtime guard: when `tier === 'subagent'` resolves to a non-Anthropic provider, it emits a once-per-`(source, model)` stderr warn AND falls back to `TIER_DEFAULTS.subagent` instead of letting the Anthropic Messages API tool-loop attempt to run on OpenAI/Gemini. `_resetDeprecationWarningsForTest()` now also clears `_subagentTierWarningsEmitted` so tests re-emit.
 - `src/core/ai/model-resolver.ts` — Recipe-touchpoint validator. **v0.31.12:** `assertTouchpoint(recipe, touchpoint, modelId, extendedModels?)` gains an optional 4th `extendedModels: ReadonlySet<string>` argument. When the modelId is in that set, the native-recipe allowlist throw is bypassed — the user explicitly opted into this model via config so we let provider rejection surface as `model_not_found` at HTTP call time (and `gbrain models doctor` catches it earlier). Default code paths with hardcoded model strings MUST NOT pass `extendedModels` — typos in source code still fail fast. Replaces the earlier plan to soften the validator wholesale (Codex F4/F5 in plan review flagged that as too broad — it would have removed the fail-fast contract for chat + expand + embed all three).
 - `src/core/ai/gateway.ts` extension (v0.31.12) — new module-scoped `_extendedModels: Map<providerId, Set<modelId>>` registry feeds `assertTouchpoint`'s 4th-arg path. New `reconfigureGatewayWithEngine(engine)` async function is called from `cli.ts` after `engine.connect()` (and before every command except `CLI_ONLY` no-DB commands) — re-resolves expansion + chat defaults through `resolveModel()` so `models.tier.*` and `models.default` overrides apply to expansion + chat both. `DEFAULT_CHAT_MODEL` corrected to `anthropic:claude-sonnet-4-6` (was the v0.31.6 phantom `-20250929`). New `__setChatTransportForTests` seam mirrors `__setEmbedTransportForTests` so tests drive `chat()` with a stubbed transport.
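
The ISO-week filename math that the `src/core/audit-week-file.ts` entry in the hunk above refers to can be illustrated with a short sketch. The function names `isoWeek(d)` and `isoWeekFilename(prefix, now?)` are taken from the diff; the return shape, the use of UTC, and the exact filename template are assumptions inferred from the `budget-YYYY-Www.jsonl` pattern, not the module's actual implementation.

```typescript
// Returns the ISO-8601 week-numbering year and week for a date.
// Assumption: dates are interpreted in UTC.
function isoWeek(d: Date): { year: number; week: number } {
  // Work on a UTC-midnight copy so the caller's Date is not mutated.
  const t = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
  // ISO weekday: Monday = 1 ... Sunday = 7.
  const dow = t.getUTCDay() || 7;
  // Move to the Thursday of the same ISO week; its calendar year is the ISO year.
  t.setUTCDate(t.getUTCDate() + 4 - dow);
  const yearStart = Date.UTC(t.getUTCFullYear(), 0, 1);
  const dayOfYear = (t.getTime() - yearStart) / 86_400_000 + 1;
  return { year: t.getUTCFullYear(), week: Math.ceil(dayOfYear / 7) };
}

// Builds "<prefix>-YYYY-Www.jsonl", zero-padding the week to two digits.
function isoWeekFilename(prefix: string, now: Date = new Date()): string {
  const { year, week } = isoWeek(now);
  return `${prefix}-${year}-W${String(week).padStart(2, "0")}.jsonl`;
}
```

With this sketch, Monday 2024-12-30 maps to `budget-2025-W01.jsonl` and 2020-12-31 maps to a `2020-W53` file, matching the year-boundary cases the entry says are pinned by tests.
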