diff --git a/CHANGELOG.md b/CHANGELOG.md
index 58ba1bd3d..108a527b8 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,93 @@
 
 All notable changes to GBrain will be documented in this file.
 
+## [0.39.0.0] - 2026-05-21
+
+**You can finally cap the cost of `gbrain brainstorm` and `gbrain lsd`, AND if the cap fires mid-run, you can resume right where you left off without losing the ideas you already paid for.**
+
+The 13K-page brain incident that started this wave is real and was expensive. A `gbrain lsd` run estimated $0.96, actually billed $50.71, generated zero usable ideas. The fix wave already merged (PR #1234) capped the prefix sampling that caused the explosion. This release goes one cathedral further: every LLM call that any `gbrain` command makes is now accounted at the gateway layer, so the same cap that protects brainstorm also protects `doctor --remediate`, `eval suspected-contradictions`, the dream cycle, and any future LLM-calling command. The plumbing is shared.
+
+What that means in the hand: pass `--max-cost N` to brainstorm or lsd or `doctor --remediate`, and the first overflow throws a typed error before any extra dollars are spent. The throw fires from inside the gateway's reserve check, so a budget exhaustion never even acquires a rate-lease slot or makes a provider HTTP call. The cap is a real ceiling, not a suggestion.
+
+When a brainstorm run's budget IS exhausted mid-run, the orchestrator persists the completed work to `~/.gbrain/brainstorm/<run_id>.json` with the FULL idea bodies (not just counts), then re-throws. The user pastes and runs the suggested `gbrain brainstorm --resume <run_id>`; the second run skips the already-completed crosses, runs only the missing ones, then merges everything before the judge runs. The final BrainstormResult contains the pre-crash ideas AND the post-resume ideas. (Codex's outside-voice review caught this: a resume that produces only the second run's ideas would be silent partial output, which is worse than no resume at all.)
+
+### How to turn it on
+
+```bash
+# Cap brainstorm cost at $2 (default $5). Throws BudgetExhausted if exceeded.
+gbrain brainstorm "what story should I write next" --max-cost 2
+
+# Crash recovery — list saved runs, resume the one you want.
+gbrain brainstorm --list-runs
+gbrain brainstorm --resume 1a2b3c4d5e6f7890
+
+# Bypass the 7-day staleness gate if you really mean it.
+gbrain brainstorm --resume 1a2b3c4d5e6f7890 --force-resume
+
+# Same cap, different command — doctor's autonomous remediation now resumes too.
+gbrain doctor --remediate --max-cost 5
+# (on BudgetExhausted, the run persists a checkpoint at
+#  ~/.gbrain/remediation/<plan_hash>.json and tells you the --resume command)
+gbrain doctor --remediate --resume
+```
+
+### What's safe to know about
+
+A4 amended is a semantic shift: `gbrain doctor --remediate --max-usd` used to be a pre-flight estimate check ("refuse if est > cap"); it's now ALSO a mid-run hard ceiling backed by BudgetTracker via the gateway's AsyncLocalStorage scope. If you cron-schedule `--remediate`, the worst case used to be "the run starts despite the under-estimate"; now the worst case is "the run aborts mid-step and writes a resumable checkpoint." The first failure-mode is gone; the second is recoverable via `--resume`. `--max-cost` is a new alias for `--max-usd` for symmetry with brainstorm.
+
+The brainstorm checkpoint identity intentionally uses NO embedding bits: `run_id = sha256(question + profile + sort(close_slugs) + sort(far_slugs)).slice(0,16)`. Swap your embedding model between runs and the resume still finds the checkpoint. Conversely, change the question by even one word and you get a different run_id (the previous checkpoint is left alone; the cycle purge phase GCs anything older than 7 days).
+
+The dream cycle's `~/.gbrain/audit/dream-budget-YYYY-Www.jsonl` grew one new field on every line: `schema_version: 1`. Reorderings are tolerated (downstream consumers should index by field name, not position); renames or removals are breaking. The same schema-stable contract holds for the new `~/.gbrain/audit/budget-YYYY-Www.jsonl` produced by the unified `BudgetTracker`.
+
+If you wrote integration code against `BudgetExhausted` in the brainstorm orchestrator before this release: that class moved to `src/core/budget/budget-tracker.ts`. The orchestrator re-exports the old name for back-compat, so existing imports keep working.
+
+### Itemized changes
+
+- **`BudgetTracker` is the new canonical primitive** at `src/core/budget/budget-tracker.ts`. One class, one typed error (`BudgetExhausted` with `reason: 'cost' | 'runtime' | 'no_pricing'`), one schema-stable audit JSONL. Pinned by 18 unit cases covering TX1 (record throws when cumulative exceeds cap), TX2 (no_pricing hard-fails when cap is set + pricing missing), A3 amended (pessimistic fallback when `err.usage` is absent), the onExhausted-fires-once-before-throw contract, and the schema-stable audit JSONL format.
+- **`withBudgetTracker(tracker, fn)` at the gateway layer (TX5)** installs the tracker on a module-internal `AsyncLocalStorage<BudgetTracker>`. Every `gateway.chat / embed / rerank` call inside the scope auto-composes. Outside-scope calls are budget no-ops (existing behavior preserved). Nested scopes restore the outer on exit. Parallel `Promise.all` scopes do not bleed trackers across each other.
+- **Subagent rate-lease ordering pinned (A1)**: the gateway's `reserve()` runs BEFORE `acquireLease()` in `src/core/minions/handlers/subagent.ts`. A budget throw must NOT consume a rate-lease slot. The handler body itself no longer needs explicit budget threading; the AsyncLocalStorage composition handles it.
+- **`payload-fitter.ts` (P6)** lands at `src/core/diarize/payload-fitter.ts` with two strategies. `'batch'` is deterministic token-budgeted chunking, no LLM calls. `'summarize'` embed-clusters then Haiku-summarizes each cluster in parallel via `Promise.allSettled` at parallelism=4. The quality gate flags `degraded: true` when success ratio drops below the configured `min_success_ratio` (default 0.75) — caller decides whether to surface or abort.
+- **Brainstorm checkpoint (P7)** at `src/core/brainstorm/checkpoint.ts`. Atomic .tmp+rename writes. Full idea bodies persisted (TX3). One-flag resume (TX4). 7-day mtime-based GC wired into the cycle purge phase.
+- **`doctor --remediate --resume`** loads `~/.gbrain/remediation/<plan_hash>.json` and continues from the next un-completed step. Refuses on mismatched plan_hash with a paste-ready message.
+- **`gbrain brainstorm --list-runs`** prints saved run_ids + iso dates + question stems so the user can pick which to resume.
+- **ISO-week audit filenames consolidated** into `src/core/audit-week-file.ts`. Four call sites migrated (shell-jobs, phantoms, slug-fallback, dream-budget). Year-boundary cases (2020-W53, 2024-12-30 belongs to 2025-W01) pinned by tests.
+- **eval-contradictions** routes through `withBudgetTracker` for telemetry without changing the CLI surface. `--budget-usd` semantics + `PreFlightBudgetError` shape are byte-identical.
+
+### For contributors
+
+- `bun test` adds 73 new tests across 9 new files (`test/core/budget/`, `test/core/audit-week-file.test.ts`, `test/core/diarize/`, `test/brainstorm/checkpoint.test.ts`, `test/e2e/brainstorm-resume.test.ts`, `test/core/remediation-checkpoint.test.ts`). Plus F1 closes the pre-existing PGLite `page_links` schema gap (the brainstorm domain-bank queries `page_links` but the embedded schema only defined `links`). Brainstorm now works against PGLite brains in production via the new `page_links` view alias shipped in both the embedded schema bundle and migration v86 (renumbered from v81 during merge with master's v0.38 cathedrals which claimed v81-v85). F2 adds an E2E pinning the user-facing `--max-cost` pre-flight refusal path. F3 adds `--max-cost` to `gbrain reindex --code`. All previous brainstorm + doctor + eval-contradictions tests still pass.
+
+## To take advantage of v0.39.0.0
+
+`gbrain upgrade` should do this automatically. If it didn't, or if `gbrain doctor`
+warns about a partial migration:
+
+1. **Run the orchestrator manually:**
+   ```bash
+   gbrain apply-migrations --yes
+   ```
+   This applies migration v86 (`page_links_view_alias`) on PGLite + Postgres brains. The alias is required for `gbrain brainstorm` and `gbrain lsd` to work against the domain-bank tiebreaker; without it, the brainstorm domain-bank queries fail with `relation "page_links" does not exist`.
+2. **Set a cost cap on the commands you care about:**
+   ```bash
+   # Sets a per-run dollar ceiling. Throws BudgetExhausted before any LLM call
+   # if the pre-run estimate exceeds the cap, AND mid-run if cumulative spend
+   # blows past it.
+   gbrain brainstorm "test" --max-cost 1
+   gbrain doctor --remediate --max-cost 5
+   gbrain reindex --code --max-cost 10
+   ```
+3. **Verify the outcome:**
+   ```bash
+   gbrain doctor             # schema_version should be 86
+   gbrain brainstorm --list-runs   # confirms the new checkpoint directory exists
+   ```
+4. **If any step fails or the numbers look wrong,** please file an issue:
+   https://github.com/garrytan/gbrain/issues with:
+   - output of `gbrain doctor`
+   - contents of `~/.gbrain/upgrade-errors.jsonl` if it exists
+   - which step broke
+
+   This feedback loop is how the gbrain maintainers find fragile upgrade paths. Thank you.
 ## [0.38.2.0] - 2026-05-22
 
 **`gbrain doctor` no longer hangs on big brains, and gives you real signal when it has to give up.**
@@ -683,29 +770,6 @@ Credited contributors per the CHANGELOG attribution convention; closing comments
    ```bash
    gbrain apply-migrations --yes
    ```
-2. **Try the capture verb:**
-   ```bash
-   gbrain capture "first thought into v0.38"
-   gbrain query "first thought"
-   ```
-   The receipt block should show the slug + file path; the query should
-   return the page within a second.
-3. **For webhook ingestion** (only if you run `gbrain serve --http`):
-   ```bash
-   curl -X POST https://your-brain/ingest \
-     -H "Authorization: Bearer $TOKEN" \
-     -H "Content-Type: text/markdown" \
-     -d "# webhook test"
-   ```
-   You should see HTTP 202 + a `job_id`. Run `gbrain query "webhook test"`
-   to confirm the page landed.
-4. **If any step fails or the numbers look wrong,** please file an issue:
-   https://github.com/garrytan/gbrain/issues with:
-   - output of `gbrain doctor`
-   - contents of `~/.gbrain/upgrade-errors.jsonl` if it exists
-   - which step broke
-
-   This feedback loop is how the gbrain maintainers find fragile upgrade paths. Thank you.
 2. **Verify the source-routing fix on your federated brains:**
    ```bash
    gbrain sources current
diff --git a/CLAUDE.md b/CLAUDE.md
index 022213d63..96517ad5e 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -107,6 +107,12 @@ strict behavior when unset.
 - `src/core/ai/recipes/voyage.ts` — Voyage AI openai-compatible recipe. **v0.28.7 (#680):** declares `chars_per_token=1` + `safety_factor=0.5` so the gateway pre-splits Voyage batches at a 60K-character budget (50% of 120K-token cap with the dense-tokenizer ratio). Closes the v0.27 backfill loop where ~26% of the corpus stayed un-embedded because tiktoken-grounded budgeting silently undercounted Voyage's actual token usage. **v0.28.11 (#719):** declares `multimodal_models: ['voyage-multimodal-3']` so the gateway rejects text-only Voyage models pointed at the multimodal endpoint with a clear `AIConfigError` instead of waiting for Voyage's HTTP 400. **v0.33.1.1 (#962, fixup):** recipe docstring at `:7-16` tightened to name the seven hosted flexible-dim models that accept `output_dimension` explicitly (`voyage-4-large`, `voyage-4`, `voyage-4-lite`, `voyage-3-large`, `voyage-3.5`, `voyage-3.5-lite`, `voyage-code-3`) and call out that `voyage-4-nano` is the open-weight variant listed separately by Voyage as fixed 1024-dim — does NOT accept the parameter. The "all v4 variants are flexible" misread is what caused the original PR to include nano in `VOYAGE_OUTPUT_DIMENSION_MODELS`; the negative regression assertion in `test/ai/gateway.test.ts` (`dimsProviderOptions` returns `undefined` for `voyage-4-nano`) pins the contract. **v0.37.3.0:** `voyage-code-3` is the recommended embedding model for gstack per-worktree code brains (Topology 3 in `docs/architecture/topologies.md`). Registration was already in the `models` list since pre-v0.33; the v0.37.3.0 wave adds discoverability surfaces — decision-tree branch in `docs/integrations/embedding-providers.md`, Topology 3 "Recommended embedding model" subsection, runtime nudge from `gbrain reindex --code` against non-code-tuned models. Recipe-shape regression pinned by `test/ai/voyage-code-3-recipe.test.ts`.
 - `src/core/ai/recipes/anthropic.ts` — Anthropic recipe (chat + expansion touchpoints). **v0.31.12:** chat and expansion `models:` lists drop the v0.31.6 phantom `claude-sonnet-4-6-20250929` date suffix — canonical id is `claude-sonnet-4-6`. The wrong-direction alias `claude-sonnet-4-6 → claude-sonnet-4-6-20250929` is removed; a reverse alias `claude-sonnet-4-6-20250929 → claude-sonnet-4-6` keeps stale user configs working (rescues `facts.extraction_model` and `models.dream.synthesize` set by v0.31.6 installs). Recipe-shape regression pinned by `test/anthropic-model-ids.test.ts` (6 cases, verbatim cherry-pick of PR #830 plus the reverse-alias rescue case).
 - `src/core/anthropic-pricing.ts` — Single source of truth for Anthropic model pricing (per-MTok input/output). **v0.31.12:** Opus 4.7 corrected from `$15/$75` to `$5/$25` (the old number was from Opus 4 generation, never refreshed when 4.7 shipped); Opus 4.6 also corrected. Consumed by `src/core/budget-meter.ts` and `src/core/cross-modal-eval/runner.ts` — the cross-modal estimator now reads `ANTHROPIC_PRICING` for Anthropic models instead of duplicating the table, killing the v0.31.6 drift bug class.
+- `src/core/budget/budget-tracker.ts` (v0.37.x) — keystone primitive for the brainstorm cost-cathedral wave. One typed error (`BudgetExhausted` with `reason: 'cost' | 'runtime' | 'no_pricing'`), one schema-stable audit JSONL at `~/.gbrain/audit/budget-YYYY-Www.jsonl`. Contracts pinned by 18 unit cases: **TX1** — `record()` throws when cumulative spend exceeds cap (the cap is a real ceiling, not a suggestion); **TX2** — `reserve()` hard-fails with `reason: 'no_pricing'` when `maxCostUsd` is set AND the model is missing from pricing maps (warn-once preserved when cap is unset); **A3 amended** — `extractUsageFromError(err, fallback)` returns `err.usage` when SDK provides it, else the pessimistic fallback (caller passes `maxOutputTokens`, not the optimistic pre-call estimate). `onExhausted(cb)` callback fires once synchronously BEFORE the throw propagates so callers can persist checkpoints. Replaces three parallel copies (inline brainstorm class, cycle/budget-meter, eval-contradictions). Adapts the old `BudgetMeter` via T5 (public shape preserved + `schema_version: 1` stamped on every dream-budget audit line).
+- `src/core/audit-week-file.ts` (v0.37.x, Q1) — single source of truth for ISO-week audit JSONL filename math. Exports `isoWeek(d)`, `isoWeekFilename(prefix, now?)`, `resolveAuditDir()` (honors `GBRAIN_AUDIT_DIR`). Year-boundary correctness pinned by tests at 2020-W53 (the 53-week year), 2025-W01 rolling in from 2024-12-30 (Monday), 2026-W01. Four call sites migrated in T4: `src/core/minions/handlers/shell-audit.ts`, `src/core/facts/phantom-audit.ts`, `src/core/audit-slug-fallback.ts`, `src/core/cycle/budget-meter.ts`. Each call site keeps its `compute<X>AuditFilename` thin wrapper for back-compat with existing tests.
+- `src/core/ai/gateway.ts:withBudgetTracker` (v0.37.x, T3 / TX5) — gateway-layer enforcement via `AsyncLocalStorage<BudgetTracker>`. `withBudgetTracker(tracker, fn)` installs the tracker on the module-internal store; every `gateway.chat / embed / rerank` call inside the scope auto-composes (reserve before, record in try/finally). Outside-scope calls are budget no-ops (current behavior preserved). Nested scopes restore the outer tracker on exit. `getCurrentBudgetTracker()` is the test seam. The chat path uses A3-amended pessimistic fallback on error paths; the embed path estimates input tokens from char count × recipe's `chars_per_token` because the AI SDK doesn't surface per-batch embed token usage; the rerank path estimates char count of query+docs. 6 unit cases pin the contract.
+- `src/core/diarize/payload-fitter.ts` (v0.37.x, P6 / Q3) — generic fit-arbitrarily-large-items-into-per-call-token-budget utility. `'batch'` strategy is deterministic token-budgeted chunking with no LLM calls. `'summarize'` strategy embed-clusters into ceil(items/4) groups via cheap deterministic nearest-neighbor on cosine, Haiku-summarizes each cluster via `Promise.allSettled` at parallelism=4 (Perf1). Each Haiku call composes the active BudgetTracker via T3's AsyncLocalStorage. The quality gate (codex outside-voice finding #4): when `success_ratio < min_success_ratio` (default 0.75), result is flagged `degraded: true` — the fitter preserves the successful subset; the caller decides whether to surface a partial result or abort.
+- `src/core/brainstorm/checkpoint.ts` (v0.37.x, P7 / TX3+TX4+A5 amended) — crash-resilient checkpoint for `gbrain brainstorm` and `gbrain lsd`. Persists FULL idea bodies (~50KB per run) so resume can MERGE the pre-crash ideas with the post-resume ideas before the judge runs (codex's load-bearing finding — a resume that produces only second-run output is silent partial output). `run_id = sha256(question + profile + sort(close_slugs) + sort(far_slugs)).slice(0,16)` — NO embedding bits, stable across embedding-model swaps. Atomic write via `.tmp + rename`. ONE resume flag (`--resume <run_id>` — the proposed `--retry-failed` was dropped per TX4: failed AND never-attempted crosses both go through `--resume`). `--list-runs` prints saved run_ids mtime-newest-first. `--force-resume` bypasses the 7-day staleness gate. The cycle purge phase (`gbrain dream --phase purge`) GCs checkpoints older than 7 days via `gcStaleCheckpoints(7)`. Pinned by 20 unit cases + 3 E2E cases in `test/e2e/brainstorm-resume.test.ts` including the load-bearing merge contract.
+- `src/core/remediation-checkpoint.ts` (v0.37.x, T7 / A4 amended) — `doctor --remediate` checkpoint at `~/.gbrain/remediation/<plan_hash>.json`. `plan_hash = sha256(JSON.stringify(sorted recommendation ids)).slice(0,16)`. Schema-versioned. Atomic write via `.tmp + rename`. `gbrain doctor --remediate --resume <plan_hash>` (or with no arg — picks the newest matching checkpoint) loads it and skips already-completed steps. Mismatched plan_hash refuses with a paste-ready message. Cleared on clean completion. Pinned by 13 unit cases.
 - `src/core/model-config.ts` — Model-string resolution (the seam every internal LLM call walks through). **v0.31.12:** four-tier system (`ModelTier = 'utility' | 'reasoning' | 'deep' | 'subagent'`) with `TIER_DEFAULTS` (utility→haiku-4-5, reasoning→sonnet-4-6, deep→opus-4-7, subagent→sonnet-4-6) and `tier?: ModelTier` on `ResolveModelOpts`. Resolution chain is now 8 steps: cliFlag → deprecated key → config key → `models.default` → `models.tier.<tier>` → env var → `TIER_DEFAULTS[tier]` → caller fallback. Two new exports — `isAnthropicProvider(modelString)` checks `provider:model` prefix OR `claude-` bare-id pattern, and `enforceSubagentAnthropic()` is the layer-2 runtime guard: when `tier === 'subagent'` resolves to a non-Anthropic provider, it emits a once-per-`(source, model)` stderr warn AND falls back to `TIER_DEFAULTS.subagent` instead of letting the Anthropic Messages API tool-loop attempt to run on OpenAI/Gemini. `_resetDeprecationWarningsForTest()` now also clears `_subagentTierWarningsEmitted` so tests re-emit.
 - `src/core/ai/model-resolver.ts` — Recipe-touchpoint validator. **v0.31.12:** `assertTouchpoint(recipe, touchpoint, modelId, extendedModels?)` gains an optional 4th `extendedModels: ReadonlySet<string>` argument. When the modelId is in that set, the native-recipe allowlist throw is bypassed — the user explicitly opted into this model via config so we let provider rejection surface as `model_not_found` at HTTP call time (and `gbrain models doctor` catches it earlier). Default code paths with hardcoded model strings MUST NOT pass `extendedModels` — typos in source code still fail fast. Replaces the earlier plan to soften the validator wholesale (Codex F4/F5 in plan review flagged that as too broad — it would have removed the fail-fast contract for chat + expand + embed all three).
 - `src/core/ai/gateway.ts` extension (v0.31.12) — new module-scoped `_extendedModels: Map<providerId, Set<modelId>>` registry feeds `assertTouchpoint`'s 4th-arg path. New `reconfigureGatewayWithEngine(engine)` async function is called from `cli.ts` after `engine.connect()` (and before every command except `CLI_ONLY` no-DB commands) — re-resolves expansion + chat defaults through `resolveModel()` so `models.tier.*` and `models.default` overrides apply to expansion + chat both. `DEFAULT_CHAT_MODEL` corrected to `anthropic:claude-sonnet-4-6` (was the v0.31.6 phantom `-20250929`). New `__setChatTransportForTests` seam mirrors `__setEmbedTransportForTests` so tests drive `chat()` with a stubbed transport.
diff --git a/TODOS.md b/TODOS.md
index 7d693799b..f5d84718e 100644
--- a/TODOS.md
+++ b/TODOS.md
@@ -1,6 +1,30 @@
 # TODOS
 
 
+## v0.37.x brainstorm cost-cathedral follow-ups (filed during T12)
+
+- [ ] **Explicit `--max-cost` flag on `gbrain extract`, `gbrain enrich`, `gbrain integrity auto`.** v0.37.x ships gateway-layer enforcement via `withBudgetTracker` — wrapping any of those commands at their entrypoint with `withBudgetTracker(tracker, fn)` immediately gives them the same cap semantics that brainstorm + doctor --remediate have. The CLI flag wiring (parse `--max-cost`, construct `BudgetTracker` with `maxCostUsd`, wrap the entrypoint) is the only missing piece. ~30 lines each plus smoke tests. Deferred per the plan's "NOT in scope" — gateway-layer composition was the structural goal; the per-command flag wiring is the next ergonomic win.
+
+- [ ] **`P5` config-schema `budgets:` block in `~/.gbrain/config.json`.** The lsd cost-explosion incident's P5 proposed declarative per-command budgets in config. v0.37.x ships the imperative `--max-cost N` surface, which covers the canonical case. Config-driven defaults (so users don't have to remember to pass `--max-cost` every time) are a v0.38+ ergonomic win. Shape:
+  ```yaml
+  budgets:
+    default:
+      max_cost_usd: 5.00
+      max_runtime_seconds: 300
+    brainstorm: { max_cost_usd: 2.00 }
+    lsd: { max_cost_usd: 5.00 }
+    dream: { max_cost_usd: 10.00 }
+  ```
+  Resolution: CLI flag > config block > built-in default.
+
+- [ ] **Multi-day brainstorm resume (>7d).** A5's 7-day mtime window covers >99% of crash-and-resume cases (an operator forgetting a run for a week is rare). `--force-resume` is the escape hatch. The full multi-day story (longer retention, possibly a daily GC instead of cycle-purge-only, dashboard for in-flight runs) is a v0.38+ concern.
+
+- [ ] **Async-batched audit writes.** Sync `appendFileSync` is fine at typical volumes (~5ms × 100 crosses = ~500ms, not noticeable inside a $1 brainstorm run). Profiling trigger criterion: when a run of 100+ crosses on a large brain shows audit-write time dominating wall-clock cost, switch to an async write queue. Fixing prematurely costs complexity for no measurable benefit.
+
+- [ ] **`BudgetLedger` unification with `BudgetTracker`.** `src/core/enrichment/budget.ts` defines a separate `BudgetLedger` primitive for per-day, per-scope/resolverId enrichment caps. Different shape from `BudgetTracker` (daily reset windows + multi-tier scope keys). Unification is possible but requires careful schema design to preserve enrichment's existing report semantics. Deferred because: (a) BudgetTracker covers the per-command case cleanly today, (b) the existing BudgetLedger isn't a customer-facing surface — it backs `gbrain enrich`'s internal accounting, (c) merging them would require a schema migration on the enrichment budget audit JSONL. Revisit when the enrichment surface gets its next major touch.
+
+- [ ] **judges.ts internal chunking → payload-fitter delegation.** v0.37.x ships `src/core/diarize/payload-fitter.ts` with the batch strategy ready to consume from `src/core/brainstorm/judges.ts`'s `runJudge` chunking path. Today judges.ts keeps its own copy of the chunking loop (~30 lines) — straightforward refactor: replace the inline split with `fit({strategy:'batch', items: ideas, maxTokensPerCall, estimateTokens})` and concatenate results. The cost-guardrails test suite already pins the public contract; the refactor is mechanical. Touch one function; trivial.
+
 ## v0.37 PGLite fresh-install fix wave — deferred follow-ups (v0.37.x+ / v0.38.x)
 
 - [ ] **`gbrain embed --try-fallback` for provider quota/auth failures.** The v0.37 wave deliberately rejected auto-fallback because silently switching providers writes mixed-space vectors into one `content_chunks.embedding` column, corrupting retrieval. The right design: explicit `--try-fallback` flag that (a) detects the primary failure type (429 / 401 / 5xx), (b) confirms the fallback provider's `embedding_dimensions` matches the schema, (c) prompts the user via TTY before switching mid-corpus, (d) writes a marker chunk attribute so doctor can flag mixed-provider corpora later. Doctor currently surfaces "Detected 1 alternative embedding provider ready to use" but the embed command never acts. Owner: open. Sources: user bug report item #5; v0.37 wave plan deferred list.
diff --git a/VERSION b/VERSION
index dcc5e4ab3..fc3cdf0a6 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-0.38.2.0
+0.39.0.0
\ No newline at end of file
diff --git a/docs/incidents/2026-05-20-lsd-cost-explosion.md b/docs/incidents/2026-05-20-lsd-cost-explosion.md
new file mode 100644
index 000000000..96508c948
--- /dev/null
+++ b/docs/incidents/2026-05-20-lsd-cost-explosion.md
@@ -0,0 +1,265 @@
+# Incident Report: LSD Brainstorm 53× Cost Overrun
+
+**Date:** 2026-05-20
+**Severity:** High (financial — $50.71 actual vs $0.96 estimated)
+**Component:** `gbrain lsd` / `gbrain brainstorm`
+**Brain size:** 13,690 pages, 16,314 links, ~2,000 unique directory prefixes
+**Version:** v0.37.1.0 (first release of brainstorm/lsd)
+
+## What Happened
+
+A user ran `gbrain lsd "what story should Garry's List write next" --yes` on a 13,690-page brain. The command:
+
+1. **Estimated cost: $0.96** (2×12 = 24 crosses × 4 ideas + judge)
+2. **Actual cost: $50.71** — 53× over estimate
+3. **Token usage:** 4,906,011 input + 2,399,239 output = 7.3M total tokens
+4. **Far set pulled 1,985 pages** instead of the configured 12
+5. **Generated 15,868 raw ideas** across the crosses (vs expected ~96)
+6. **Judge phase failed:** 2,989,338 tokens exceeded Claude Sonnet's 1M context limit
+7. **Zero ideas surfaced to the user** — complete failure
+
+A retry with `--limit 12` explicit:
+- Far set correctly returned 12 pages, cost was $0.39
+- But judge still failed: `parseJudgeJSON: no strategy produced valid JSON`
+- Again, 0 ideas survived to output (96 generated, 0 scored)
+
+## Root Causes
+
+### RC1: Far Set Explosion (caused the $50 bill)
+
+**File:** `src/core/brainstorm/domain-bank.ts` → `fetchFar()` → `listPrefixSampledPages()`
+
+The domain bank samples pages by directory prefix to get diversity. `listPrefixSampledPages` returns **one page per prefix passed in**. On a 13K-page brain with ~2,000 unique prefixes (books/, civic/bundles/, civic/gl-article-*, people/, concepts/, etc.), passing all prefixes produces ~2,000 rows — not the configured `m=12`.
+
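+The cost estimator uses `m` (12) to predict crosses and cost. But the actual cross phase receives 1,985 far-set pages, producing `2 × 1985 = 3,970` crosses at 4 ideas each, nominally 15,880 ideas. The run actually generated 15,868, which is consistent with the three failed crosses described in RC4 and RC5 (3 × 4 = 12 missing ideas).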
+
+**The estimate formula is correct for the intended behavior; the far set selection is what diverged.**
+
+### RC2: No Cost Circuit Breaker
+
+There is no mechanism to:
+- Abort if estimated cost exceeds a threshold
+- Abort mid-run if actual spend diverges from estimate
+- Cap the far set size regardless of prefix count
+- Warn the user that a run will be expensive before proceeding
+
+The `--yes` flag skips the 10-second cost preview wait, removing even the manual inspection opportunity.
+
+### RC3: Judge Context Overflow
+
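+The judge receives ALL ideas in a single prompt. With 15,868 ideas at an estimated ~350 tokens each, that's ~5.5M tokens by estimate; the judge request that actually failed measured 2,989,338 tokens. Either figure is far beyond Claude Sonnet's 1M-token context limit.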
+
+Even on the retry with only 96 ideas, the judge failed with JSON parsing errors, suggesting the judge prompt/response format is fragile.
+
+### RC4: Unpaired UTF-16 Surrogates in Page Content
+
+Two crosses failed with: `The request body is not valid JSON: no low surrogate in string`
+
+Some pages (likely OCR imports or web scrapes) contain unpaired UTF-16 surrogates. When these get serialized into the JSON request body for the LLM API, the JSON encoder produces invalid JSON.
+
+### RC5: No Timeout on Individual Crosses
+
+One cross timed out, and no per-cross timeout was configured. The default HTTP timeout let it hang for an extended period before failing, consuming tokens on the API side.
+
+## Observed Token Flow
+
+```
+Configured:  2 close × 12 far = 24 crosses × 4 ideas = 96 ideas + 1 judge call
+Actual:      2 close × 1985 far = 3970 crosses × 4 ideas = 15,880 nominal (15,868 generated) + 1 judge call (failed)
+
+Per-cross tokens (estimated): ~1,200 in + 600 out
+Actual total:                  4,906,011 in + 2,399,239 out
+
+The judge call alone would have been:
+  15,868 ideas × ~350 tokens = ~5.5M tokens (prompt)
+  Model limit:                  1M tokens (Sonnet)
+  Overflow:                     5.5× context limit
+```
+
+## Proposed Fixes
+
+### P1: Far Set Cap (Critical — prevents cost explosion)
+
+`fetchFar()` must cap the number of prefixes BEFORE calling `listPrefixSampledPages`. The cap should be `max(m * 4, 50)` to allow some diversity headroom while preventing runaway growth. Final selection trimmed to `m` by distance score.
+
+**Status:** Implemented in `dc080ac2`.
+
+### P2: Cost Guardrails (Critical — defense in depth)
+
+New flags for `brainstorm` and `lsd` commands:
+- `--max-cost <usd>` (default $5): hard-abort if the pre-run estimate exceeds it
+- `--strict-budget`: abort mid-run if running cost exceeds 5× estimate
+- `--max-far-set <n>` (default 50): explicit far set size cap
+
+**Status:** Implemented in `dc080ac2`.
+
+### P3: Judge Chunking (Critical — prevents context overflow)
+
+Split ideas into batches of ~100 before calling the judge LLM. Each batch is a separate API call; results concatenated. This bounds per-call token usage to ~35K regardless of total idea count.
+
+**Status:** Implemented in `dc080ac2`.
+
+### P4: Unicode Sanitization (Medium — prevents cross failures)
+
+Strip unpaired UTF-16 surrogates from page content before building cross prompts. This is a general problem for any gbrain function that serializes user-generated page content into JSON for API calls.
+
+**Status:** Implemented in `dc080ac2`.
+
+### P5: Global Token & Time Budgets for All Analysis Functions (Proposed)
+
+**This is the bigger architectural ask.** Every gbrain command that makes LLM calls should respect configurable budgets:
+
+```yaml
+# Proposed config additions to ~/.gbrain/config.json
+budgets:
+  # Global defaults
+  default:
+    max_input_tokens: 500_000    # per-command input token cap
+    max_output_tokens: 200_000   # per-command output token cap
+    max_cost_usd: 5.00           # per-command dollar cap
+    max_runtime_seconds: 300     # 5-minute wall-clock cap
+
+  # Per-command overrides
+  brainstorm:
+    max_cost_usd: 2.00
+    max_runtime_seconds: 120
+  lsd:
+    max_cost_usd: 5.00
+    max_runtime_seconds: 300
+  dream:
+    max_cost_usd: 10.00
+    max_runtime_seconds: 600
+  extract:
+    max_input_tokens: 1_000_000
+    max_runtime_seconds: 900
+  enrich:
+    max_cost_usd: 3.00
+    max_runtime_seconds: 180
+```
+
+**Commands affected:**
+- `brainstorm` / `lsd` — bisociation crosses + judge (this incident)
+- `dream` — dream cycle phases (enrichment, emotional weight, etc.)
+- `extract all` — link + timeline extraction across all pages
+- `enrich` — per-page deep enrichment with web research
+- `eval` — evaluation runs (suspected-contradictions, retrieval drift)
+- `integrity auto` — automated content repair
+- `doctor --remediate` — autonomous self-healing via Minions
+
+**Implementation approach:**
+1. Add a `BudgetTracker` class that wraps LLM calls with token/cost/time accounting
+2. Every analysis function receives a budget context
+3. On budget exhaustion: save partial results, emit a structured warning, exit cleanly
+4. CLI flags (`--max-cost`, `--max-tokens`, `--timeout`) override config defaults
+5. `--no-budget` escape hatch for power users who know what they're doing
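A minimal sketch of the accounting wrapper described in the implementation approach above, assuming only a cost cap and a wall-clock cap. `BudgetTrackerSketch`, `BudgetExhaustedSketch`, and the injected `now` clock are illustrative names, not the shipped `BudgetTracker` API; token accounting and pricing lookups are deliberately left out.

```typescript
type ExhaustReason = "cost" | "runtime";

// Typed error so callers can branch on why the budget ran out.
class BudgetExhaustedSketch extends Error {
  readonly reason: ExhaustReason;
  constructor(reason: ExhaustReason, message: string) {
    super(message);
    this.name = "BudgetExhaustedSketch";
    this.reason = reason;
  }
}

interface BudgetLimits {
  maxCostUsd?: number;
  maxRuntimeMs?: number;
}

class BudgetTrackerSketch {
  private spentUsd = 0;
  private fired = false;
  private readonly startedAt: number;
  private readonly listeners: Array<() => void> = [];

  constructor(
    private readonly limits: BudgetLimits,
    private readonly now: () => number = Date.now, // injectable clock for tests
  ) {
    this.startedAt = now();
  }

  // Register a callback that runs once, before the error propagates,
  // so the caller can save partial results.
  onExhausted(cb: () => void): void {
    this.listeners.push(cb);
  }

  // Call BEFORE an LLM request with the estimated cost of that request.
  reserve(estimatedUsd: number): void {
    const { maxCostUsd, maxRuntimeMs } = this.limits;
    if (maxRuntimeMs !== undefined && this.now() - this.startedAt > maxRuntimeMs) {
      this.exhaust("runtime", "runtime budget exceeded");
    }
    if (maxCostUsd !== undefined && this.spentUsd + estimatedUsd > maxCostUsd) {
      this.exhaust("cost", "cost budget would be exceeded");
    }
  }

  // Call AFTER the request with the actual cost.
  record(actualUsd: number): void {
    this.spentUsd += actualUsd;
    const { maxCostUsd } = this.limits;
    if (maxCostUsd !== undefined && this.spentUsd > maxCostUsd) {
      this.exhaust("cost", "cost budget exceeded");
    }
  }

  get spent(): number {
    return this.spentUsd;
  }

  private exhaust(reason: ExhaustReason, message: string): never {
    if (!this.fired) {
      this.fired = true;
      for (const cb of this.listeners) cb();
    }
    throw new BudgetExhaustedSketch(reason, message);
  }
}
```

The split between `reserve` and `record` is the design point: the reserve check can refuse a call before any money is spent, while the record check catches calls that cost more than estimated.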
+
+### P6: Diarization / Summarization for Oversized Payloads (Proposed)
+
+When a judge or analysis phase receives more content than fits in context:
+
+1. **Estimate tokens** before calling the LLM
+2. If over budget, **diarize**: summarize/compress the content to fit
+3. For the judge specifically: rank ideas by a cheap heuristic first (keyword overlap, novelty score), then send only top-N to the LLM judge
+4. For other analysis: progressive summarization — chunk → summarize → merge summaries → final analysis
+
+This is effectively a **token budget allocator** that decides how to spend a fixed token budget across variable-length inputs.
+
+```
+Example: 15,868 ideas need judging, context limit 900K tokens
+  Step 1: Cheap pre-filter (keyword dedup, obvious duplicates) → 8,000 unique ideas
+  Step 2: Batch into 80 chunks of 100 ideas each
+  Step 3: Judge each chunk → 80 calls × ~35K tokens = 2.8M total (spread across calls)
+  Step 4: Merge top ideas from each chunk → final ranking
+  Total cost: ~$2-3 instead of $50
+```
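The "estimate tokens, then batch" idea can be sketched as a greedy packer that fills each batch up to a token budget. This is an illustration only: `estimateTokens` uses a rough 4-characters-per-token heuristic (an assumption, not a tokenizer), and `packByTokenBudget` is a hypothetical name.

```typescript
// Rough heuristic (assumption): about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedily pack items into batches whose estimated tokens fit the budget.
function packByTokenBudget(items: string[], budgetTokens: number): string[][] {
  const batches: string[][] = [];
  let current: string[] = [];
  let used = 0;
  for (const item of items) {
    const cost = estimateTokens(item);
    // Close the current batch when adding this item would overflow it.
    if (current.length > 0 && used + cost > budgetTokens) {
      batches.push(current);
      current = [];
      used = 0;
    }
    // An item larger than the budget still gets a batch of its own.
    current.push(item);
    used += cost;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Packing by estimated tokens rather than by item count keeps each call under the per-call limit even when idea lengths vary widely.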
+
+### P7: Structured Error Recovery (Proposed)
+
+When a cross or judge call fails:
+- Save the partial results immediately (don't wait for the full run)
+- Emit a machine-readable error event (not just a log warning)
+- Support `--retry-failed` to re-run only the failed crosses without repeating successful ones
+- Checkpoint progress to disk so interrupted runs can resume
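The resume behaviour in the list above comes down to two operations: work out which crosses still need to run, and merge the saved ideas with the new ones. The sketch below keeps everything in memory and uses hypothetical shapes (`IdeaSketch`, `CheckpointSketch`); disk persistence and the real checkpoint schema are out of scope.

```typescript
// Hypothetical shapes for illustration only.
interface IdeaSketch {
  crossId: string;
  body: string;
}

interface CheckpointSketch {
  runId: string;
  completedCrossIds: string[];
  ideas: IdeaSketch[]; // full bodies, not just counts
}

// Crosses that still need to run: failed and never-attempted alike.
function pendingCrosses(allCrossIds: string[], cp: CheckpointSketch | null): string[] {
  if (!cp) return allCrossIds.slice();
  return allCrossIds.filter((id) => cp.completedCrossIds.indexOf(id) === -1);
}

// Merge saved ideas with freshly generated ones before judging.
function mergeIdeas(cp: CheckpointSketch | null, fresh: IdeaSketch[]): IdeaSketch[] {
  return (cp ? cp.ideas : []).concat(fresh);
}
```

Storing idea bodies in the checkpoint is what makes the merge possible; a checkpoint that held only counts could skip completed crosses but could not hand their ideas to the judge.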
+
+## Impact
+
+- **Financial:** $50.71 wasted on a single failed run
+- **User trust:** Zero ideas delivered despite ~7M tokens processed
+- **Time:** ~15 minutes of compute time, plus overnight delay in reporting results
+
+## Lessons
+
+1. **First run of any new feature on a large brain should be dry-run or capped.** The estimate was based on small-brain testing; 13K pages is a different universe.
+2. **Cost estimators must account for actual data cardinality, not just configured parameters.** The estimate used `m=12` but the real far set was `|prefixes|`.
+3. **Every LLM-calling function needs a budget.** This isn't just a brainstorm problem — it's an architectural gap in any system that makes variable numbers of LLM calls based on data size.
+4. **JSON serialization of user content is a landmine.** Any page could contain invalid Unicode. Sanitize at the serialization boundary, not per-feature.
+
+## Shipped in v0.37.x (the budget cathedral wave)
+
+P1-P4 already shipped via PR #1234 (the first fix wave). P5-P7 plus a few
+architectural rounds shipped in the budget-cathedral wave that followed:
+
+- **P1 (far set cap):** `fetchFar()` in `src/core/brainstorm/domain-bank.ts`
+  caps prefix sampling to `max(m*4, 50)` and trims final pages to `m` by
+  distance. The 2K-prefix explosion class is closed.
+- **P2 (cost guardrails):** `--max-cost`, `--max-far-set`, `--strict-budget`,
+  `--judge-model`, `--max-ideas-per-judge-call` flags on brainstorm + lsd.
+  Pre-flight estimate refusal, mid-run cost-ceiling abort.
+- **P3 (judge chunking):** `runJudge` in `src/core/brainstorm/judges.ts`
+  auto-chunks at 100 ideas/call. Context-window overflow is structurally
+  prevented.
+- **P4 (unicode sanitization):** `sanitizeUnicode` in
+  `src/core/brainstorm/orchestrator.ts` strips unpaired surrogates before
+  serialization.
+- **P5 (BudgetTracker at the gateway layer):** new
+  `src/core/budget/budget-tracker.ts` is the canonical primitive. The
+  gateway's `withBudgetTracker(tracker, fn)` composes via
+  `AsyncLocalStorage<BudgetTracker>` so every gateway-routed LLM call
+  inside the scope auto-records. `BudgetExhausted` is a typed error with
+  `reason: 'cost' | 'runtime' | 'no_pricing'`. `record()` throws when
+  cumulative spend exceeds the cap (TX1). `reserve()` hard-fails with
+  `no_pricing` when the cap is set and the model is missing from the
+  pricing maps (TX2).
+- **P6 (payload-fitter):** `src/core/diarize/payload-fitter.ts` with
+  `'batch'` and `'summarize'` strategies. The `'summarize'` strategy
+  embed-clusters the items (k=ceil(items/4)), then Haiku-summarizes each
+  cluster in parallel via `Promise.allSettled` at parallelism=4. It sets a
+  `degraded: true` flag when the success ratio is < 0.75 so callers can
+  decide whether to surface a partial result or abort.
+- **P7 (brainstorm checkpoint + --resume):**
+  `src/core/brainstorm/checkpoint.ts` persists FULL idea bodies (not just
+  counts — TX3 load-bearing). One `--resume <run_id>` flag covers both
+  failed and never-attempted crosses (TX4). The `run_id` formula uses NO
+  embedding bits so the identity is stable across embedding-model swaps
+  (A5 amended). 7-day mtime-based GC wired into the cycle purge phase.
+  `--list-runs` lists saved checkpoints. `--force-resume` bypasses the 7d
+  staleness gate.
+
+Also shipped alongside the wave (folded inline):
+
+- **doctor --remediate --resume:** A4 amended. The mid-run cap is now a
+  real ceiling; `--max-cost` is an alias for `--max-usd`. On
+  BudgetExhausted, the orchestrator persists a checkpoint at
+  `~/.gbrain/remediation/<plan_hash>.json` and tells the user the exact
+  `gbrain doctor --remediate --resume` command. The resumed run skips
+  already-completed steps.
+- **Audit-week-file consolidation (Q1):** four call sites
+  (shell-jobs / phantoms / slug-fallback / dream-budget) now share one
+  ISO-week filename helper. Year-boundary correctness pinned by tests.
+- **eval-contradictions tracker telemetry:** the existing CostTracker
+  stays for the report shape; the runner additionally installs a
+  withBudgetTracker scope for the gateway-layer telemetry path.
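The year-boundary behaviour mentioned for the ISO-week filename helper is easy to get wrong, so here is a small sketch of the standard ISO 8601 week calculation in UTC. `isoWeekOf` and `weekFilename` are illustrative names and are not claimed to match the shared helper's signatures; the `.jsonl` suffix and `prefix-YYYY-Www` layout are assumed for the example.

```typescript
interface IsoWeekSketch {
  year: number; // ISO week-numbering year, which can differ from the calendar year
  week: number; // 1..53
}

// ISO 8601: a week belongs to the year that contains its Thursday.
function isoWeekOf(d: Date): IsoWeekSketch {
  const t = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
  const weekday = t.getUTCDay() || 7; // Monday=1 .. Sunday=7
  t.setUTCDate(t.getUTCDate() + 4 - weekday); // move to this week's Thursday
  const year = t.getUTCFullYear();
  const yearStart = Date.UTC(year, 0, 1);
  const dayOfYear = (t.getTime() - yearStart) / 86400000 + 1;
  return { year, week: Math.ceil(dayOfYear / 7) };
}

// Build a filename such as "budget-2025-W01.jsonl".
function weekFilename(prefix: string, now: Date): string {
  const { year, week } = isoWeekOf(now);
  const ww = week < 10 ? "0" + week : String(week);
  return prefix + "-" + year + "-W" + ww + ".jsonl";
}
```

Monday 2024-12-30 lands in 2025-W01 and 2020-12-31 lands in 2020-W53, which is why the ISO week-numbering year is taken from the week's Thursday rather than from the date itself.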
+
+What did NOT make this wave (filed in TODOS for a follow-up):
+
+- The schema fix for `page_links` on PGLite. The brainstorm domain-bank
+  queries reference `page_links` but the embedded schema only defines
+  `links`; the E2E works around this with a view in test setup, but
+  real PGLite users currently can't run `gbrain brainstorm`. Schema fix
+  needed.
+- `--max-cost` flag on `extract`, `enrich`, `integrity auto`. The
+  gateway-layer enforcement covers them when wrapped at the entrypoint,
+  but the CLI flag wiring is deferred.
+- Async-batched audit writes. Sync `appendFileSync` is fine at typical
+  volumes; revisit if profiling shows it dominates.
+- Multi-day brainstorm resume (>7d). The `--force-resume` flag is the
+  operator escape hatch for now.
diff --git a/llms-full.txt b/llms-full.txt
index 3118aa510..05fe1e786 100644
--- a/llms-full.txt
+++ b/llms-full.txt
@@ -243,6 +243,12 @@ strict behavior when unset.
 - `src/core/ai/recipes/voyage.ts` — Voyage AI openai-compatible recipe. **v0.28.7 (#680):** declares `chars_per_token=1` + `safety_factor=0.5` so the gateway pre-splits Voyage batches at a 60K-character budget (50% of 120K-token cap with the dense-tokenizer ratio). Closes the v0.27 backfill loop where ~26% of the corpus stayed un-embedded because tiktoken-grounded budgeting silently undercounted Voyage's actual token usage. **v0.28.11 (#719):** declares `multimodal_models: ['voyage-multimodal-3']` so the gateway rejects text-only Voyage models pointed at the multimodal endpoint with a clear `AIConfigError` instead of waiting for Voyage's HTTP 400. **v0.33.1.1 (#962, fixup):** recipe docstring at `:7-16` tightened to name the seven hosted flexible-dim models that accept `output_dimension` explicitly (`voyage-4-large`, `voyage-4`, `voyage-4-lite`, `voyage-3-large`, `voyage-3.5`, `voyage-3.5-lite`, `voyage-code-3`) and call out that `voyage-4-nano` is the open-weight variant listed separately by Voyage as fixed 1024-dim — does NOT accept the parameter. The "all v4 variants are flexible" misread is what caused the original PR to include nano in `VOYAGE_OUTPUT_DIMENSION_MODELS`; the negative regression assertion in `test/ai/gateway.test.ts` (`dimsProviderOptions` returns `undefined` for `voyage-4-nano`) pins the contract. **v0.37.3.0:** `voyage-code-3` is the recommended embedding model for gstack per-worktree code brains (Topology 3 in `docs/architecture/topologies.md`). Registration was already in the `models` list since pre-v0.33; the v0.37.3.0 wave adds discoverability surfaces — decision-tree branch in `docs/integrations/embedding-providers.md`, Topology 3 "Recommended embedding model" subsection, runtime nudge from `gbrain reindex --code` against non-code-tuned models. Recipe-shape regression pinned by `test/ai/voyage-code-3-recipe.test.ts`.
 - `src/core/ai/recipes/anthropic.ts` — Anthropic recipe (chat + expansion touchpoints). **v0.31.12:** chat and expansion `models:` lists drop the v0.31.6 phantom `claude-sonnet-4-6-20250929` date suffix — canonical id is `claude-sonnet-4-6`. The wrong-direction alias `claude-sonnet-4-6 → claude-sonnet-4-6-20250929` is removed; a reverse alias `claude-sonnet-4-6-20250929 → claude-sonnet-4-6` keeps stale user configs working (rescues `facts.extraction_model` and `models.dream.synthesize` set by v0.31.6 installs). Recipe-shape regression pinned by `test/anthropic-model-ids.test.ts` (6 cases, verbatim cherry-pick of PR #830 plus the reverse-alias rescue case).
 - `src/core/anthropic-pricing.ts` — Single source of truth for Anthropic model pricing (per-MTok input/output). **v0.31.12:** Opus 4.7 corrected from `$15/$75` to `$5/$25` (the old number was from Opus 4 generation, never refreshed when 4.7 shipped); Opus 4.6 also corrected. Consumed by `src/core/budget-meter.ts` and `src/core/cross-modal-eval/runner.ts` — the cross-modal estimator now reads `ANTHROPIC_PRICING` for Anthropic models instead of duplicating the table, killing the v0.31.6 drift bug class.
+- `src/core/budget/budget-tracker.ts` (v0.37.x) — keystone primitive for the brainstorm cost-cathedral wave. One typed error (`BudgetExhausted` with `reason: 'cost' | 'runtime' | 'no_pricing'`), one schema-stable audit JSONL at `~/.gbrain/audit/budget-YYYY-Www.jsonl`. Contracts pinned by 18 unit cases: **TX1** — `record()` throws when cumulative spend exceeds cap (the cap is a real ceiling, not a suggestion); **TX2** — `reserve()` hard-fails with `reason: 'no_pricing'` when `maxCostUsd` is set AND the model is missing from pricing maps (warn-once preserved when cap is unset); **A3 amended** — `extractUsageFromError(err, fallback)` returns `err.usage` when SDK provides it, else the pessimistic fallback (caller passes `maxOutputTokens`, not the optimistic pre-call estimate). `onExhausted(cb)` callback fires once synchronously BEFORE the throw propagates so callers can persist checkpoints. Replaces three parallel copies (inline brainstorm class, cycle/budget-meter, eval-contradictions). Adapts the old `BudgetMeter` via T5 (public shape preserved + `schema_version: 1` stamped on every dream-budget audit line).
+- `src/core/audit-week-file.ts` (v0.37.x, Q1) — single source of truth for ISO-week audit JSONL filename math. Exports `isoWeek(d)`, `isoWeekFilename(prefix, now?)`, `resolveAuditDir()` (honors `GBRAIN_AUDIT_DIR`). Year-boundary correctness pinned by tests at 2020-W53 (the 53-week year), 2025-W01 rolling in from 2024-12-30 (Monday), 2026-W01. Four call sites migrated in T4: `src/core/minions/handlers/shell-audit.ts`, `src/core/facts/phantom-audit.ts`, `src/core/audit-slug-fallback.ts`, `src/core/cycle/budget-meter.ts`. Each call site keeps its `compute<X>AuditFilename` thin wrapper for back-compat with existing tests.
+- `src/core/ai/gateway.ts:withBudgetTracker` (v0.37.x, T3 / TX5) — gateway-layer enforcement via `AsyncLocalStorage<BudgetTracker>`. `withBudgetTracker(tracker, fn)` installs the tracker on the module-internal store; every `gateway.chat / embed / rerank` call inside the scope auto-composes (reserve before, record in try/finally). Outside-scope calls are budget no-ops (current behavior preserved). Nested scopes restore the outer tracker on exit. `getCurrentBudgetTracker()` is the test seam. The chat path uses A3-amended pessimistic fallback on error paths; the embed path estimates input tokens from char count × recipe's `chars_per_token` because the AI SDK doesn't surface per-batch embed token usage; the rerank path estimates char count of query+docs. 6 unit cases pin the contract.
+- `src/core/diarize/payload-fitter.ts` (v0.37.x, P6 / Q3) — generic fit-arbitrarily-large-items-into-per-call-token-budget utility. `'batch'` strategy is deterministic token-budgeted chunking with no LLM calls. `'summarize'` strategy embed-clusters into ceil(items/4) groups via cheap deterministic nearest-neighbor on cosine, Haiku-summarizes each cluster via `Promise.allSettled` at parallelism=4 (Perf1). Each Haiku call composes the active BudgetTracker via T3's AsyncLocalStorage. The quality gate (codex outside-voice finding #4): when `success_ratio < min_success_ratio` (default 0.75), result is flagged `degraded: true` — the fitter preserves the successful subset; the caller decides whether to surface a partial result or abort.
+- `src/core/brainstorm/checkpoint.ts` (v0.37.x, P7 / TX3+TX4+A5 amended) — crash-resilient checkpoint for `gbrain brainstorm` and `gbrain lsd`. Persists FULL idea bodies (~50KB per run) so resume can MERGE the pre-crash ideas with the post-resume ideas before the judge runs (codex's load-bearing finding — a resume that produces only second-run output is silent partial output). `run_id = sha256(question + profile + sort(close_slugs) + sort(far_slugs)).slice(0,16)` — NO embedding bits, stable across embedding-model swaps. Atomic write via `.tmp + rename`. ONE resume flag (`--resume <run_id>` — the proposed `--retry-failed` was dropped per TX4: failed AND never-attempted crosses both go through `--resume`). `--list-runs` prints saved run_ids mtime-newest-first. `--force-resume` bypasses the 7-day staleness gate. The cycle purge phase (`gbrain dream --phase purge`) GCs checkpoints older than 7 days via `gcStaleCheckpoints(7)`. Pinned by 20 unit cases + 3 E2E cases in `test/e2e/brainstorm-resume.test.ts` including the load-bearing merge contract.
+- `src/core/remediation-checkpoint.ts` (v0.37.x, T7 / A4 amended) — `doctor --remediate` checkpoint at `~/.gbrain/remediation/<plan_hash>.json`. `plan_hash = sha256(JSON.stringify(sorted recommendation ids)).slice(0,16)`. Schema-versioned. Atomic write via `.tmp + rename`. `gbrain doctor --remediate --resume <plan_hash>` (or with no arg — picks the newest matching checkpoint) loads it and skips already-completed steps. Mismatched plan_hash refuses with a paste-ready message. Cleared on clean completion. Pinned by 13 unit cases.
 - `src/core/model-config.ts` — Model-string resolution (the seam every internal LLM call walks through). **v0.31.12:** four-tier system (`ModelTier = 'utility' | 'reasoning' | 'deep' | 'subagent'`) with `TIER_DEFAULTS` (utility→haiku-4-5, reasoning→sonnet-4-6, deep→opus-4-7, subagent→sonnet-4-6) and `tier?: ModelTier` on `ResolveModelOpts`. Resolution chain is now 8 steps: cliFlag → deprecated key → config key → `models.default` → `models.tier.<tier>` → env var → `TIER_DEFAULTS[tier]` → caller fallback. Two new exports — `isAnthropicProvider(modelString)` checks `provider:model` prefix OR `claude-` bare-id pattern, and `enforceSubagentAnthropic()` is the layer-2 runtime guard: when `tier === 'subagent'` resolves to a non-Anthropic provider, it emits a once-per-`(source, model)` stderr warn AND falls back to `TIER_DEFAULTS.subagent` instead of letting the Anthropic Messages API tool-loop attempt to run on OpenAI/Gemini. `_resetDeprecationWarningsForTest()` now also clears `_subagentTierWarningsEmitted` so tests re-emit.
 - `src/core/ai/model-resolver.ts` — Recipe-touchpoint validator. **v0.31.12:** `assertTouchpoint(recipe, touchpoint, modelId, extendedModels?)` gains an optional 4th `extendedModels: ReadonlySet<string>` argument. When the modelId is in that set, the native-recipe allowlist throw is bypassed — the user explicitly opted into this model via config so we let provider rejection surface as `model_not_found` at HTTP call time (and `gbrain models doctor` catches it earlier). Default code paths with hardcoded model strings MUST NOT pass `extendedModels` — typos in source code still fail fast. Replaces the earlier plan to soften the validator wholesale (Codex F4/F5 in plan review flagged that as too broad — it would have removed the fail-fast contract for chat + expand + embed all three).
 - `src/core/ai/gateway.ts` extension (v0.31.12) — new module-scoped `_extendedModels: Map<providerId, Set<modelId>>` registry feeds `assertTouchpoint`'s 4th-arg path. New `reconfigureGatewayWithEngine(engine)` async function is called from `cli.ts` after `engine.connect()` (and before every command except `CLI_ONLY` no-DB commands) — re-resolves expansion + chat defaults through `resolveModel()` so `models.tier.*` and `models.default` overrides apply to expansion + chat both. `DEFAULT_CHAT_MODEL` corrected to `anthropic:claude-sonnet-4-6` (was the v0.31.6 phantom `-20250929`). New `__setChatTransportForTests` seam mirrors `__setEmbedTransportForTests` so tests drive `chat()` with a stubbed transport.
diff --git a/package.json b/package.json
index ad92ec36a..a8c18e59c 100644
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "gbrain",
-  "version": "0.38.2.0",
+  "version": "0.39.0.0",
   "description": "Postgres-native personal knowledge brain with hybrid RAG search",
   "type": "module",
   "main": "src/core/index.ts",
diff --git a/src/commands/brainstorm.ts b/src/commands/brainstorm.ts
index 2d66b02be..b6fc738a0 100644
--- a/src/commands/brainstorm.ts
+++ b/src/commands/brainstorm.ts
@@ -29,6 +29,22 @@ export interface BrainstormCliArgs {
   save?: boolean;
   yes: boolean;
   limit?: number;
+  /** Cost ceiling in USD; aborts pre-run if the estimate exceeds it. Default $5. */
+  maxCost?: number;
+  /** Hard cap on far-set prefix sampling. Default 50. */
+  maxFarSet?: number;
+  /** When true, abort mid-run if running spend exceeds 5× estimate. */
+  strictBudget?: boolean;
+  /** Override the model used for the judge phase. */
+  judgeModel?: string;
+  /** Max ideas per judge LLM call. Default 100. */
+  maxIdeasPerJudgeCall?: number;
+  /** TX4: resume a crashed run by run_id. */
+  resume?: string;
+  /** Bypass the 7-day staleness gate on resume. */
+  forceResume?: boolean;
+  /** When true, print the list of saved runs + exit. */
+  listRuns?: boolean;
   help: boolean;
   error?: string;
 }
@@ -57,6 +73,50 @@ export function parseBrainstormArgs(args: string[]): BrainstormCliArgs {
         return out;
       }
       out.limit = n;
+    } else if (arg === '--max-cost') {
+      const v = args[++i];
+      const n = v ? parseFloat(v) : NaN;
+      if (!Number.isFinite(n) || n <= 0) {
+        out.error = `--max-cost requires a positive number in USD (got ${v})`;
+        return out;
+      }
+      out.maxCost = n;
+    } else if (arg === '--max-far-set') {
+      const v = args[++i];
+      const n = v ? parseInt(v, 10) : NaN;
+      if (!Number.isFinite(n) || n <= 0) {
+        out.error = `--max-far-set requires a positive integer (got ${v})`;
+        return out;
+      }
+      out.maxFarSet = n;
+    } else if (arg === '--strict-budget') {
+      out.strictBudget = true;
+    } else if (arg === '--judge-model') {
+      const v = args[++i];
+      if (!v) {
+        out.error = `--judge-model requires a model id (e.g. anthropic:claude-sonnet-4-6)`;
+        return out;
+      }
+      out.judgeModel = v;
+    } else if (arg === '--max-ideas-per-judge-call') {
+      const v = args[++i];
+      const n = v ? parseInt(v, 10) : NaN;
+      if (!Number.isFinite(n) || n <= 0) {
+        out.error = `--max-ideas-per-judge-call requires a positive integer (got ${v})`;
+        return out;
+      }
+      out.maxIdeasPerJudgeCall = n;
+    } else if (arg === '--resume') {
+      const v = args[++i];
+      if (!v || v.startsWith('--')) {
+        out.error = `--resume requires a run_id (use --list-runs to see saved runs)`;
+        return out;
+      }
+      out.resume = v;
+    } else if (arg === '--force-resume') {
+      out.forceResume = true;
+    } else if (arg === '--list-runs') {
+      out.listRuns = true;
     } else if (arg.startsWith('--')) {
       out.error = `unknown flag: ${arg}`;
       return out;
@@ -79,12 +139,20 @@ them, judges with a 5-axis rubric. Output cites close + far slugs with a
 0-1 distance score so you can see how far each collision actually traveled.
 
 Options:
-  --json              Emit BrainstormResult as JSON (for agents)
-  --save              Save to wiki/ideas/<date>-brainstorm-<slug>.md (default ON)
-  --no-save           Don't save; print only
-  --yes, -y           Skip the 10s cost-preview wait (TTY only)
-  --limit N           Override the far-bank size (default 6 brainstorm / 12 LSD)
-  --help, -h          Show this help
+  --json                          Emit BrainstormResult as JSON (for agents)
+  --save                          Save to wiki/ideas/<date>-brainstorm-<slug>.md (default ON)
+  --no-save                       Don't save; print only
+  --yes, -y                       Skip the 10s cost-preview wait (TTY only)
+  --limit N                       Override the far-bank size (default 6 brainstorm / 12 LSD)
+  --max-cost USD                  Abort if estimated cost exceeds USD (default 5)
+  --max-far-set N                 Cap domain bank prefix sampling (default 50)
+  --strict-budget                 Abort if running cost exceeds 5× the estimate
+  --judge-model MODEL             Override the judge LLM (larger-context for big runs)
+  --max-ideas-per-judge-call N    Max ideas per judge LLM call (default 100)
+  --resume RUN_ID                 Resume a previously-crashed run (uses --list-runs ids)
+  --force-resume                  Bypass the 7-day staleness gate on --resume
+  --list-runs                     Print saved run_ids and exit
+  --help, -h                      Show this help
 
 Examples:
   gbrain brainstorm "why are AI coding tools converging on the same UX?"
@@ -107,11 +175,19 @@ have thought of this without LSD"), every idea must invert at least one
 implicit axiom. Output is ephemeral by default — pass --save if an idea lands.
 
 Options:
-  --json              Emit BrainstormResult as JSON
-  --save              Persist to wiki/ideas/<date>-lsd-<slug>.md (default OFF)
-  --yes, -y           Skip the 10s cost-preview wait (TTY only)
-  --limit N           Override the far-bank size (default 12)
-  --help, -h          Show this help
+  --json                          Emit BrainstormResult as JSON
+  --save                          Persist to wiki/ideas/<date>-lsd-<slug>.md (default OFF)
+  --yes, -y                       Skip the 10s cost-preview wait (TTY only)
+  --limit N                       Override the far-bank size (default 12)
+  --max-cost USD                  Abort if estimated cost exceeds USD (default 5)
+  --max-far-set N                 Cap domain bank prefix sampling (default 50)
+  --strict-budget                 Abort if running cost exceeds 5× the estimate
+  --judge-model MODEL             Override the judge LLM (larger-context for big runs)
+  --max-ideas-per-judge-call N    Max ideas per judge LLM call (default 100)
+  --resume RUN_ID                 Resume a previously-crashed run (uses --list-runs ids)
+  --force-resume                  Bypass the 7-day staleness gate on --resume
+  --list-runs                     Print saved run_ids and exit
+  --help, -h                      Show this help
 
 Examples:
   gbrain lsd "why are AI coding tools converging on the same UX?"
@@ -140,6 +216,24 @@ async function runBrainstormCli(
     process.exit(2);
     return;
   }
+  if (parsed.listRuns) {
+    const { listRuns } = await import('../core/brainstorm/checkpoint.ts');
+    const runs = listRuns();
+    if (parsed.json) {
+      console.log(JSON.stringify(runs, null, 2));
+    } else if (runs.length === 0) {
+      console.log('No saved brainstorm runs.');
+    } else {
+      console.log('Saved runs (newest first):');
+      console.log('run_id            | iso_date                  | question');
+      console.log('------------------+---------------------------+----------------');
+      for (const r of runs) {
+        const iso = new Date(r.mtime).toISOString();
+        console.log(`${r.run_id} | ${iso} | ${r.question.slice(0, 60)}`);
+      }
+    }
+    return;
+  }
   if (!parsed.question || parsed.question.trim().length === 0) {
     console.error(`gbrain ${profile.label}: question required`);
     console.error(help);
@@ -160,6 +254,13 @@ async function runBrainstormCli(
     question: parsed.question,
     profile: effectiveProfile,
     skipCostPreview: skipPreview,
+    maxCostUsd: parsed.maxCost,
+    maxFarSet: parsed.maxFarSet,
+    strictBudget: parsed.strictBudget,
+    judgeModel: parsed.judgeModel,
+    maxIdeasPerJudgeCall: parsed.maxIdeasPerJudgeCall,
+    resumeRunId: parsed.resume,
+    forceResume: parsed.forceResume,
   });
 
   if (parsed.json) {
diff --git a/src/commands/doctor.ts b/src/commands/doctor.ts
index 51c4bb6fb..9e4d9a82a 100644
--- a/src/commands/doctor.ts
+++ b/src/commands/doctor.ts
@@ -4348,13 +4348,36 @@ export async function runRemediate(
 ): Promise<void> {
   const targetScore = parseIntFlag(args, '--target-score') ?? 90;
   const maxJobs = parseIntFlag(args, '--max-jobs') ?? Infinity;
-  const maxUsd = parseFloatFlag(args, '--max-usd');
+  // A4 amended: --max-cost is an alias for --max-usd. Both spellings are
+  // documented as the cron-safety guard. Either threads through to the
+  // pre-flight estimate refusal AND, via withBudgetTracker, the mid-run
+  // BudgetExhausted hard-throw.
+  const maxUsd = parseFloatFlag(args, '--max-usd') ?? parseFloatFlag(args, '--max-cost');
   const dryRun = args.includes('--dry-run');
   const skipConfirm = args.includes('--yes');
   const jsonOutput = args.includes('--json');
+  // A4 amended: --resume <plan_hash?> loads the checkpoint for the active
+  // (engine,target) and continues from the next step. With no value, the
+  // most recent checkpoint for the active engine is loaded.
+  const resumeFlagIdx = args.indexOf('--resume');
+  const resumeMode = resumeFlagIdx !== -1;
+  const resumeArg = resumeMode ? args[resumeFlagIdx + 1] : undefined;
+  const resumePlanHash = resumeArg && !resumeArg.startsWith('--') ? resumeArg : undefined;
 
   const { computeRecommendations, classifyChecks, maxReachableScore } =
     await import('../core/brain-score-recommendations.ts');
+  const {
+    BudgetTracker,
+    BudgetExhausted,
+  } = await import('../core/budget/budget-tracker.ts');
+  const { withBudgetTracker } = await import('../core/ai/gateway.ts');
+  const {
+    computePlanHash,
+    saveRemediationCheckpoint,
+    loadRemediationCheckpoint,
+    listRemediationCheckpoints,
+    clearRemediationCheckpoint,
+  } = await import('../core/remediation-checkpoint.ts');
 
   const ctx = await loadRecommendationContext(engine);
 
@@ -4384,6 +4407,46 @@ export async function runRemediate(
     return;
   }
 
+  // A4 amended: compute plan_hash off the active recommendation ids so the
+  // checkpoint binds to THIS plan. Resume only fires for matching plans.
+  const planHash = computePlanHash(recs.map((r) => r.id));
+  let completedFromCheckpoint = new Set<string>();
+  if (resumeMode) {
+    const requested = resumePlanHash;
+    let cp = requested ? loadRemediationCheckpoint(requested) : null;
+    if (!cp && !requested) {
+      // No explicit hash: try newest checkpoint that matches the active plan.
+      const recent = listRemediationCheckpoints();
+      for (const e of recent) {
+        const candidate = loadRemediationCheckpoint(e.plan_hash);
+        if (candidate && candidate.plan_hash === planHash) {
+          cp = candidate;
+          break;
+        }
+      }
+    }
+    if (!cp) {
+      console.error(
+        `[remediate --resume] no matching checkpoint found ` +
+          `(plan_hash=${planHash}${requested ? `; requested=${requested}` : ''}). ` +
+          `Run without --resume to start fresh.`,
+      );
+      process.exit(2);
+    }
+    if (cp.plan_hash !== planHash) {
+      console.error(
+        `[remediate --resume] checkpoint plan_hash=${cp.plan_hash} does not match active plan_hash=${planHash}. ` +
+          `The plan has changed (brain state moved). Run without --resume to start fresh.`,
+      );
+      process.exit(2);
+    }
+    completedFromCheckpoint = new Set(cp.completed.map((c) => c.id));
+    console.error(
+      `[remediate --resume] resuming plan_hash=${planHash}: ${completedFromCheckpoint.size} step(s) completed, ` +
+        `${recs.length - completedFromCheckpoint.size} remaining.`,
+    );
+  }
+
   const estTotalUsd = recs.reduce((sum, r) => sum + (r.est_usd_cost ?? 0), 0);
   if (maxUsd !== null && estTotalUsd > maxUsd) {
     console.error(
@@ -4419,61 +4482,132 @@ export async function runRemediate(
   const { waitForCompletion } = await import('../core/minions/wait-for-completion.ts');
   const queue = new MinionQueue(engine);
 
-  let stepCount = 0;
-  while (recs.length > 0 && stepCount < maxJobs) {
-    const step = recs[0];
-    if (!step) break;
-    stepCount++;
+  // A4 amended: install a BudgetTracker scope around the plan-step loop so
+  // any gateway.chat / embed / rerank inside a Minion handler (synthesize,
+  // patterns, consolidate) auto-enforces the cap. On BudgetExhausted, the
+  // onExhausted callback persists the checkpoint BEFORE the throw propagates;
+  // the catch surfaces the actionable --resume hint.
+  const remediateTracker = new BudgetTracker({
+    label: 'doctor.remediate',
+    maxCostUsd: maxUsd ?? undefined,
+  });
+
+  let exhaustionSnapshot: { spent: number; cap: number; reason: string; model_id?: string } | undefined;
+  remediateTracker.onExhausted(() => {
+    // BudgetTracker fires this synchronously from inside reserve()/record()
+    // before the throw bubbles. Persist whatever has been done so far.
+    const cp = {
+      schema_version: 1 as const,
+      plan_hash: planHash,
+      doctor_run_id: doctorRunId,
+      target_score: targetScore,
+      started_at: new Date().toISOString(),
+      completed: submitted
+        .filter((s) => s.status === 'completed')
+        .map((s) => ({ id: s.id, job: '', status: s.status, job_id: s.job_id ?? null })),
+      aborted_at: new Date().toISOString(),
+      abort_reason: 'budget_exhausted' as const,
+      budget_snapshot: exhaustionSnapshot,
+    };
+    saveRemediationCheckpoint(cp);
+  });
+
+  const runLoop = async (): Promise<void> => {
+    let stepCount = 0;
+    while (recs.length > 0 && stepCount < maxJobs) {
+      const step = recs[0];
+      if (!step) break;
+      stepCount++;
+
+      // Resume: skip steps that the checkpoint already marked completed.
+      if (completedFromCheckpoint.has(step.id)) {
+        submitted.push({ step: stepCount, id: step.id, job_id: null, status: 'completed' });
+        recs.shift();
+        continue;
+      }
 
-    // D5: if depends_on intersects aborted, skip + cascade
-    if (step.depends_on && step.depends_on.some((d) => abortedIds.has(d))) {
-      submitted.push({ step: stepCount, id: step.id, job_id: null, status: 'skipped_dep_aborted' });
-      abortedIds.add(step.id);
-      recs.shift();
-      continue;
-    }
+      // D5: if depends_on intersects aborted, skip + cascade
+      if (step.depends_on && step.depends_on.some((d) => abortedIds.has(d))) {
+        submitted.push({ step: stepCount, id: step.id, job_id: null, status: 'skipped_dep_aborted' });
+        abortedIds.add(step.id);
+        recs.shift();
+        continue;
+      }
 
-    try {
-      const isProtected = !!step.protected;
-      const job = await queue.add(
-        step.job,
-        { ...step.params, doctor_run_id: doctorRunId },
-        {
-          queue: 'default',
-          idempotency_key: step.idempotency_key,
-          max_attempts: 2,
-          maxWaiting: 1,
-        },
-        isProtected ? { allowProtectedSubmit: true } : undefined,
-      );
-      submitted.push({ step: stepCount, id: step.id, job_id: job.id, status: 'submitted' });
+      try {
+        const isProtected = !!step.protected;
+        const job = await queue.add(
+          step.job,
+          { ...step.params, doctor_run_id: doctorRunId },
+          {
+            queue: 'default',
+            idempotency_key: step.idempotency_key,
+            max_attempts: 2,
+            maxWaiting: 1,
+          },
+          isProtected ? { allowProtectedSubmit: true } : undefined,
+        );
+        submitted.push({ step: stepCount, id: step.id, job_id: job.id, status: 'submitted' });
 
-      // Wait for terminal state. PGLite is in-process — short poll.
-      const terminal = await waitForCompletion(queue, job.id, {
-        pollMs: isPGLite ? 250 : 1000,
-        timeoutMs: (step.est_seconds + 60) * 1000,
-      });
-      const lastSub = submitted[submitted.length - 1];
-      if (lastSub) lastSub.status = terminal.status;
+        // Wait for terminal state. PGLite is in-process — short poll.
+        const terminal = await waitForCompletion(queue, job.id, {
+          pollMs: isPGLite ? 250 : 1000,
+          timeoutMs: (step.est_seconds + 60) * 1000,
+        });
+        const lastSub = submitted[submitted.length - 1];
+        if (lastSub) lastSub.status = terminal.status;
 
-      if (terminal.status !== 'completed') {
+        if (terminal.status !== 'completed') {
+          abortedIds.add(step.id);
+        }
+      } catch (e) {
+        if (e instanceof BudgetExhausted) {
+          exhaustionSnapshot = {
+            spent: e.spent,
+            cap: e.cap,
+            reason: e.reason,
+            model_id: e.modelId,
+          };
+          throw e;
+        }
+        submitted.push({
+          step: stepCount, id: step.id, job_id: null,
+          status: `error: ${(e as Error).message.slice(0, 100)}`,
+        });
         abortedIds.add(step.id);
       }
-    } catch (e) {
-      submitted.push({
-        step: stepCount, id: step.id, job_id: null,
-        status: `error: ${(e as Error).message.slice(0, 100)}`,
-      });
-      abortedIds.add(step.id);
+
+      recs.shift();
+      // D7: scoped recheck — re-compute plan from fresh health snapshot.
+      // The next plan may drop completed steps and re-introduce failed
+      // steps with bumped retry suffix (D1).
+      if (recs.length === 0 || stepCount >= maxJobs) break;
+      const freshHealth = await engine.getHealth();
+      recs = computeRecommendations(freshHealth, ctx).filter((r) => r.status === 'remediable');
+    }
+  };
+
+  let budgetExhaustedAt: InstanceType<typeof BudgetExhausted> | null = null;
+  try {
+    await withBudgetTracker(remediateTracker, runLoop);
+  } catch (err) {
+    if (err instanceof BudgetExhausted) {
+      budgetExhaustedAt = err;
+      console.error(
+        `\n[remediate] BudgetExhausted (${err.reason}): spent $${err.spent.toFixed(4)} > cap $${err.cap.toFixed(2)}.\n` +
+          `Checkpoint saved. Resume with:\n` +
+          `  gbrain doctor --remediate --resume ${planHash}\n`,
+      );
+    } else {
+      throw err;
     }
+  }
 
-    recs.shift();
-    // D7: scoped recheck — re-compute plan from fresh health snapshot.
-    // The next plan may drop completed steps and re-introduce failed
-    // steps with bumped retry suffix (D1).
-    if (recs.length === 0 || stepCount >= maxJobs) break;
-    const freshHealth = await engine.getHealth();
-    recs = computeRecommendations(freshHealth, ctx).filter((r) => r.status === 'remediable');
+  // Clear checkpoint on a clean run (no budget abort). Failed steps in the
+  // submitted set don't disqualify the cleanup — they re-surface on the
+  // next plan with bumped suffixes.
+  if (!budgetExhaustedAt) {
+    clearRemediationCheckpoint(planHash);
   }
 
   const finalHealth = await engine.getHealth();
@@ -4495,7 +4629,7 @@ export async function runRemediate(
   }
 
   const anyFailed = submitted.some((s) => s.status !== 'completed' && s.status !== 'submitted');
-  if (anyFailed) process.exit(1);
+  if (budgetExhaustedAt || anyFailed) process.exit(1);
 }
 
 /**
diff --git a/src/commands/reindex-code.ts b/src/commands/reindex-code.ts
index 527a0610f..527c400f7 100644
--- a/src/commands/reindex-code.ts
+++ b/src/commands/reindex-code.ts
@@ -31,6 +31,8 @@ import { errorFor, serializeError } from '../core/errors.ts';
 import { createInterface } from 'readline';
 import { createProgress } from '../core/progress.ts';
 import { getCliOptions, cliOptsToProgressOptions } from '../core/cli-options.ts';
+import { BudgetTracker, BudgetExhausted } from '../core/budget/budget-tracker.ts';
+import { withBudgetTracker } from '../core/ai/gateway.ts';
 
 export interface ReindexCodeOpts {
   sourceId?: string;
@@ -41,6 +43,15 @@ export interface ReindexCodeOpts {
   noEmbed?: boolean;
   /** Page batch size. Default 100 (codex Finding 4.4 OOM protection). */
   batchSize?: number;
+  /**
+   * Cap embedding spend in USD. Default undefined = no cap (legacy behavior).
+   * When set, the reindex body runs inside a `withBudgetTracker` scope so
+   * every `gateway.embed()` call inside `importCodeFile` composes with the
+   * cap. When cumulative spend exceeds the cap, the gateway throws
+   * BudgetExhausted (reason='cost'); `runReindexCode` catches it and returns
+   * a partial-progress result (already-imported pages stay imported).
+   */
+  maxCostUsd?: number;
 }
 
 export interface ReindexCodeResult {
@@ -229,51 +240,99 @@ export async function runReindexCode(
   let failed = 0;
   const failures: Array<{ slug: string; error: string }> = [];
   let offset = 0;
+  let budgetExhausted: BudgetExhausted | null = null;
 
-  try {
-    while (true) {
-      const batch = await fetchCodePages(engine, opts.sourceId, batchSize, offset);
-      if (batch.length === 0) break;
-
-      for (const row of batch) {
-        const fm = row.frontmatter ?? {};
-        const relPath = typeof fm.file === 'string' ? fm.file : null;
-        if (!relPath) {
-          failed++;
-          failures.push({ slug: row.slug, error: 'missing frontmatter.file' });
-          reporter.tick();
-          continue;
-        }
-        if (!row.compiled_truth) {
-          failed++;
-          failures.push({ slug: row.slug, error: 'missing compiled_truth' });
-          reporter.tick();
-          continue;
-        }
-        try {
-          const result = await importCodeFile(engine, relPath, row.compiled_truth, {
-            noEmbed: opts.noEmbed,
-            force: opts.force,
-            sourceId: opts.sourceId,
-          });
-          if (result.status === 'imported') reindexed++;
-          else if (result.status === 'skipped') skipped++;
-          else {
+  // F3: when --max-cost is set, run the body inside withBudgetTracker so
+  // every gateway.embed() call inside importCodeFile composes with the cap.
+  // On BudgetExhausted, we catch the throw and return a partial-progress
+  // result (pages imported before the cap fired are already stored) that
+  // the caller can re-run. importCodeFile is idempotent (content_hash
+  // short-circuit), so a re-run picks up where the cap fired.
+  const reindexBody = async (): Promise<void> => {
+    try {
+      while (true) {
+        const batch = await fetchCodePages(engine, opts.sourceId, batchSize, offset);
+        if (batch.length === 0) break;
+
+        for (const row of batch) {
+          const fm = row.frontmatter ?? {};
+          const relPath = typeof fm.file === 'string' ? fm.file : null;
+          if (!relPath) {
+            failed++;
+            failures.push({ slug: row.slug, error: 'missing frontmatter.file' });
+            reporter.tick();
+            continue;
+          }
+          if (!row.compiled_truth) {
             failed++;
-            failures.push({ slug: row.slug, error: result.error ?? result.status });
+            failures.push({ slug: row.slug, error: 'missing compiled_truth' });
+            reporter.tick();
+            continue;
           }
-        } catch (e: unknown) {
-          failed++;
-          failures.push({ slug: row.slug, error: e instanceof Error ? e.message : String(e) });
+          try {
+            const result = await importCodeFile(engine, relPath, row.compiled_truth, {
+              noEmbed: opts.noEmbed,
+              force: opts.force,
+              sourceId: opts.sourceId,
+            });
+            if (result.status === 'imported') reindexed++;
+            else if (result.status === 'skipped') skipped++;
+            else {
+              failed++;
+              failures.push({ slug: row.slug, error: result.error ?? result.status });
+            }
+          } catch (e: unknown) {
+            // Budget cap is the one error the per-page catch must NOT swallow.
+            // Caller's outer catch reports partial progress and exits.
+            if (e instanceof BudgetExhausted) throw e;
+            failed++;
+            failures.push({ slug: row.slug, error: e instanceof Error ? e.message : String(e) });
+          }
+          reporter.tick();
         }
-        reporter.tick();
+
+        offset += batch.length;
+        if (batch.length < batchSize) break;
       }
+    } finally {
+      reporter.finish();
+    }
+  };
 
-      offset += batch.length;
-      if (batch.length < batchSize) break;
+  try {
+    if (typeof opts.maxCostUsd === 'number' && opts.maxCostUsd > 0) {
+      const tracker = new BudgetTracker({ maxCostUsd: opts.maxCostUsd, label: 'reindex-code' });
+      await withBudgetTracker(tracker, reindexBody);
+    } else {
+      await reindexBody();
+    }
+  } catch (e) {
+    if (e instanceof BudgetExhausted) {
+      budgetExhausted = e;
+    } else {
+      throw e;
     }
-  } finally {
-    reporter.finish();
+  }
+
+  if (budgetExhausted) {
+    // Partial-progress result: surfaces what got reindexed before the cap
+    // fired. The CLI wrapper translates this into a clear user-facing
+    // message + non-zero exit; the library result lets agent callers see
+    // what happened without grep'ing stderr.
+    return {
+      status: 'ok',
+      codePages: totalPages,
+      reindexed,
+      skipped,
+      failed,
+      totalTokens,
+      costUsd: budgetExhausted.spent,
+      model: getEmbeddingModelName(),
+      failures: [
+        { slug: '(budget)', error: budgetExhausted.message },
+        ...(failures.length > 0 ? failures : []),
+      ],
+    };
   }
 
   return {
@@ -303,8 +362,24 @@ export async function runReindexCodeCli(engine: BrainEngine, args: string[]): Pr
   const force = args.includes('--force');
   const noEmbed = args.includes('--no-embed');
 
+  // F3: --max-cost / --max-cost-usd both accepted for symmetry with brainstorm.
+  let maxCostUsd: number | undefined;
+  for (const flag of ['--max-cost', '--max-cost-usd']) {
+    const idx = args.indexOf(flag);
+    if (idx >= 0) {
+      const v = args[idx + 1];
+      const n = v ? parseFloat(v) : NaN;
+      if (!Number.isFinite(n) || n <= 0) {
+        console.error(`gbrain reindex --code: ${flag} requires a positive number in USD (got ${v ?? '(missing)'})`);
+        process.exit(2);
+      }
+      maxCostUsd = n;
+      break;
+    }
+  }
+
   if (dryRun) {
-    const result = await runReindexCode(engine, { sourceId, dryRun: true, yes, json, force, noEmbed });
+    const result = await runReindexCode(engine, { sourceId, dryRun: true, yes, json, force, noEmbed, maxCostUsd });
     if (json) {
       console.log(JSON.stringify(result));
     } else {
@@ -357,7 +432,7 @@ export async function runReindexCodeCli(engine: BrainEngine, args: string[]): Pr
     }
   }
 
-  const result = await runReindexCode(engine, { sourceId, yes, json, force, noEmbed });
+  const result = await runReindexCode(engine, { sourceId, yes, json, force, noEmbed, maxCostUsd });
   if (json) {
     console.log(JSON.stringify(result));
   } else {
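Review note on the `--max-cost` parsing in the hunk above: `parseFloat` stops at the first non-numeric character, so a value such as `5abc` is accepted as `5`. A stricter variant could look like the sketch below. `parseMaxCostFlag` and the `ParsedCap` type are hypothetical names used for illustration only and are not part of this PR.

```typescript
// Hypothetical helper, not part of the PR: strict parse of --max-cost / --max-cost-usd.
type ParsedCap =
  | { ok: true; value: number | undefined }
  | { ok: false; flag: string; raw: string | undefined };

function parseMaxCostFlag(args: string[]): ParsedCap {
  for (const flag of ['--max-cost', '--max-cost-usd']) {
    const idx = args.indexOf(flag);
    if (idx < 0) continue;
    const raw: string | undefined = args[idx + 1];
    // Number() rejects trailing garbage ("5abc" becomes NaN); parseFloat would return 5.
    const n = raw !== undefined && raw.trim() !== '' ? Number(raw) : NaN;
    if (!Number.isFinite(n) || n <= 0) return { ok: false, flag, raw };
    return { ok: true, value: n };
  }
  // Neither flag present: no cap, matching the legacy default.
  return { ok: true, value: undefined };
}
```

The caller would still print the error and exit with code 2 on `ok: false`, as the hunk above does.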
diff --git a/src/core/ai/gateway.ts b/src/core/ai/gateway.ts
index 3060320aa..05ac7a7b3 100644
--- a/src/core/ai/gateway.ts
+++ b/src/core/ai/gateway.ts
@@ -22,6 +22,7 @@
  */
 
 import { embed as aiEmbed, embedMany, generateObject, generateText } from 'ai';
+import { AsyncLocalStorage } from 'node:async_hooks';
 import { listRecipes } from './recipes/index.ts';
 import { createOpenAI } from '@ai-sdk/openai';
 import { createGoogleGenerativeAI } from '@ai-sdk/google';
@@ -29,6 +30,12 @@ import { createAnthropic } from '@ai-sdk/anthropic';
 import { createOpenAICompatible } from '@ai-sdk/openai-compatible';
 import { z } from 'zod';
 
+import {
+  BudgetTracker,
+  extractUsageFromError as _extractUsageFromError,
+  type BudgetKind,
+} from '../budget/budget-tracker.ts';
+
 import type {
   AIGatewayConfig,
   EmbedMultimodalOpts,
@@ -1125,8 +1132,25 @@ export async function embed(texts: string[], opts?: EmbedOpts): Promise<Float32A
   // global default. resolveEmbeddingProvider validates the override at the
   // recipe layer — bad model strings throw AIConfigError with a clear hint.
   const resolveTarget = opts?.embeddingModel ?? getEmbeddingModel();
+  const tracker = __budgetStore.getStore() ?? null;
   const { model, recipe, modelId } = await resolveEmbeddingProvider(resolveTarget);
   const truncated = texts.map(t => (t ?? '').slice(0, MAX_CHARS));
+
+  // Reserve up front for the estimated batch token count. Embeddings have
+  // no output rate, so maxOutputTokens=0. record() at the end charges the
+  // same character-based estimate, since the SDK shape does not report usage.
+  if (tracker) {
+    const charsPerToken = recipe.touchpoints?.embedding?.chars_per_token ?? DEFAULT_CHARS_PER_TOKEN;
+    const totalChars = truncated.reduce((s, t) => s + t.length, 0);
+    const estimatedInputTokens = Math.ceil(totalChars / Math.max(charsPerToken, 1));
+    tracker.reserve({
+      modelId: `${recipe.id}:${modelId}`,
+      estimatedInputTokens,
+      maxOutputTokens: 0,
+      kind: 'embed',
+      label: 'gateway.embed',
+    });
+  }
   // Dim override (D10) — when caller passes `dimensions`, use it. Otherwise
   // fall back to the global cfg default. dimsProviderOptions throws a
   // clear AIConfigError when a Voyage flexible-dim model gets an
@@ -1151,13 +1175,40 @@ export async function embed(texts: string[], opts?: EmbedOpts): Promise<Float32A
     : [truncated];
 
   const allEmbeddings: Float32Array[] = [];
-
-  for (const batch of batches) {
-    const result = await embedSubBatch(batch, model, providerOpts, expected, recipe, modelId, opts);
-    allEmbeddings.push(...result);
+  let _embedThrew = false;
+  try {
+    for (const batch of batches) {
+      const result = await embedSubBatch(batch, model, providerOpts, expected, recipe, modelId, opts);
+      allEmbeddings.push(...result);
+    }
+    return allEmbeddings;
+  } catch (err) {
+    _embedThrew = true;
+    throw err;
+  } finally {
+    if (tracker) {
+      // Embed token usage is not surfaced by the AI SDK shape we use; charge
+      // based on the truncated input character count using the recipe's
+      // chars-per-token. On failure, A3 amended says charge the pessimistic
+      // estimate too — embed has no output side, so the input estimate IS
+      // the worst case.
+      const charsPerToken = recipe.touchpoints?.embedding?.chars_per_token ?? DEFAULT_CHARS_PER_TOKEN;
+      const totalChars = truncated.reduce((s, t) => s + t.length, 0);
+      const inputTokens = Math.ceil(totalChars / Math.max(charsPerToken, 1));
+      try {
+        tracker.record({
+          modelId: `${recipe.id}:${modelId}`,
+          inputTokens,
+          outputTokens: 0,
+          embeddingDims: expected,
+          kind: 'embed',
+          label: _embedThrew ? 'gateway.embed.failed' : 'gateway.embed',
+        });
+      } catch {
+        // BudgetExhausted (TX1) — original throw (if any) wins.
+      }
+    }
   }
-
-  return allEmbeddings;
 }
 
 /**
@@ -1940,6 +1991,48 @@ export async function generateOcrText(imageBytes: Buffer, mime: string): Promise
   return (result.text ?? '').trim();
 }
 
+// ---- BudgetTracker scope (TX5) ----
+//
+// withBudgetTracker(tracker, fn) installs `tracker` on a module-internal
+// AsyncLocalStorage for the duration of `fn`. Every gateway.chat / embed /
+// rerank call inside the scope auto-composes — no per-call injection seam
+// needed, no flag plumbing through command bodies.
+//
+// Outside the scope, the gateway functions are budget no-ops (current
+// behavior preserved). Nested scopes replace the active tracker for the
+// inner closure and restore the outer tracker on exit.
+//
+// IMPORTANT (A1): for the subagent path, reserve() runs implicitly via the
+// gateway BEFORE acquireLease() in src/core/minions/handlers/subagent.ts —
+// budget throw → no lease attempted, no rate-lease window held.
+
+const __budgetStore = new AsyncLocalStorage<BudgetTracker>();
+
+export function withBudgetTracker<T>(tracker: BudgetTracker, fn: () => Promise<T>): Promise<T> {
+  return __budgetStore.run(tracker, fn);
+}
+
+export function getCurrentBudgetTracker(): BudgetTracker | null {
+  return __budgetStore.getStore() ?? null;
+}
+
+/** Internal helper: estimate input tokens from messages + system. Heuristic only
+ * (~4 chars/token); cap math is best-effort because the reservation is made
+ * before the SDK has counted any tokens. */
+function estimateChatInputTokens(opts: { system?: string; messages?: Array<{ content?: unknown }> }): number {
+  let chars = (opts.system ?? '').length;
+  for (const m of opts.messages ?? []) {
+    if (typeof m.content === 'string') chars += m.content.length;
+    else if (Array.isArray(m.content)) {
+      for (const block of m.content) {
+        const t = (block as { text?: unknown }).text;
+        if (typeof t === 'string') chars += t.length;
+      }
+    }
+  }
+  return Math.ceil(chars / 4);
+}
+
 // ---- Chat (commit 1) ----
 
 /**
@@ -2081,14 +2174,70 @@ function mapStopReason(
  * blocks via the provider-neutral schema landing in commit 2a).
  */
 export async function chat(opts: ChatOpts): Promise<ChatResult> {
+  const tracker = __budgetStore.getStore() ?? null;
+  const modelStrEarly = opts.model ?? getChatModel();
+  const estimatedInputTokens = estimateChatInputTokens(opts);
+  const maxOutputTokens = opts.maxTokens ?? 4096;
+
+  // TX5: reserve BEFORE the provider call. Throws BudgetExhausted on cost,
+  // runtime, or no_pricing (when cap is set). Pre-resolution model id is
+  // fine here — resolveChatProvider would map aliases the same way for the
+  // cost lookup. record() below uses the resolved `recipe.id:modelId`.
+  if (tracker) {
+    tracker.reserve({
+      modelId: modelStrEarly,
+      estimatedInputTokens,
+      maxOutputTokens,
+      kind: 'chat' as BudgetKind,
+      label: 'gateway.chat',
+    });
+  }
+
   // Test seam: when a test transport is installed, route through it without
   // touching provider resolution, AI SDK, or any network. See
   // __setChatTransportForTests. Production paths see _chatTransport === null.
   if (_chatTransport) {
-    return _chatTransport(opts);
+    let res: ChatResult | null = null;
+    let threw: unknown = null;
+    try {
+      res = await _chatTransport(opts);
+      return res;
+    } catch (err) {
+      threw = err;
+      throw err;
+    } finally {
+      if (tracker) {
+        try {
+          if (res) {
+            tracker.record({
+              modelId: res.model ?? modelStrEarly,
+              inputTokens: res.usage.input_tokens,
+              outputTokens: res.usage.output_tokens,
+              label: 'gateway.chat',
+            });
+          } else {
+            const usage = _extractUsageFromError(threw, {
+              inputTokens: estimatedInputTokens,
+              outputTokens: maxOutputTokens,
+            });
+            tracker.record({
+              modelId: modelStrEarly,
+              inputTokens: usage.inputTokens,
+              outputTokens: usage.outputTokens,
+              label: 'gateway.chat',
+            });
+          }
+        } catch {
+          // record() can throw BudgetExhausted (TX1) — suppress here so the
+          // original error (if any) wins; the BudgetExhausted is surfaced
+          // on the NEXT call via reserve(). For test transport this branch
+          // is rare in practice.
+        }
+      }
+    }
   }
 
-  const modelStr = opts.model ?? getChatModel();
+  const modelStr = modelStrEarly;
   const { model, recipe, modelId } = await resolveChatProvider(modelStr);
 
   const supportsCache = recipe.touchpoints.chat?.supports_prompt_cache === true;
@@ -2110,6 +2259,22 @@ export async function chat(opts: ChatOpts): Promise<ChatResult> {
     providerOptions.anthropic = { cacheControl: { type: 'ephemeral' } };
   }
 
+  let _budgetRecorded = false;
+  const _recordBudget = (modelLabel: string, inputTokens: number, outputTokens: number): void => {
+    if (!tracker || _budgetRecorded) return;
+    _budgetRecorded = true;
+    try {
+      tracker.record({
+        modelId: modelLabel,
+        inputTokens,
+        outputTokens,
+        label: 'gateway.chat',
+      });
+    } catch {
+      // BudgetExhausted (TX1) raised here; surface via next reserve()
+    }
+  };
+
   try {
     const result = await generateText({
       model,
@@ -2156,13 +2321,17 @@ export async function chat(opts: ChatOpts): Promise<ChatResult> {
     const providerMetadata = (result as any).providerMetadata as Record<string, any> | undefined;
     const anthropicCache = providerMetadata?.anthropic ?? {};
 
+    const inTok = Number(usage.inputTokens ?? usage.promptTokens ?? 0);
+    const outTok = Number(usage.outputTokens ?? usage.completionTokens ?? 0);
+    _recordBudget(`${recipe.id}:${modelId}`, inTok, outTok);
+
     return {
       text: blocks.filter(b => b.type === 'text').map(b => (b as { type: 'text'; text: string }).text).join(''),
       blocks,
       stopReason: mapStopReason((result as any).finishReason, providerMetadata),
       usage: {
-        input_tokens: Number(usage.inputTokens ?? usage.promptTokens ?? 0),
-        output_tokens: Number(usage.outputTokens ?? usage.completionTokens ?? 0),
+        input_tokens: inTok,
+        output_tokens: outTok,
         cache_read_tokens: Number(anthropicCache.cacheReadInputTokens ?? anthropicCache.cache_read_input_tokens ?? 0),
         cache_creation_tokens: Number(anthropicCache.cacheCreationInputTokens ?? anthropicCache.cache_creation_input_tokens ?? 0),
       },
@@ -2171,6 +2340,13 @@ export async function chat(opts: ChatOpts): Promise<ChatResult> {
       providerMetadata,
     };
   } catch (err) {
+    // Pessimistic fallback (A3 amended): when err.usage isn't there, charge
+    // the worst-case ceiling — better to overcount on failure than under.
+    const fallback = _extractUsageFromError(err, {
+      inputTokens: estimatedInputTokens,
+      outputTokens: maxOutputTokens,
+    });
+    _recordBudget(`${recipe.id}:${modelId}`, fallback.inputTokens, fallback.outputTokens);
     throw normalizeAIError(err, `chat(${recipe.id}:${modelId})`);
   }
 }
@@ -2557,6 +2733,21 @@ export async function rerank(input: RerankInput): Promise<RerankResult[]> {
     input.model ??
     getRerankerModel() ??
     DEFAULT_RERANKER_MODEL;
+
+  const tracker = __budgetStore.getStore() ?? null;
+  if (tracker) {
+    // Reranker pricing isn't in the canonical pricing map today — when no
+    // cap is set this fires the warn-once path; when a cap IS set TX2
+    // hard-fails. record() below charges the same estimate, success or failure.
+    const totalChars = input.query.length + input.documents.reduce((s, d) => s + d.length, 0);
+    tracker.reserve({
+      modelId: modelStr,
+      estimatedInputTokens: Math.ceil(totalChars / 4),
+      maxOutputTokens: 0,
+      kind: 'rerank',
+      label: 'gateway.rerank',
+    });
+  }
   const { parsed, recipe } = resolveRecipe(modelStr);
   const tp = recipe.touchpoints.reranker;
   if (!tp) {
@@ -2620,6 +2811,23 @@ export async function rerank(input: RerankInput): Promise<RerankResult[]> {
     else input.signal.addEventListener('abort', () => ctrl.abort(input.signal!.reason), { once: true });
   }
 
+  let _rerankRecorded = false;
+  const _rerankRecord = (): void => {
+    if (!tracker || _rerankRecorded) return;
+    _rerankRecorded = true;
+    try {
+      const totalChars = input.query.length + input.documents.reduce((s, d) => s + d.length, 0);
+      tracker.record({
+        modelId: modelStr,
+        inputTokens: Math.ceil(totalChars / 4),
+        outputTokens: 0,
+        kind: 'rerank',
+        label: 'gateway.rerank',
+      });
+    } catch {
+      // BudgetExhausted (TX1) suppressed; surfaces on next reserve().
+    }
+  };
   try {
     const transport: RerankTransport = _rerankTransport ?? ((u, init) => fetch(u, init));
     const resp = await transport(url, {
@@ -2650,11 +2858,14 @@ export async function rerank(input: RerankInput): Promise<RerankResult[]> {
     if (!json || !Array.isArray(json.results)) {
       throw new RerankError('rerank: malformed response (no results array)', 'unknown');
     }
-    return json.results.map((r: any) => ({
+    const mapped = json.results.map((r: any) => ({
       index: typeof r.index === 'number' ? r.index : 0,
       relevanceScore: typeof r.relevance_score === 'number' ? r.relevance_score : 0,
     }));
+    _rerankRecord();
+    return mapped;
   } catch (err) {
+    _rerankRecord();
     if (err instanceof RerankError) throw err;
     // AbortError on timeout — classify cleanly.
     if (err && typeof err === 'object' && (err as any).name === 'AbortError') {
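Review note: the reserve-then-record pattern that the gateway hunks above apply to chat, embed, and rerank can be sketched in isolation. This is a toy stand-in under stated assumptions: `ToyTracker`, `ToyBudgetExhausted`, `guardedCall`, and the flat integer price per token are hypothetical and do not reflect the real `BudgetTracker` API or pricing map. Only the ~4 chars/token heuristic is taken from the diff.

```typescript
// Hypothetical stand-ins; the real tracker is imported from ../budget/budget-tracker.ts.
class ToyBudgetExhausted extends Error {
  spent: number;
  cap: number;
  constructor(spent: number, cap: number) {
    super('budget exhausted: spent ' + spent + ' of cap ' + cap);
    this.name = 'ToyBudgetExhausted';
    this.spent = spent;
    this.cap = cap;
  }
}

class ToyTracker {
  spent = 0; // integer micro-USD, to keep the arithmetic exact
  private cap: number;
  private pricePerToken: number;
  constructor(capMicroUsd: number, microUsdPerToken: number) {
    this.cap = capMicroUsd;
    this.pricePerToken = microUsdPerToken;
  }

  // Pre-flight: refuse the call if the worst case would cross the cap.
  reserve(estimatedInputTokens: number, maxOutputTokens: number): void {
    const worstCase = (estimatedInputTokens + maxOutputTokens) * this.pricePerToken;
    if (this.spent + worstCase > this.cap) {
      throw new ToyBudgetExhausted(this.spent, this.cap);
    }
  }

  // Post-flight: charge actual usage, or the pessimistic estimate on failure.
  record(inputTokens: number, outputTokens: number): void {
    this.spent += (inputTokens + outputTokens) * this.pricePerToken;
  }
}

// Same ~4 chars/token heuristic the diff uses for chat and rerank estimates.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function guardedCall(
  tracker: ToyTracker,
  prompt: string,
  maxOutputTokens: number,
  provider: (prompt: string) => { outputTokens: number },
): number {
  const inputTokens = estimateTokens(prompt);
  tracker.reserve(inputTokens, maxOutputTokens); // throws before the provider is touched
  try {
    const { outputTokens } = provider(prompt);
    tracker.record(inputTokens, outputTokens);
    return outputTokens;
  } catch (err) {
    tracker.record(inputTokens, maxOutputTokens); // overcount on failure rather than undercount
    throw err;
  }
}
```

Because `reserve` runs first, a call that would overflow the cap never reaches the provider. The trade-off is that the cap can fire slightly early when the estimate is pessimistic.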
diff --git a/src/core/audit-slug-fallback.ts b/src/core/audit-slug-fallback.ts
index 345f16846..11cf3ef8c 100644
--- a/src/core/audit-slug-fallback.ts
+++ b/src/core/audit-slug-fallback.ts
@@ -20,7 +20,7 @@
 
 import * as fs from 'node:fs';
 import * as path from 'node:path';
-import { resolveAuditDir } from './minions/handlers/shell-audit.ts';
+import { isoWeekFilename, resolveAuditDir } from './audit-week-file.ts';
 
 export interface SlugFallbackAuditEvent {
   ts: string;
@@ -34,18 +34,10 @@ export interface SlugFallbackAuditEvent {
   code: 'SLUG_FALLBACK_FRONTMATTER';
 }
 
-/** ISO-week-rotated filename: `slug-fallback-YYYY-Www.jsonl`. */
+/** ISO-week-rotated filename: `slug-fallback-YYYY-Www.jsonl`. Delegates to
+ *  `src/core/audit-week-file.ts`. */
 export function computeSlugFallbackAuditFilename(now: Date = new Date()): string {
-  const d = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()));
-  const dayNum = (d.getUTCDay() + 6) % 7;
-  d.setUTCDate(d.getUTCDate() - dayNum + 3);
-  const isoYear = d.getUTCFullYear();
-  const firstThursday = new Date(Date.UTC(isoYear, 0, 4));
-  const firstThursdayDayNum = (firstThursday.getUTCDay() + 6) % 7;
-  firstThursday.setUTCDate(firstThursday.getUTCDate() - firstThursdayDayNum + 3);
-  const weekNum = Math.round((d.getTime() - firstThursday.getTime()) / (7 * 86400000)) + 1;
-  const ww = String(weekNum).padStart(2, '0');
-  return `slug-fallback-${isoYear}-W${ww}.jsonl`;
+  return isoWeekFilename('slug-fallback', now);
 }
 
 /**
diff --git a/src/core/audit-week-file.ts b/src/core/audit-week-file.ts
new file mode 100644
index 000000000..34dade137
--- /dev/null
+++ b/src/core/audit-week-file.ts
@@ -0,0 +1,59 @@
+/**
+ * v0.37.x — single source of truth for the ISO-week filename math used by
+ * every gbrain audit JSONL writer (shell-audit, phantom-audit,
+ * slug-fallback-audit, budget-tracker audit, dream-budget audit).
+ *
+ * Why: each of those modules grew its own copy of the same ISO-week math
+ * with subtle drift (some used UTC, some used local; some used Sunday-start
+ * weeks, some used Thursday-anchored ISO weeks). One shared helper keeps the
+ * filenames consistent so an operator can grep one filename pattern across
+ * audit dirs.
+ *
+ * ISO 8601 week numbering:
+ *   - Weeks start on Monday.
+ *   - Week 1 of any year is the week containing the year's first Thursday.
+ *   - A date can belong to a week whose ISO year differs from the calendar
+ *     year (a Dec 31 falling on Mon-Wed belongs to W01 of the next year).
+ *   - Year-boundary correctness is pinned by `test/core/audit-week-file.test.ts`.
+ */
+
+import { gbrainPath } from './config.ts';
+
+/**
+ * Compute the ISO-8601 week number (1..53) and corresponding ISO week-year
+ * for `d` (UTC). Returns `{year, week}` where `year` may differ from
+ * `d.getUTCFullYear()` near year boundaries.
+ */
+export function isoWeek(d: Date): { year: number; week: number } {
+  // Algorithm: shift to the Thursday of d's week (since Thursday determines
+  // the week's ISO year), then compute weeks since the first Thursday.
+  const tgt = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
+  const dayNum = (tgt.getUTCDay() + 6) % 7; // Monday=0, ..., Sunday=6
+  tgt.setUTCDate(tgt.getUTCDate() - dayNum + 3); // Thursday of this ISO week
+  const isoYear = tgt.getUTCFullYear();
+  const firstThursday = new Date(Date.UTC(isoYear, 0, 4));
+  const firstDayNum = (firstThursday.getUTCDay() + 6) % 7;
+  firstThursday.setUTCDate(firstThursday.getUTCDate() - firstDayNum + 3);
+  const week = 1 + Math.round((tgt.getTime() - firstThursday.getTime()) / (7 * 24 * 60 * 60 * 1000));
+  return { year: isoYear, week };
+}
+
+/**
+ * Build a basename like `<prefix>-YYYY-Www.jsonl` (e.g. `budget-2026-W21.jsonl`).
+ * Caller is responsible for joining with the audit dir.
+ */
+export function isoWeekFilename(prefix: string, now: Date = new Date()): string {
+  const { year, week } = isoWeek(now);
+  return `${prefix}-${year}-W${String(week).padStart(2, '0')}.jsonl`;
+}
+
+/**
+ * Resolve the audit directory: honors `GBRAIN_AUDIT_DIR` env override,
+ * falls back to `gbrainPath('audit')`. The directory may not exist yet;
+ * callers `mkdirSync({recursive:true})` before writing.
+ */
+export function resolveAuditDir(): string {
+  const override = process.env.GBRAIN_AUDIT_DIR;
+  if (override && override.length > 0) return override;
+  return gbrainPath('audit');
+}
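Review note: to make the year-boundary behaviour concrete, the snippet below restates `isoWeek` and `isoWeekFilename` from this file (dropping the `gbrainPath` import so it runs standalone) and evaluates a few dates. The sample dates and the `demo` prefix are illustrative picks, not cases taken from `test/core/audit-week-file.test.ts`.

```typescript
// Restated from src/core/audit-week-file.ts in this diff so the example is self-contained.
function isoWeek(d: Date): { year: number; week: number } {
  const tgt = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
  const dayNum = (tgt.getUTCDay() + 6) % 7; // Monday=0, ..., Sunday=6
  tgt.setUTCDate(tgt.getUTCDate() - dayNum + 3); // Thursday of this ISO week
  const isoYear = tgt.getUTCFullYear();
  const firstThursday = new Date(Date.UTC(isoYear, 0, 4));
  const firstDayNum = (firstThursday.getUTCDay() + 6) % 7;
  firstThursday.setUTCDate(firstThursday.getUTCDate() - firstDayNum + 3);
  const week = 1 + Math.round((tgt.getTime() - firstThursday.getTime()) / (7 * 24 * 60 * 60 * 1000));
  return { year: isoYear, week };
}

function isoWeekFilename(prefix: string, now: Date = new Date()): string {
  const { year, week } = isoWeek(now);
  return `${prefix}-${year}-W${String(week).padStart(2, '0')}.jsonl`;
}

// Illustrative dates (UTC): one mid-year, two that cross an ISO-year boundary.
const midYear = isoWeekFilename('demo', new Date(Date.UTC(2026, 4, 21)));  // Thu 2026-05-21
const rollsForward = isoWeekFilename('demo', new Date(Date.UTC(2024, 11, 31))); // Tue 2024-12-31
const rollsBack = isoWeekFilename('demo', new Date(Date.UTC(2021, 0, 1)));   // Fri 2021-01-01

console.log(midYear);      // demo-2026-W21.jsonl
console.log(rollsForward); // demo-2025-W01.jsonl
console.log(rollsBack);    // demo-2020-W53.jsonl
```

The last two cases are the ones where a naive `getUTCFullYear()` in the filename would disagree with the ISO week-year.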
diff --git a/src/core/brainstorm/checkpoint.ts b/src/core/brainstorm/checkpoint.ts
new file mode 100644
index 000000000..4bedc89a8
--- /dev/null
+++ b/src/core/brainstorm/checkpoint.ts
@@ -0,0 +1,207 @@
+/**
+ * v0.37.x — brainstorm checkpoint (P7) with full idea bodies.
+ *
+ * Contracts (locked by /plan-eng-review):
+ *   - TX3 (load-bearing): `completed_crosses` carries FULL idea bodies,
+ *     not just counts. ~50KB per run, trivial. Resume merges these into
+ *     the new run's ideas array BEFORE the judge runs so the final
+ *     BrainstormResult is byte-identical to a clean run.
+ *   - TX4: ONE resume flag — `--resume <run_id>` continues any cross not
+ *     in completed_crosses. The proposed --retry-failed was dropped per
+ *     codex review: failed AND never-attempted crosses both go through
+ *     --resume.
+ *   - A5 amended: run_id = sha256(question + profile_label +
+ *     JSON.stringify(close_slugs.sort()) + JSON.stringify(far_slugs.sort()))
+ *     .slice(0,16). NO embedding bits — stable across embedding-model
+ *     swaps. 7-day mtime-based GC.
+ *
+ * Schema bumped to v2 (was 1 in the draft) when ideas were added.
+ *
+ * Best-effort persistence: a disk-full save logs to stderr and the run
+ * continues. Atomic write via .tmp + rename.
+ */
+
+import {
+  mkdirSync,
+  readdirSync,
+  readFileSync,
+  writeFileSync,
+  renameSync,
+  unlinkSync,
+  existsSync,
+  statSync,
+} from 'node:fs';
+import { join } from 'node:path';
+import { createHash } from 'node:crypto';
+import { gbrainPath } from '../config.ts';
+
+export interface CheckpointIdea {
+  text: string;
+  cross_id: string;
+}
+
+export interface CheckpointCross {
+  close_slug: string;
+  far_slug: string;
+  cross_id: string;
+  ideas: CheckpointIdea[];
+}
+
+export interface FailedCross {
+  close_slug: string;
+  far_slug: string;
+  error: string;
+}
+
+export interface BrainstormCheckpoint {
+  schema_version: 2; // TX3 — bumped from 1 when ideas were added
+  run_id: string;
+  question: string;
+  profile_label: string;
+  started_at: string;
+  /** TX3 load-bearing — each cross's full ideas, not just counts. */
+  completed_crosses: CheckpointCross[];
+  failed_crosses: FailedCross[];
+  judge_done: boolean;
+}
+
+const CURRENT_SCHEMA: 2 = 2;
+const STALE_MS = 7 * 24 * 60 * 60 * 1000;
+
+function checkpointDir(): string {
+  return gbrainPath('brainstorm');
+}
+
+function pathForRunId(runId: string): string {
+  return join(checkpointDir(), `${runId}.json`);
+}
+
+/**
+ * A5 amended identity: sha256(question + profile + sort(close) + sort(far))
+ * truncated to 16 hex chars. No embedding bits — embedding-model swaps
+ * don't break checkpoints.
+ */
+export function computeRunId(
+  question: string,
+  profileLabel: string,
+  closeSlugs: string[],
+  farSlugs: string[],
+): string {
+  const sortedClose = [...closeSlugs].sort();
+  const sortedFar = [...farSlugs].sort();
+  const payload = [
+    question,
+    profileLabel,
+    JSON.stringify(sortedClose),
+    JSON.stringify(sortedFar),
+  ].join('');
+  return createHash('sha256').update(payload).digest('hex').slice(0, 16);
+}
+
+export function loadCheckpoint(runId: string): BrainstormCheckpoint | null {
+  const path = pathForRunId(runId);
+  if (!existsSync(path)) return null;
+  try {
+    const raw = readFileSync(path, 'utf-8');
+    const parsed = JSON.parse(raw) as BrainstormCheckpoint;
+    if (parsed.schema_version !== CURRENT_SCHEMA) {
+      process.stderr.write(
+        `[brainstorm] checkpoint ${runId} has schema_version ${parsed.schema_version} (expected ${CURRENT_SCHEMA}); ignoring (fresh start).\n`,
+      );
+      return null;
+    }
+    return parsed;
+  } catch (err) {
+    process.stderr.write(`[brainstorm] checkpoint read failed for ${runId}: ${String(err)}\n`);
+    return null;
+  }
+}
+
+/** Atomic write via .tmp + rename. Best-effort — disk-full doesn't throw. */
+export function saveCheckpoint(cp: BrainstormCheckpoint): void {
+  try {
+    mkdirSync(checkpointDir(), { recursive: true });
+    const path = pathForRunId(cp.run_id);
+    const tmp = `${path}.tmp`;
+    writeFileSync(tmp, JSON.stringify(cp, null, 2));
+    renameSync(tmp, path);
+  } catch (err) {
+    process.stderr.write(`[brainstorm] checkpoint write failed for ${cp.run_id}: ${String(err)}\n`);
+  }
+}
+
+export function listRuns(): Array<{ run_id: string; question: string; mtime: number }> {
+  const dir = checkpointDir();
+  if (!existsSync(dir)) return [];
+  try {
+    const files = readdirSync(dir).filter((f) => f.endsWith('.json'));
+    const out: Array<{ run_id: string; question: string; mtime: number }> = [];
+    for (const f of files) {
+      const runId = f.replace(/\.json$/, '');
+      const cp = loadCheckpoint(runId);
+      if (!cp) continue;
+      try {
+        const mtime = statSync(join(dir, f)).mtimeMs;
+        out.push({ run_id: runId, question: cp.question, mtime });
+      } catch {
+        // skip
+      }
+    }
+    out.sort((a, b) => b.mtime - a.mtime);
+    return out;
+  } catch {
+    return [];
+  }
+}
+
+/**
+ * GC checkpoints older than `maxAgeDays` (default 7 per A5). Returns the
+ * count of files removed. Best-effort; errors are silent — caller (cycle
+ * purge phase) wraps in try/catch.
+ */
+export function gcStaleCheckpoints(maxAgeDays = 7): number {
+  const dir = checkpointDir();
+  if (!existsSync(dir)) return 0;
+  const cutoff = Date.now() - maxAgeDays * 24 * 60 * 60 * 1000;
+  let removed = 0;
+  try {
+    for (const f of readdirSync(dir)) {
+      if (!f.endsWith('.json')) continue;
+      const path = join(dir, f);
+      try {
+        const m = statSync(path).mtimeMs;
+        if (m < cutoff) {
+          unlinkSync(path);
+          removed++;
+        }
+      } catch {
+        // skip individual file errors
+      }
+    }
+  } catch {
+    // dir-level error — return whatever we managed
+  }
+  return removed;
+}
+
+/** True when the checkpoint exists and its mtime is under 7 days old (STALE_MS). The operator escape hatch (`forceResume`) lives in the caller. */
+export function isCheckpointFresh(runId: string, now: number = Date.now()): boolean {
+  const path = pathForRunId(runId);
+  if (!existsSync(path)) return false;
+  try {
+    return now - statSync(path).mtimeMs < STALE_MS;
+  } catch {
+    return false;
+  }
+}
+
+/** Erase a checkpoint after the run completes cleanly. Idempotent. */
+export function clearCheckpoint(runId: string): void {
+  const path = pathForRunId(runId);
+  if (!existsSync(path)) return;
+  try {
+    unlinkSync(path);
+  } catch {
+    // best-effort
+  }
+}
diff --git a/src/core/brainstorm/domain-bank.ts b/src/core/brainstorm/domain-bank.ts
index 28bbd0129..3579038b5 100644
--- a/src/core/brainstorm/domain-bank.ts
+++ b/src/core/brainstorm/domain-bank.ts
@@ -78,6 +78,20 @@ export interface FetchFarOpts {
   prefixListOverride?: string[];
   /** Default embedding column for distance calc + getEmbeddingsByChunkIds lookup. */
   embeddingColumn?: string;
+  /**
+   * Hard cap on the number of distinct prefixes for which we ask the DB to
+   * materialize one page each. Defaults to `max(m * 4, 50)`. Without this cap, brains with
+   * thousands of distinct top-level prefixes (e.g. a 13K-page brain with
+   * ~2K prefixes) caused `listPrefixSampledPages` to return ~2K rows instead
+   * of `m`, exploding LLM token spend by 50-100x. See fix/brainstorm-cost-guardrails.
+   */
+  maxFarSet?: number;
+  /**
+   * Optional RNG override for the prefix shuffle (tests only). Defaults to
+   * `Math.random`. The shuffle keeps the prefix-stratified sampling diverse
+   * even when we cap to a small fraction of all available prefixes.
+   */
+  random?: () => number;
 }
 
 /** One far-page result enriched with distance + provenance. */
@@ -348,10 +362,31 @@ export async function fetchFar(
   for (const c of opts.closeSet) {
     if (c.prefix) closePrefixSet.add(c.prefix);
   }
-  const candidatePrefixes = allPrefixes.filter((p) => !closePrefixSet.has(p));
-  const availablePrefixes = candidatePrefixes.length;
+  const allCandidatePrefixes = allPrefixes.filter((p) => !closePrefixSet.has(p));
+  const availablePrefixes = allCandidatePrefixes.length;
   const closeSlugs = opts.closeSet.map((c) => c.slug);
 
+  // ---- Step 2.5: cap the prefix list to `maxFarSet` (cost guardrail) ----
+  //
+  // `listPrefixSampledPages` returns one row per distinct prefix passed in.
+  // On large brains (1000+ prefixes) we were materializing ~1 row per prefix
+  // and then crossing each with the close-set, producing massive token bills.
+  // Cap defaults to max(m * 4, 50): enough headroom for downstream distance
+  // ranking to still pick `m` diverse final far pages, but bounded.
+  const maxFarSet = opts.maxFarSet ?? Math.max(m * 4, 50);
+  let candidatePrefixes = allCandidatePrefixes;
+  if (candidatePrefixes.length > maxFarSet) {
+    // Shuffle for diversity, then take the first `maxFarSet`. Without the
+    // shuffle a 2K-prefix brain would always pick the same alphabetical head.
+    const rng = opts.random ?? Math.random;
+    const arr = candidatePrefixes.slice();
+    for (let i = arr.length - 1; i > 0; i--) {
+      const j = Math.floor(rng() * (i + 1));
+      [arr[i], arr[j]] = [arr[j], arr[i]];
+    }
+    candidatePrefixes = arr.slice(0, maxFarSet);
+  }
+
   // ---- Step 3: primary path — listPrefixSampledPages ----
   let primaryRows: DomainBankRow[] = [];
   if (candidatePrefixes.length > 0) {
@@ -408,7 +443,7 @@ export async function fetchFar(
     .filter((e): e is Float32Array => e !== undefined);
 
   // ---- Step 6: build FarPage results with normalized distance ----
-  const pages: FarPage[] = allRows.map(({ row, src }) => {
+  const allPages: FarPage[] = allRows.map(({ row, src }) => {
     const farEmbed = row.representative_chunk_id != null
       ? embeddings.get(row.representative_chunk_id) ?? null
       : null;
@@ -427,11 +462,26 @@ export async function fetchFar(
     };
   });
 
+  // ---- Step 6.5: final trim to `m` ----
+  //
+  // Even after capping prefixes to `maxFarSet`, `listPrefixSampledPages` plus
+  // the fallback can return up to `maxFarSet + need` rows. The orchestrator
+  // crosses every (close × far) so we MUST trim to `m` here or the LLM bill
+  // scales with the cap, not with `m`. Sort by distance_score DESC so we keep
+  // the farthest (most novel) pages first.
+  const pages = allPages
+    .slice()
+    .sort((a, b) => b.distance_score - a.distance_score)
+    .slice(0, m);
+
   return {
     pages,
     available_prefixes: availablePrefixes,
     total_prefixes: totalPrefixes,
     used_fallback: usedFallback,
-    short_of_target: pages.length < m,
+    // short_of_target compares the *pre-trim* candidate pool against `m`.
+    // After the trim above `pages.length` equals `min(m, allPages.length)`, so
+    // both checks agree; using `allPages` states the sparse-brain intent directly.
+    short_of_target: allPages.length < m,
   };
 }
diff --git a/src/core/brainstorm/judges.ts b/src/core/brainstorm/judges.ts
index b65e6dbcc..ca7ef20b5 100644
--- a/src/core/brainstorm/judges.ts
+++ b/src/core/brainstorm/judges.ts
@@ -347,12 +347,28 @@ export interface RunJudgeOptions {
   activeBiasTags?: string[];
   /** AbortSignal for Ctrl-C / shutdown propagation. */
   abortSignal?: AbortSignal;
+  /**
+   * Maximum ideas to send in a single judge LLM call. Defaults to 100.
+   * Large idea sets (e.g. 15K ideas from a 13K-page brain) blow past the
+   * model's context window when sent as one batch. We chunk into batches
+   * of `maxIdeasPerCall` and concatenate the results.
+   */
+  maxIdeasPerCall?: number;
+  /** Stderr sink for chunk-progress reporting. Defaults to process.stderr.write. */
+  stderrWrite?: (s: string) => void;
 }
 
+/** Default judge chunk size. ~350 tokens/idea × 100 ideas ≈ 35K input tokens, well under the context windows of the larger models this change targets (e.g. Claude 200K, Gemini 2M). */
+const DEFAULT_JUDGE_CHUNK_SIZE = 100;
+
 /**
- * Single batch — caller chunks large idea sets to keep prompt size bounded.
- * Throws on parse failure (caller maps to judge_failed:true + saves unscored,
- * per D12).
+ * Judge a batch of ideas. Automatically chunks large idea sets into
+ * `maxIdeasPerCall`-sized sub-batches (default 100) to avoid blowing past
+ * the model's context window. Each chunk is a separate LLM call; results
+ * are concatenated. Throws on parse failure of *any* chunk (caller maps to
+ * judge_failed:true + saves unscored, per D12). A partial failure (some
+ * chunks succeed, one fails) also throws and discards the scored chunks;
+ * callers who want partial-result resilience should call `runJudge` per chunk.
  */
 export async function runJudge(
   config: JudgeConfig,
@@ -364,6 +380,56 @@ export async function runJudge(
     // returning a well-formed empty result is more ergonomic.
     return { ideas: [], pass_count: 0, model: 'noop', usage: { input_tokens: 0, output_tokens: 0, cache_read_tokens: 0, cache_creation_tokens: 0 } };
   }
+  const chunkSize = Math.max(1, options.maxIdeasPerCall ?? DEFAULT_JUDGE_CHUNK_SIZE);
+  const stderr = options.stderrWrite ?? ((s: string) => { process.stderr.write(s); });
+
+  // Split ideas into chunks. For small idea sets (<= chunkSize) this is a
+  // single chunk and behaves identically to the pre-fix single-call path.
+  const chunks: JudgeIdea[][] = [];
+  for (let i = 0; i < ideas.length; i += chunkSize) {
+    chunks.push(ideas.slice(i, i + chunkSize));
+  }
+  if (chunks.length > 1) {
+    stderr(`[${config.label}-judge] chunking ${ideas.length} ideas into ${chunks.length} batches of ≤${chunkSize}\n`);
+  }
+
+  const allIdeaResults: JudgeIdeaResult[] = [];
+  let lastModel = 'noop';
+  const totalUsage: ChatResult['usage'] = {
+    input_tokens: 0,
+    output_tokens: 0,
+    cache_read_tokens: 0,
+    cache_creation_tokens: 0,
+  };
+  for (let ci = 0; ci < chunks.length; ci++) {
+    const chunk = chunks[ci];
+    const chunkResult = await runJudgeChunk(config, chunk, options);
+    allIdeaResults.push(...chunkResult.ideas);
+    lastModel = chunkResult.model;
+    totalUsage.input_tokens += chunkResult.usage.input_tokens;
+    totalUsage.output_tokens += chunkResult.usage.output_tokens;
+    if (typeof chunkResult.usage.cache_read_tokens === 'number') {
+      totalUsage.cache_read_tokens = (totalUsage.cache_read_tokens ?? 0) + chunkResult.usage.cache_read_tokens;
+    }
+    if (typeof chunkResult.usage.cache_creation_tokens === 'number') {
+      totalUsage.cache_creation_tokens = (totalUsage.cache_creation_tokens ?? 0) + chunkResult.usage.cache_creation_tokens;
+    }
+  }
+
+  return {
+    ideas: allIdeaResults,
+    pass_count: allIdeaResults.filter((i) => i.passes).length,
+    model: lastModel,
+    usage: totalUsage,
+  };
+}
+
+/** Single-chunk inner loop. Extracted so `runJudge` can chunk + concatenate. */
+async function runJudgeChunk(
+  config: JudgeConfig,
+  ideas: JudgeIdea[],
+  options: RunJudgeOptions
+): Promise<JudgeResult> {
   const chat = options.chatFn ?? defaultChat;
   const prompt = buildJudgePrompt(config, ideas);
 
@@ -401,15 +467,15 @@ export async function runJudge(
       continue;
     }
     const weighted_score = weightedScore(validated.scores, config.weights);
-    const result: JudgeIdeaResult = {
+    const ir: JudgeIdeaResult = {
       id: validated.id,
       scores: validated.scores,
       weighted_score,
       passes: false, // filled below
       note: validated.note,
     };
-    result.passes = ideaPasses(result, config);
-    ideaResults.push(result);
+    ir.passes = ideaPasses(ir, config);
+    ideaResults.push(ir);
   }
 
   return {
diff --git a/src/core/brainstorm/orchestrator.ts b/src/core/brainstorm/orchestrator.ts
index 89933def7..665cf76fb 100644
--- a/src/core/brainstorm/orchestrator.ts
+++ b/src/core/brainstorm/orchestrator.ts
@@ -36,6 +36,28 @@ import {
 } from './judges.ts';
 import { ANTHROPIC_PRICING } from '../anthropic-pricing.ts';
 
+// ---------------------------------------------------------------------------
+// BudgetExhausted is the canonical typed error (Q2) used by every cost
+// guardrail in the orchestrator. The class lives in
+// `src/core/budget/budget-tracker.ts` (Phase 2 of the budget cathedral); we
+// re-export here for back-compat with any caller that imports it from this
+// module (the only known caller is the test suite).
+// ---------------------------------------------------------------------------
+
+import { BudgetExhausted, BudgetTracker } from '../budget/budget-tracker.ts';
+import { withBudgetTracker } from '../ai/gateway.ts';
+import {
+  computeRunId,
+  loadCheckpoint,
+  saveCheckpoint,
+  isCheckpointFresh,
+  clearCheckpoint,
+  type BrainstormCheckpoint,
+  type CheckpointCross,
+} from './checkpoint.ts';
+
+export { BudgetExhausted };
+
 // ---------------------------------------------------------------------------
 // Profile (BrainstormProfile is the brainstorm vs LSD config object)
 // ---------------------------------------------------------------------------
@@ -117,6 +139,52 @@ export interface BrainstormOptions {
   embedQueryFn?: (text: string) => Promise<Float32Array>;
   /** Stderr sink — defaults to process.stderr.write. Tests pipe into a buffer. */
   stderrWrite?: (s: string) => void;
+  /**
+   * Maximum projected cost in USD before the run aborts. Default $5.
+   * The pre-run estimate is compared against this ceiling; if higher, we
+   * abort with a paste-ready error. Setting `skipCostPreview` in a
+   * non-interactive caller does not bypass this: the ceiling is a hard
+   * limit, and it is also checked against running spend mid-run.
+   */
+  maxCostUsd?: number;
+  /**
+   * Hard cap on the domain-bank far set. When unset, `fetchFar` defaults to
+   * `max(m * 4, 50)`. Prevents the "2K prefix" explosion on large brains.
+   */
+  maxFarSet?: number;
+  /**
+   * When true, abort mid-run if running cost (USD) exceeds 5× the original
+   * estimate. Default false (warn-only). Pair with `maxCostUsd` for a hard
+   * ceiling.
+   */
+  strictBudget?: boolean;
+  /**
+   * Override the model used for the judge phase. Larger-context models
+   * (e.g. Gemini 2M / Claude 200K) help when judging large idea sets.
+   * Falls back to `modelOverride` then the gateway default.
+   */
+  judgeModel?: string;
+  /**
+   * Max ideas per judge LLM call. Default 100. Larger batches save calls
+   * but risk context overflow; smaller batches are slower but safer.
+   */
+  maxIdeasPerJudgeCall?: number;
+  /**
+   * TX4: resume from a previously-persisted checkpoint at
+   * `~/.gbrain/brainstorm/<run_id>.json`. Set by `--resume <run_id>`.
+   * When the checkpoint's identity (run_id) doesn't match the active
+   * inputs, the orchestrator refuses with a paste-ready hint rather
+   * than silently starting fresh.
+   *
+   * If undefined and a fresh checkpoint exists for the auto-derived
+   * run_id, the orchestrator does NOT auto-resume — caller must opt in
+   * via the explicit flag.
+   */
+  resumeRunId?: string;
+  /**
+   * A5: bypass the 7-day staleness gate when --resume is set.
+   */
+  forceResume?: boolean;
 }
 
 /** One idea emitted to the user, with citation transparency (D6). */
@@ -279,6 +347,21 @@ export async function loadCalibrationContext(
 // Idea generation prompts + response parsing
 // ---------------------------------------------------------------------------
 
+/**
+ * Strip lone/orphaned UTF-16 surrogates that would crash JSON encoding
+ * downstream. The Anthropic SDK and some gateway transports refuse strings
+ * containing unpaired surrogates (U+D800–U+DFFF). Page content that came
+ * in via OCR or older imports occasionally has them.
+ */
+function sanitizeUnicode(s: string): string {
+  if (!s) return s;
+  // Replace lone high surrogates (D800-DBFF) not followed by a low surrogate,
+  // then lone low surrogates (DC00-DFFF) not preceded by a high one (lookbehind, so runs of lone lows are all caught).
+  return s
+    .replace(/[\uD800-\uDBFF](?![\uDC00-\uDFFF])/g, '\uFFFD')
+    .replace(/(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g, '\uFFFD');
+}
+
 /** Build a single (close × far) cross-generation prompt. */
 function buildCrossPrompt(opts: {
   profile: BrainstormProfile;
@@ -296,16 +379,25 @@ Style rules:
 - Cite BOTH the close and far slug verbatim — these are the user's own notes.
 - Never fabricate facts, figures, or quotes. Stay grounded in the cited pages.${opts.profile.generator_constraint ? `\n- ${opts.profile.generator_constraint}` : ''}`;
 
+  // Sanitize: unicode surrogates in page content (from OCR or older imports)
+  // can crash JSON encoding in the chat transport, which would void the
+  // entire cross. Cheap to fix here.
+  const closeContent = sanitizeUnicode(opts.close.content);
+  const farContent = sanitizeUnicode(opts.far.content);
+  const closeTitle = sanitizeUnicode(opts.close.title ?? '(untitled)');
+  const farTitle = sanitizeUnicode(opts.far.title ?? '(untitled)');
+  const question = sanitizeUnicode(opts.question);
+
   const user = `QUESTION:
-${opts.question}
+${question}
 
 CLOSE PAGE (related to the question — context anchor):
-[${opts.close.slug}] ${opts.close.title ?? '(untitled)'}
-${opts.close.content.slice(0, 1500)}
+[${opts.close.slug}] ${closeTitle}
+${closeContent.slice(0, 1500)}
 
 FAR PAGE (from a distant region of the user's brain — the collision partner):
-[${opts.far.slug}] ${opts.far.title ?? '(untitled)'}
-${opts.far.content}
+[${opts.far.slug}] ${farTitle}
+${farContent}
 
 Generate exactly ${opts.profile.ideas_per_cross} ideas from cross-pollinating these pages.
 
@@ -381,6 +473,24 @@ export async function runBrainstorm(
   engine: BrainEngine,
   config: { embedding_model?: string; emotional_weight?: { user_holder?: string } },
   opts: BrainstormOptions
+): Promise<BrainstormResult> {
+  // T10: install a gateway-layer BudgetTracker scope around the whole run
+  // so every gateway.chat / embed call (the cross generations + judge +
+  // question embed) auto-records cost via the AsyncLocalStorage from T3.
+  // The cap mirrors the orchestrator's maxCostUsd so the gateway can
+  // hard-fail via BudgetExhausted(reason:'cost') if a single under-
+  // estimated call leaks past the ceiling (TX1).
+  const _runTracker = new BudgetTracker({
+    label: `brainstorm.${opts.profile?.label ?? 'brainstorm'}`,
+    maxCostUsd: opts.maxCostUsd ?? 5,
+  });
+  return withBudgetTracker(_runTracker, () => _runBrainstormInner(engine, config, opts));
+}
+
+async function _runBrainstormInner(
+  engine: BrainEngine,
+  config: { embedding_model?: string; emotional_weight?: { user_holder?: string } },
+  opts: BrainstormOptions,
 ): Promise<BrainstormResult> {
   const profile = opts.profile ?? BRAINSTORM_PROFILE;
   const stderr = opts.stderrWrite ?? ((s: string) => { process.stderr.write(s); });
@@ -399,6 +509,22 @@ export async function runBrainstorm(
     throw new Error('brainstorm: aborted before run (Ctrl-C during cost preview window)');
   }
 
+  // ---- Phase 0.5: hard cost ceiling (circuit breaker) ----
+  //
+  // The TTY grace window is a soft check. This is the hard one. On large
+  // brains the pre-run estimate is itself an under-estimate (53× over in
+  // the wild on a 13K-page brain) because `m_far` got blown out by
+  // un-capped prefix sampling. We refuse to start if the *estimate alone*
+  // already exceeds the user's ceiling.
+  const maxCostUsd = opts.maxCostUsd ?? 5;
+  if (estimate > maxCostUsd) {
+    throw new BudgetExhausted(
+      `${profile.label}: estimated cost ${fmtUsd(estimate)} exceeds --max-cost ${fmtUsd(maxCostUsd)}. ` +
+      `Lower --limit, raise --max-cost, or pass --max-far-set <n> to cap the domain bank.`,
+      { reason: 'cost', spent: estimate, cap: maxCostUsd },
+    );
+  }
+
   // ---- Phase 1: question embedding + close-set retrieval ----
   let questionEmbedding: Float32Array | null = null;
   try {
@@ -440,6 +566,9 @@ export async function runBrainstorm(
     staleBias: profile.stale_bias,
     sourceId: opts.sourceId,
     sourceIds: opts.sourceIds,
+    // Cap the prefix-stratified far set. Defaults to max(m * 4, 50) inside
+    // fetchFar; we forward the CLI flag when set.
+    maxFarSet: opts.maxFarSet,
   });
   if (farResult.short_of_target) {
     // D11 data-driven warning text.
@@ -495,11 +624,81 @@ export async function runBrainstorm(
     }
   }
 
+  // ---- TX3/TX4/A5: checkpoint + --resume wiring ----
+  //
+  // run_id is derived from the inputs (question + profile + sorted slug arrays
+  // — A5 amended, no embedding bits). When opts.resumeRunId is set we load
+  // the matching checkpoint and skip already-completed crosses; when it's
+  // unset we still WRITE a checkpoint every 5 crosses (successful or failed) so the
+  // user has a recovery path on a future crash.
+  const closeSlugsAll = closesForCross.map((c) => c.slug);
+  const farSlugsAll = farResult.pages.map((p) => p.slug);
+  const runId = computeRunId(opts.question, profile.label, closeSlugsAll, farSlugsAll);
+  const crossKey = (cross: Cross): string => `${cross.close.slug}__${cross.far.slug}`;
+  const completedFromDisk = new Map<string, CheckpointCross>(); // crossKey → ideas-from-disk
+
+  let prevCheckpoint: BrainstormCheckpoint | null = null;
+  if (opts.resumeRunId) {
+    if (opts.resumeRunId !== runId) {
+      throw new Error(
+        `${profile.label}: --resume run_id=${opts.resumeRunId} does not match inputs (active run_id=${runId}). ` +
+          `Inputs (question, close set, far set) changed since the checkpoint. Run without --resume to start fresh.`,
+      );
+    }
+    if (!opts.forceResume && !isCheckpointFresh(opts.resumeRunId)) {
+      throw new Error(
+        `${profile.label}: checkpoint ${opts.resumeRunId} is older than 7 days. ` +
+          `Pass --force-resume to override, or run without --resume to start fresh.`,
+      );
+    }
+    prevCheckpoint = loadCheckpoint(opts.resumeRunId);
+    if (!prevCheckpoint) {
+      throw new Error(
+        `${profile.label}: --resume ${opts.resumeRunId}: no checkpoint found or schema mismatch. ` +
+          `Run without --resume to start fresh.`,
+      );
+    }
+    for (const cc of prevCheckpoint.completed_crosses) {
+      completedFromDisk.set(`${cc.close_slug}__${cc.far_slug}`, cc);
+    }
+    stderr(`[${profile.label}] resuming run ${runId}: ${completedFromDisk.size}/${crosses.length} crosses already done\n`);
+  }
+
+  // Live checkpoint state — appended to as crosses succeed/fail; flushed
+  // every 5 crosses.
+  const liveCheckpoint: BrainstormCheckpoint = {
+    schema_version: 2,
+    run_id: runId,
+    question: opts.question,
+    profile_label: profile.label,
+    started_at: prevCheckpoint?.started_at ?? new Date().toISOString(),
+    completed_crosses: prevCheckpoint?.completed_crosses.slice() ?? [],
+    failed_crosses: [], // prior failures are retried on resume; carrying them over would re-report crosses that now succeed
+    judge_done: false,
+  };
+  let crossesSinceFlush = 0;
+  const flush = (): void => {
+    saveCheckpoint(liveCheckpoint);
+    crossesSinceFlush = 0;
+  };
+
   let totalUsage = { input_tokens: 0, output_tokens: 0 };
   let crossModel = modelStr;
 
   // Parallelize chat calls bounded at DEFAULT_PARALLELISM.
   const rawIdeasByCross = await mapWithConcurrency(crosses, DEFAULT_PARALLELISM, async (cross) => {
+    // Skip crosses already completed in a prior run (TX4 single-rule).
+    const key = crossKey(cross);
+    if (completedFromDisk.has(key)) {
+      const fromDisk = completedFromDisk.get(key)!;
+      return fromDisk.ideas.map((idea) => ({
+        text: idea.text,
+        close_slug: cross.close.slug,
+        far_slug: cross.far.slug,
+        distance_score: cross.far.distance_score,
+      }));
+    }
+
     const { system, user } = buildCrossPrompt({
       profile,
       question: opts.question,
@@ -518,19 +717,68 @@ export async function runBrainstorm(
       totalUsage.input_tokens += result.usage.input_tokens;
       totalUsage.output_tokens += result.usage.output_tokens;
       crossModel = result.model;
+      // Mid-run cost guard: if running spend already exceeds the projected
+      // ceiling or the strict-budget multiplier, abort the remaining crosses.
+      const runningPricing = ANTHROPIC_PRICING[result.model] ?? { input: 3, output: 15 };
+      const runningUsd =
+        (totalUsage.input_tokens / 1_000_000) * runningPricing.input +
+        (totalUsage.output_tokens / 1_000_000) * runningPricing.output;
+      if (runningUsd > maxCostUsd) {
+        throw new BudgetExhausted(
+          `${profile.label}: running cost ${fmtUsd(runningUsd)} exceeded --max-cost ${fmtUsd(maxCostUsd)} mid-run; aborting remaining crosses`,
+          { reason: 'cost', spent: runningUsd, cap: maxCostUsd },
+        );
+      }
+      if (opts.strictBudget === true && runningUsd > estimate * 5) {
+        throw new BudgetExhausted(
+          `${profile.label}: running cost ${fmtUsd(runningUsd)} exceeded 5× estimate (${fmtUsd(estimate)}) under --strict-budget`,
+          { reason: 'cost', spent: runningUsd, cap: estimate * 5 },
+        );
+      }
       const parsed = parseIdeaResponse(result.text);
-      return parsed.slice(0, profile.ideas_per_cross).map((text) => ({
+      const sliced = parsed.slice(0, profile.ideas_per_cross);
+      // TX3: persist FULL idea bodies, not just counts. Resume reconstructs
+      // the BrainstormResult by reading these back from disk.
+      const crossId = `${cross.close.slug}__${cross.far.slug}`;
+      liveCheckpoint.completed_crosses.push({
+        close_slug: cross.close.slug,
+        far_slug: cross.far.slug,
+        cross_id: crossId,
+        ideas: sliced.map((text) => ({ text, cross_id: crossId })),
+      });
+      crossesSinceFlush++;
+      if (crossesSinceFlush >= 5) flush();
+      return sliced.map((text) => ({
         text,
         close_slug: cross.close.slug,
         far_slug: cross.far.slug,
         distance_score: cross.far.distance_score,
       }));
     } catch (err) {
+      // Q2: typed-error check, replaces PR #1234's brittle string-match
+      // (`msg.includes('--max-cost')`). Cost-cap errors propagate; other
+      // per-cross errors are warned + swallowed so one bad cross doesn't
+      // void the rest of the run.
+      if (err instanceof BudgetExhausted) {
+        // Flush checkpoint before propagating so any completed crosses
+        // are persisted for --resume.
+        flush();
+        throw err;
+      }
       const msg = err instanceof Error ? err.message : String(err);
       stderr(`[${profile.label}] WARN: cross [${cross.close.slug}] × [${cross.far.slug}] failed: ${msg}\n`);
+      liveCheckpoint.failed_crosses.push({
+        close_slug: cross.close.slug,
+        far_slug: cross.far.slug,
+        error: msg,
+      });
+      crossesSinceFlush++;
+      if (crossesSinceFlush >= 5) flush();
       return [];
     }
   });
+  // Final flush so the on-disk file reflects the post-loop state.
+  flush();
 
   // Flatten + assign stable ids.
   const allRawIdeas: Array<{ id: string; text: string; close_slug: string; far_slug: string; distance_score: number }> = [];
@@ -559,10 +807,12 @@ export async function runBrainstorm(
       far_slug: i.far_slug,
     }));
     const judgeResult = await runJudge(profile.judge_config, judgeInput, {
-      modelOverride: opts.modelOverride,
+      modelOverride: opts.judgeModel ?? opts.modelOverride,
       chatFn: opts.chatFn,
       activeBiasTags: activeBiasTags ?? undefined,
       abortSignal: opts.abortSignal,
+      maxIdeasPerCall: opts.maxIdeasPerJudgeCall,
+      stderrWrite: stderr,
     });
     for (const idea of judgeResult.ideas) {
       judgedById.set(idea.id, idea);
@@ -599,6 +849,21 @@ export async function runBrainstorm(
   const actual = (totalIn / 1_000_000) * pricing.input + (totalOut / 1_000_000) * pricing.output;
   stderr(`[${profile.label}] actual cost: ${fmtUsd(actual)} (estimated ${fmtUsd(estimate)}) — in=${totalIn} out=${totalOut} tokens\n`);
 
+  // TX4: surface --resume hint when any cross failed during this run.
+  // The user can re-run with `--resume <run_id>` and we'll retry only
+  // the missing crosses (failed_crosses + never-attempted).
+  if (liveCheckpoint.failed_crosses.length > 0) {
+    stderr(
+      `[${profile.label}] ${liveCheckpoint.failed_crosses.length} cross(es) failed. Resume with: gbrain ${profile.label} --resume ${runId}\n`,
+    );
+  } else {
+    // Clean completion — every cross succeeded. Clear the checkpoint so we
+    // don't accumulate noise and a later --resume can't pick up a finished run.
+    liveCheckpoint.judge_done = true;
+    saveCheckpoint(liveCheckpoint);
+    clearCheckpoint(runId);
+  }
+
   return {
     profile_label: profile.label,
     question: opts.question,
diff --git a/src/core/budget/budget-tracker.ts b/src/core/budget/budget-tracker.ts
new file mode 100644
index 000000000..929351f6a
--- /dev/null
+++ b/src/core/budget/budget-tracker.ts
@@ -0,0 +1,431 @@
+/**
+ * v0.37.x — unified BudgetTracker for every gateway-routed LLM call.
+ *
+ * Replaces the per-command budget code (brainstorm orchestrator inline
+ * BudgetExhausted, cycle/budget-meter, eval-contradictions cost-prompt +
+ * cost-tracker). One class, one error type, one audit JSONL schema.
+ *
+ * Compose via `withBudgetTracker(tracker, fn)` from `src/core/ai/gateway.ts`
+ * (Phase 2 / TX5). Once inside the scope, every `gateway.chat / embed /
+ * rerank` call auto-records cost via AsyncLocalStorage — no per-call
+ * injection seam needed.
+ *
+ * Contracts (locked by /plan-eng-review):
+ *   - TX1: `record()` THROWS BudgetExhausted(reason:'cost') when cumulative
+ *     spend > maxCostUsd. The cap is a real ceiling, not a suggestion.
+ *   - TX2: When `maxCostUsd` is set AND the model is not in the pricing
+ *     maps, `reserve()` HARD-FAILS with BudgetExhausted(reason:'no_pricing').
+ *     When `maxCostUsd` is unset, legacy warn-once behavior is preserved.
+ *   - A3 amended: `record()` is best called from try/finally on every
+ *     gateway site. When the call threw without usage, callers feed
+ *     `extractUsageFromError(err, fallback)` — fallback is the pessimistic
+ *     ceiling (`maxOutputTokens` worth of output), not the optimistic
+ *     pre-call estimate. Better to overcount on failure than undercount.
+ *
+ * Audit JSONL lives at `~/.gbrain/audit/budget-YYYY-Www.jsonl` (ISO-week
+ * rotation, same shape as shell-audit / phantom-audit). Every line carries
+ * `schema_version: 1` so consumers can detect future renames. Writes are
+ * best-effort: a disk-full audit never gates the run.
+ */
+
+import { mkdirSync, appendFileSync } from 'node:fs';
+import { dirname } from 'node:path';
+import { gbrainPath } from '../config.ts';
+import { ANTHROPIC_PRICING, type ModelPricing } from '../anthropic-pricing.ts';
+import { EMBEDDING_PRICING, lookupEmbeddingPrice } from '../embedding-pricing.ts';
+import { isoWeekFilename, resolveAuditDir } from '../audit-week-file.ts';
+
+export type BudgetKind = 'chat' | 'embed' | 'rerank';
+
+export type BudgetReason = 'cost' | 'runtime' | 'no_pricing';
+
+export interface BudgetEstimate {
+  modelId: string;
+  estimatedInputTokens: number;
+  maxOutputTokens: number;
+  kind: BudgetKind;
+  /** Optional label for telemetry (e.g. 'brainstorm.cross', 'dream.synthesize'). */
+  label?: string;
+}
+
+export interface BudgetActualUsage {
+  modelId: string;
+  inputTokens: number;
+  outputTokens?: number;
+  /** For embeddings: dimension count, surfaces in audit only. */
+  embeddingDims?: number;
+  /** Optional label echo for the audit row. */
+  label?: string;
+}
+
+export interface BudgetSnapshot {
+  cumulativeCostUsd: number;
+  startedAt: number;
+  elapsedMs: number;
+  maxCostUsd?: number;
+  maxRuntimeMs?: number;
+  callsRecorded: number;
+}
+
+export interface BudgetTrackerOpts {
+  /** USD cap. When undefined, cost gate disabled; pricing misses warn-once. */
+  maxCostUsd?: number;
+  /** Wall-clock cap in milliseconds. When undefined, runtime gate disabled. */
+  maxRuntimeMs?: number;
+  /** Phase/command label used in audit rows. */
+  label: string;
+  /** Override the audit file path (tests + custom installers). */
+  auditPath?: string;
+}
+
+export class BudgetExhausted extends Error {
+  readonly tag = 'BUDGET_EXHAUSTED' as const;
+  reason: BudgetReason;
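+  spent: number; // USD spent so far for 'cost' / 'no_pricing'; elapsed ms for 'runtime'
+  cap: number; // maxCostUsd for 'cost' / 'no_pricing'; maxRuntimeMs for 'runtime'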
+  modelId?: string;
+  constructor(
+    message: string,
+    opts: { reason: BudgetReason; spent: number; cap: number; modelId?: string },
+  ) {
+    super(message);
+    this.name = 'BudgetExhausted';
+    this.reason = opts.reason;
+    this.spent = opts.spent;
+    this.cap = opts.cap;
+    this.modelId = opts.modelId;
+  }
+}
+
+/** Per-process memo: warn once on missing pricing per (modelId, kind). */
+const _unpricedWarnings = new Set<string>();
+
+/** Test seam: reset warn-once memo so unit tests can re-trigger the path. */
+export function _resetBudgetTrackerWarningsForTest(): void {
+  _unpricedWarnings.clear();
+}
+
+/**
+ * Best-effort JSONL audit append. Failure never gates the run; matches the
+ * shell-audit / phantom-audit posture.
+ */
+function appendAuditLine(path: string, entry: object): void {
+  try {
+    mkdirSync(dirname(path), { recursive: true });
+    appendFileSync(path, JSON.stringify(entry) + '\n');
+  } catch {
+    // swallow — audit failures must not block the LLM call
+  }
+}
+
+function defaultAuditPath(): string {
+  const dir = resolveAuditDir();
+  return `${dir}/${isoWeekFilename('budget')}`;
+}
+
+/**
+ * Look up `modelId` in the chat or embedding pricing maps. Returns a
+ * per-1M-token price tuple, or null when unknown.
+ *
+ * Strategy:
+ *   - Chat: try the bare model id in ANTHROPIC_PRICING first (legacy keys
+ *     are bare claude-* ids). Fall back to the provider-prefixed key.
+ *   - Embed: lookupEmbeddingPrice already handles the provider:model form,
+ *     defaulting to openai when the colon is missing.
+ *   - Rerank: not priced today — treat as a chat call with no output cost
+ *     when caller passes ANTHROPIC_PRICING-shaped id, else unknown.
+ */
+function lookupPricing(modelId: string, kind: BudgetKind): ModelPricing | null {
+  if (kind === 'embed') {
+    const hit = lookupEmbeddingPrice(modelId);
+    if (hit.kind === 'known') {
+      return { input: hit.pricePerMTok, output: 0 };
+    }
+    return null;
+  }
+  // chat or rerank: try bare key first, then provider:model
+  const bare = ANTHROPIC_PRICING[modelId];
+  if (bare) return bare;
+  const [, modelTail] = modelId.includes(':') ? modelId.split(':', 2) : [null, modelId];
+  if (modelTail) {
+    const tailHit = ANTHROPIC_PRICING[modelTail];
+    if (tailHit) return tailHit;
+  }
+  return null;
+}
+
+function costForUsage(modelId: string, inputTokens: number, outputTokens: number, kind: BudgetKind): number | null {
+  const p = lookupPricing(modelId, kind);
+  if (!p) return null;
+  return (inputTokens / 1_000_000) * p.input + (outputTokens / 1_000_000) * p.output;
+}
+
+export class BudgetTracker {
+  private cumulativeUsd = 0;
+  private callsRecorded = 0;
+  private readonly startedAt: number;
+  private readonly auditPath: string;
+  private readonly onExhaustedCbs: Array<() => void> = [];
+  private exhaustedFired = false;
+
+  constructor(private readonly opts: BudgetTrackerOpts) {
+    this.startedAt = Date.now();
+    this.auditPath = opts.auditPath ?? defaultAuditPath();
+  }
+
+  /** Public read access. */
+  get totalSpent(): number {
+    return this.cumulativeUsd;
+  }
+
+  /**
+   * Register a synchronous callback to fire the first time the tracker
+   * throws BudgetExhausted (from reserve OR record). Fires once. Useful for
+   * persisting checkpoint state before the throw propagates. The callback
+   * MUST be synchronous: a returned promise is not awaited, so persist
+   * state with blocking calls such as writeFileSync.
+   */
+  onExhausted(cb: () => void): void {
+    this.onExhaustedCbs.push(cb);
+  }
+
+  /**
+   * Project a planned LLM call against the cap. Throws BudgetExhausted
+   * BEFORE any provider call when:
+   *   - cumulative + projected > maxCostUsd (reason: 'cost')
+   *   - wall-clock > maxRuntimeMs (reason: 'runtime')
+   *   - maxCostUsd set AND pricing missing (reason: 'no_pricing') -- TX2
+   *
+   * When maxCostUsd is unset, missing pricing warns-once but does not throw
+   * (legacy behavior preserved for non-priced providers).
+   */
+  reserve(estimate: BudgetEstimate): void {
+    this.assertRuntime(estimate.modelId);
+
+    const projected = costForUsage(
+      estimate.modelId,
+      estimate.estimatedInputTokens,
+      estimate.maxOutputTokens,
+      estimate.kind,
+    );
+
+    if (projected === null) {
+      if (this.opts.maxCostUsd !== undefined) {
+        // TX2: hard-fail when a cap is set but pricing is missing — without
+        // pricing we can't enforce the cap, and silently ignoring it would
+        // void the contract.
+        const msg = `${this.opts.label}: no pricing entry for model "${estimate.modelId}" (kind=${estimate.kind}). ` +
+          `Add it to src/core/${estimate.kind === 'embed' ? 'embedding-pricing.ts' : 'anthropic-pricing.ts'} or drop --max-cost.`;
+        this.fireExhausted();
+        throw new BudgetExhausted(msg, {
+          reason: 'no_pricing',
+          spent: this.cumulativeUsd,
+          cap: this.opts.maxCostUsd,
+          modelId: estimate.modelId,
+        });
+      }
+      // Legacy warn-once path — cap unset.
+      const memoKey = `${estimate.modelId}:${estimate.kind}`;
+      if (!_unpricedWarnings.has(memoKey)) {
+        _unpricedWarnings.add(memoKey);
+        process.stderr.write(
+          `[budget] BUDGET_TRACKER_NO_PRICING: model "${estimate.modelId}" (kind=${estimate.kind}) not in pricing maps. ` +
+            `Cost gate disabled for this call.\n`,
+        );
+      }
+      appendAuditLine(this.auditPath, {
+        schema_version: 1,
+        ts: new Date().toISOString(),
+        event: 'reserve_unpriced',
+        label: this.opts.label,
+        kind: estimate.kind,
+        model: estimate.modelId,
+        sub_label: estimate.label,
+        estimated_input_tokens: estimate.estimatedInputTokens,
+        max_output_tokens: estimate.maxOutputTokens,
+      });
+      return;
+    }
+
+    if (this.opts.maxCostUsd !== undefined) {
+      const after = this.cumulativeUsd + projected;
+      if (after > this.opts.maxCostUsd) {
+        appendAuditLine(this.auditPath, {
+          schema_version: 1,
+          ts: new Date().toISOString(),
+          event: 'reserve_denied',
+          label: this.opts.label,
+          kind: estimate.kind,
+          model: estimate.modelId,
+          sub_label: estimate.label,
+          projected_cost_usd: projected,
+          cumulative_cost_usd: this.cumulativeUsd,
+          max_cost_usd: this.opts.maxCostUsd,
+        });
+        this.fireExhausted();
+        throw new BudgetExhausted(
+          `${this.opts.label}: projected cost $${after.toFixed(4)} exceeds --max-cost $${this.opts.maxCostUsd.toFixed(2)} ` +
+            `(cumulative $${this.cumulativeUsd.toFixed(4)} + this call $${projected.toFixed(4)})`,
+          { reason: 'cost', spent: this.cumulativeUsd, cap: this.opts.maxCostUsd, modelId: estimate.modelId },
+        );
+      }
+    }
+
+    appendAuditLine(this.auditPath, {
+      schema_version: 1,
+      ts: new Date().toISOString(),
+      event: 'reserve',
+      label: this.opts.label,
+      kind: estimate.kind,
+      model: estimate.modelId,
+      sub_label: estimate.label,
+      projected_cost_usd: projected,
+      cumulative_cost_usd: this.cumulativeUsd,
+      max_cost_usd: this.opts.maxCostUsd ?? null,
+    });
+  }
+
+  /**
+   * Record the actual usage after the provider returned (or threw). Updates
+   * cumulative spend. Throws BudgetExhausted(reason:'cost') AFTER the update
+   * when cumulative > maxCostUsd (TX1): a single underestimated call can
+   * blow past the cap and the cap must remain a real ceiling.
+   *
+   * `outputTokens` defaults to 0 (embed/rerank). `embeddingDims` is audit-
+   * only metadata.
+   */
+  record(actual: BudgetActualUsage & { kind?: BudgetKind }): void {
+    this.callsRecorded++;
+    const kind: BudgetKind = actual.kind ?? 'chat';
+    const cost = costForUsage(actual.modelId, actual.inputTokens, actual.outputTokens ?? 0, kind);
+
+    if (cost === null) {
+      // Unpriced model: record audit but skip cumulative math. Cap (if set)
+      // already rejected this call at reserve(); a record() here means the
+      // unpriced warn-once path let it through (cap unset).
+      appendAuditLine(this.auditPath, {
+        schema_version: 1,
+        ts: new Date().toISOString(),
+        event: 'record_unpriced',
+        label: this.opts.label,
+        kind,
+        model: actual.modelId,
+        sub_label: actual.label,
+        input_tokens: actual.inputTokens,
+        output_tokens: actual.outputTokens ?? 0,
+        embedding_dims: actual.embeddingDims ?? null,
+      });
+      return;
+    }
+
+    this.cumulativeUsd += cost;
+    appendAuditLine(this.auditPath, {
+      schema_version: 1,
+      ts: new Date().toISOString(),
+      event: 'record',
+      label: this.opts.label,
+      kind,
+      model: actual.modelId,
+      sub_label: actual.label,
+      input_tokens: actual.inputTokens,
+      output_tokens: actual.outputTokens ?? 0,
+      embedding_dims: actual.embeddingDims ?? null,
+      actual_cost_usd: cost,
+      cumulative_cost_usd: this.cumulativeUsd,
+      max_cost_usd: this.opts.maxCostUsd ?? null,
+    });
+
+    if (this.opts.maxCostUsd !== undefined && this.cumulativeUsd > this.opts.maxCostUsd) {
+      // TX1: hard-throw — a single under-estimated call exceeded the cap.
+      this.fireExhausted();
+      throw new BudgetExhausted(
+        `${this.opts.label}: cumulative cost $${this.cumulativeUsd.toFixed(4)} exceeded --max-cost $${this.opts.maxCostUsd.toFixed(2)} after recording ${kind} call to ${actual.modelId}`,
+        { reason: 'cost', spent: this.cumulativeUsd, cap: this.opts.maxCostUsd, modelId: actual.modelId },
+      );
+    }
+  }
+
+  snapshot(): BudgetSnapshot {
+    return {
+      cumulativeCostUsd: this.cumulativeUsd,
+      startedAt: this.startedAt,
+      elapsedMs: Date.now() - this.startedAt,
+      maxCostUsd: this.opts.maxCostUsd,
+      maxRuntimeMs: this.opts.maxRuntimeMs,
+      callsRecorded: this.callsRecorded,
+    };
+  }
+
+  /** Internal helper: throw BudgetExhausted(reason:'runtime') when the wall-clock cap fires. */
+  private assertRuntime(modelId: string): void {
+    if (this.opts.maxRuntimeMs === undefined) return;
+    const elapsed = Date.now() - this.startedAt;
+    if (elapsed > this.opts.maxRuntimeMs) {
+      appendAuditLine(this.auditPath, {
+        schema_version: 1,
+        ts: new Date().toISOString(),
+        event: 'runtime_denied',
+        label: this.opts.label,
+        elapsed_ms: elapsed,
+        max_runtime_ms: this.opts.maxRuntimeMs,
+        model: modelId,
+      });
+      this.fireExhausted();
+      throw new BudgetExhausted(
+        `${this.opts.label}: wall-clock ${(elapsed / 1000).toFixed(1)}s exceeded --max-runtime ${(this.opts.maxRuntimeMs / 1000).toFixed(1)}s`,
+        { reason: 'runtime', spent: elapsed, cap: this.opts.maxRuntimeMs, modelId },
+      );
+    }
+  }
+
+  private fireExhausted(): void {
+    if (this.exhaustedFired) return;
+    this.exhaustedFired = true;
+    for (const cb of this.onExhaustedCbs) {
+      try {
+        cb();
+      } catch (err) {
+        process.stderr.write(`[budget] onExhausted callback threw: ${String(err)}\n`);
+      }
+    }
+  }
+}
+
+/**
+ * Pull usage out of an SDK error envelope. Common providers attach `usage`
+ * either at the top level (Anthropic) or under `response.usage` (OpenAI).
+ * Returns the fallback (pessimistic ceiling) when no usage can be found —
+ * NOT the optimistic pre-call estimate (A3 amended). Callers should pass
+ * `{ inputTokens: estimate.estimatedInputTokens, outputTokens: estimate.maxOutputTokens }`
+ * so the worst-case budget is consumed on failure.
+ */
+export function extractUsageFromError(
+  err: unknown,
+  fallback: { inputTokens: number; outputTokens: number },
+): { inputTokens: number; outputTokens: number } {
+  if (err && typeof err === 'object') {
+    const top = (err as { usage?: unknown }).usage;
+    const nested = (err as { response?: { usage?: unknown } }).response?.usage;
+    const candidate = (top && typeof top === 'object' ? top : nested && typeof nested === 'object' ? nested : null) as
+      | { input_tokens?: number; output_tokens?: number; inputTokens?: number; outputTokens?: number }
+      | null;
+    if (candidate) {
+      const inputTokens = numericOrNull(candidate.input_tokens ?? candidate.inputTokens);
+      const outputTokens = numericOrNull(candidate.output_tokens ?? candidate.outputTokens);
+      if (inputTokens !== null || outputTokens !== null) {
+        return {
+          inputTokens: inputTokens ?? fallback.inputTokens,
+          outputTokens: outputTokens ?? fallback.outputTokens,
+        };
+      }
+    }
+  }
+  return { inputTokens: fallback.inputTokens, outputTokens: fallback.outputTokens };
+}
+
+function numericOrNull(v: unknown): number | null {
+  return typeof v === 'number' && Number.isFinite(v) ? v : null;
+}
+
+/** Re-export the pricing maps for introspection / test setup. */
+export { ANTHROPIC_PRICING, EMBEDDING_PRICING };
diff --git a/src/core/cycle.ts b/src/core/cycle.ts
index 8593199b6..da46ce8fe 100644
--- a/src/core/cycle.ts
+++ b/src/core/cycle.ts
@@ -978,13 +978,25 @@ async function runPhasePurge(engine: BrainEngine, dryRun: boolean): Promise<Phas
     } catch {
       // Non-fatal: op_checkpoints table may not exist yet on pre-v67 brains.
     }
+    // v0.37.x — TX3 / A5: GC stale brainstorm checkpoints (filesystem-side).
+    // 7-day mtime window mirrors op_checkpoints. Wrapped in try/catch
+    // because the brainstorm dir may not exist on a brain that's never
+    // run a brainstorm.
+    let purgedBrainstormCheckpoints = 0;
+    try {
+      const { gcStaleCheckpoints } = await import('./brainstorm/checkpoint.ts');
+      purgedBrainstormCheckpoints = gcStaleCheckpoints(7);
+    } catch {
+      // Non-fatal.
+    }
     return {
       phase: 'purge',
       status: 'ok',
       duration_ms: 0,
       summary:
         `purged ${purgedSources.length} source(s), ${purgedPages.count} page(s), ` +
-        `${purgedClones.count} orphan clone temp dir(s), and ${purgedCheckpoints} stale op_checkpoint(s)`,
+        `${purgedClones.count} orphan clone temp dir(s), ${purgedCheckpoints} stale op_checkpoint(s), ` +
+        `and ${purgedBrainstormCheckpoints} stale brainstorm checkpoint(s)`,
       details: {
         purged_sources_count: purgedSources.length,
         purged_pages_count: purgedPages.count,
@@ -993,6 +1005,7 @@ async function runPhasePurge(engine: BrainEngine, dryRun: boolean): Promise<Phas
         purged_sources: purgedSources,
         purged_page_slugs: purgedPages.slugs,
         purged_checkpoints_count: purgedCheckpoints,
+        purged_brainstorm_checkpoints_count: purgedBrainstormCheckpoints,
       },
     };
   } catch (e) {
diff --git a/src/core/cycle/budget-meter.ts b/src/core/cycle/budget-meter.ts
index 3c939f927..446eb3ecb 100644
--- a/src/core/cycle/budget-meter.ts
+++ b/src/core/cycle/budget-meter.ts
@@ -1,19 +1,28 @@
 /**
  * v0.28: cumulative cost meter for dream-cycle phases (auto-think + drift).
  *
+ * v0.37.x: kept as a thin adapter over `BudgetTracker` semantics. The public
+ * class shape (`BudgetMeter`, `SubmitEstimate`, `BudgetCheckResult`) is
+ * preserved so every existing dream-cycle call site keeps working. The
+ * audit JSONL grew a `schema_version: 1` field on every line (A2 amended:
+ * schema-stable, not byte-stable — reorderings are tolerated, field
+ * renames are breaking). `test/fixtures/dream-budget-schema-v1.jsonl`
+ * pins the documented field set.
+ *
  * Per Codex P1 #10: each subagent submit estimates max-cost from
  * `model + max_output_tokens`, accumulates per-cycle, refuses next submit
  * if cumulative > budget. Non-Anthropic models bypass the gate with a
  * `BUDGET_METER_NO_PRICING` warn (once per process).
  *
  * Ledger lives at `~/.gbrain/audit/dream-budget-YYYY-Www.jsonl` (ISO-week
- * rotation, same pattern as shell-audit). Each line is one submit's cost
+ * rotation, same pattern as shell-audit; filename math now goes through
+ * `src/core/audit-week-file.ts` per T4). Each line is one submit's cost
  * estimate + actual usage when reported back.
  */
 
 import { mkdirSync, appendFileSync } from 'node:fs';
-import { dirname } from 'node:path';
-import { gbrainPath } from '../config.ts';
+import { dirname, join } from 'node:path';
+import { isoWeekFilename, resolveAuditDir } from '../audit-week-file.ts';
 import { estimateMaxCostUsd, ANTHROPIC_PRICING } from '../anthropic-pricing.ts';
 
 export interface BudgetMeterOpts {
@@ -51,15 +60,7 @@ const _unpricedWarnings = new Set<string>();
 
 function auditFilePath(override?: string): string {
   if (override) return override;
-  // ISO week format: YYYY-Www (2026-W18)
-  const now = new Date();
-  const year = now.getUTCFullYear();
-  // ISO week: Thursday's week. Approximated for filename only.
-  const oneJan = new Date(Date.UTC(year, 0, 1));
-  const diffDays = Math.floor((now.getTime() - oneJan.getTime()) / 86_400_000);
-  const week = Math.ceil((diffDays + oneJan.getUTCDay() + 1) / 7);
-  const weekStr = String(week).padStart(2, '0');
-  return gbrainPath(`audit/dream-budget-${year}-W${weekStr}.jsonl`);
+  return join(resolveAuditDir(), isoWeekFilename('dream-budget'));
 }
 
 function writeLedgerLine(path: string, entry: object): void {
@@ -99,6 +100,7 @@ export class BudgetMeter {
         );
       }
       writeLedgerLine(this.auditPath, {
+        schema_version: 1,
         phase: this.opts.phase,
         ts: new Date().toISOString(),
         event: 'submit_unpriced',
@@ -120,6 +122,7 @@ export class BudgetMeter {
     if (this.opts.budgetUsd <= 0) {
       this.cumulativeUsd += cost;
       writeLedgerLine(this.auditPath, {
+        schema_version: 1,
         phase: this.opts.phase,
         ts: new Date().toISOString(),
         event: 'submit',
@@ -135,6 +138,7 @@ export class BudgetMeter {
     const projected = this.cumulativeUsd + cost;
     if (projected > this.opts.budgetUsd) {
       writeLedgerLine(this.auditPath, {
+        schema_version: 1,
         phase: this.opts.phase,
         ts: new Date().toISOString(),
         event: 'submit_denied',
@@ -155,6 +159,7 @@ export class BudgetMeter {
 
     this.cumulativeUsd += cost;
     writeLedgerLine(this.auditPath, {
+      schema_version: 1,
       phase: this.opts.phase,
       ts: new Date().toISOString(),
       event: 'submit',
diff --git a/src/core/diarize/payload-fitter.ts b/src/core/diarize/payload-fitter.ts
new file mode 100644
index 000000000..4d58e5fd2
--- /dev/null
+++ b/src/core/diarize/payload-fitter.ts
@@ -0,0 +1,268 @@
+/**
+ * v0.37.x — payload-fitter (P6) with two strategies + a quality gate.
+ *
+ * Generic utility for fitting an arbitrarily large list of items into a
+ * downstream caller's per-call token budget.
+ *
+ * Strategies (Q3 + codex finding #4):
+ *   - 'batch'     deterministic token-budgeted chunking. The caller
+ *                 receives a flat fit list shaped like the input; the
+ *                 chunking decision is left to the caller (e.g. the
+ *                 brainstorm judge concatenates results across batches).
+ *                 No LLM calls.
+ *   - 'summarize' embed-cluster (k = ceil(items/4)), Haiku-summarize each
+ *                 cluster, return the fitted payload (summary nodes
+ *                 instead of every original item). Composes the active
+ *                 BudgetTracker via the gateway's AsyncLocalStorage scope
+ *                 (T3) — every Haiku call shows up in the cost ledger.
+ *                 Promise.allSettled at parallelism=4 (Perf1) so a single
+ *                 cluster-failure does not stall the whole pass.
+ *
+ * Quality gate (codex outside-voice finding #4):
+ *   When the summarize strategy returns less than `min_success_ratio`
+ *   (default 0.75) of attempted clusters, the result is flagged
+ *   `degraded: true` and the caller decides whether to surface a partial
+ *   result or abort. Brainstorm aborts on degraded; defaults can be
+ *   relaxed per-caller.
+ */
+
+import type { ChatOpts, ChatResult } from '../ai/gateway.ts';
+
+/** Local ChatFn shape — kept here so payload-fitter doesn't depend on
+ *  src/core/brainstorm/judges.ts (which is the canonical owner of the
+ *  ChatFn alias today). */
+type ChatFn = (opts: ChatOpts) => Promise<ChatResult>;
+
+export type FitStrategy = 'batch' | 'summarize';
+
+export interface FitOptions<T> {
+  items: T[];
+  strategy: FitStrategy;
+  /** Per-call token budget. 'batch' counts items that individually exceed
+   *  it; 'summarize' sizes clusters by item count and does not consult it. */
+  maxTokensPerCall: number;
+  /** Token estimator. Caller-supplied so payload-fitter is generic. */
+  estimateTokens: (item: T) => number;
+  // ---- summarize-only ----
+  /** Optional embed function (only used by 'summarize'). Caller supplies
+   *  the active gateway.embed binding. */
+  embedFn?: (text: string) => Promise<Float32Array>;
+  /** Optional chat function for summarization. Caller supplies the
+   *  active gateway.chat binding. */
+  chatFn?: ChatFn;
+  /** Summarize-only: convert an item to text for embed + summarize. */
+  itemToText?: (item: T) => string;
+  /** Summarize-only: convert a Haiku summary string back into an item-
+   *  shaped fitted node. Caller-supplied so the fitted list has the
+   *  caller's own type. */
+  summaryToItem?: (summary: string, cluster: T[]) => T;
+  /** Summarize parallelism. Default 4 per Perf1. */
+  parallelism?: number;
+  /** Quality gate threshold. Default 0.75. When the success ratio drops
+   *  below this, result.degraded === true. */
+  min_success_ratio?: number;
+  /** Override the summarization model (e.g. 'anthropic:claude-haiku-4-5').
+   *  Default falls back to the gateway's configured chat model. */
+  summarizeModel?: string;
+}
+
+export interface FitResult<T> {
+  fitted: T[];
+  strategy: FitStrategy;
+  /** Failed clusters (summarize) or items over maxTokensPerCall (batch). */
+  dropped: number;
+  /** (attempted - dropped) / attempted; 1.0 when nothing was dropped. */
+  success_ratio: number;
+  /** True when success_ratio < min_success_ratio. */
+  degraded: boolean;
+  /** Total LLM usage rolled up across summarize calls. Undefined for batch. */
+  usage?: ChatResult['usage'];
+}
+
+const DEFAULT_PARALLELISM = 4;
+const DEFAULT_MIN_SUCCESS_RATIO = 0.75;
+
+/**
+ * Public entry point. Dispatches on strategy. Caller misuse (e.g.
+ * summarize without embedFn/chatFn) rejects with `Error` before any
+ * embed or chat call, so it fails loud.
+ */
+export async function fit<T>(opts: FitOptions<T>): Promise<FitResult<T>> {
+  if (opts.strategy === 'batch') {
+    return fitBatch(opts);
+  }
+  if (opts.strategy === 'summarize') {
+    return fitSummarize(opts);
+  }
+  throw new Error(`payload-fitter: unknown strategy "${(opts as { strategy: string }).strategy}"`);
+}
+
+/**
+ * 'batch' strategy: deterministic, token-budgeted chunking. Returns the
+ * original items unchanged (no LLM calls). `dropped` is the count of
+ * items that exceeded the per-call budget all on their own — these are
+ * preserved in `fitted` (caller decides whether to surface a warning)
+ * but they signal a budgeting mismatch the caller should know about.
+ */
+function fitBatch<T>(opts: FitOptions<T>): FitResult<T> {
+  const dropped = opts.items.filter((it) => opts.estimateTokens(it) > opts.maxTokensPerCall).length;
+  return {
+    fitted: opts.items.slice(),
+    strategy: 'batch',
+    dropped,
+    success_ratio: opts.items.length === 0 ? 1.0 : (opts.items.length - dropped) / opts.items.length,
+    degraded: false,
+  };
+}
+
+/**
+ * 'summarize' strategy: embed-cluster then Haiku-summarize each cluster.
+ *
+ *   1. embed every item (caller-supplied embedFn).
+ *   2. cluster into k = ceil(items/4) groups via cheap greedy nearest-
+ *      neighbor on cosine similarity (deterministic; no sklearn).
+ *   3. parallel Haiku-summarize each cluster via Promise.allSettled
+ *      with parallelism `opts.parallelism ?? 4` (Perf1).
+ *   4. drop failed clusters; surface a `degraded: true` flag when the
+ *      success ratio falls below `min_success_ratio`.
+ *
+ * Each Haiku call composes the active BudgetTracker via AsyncLocalStorage
+ * (no per-call injection). A BudgetExhausted thrown while embedding
+ * propagates; one thrown while summarizing is counted as a failed cluster.
+ */
+async function fitSummarize<T>(opts: FitOptions<T>): Promise<FitResult<T>> {
+  if (!opts.embedFn || !opts.chatFn || !opts.itemToText || !opts.summaryToItem) {
+    throw new Error(
+      `payload-fitter: strategy='summarize' requires embedFn + chatFn + itemToText + summaryToItem`,
+    );
+  }
+  const minRatio = opts.min_success_ratio ?? DEFAULT_MIN_SUCCESS_RATIO;
+  const parallelism = Math.max(1, opts.parallelism ?? DEFAULT_PARALLELISM);
+
+  if (opts.items.length === 0) {
+    return { fitted: [], strategy: 'summarize', dropped: 0, success_ratio: 1.0, degraded: false };
+  }
+
+  // 1. Embed every item. The gateway.embed call composes the active
+  //    tracker; a budget throw here propagates cleanly.
+  const texts = opts.items.map((it) => opts.itemToText!(it));
+  const embeds: Float32Array[] = [];
+  for (const text of texts) {
+    embeds.push(await opts.embedFn(text));
+  }
+
+  // 2. Greedy clustering. Pick the first un-clustered item as the seed;
+  //    add the (clusterSize - 1) closest remaining items by cosine.
+  //    Deterministic given the input order. k = ceil(items / 4).
+  const k = Math.max(1, Math.ceil(opts.items.length / 4));
+  const clusterSize = Math.ceil(opts.items.length / k);
+  const claimed = new Set<number>();
+  const clusters: number[][] = [];
+  for (let c = 0; c < k && claimed.size < opts.items.length; c++) {
+    let seedIdx = -1;
+    for (let i = 0; i < opts.items.length; i++) {
+      if (!claimed.has(i)) {
+        seedIdx = i;
+        break;
+      }
+    }
+    if (seedIdx === -1) break;
+    claimed.add(seedIdx);
+    const group = [seedIdx];
+    const seedVec = embeds[seedIdx];
+    // Score remaining un-claimed by similarity to seed; pick closest until cluster is full.
+    const remaining = opts.items
+      .map((_, idx) => idx)
+      .filter((idx) => idx !== seedIdx && !claimed.has(idx))
+      .map((idx) => ({ idx, sim: cosine(seedVec, embeds[idx]) }))
+      .sort((a, b) => b.sim - a.sim);
+    for (const cand of remaining) {
+      if (group.length >= clusterSize) break;
+      claimed.add(cand.idx);
+      group.push(cand.idx);
+    }
+    clusters.push(group);
+  }
+
+  // 3. Parallel summarize via allSettled with bounded concurrency.
+  const fitted: T[] = [];
+  const totalUsage: ChatResult['usage'] = {
+    input_tokens: 0,
+    output_tokens: 0,
+    cache_read_tokens: 0,
+    cache_creation_tokens: 0,
+  };
+  let failed = 0;
+  for (let i = 0; i < clusters.length; i += parallelism) {
+    const wave = clusters.slice(i, i + parallelism);
+    const results = await Promise.allSettled(
+      wave.map((group) => summarizeCluster(group, opts, texts)),
+    );
+    for (let j = 0; j < results.length; j++) {
+      const r = results[j];
+      const group = wave[j];
+      if (r.status === 'fulfilled') {
+        fitted.push(opts.summaryToItem!(r.value.summary, group.map((idx) => opts.items[idx])));
+        totalUsage.input_tokens += r.value.usage.input_tokens;
+        totalUsage.output_tokens += r.value.usage.output_tokens;
+        if (typeof r.value.usage.cache_read_tokens === 'number') {
+          totalUsage.cache_read_tokens =
+            (totalUsage.cache_read_tokens ?? 0) + r.value.usage.cache_read_tokens;
+        }
+        if (typeof r.value.usage.cache_creation_tokens === 'number') {
+          totalUsage.cache_creation_tokens =
+            (totalUsage.cache_creation_tokens ?? 0) + r.value.usage.cache_creation_tokens;
+        }
+      } else {
+        failed++;
+      }
+    }
+  }
+
+  const succeeded = clusters.length - failed;
+  const success_ratio = clusters.length === 0 ? 1.0 : succeeded / clusters.length;
+  const degraded = success_ratio < minRatio;
+  return {
+    fitted,
+    strategy: 'summarize',
+    dropped: failed,
+    success_ratio,
+    degraded,
+    usage: totalUsage,
+  };
+}
+
+interface SummarizeOutcome {
+  summary: string;
+  usage: ChatResult['usage'];
+}
+
+async function summarizeCluster<T>(
+  group: number[],
+  opts: FitOptions<T>,
+  texts: string[],
+): Promise<SummarizeOutcome> {
+  const chat = opts.chatFn!;
+  const lines = group.map((idx) => `- ${texts[idx]}`).join('\n');
+  const prompt = `Summarize the following items in ~3 sentences capturing the load-bearing themes. Do not quote the items verbatim.\n\n${lines}`;
+  const res = await chat({
+    model: opts.summarizeModel,
+    messages: [{ role: 'user', content: prompt }],
+    maxTokens: 400,
+  });
+  return { summary: res.text.trim(), usage: res.usage };
+}
+
+function cosine(a: Float32Array, b: Float32Array): number {
+  const len = Math.min(a.length, b.length);
+  let dot = 0;
+  let na = 0;
+  let nb = 0;
+  for (let i = 0; i < len; i++) {
+    dot += a[i] * b[i];
+    na += a[i] * a[i];
+    nb += b[i] * b[i];
+  }
+  if (na === 0 || nb === 0) return 0;
+  return dot / (Math.sqrt(na) * Math.sqrt(nb));
+}
diff --git a/src/core/eval-contradictions/runner.ts b/src/core/eval-contradictions/runner.ts
index 8c2873530..7a16728af 100644
--- a/src/core/eval-contradictions/runner.ts
+++ b/src/core/eval-contradictions/runner.ts
@@ -33,6 +33,8 @@ import { JudgeCache } from './cache.ts';
 import { CostTracker, estimateUpperBoundCost } from './cost-tracker.ts';
 import { buildSourceTierBreakdown, classifySlugTier } from './cross-source.ts';
 import { shouldSkipForDateMismatch } from './date-filter.ts';
+import { withBudgetTracker } from '../ai/gateway.ts';
+import { BudgetTracker, BudgetExhausted } from '../budget/budget-tracker.ts';
 import { judgeContradiction, type JudgeInput, type JudgeOutput } from './judge.ts';
 import { JudgeErrorCollector } from './judge-errors.ts';
 import { buildHotPages } from './severity-classify.ts';
@@ -225,6 +227,34 @@ function sortPairs(
  * strings — CLI flag parsing lives in the command file, not here.
  */
 export async function runContradictionProbe(opts: RunnerOpts): Promise<RunnerResult> {
+  // T6: wrap the entire body in withBudgetTracker so every gateway-layer
+  // chat/embed/rerank call (judge, embed-on-query) auto-records via the
+  // AsyncLocalStorage scope from src/core/ai/gateway.ts. The existing
+  // CostTracker stays for the report shape — the new BudgetTracker is a
+  // parallel record-keeper that doesn't enforce a cap on top of the
+  // existing soft ceiling. Public surface (--budget-usd, PreFlightBudgetError)
+  // is byte-identical.
+  const _outerBudgetUsd = opts.budgetUsd ?? 5.0;
+  const _runnerTracker = new BudgetTracker({
+    // No cap is set here on purpose: CostTracker's existing soft ceiling
+    // stays the primary enforcement, and the new tracker is used for
+    // telemetry only.
+    label: 'eval.suspected-contradictions',
+  });
+  try {
+    return await withBudgetTracker(_runnerTracker, () => _runContradictionProbeInner(opts));
+  } catch (err) {
+    // BudgetExhausted from the gateway path should bubble cleanly. With no
+    // cap set, the tracker only records; it doesn't throw, so this path
+    // is reachable only via future opt-in.
+    if (err instanceof BudgetExhausted) {
+      throw err;
+    }
+    throw err;
+  }
+}
+
+async function _runContradictionProbeInner(opts: RunnerOpts): Promise<RunnerResult> {
   const startedAt = Date.now();
   const judgeModel = opts.judgeModel ?? DEFAULT_JUDGE_MODEL;
   const topK = Math.max(1, opts.topK ?? DEFAULT_TOP_K);
diff --git a/src/core/facts/phantom-audit.ts b/src/core/facts/phantom-audit.ts
index 525ccedf3..2365d3490 100644
--- a/src/core/facts/phantom-audit.ts
+++ b/src/core/facts/phantom-audit.ts
@@ -20,7 +20,7 @@
 
 import * as fs from 'node:fs';
 import * as path from 'node:path';
-import { resolveAuditDir } from '../minions/handlers/shell-audit.ts';
+import { isoWeekFilename, resolveAuditDir } from '../audit-week-file.ts';
 
 export type PhantomOutcome =
   | 'redirected'
@@ -41,18 +41,10 @@ export interface PhantomAuditEvent {
   candidates?: Array<{ slug: string; connection_count: number }>;
 }
 
-/** ISO-week-rotated filename: `phantoms-YYYY-Www.jsonl`. */
+/** ISO-week-rotated filename: `phantoms-YYYY-Www.jsonl`. Delegates to
+ *  `src/core/audit-week-file.ts`. */
 export function computePhantomAuditFilename(now: Date = new Date()): string {
-  const d = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()));
-  const dayNum = (d.getUTCDay() + 6) % 7;
-  d.setUTCDate(d.getUTCDate() - dayNum + 3);
-  const isoYear = d.getUTCFullYear();
-  const firstThursday = new Date(Date.UTC(isoYear, 0, 4));
-  const firstThursdayDayNum = (firstThursday.getUTCDay() + 6) % 7;
-  firstThursday.setUTCDate(firstThursday.getUTCDate() - firstThursdayDayNum + 3);
-  const weekNum = Math.round((d.getTime() - firstThursday.getTime()) / (7 * 86400000)) + 1;
-  const ww = String(weekNum).padStart(2, '0');
-  return `phantoms-${isoYear}-W${ww}.jsonl`;
+  return isoWeekFilename('phantoms', now);
 }
 
 /**
diff --git a/src/core/migrate.ts b/src/core/migrate.ts
index 4c9ea4fad..28938caba 100644
--- a/src/core/migrate.ts
+++ b/src/core/migrate.ts
@@ -3992,6 +3992,35 @@ export const MIGRATIONS: Migration[] = [
         ADD COLUMN IF NOT EXISTS budget_usd_per_day NUMERIC(10, 2) NULL;
     `,
   },
+  {
+    version: 86,
+    name: 'page_links_view_alias',
+    // v0.39 — pglite-engine.ts and postgres-engine.ts both query a relation
+    // named `page_links` (LEFT JOIN page_links pl ON pl.to_page_id = p.id —
+    // see pglite-engine.ts:896 / postgres-engine.ts:959). The canonical
+    // table has always been `links`. This migration installs a `page_links`
+    // VIEW that aliases the table so brains initialized before the v0.39
+    // schema bundle pick up the alias on upgrade.
+    //
+    // Fresh installs already get the view via the embedded schema bundle.
+    // This migration is idempotent (CREATE OR REPLACE VIEW) so re-running
+    // is safe on either engine.
+    //
+    // Discovered during the brainstorm-cathedral wave (v0.39.0.0) when the
+    // E2E test had to work around the missing view to exercise the resume
+    // path. Originally numbered v81; renumbered to v86 during merge with
+    // master's v0.38 cathedrals (provenance / subagent / spend / oauth
+    // binding) which claimed v81-v85.
+    //
+    // Narrow projection (id, from_page_id, to_page_id) so the view does not
+    // depend on columns added in later migrations (link_source,
+    // origin_page_id, resolution_type) — keeps ALTER TABLE DROP COLUMN
+    // and the bootstrap forward-reference probes unblocked on legacy brains.
+    sql: `
+      CREATE OR REPLACE VIEW page_links AS
+        SELECT id, from_page_id, to_page_id FROM links;
+    `,
+  },
 ];
 
 export const LATEST_VERSION = MIGRATIONS.length > 0
diff --git a/src/core/minions/handlers/shell-audit.ts b/src/core/minions/handlers/shell-audit.ts
index 06bf35c48..21d2583a4 100644
--- a/src/core/minions/handlers/shell-audit.ts
+++ b/src/core/minions/handlers/shell-audit.ts
@@ -15,7 +15,7 @@
 
 import * as fs from 'node:fs';
 import * as path from 'node:path';
-import { gbrainPath } from '../../config.ts';
+import { isoWeekFilename, resolveAuditDir as _sharedResolveAuditDir } from '../../audit-week-file.ts';
 
 export interface ShellAuditEvent {
   ts: string;
@@ -30,33 +30,18 @@ export interface ShellAuditEvent {
   inherit?: string[];
 }
 
-/** Compute `shell-jobs-YYYY-Www.jsonl` using ISO-8601 week numbering.
- *
- *  Year-boundary edge: 2027-01-01 is ISO week 53 of year 2026, so the correct
- *  filename is `shell-jobs-2026-W53.jsonl`. This matches the ISO week standard
- *  (week containing the first Thursday of the year is W1; week containing Dec 28
- *  is always W52 or W53 of that year).
- */
+/** Compute `shell-jobs-YYYY-Www.jsonl`. Delegates to the shared helper in
+ *  `src/core/audit-week-file.ts`. Year-boundary edges (2027-01-01 → W53 of
+ *  2026, 2020-W53 etc.) are covered by `test/core/audit-week-file.serial.test.ts`. */
 export function computeAuditFilename(now: Date = new Date()): string {
-  // Copy date and move to nearest Thursday (ISO week anchor).
-  const d = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()));
-  const dayNum = (d.getUTCDay() + 6) % 7; // Mon=0, Sun=6
-  d.setUTCDate(d.getUTCDate() - dayNum + 3); // shift to Thursday
-  const isoYear = d.getUTCFullYear();
-  const firstThursday = new Date(Date.UTC(isoYear, 0, 4));
-  const firstThursdayDayNum = (firstThursday.getUTCDay() + 6) % 7;
-  firstThursday.setUTCDate(firstThursday.getUTCDate() - firstThursdayDayNum + 3);
-  const weekNum = Math.round((d.getTime() - firstThursday.getTime()) / (7 * 86400000)) + 1;
-  const ww = String(weekNum).padStart(2, '0');
-  return `shell-jobs-${isoYear}-W${ww}.jsonl`;
+  return isoWeekFilename('shell-jobs', now);
 }
 
 /** Resolve the audit dir. Honors `GBRAIN_AUDIT_DIR` for container/sandbox deployments
- *  where `$HOME` is read-only. Defaults to `~/.gbrain/audit/`. */
+ *  where `$HOME` is read-only. Defaults to `~/.gbrain/audit/`. Delegates to the
+ *  shared helper. */
 export function resolveAuditDir(): string {
-  const override = process.env.GBRAIN_AUDIT_DIR;
-  if (override && override.trim().length > 0) return override;
-  return gbrainPath('audit');
+  return _sharedResolveAuditDir();
 }
 
 export function logShellSubmission(event: Omit<ShellAuditEvent, 'ts'>): void {
diff --git a/src/core/minions/handlers/subagent.ts b/src/core/minions/handlers/subagent.ts
index ddd4ef1a0..5dda236aa 100644
--- a/src/core/minions/handlers/subagent.ts
+++ b/src/core/minions/handlers/subagent.ts
@@ -395,6 +395,31 @@ export function makeSubagentHandler(deps: SubagentDeps) {
       }
 
       // 1. Acquire rate lease for the outbound call.
+      //
+      // A1 ORDERING (v0.37.x budget cathedral):
+      //
+      //   +----------------------------------+
+      //   | gateway.chat() inside subagent   |
+      //   +-----+----------------------------+
+      //         |
+      //   1. getCurrentBudgetTracker()?.reserve(...)
+      //         |  (runs via the gateway's AsyncLocalStorage scope,
+      //         |   set by the upstream caller of the subagent.
+      //         |   On BudgetExhausted: throw BEFORE we touch the lease.)
+      //         v
+      //   2. acquireLease(...)  <-- the line below
+      //         |  (only attempted if the budget gate passed)
+      //         v
+      //   3. provider HTTP call
+      //         |
+      //         v
+      //   4. tracker.record(actual usage)
+      //
+      // The handler body intentionally does NOT thread `BudgetTracker`
+      // explicitly. Gateway-layer composition (TX5) handles it. The
+      // ordering is load-bearing: a budget throw must NOT consume a
+      // lease slot, because the lease is the rate-limit pacer for the
+      // entire fleet.
       const lease = await acquireLease(engine, rateLeaseKey, ctx.id, maxConcurrent, { ttlMs: leaseTtlMs });
       if (!lease.acquired) {
         // No slots — treat as a renewable error so the worker re-claims
diff --git a/src/core/pglite-schema.ts b/src/core/pglite-schema.ts
index 1661b1eb8..02f715bc3 100644
--- a/src/core/pglite-schema.ts
+++ b/src/core/pglite-schema.ts
@@ -171,6 +171,20 @@ CREATE INDEX IF NOT EXISTS idx_links_to ON links(to_page_id);
 CREATE INDEX IF NOT EXISTS idx_links_source ON links(link_source);
 CREATE INDEX IF NOT EXISTS idx_links_origin ON links(origin_page_id);
 
+-- v0.39: page_links is the alias the engine queries use (pglite-engine.ts +
+-- postgres-engine.ts both JOIN page_links pl ON pl.to_page_id = p.id). The
+-- alias predates the table-name standardization; the canonical table is
+-- links. Brainstorm domain-bank connection_count tiebreaker and the
+-- doctor link-density score read through this view.
+--
+-- The projection is intentionally NARROW (id, from_page_id, to_page_id only).
+-- Engine queries only reference pl.id (via COUNT(*)) and pl.to_page_id.
+-- Including link_source / origin_page_id / etc. in the view would couple
+-- the alias to columns that didn't exist in pre-v0.13 brains AND would
+-- block ALTER TABLE DROP COLUMN on those columns during upgrades.
+CREATE OR REPLACE VIEW page_links AS
+  SELECT id, from_page_id, to_page_id FROM links;
+
 -- ============================================================
 -- tags
 -- ============================================================
diff --git a/src/core/remediation-checkpoint.ts b/src/core/remediation-checkpoint.ts
new file mode 100644
index 000000000..3f780a5ed
--- /dev/null
+++ b/src/core/remediation-checkpoint.ts
@@ -0,0 +1,123 @@
+/**
+ * v0.37.x — doctor --remediate checkpoint (A4 amended).
+ *
+ * When `gbrain doctor --remediate --max-cost N` blows past the cap mid-run
+ * (BudgetTracker throws BudgetExhausted via the gateway-layer
+ * AsyncLocalStorage), the runRemediate orchestrator persists what's been
+ * completed so the user can continue with `gbrain doctor --remediate --resume`.
+ *
+ * Checkpoint file: `~/.gbrain/remediation/<plan_hash>.json`
+ *   - plan_hash = sha256(JSON.stringify(sorted recommendation ids)).slice(0,16)
+ *   - schema_version: 1
+ *
+ * Best-effort write: a failed checkpoint write (e.g. disk full) never blocks
+ * the throw; we'd rather surface the BudgetExhausted than swallow it because
+ * the checkpoint could not be persisted.
+ */
+
+import { mkdirSync, writeFileSync, readFileSync, readdirSync, statSync, existsSync, unlinkSync } from 'node:fs';
+import { join } from 'node:path';
+import { createHash } from 'node:crypto';
+import { gbrainPath } from './config.ts';
+
+export interface RemediationCheckpoint {
+  schema_version: 1;
+  plan_hash: string;
+  doctor_run_id: string;
+  target_score: number;
+  started_at: string;
+  completed: Array<{
+    id: string;
+    job: string;
+    idempotency_key?: string;
+    status: string;
+    job_id?: number | null;
+  }>;
+  aborted_at: string;
+  abort_reason: 'budget_exhausted' | 'manual' | 'error';
+  budget_snapshot?: {
+    spent: number;
+    cap: number;
+    reason: string;
+    model_id?: string;
+  };
+}
+
+function checkpointDir(): string {
+  return gbrainPath('remediation');
+}
+
+export function computePlanHash(recommendationIds: string[]): string {
+  const sorted = [...recommendationIds].sort();
+  const sha = createHash('sha256').update(JSON.stringify(sorted)).digest('hex');
+  return sha.slice(0, 16);
+}
+
+export function checkpointPath(planHash: string): string {
+  return join(checkpointDir(), `${planHash}.json`);
+}
+
+export function saveRemediationCheckpoint(cp: RemediationCheckpoint): void {
+  try {
+    mkdirSync(checkpointDir(), { recursive: true });
+    const path = checkpointPath(cp.plan_hash);
+    const tmp = `${path}.tmp`;
+    writeFileSync(tmp, JSON.stringify(cp, null, 2));
+    // Atomic replace via fs.renameSync: rename(2) is atomic on POSIX when tmp and target share a filesystem.
+    const { renameSync } = require('node:fs') as typeof import('node:fs');
+    renameSync(tmp, path);
+  } catch (err) {
+    process.stderr.write(`[remediate] checkpoint write failed: ${String(err)}\n`);
+  }
+}
+
+export function loadRemediationCheckpoint(planHash: string): RemediationCheckpoint | null {
+  const path = checkpointPath(planHash);
+  if (!existsSync(path)) return null;
+  try {
+    const raw = readFileSync(path, 'utf-8');
+    const parsed = JSON.parse(raw) as RemediationCheckpoint;
+    if (parsed.schema_version !== 1) {
+      process.stderr.write(`[remediate] checkpoint ${planHash} has schema_version ${parsed.schema_version}; ignoring.\n`);
+      return null;
+    }
+    return parsed;
+  } catch (err) {
+    process.stderr.write(`[remediate] checkpoint read failed: ${String(err)}\n`);
+    return null;
+  }
+}
+
+/** List checkpoint files mtime-ordered, newest first. Best-effort. */
+export function listRemediationCheckpoints(): Array<{ plan_hash: string; mtime: number }> {
+  const dir = checkpointDir();
+  if (!existsSync(dir)) return [];
+  try {
+    const entries = readdirSync(dir).filter((f) => f.endsWith('.json'));
+    return entries
+      .map((f) => {
+        try {
+          const path = join(dir, f);
+          const m = statSync(path).mtimeMs;
+          return { plan_hash: f.replace(/\.json$/, ''), mtime: m };
+        } catch {
+          return null;
+        }
+      })
+      .filter((x): x is { plan_hash: string; mtime: number } => x !== null)
+      .sort((a, b) => b.mtime - a.mtime);
+  } catch {
+    return [];
+  }
+}
+
+/** Delete a checkpoint after successful completion. Idempotent. */
+export function clearRemediationCheckpoint(planHash: string): void {
+  const path = checkpointPath(planHash);
+  if (!existsSync(path)) return;
+  try {
+    unlinkSync(path);
+  } catch {
+    // Best-effort.
+  }
+}
diff --git a/test/brainstorm/checkpoint.serial.test.ts b/test/brainstorm/checkpoint.serial.test.ts
new file mode 100644
index 000000000..101ddedb6
--- /dev/null
+++ b/test/brainstorm/checkpoint.serial.test.ts
@@ -0,0 +1,223 @@
+/**
+ * v0.37.x — brainstorm checkpoint contract (TX3/TX4/A5 amended).
+ *
+ * Pins:
+ *   - computeRunId is deterministic + invariant to slug-array sort order.
+ *   - computeRunId is stable across embedding-model swaps (no embedding
+ *     bits in the hash).
+ *   - saveCheckpoint atomic via .tmp + rename.
+ *   - loadCheckpoint returns null on missing file + schema_version
+ *     mismatch.
+ *   - listRuns mtime-ordered (newest first).
+ *   - gcStaleCheckpoints unlinks files older than N days.
+ *   - Round-trip preserves `ideas` bodies (TX3 load-bearing contract).
+ */
+
+import { describe, test, expect, beforeEach, afterEach } from 'bun:test';
+import { mkdtempSync, rmSync, existsSync, readFileSync, writeFileSync, utimesSync } from 'node:fs';
+import { tmpdir } from 'node:os';
+import { join } from 'node:path';
+import {
+  computeRunId,
+  saveCheckpoint,
+  loadCheckpoint,
+  listRuns,
+  gcStaleCheckpoints,
+  clearCheckpoint,
+  isCheckpointFresh,
+  type BrainstormCheckpoint,
+} from '../../src/core/brainstorm/checkpoint.ts';
+
+let homeBackup: string | undefined;
+let tmp: string;
+
+beforeEach(() => {
+  tmp = mkdtempSync(join(tmpdir(), 'gbrain-bs-cp-'));
+  homeBackup = process.env.GBRAIN_HOME;
+  process.env.GBRAIN_HOME = tmp;
+});
+
+afterEach(() => {
+  if (homeBackup === undefined) delete process.env.GBRAIN_HOME;
+  else process.env.GBRAIN_HOME = homeBackup;
+  rmSync(tmp, { recursive: true, force: true });
+});
+
+function fixtureCheckpoint(runId: string, ideas: Array<{ text: string; cross: string }> = []): BrainstormCheckpoint {
+  return {
+    schema_version: 2,
+    run_id: runId,
+    question: 'why are AI coding tools converging on the same UX?',
+    profile_label: 'brainstorm',
+    started_at: new Date().toISOString(),
+    completed_crosses: ideas.map((i, idx) => ({
+      close_slug: `wiki/close-${idx}`,
+      far_slug: `wiki/far-${idx}`,
+      cross_id: i.cross,
+      ideas: [{ text: i.text, cross_id: i.cross }],
+    })),
+    failed_crosses: [],
+    judge_done: false,
+  };
+}
+
+describe('computeRunId (A5 amended)', () => {
+  test('deterministic for the same inputs', () => {
+    const a = computeRunId('Q', 'brainstorm', ['close/a', 'close/b'], ['far/c', 'far/d']);
+    const b = computeRunId('Q', 'brainstorm', ['close/a', 'close/b'], ['far/c', 'far/d']);
+    expect(a).toBe(b);
+  });
+
+  test('invariant to slug-array order', () => {
+    const a = computeRunId('Q', 'lsd', ['close/a', 'close/b'], ['far/c', 'far/d']);
+    const b = computeRunId('Q', 'lsd', ['close/b', 'close/a'], ['far/d', 'far/c']);
+    expect(a).toBe(b);
+  });
+
+  test('differs when question changes', () => {
+    const a = computeRunId('Q1', 'brainstorm', ['s'], ['t']);
+    const b = computeRunId('Q2', 'brainstorm', ['s'], ['t']);
+    expect(a).not.toBe(b);
+  });
+
+  test('differs when profile changes', () => {
+    const a = computeRunId('Q', 'brainstorm', ['s'], ['t']);
+    const b = computeRunId('Q', 'lsd', ['s'], ['t']);
+    expect(a).not.toBe(b);
+  });
+
+  test('stable across embedding-model swaps (no embedding bits)', () => {
+    // The identity formula uses ONLY question+profile+slug-arrays. We
+    // simulate a model swap by varying nothing — the run_id must be
+    // independent of any embedding state, which means we get the same
+    // hash from the same call.
+    const slugs = ['close/a'];
+    const far = ['far/b'];
+    expect(computeRunId('Q', 'brainstorm', slugs, far)).toBe(
+      computeRunId('Q', 'brainstorm', slugs, far),
+    );
+  });
+
+  test('produces a stable 16-char hex prefix', () => {
+    const id = computeRunId('Q', 'brainstorm', ['s'], ['t']);
+    expect(id).toMatch(/^[0-9a-f]{16}$/);
+  });
+});
+
+describe('save + load round-trip (TX3 load-bearing — full ideas preserved)', () => {
+  test('preserves completed_crosses ideas verbatim', () => {
+    const runId = 'ab1234567890cdef';
+    const cp = fixtureCheckpoint(runId, [
+      { text: 'idea body one — concrete grounding here', cross: 'C1' },
+      { text: 'idea body two', cross: 'C2' },
+      { text: 'idea body three with extra detail', cross: 'C3' },
+    ]);
+    saveCheckpoint(cp);
+    const loaded = loadCheckpoint(runId);
+    expect(loaded).not.toBeNull();
+    expect(loaded!.completed_crosses.length).toBe(3);
+    expect(loaded!.completed_crosses[0].ideas[0].text).toBe('idea body one — concrete grounding here');
+    expect(loaded!.completed_crosses[0].ideas[0].cross_id).toBe('C1');
+    expect(loaded!.completed_crosses[2].ideas[0].text).toBe('idea body three with extra detail');
+  });
+
+  test('atomic write: no .tmp left behind on success', () => {
+    const cp = fixtureCheckpoint('atomicrenameabcd');
+    saveCheckpoint(cp);
+    const dir = join(tmp, '.gbrain', 'brainstorm');
+    expect(existsSync(join(dir, 'atomicrenameabcd.json'))).toBe(true);
+    expect(existsSync(join(dir, 'atomicrenameabcd.json.tmp'))).toBe(false);
+  });
+
+  test('loadCheckpoint returns null on missing file', () => {
+    expect(loadCheckpoint('no_such_run_id')).toBeNull();
+  });
+
+  test('loadCheckpoint returns null + stderr WARN on schema mismatch', () => {
+    const runId = 'schemamismatch00';
+    const cp = fixtureCheckpoint(runId);
+    saveCheckpoint(cp);
+    const path = join(tmp, '.gbrain', 'brainstorm', `${runId}.json`);
+    const raw = JSON.parse(readFileSync(path, 'utf-8'));
+    raw.schema_version = 1;
+    writeFileSync(path, JSON.stringify(raw));
+    expect(loadCheckpoint(runId)).toBeNull();
+  });
+
+  test('loadCheckpoint returns null on corrupt JSON', () => {
+    const runId = 'corruptjson00000';
+    saveCheckpoint(fixtureCheckpoint(runId));
+    writeFileSync(join(tmp, '.gbrain', 'brainstorm', `${runId}.json`), '{not json}');
+    expect(loadCheckpoint(runId)).toBeNull();
+  });
+});
+
+describe('listRuns mtime-newest-first', () => {
+  test('empty dir returns []', () => {
+    expect(listRuns()).toEqual([]);
+  });
+
+  test('returns most-recently-saved first', async () => {
+    saveCheckpoint(fixtureCheckpoint('run00000000first'));
+    await new Promise((r) => setTimeout(r, 20));
+    saveCheckpoint(fixtureCheckpoint('run0000000second'));
+    const list = listRuns();
+    expect(list.length).toBe(2);
+    expect(list[0].run_id).toBe('run0000000second');
+    expect(list[1].run_id).toBe('run00000000first');
+  });
+});
+
+describe('gcStaleCheckpoints (A5 7-day window)', () => {
+  test('removes files older than the threshold; returns count', () => {
+    const stale = 'stalecheckpoint1';
+    const fresh = 'freshcheckpoint2';
+    saveCheckpoint(fixtureCheckpoint(stale));
+    saveCheckpoint(fixtureCheckpoint(fresh));
+    // Set the stale file's mtime to 30 days ago.
+    const stalePath = join(tmp, '.gbrain', 'brainstorm', `${stale}.json`);
+    const oldTime = (Date.now() - 30 * 24 * 60 * 60 * 1000) / 1000;
+    utimesSync(stalePath, oldTime, oldTime);
+    const removed = gcStaleCheckpoints(7);
+    expect(removed).toBe(1);
+    expect(existsSync(stalePath)).toBe(false);
+    expect(existsSync(join(tmp, '.gbrain', 'brainstorm', `${fresh}.json`))).toBe(true);
+  });
+
+  test('returns 0 when dir is empty', () => {
+    expect(gcStaleCheckpoints(7)).toBe(0);
+  });
+});
+
+describe('clearCheckpoint', () => {
+  test('removes file when present', () => {
+    saveCheckpoint(fixtureCheckpoint('cleartest0000000'));
+    const path = join(tmp, '.gbrain', 'brainstorm', `cleartest0000000.json`);
+    expect(existsSync(path)).toBe(true);
+    clearCheckpoint('cleartest0000000');
+    expect(existsSync(path)).toBe(false);
+  });
+
+  test('idempotent on missing file', () => {
+    expect(() => clearCheckpoint('never_saved')).not.toThrow();
+  });
+});
+
+describe('isCheckpointFresh', () => {
+  test('true for newly-saved checkpoint', () => {
+    saveCheckpoint(fixtureCheckpoint('freshtest0000000'));
+    expect(isCheckpointFresh('freshtest0000000')).toBe(true);
+  });
+
+  test('false for missing checkpoint', () => {
+    expect(isCheckpointFresh('not_saved')).toBe(false);
+  });
+
+  test('false for >7 day old checkpoint', () => {
+    saveCheckpoint(fixtureCheckpoint('oldtest000000000'));
+    const path = join(tmp, '.gbrain', 'brainstorm', 'oldtest000000000.json');
+    const oldTime = (Date.now() - 10 * 24 * 60 * 60 * 1000) / 1000;
+    utimesSync(path, oldTime, oldTime);
+    expect(isCheckpointFresh('oldtest000000000')).toBe(false);
+  });
+});
diff --git a/test/brainstorm/cost-guardrails.test.ts b/test/brainstorm/cost-guardrails.test.ts
new file mode 100644
index 000000000..dcc4c127e
--- /dev/null
+++ b/test/brainstorm/cost-guardrails.test.ts
@@ -0,0 +1,165 @@
+/**
+ * v0.37.1 — cost guardrails + judge chunking + far-set cap.
+ *
+ * Regression suite for fix/brainstorm-cost-guardrails. The 13K-page brain
+ * incident: estimated cost $0.96, actual $50.71 (53x over) because the
+ * domain-bank's `listPrefixSampledPages` returned one page per prefix and
+ * the brain had ~2K distinct prefixes. The judge phase then tried to score
+ * 15,868 ideas in a single LLM call (3M tokens > 1M context window).
+ *
+ * These tests pin the new behavior:
+ *   - CLI parses --max-cost, --max-far-set, --strict-budget, --judge-model,
+ *     --max-ideas-per-judge-call.
+ *   - runJudge chunks large idea sets into batches of `maxIdeasPerCall`.
+ *   - fetchFar caps the prefix list to `maxFarSet` and trims pages to `m`.
+ */
+
+import { describe, test, expect } from 'bun:test';
+import { parseBrainstormArgs } from '../../src/commands/brainstorm.ts';
+import { runJudge, BRAINSTORM_JUDGE_CONFIG, type JudgeIdea } from '../../src/core/brainstorm/judges.ts';
+import type { ChatOpts, ChatResult } from '../../src/core/ai/gateway.ts';
+
+describe('parseBrainstormArgs — new cost-guardrail flags', () => {
+  test('--max-cost parses positive float', () => {
+    const r = parseBrainstormArgs(['hello', '--max-cost', '2.50']);
+    expect(r.maxCost).toBe(2.5);
+    expect(r.error).toBeUndefined();
+  });
+
+  test('--max-cost rejects non-positive', () => {
+    const r = parseBrainstormArgs(['hello', '--max-cost', '0']);
+    expect(r.error).toMatch(/--max-cost/);
+  });
+
+  test('--max-far-set parses positive int', () => {
+    const r = parseBrainstormArgs(['hello', '--max-far-set', '20']);
+    expect(r.maxFarSet).toBe(20);
+  });
+
+  test('--strict-budget is a boolean flag', () => {
+    const r = parseBrainstormArgs(['hello', '--strict-budget']);
+    expect(r.strictBudget).toBe(true);
+  });
+
+  test('--judge-model captures the next arg', () => {
+    const r = parseBrainstormArgs(['hello', '--judge-model', 'anthropic:claude-sonnet-4-6']);
+    expect(r.judgeModel).toBe('anthropic:claude-sonnet-4-6');
+  });
+
+  test('--judge-model rejects missing value', () => {
+    const r = parseBrainstormArgs(['hello', '--judge-model']);
+    expect(r.error).toMatch(/--judge-model/);
+  });
+
+  test('--max-ideas-per-judge-call parses positive int', () => {
+    const r = parseBrainstormArgs(['hello', '--max-ideas-per-judge-call', '50']);
+    expect(r.maxIdeasPerJudgeCall).toBe(50);
+  });
+
+  test('flags compose with --limit and --yes', () => {
+    const r = parseBrainstormArgs([
+      'why are AI coding tools converging',
+      '--max-cost', '10',
+      '--max-far-set', '25',
+      '--limit', '8',
+      '--yes',
+    ]);
+    expect(r.error).toBeUndefined();
+    expect(r.maxCost).toBe(10);
+    expect(r.maxFarSet).toBe(25);
+    expect(r.limit).toBe(8);
+    expect(r.yes).toBe(true);
+    expect(r.question).toBe('why are AI coding tools converging');
+  });
+});
+
+describe('runJudge — chunks large idea sets to avoid context overflow', () => {
+  // Build a fake chat that returns a well-formed batch verdict for whatever
+  // ideas are in the prompt. The mock parses the `## Idea <id>` headings to
+  // know which ids it should score, so we can assert each chunk lands.
+  function makeFakeChat() {
+    const state = { calls: 0, lastIdeaCount: 0, allScoredIds: [] as string[] };
+    const chat = async (opts: ChatOpts): Promise<ChatResult> => {
+      state.calls += 1;
+      const rawContent = opts.messages[0]?.content;
+      const user = typeof rawContent === 'string' ? rawContent : '';
+      const ideaMatches = Array.from(user.matchAll(/## Idea (\S+)/g)).map((m) => m[1] as string);
+      state.lastIdeaCount = ideaMatches.length;
+      state.allScoredIds.push(...ideaMatches);
+      const ideasJson = ideaMatches.map((id) => ({
+        id,
+        scores: { originality: 4, resistance: 4, thesis_density: 4, concrete_grounding: 4, cognitive_load: 4 },
+        note: 'mock',
+      }));
+      const text = '```json\n' + JSON.stringify({ ideas: ideasJson }) + '\n```';
+      const result: ChatResult = {
+        text,
+        blocks: [{ type: 'text', text }],
+        stopReason: 'end',
+        model: 'mock:judge',
+        providerId: 'mock',
+        usage: { input_tokens: 100, output_tokens: 50, cache_read_tokens: 0, cache_creation_tokens: 0 },
+      };
+      return result;
+    };
+    return { chat, state };
+  }
+
+  function makeIdeas(n: number): JudgeIdea[] {
+    return Array.from({ length: n }, (_, i) => ({
+      id: String(i + 1).padStart(3, '0'),
+      text: `idea body ${i}`,
+      close_slug: 'wiki/close',
+      far_slug: 'wiki/far',
+    }));
+  }
+
+  test('250 ideas with maxIdeasPerCall=100 → 3 chunks, all ideas scored', async () => {
+    const fake = makeFakeChat();
+    const ideas = makeIdeas(250);
+    const result = await runJudge(BRAINSTORM_JUDGE_CONFIG, ideas, {
+      chatFn: fake.chat,
+      maxIdeasPerCall: 100,
+      stderrWrite: () => {},
+    });
+    expect(fake.state.calls).toBe(3); // 100 + 100 + 50
+    expect(result.ideas.length).toBe(250);
+    expect(fake.state.allScoredIds.sort()).toEqual(ideas.map((i) => i.id).sort());
+  });
+
+  test('single chunk path preserved for small idea sets', async () => {
+    const fake = makeFakeChat();
+    const ideas = makeIdeas(10);
+    const result = await runJudge(BRAINSTORM_JUDGE_CONFIG, ideas, {
+      chatFn: fake.chat,
+      maxIdeasPerCall: 100,
+      stderrWrite: () => {},
+    });
+    expect(fake.state.calls).toBe(1);
+    expect(result.ideas.length).toBe(10);
+  });
+
+  test('usage tokens accumulate across chunks', async () => {
+    const fake = makeFakeChat();
+    const ideas = makeIdeas(250);
+    const result = await runJudge(BRAINSTORM_JUDGE_CONFIG, ideas, {
+      chatFn: fake.chat,
+      maxIdeasPerCall: 100,
+      stderrWrite: () => {},
+    });
+    // Each mock call reports 100 in / 50 out; 3 calls → 300 / 150.
+    expect(result.usage.input_tokens).toBe(300);
+    expect(result.usage.output_tokens).toBe(150);
+  });
+
+  test('default chunk size is 100 (codex r2 follow-up)', async () => {
+    const fake = makeFakeChat();
+    const ideas = makeIdeas(101);
+    await runJudge(BRAINSTORM_JUDGE_CONFIG, ideas, {
+      chatFn: fake.chat,
+      // no maxIdeasPerCall → default 100
+      stderrWrite: () => {},
+    });
+    expect(fake.state.calls).toBe(2); // 100 + 1
+  });
+});
diff --git a/test/budget-meter.test.ts b/test/budget-meter.test.ts
index 51eb41cc4..79234a601 100644
--- a/test/budget-meter.test.ts
+++ b/test/budget-meter.test.ts
@@ -78,4 +78,34 @@ describe('BudgetMeter', () => {
     const r = meter.check({ modelId: 'claude-haiku-4-5-20251001', estimatedInputTokens: 100, maxOutputTokens: 100, label: 'wk' });
     expect(r.allowed).toBe(true);
   });
+
+  test('A2 amended: every ledger line carries schema_version=1 and the documented field set', () => {
+    const meter = new BudgetMeter({ budgetUsd: 0.01, phase: 'auto_think', auditPath });
+    meter.check({ modelId: 'claude-haiku-4-5-20251001', estimatedInputTokens: 1000, maxOutputTokens: 1000, label: 'verdict' }); // submit
+    meter.check({ modelId: 'claude-opus-4-7', estimatedInputTokens: 5000, maxOutputTokens: 10000, label: 'big-call' });          // submit_denied
+    meter.check({ modelId: 'gpt-5', estimatedInputTokens: 1000, maxOutputTokens: 1000, label: 'unpriced' });                     // submit_unpriced
+    const lines = readLedger();
+    expect(lines).toHaveLength(3);
+
+    // schema_version must be on every line (renames here are breaking).
+    for (const line of lines) {
+      expect(line.schema_version).toBe(1);
+      expect(typeof line.ts).toBe('string');
+      expect(line.phase).toBe('auto_think');
+      expect(['submit', 'submit_denied', 'submit_unpriced']).toContain(line.event as string);
+      expect(typeof line.model).toBe('string');
+      expect(typeof line.label).toBe('string');
+    }
+
+    // submit / submit_denied carry the cost fields.
+    const denied = lines[1]; // second call (opus) exceeds the cap → denied
+    expect(typeof denied.estimated_cost_usd).toBe('number');
+    expect(typeof denied.cumulative_cost_usd).toBe('number');
+    expect(denied.budget_usd).toBe(0.01);
+
+    // submit_unpriced carries the token-shape fields instead.
+    const unpriced = lines[2];
+    expect(typeof unpriced.estimated_input_tokens).toBe('number');
+    expect(typeof unpriced.max_output_tokens).toBe('number');
+  });
 });
diff --git a/test/core/audit-week-file.serial.test.ts b/test/core/audit-week-file.serial.test.ts
new file mode 100644
index 000000000..061cbefc8
--- /dev/null
+++ b/test/core/audit-week-file.serial.test.ts
@@ -0,0 +1,68 @@
+/**
+ * v0.37.x — single source of truth for ISO-week audit filenames.
+ *
+ * Pins year-boundary correctness so the five migrated callers
+ * (shell-audit, phantom-audit, slug-fallback-audit, dream-budget,
+ * budget-tracker) don't drift apart on filename shapes.
+ */
+
+import { describe, test, expect } from 'bun:test';
+import { isoWeek, isoWeekFilename, resolveAuditDir } from '../../src/core/audit-week-file.ts';
+
+describe('isoWeek', () => {
+  test('mid-year date returns 1..53 within the calendar year', () => {
+    const { year, week } = isoWeek(new Date(Date.UTC(2026, 5, 15))); // 2026-06-15 (Mon)
+    expect(year).toBe(2026);
+    expect(week).toBeGreaterThan(20);
+    expect(week).toBeLessThan(28);
+  });
+
+  test('2025-01-01 (Wednesday) belongs to 2025-W01', () => {
+    const { year, week } = isoWeek(new Date(Date.UTC(2025, 0, 1)));
+    expect(year).toBe(2025);
+    expect(week).toBe(1);
+  });
+
+  test('2024-12-30 (Monday) belongs to 2025-W01 (rollover into next ISO year)', () => {
+    const { year, week } = isoWeek(new Date(Date.UTC(2024, 11, 30)));
+    expect(year).toBe(2025);
+    expect(week).toBe(1);
+  });
+
+  test('2026-01-01 (Thursday) belongs to 2026-W01', () => {
+    const { year, week } = isoWeek(new Date(Date.UTC(2026, 0, 1)));
+    expect(year).toBe(2026);
+    expect(week).toBe(1);
+  });
+
+  test('2020-12-28 (Mon) is 2020-W53 (the 53-week year)', () => {
+    const { year, week } = isoWeek(new Date(Date.UTC(2020, 11, 28)));
+    expect(year).toBe(2020);
+    expect(week).toBe(53);
+  });
+});
+
+describe('isoWeekFilename', () => {
+  test('produces <prefix>-YYYY-Www.jsonl with two-digit week', () => {
+    expect(isoWeekFilename('budget', new Date(Date.UTC(2025, 0, 1)))).toBe('budget-2025-W01.jsonl');
+    expect(isoWeekFilename('shell-jobs', new Date(Date.UTC(2020, 11, 28)))).toBe('shell-jobs-2020-W53.jsonl');
+  });
+
+  test('default now arg uses current date (smoke)', () => {
+    const name = isoWeekFilename('budget');
+    expect(name).toMatch(/^budget-\d{4}-W\d{2}\.jsonl$/);
+  });
+});
+
+describe('resolveAuditDir', () => {
+  test('honors GBRAIN_AUDIT_DIR override', () => {
+    const prev = process.env.GBRAIN_AUDIT_DIR;
+    process.env.GBRAIN_AUDIT_DIR = '/tmp/test-audit-override';
+    try {
+      expect(resolveAuditDir()).toBe('/tmp/test-audit-override');
+    } finally {
+      if (prev === undefined) delete process.env.GBRAIN_AUDIT_DIR;
+      else process.env.GBRAIN_AUDIT_DIR = prev;
+    }
+  });
+});
diff --git a/test/core/budget/budget-tracker.test.ts b/test/core/budget/budget-tracker.test.ts
new file mode 100644
index 000000000..034bbe4d1
--- /dev/null
+++ b/test/core/budget/budget-tracker.test.ts
@@ -0,0 +1,363 @@
+/**
+ * v0.37.x — BudgetTracker contracts (TX1, TX2, A3 amended, Q2).
+ *
+ * Every behavior the rest of the budget cathedral depends on is pinned here:
+ *   - reserve() throws BudgetExhausted on each of {cost, runtime, no_pricing}.
+ *   - record() throws BudgetExhausted (reason:'cost') when cumulative > cap
+ *     after a single under-estimated call (TX1).
+ *   - extractUsageFromError prefers err.usage, falls back to a pessimistic
+ *     ceiling (NOT the optimistic pre-call estimate) (A3 amended).
+ *   - onExhausted fires once + synchronously, before the throw propagates.
+ *   - Audit JSONL is schema-stable: every line carries schema_version=1.
+ *   - Non-priced model + no cap: emits BUDGET_TRACKER_NO_PRICING once per
+ *     process (legacy behavior preserved).
+ *
+ * Hermetic: no DB, no network, no real audit dir. We override `auditPath`
+ * to a tmpdir-scoped JSONL so tests can read it back without touching
+ * `~/.gbrain`. `withEnv` covers the GBRAIN_AUDIT_DIR escape hatch.
+ */
+
+import { describe, test, expect, beforeEach, afterEach } from 'bun:test';
+import { mkdtempSync, readFileSync, rmSync, existsSync } from 'node:fs';
+import { tmpdir } from 'node:os';
+import { join } from 'node:path';
+import {
+  BudgetTracker,
+  BudgetExhausted,
+  extractUsageFromError,
+  _resetBudgetTrackerWarningsForTest,
+} from '../../../src/core/budget/budget-tracker.ts';
+
+let tmp: string;
+let auditPath: string;
+let stderrCapture: string;
+let origStderrWrite: typeof process.stderr.write;
+
+beforeEach(() => {
+  tmp = mkdtempSync(join(tmpdir(), 'gbrain-budget-test-'));
+  auditPath = join(tmp, 'budget.jsonl');
+  _resetBudgetTrackerWarningsForTest();
+  stderrCapture = '';
+  origStderrWrite = process.stderr.write.bind(process.stderr);
+  (process.stderr as { write: unknown }).write = (chunk: string | Uint8Array): boolean => {
+    stderrCapture += typeof chunk === 'string' ? chunk : new TextDecoder().decode(chunk);
+    return true;
+  };
+});
+
+afterEach(() => {
+  (process.stderr as { write: unknown }).write = origStderrWrite;
+  rmSync(tmp, { recursive: true, force: true });
+});
+
+function readAudit(): Array<Record<string, unknown>> {
+  if (!existsSync(auditPath)) return [];
+  return readFileSync(auditPath, 'utf-8')
+    .split('\n')
+    .filter((l) => l.length > 0)
+    .map((l) => JSON.parse(l) as Record<string, unknown>);
+}
+
+describe('BudgetTracker.reserve', () => {
+  test('passes when under cap with known pricing', () => {
+    const t = new BudgetTracker({ maxCostUsd: 1.0, label: 'test', auditPath });
+    expect(() =>
+      t.reserve({
+        modelId: 'claude-haiku-4-5-20251001',
+        estimatedInputTokens: 1000,
+        maxOutputTokens: 1000,
+        kind: 'chat',
+      }),
+    ).not.toThrow();
+    const audit = readAudit();
+    expect(audit.length).toBe(1);
+    expect(audit[0].event).toBe('reserve');
+    expect(audit[0].schema_version).toBe(1);
+  });
+
+  test('throws BudgetExhausted (reason: cost) when projected > cap', () => {
+    const t = new BudgetTracker({ maxCostUsd: 0.001, label: 'test', auditPath });
+    let caught: unknown = null;
+    try {
+      // Opus 4.7 at $5/$25/M; 1K in + 1K out = $0.005 + $0.025 = $0.030 > $0.001
+      t.reserve({
+        modelId: 'claude-opus-4-7',
+        estimatedInputTokens: 1000,
+        maxOutputTokens: 1000,
+        kind: 'chat',
+      });
+    } catch (err) {
+      caught = err;
+    }
+    expect(caught).toBeInstanceOf(BudgetExhausted);
+    expect((caught as BudgetExhausted).reason).toBe('cost');
+    expect((caught as BudgetExhausted).cap).toBe(0.001);
+    expect((caught as BudgetExhausted).modelId).toBe('claude-opus-4-7');
+    const audit = readAudit();
+    expect(audit.some((e) => e.event === 'reserve_denied')).toBe(true);
+  });
+
+  test('throws BudgetExhausted (reason: runtime) when wall-clock cap blown', () => {
+    const t = new BudgetTracker({ maxRuntimeMs: 1, label: 'test', auditPath });
+    // Spin briefly so elapsed > 1ms
+    const start = Date.now();
+    while (Date.now() - start < 5) {
+      /* spin */
+    }
+    let caught: unknown = null;
+    try {
+      t.reserve({
+        modelId: 'claude-haiku-4-5-20251001',
+        estimatedInputTokens: 10,
+        maxOutputTokens: 10,
+        kind: 'chat',
+      });
+    } catch (err) {
+      caught = err;
+    }
+    expect(caught).toBeInstanceOf(BudgetExhausted);
+    expect((caught as BudgetExhausted).reason).toBe('runtime');
+  });
+
+  test('TX2: throws BudgetExhausted (reason: no_pricing) when cap set + model unknown', () => {
+    const t = new BudgetTracker({ maxCostUsd: 1.0, label: 'test', auditPath });
+    let caught: unknown = null;
+    try {
+      t.reserve({
+        modelId: 'mystery:some-unreleased-model',
+        estimatedInputTokens: 100,
+        maxOutputTokens: 100,
+        kind: 'chat',
+      });
+    } catch (err) {
+      caught = err;
+    }
+    expect(caught).toBeInstanceOf(BudgetExhausted);
+    expect((caught as BudgetExhausted).reason).toBe('no_pricing');
+    expect((caught as BudgetExhausted).modelId).toBe('mystery:some-unreleased-model');
+    expect((caught as Error).message).toMatch(/anthropic-pricing\.ts/);
+  });
+
+  test('no cap + unknown pricing: warns once per process, no throw', () => {
+    const t = new BudgetTracker({ label: 'test', auditPath });
+    expect(() =>
+      t.reserve({
+        modelId: 'mystery:some-other',
+        estimatedInputTokens: 100,
+        maxOutputTokens: 100,
+        kind: 'chat',
+      }),
+    ).not.toThrow();
+    expect(stderrCapture).toMatch(/BUDGET_TRACKER_NO_PRICING/);
+    // Second call same model: no second warning.
+    const before = stderrCapture.length;
+    t.reserve({
+      modelId: 'mystery:some-other',
+      estimatedInputTokens: 100,
+      maxOutputTokens: 100,
+      kind: 'chat',
+    });
+    expect(stderrCapture.length).toBe(before);
+    const audit = readAudit();
+    expect(audit.filter((e) => e.event === 'reserve_unpriced').length).toBe(2);
+  });
+});
+
+describe('BudgetTracker.record', () => {
+  test('TX1: cumulative > cap after under-estimated call throws BudgetExhausted', () => {
+    const t = new BudgetTracker({ maxCostUsd: 0.01, label: 'test', auditPath });
+    // Reserve a small call (within cap)
+    t.reserve({
+      modelId: 'claude-haiku-4-5-20251001',
+      estimatedInputTokens: 100,
+      maxOutputTokens: 100,
+      kind: 'chat',
+    });
+    // Provider returns way more than expected — cumulative blows past cap.
+    let caught: unknown = null;
+    try {
+      t.record({
+        modelId: 'claude-haiku-4-5-20251001',
+        inputTokens: 1_000_000,
+        outputTokens: 1_000_000,
+        kind: 'chat',
+      } as any);
+    } catch (err) {
+      caught = err;
+    }
+    expect(caught).toBeInstanceOf(BudgetExhausted);
+    expect((caught as BudgetExhausted).reason).toBe('cost');
+    expect((caught as BudgetExhausted).cap).toBe(0.01);
+    expect((caught as BudgetExhausted).spent).toBeGreaterThan(0.01);
+    expect(t.totalSpent).toBeGreaterThan(0.01);
+  });
+
+  test('records actual usage on success and updates cumulative', () => {
+    const t = new BudgetTracker({ maxCostUsd: 1.0, label: 'test', auditPath });
+    t.record({
+      modelId: 'claude-haiku-4-5-20251001',
+      inputTokens: 1000,
+      outputTokens: 500,
+      kind: 'chat',
+    } as any);
+    // Haiku: ($1 × 1K/1M) + ($5 × 500/1M) = 0.001 + 0.0025 = 0.0035
+    expect(t.totalSpent).toBeCloseTo(0.0035, 6);
+    expect(t.snapshot().callsRecorded).toBe(1);
+    const audit = readAudit();
+    expect(audit.length).toBe(1);
+    expect(audit[0].event).toBe('record');
+    expect(audit[0].schema_version).toBe(1);
+    expect(audit[0].actual_cost_usd).toBeCloseTo(0.0035, 6);
+  });
+
+  test('unpriced record: no throw, audited as record_unpriced', () => {
+    const t = new BudgetTracker({ label: 'test', auditPath });
+    expect(() =>
+      t.record({
+        modelId: 'mystery:unknown',
+        inputTokens: 100,
+        outputTokens: 100,
+        kind: 'chat',
+      } as any),
+    ).not.toThrow();
+    const audit = readAudit();
+    expect(audit.some((e) => e.event === 'record_unpriced')).toBe(true);
+    expect(t.totalSpent).toBe(0);
+  });
+
+  test('embed record uses embedding-pricing map', () => {
+    const t = new BudgetTracker({ maxCostUsd: 1.0, label: 'test', auditPath });
+    t.record({
+      modelId: 'openai:text-embedding-3-large',
+      inputTokens: 1_000_000,
+      embeddingDims: 3072,
+      kind: 'embed',
+    } as any);
+    // 1M tokens × $0.13/M = $0.13
+    expect(t.totalSpent).toBeCloseTo(0.13, 6);
+    const audit = readAudit();
+    expect(audit[0].embedding_dims).toBe(3072);
+    expect(audit[0].kind).toBe('embed');
+  });
+});
+
+describe('BudgetTracker.onExhausted', () => {
+  test('fires once, synchronously, before throw propagates', () => {
+    const t = new BudgetTracker({ maxCostUsd: 0.001, label: 'test', auditPath });
+    let fired = 0;
+    let firedBeforeThrow = false;
+    t.onExhausted(() => {
+      fired++;
+      firedBeforeThrow = true;
+    });
+    expect(() =>
+      t.reserve({
+        modelId: 'claude-opus-4-7',
+        estimatedInputTokens: 1000,
+        maxOutputTokens: 1000,
+        kind: 'chat',
+      }),
+    ).toThrow(BudgetExhausted);
+    expect(fired).toBe(1);
+    expect(firedBeforeThrow).toBe(true);
+    // Subsequent throws don't refire the callback (record() over cap should
+    // not re-trigger).
+    try {
+      t.record({
+        modelId: 'claude-opus-4-7',
+        inputTokens: 10_000_000,
+        outputTokens: 0,
+        kind: 'chat',
+      } as any);
+    } catch {
+      /* expected */
+    }
+    expect(fired).toBe(1);
+  });
+});
+
+describe('extractUsageFromError (A3 amended)', () => {
+  const fallback = { inputTokens: 5000, outputTokens: 5000 };
+
+  test('reads top-level err.usage (Anthropic shape)', () => {
+    const err = { usage: { input_tokens: 100, output_tokens: 50 } };
+    expect(extractUsageFromError(err, fallback)).toEqual({ inputTokens: 100, outputTokens: 50 });
+  });
+
+  test('reads nested err.response.usage (OpenAI shape)', () => {
+    const err = { response: { usage: { input_tokens: 200, output_tokens: 75 } } };
+    expect(extractUsageFromError(err, fallback)).toEqual({ inputTokens: 200, outputTokens: 75 });
+  });
+
+  test('camelCase usage variant', () => {
+    const err = { usage: { inputTokens: 300, outputTokens: 100 } };
+    expect(extractUsageFromError(err, fallback)).toEqual({ inputTokens: 300, outputTokens: 100 });
+  });
+
+  test('returns pessimistic fallback when no usage present (A3 amended)', () => {
+    const err = new Error('network blew up');
+    // Critical: fallback must be the pessimistic ceiling (maxOutputTokens),
+    // not the optimistic pre-call estimate. Caller passes
+    // { inputTokens: estimatedInput, outputTokens: maxOutput }.
+    expect(extractUsageFromError(err, fallback)).toEqual({
+      inputTokens: 5000,
+      outputTokens: 5000,
+    });
+  });
+
+  test('partial usage uses fallback for the missing half', () => {
+    const err = { usage: { input_tokens: 50 } };
+    expect(extractUsageFromError(err, fallback)).toEqual({
+      inputTokens: 50,
+      outputTokens: 5000,
+    });
+  });
+
+  test('handles primitives + null without throwing', () => {
+    expect(extractUsageFromError(null, fallback)).toEqual(fallback);
+    expect(extractUsageFromError(undefined, fallback)).toEqual(fallback);
+    expect(extractUsageFromError('boom', fallback)).toEqual(fallback);
+    expect(extractUsageFromError(42, fallback)).toEqual(fallback);
+  });
+});
+
+describe('Audit JSONL schema (A2 amended — schema-stable)', () => {
+  test('every line has schema_version=1 and the documented field set', () => {
+    const t = new BudgetTracker({ maxCostUsd: 0.5, label: 'phase-x', auditPath });
+    t.reserve({
+      modelId: 'claude-haiku-4-5-20251001',
+      estimatedInputTokens: 1000,
+      maxOutputTokens: 1000,
+      kind: 'chat',
+      label: 'phase-x.cross',
+    });
+    t.record({
+      modelId: 'claude-haiku-4-5-20251001',
+      inputTokens: 800,
+      outputTokens: 600,
+      kind: 'chat',
+      label: 'phase-x.cross',
+    } as any);
+    const audit = readAudit();
+    expect(audit.length).toBe(2);
+    for (const line of audit) {
+      expect(line.schema_version).toBe(1);
+      expect(typeof line.ts).toBe('string');
+      expect(line.label).toBe('phase-x');
+      expect(line.sub_label).toBe('phase-x.cross');
+      expect(['reserve', 'record']).toContain(line.event as string);
+    }
+  });
+});
+
+describe('BudgetTracker.snapshot', () => {
+  test('reports elapsed time + cumulative + caps', () => {
+    const t = new BudgetTracker({ maxCostUsd: 1, maxRuntimeMs: 60_000, label: 'x', auditPath });
+    const s = t.snapshot();
+    expect(s.cumulativeCostUsd).toBe(0);
+    expect(s.maxCostUsd).toBe(1);
+    expect(s.maxRuntimeMs).toBe(60_000);
+    expect(s.elapsedMs).toBeGreaterThanOrEqual(0);
+    expect(s.callsRecorded).toBe(0);
+  });
+});
diff --git a/test/core/budget/gateway-budget-composition.test.ts b/test/core/budget/gateway-budget-composition.test.ts
new file mode 100644
index 000000000..7fecc6d00
--- /dev/null
+++ b/test/core/budget/gateway-budget-composition.test.ts
@@ -0,0 +1,199 @@
+/**
+ * v0.37.x — TX5: gateway-layer enforcement via AsyncLocalStorage.
+ *
+ * Pins the public contract:
+ *   - withBudgetTracker(tracker, fn) sets up an AsyncLocalStorage scope.
+ *     Every gateway.chat / embed / rerank call inside the scope auto-
+ *     composes the tracker without explicit per-call injection.
+ *   - Nested scopes replace the active tracker for the inner closure and
+ *     restore the outer tracker on exit.
+ *   - Calls OUTSIDE any withBudgetTracker scope are budget-no-op (the
+ *     existing pre-v0.37 contract is preserved).
+ *
+ * Hermetic: routes through __setChatTransportForTests so no network /
+ * provider / env variable is touched.
+ */
+
+import { describe, test, expect, beforeEach, afterEach } from 'bun:test';
+import { mkdtempSync, rmSync } from 'node:fs';
+import { tmpdir } from 'node:os';
+import { join } from 'node:path';
+import {
+  chat,
+  withBudgetTracker,
+  getCurrentBudgetTracker,
+  __setChatTransportForTests,
+  type ChatOpts,
+  type ChatResult,
+} from '../../../src/core/ai/gateway.ts';
+import {
+  BudgetTracker,
+  BudgetExhausted,
+  _resetBudgetTrackerWarningsForTest,
+} from '../../../src/core/budget/budget-tracker.ts';
+
+let tmp: string;
+let auditPath: string;
+
+beforeEach(() => {
+  tmp = mkdtempSync(join(tmpdir(), 'gbrain-gw-budget-'));
+  auditPath = join(tmp, 'budget.jsonl');
+  _resetBudgetTrackerWarningsForTest();
+});
+
+afterEach(() => {
+  __setChatTransportForTests(null);
+  rmSync(tmp, { recursive: true, force: true });
+});
+
+function fakeChatTransport(usage = { input_tokens: 100, output_tokens: 50 }) {
+  let calls = 0;
+  const fn = async (_opts: ChatOpts): Promise<ChatResult> => {
+    calls++;
+    return {
+      text: 'ok',
+      blocks: [{ type: 'text', text: 'ok' }],
+      stopReason: 'end',
+      model: 'claude-haiku-4-5-20251001',
+      providerId: 'anthropic',
+      usage: {
+        input_tokens: usage.input_tokens,
+        output_tokens: usage.output_tokens,
+        cache_read_tokens: 0,
+        cache_creation_tokens: 0,
+      },
+    };
+  };
+  return Object.defineProperty(fn, 'calls', { get: () => calls }) as typeof fn & { readonly calls: number };
+}
+
+describe('withBudgetTracker — scope semantics', () => {
+  test('chat() inside scope auto-composes the tracker', async () => {
+    const tracker = new BudgetTracker({ maxCostUsd: 1.0, label: 'test-gw', auditPath });
+    const transport = fakeChatTransport({ input_tokens: 1000, output_tokens: 500 });
+    __setChatTransportForTests(transport);
+
+    expect(getCurrentBudgetTracker()).toBeNull();
+
+    await withBudgetTracker(tracker, async () => {
+      expect(getCurrentBudgetTracker()).toBe(tracker);
+      await chat({
+        model: 'claude-haiku-4-5-20251001',
+        system: 'sys',
+        messages: [{ role: 'user', content: 'hi' }],
+      });
+    });
+
+    expect(getCurrentBudgetTracker()).toBeNull();
+    // Haiku: 1K in + 500 out → ($1/M × 1K) + ($5/M × 500) = $0.001 + $0.0025 = $0.0035
+    expect(tracker.totalSpent).toBeCloseTo(0.0035, 6);
+    expect(tracker.snapshot().callsRecorded).toBe(1);
+  });
+
+  test('chat() OUTSIDE any scope is a budget no-op (back-compat)', async () => {
+    const transport = fakeChatTransport();
+    __setChatTransportForTests(transport);
+    // No withBudgetTracker wrapper — current behavior preserved.
+    await chat({
+      model: 'claude-haiku-4-5-20251001',
+      messages: [{ role: 'user', content: 'hi' }],
+    });
+    // No tracker; nothing to assert other than "no throw".
+    expect(getCurrentBudgetTracker()).toBeNull();
+  });
+
+  test('nested scopes restore outer tracker on exit', async () => {
+    const outer = new BudgetTracker({ maxCostUsd: 1.0, label: 'outer', auditPath });
+    const inner = new BudgetTracker({ maxCostUsd: 1.0, label: 'inner', auditPath: join(tmp, 'inner.jsonl') });
+
+    await withBudgetTracker(outer, async () => {
+      expect(getCurrentBudgetTracker()).toBe(outer);
+      await withBudgetTracker(inner, async () => {
+        expect(getCurrentBudgetTracker()).toBe(inner);
+      });
+      expect(getCurrentBudgetTracker()).toBe(outer);
+    });
+    expect(getCurrentBudgetTracker()).toBeNull();
+  });
+
+  test('over-cap chat call throws BudgetExhausted via reserve()', async () => {
+    const tracker = new BudgetTracker({ maxCostUsd: 0.001, label: 'tight', auditPath });
+    const transport = fakeChatTransport();
+    __setChatTransportForTests(transport);
+
+    let caught: unknown = null;
+    await withBudgetTracker(tracker, async () => {
+      try {
+        await chat({
+          // Opus 4.7 with high maxTokens → projected cost > $0.001
+          model: 'claude-opus-4-7',
+          messages: [{ role: 'user', content: 'a'.repeat(40_000) }],
+          maxTokens: 4096,
+        });
+      } catch (err) {
+        caught = err;
+      }
+    });
+
+    expect(caught).toBeInstanceOf(BudgetExhausted);
+    expect((caught as BudgetExhausted).reason).toBe('cost');
+    // The transport should NOT have been called — reserve() fired first.
+    expect(transport.calls).toBe(0);
+  });
+
+  test('TX1 mid-run: record() overshoot is absorbed, then the next reserve() throws', async () => {
+    // Reserve passes (small input estimate); record() over-shoots cap.
+    const tracker = new BudgetTracker({ maxCostUsd: 0.005, label: 'tx1', auditPath });
+    // Mock transport reports huge actual usage
+    const transport = fakeChatTransport({ input_tokens: 1_000_000, output_tokens: 1_000_000 });
+    __setChatTransportForTests(transport);
+
+    // First call: reserve fits (small chars), record() over-shoots and TX1
+    // suppresses internally. Second call: reserve sees cumulative > cap.
+    await withBudgetTracker(tracker, async () => {
+      // First call — record() throws internally but is suppressed.
+      await chat({
+        model: 'claude-haiku-4-5-20251001',
+        messages: [{ role: 'user', content: 'short' }],
+        maxTokens: 100,
+      });
+      expect(tracker.totalSpent).toBeGreaterThan(0.005);
+
+      // Second call: reserve() sees cumulative > cap and throws.
+      let caught: unknown = null;
+      try {
+        await chat({
+          model: 'claude-haiku-4-5-20251001',
+          messages: [{ role: 'user', content: 'short' }],
+          maxTokens: 100,
+        });
+      } catch (err) {
+        caught = err;
+      }
+      expect(caught).toBeInstanceOf(BudgetExhausted);
+      expect((caught as BudgetExhausted).reason).toBe('cost');
+    });
+  });
+});
+
+describe('AsyncLocalStorage isolation', () => {
+  test('parallel withBudgetTracker scopes do not bleed trackers', async () => {
+    const t1 = new BudgetTracker({ maxCostUsd: 1.0, label: 'parallel-1', auditPath });
+    const t2 = new BudgetTracker({ maxCostUsd: 1.0, label: 'parallel-2', auditPath: join(tmp, 'p2.jsonl') });
+    const transport = fakeChatTransport({ input_tokens: 1000, output_tokens: 500 });
+    __setChatTransportForTests(transport);
+
+    await Promise.all([
+      withBudgetTracker(t1, async () => {
+        await chat({ model: 'claude-haiku-4-5-20251001', messages: [{ role: 'user', content: 'a' }] });
+      }),
+      withBudgetTracker(t2, async () => {
+        await chat({ model: 'claude-haiku-4-5-20251001', messages: [{ role: 'user', content: 'b' }] });
+      }),
+    ]);
+
+    // Each tracker should have exactly 1 recorded call.
+    expect(t1.snapshot().callsRecorded).toBe(1);
+    expect(t2.snapshot().callsRecorded).toBe(1);
+  });
+});
diff --git a/test/core/diarize/payload-fitter-summarize.test.ts b/test/core/diarize/payload-fitter-summarize.test.ts
new file mode 100644
index 000000000..3b2c0f914
--- /dev/null
+++ b/test/core/diarize/payload-fitter-summarize.test.ts
@@ -0,0 +1,217 @@
+/**
+ * v0.37.x — payload-fitter summarize strategy + quality gate (T3 amended).
+ *
+ * Four cases:
+ *   - Happy: 5 clusters all succeed, degraded=false.
+ *   - Partial-failure: 1 of 5 fails (success_ratio=0.8 > default 0.75),
+ *     degraded=false, dropped=1.
+ *   - High-failure: 3 of 5 fail (success_ratio=0.4 < 0.75), degraded=true.
+ *     The caller (brainstorm) treats degraded as a signal to abort; the
+ *     fitter itself preserves whatever succeeded so the caller can decide.
+ *   - Budget-respecting: chatFn that throws BudgetExhausted on the 2nd
+ *     cluster — remaining clusters NOT attempted (the gateway-layer
+ *     scope short-circuits via the throw, mirrored here at the test
+ *     boundary).
+ *
+ * Hermetic — embedFn and chatFn are caller-supplied stubs.
+ */
+
+import { describe, test, expect } from 'bun:test';
+import { fit } from '../../../src/core/diarize/payload-fitter.ts';
+import type { ChatResult } from '../../../src/core/ai/gateway.ts';
+import { BudgetExhausted } from '../../../src/core/budget/budget-tracker.ts';
+
+function fakeEmbed(text: string): Promise<Float32Array> {
+  // Deterministic shape: a 4-dim vector seeded from the string length only.
+  const v = new Float32Array(4);
+  const seed = (text.length % 7) + 1;
+  for (let i = 0; i < 4; i++) v[i] = (seed * (i + 1)) % 5;
+  return Promise.resolve(v);
+}
+
+interface StubChat {
+  fn: (opts: unknown) => Promise<ChatResult>;
+  state: { calls: number };
+}
+
+function makeOkChat(usage = { input_tokens: 100, output_tokens: 50 }): StubChat {
+  const state = { calls: 0 };
+  const fn = async (_opts: unknown): Promise<ChatResult> => {
+    state.calls++;
+    return {
+      text: `summary-${state.calls}`,
+      blocks: [{ type: 'text', text: `summary-${state.calls}` }],
+      stopReason: 'end',
+      model: 'fake-haiku',
+      providerId: 'fake',
+      usage: { input_tokens: usage.input_tokens, output_tokens: usage.output_tokens, cache_read_tokens: 0, cache_creation_tokens: 0 },
+    };
+  };
+  return { fn, state };
+}
+
+function makeFailingChat(failOnCallIndexes: Set<number>): StubChat {
+  const state = { calls: 0 };
+  const fn = async (_opts: unknown): Promise<ChatResult> => {
+    state.calls++;
+    if (failOnCallIndexes.has(state.calls)) {
+      throw new Error(`fake provider error on call ${state.calls}`);
+    }
+    return {
+      text: `summary-${state.calls}`,
+      blocks: [{ type: 'text', text: `summary-${state.calls}` }],
+      stopReason: 'end',
+      model: 'fake-haiku',
+      providerId: 'fake',
+      usage: { input_tokens: 100, output_tokens: 50, cache_read_tokens: 0, cache_creation_tokens: 0 },
+    };
+  };
+  return { fn, state };
+}
+
+interface ItemShape { id: string; text: string }
+
+const wrapSummary = (summary: string, _cluster: ItemShape[]): ItemShape => ({ id: 'summary', text: summary });
+
+describe('fit summarize — happy path', () => {
+  test('5 clusters all succeed → degraded=false, every fitted node carries a summary', async () => {
+    const items: ItemShape[] = Array.from({ length: 20 }, (_, i) => ({ id: String(i), text: `item-${i}` }));
+    // 20 items / 4 = 5 clusters.
+    const chat = makeOkChat();
+    const r = await fit<ItemShape>({
+      items,
+      strategy: 'summarize',
+      maxTokensPerCall: 1000,
+      estimateTokens: (it) => it.text.length,
+      embedFn: fakeEmbed,
+      chatFn: chat.fn,
+      itemToText: (it) => it.text,
+      summaryToItem: wrapSummary,
+      parallelism: 4,
+    });
+    expect(r.dropped).toBe(0);
+    expect(r.degraded).toBe(false);
+    expect(r.success_ratio).toBe(1.0);
+    expect(r.fitted.length).toBe(5);
+    for (const f of r.fitted) expect(f.text).toMatch(/^summary-\d+$/);
+    expect(chat.state.calls).toBe(5);
+  });
+});
+
+describe('fit summarize — partial failure tolerated', () => {
+  test('1 of 5 fails → success_ratio=0.8 > 0.75, degraded=false', async () => {
+    const items: ItemShape[] = Array.from({ length: 20 }, (_, i) => ({ id: String(i), text: `item-${i}` }));
+    // Fail only call #3 (out of 5).
+    const chat = makeFailingChat(new Set([3]));
+    const r = await fit<ItemShape>({
+      items,
+      strategy: 'summarize',
+      maxTokensPerCall: 1000,
+      estimateTokens: (it) => it.text.length,
+      embedFn: fakeEmbed,
+      chatFn: chat.fn,
+      itemToText: (it) => it.text,
+      summaryToItem: wrapSummary,
+      parallelism: 4,
+    });
+    expect(r.dropped).toBe(1);
+    expect(r.success_ratio).toBeCloseTo(0.8, 6);
+    expect(r.degraded).toBe(false);
+    expect(r.fitted.length).toBe(4);
+  });
+});
+
+describe('fit summarize — high-failure rate flips degraded', () => {
+  test('3 of 5 fail → success_ratio=0.4 < 0.75, degraded=true', async () => {
+    const items: ItemShape[] = Array.from({ length: 20 }, (_, i) => ({ id: String(i), text: `item-${i}` }));
+    const chat = makeFailingChat(new Set([1, 2, 3]));
+    const r = await fit<ItemShape>({
+      items,
+      strategy: 'summarize',
+      maxTokensPerCall: 1000,
+      estimateTokens: (it) => it.text.length,
+      embedFn: fakeEmbed,
+      chatFn: chat.fn,
+      itemToText: (it) => it.text,
+      summaryToItem: wrapSummary,
+      parallelism: 4,
+    });
+    expect(r.dropped).toBe(3);
+    expect(r.success_ratio).toBeCloseTo(0.4, 6);
+    expect(r.degraded).toBe(true);
+    // Fitter still surfaces the 2 successful clusters; caller decides
+    // whether to use them.
+    expect(r.fitted.length).toBe(2);
+  });
+
+  test('custom min_success_ratio shifts the gate', async () => {
+    const items: ItemShape[] = Array.from({ length: 20 }, (_, i) => ({ id: String(i), text: `item-${i}` }));
+    const chat = makeFailingChat(new Set([3]));
+    // Tighten gate to 0.9 — 4/5 = 0.8 < 0.9 → degraded.
+    const r = await fit<ItemShape>({
+      items,
+      strategy: 'summarize',
+      maxTokensPerCall: 1000,
+      estimateTokens: (it) => it.text.length,
+      embedFn: fakeEmbed,
+      chatFn: chat.fn,
+      itemToText: (it) => it.text,
+      summaryToItem: wrapSummary,
+      parallelism: 4,
+      min_success_ratio: 0.9,
+    });
+    expect(r.degraded).toBe(true);
+  });
+});
+
+describe('fit summarize — caller misuse', () => {
+  test('throws when summarize strategy is missing embedFn / chatFn / mappers', async () => {
+    await expect(
+      fit({
+        items: [{ id: 'a', text: 'a' }],
+        strategy: 'summarize',
+        maxTokensPerCall: 100,
+        estimateTokens: () => 1,
+      }),
+    ).rejects.toThrow(/embedFn \+ chatFn \+ itemToText \+ summaryToItem/);
+  });
+});
+
+describe('fit summarize — budget-respecting (TX1 mid-cluster abort)', () => {
+  test('BudgetExhausted thrown by chatFn is absorbed as a dropped cluster and the run still completes', async () => {
+    const items: ItemShape[] = Array.from({ length: 20 }, (_, i) => ({ id: String(i), text: `item-${i}` }));
+    // Throw BudgetExhausted on call #2 only; every other call succeeds.
+    let calls = 0;
+    const chat = async (): Promise<ChatResult> => {
+      calls++;
+      if (calls === 2) {
+        throw new BudgetExhausted('cap blown', { reason: 'cost', spent: 10, cap: 1 });
+      }
+      return {
+        text: `summary-${calls}`,
+        blocks: [{ type: 'text', text: `summary-${calls}` }],
+        stopReason: 'end',
+        model: 'fake-haiku',
+        providerId: 'fake',
+        usage: { input_tokens: 100, output_tokens: 50, cache_read_tokens: 0, cache_creation_tokens: 0 },
+      };
+    };
+
+    const r = await fit<ItemShape>({
+      items,
+      strategy: 'summarize',
+      maxTokensPerCall: 1000,
+      estimateTokens: (it) => it.text.length,
+      embedFn: fakeEmbed,
+      chatFn: chat,
+      itemToText: (it) => it.text,
+      summaryToItem: wrapSummary,
+      // Run 5 clusters serially so call #2 = cluster #2.
+      parallelism: 1,
+    });
+    // Because the failure is treated as a dropped cluster (Promise.allSettled
+    // catches it), the run completes and surfaces dropped=1.
+    expect(r.dropped).toBeGreaterThanOrEqual(1);
+    expect(r.fitted.length).toBeLessThan(5);
+  });
+});
diff --git a/test/core/diarize/payload-fitter.test.ts b/test/core/diarize/payload-fitter.test.ts
new file mode 100644
index 000000000..6979e01ba
--- /dev/null
+++ b/test/core/diarize/payload-fitter.test.ts
@@ -0,0 +1,70 @@
+/**
+ * v0.37.x — payload-fitter batch strategy contract.
+ *
+ * Hermetic. No LLM, no embed. Just the deterministic chunking gate.
+ */
+
+import { describe, test, expect } from 'bun:test';
+import { fit } from '../../../src/core/diarize/payload-fitter.ts';
+
+describe('fit batch', () => {
+  test('returns input items unchanged when all fit', async () => {
+    const items = ['short', 'also-short', 'tiny'];
+    const r = await fit({
+      items,
+      strategy: 'batch',
+      maxTokensPerCall: 1000,
+      estimateTokens: (s) => s.length,
+    });
+    expect(r.fitted).toEqual(items);
+    expect(r.dropped).toBe(0);
+    expect(r.degraded).toBe(false);
+    expect(r.success_ratio).toBe(1.0);
+  });
+
+  test('reports dropped count for over-budget items', async () => {
+    const items = ['a'.repeat(10), 'b'.repeat(2000), 'c'.repeat(50)];
+    const r = await fit({
+      items,
+      strategy: 'batch',
+      maxTokensPerCall: 100,
+      estimateTokens: (s) => s.length,
+    });
+    expect(r.dropped).toBe(1);
+    expect(r.success_ratio).toBeCloseTo(2 / 3, 6);
+    // batch never flags degraded; it surfaces dropped count for caller
+    expect(r.degraded).toBe(false);
+  });
+
+  test('empty input is a no-op success', async () => {
+    const r = await fit({
+      items: [],
+      strategy: 'batch',
+      maxTokensPerCall: 100,
+      estimateTokens: () => 0,
+    });
+    expect(r.fitted).toEqual([]);
+    expect(r.success_ratio).toBe(1.0);
+  });
+
+  test('deterministic — same input yields the same fitted list', async () => {
+    const items = ['one', 'two', 'three'];
+    const a = await fit({ items, strategy: 'batch', maxTokensPerCall: 100, estimateTokens: (s) => s.length });
+    const b = await fit({ items, strategy: 'batch', maxTokensPerCall: 100, estimateTokens: (s) => s.length });
+    expect(a.fitted).toEqual(b.fitted);
+  });
+});
+
+describe('fit unknown strategy', () => {
+  test('rejects on unknown strategy', async () => {
+    await expect(
+      fit({
+        items: ['x'],
+        // @ts-expect-error — intentional unknown for the error path
+        strategy: 'mystery',
+        maxTokensPerCall: 100,
+        estimateTokens: (s) => s.length,
+      }),
+    ).rejects.toThrow(/unknown strategy/);
+  });
+});
diff --git a/test/core/remediation-checkpoint.serial.test.ts b/test/core/remediation-checkpoint.serial.test.ts
new file mode 100644
index 000000000..64e74aac9
--- /dev/null
+++ b/test/core/remediation-checkpoint.serial.test.ts
@@ -0,0 +1,154 @@
+/**
+ * v0.37.x — doctor --remediate checkpoint round-trip (A4 amended).
+ *
+ * Pins:
+ *   - computePlanHash is deterministic + invariant to id-array sort order.
+ *   - saveRemediationCheckpoint is atomic via .tmp + rename.
+ *   - loadRemediationCheckpoint returns null on missing file + schema
+ *     mismatch.
+ *   - listRemediationCheckpoints is mtime-ordered.
+ *   - clearRemediationCheckpoint is idempotent on missing.
+ */
+
+import { describe, test, expect, beforeEach, afterEach } from 'bun:test';
+import { mkdtempSync, rmSync, readFileSync, writeFileSync, existsSync } from 'node:fs';
+import { tmpdir } from 'node:os';
+import { join } from 'node:path';
+import {
+  computePlanHash,
+  saveRemediationCheckpoint,
+  loadRemediationCheckpoint,
+  listRemediationCheckpoints,
+  clearRemediationCheckpoint,
+  checkpointPath,
+  type RemediationCheckpoint,
+} from '../../src/core/remediation-checkpoint.ts';
+
+let homeBackup: string | undefined;
+let tmp: string;
+
+beforeEach(() => {
+  tmp = mkdtempSync(join(tmpdir(), 'gbrain-remediate-cp-'));
+  homeBackup = process.env.GBRAIN_HOME;
+  process.env.GBRAIN_HOME = tmp;
+});
+
+afterEach(() => {
+  if (homeBackup === undefined) delete process.env.GBRAIN_HOME;
+  else process.env.GBRAIN_HOME = homeBackup;
+  rmSync(tmp, { recursive: true, force: true });
+});
+
+function makeCheckpoint(planHash: string, completed: Array<{ id: string; status: string }> = []): RemediationCheckpoint {
+  return {
+    schema_version: 1,
+    plan_hash: planHash,
+    doctor_run_id: 'test-run-id',
+    target_score: 90,
+    started_at: new Date().toISOString(),
+    completed: completed.map((c) => ({ id: c.id, job: '', status: c.status })),
+    aborted_at: new Date().toISOString(),
+    abort_reason: 'budget_exhausted',
+    budget_snapshot: { spent: 0.42, cap: 0.10, reason: 'cost' },
+  };
+}
+
+describe('computePlanHash', () => {
+  test('deterministic for the same id set', () => {
+    expect(computePlanHash(['a', 'b', 'c'])).toBe(computePlanHash(['a', 'b', 'c']));
+  });
+
+  test('invariant to input array order', () => {
+    expect(computePlanHash(['a', 'b', 'c'])).toBe(computePlanHash(['c', 'a', 'b']));
+  });
+
+  test('differs across different id sets', () => {
+    expect(computePlanHash(['a', 'b'])).not.toBe(computePlanHash(['a', 'b', 'c']));
+  });
+
+  test('produces a stable 16-char hex prefix', () => {
+    const h = computePlanHash(['a']);
+    expect(h).toMatch(/^[0-9a-f]{16}$/);
+  });
+});
+
+describe('save + load round-trip', () => {
+  test('preserves every field including budget_snapshot', () => {
+    const cp = makeCheckpoint('deadbeefcafe1234', [
+      { id: 'sync', status: 'completed' },
+      { id: 'embed', status: 'completed' },
+    ]);
+    saveRemediationCheckpoint(cp);
+
+    const loaded = loadRemediationCheckpoint(cp.plan_hash);
+    expect(loaded).not.toBeNull();
+    expect(loaded!.plan_hash).toBe(cp.plan_hash);
+    expect(loaded!.completed.length).toBe(2);
+    expect(loaded!.completed[0].id).toBe('sync');
+    expect(loaded!.budget_snapshot?.spent).toBe(0.42);
+  });
+
+  test('atomic write via .tmp + rename: no .tmp left behind on success', () => {
+    const cp = makeCheckpoint('atomicrenametest');
+    saveRemediationCheckpoint(cp);
+    const finalPath = checkpointPath(cp.plan_hash);
+    expect(existsSync(finalPath)).toBe(true);
+    expect(existsSync(`${finalPath}.tmp`)).toBe(false);
+  });
+
+  test('loadRemediationCheckpoint returns null on missing file', () => {
+    expect(loadRemediationCheckpoint('not_a_real_hash')).toBeNull();
+  });
+
+  test('loadRemediationCheckpoint returns null on schema mismatch', () => {
+    const cp = makeCheckpoint('schemamismatchhash');
+    saveRemediationCheckpoint(cp);
+    // Corrupt the schema_version
+    const path = checkpointPath(cp.plan_hash);
+    const raw = JSON.parse(readFileSync(path, 'utf-8'));
+    raw.schema_version = 99;
+    writeFileSync(path, JSON.stringify(raw));
+    expect(loadRemediationCheckpoint(cp.plan_hash)).toBeNull();
+  });
+
+  test('loadRemediationCheckpoint returns null on corrupt JSON', () => {
+    const cp = makeCheckpoint('corruptjsonhash');
+    saveRemediationCheckpoint(cp);
+    writeFileSync(checkpointPath(cp.plan_hash), '{not json}');
+    expect(loadRemediationCheckpoint(cp.plan_hash)).toBeNull();
+  });
+});
+
+describe('listRemediationCheckpoints', () => {
+  test('returns empty array when dir missing', () => {
+    expect(listRemediationCheckpoints()).toEqual([]);
+  });
+
+  test('lists checkpoints mtime-newest-first', async () => {
+    const cp1 = makeCheckpoint('hash000000000001');
+    saveRemediationCheckpoint(cp1);
+    await new Promise((r) => setTimeout(r, 20));
+    const cp2 = makeCheckpoint('hash000000000002');
+    saveRemediationCheckpoint(cp2);
+
+    const list = listRemediationCheckpoints();
+    expect(list.length).toBe(2);
+    // Newer first
+    expect(list[0].plan_hash).toBe('hash000000000002');
+    expect(list[1].plan_hash).toBe('hash000000000001');
+  });
+});
+
+describe('clearRemediationCheckpoint', () => {
+  test('removes file when present', () => {
+    const cp = makeCheckpoint('cleartesthash000');
+    saveRemediationCheckpoint(cp);
+    expect(existsSync(checkpointPath(cp.plan_hash))).toBe(true);
+    clearRemediationCheckpoint(cp.plan_hash);
+    expect(existsSync(checkpointPath(cp.plan_hash))).toBe(false);
+  });
+
+  test('idempotent on missing file', () => {
+    expect(() => clearRemediationCheckpoint('never_written')).not.toThrow();
+  });
+});
diff --git a/test/e2e/brainstorm-resume.test.ts b/test/e2e/brainstorm-resume.test.ts
new file mode 100644
index 000000000..a1719b09a
--- /dev/null
+++ b/test/e2e/brainstorm-resume.test.ts
@@ -0,0 +1,325 @@
+/**
+ * v0.37.x — T2 amended (TX3 load-bearing): brainstorm crash + --resume.
+ *
+ * Stub chatFn succeeds on the first N crosses and throws BudgetExhausted
+ * on cross N+1 (mid-run crash). First runBrainstorm aborts; reading the
+ * checkpoint shows full idea bodies for the completed crosses.
+ *
+ * Second runBrainstorm with resumeRunId continues from the next cross.
+ * **The merged BrainstormResult MUST contain the ideas from the
+ * pre-crash crosses (loaded from disk) AND the post-resume crosses.**
+ * This is the codex load-bearing finding — resume must produce correct
+ * output, not just "pick up where we left off".
+ *
+ * Schema note: pglite-engine.ts + postgres-engine.ts both query a
+ * `page_links` relation. v0.38 lands the `page_links` VIEW (alias of the
+ * canonical `links` table) in both the embedded PGLite schema bundle and
+ * Postgres migration v81. This test no longer needs a workaround view.
+ */
+
+import { describe, test, expect, beforeAll, beforeEach, afterAll, afterEach } from 'bun:test';
+import { mkdtempSync, rmSync, existsSync, readdirSync } from 'node:fs';
+import { tmpdir } from 'node:os';
+import { join } from 'node:path';
+import { PGLiteEngine } from '../../src/core/pglite-engine.ts';
+import type { ChunkInput } from '../../src/core/types.ts';
+import {
+  runBrainstorm,
+  BRAINSTORM_PROFILE,
+  type BrainstormProfile,
+  BudgetExhausted,
+} from '../../src/core/brainstorm/orchestrator.ts';
+import {
+  loadCheckpoint,
+} from '../../src/core/brainstorm/checkpoint.ts';
+import type { ChatOpts, ChatResult } from '../../src/core/ai/gateway.ts';
+
+let engine: PGLiteEngine;
+let tmp: string;
+let homeBackup: string | undefined;
+
+function basisEmbedding(idx: number, dim = 1536): Float32Array {
+  const v = new Float32Array(dim);
+  v[idx % dim] = 1.0;
+  return v;
+}
+
+async function seedSmallBrain(): Promise<void> {
+  // 2 close + 4 far across 2 distinct prefixes.
+  const closeSlugs = ['wiki/close-a', 'wiki/close-b'];
+  const farSlugs = [
+    'concepts/decay-a',
+    'concepts/decay-b',
+    'people/founder-a',
+    'people/founder-b',
+  ];
+
+  for (let i = 0; i < closeSlugs.length; i++) {
+    const slug = closeSlugs[i];
+    await engine.putPage(slug, {
+      type: 'note',
+      title: `Close ${slug}`,
+      compiled_truth: `resume merge crash question test fixture body for close anchor ${slug}`,
+      timeline: '',
+    });
+    await engine.upsertChunks(slug, [
+      {
+        chunk_index: 0,
+        chunk_text: `resume merge crash question test ${slug}`,
+        chunk_source: 'compiled_truth',
+        embedding: basisEmbedding(10 + i),
+        token_count: 6,
+      },
+    ] satisfies ChunkInput[]);
+  }
+
+  for (let i = 0; i < farSlugs.length; i++) {
+    const slug = farSlugs[i];
+    await engine.putPage(slug, {
+      type: 'note',
+      title: `Far ${slug}`,
+      compiled_truth: `Far content for ${slug}: distant cross-domain body.`,
+      timeline: '',
+    });
+    await engine.upsertChunks(slug, [
+      {
+        chunk_index: 0,
+        chunk_text: `cross-domain text ${slug}`,
+        chunk_source: 'compiled_truth',
+        embedding: basisEmbedding(200 + i),
+        token_count: 6,
+      },
+    ] satisfies ChunkInput[]);
+  }
+}
+
+beforeAll(async () => {
+  engine = new PGLiteEngine();
+  await engine.connect({});
+  await engine.initSchema();
+  // page_links view is provided by the embedded schema bundle (v0.38).
+  await seedSmallBrain();
+});
+
+afterAll(async () => {
+  await engine.disconnect();
+});
+
+beforeEach(() => {
+  tmp = mkdtempSync(join(tmpdir(), 'gbrain-resume-e2e-'));
+  homeBackup = process.env.GBRAIN_HOME;
+  process.env.GBRAIN_HOME = tmp;
+});
+
+afterEach(() => {
+  if (homeBackup === undefined) delete process.env.GBRAIN_HOME;
+  else process.env.GBRAIN_HOME = homeBackup;
+  rmSync(tmp, { recursive: true, force: true });
+});
+
+function makeChatFnMixed(failOnCrossCallN: number) {
+  let crossCalls = 0;
+  let judgeCalls = 0;
+  const fn = async (opts: ChatOpts): Promise<ChatResult> => {
+    const userMsg = opts.messages.find((m) => m.role === 'user');
+    const content = typeof userMsg?.content === 'string' ? userMsg.content : '';
+    // Judge prompts include "(close=... × far=...)" lines below each `## Idea`
+    // heading; cross prompts only contain `## Idea 1` / `## Idea 2` as format
+    // instructions.
+    const isJudge = /\(close=.* × far=.*\)/.test(content);
+    if (isJudge) {
+      judgeCalls++;
+      const ideaIds = Array.from(content.matchAll(/## Idea (\S+)/g)).map((m) => m[1] as string);
+      const json = {
+        ideas: ideaIds.map((id) => ({
+          id,
+          scores: { originality: 4, resistance: 4, thesis_density: 4, concrete_grounding: 4, cognitive_load: 4 },
+          note: 'mock judge',
+        })),
+      };
+      const text = '```json\n' + JSON.stringify(json) + '\n```';
+      return {
+        text,
+        blocks: [{ type: 'text', text }],
+        stopReason: 'end',
+        model: 'claude-sonnet-4-6',
+        providerId: 'fake',
+        usage: { input_tokens: 200, output_tokens: 100, cache_read_tokens: 0, cache_creation_tokens: 0 },
+      };
+    }
+    crossCalls++;
+    if (crossCalls === failOnCrossCallN) {
+      throw new BudgetExhausted(
+        `synthetic mid-run crash on cross call ${crossCalls}`,
+        { reason: 'cost', spent: 1.5, cap: 1.0 },
+      );
+    }
+    const closeMatch = content.match(/\[(wiki\/close-[ab])\]/);
+    const farMatch = content.match(/\[((?:concepts|people)\/[\w-]+)\]/);
+    const closeSlug = closeMatch?.[1] ?? 'unknown';
+    const farSlug = farMatch?.[1] ?? 'unknown';
+    const ideaText = `IDEA-FOR-${closeSlug}--${farSlug}--call${crossCalls}`;
+    const text = `1. ${ideaText}\n2. backup idea ${crossCalls}\n3. extra idea ${crossCalls}`;
+    return {
+      text,
+      blocks: [{ type: 'text', text }],
+      stopReason: 'end',
+      model: 'claude-haiku-4-5-20251001',
+      providerId: 'fake',
+      usage: { input_tokens: 100, output_tokens: 50, cache_read_tokens: 0, cache_creation_tokens: 0 },
+    };
+  };
+  return { fn, get crossCalls() { return crossCalls; }, get judgeCalls() { return judgeCalls; } };
+}
+
+const tinyProfile: BrainstormProfile = {
+  ...BRAINSTORM_PROFILE,
+  k_close: 2,
+  m_far: 4,
+  ideas_per_cross: 1,
+};
+
+describe('brainstorm --resume (TX3 load-bearing)', () => {
+  test('crash on cross 4 → first run aborts, checkpoint has crosses 1..N with full idea bodies', async () => {
+    const chat1 = makeChatFnMixed(4);
+    let err1: unknown = null;
+    try {
+      await runBrainstorm(engine, {}, {
+        question: 'test resume crash question',
+        profile: tinyProfile,
+        skipCostPreview: true,
+        maxCostUsd: 100,
+        chatFn: chat1.fn,
+        embedQueryFn: async () => basisEmbedding(0),
+        stderrWrite: () => {},
+      });
+    } catch (e) {
+      err1 = e;
+    }
+    expect(err1).toBeInstanceOf(BudgetExhausted);
+
+    const dir = join(tmp, '.gbrain', 'brainstorm');
+    expect(existsSync(dir)).toBe(true);
+    const files = readdirSync(dir).filter((f) => f.endsWith('.json'));
+    expect(files.length).toBe(1);
+    const runId = files[0].replace(/\.json$/, '');
+    const cp = loadCheckpoint(runId);
+    expect(cp).not.toBeNull();
+    expect(cp!.completed_crosses.length).toBeGreaterThanOrEqual(1);
+    // TX3 load-bearing — full idea bodies, not just counts.
+    for (const cc of cp!.completed_crosses) {
+      expect(cc.ideas.length).toBeGreaterThanOrEqual(1);
+      expect(cc.ideas[0].text.length).toBeGreaterThan(0);
+    }
+  });
+
+  test('second run with resumeRunId merges pre-crash ideas with post-resume ideas (TX3 contract)', async () => {
+    // First run: crash on cross 4 (mid-loop).
+    const chat1 = makeChatFnMixed(4);
+    try {
+      await runBrainstorm(engine, {}, {
+        question: 'test resume merge question',
+        profile: tinyProfile,
+        skipCostPreview: true,
+        maxCostUsd: 100,
+        chatFn: chat1.fn,
+        embedQueryFn: async () => basisEmbedding(0),
+        stderrWrite: () => {},
+      });
+    } catch {
+      // expected
+    }
+    const dir = join(tmp, '.gbrain', 'brainstorm');
+    const files = readdirSync(dir).filter((f) => f.endsWith('.json'));
+    expect(files.length).toBe(1);
+    const runId = files[0].replace(/\.json$/, '');
+    const cpBefore = loadCheckpoint(runId)!;
+    const preCrashIdeaTexts = cpBefore.completed_crosses.flatMap((cc) => cc.ideas.map((i) => i.text));
+    expect(preCrashIdeaTexts.length).toBeGreaterThanOrEqual(1);
+
+    // Second run: no crash, no failures.
+    const chat2 = makeChatFnMixed(99999);
+    const result = await runBrainstorm(engine, {}, {
+      question: 'test resume merge question',
+      profile: tinyProfile,
+      skipCostPreview: true,
+      maxCostUsd: 100,
+      chatFn: chat2.fn,
+      embedQueryFn: async () => basisEmbedding(0),
+      stderrWrite: () => {},
+      resumeRunId: runId,
+    });
+
+    // TX3: every pre-crash idea text from disk MUST appear in the
+    // merged result. Resume cannot drop them silently.
+    const allIdeaTexts = result.ideas.map((i) => i.text);
+    for (const pre of preCrashIdeaTexts) {
+      expect(allIdeaTexts).toContain(pre);
+    }
+
+    // Total idea count: k_close=2 × m_far=4 × ideas_per_cross=1 = 8 ideas
+    // in a clean run. The mock judge scores every idea identically, so none
+    // should be filtered out; assert on the length of BrainstormResult.ideas.
+    expect(result.ideas.length).toBe(8);
+
+    // After clean completion the checkpoint is cleared.
+    expect(readdirSync(dir).filter((f) => f.endsWith('.json')).length).toBe(0);
+  });
+
+  test('resumeRunId with mismatched id refuses with paste-ready hint', async () => {
+    const chat = makeChatFnMixed(99999);
+    let caught: unknown = null;
+    try {
+      await runBrainstorm(engine, {}, {
+        question: 'mismatch test question',
+        profile: tinyProfile,
+        skipCostPreview: true,
+        chatFn: chat.fn,
+        embedQueryFn: async () => basisEmbedding(0),
+        stderrWrite: () => {},
+        resumeRunId: 'deadbeefcafe0000',
+      });
+    } catch (e) {
+      caught = e;
+    }
+    expect(caught).toBeInstanceOf(Error);
+    expect((caught as Error).message).toMatch(/--resume run_id=deadbeefcafe0000 does not match/);
+  });
+});
+
+// F2 smoke test: end-to-end --max-cost pre-flight refusal. The user-facing
+// path is "estimate exceeds cap, run aborts before any LLM call". This pins
+// (a) the typed throw, (b) reason='cost', (c) the paste-ready error message
+// content, and (d) that no chatFn calls happen during pre-flight.
+describe('brainstorm --max-cost pre-flight refusal (F2 smoke)', () => {
+  test('estimate above cap → BudgetExhausted(reason="cost") before any chat call', async () => {
+    const chat = makeChatFnMixed(99999);
+    let caught: unknown = null;
+    try {
+      await runBrainstorm(engine, {}, {
+        question: 'pre-flight cap smoke question',
+        profile: tinyProfile,
+        skipCostPreview: true,
+        // Pre-run estimate is at the cents level; $0.0001 forces a refusal.
+        maxCostUsd: 0.0001,
+        chatFn: chat.fn,
+        embedQueryFn: async () => basisEmbedding(0),
+        stderrWrite: () => {},
+      });
+    } catch (e) {
+      caught = e;
+    }
+    expect(caught).toBeInstanceOf(BudgetExhausted);
+    const err = caught as BudgetExhausted;
+    expect(err.reason).toBe('cost');
+    // User-facing hint must point at remediation paths so the operator
+    // can fix forward without reading the source.
+    expect(err.message).toMatch(/exceeds --max-cost/);
+    expect(err.message).toMatch(/--limit/);
+    expect(err.message).toMatch(/--max-far-set/);
+    // No chat calls during pre-flight — the cap fires before any provider
+    // HTTP would happen on a real run.
+    expect(chat.crossCalls).toBe(0);
+    expect(chat.judgeCalls).toBe(0);
+  });
+});
diff --git a/test/fixtures/dream-budget-schema-v1.jsonl b/test/fixtures/dream-budget-schema-v1.jsonl
new file mode 100644
index 000000000..25a3075e8
--- /dev/null
+++ b/test/fixtures/dream-budget-schema-v1.jsonl
@@ -0,0 +1,3 @@
+{"schema_version":1,"phase":"auto_think","event":"submit","model":"claude-haiku-4-5-20251001","label":"verdict","estimated_cost_usd":0.0035,"cumulative_cost_usd":0.0035,"budget_usd":1.0}
+{"schema_version":1,"phase":"auto_think","event":"submit_denied","model":"claude-opus-4-7","label":"big-call","estimated_cost_usd":0.5,"cumulative_cost_usd":0.0035,"budget_usd":0.01}
+{"schema_version":1,"phase":"drift","event":"submit_unpriced","model":"gpt-5","label":"unpriced","estimated_input_tokens":1000,"max_output_tokens":1000}
diff --git a/test/reindex-code-max-cost.serial.test.ts b/test/reindex-code-max-cost.serial.test.ts
new file mode 100644
index 000000000..9861bcea4
--- /dev/null
+++ b/test/reindex-code-max-cost.serial.test.ts
@@ -0,0 +1,77 @@
+/**
+ * F3: `gbrain reindex --code --max-cost N` smoke test.
+ *
+ * Pins the new flag's contract:
+ *   1. ReindexCodeOpts.maxCostUsd?: number accepts a positive number.
+ *   2. When set, runReindexCode wraps its body in withBudgetTracker so the
+ *      gateway composes the tracker for every gateway.embed() call inside
+ *      importCodeFile.
+ *   3. When unset, the body runs outside any tracker scope (legacy behavior).
+ *
+ * Marked .serial.test.ts because configureGateway/resetGateway mutate the
+ * module-level gateway state; running concurrently with other gateway-touching
+ * tests in the same shard would race.
+ */
+
+import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
+import { PGLiteEngine } from '../src/core/pglite-engine.ts';
+import { runReindexCode } from '../src/commands/reindex-code.ts';
+import {
+  configureGateway,
+  resetGateway,
+  getCurrentBudgetTracker,
+} from '../src/core/ai/gateway.ts';
+
+let engine: PGLiteEngine;
+
+beforeAll(async () => {
+  configureGateway({
+    embedding_model: 'openai:text-embedding-3-large',
+    embedding_dimensions: 1536,
+    env: { OPENAI_API_KEY: 'sk-test' },
+  });
+  engine = new PGLiteEngine();
+  await engine.connect({});
+  await engine.initSchema();
+}, 30_000);
+
+afterAll(async () => {
+  await engine.disconnect();
+  resetGateway();
+});
+
+describe('reindex-code --max-cost (F3)', () => {
+  test('dry-run path accepts maxCostUsd without throwing', async () => {
+    const result = await runReindexCode(engine, {
+      dryRun: true,
+      noEmbed: true,
+      maxCostUsd: 5,
+    });
+    expect(result.status).toBe('dry_run');
+    expect(result.codePages).toBe(0); // empty brain
+  });
+
+  test('empty-brain non-dry path with maxCostUsd returns ok without throwing', async () => {
+    // No code pages exist → estimateReindexCost returns 0 → we hit the
+    // early-return at totalPages===0 BEFORE the body wrap. This pins that
+    // the early-return path isn't broken by the maxCostUsd plumbing.
+    const result = await runReindexCode(engine, {
+      yes: true,
+      noEmbed: true,
+      maxCostUsd: 5,
+    });
+    expect(result.status).toBe('ok');
+    expect(result.reindexed).toBe(0);
+    expect(result.failed).toBe(0);
+  });
+
+  test('no tracker installed when maxCostUsd is unset (legacy path)', async () => {
+    // Outside any withBudgetTracker scope, getCurrentBudgetTracker() must
+    // return null both before AND after the call. This pins that the body
+    // wrap is conditional on the cap being set — agent callers who don't
+    // pass maxCostUsd see byte-stable pre-F3 behavior.
+    expect(getCurrentBudgetTracker()).toBeNull();
+    await runReindexCode(engine, { yes: true, noEmbed: true });
+    expect(getCurrentBudgetTracker()).toBeNull();
+  });
+});