Open
19 commits
6a3f24c
feat(brainstorm): T1 cost guardrails + judge chunking + far-set cap
garrytan May 21, 2026
a0eb4f4
Merge remote-tracking branch 'origin/master' into garrytan/shanghai-v3
garrytan May 21, 2026
1729b0e
feat(budget): T2 BudgetTracker + BudgetExhausted + audit-week helper
garrytan May 21, 2026
5179524
feat(gateway): T3 withBudgetTracker + AsyncLocalStorage composition
garrytan May 21, 2026
052b660
chore(audit): T4 migrate 4 audit writers to shared isoWeekFilename he…
garrytan May 21, 2026
75e0c74
feat(cycle): T5 BudgetMeter schema_version=1 + golden fixture (A2 ame…
garrytan May 21, 2026
9043a41
feat(eval): T6 wrap eval-contradictions runner in withBudgetTracker
garrytan May 21, 2026
87fdc3e
feat(doctor): T7 --remediate budget tracker + checkpoint + --resume (A4)
garrytan May 21, 2026
7468da8
docs(subagent): T8 A1 ordering ASCII diagram before acquireLease
garrytan May 21, 2026
ac5f4e1
feat(diarize): T9 payload-fitter (P6) with batch + summarize + gate
garrytan May 21, 2026
5cc3d3a
feat(brainstorm): T10 checkpoint + --resume with full idea bodies (P7)
garrytan May 21, 2026
966be2e
docs: T11 + T12 wave release docs + deferred follow-ups
garrytan May 21, 2026
1d378f6
fix(schema): F1 page_links view alias for both engines
garrytan May 21, 2026
8096118
test(brainstorm): F2 pre-flight --max-cost refusal smoke E2E
garrytan May 21, 2026
069e48d
feat(reindex-code): F3 --max-cost flag via withBudgetTracker
garrytan May 21, 2026
292cbb6
fix(schema): narrow page_links view projection to bootstrap-safe columns
garrytan May 21, 2026
af89486
chore: bump version to v0.39.0.0
garrytan May 21, 2026
4e512c1
test(isolation): rename 3 env-mutating tests to .serial.test.ts (CI fix)
garrytan May 21, 2026
e7b39cc
Merge remote-tracking branch 'origin/master' into garrytan/shanghai-v3
garrytan May 22, 2026
87 changes: 87 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,93 @@

All notable changes to GBrain will be documented in this file.

## [0.39.0.0] - 2026-05-21

**You can finally cap the cost of `gbrain brainstorm` and `gbrain lsd`, AND if the cap fires mid-run, you can resume right where you left off without losing the ideas you already paid for.**

The 13K-page brain incident that started this wave is real and was expensive. A `gbrain lsd` run estimated $0.96, actually billed $50.71, and generated zero usable ideas. The fix wave that already merged (PR #1234) capped the prefix sampling that caused the explosion. This release goes one cathedral further: every LLM call that any `gbrain` command makes is now accounted for at the gateway layer, so the same cap that protects brainstorm also protects `doctor --remediate`, `eval suspected-contradictions`, the dream cycle, and any future LLM-calling command. The plumbing is shared.

What that means in practice: pass `--max-cost N` to `brainstorm`, `lsd`, or `doctor --remediate`, and the first overflow throws a typed error before any extra dollars are spent. The throw fires from inside the gateway's reserve check, so an exhausted budget never acquires a rate-lease slot or makes a provider HTTP call. The cap is a real ceiling, not a suggestion.
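
A minimal sketch of the reserve-then-record ceiling described above. Only `BudgetExhausted`, its `reason` values, and the `reserve` / `record` method names come from these notes; the class name `SketchBudgetTracker`, the `guardedCall` helper, and every signature are assumptions made for illustration, not the shipped API.

```typescript
// Hypothetical sketch of a reserve-then-record budget ceiling.
// Signatures are assumptions; only the names BudgetExhausted, reserve,
// record, and the three reason values appear in the release notes.
type ExhaustedReason = 'cost' | 'runtime' | 'no_pricing';

class BudgetExhausted extends Error {
  readonly reason: ExhaustedReason;
  constructor(reason: ExhaustedReason, message: string) {
    super(message);
    this.name = 'BudgetExhausted';
    this.reason = reason;
    // Keep instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class SketchBudgetTracker {
  private spentUsd = 0;
  private readonly maxCostUsd: number;
  constructor(maxCostUsd: number) {
    this.maxCostUsd = maxCostUsd;
  }

  // Runs BEFORE any lease or HTTP call: refuse if the estimate would overflow.
  reserve(estimatedUsd: number): void {
    if (this.spentUsd + estimatedUsd > this.maxCostUsd) {
      throw new BudgetExhausted('cost', 'estimate would exceed the cap');
    }
  }

  // Runs AFTER the provider responds: account for the actual spend.
  record(actualUsd: number): void {
    this.spentUsd += actualUsd;
    if (this.spentUsd > this.maxCostUsd) {
      throw new BudgetExhausted('cost', 'cumulative spend exceeded the cap');
    }
  }

  get spent(): number {
    return this.spentUsd;
  }
}

// The provider callback never runs when reserve() throws.
function guardedCall(
  tracker: SketchBudgetTracker,
  estimatedUsd: number,
  provider: () => number, // returns the actual cost in USD
): number {
  tracker.reserve(estimatedUsd);
  const actualUsd = provider();
  tracker.record(actualUsd);
  return actualUsd;
}
```

Because `reserve()` sits in front of the provider callback, a refused call costs nothing: no lease, no request, no spend recorded.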

When brainstorm IS exhausted mid-run, the orchestrator persists what's been done to `~/.gbrain/brainstorm/<run_id>.json` with the FULL idea bodies (not just counts), then re-throws. The user paste-runs the suggested `gbrain brainstorm --resume <run_id>` and the second run skips the already-completed crosses, runs only the missing ones, then merges everything before the judge runs. The final BrainstormResult contains the pre-crash ideas AND the post-resume ideas. (Codex's outside-voice review was the one that caught this — a resume that produces only the second-run's ideas would be silent partial output, which is worse than no resume at all.)
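
The skip-then-merge behaviour can be sketched as follows. The `Checkpoint` and `Idea` shapes, their field names, and the `resumeRun` function are hypothetical; the notes only state that full idea bodies are persisted and that pre-crash and post-resume ideas are merged before the judge runs.

```typescript
// Hypothetical checkpoint shape; field names are assumptions for illustration.
interface Idea {
  cross: string;
  body: string;
}

interface Checkpoint {
  runId: string;
  completedCrosses: string[];
  ideas: Idea[]; // full bodies, not just counts
}

function resumeRun(
  checkpoint: Checkpoint,
  allCrosses: string[],
  runCross: (cross: string) => Idea[],
): Idea[] {
  const done = new Set(checkpoint.completedCrosses);
  const fresh: Idea[] = [];
  for (const cross of allCrosses) {
    if (done.has(cross)) continue; // already paid for; do not re-run
    fresh.push(...runCross(cross));
  }
  // Merge BEFORE judging so the judge sees both halves of the run.
  return [...checkpoint.ideas, ...fresh];
}
```

Returning only `fresh` here would be the silent-partial-output bug the review caught.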

### How to turn it on

```bash
# Cap brainstorm cost at $2 (default $5). Throws BudgetExhausted if exceeded.
gbrain brainstorm "what story should I write next" --max-cost 2

# Crash recovery — list saved runs, resume the one you want.
gbrain brainstorm --list-runs
gbrain brainstorm --resume 1a2b3c4d5e6f7890

# Bypass the 7-day staleness gate if you really mean it.
gbrain brainstorm --resume 1a2b3c4d5e6f7890 --force-resume

# Same cap, different command — doctor's autonomous remediation now resumes too.
gbrain doctor --remediate --max-cost 5
# (on BudgetExhausted, the run persists a checkpoint at
# ~/.gbrain/remediation/<plan_hash>.json and tells you the --resume command)
gbrain doctor --remediate --resume
```

### What's safe to know about

A4 amended is a semantic shift: `gbrain doctor --remediate --max-usd` used to be a pre-flight estimate check ("refuse if est > cap"); it's now ALSO a mid-run hard ceiling backed by BudgetTracker via the gateway's AsyncLocalStorage scope. If you cron-schedule `--remediate`, the worst case used to be "the run starts despite the under-estimate"; now the worst case is "the run aborts mid-step and writes a resumable checkpoint." The first failure mode is gone; the second is recoverable via `--resume`. `--max-cost` is a new alias for `--max-usd` for symmetry with brainstorm.
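
To make the AsyncLocalStorage scoping concrete, here is a minimal sketch using Node's `node:async_hooks`. `withBudgetTracker(tracker, fn)` is named in these notes, but the types shown, the `SketchTracker` class, the stand-in `chat` function, and its fixed $0.25 cost are all assumptions for illustration.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Hypothetical stand-in for the real tracker: it only sums spend.
class SketchTracker {
  spentUsd = 0;
  record(usd: number): void {
    this.spentUsd += usd;
  }
}

// Module-internal store; callers never touch it directly.
const trackerStore = new AsyncLocalStorage<SketchTracker>();

function withBudgetTracker<T>(
  tracker: SketchTracker,
  fn: () => Promise<T>,
): Promise<T> {
  return trackerStore.run(tracker, fn);
}

// Stand-in for a gateway call; the fixed cost is made up.
async function chat(prompt: string): Promise<string> {
  await Promise.resolve(); // cross an async boundary
  trackerStore.getStore()?.record(0.25); // budget no-op outside any scope
  return `echo: ${prompt}`;
}

async function demo(): Promise<number[]> {
  const a = new SketchTracker();
  const b = new SketchTracker();
  // Parallel scopes: each call is attributed to its own tracker.
  await Promise.all([
    withBudgetTracker(a, async () => {
      await chat('x');
      await chat('y');
    }),
    withBudgetTracker(b, async () => {
      await chat('z');
    }),
  ]);
  await chat('outside'); // attributed to nobody
  return [a.spentUsd, b.spentUsd];
}
```

The call sites inside the scope never mention a tracker, which is why a command like `doctor --remediate` can gain a mid-run ceiling without threading a budget argument through every handler.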

The brainstorm checkpoint identity intentionally uses NO embedding bits: `run_id = sha256(question + profile + sort(close_slugs) + sort(far_slugs)).slice(0,16)`. Swap your embedding model between runs and the resume still finds the checkpoint. Conversely, change the question by even one word and you get a different run_id (the previous checkpoint is left alone; the cycle purge phase GCs anything older than 7 days).
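
A sketch of that identity function, using Node's built-in `crypto`. The formula comes from the paragraph above; how the four parts are serialized before hashing is not specified, so the newline and comma separators below, and the name `sketchRunId`, are assumptions.

```typescript
import { createHash } from 'node:crypto';

// Assumption: parts are joined with '\n' and slugs with ','. The real
// serialization may differ, so ids from this sketch will not match gbrain's.
function sketchRunId(
  question: string,
  profile: string,
  closeSlugs: string[],
  farSlugs: string[],
): string {
  const parts = [
    question,
    profile,
    [...closeSlugs].sort().join(','), // copy before sorting: no caller mutation
    [...farSlugs].sort().join(','),
  ];
  return createHash('sha256')
    .update(parts.join('\n'))
    .digest('hex')
    .slice(0, 16);
}
```

Sorting the slug lists makes the id independent of retrieval order, and nothing derived from embeddings enters the hash, which is what lets a resume survive an embedding-model swap.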

The dream cycle's `~/.gbrain/audit/dream-budget-YYYY-Www.jsonl` grew one new field on every line: `schema_version: 1`. Reorderings are tolerated (downstream consumers should index by field name, not position); renames or removals are breaking. The same schema-stable contract holds for the new `~/.gbrain/audit/budget-YYYY-Www.jsonl` produced by the unified `BudgetTracker`.
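
The `YYYY-Www` part of those filenames is an ISO 8601 week, which does not always share the calendar year of the date. The sketch below shows the standard Thursday-based calculation. The function names mirror the `isoWeek` / `isoWeekFilename` helpers mentioned elsewhere in this release, but the return shape, the UTC handling, and appending `.jsonl` inside the helper are assumptions.

```typescript
// ISO 8601: weeks start on Monday; week 1 is the week containing the
// year's first Thursday. Computed in UTC (an assumption) to avoid
// local-timezone drift.
function isoWeek(d: Date): { year: number; week: number } {
  const t = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
  const dayOfWeek = t.getUTCDay() || 7; // Monday=1 ... Sunday=7
  t.setUTCDate(t.getUTCDate() + 4 - dayOfWeek); // jump to this week's Thursday
  const year = t.getUTCFullYear(); // the ISO year is the Thursday's year
  const yearStart = Date.UTC(year, 0, 1);
  const week = Math.ceil(((t.getTime() - yearStart) / 86400000 + 1) / 7);
  return { year, week };
}

function isoWeekFilename(prefix: string, now: Date = new Date()): string {
  const { year, week } = isoWeek(now);
  const paddedWeek = (week < 10 ? '0' : '') + week;
  return `${prefix}-${year}-W${paddedWeek}.jsonl`;
}
```

With this calculation, Monday 2024-12-30 lands in `2025-W01` and Thursday 2020-12-31 lands in `2020-W53`, the two year-boundary cases the release pins with tests.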

If you wrote integration code against `BudgetExhausted` in the brainstorm orchestrator before this release: that class moved to `src/core/budget/budget-tracker.ts`. The orchestrator re-exports the old name for back-compat, so existing imports keep working.

### Itemized changes

- **`BudgetTracker` is the new canonical primitive** at `src/core/budget/budget-tracker.ts`. One class, one typed error (`BudgetExhausted` with `reason: 'cost' | 'runtime' | 'no_pricing'`), one schema-stable audit JSONL. Pinned by 18 unit cases covering TX1 (record throws when cumulative exceeds cap), TX2 (no_pricing hard-fails when cap is set + pricing missing), A3 amended (pessimistic fallback when `err.usage` is absent), the onExhausted-fires-once-before-throw contract, and the audit JSONL schema stability.
- **`withBudgetTracker(tracker, fn)` at the gateway layer (TX5)** installs the tracker on a module-internal `AsyncLocalStorage<BudgetTracker>`. Every `gateway.chat / embed / rerank` call inside the scope auto-composes. Outside-scope calls are budget no-ops (existing behavior preserved). Nested scopes restore the outer on exit. Parallel `Promise.all` scopes do not bleed trackers across each other.
- **Subagent rate-lease ordering pinned (A1)**: the gateway's `reserve()` runs BEFORE `acquireLease()` in `src/core/minions/handlers/subagent.ts`. A budget throw must NOT consume a rate-lease slot. The handler body itself no longer needs explicit budget threading; the AsyncLocalStorage composition handles it.
- **`payload-fitter.ts` (P6)** lands at `src/core/diarize/payload-fitter.ts` with two strategies. `'batch'` is deterministic token-budgeted chunking, no LLM calls. `'summarize'` embed-clusters then Haiku-summarizes each cluster in parallel via `Promise.allSettled` at parallelism=4. The quality gate flags `degraded: true` when success ratio drops below the configured `min_success_ratio` (default 0.75) — caller decides whether to surface or abort.
- **Brainstorm checkpoint (P7)** at `src/core/brainstorm/checkpoint.ts`. Atomic .tmp+rename writes. Full idea bodies persisted (TX3). One-flag resume (TX4). 7-day mtime-based GC wired into the cycle purge phase.
- **`doctor --remediate --resume`** loads `~/.gbrain/remediation/<plan_hash>.json` and continues from the next un-completed step. Refuses on mismatched plan_hash with a paste-ready message.
- **`gbrain brainstorm --list-runs`** prints saved run_ids + iso dates + question stems so the user can pick which to resume.
- **ISO-week audit filenames consolidated** into `src/core/audit-week-file.ts`. Four call sites migrated (shell-jobs, phantoms, slug-fallback, dream-budget). Year-boundary cases (2020-W53, 2024-12-30 belongs to 2025-W01) pinned by tests.
- **eval-contradictions** routes through `withBudgetTracker` for telemetry without changing the CLI surface. `--budget-usd` semantics + `PreFlightBudgetError` shape are byte-identical.
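
The payload-fitter's `'batch'` strategy and quality gate from the list above can be sketched as below. This is an illustration under stated assumptions, not the code in `payload-fitter.ts`: the 4-characters-per-token estimate, the greedy packing, and the names `estimateTokens`, `batchByTokenBudget`, and `isDegraded` are all hypothetical. Only the 0.75 default ratio and the "flag, don't throw" behaviour come from the notes.

```typescript
// Hypothetical token estimate: roughly 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Greedy, order-preserving packing with no LLM calls: the same input
// always yields the same batches.
function batchByTokenBudget(items: string[], budgetTokens: number): string[][] {
  const batches: string[][] = [];
  let current: string[] = [];
  let used = 0;
  for (const item of items) {
    const cost = estimateTokens(item);
    if (current.length > 0 && used + cost > budgetTokens) {
      batches.push(current); // close the full batch
      current = [];
      used = 0;
    }
    current.push(item); // an oversized item ends up alone in its batch
    used += cost;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Quality gate: flag the result as degraded instead of throwing, so the
// caller decides whether to surface a partial result or abort.
function isDegraded(
  succeeded: number,
  total: number,
  minSuccessRatio = 0.75,
): boolean {
  return total > 0 && succeeded / total < minSuccessRatio;
}
```

A ratio exactly at the threshold passes the gate, matching the "drops below" wording in the payload-fitter bullet.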

### For contributors

- `bun test` adds 73 new tests across 9 new files (`test/core/budget/`, `test/core/audit-week-file.test.ts`, `test/core/diarize/`, `test/brainstorm/checkpoint.test.ts`, `test/e2e/brainstorm-resume.test.ts`, `test/core/remediation-checkpoint.test.ts`).
- F1 closes the pre-existing PGLite `page_links` schema gap (the brainstorm domain-bank queries `page_links` but the embedded schema only defined `links`). Brainstorm now works against PGLite brains in production via the new `page_links` view alias shipped in both the embedded schema bundle and migration v81.
- F2 adds an E2E test pinning the user-facing `--max-cost` pre-flight refusal path.
- F3 adds `--max-cost` to `gbrain reindex --code`.
- All previous brainstorm + doctor + eval-contradictions tests still pass.

## To take advantage of v0.39.0.0

`gbrain upgrade` should do this automatically. If it didn't, or if `gbrain doctor`
warns about a partial migration:

1. **Run the orchestrator manually:**
   ```bash
   gbrain apply-migrations --yes
   ```
   This applies migration v81 (`page_links_view_alias`) on PGLite + Postgres brains. The alias is required for `gbrain brainstorm` and `gbrain lsd` to work against the domain-bank tiebreaker; without it, the brainstorm domain-bank queries fail with `relation "page_links" does not exist`.
2. **Set a cost cap on the commands you care about:**
   ```bash
   # Sets a per-run dollar ceiling. Throws BudgetExhausted before any LLM call
   # if the pre-run estimate exceeds the cap, AND mid-run if cumulative spend
   # blows past it.
   gbrain brainstorm "test" --max-cost 1
   gbrain doctor --remediate --max-cost 5
   gbrain reindex --code --max-cost 10
   ```
3. **Verify the outcome:**
   ```bash
   gbrain doctor                   # schema_version should be 81
   gbrain brainstorm --list-runs   # confirms the new checkpoint directory exists
   ```
4. **If any step fails or the numbers look wrong,** please file an issue:
   https://github.com/garrytan/gbrain/issues with:
   - output of `gbrain doctor`
   - contents of `~/.gbrain/upgrade-errors.jsonl` if it exists
   - which step broke

This feedback loop is how the gbrain maintainers find fragile upgrade paths. Thank you.
## [0.37.11.0] - 2026-05-21

**Fresh `gbrain init --pglite` works out of the box now.**
6 changes: 6 additions & 0 deletions CLAUDE.md
@@ -107,6 +107,12 @@ strict behavior when unset.
- `src/core/ai/recipes/voyage.ts` — Voyage AI openai-compatible recipe. **v0.28.7 (#680):** declares `chars_per_token=1` + `safety_factor=0.5` so the gateway pre-splits Voyage batches at a 60K-character budget (50% of 120K-token cap with the dense-tokenizer ratio). Closes the v0.27 backfill loop where ~26% of the corpus stayed un-embedded because tiktoken-grounded budgeting silently undercounted Voyage's actual token usage. **v0.28.11 (#719):** declares `multimodal_models: ['voyage-multimodal-3']` so the gateway rejects text-only Voyage models pointed at the multimodal endpoint with a clear `AIConfigError` instead of waiting for Voyage's HTTP 400. **v0.33.1.1 (#962, fixup):** recipe docstring at `:7-16` tightened to name the seven hosted flexible-dim models that accept `output_dimension` explicitly (`voyage-4-large`, `voyage-4`, `voyage-4-lite`, `voyage-3-large`, `voyage-3.5`, `voyage-3.5-lite`, `voyage-code-3`) and call out that `voyage-4-nano` is the open-weight variant listed separately by Voyage as fixed 1024-dim — does NOT accept the parameter. The "all v4 variants are flexible" misread is what caused the original PR to include nano in `VOYAGE_OUTPUT_DIMENSION_MODELS`; the negative regression assertion in `test/ai/gateway.test.ts` (`dimsProviderOptions` returns `undefined` for `voyage-4-nano`) pins the contract. **v0.37.3.0:** `voyage-code-3` is the recommended embedding model for gstack per-worktree code brains (Topology 3 in `docs/architecture/topologies.md`). Registration was already in the `models` list since pre-v0.33; the v0.37.3.0 wave adds discoverability surfaces — decision-tree branch in `docs/integrations/embedding-providers.md`, Topology 3 "Recommended embedding model" subsection, runtime nudge from `gbrain reindex --code` against non-code-tuned models. Recipe-shape regression pinned by `test/ai/voyage-code-3-recipe.test.ts`.
- `src/core/ai/recipes/anthropic.ts` — Anthropic recipe (chat + expansion touchpoints). **v0.31.12:** chat and expansion `models:` lists drop the v0.31.6 phantom `claude-sonnet-4-6-20250929` date suffix — canonical id is `claude-sonnet-4-6`. The wrong-direction alias `claude-sonnet-4-6 → claude-sonnet-4-6-20250929` is removed; a reverse alias `claude-sonnet-4-6-20250929 → claude-sonnet-4-6` keeps stale user configs working (rescues `facts.extraction_model` and `models.dream.synthesize` set by v0.31.6 installs). Recipe-shape regression pinned by `test/anthropic-model-ids.test.ts` (6 cases, verbatim cherry-pick of PR #830 plus the reverse-alias rescue case).
- `src/core/anthropic-pricing.ts` — Single source of truth for Anthropic model pricing (per-MTok input/output). **v0.31.12:** Opus 4.7 corrected from `$15/$75` to `$5/$25` (the old number was from Opus 4 generation, never refreshed when 4.7 shipped); Opus 4.6 also corrected. Consumed by `src/core/budget-meter.ts` and `src/core/cross-modal-eval/runner.ts` — the cross-modal estimator now reads `ANTHROPIC_PRICING` for Anthropic models instead of duplicating the table, killing the v0.31.6 drift bug class.
- `src/core/budget/budget-tracker.ts` (v0.37.x) — keystone primitive for the brainstorm cost-cathedral wave. One typed error (`BudgetExhausted` with `reason: 'cost' | 'runtime' | 'no_pricing'`), one schema-stable audit JSONL at `~/.gbrain/audit/budget-YYYY-Www.jsonl`. Contracts pinned by 18 unit cases: **TX1** — `record()` throws when cumulative spend exceeds cap (the cap is a real ceiling, not a suggestion); **TX2** — `reserve()` hard-fails with `reason: 'no_pricing'` when `maxCostUsd` is set AND the model is missing from pricing maps (warn-once preserved when cap is unset); **A3 amended** — `extractUsageFromError(err, fallback)` returns `err.usage` when SDK provides it, else the pessimistic fallback (caller passes `maxOutputTokens`, not the optimistic pre-call estimate). `onExhausted(cb)` callback fires once synchronously BEFORE the throw propagates so callers can persist checkpoints. Replaces three parallel copies (inline brainstorm class, cycle/budget-meter, eval-contradictions). Adapts the old `BudgetMeter` via T5 (public shape preserved + `schema_version: 1` stamped on every dream-budget audit line).
- `src/core/audit-week-file.ts` (v0.37.x, Q1) — single source of truth for ISO-week audit JSONL filename math. Exports `isoWeek(d)`, `isoWeekFilename(prefix, now?)`, `resolveAuditDir()` (honors `GBRAIN_AUDIT_DIR`). Year-boundary correctness pinned by tests at 2020-W53 (the 53-week year), 2025-W01 rolling in from 2024-12-30 (Monday), 2026-W01. Four call sites migrated in T4: `src/core/minions/handlers/shell-audit.ts`, `src/core/facts/phantom-audit.ts`, `src/core/audit-slug-fallback.ts`, `src/core/cycle/budget-meter.ts`. Each call site keeps its `compute<X>AuditFilename` thin wrapper for back-compat with existing tests.
- `src/core/ai/gateway.ts:withBudgetTracker` (v0.37.x, T3 / TX5) — gateway-layer enforcement via `AsyncLocalStorage<BudgetTracker>`. `withBudgetTracker(tracker, fn)` installs the tracker on the module-internal store; every `gateway.chat / embed / rerank` call inside the scope auto-composes (reserve before, record in try/finally). Outside-scope calls are budget no-ops (current behavior preserved). Nested scopes restore the outer tracker on exit. `getCurrentBudgetTracker()` is the test seam. The chat path uses A3-amended pessimistic fallback on error paths; the embed path estimates input tokens from char count × recipe's `chars_per_token` because the AI SDK doesn't surface per-batch embed token usage; the rerank path estimates char count of query+docs. 6 unit cases pin the contract.
- `src/core/diarize/payload-fitter.ts` (v0.37.x, P6 / Q3) — generic fit-arbitrarily-large-items-into-per-call-token-budget utility. `'batch'` strategy is deterministic token-budgeted chunking with no LLM calls. `'summarize'` strategy embed-clusters into ceil(items/4) groups via cheap deterministic nearest-neighbor on cosine, Haiku-summarizes each cluster via `Promise.allSettled` at parallelism=4 (Perf1). Each Haiku call composes the active BudgetTracker via T3's AsyncLocalStorage. The quality gate (codex outside-voice finding #4): when `success_ratio < min_success_ratio` (default 0.75), result is flagged `degraded: true` — the fitter preserves the successful subset; the caller decides whether to surface a partial result or abort.
- `src/core/brainstorm/checkpoint.ts` (v0.37.x, P7 / TX3+TX4+A5 amended) — crash-resilient checkpoint for `gbrain brainstorm` and `gbrain lsd`. Persists FULL idea bodies (~50KB per run) so resume can MERGE the pre-crash ideas with the post-resume ideas before the judge runs (codex's load-bearing finding — a resume that produces only second-run output is silent partial output). `run_id = sha256(question + profile + sort(close_slugs) + sort(far_slugs)).slice(0,16)` — NO embedding bits, stable across embedding-model swaps. Atomic write via `.tmp + rename`. ONE resume flag (`--resume <run_id>` — the proposed `--retry-failed` was dropped per TX4: failed AND never-attempted crosses both go through `--resume`). `--list-runs` prints saved run_ids mtime-newest-first. `--force-resume` bypasses the 7-day staleness gate. The cycle purge phase (`gbrain dream --phase purge`) GCs checkpoints older than 7 days via `gcStaleCheckpoints(7)`. Pinned by 20 unit cases + 3 E2E cases in `test/e2e/brainstorm-resume.test.ts` including the load-bearing merge contract.
- `src/core/remediation-checkpoint.ts` (v0.37.x, T7 / A4 amended) — `doctor --remediate` checkpoint at `~/.gbrain/remediation/<plan_hash>.json`. `plan_hash = sha256(JSON.stringify(sorted recommendation ids)).slice(0,16)`. Schema-versioned. Atomic write via `.tmp + rename`. `gbrain doctor --remediate --resume <plan_hash>` (or with no arg — picks the newest matching checkpoint) loads it and skips already-completed steps. Mismatched plan_hash refuses with a paste-ready message. Cleared on clean completion. Pinned by 13 unit cases.
- `src/core/model-config.ts` — Model-string resolution (the seam every internal LLM call walks through). **v0.31.12:** four-tier system (`ModelTier = 'utility' | 'reasoning' | 'deep' | 'subagent'`) with `TIER_DEFAULTS` (utility→haiku-4-5, reasoning→sonnet-4-6, deep→opus-4-7, subagent→sonnet-4-6) and `tier?: ModelTier` on `ResolveModelOpts`. Resolution chain is now 8 steps: cliFlag → deprecated key → config key → `models.default` → `models.tier.<tier>` → env var → `TIER_DEFAULTS[tier]` → caller fallback. Two new exports — `isAnthropicProvider(modelString)` checks `provider:model` prefix OR `claude-` bare-id pattern, and `enforceSubagentAnthropic()` is the layer-2 runtime guard: when `tier === 'subagent'` resolves to a non-Anthropic provider, it emits a once-per-`(source, model)` stderr warn AND falls back to `TIER_DEFAULTS.subagent` instead of letting the Anthropic Messages API tool-loop attempt to run on OpenAI/Gemini. `_resetDeprecationWarningsForTest()` now also clears `_subagentTierWarningsEmitted` so tests re-emit.
- `src/core/ai/model-resolver.ts` — Recipe-touchpoint validator. **v0.31.12:** `assertTouchpoint(recipe, touchpoint, modelId, extendedModels?)` gains an optional 4th `extendedModels: ReadonlySet<string>` argument. When the modelId is in that set, the native-recipe allowlist throw is bypassed — the user explicitly opted into this model via config so we let provider rejection surface as `model_not_found` at HTTP call time (and `gbrain models doctor` catches it earlier). Default code paths with hardcoded model strings MUST NOT pass `extendedModels` — typos in source code still fail fast. Replaces the earlier plan to soften the validator wholesale (Codex F4/F5 in plan review flagged that as too broad — it would have removed the fail-fast contract for chat + expand + embed all three).
- `src/core/ai/gateway.ts` extension (v0.31.12) — new module-scoped `_extendedModels: Map<providerId, Set<modelId>>` registry feeds `assertTouchpoint`'s 4th-arg path. New `reconfigureGatewayWithEngine(engine)` async function is called from `cli.ts` after `engine.connect()` (and before every command except `CLI_ONLY` no-DB commands) — re-resolves expansion + chat defaults through `resolveModel()` so `models.tier.*` and `models.default` overrides apply to expansion + chat both. `DEFAULT_CHAT_MODEL` corrected to `anthropic:claude-sonnet-4-6` (was the v0.31.6 phantom `-20250929`). New `__setChatTransportForTests` seam mirrors `__setEmbedTransportForTests` so tests drive `chat()` with a stubbed transport.