fix(brainstorm): cost guardrails + judge overflow + far set cap #1234

Open
garrytan-agents wants to merge 2 commits into garrytan:master from garrytan-agents:fix/brainstorm-cost-guardrails

Incident: LSD Brainstorm 53× Cost Overrun

Estimated: $0.96 → Actual: $50.71 on a 13,690-page brain. Zero ideas delivered.

Root Causes

  1. Far set explosion: listPrefixSampledPages returned one page per prefix (~2K prefixes → 1,985 pages instead of the configured 12)
  2. No cost circuit breaker: no mechanism to abort when actual spend diverges from the estimate
  3. Judge context overflow: 15,868 ideas at ~350 tokens each ≈ 5.5M tokens, exceeding Sonnet's 1M-token context limit
  4. Unpaired UTF-16 surrogates in OCR/import pages crashed JSON serialization
  5. No per-cross timeout — individual crosses could hang indefinitely

Implemented Fixes

P1: Far set cap (domain-bank.ts)

  • Shuffle candidate prefixes, slice to maxFarSet (default max(m*4, 50)) before SQL
  • Final trim to m by distance score
  • Bill now scales with m, not |prefixes|
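
The cap described above can be sketched roughly as follows. This is a minimal illustration, not the code in domain-bank.ts: the `FarPage` shape and the helpers `shuffle`, `defaultMaxFarSet`, `capPrefixes`, and `trimByDistance` are hypothetical names, and it assumes that "by distance score" means keeping the farthest pages.

```typescript
// Hypothetical shape for illustration; the real types in domain-bank.ts may differ.
interface FarPage {
  slug: string;
  distance: number; // assumed: a higher score means farther from the question
}

// Fisher-Yates shuffle on a copy, with an injectable RNG so callers can make it deterministic.
function shuffle<T>(items: readonly T[], rng: () => number = Math.random): T[] {
  const out = [...items];
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    const tmp = out[i] as T;
    out[i] = out[j] as T;
    out[j] = tmp;
  }
  return out;
}

// Default cap as stated in the description: max(m * 4, 50).
function defaultMaxFarSet(m: number): number {
  return Math.max(m * 4, 50);
}

// Step 1: bound the prefix list before any query runs, so work scales with the cap.
function capPrefixes(
  prefixes: readonly string[],
  m: number,
  maxFarSet: number = defaultMaxFarSet(m),
  rng: () => number = Math.random,
): string[] {
  return shuffle(prefixes, rng).slice(0, maxFarSet);
}

// Step 2: after fetching one page per capped prefix, keep only the m farthest pages.
function trimByDistance(pages: readonly FarPage[], m: number): FarPage[] {
  return [...pages].sort((a, b) => b.distance - a.distance).slice(0, m);
}
```

With roughly 2K prefixes and m = 6, `capPrefixes` would pass at most 50 prefixes on to the query, and `trimByDistance` would then keep 6 pages.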

P2: Cost guardrails (brainstorm.ts + orchestrator.ts)

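  • --max-cost <usd> (default $5): hard-abort pre-run if the estimate exceeds the cap, and mid-run if running cost overshoots it
  • --strict-budget: abort mid-run if spend exceeds 5× the initial estimate (warn-only without the flag)
  • --max-far-set <n> (default max(m*4, 50)): explicit cap on the far set
  • --judge-model <id>: route the judge phase to a larger-context model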
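
One way the two budget gates could fit together is sketched below. The names `BudgetOptions`, `BudgetExceededError`, `assertEstimateWithinBudget`, and `checkRunningCost` are assumptions made for illustration and are not taken from brainstorm.ts or orchestrator.ts; only the thresholds (the --max-cost cap and the 5× multiplier) come from the description.

```typescript
// Hypothetical option names mirroring the flags; not the actual CLI types.
interface BudgetOptions {
  maxCostUsd: number;    // --max-cost (default 5)
  strictBudget: boolean; // --strict-budget
}

class BudgetExceededError extends Error {}

// Pre-run gate: refuse to start when the estimate already exceeds the cap.
function assertEstimateWithinBudget(estimateUsd: number, opts: BudgetOptions): void {
  if (estimateUsd > opts.maxCostUsd) {
    throw new BudgetExceededError(
      `estimate $${estimateUsd.toFixed(2)} exceeds --max-cost $${opts.maxCostUsd.toFixed(2)}`,
    );
  }
}

// Mid-run gate, intended to be called after each LLM call with the running total.
// Throws when the hard cap is passed, or when spend passes 5x the estimate under
// --strict-budget. Otherwise returns a warning string (5x passed, not strict) or null.
function checkRunningCost(spentUsd: number, estimateUsd: number, opts: BudgetOptions): string | null {
  if (spentUsd > opts.maxCostUsd) {
    throw new BudgetExceededError(
      `spent $${spentUsd.toFixed(2)}, over --max-cost $${opts.maxCostUsd.toFixed(2)}`,
    );
  }
  if (spentUsd > 5 * estimateUsd) {
    const message = `spent $${spentUsd.toFixed(2)}, over 5x the $${estimateUsd.toFixed(2)} estimate`;
    if (opts.strictBudget) throw new BudgetExceededError(message);
    return message;
  }
  return null;
}
```

Under these assumptions, the incident's $0.96 estimate would pass the pre-run gate, and the run would be stopped once spend crossed $5 rather than reaching $50.71.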

P3: Judge chunking (judges.ts)

  • Split ideas into batches of 100 (configurable via --max-ideas-per-judge-call)
  • Each batch is a separate LLM call; results are concatenated
  • 15,868 ideas → 159 calls of up to 100 ideas each, instead of one multi-million-token call
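
The batching step can be illustrated with a small sketch. `chunk` and `judgeInBatches` are hypothetical helpers, and `judgeBatch` stands in for a single judge LLM call; the actual signatures in judges.ts are not shown in this PR description. Running batches sequentially is also an assumption.

```typescript
// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Judge each batch with its own call and concatenate the verdicts in order.
// `judgeBatch` is a placeholder for one LLM request; batches run one at a time here.
async function judgeInBatches<I, V>(
  ideas: readonly I[],
  judgeBatch: (batch: I[]) => Promise<V[]>,
  maxIdeasPerCall: number = 100,
): Promise<V[]> {
  const verdicts: V[] = [];
  for (const batch of chunk(ideas, maxIdeasPerCall)) {
    verdicts.push(...(await judgeBatch(batch)));
  }
  return verdicts;
}
```

For 15,868 ideas and a batch size of 100, `chunk` yields 158 full batches plus one final batch of 68, which matches the 159 calls noted above. At ~350 tokens per idea, each batch carries roughly 35K tokens of ideas.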

P4: Unicode sanitization (orchestrator.ts)

  • Strip unpaired UTF-16 surrogates before building cross prompts
  • Prevents JSON-encoding crashes on OCR/import-derived pages
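
A minimal sketch of surrogate stripping, assuming the goal is to drop lone surrogates while leaving valid pairs (such as emoji) intact. `stripLoneSurrogates` is a hypothetical name; the helper actually used in orchestrator.ts may differ, for example by substituting U+FFFD instead of deleting.

```typescript
// First alternative matches a valid high+low surrogate pair; second matches any
// remaining surrogate code unit, which must therefore be unpaired.
const SURROGATE_PAIR_OR_LONE = /[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDFFF]/g;

// Keep valid pairs (match length 2) and delete lone surrogates (match length 1).
function stripLoneSurrogates(text: string): string {
  return text.replace(SURROGATE_PAIR_OR_LONE, (match) => (match.length === 2 ? match : ""));
}
```

Applying this to page bodies, titles, and the question before prompt assembly keeps the text well-formed without touching legitimate non-BMP characters.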

Postmortem

The full incident report, with token-flow forensics and architectural proposals (global budgets for all analysis functions, diarization, checkpointing), is in docs/incidents/2026-05-20-lsd-cost-explosion.md.

Proposed Future Work

  • P5: Global token/time/cost budgets for ALL gbrain analysis functions (brainstorm, dream, extract, enrich, eval, integrity, doctor)
  • P6: Diarization — summarize oversized payloads to fit context instead of failing
  • P7: Structured error recovery with checkpointing for interrupted runs
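
To make the P5 idea concrete, here is one possible shape for a shared budget that covers tokens, cost, and wall-clock time and degrades gracefully by returning partial results instead of failing. Everything in it (`Limits`, `RunBudget`, `runWithBudget`, `StepOutcome`) is a hypothetical sketch, not a proposed gbrain API, and `step` is synchronous only to keep the example short; real LLM calls would be async.

```typescript
type Exhausted = "tokens" | "cost" | "time" | null;

interface Limits {
  maxTokens: number;
  maxCostUsd: number;
  maxWallClockMs: number;
}

// Tracks usage against all three limits; the clock is injectable for testing.
class RunBudget {
  private tokens = 0;
  private costUsd = 0;
  private readonly limits: Limits;
  private readonly now: () => number;
  private readonly startedAt: number;

  constructor(limits: Limits, now: () => number = Date.now) {
    this.limits = limits;
    this.now = now;
    this.startedAt = now();
  }

  record(tokens: number, costUsd: number): void {
    this.tokens += tokens;
    this.costUsd += costUsd;
  }

  // Returns which limit has been reached, or null if none has.
  exhausted(): Exhausted {
    if (this.tokens >= this.limits.maxTokens) return "tokens";
    if (this.costUsd >= this.limits.maxCostUsd) return "cost";
    if (this.now() - this.startedAt >= this.limits.maxWallClockMs) return "time";
    return null;
  }
}

interface StepOutcome<R> {
  result: R;
  tokens: number;
  costUsd: number;
}

// Processes items until one limit is reached, then returns what was completed.
// stoppedBy is null when every item was processed.
function runWithBudget<T, R>(
  items: readonly T[],
  budget: RunBudget,
  step: (item: T) => StepOutcome<R>,
): { results: R[]; stoppedBy: Exhausted } {
  const results: R[] = [];
  for (const item of items) {
    const reason = budget.exhausted();
    if (reason !== null) return { results, stoppedBy: reason };
    const outcome = step(item);
    budget.record(outcome.tokens, outcome.costUsd);
    results.push(outcome.result);
  }
  return { results, stoppedBy: null };
}
```

The design choice here is to check the budget before each unit of work, so an exhausted run still hands back the results it already paid for.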

Tests

  • 12 new tests in test/brainstorm/cost-guardrails.test.ts
  • Full brainstorm suite: 82 pass, 0 fail
  • tsc --noEmit: clean

root added 2 commits May 20, 2026 16:08
- Add --max-cost flag (default $5) to brainstorm/lsd commands; hard-aborts
  pre-run if estimate exceeds, and mid-run if running cost overshoots.
- Add --max-far-set flag (default max(m*4, 50)) to cap the domain bank's
  prefix-stratified sampling. listPrefixSampledPages returns one page per
  prefix; on a 13K-page brain with ~2K distinct prefixes this was pulling
  ~1985 far pages instead of the configured m=6. fetchFar now shuffles +
  caps the prefix list, and trims final pages to m by distance score.
- Add --strict-budget flag: abort mid-run if running cost exceeds 5x the
  initial estimate (warn-only by default).
- Chunk the judge phase (default 100 ideas per LLM call, --max-ideas-per-judge-call
  to override). Large brain runs produced 15K+ ideas, blowing past the
  model's 1M-token context in a single call. Now batched and concatenated.
- Add --judge-model flag for routing the judge phase to a larger-context
  model when needed.
- Sanitize unpaired UTF-16 surrogates in cross-prompt content (close+far
  page bodies, titles, question) to prevent JSON-encoding crashes on
  OCR/import-derived pages with lone surrogates.

Fixes: 53x cost overrun on 13K-page brain ($0.96 estimate vs $50.71 actual)
Fixes: judge phase 3M-token overflow > 1M model context
Fixes: 1985-page far set when m_far was configured at 6
Incident report covering:
- Root cause analysis (5 contributing factors)
- Observed token flow and cost breakdown
- Implemented fixes (P1-P4) in dc080ac
- Proposed architectural changes:
  - P5: Global token/time/cost budgets for ALL analysis functions
  - P6: Diarization/summarization for oversized payloads
  - P7: Structured error recovery with checkpointing

Key insight: every gbrain analysis function that makes LLM calls
needs configurable budgets (tokens, cost, wall-clock time) with
graceful degradation on exhaustion.