fix(brainstorm): cost guardrails + judge overflow + far set cap #1234
Open
garrytan-agents wants to merge 2 commits into
added 2 commits
May 20, 2026 16:08
- Add --max-cost flag (default $5) to brainstorm/lsd commands; hard-aborts pre-run if estimate exceeds, and mid-run if running cost overshoots.
- Add --max-far-set flag (default max(m*4, 50)) to cap the domain bank's prefix-stratified sampling. listPrefixSampledPages returns one page per prefix; on a 13K-page brain with ~2K distinct prefixes this was pulling ~1985 far pages instead of the configured m=6. fetchFar now shuffles + caps the prefix list, and trims final pages to m by distance score.
- Add --strict-budget flag: abort mid-run if running cost exceeds 5x the initial estimate (warn-only by default).
- Chunk the judge phase (default 100 ideas per LLM call, --max-ideas-per-judge-call to override). Large brain runs produced 15K+ ideas, blowing past the model's 1M-token context in a single call. Now batched and concatenated.
- Add --judge-model flag for routing the judge phase to a larger-context model when needed.
- Sanitize unpaired UTF-16 surrogates in cross-prompt content (close+far page bodies, titles, question) to prevent JSON-encoding crashes on OCR/import-derived pages with lone surrogates.

Fixes: 53x cost overrun on 13K-page brain ($0.96 estimate vs $50.71 actual)
Fixes: judge phase 3M-token overflow > 1M model context
Fixes: 1985-page far set when m_far was configured at 6
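The far-set change in this commit (shuffle and cap the prefix list, then trim to m by distance) can be sketched roughly as below. This is a minimal illustration, not the code in domain-bank.ts: the `FarPage` shape and the helper names `defaultMaxFarSet`, `shuffled`, `capPrefixes` and `trimToM` are hypothetical, and it assumes that a larger distance score means a "farther" page.

```typescript
// Hypothetical page shape; the real type in domain-bank.ts may differ.
interface FarPage {
  slug: string;
  distance: number;
}

// Default cap quoted in the commit message: max(m*4, 50).
export function defaultMaxFarSet(m: number): number {
  return Math.max(m * 4, 50);
}

// Fisher-Yates shuffle on a copy, so the cap samples prefixes without
// favouring whichever ones happen to sort first.
// rng is injectable to keep tests deterministic.
export function shuffled<T>(
  items: readonly T[],
  rng: () => number = Math.random
): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    const tmp = out[i] as T;
    out[i] = out[j] as T;
    out[j] = tmp;
  }
  return out;
}

// Step 1: shuffle, then cap the prefix list before any query is issued.
export function capPrefixes(
  prefixes: readonly string[],
  m: number,
  maxFarSet: number = defaultMaxFarSet(m),
  rng: () => number = Math.random
): string[] {
  return shuffled(prefixes, rng).slice(0, maxFarSet);
}

// Step 2: trim the fetched pages down to m by distance score.
// Assumption: larger distance = farther, so the m most distant pages are kept.
export function trimToM(pages: readonly FarPage[], m: number): FarPage[] {
  return pages
    .slice()
    .sort((a, b) => b.distance - a.distance)
    .slice(0, m);
}
```

With roughly 2K prefixes and m=6, `capPrefixes` returns at most 50 prefixes and `trimToM` leaves 6 pages, so the far set no longer grows with the number of distinct prefixes.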
Incident report covering:
- Root cause analysis (5 contributing factors)
- Observed token flow and cost breakdown
- Implemented fixes (P1-P4) in dc080ac
- Proposed architectural changes:
  - P5: Global token/time/cost budgets for ALL analysis functions
  - P6: Diarization/summarization for oversized payloads
  - P7: Structured error recovery with checkpointing

Key insight: every gbrain analysis function that makes LLM calls needs configurable budgets (tokens, cost, wall-clock time) with graceful degradation on exhaustion.
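To make the budget idea concrete, here is a minimal sketch of the cost side only, using the thresholds named in the first commit (--max-cost, --strict-budget, 5x the estimate). The names `CostBudget`, `Verdict`, `checkPreRun` and `checkMidRun` are hypothetical and are not taken from brainstorm.ts or orchestrator.ts. Token and wall-clock budgets are left out.

```typescript
// Outcome of a budget check: continue, continue with a warning, or stop.
type Verdict = "ok" | "warn" | "abort";

// Hypothetical budget record; field names are illustrative only.
interface CostBudget {
  estimateUsd: number;   // pre-run cost estimate
  maxCostUsd: number;    // --max-cost (default $5 per the commit message)
  strictBudget: boolean; // --strict-budget
  overrunFactor: number; // 5 per the commit message
}

// Pre-run: refuse to start if the estimate alone exceeds the hard cap.
export function checkPreRun(b: CostBudget): Verdict {
  return b.estimateUsd > b.maxCostUsd ? "abort" : "ok";
}

// Mid-run: the hard cap always aborts; an overrun past
// overrunFactor x estimate aborts only under --strict-budget,
// and is warn-only otherwise.
export function checkMidRun(spentUsd: number, b: CostBudget): Verdict {
  if (spentUsd > b.maxCostUsd) return "abort";
  if (spentUsd > b.overrunFactor * b.estimateUsd) {
    return b.strictBudget ? "abort" : "warn";
  }
  return "ok";
}
```

With the incident's figures, the pre-run check passes ($0.96 is under $5), so only the mid-run check can stop the run. In this sketch it warns once spend passes 5 × $0.96 = $4.80 and aborts once spend passes $5.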
Incident: LSD Brainstorm 53× Cost Overrun
Estimated: $0.96 → Actual: $50.71 on a 13,690-page brain. Zero ideas delivered.
Root Causes
- listPrefixSampledPages returned one page per prefix (~2K prefixes → 1,985 pages instead of configured 12)

Implemented Fixes
P1: Far set cap (domain-bank.ts)
- Shuffle and cap the prefix list at maxFarSet (default max(m*4, 50)) before SQL
- Trim final pages to m by distance score
- Far set size now follows m, not |prefixes|
P2: Cost guardrails (brainstorm.ts + orchestrator.ts)
- --max-cost <usd> (default $5): hard-abort pre-run
- --strict-budget: abort mid-run if spend exceeds 5× estimate
- --max-far-set <n> (default 50): explicit cap
- --judge-model <id>: route judge to larger-context model
P3: Judge chunking (judges.ts)
- Judge ideas in batches (default 100 per LLM call; override with --max-ideas-per-judge-call)
P4: Unicode sanitization (orchestrator.ts)

Postmortem
Full incident report with token flow forensics and architectural proposals (global budgets for all analysis functions, diarization, checkpointing) in docs/incidents/2026-05-20-lsd-cost-explosion.md.

Proposed Future Work

Tests
- test/brainstorm/cost-guardrails.test.ts
- tsc --noEmit: clean
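
For reviewers who want a concrete picture of P3 and P4, the two steps can be sketched as follows. This is an illustration under assumptions, not the code in judges.ts or orchestrator.ts: `chunk`, `judgeInBatches`, `sanitizeSurrogates` and the `judgeBatch` callback are hypothetical names, and replacing lone surrogates with U+FFFD is one possible policy (the PR does not say whether they are replaced or dropped).

```typescript
// P3 sketch: split a list into consecutive batches of at most `size` items.
export function chunk<T>(items: readonly T[], size: number): T[][] {
  if (size < 1 || Math.floor(size) !== size) {
    throw new RangeError("size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Judge each batch with a caller-supplied function (hypothetical; stands in
// for one LLM call) and concatenate the verdicts in input order.
export async function judgeInBatches<I, V>(
  ideas: readonly I[],
  judgeBatch: (batch: I[]) => Promise<V[]>,
  maxIdeasPerCall: number = 100
): Promise<V[]> {
  const verdicts: V[] = [];
  for (const batch of chunk(ideas, maxIdeasPerCall)) {
    const part = await judgeBatch(batch);
    for (const v of part) verdicts.push(v);
  }
  return verdicts;
}

// P4 sketch: replace unpaired UTF-16 surrogates with U+FFFD.
// Valid high+low pairs (e.g. emoji) are copied through unchanged.
export function sanitizeSurrogates(input: string): string {
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const code = input.charCodeAt(i);
    const isHigh = code >= 0xd800 && code <= 0xdbff;
    const isLow = code >= 0xdc00 && code <= 0xdfff;
    if (isHigh) {
      const next = i + 1 < input.length ? input.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        out += input.slice(i, i + 2); // well-formed pair
        i++;
      } else {
        out += "\uFFFD"; // high surrogate with no low partner
      }
    } else if (isLow) {
      out += "\uFFFD"; // low surrogate with no preceding high
    } else {
      out += input.charAt(i);
    }
  }
  return out;
}
```

Batching sequentially keeps each call's prompt size bounded by the batch size. Whether batches should run in parallel, and how verdicts from separate batches are reconciled, is left open here.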