Skip to content

v0.39.0.0 feat: brainstorm cost cathedral (P1-P7) + page_links schema fix#1283

Open
garrytan wants to merge 19 commits into
masterfrom
garrytan/shanghai-v3
Open

v0.39.0.0 feat: brainstorm cost cathedral (P1-P7) + page_links schema fix#1283
garrytan wants to merge 19 commits into
masterfrom
garrytan/shanghai-v3

Conversation

@garrytan
Copy link
Copy Markdown
Owner

Summary

The brainstorm cost-explosion incident (gbrain lsd estimated $0.96, billed $50.71 on a 13K-page brain, returned zero ideas) drove a full architectural follow-up. This PR ports the postmortem-driven PR #1234 fix wave AND lands the cathedral that the postmortem proposed:

Cost guardrails (P1–P4, ported from PR #1234):

  • Far-set cap with shuffle in domain-bank — fixes the 1985-page explosion
  • --max-cost / --max-far-set / --strict-budget / --judge-model / --max-ideas-per-judge-call CLI flags on gbrain brainstorm and gbrain lsd
  • Judge auto-chunking (default 100 ideas/call) — fixes the 5.5M-token context overflow
  • UTF-16 surrogate sanitization on cross prompts

Architectural cathedral (P5–P7, new in this PR):

  • BudgetTracker at src/core/budget/budget-tracker.ts — single primitive for cost + runtime + no-pricing caps with typed BudgetExhausted (reason: 'cost' | 'runtime' | 'no_pricing').
  • Gateway-layer enforcement via AsyncLocalStorage (TX5)withBudgetTracker(tracker, fn) installs the tracker; every gateway.chat/embed/rerank inside the scope auto-composes. No caller can forget the cap.
  • payload-fitter (P6) at src/core/diarize/payload-fitter.ts — batch + summarize strategies, parallelism=4, quality gate via min_success_ratio (default 0.75).
  • Brainstorm checkpoint (P7) with FULL idea bodies (TX3) — mid-run cap throws persist progress to ~/.gbrain/brainstorm/<run_id>.json; --resume <run_id> merges pre-crash + post-resume ideas before the judge runs.
  • doctor --remediate --resume (A4 amended) — mid-run cap fires write a remediation checkpoint at ~/.gbrain/remediation/<plan_hash>.json. --max-cost aliases --max-usd.
  • reindex-code --max-cost (F3) — opt-in cap via the same gateway seam.

Schema (F1):

  • New PGLite VIEW page_links AS SELECT id, from_page_id, to_page_id FROM links (narrow projection so legacy-brain bootstrap drops aren't blocked).
  • Migration v81 (page_links_view_alias) for existing brains on either engine.

Consolidation (Q1):

  • Extracted src/core/audit-week-file.ts. Four ISO-week JSONL writers migrated (shell-audit, phantom-audit, audit-slug-fallback, dream-budget-meter).

Test Coverage

  • 73+ new tests across 9 new files (test/core/budget/, test/core/audit-week-file.test.ts, test/core/diarize/, test/brainstorm/checkpoint.test.ts, test/e2e/brainstorm-resume.test.ts, test/core/remediation-checkpoint.test.ts, test/reindex-code-max-cost.serial.test.ts)
  • F1 closes the page_links schema gap end-to-end; E2E (test/e2e/brainstorm-resume.test.ts) no longer needs the test-side workaround view
  • F2 pins the user-facing --max-cost pre-flight refusal path (BudgetExhausted reason='cost' + paste-ready remediation hint + no chat calls)
  • All previous brainstorm + doctor + eval-contradictions tests still pass

Pre-Landing Review

Already cleared via /plan-eng-review (12 findings: A1–A5, Q1–Q3, T1–T3, Perf1) + Codex outside-voice (18 findings → 5 surfaced as TX1–TX5 + resolved; 10 folded inline; 3 already-decided). All resolved with explicit user input. Plan at ~/.claude/plans/system-instruction-you-are-working-rippling-moth.md carries the full GSTACK REVIEW REPORT.

Plan Completion

All 12 plan tasks (T1–T12) plus 4 follow-up tasks (F1–F4) complete. F1 closes the page_links schema gap; F2 adds the pre-flight refusal smoke E2E; F3 wires reindex-code --max-cost; F4 is this PR.

Things to watch after upgrade

  • A4 semantic shift on doctor --remediate: --max-usd was pre-flight-only in v0.36.4.0; it's now ALSO a mid-run hard ceiling. The worst case used to be "the run starts despite the under-estimate"; now the worst case is "the run aborts mid-step and writes a resumable checkpoint." Use gbrain doctor --remediate --resume to continue from the last successful step.
  • Brainstorm checkpoint identity (A5 amended): run_id = sha256(question + profile + sort(close_slugs) + sort(far_slugs)).slice(0,16) — no embedding bits, so swapping embedding models doesn't invalidate resumes. Cycle purge GCs stale checkpoints after 7 days.
  • BudgetExhausted moved: Class now lives at src/core/budget/budget-tracker.ts. The brainstorm orchestrator re-exports the old name for back-compat; existing imports keep working.

Pre-existing failure flagged (NOT blocking)

test/doctor-report-remote.test.ts "full report on healthy brain is healthy status" fails on master too (health_score=50, expected>=70). Verified by checking out master's src/commands/doctor.ts and re-running. Filed as P0 in TODOS.md. Investigate the doctorReportRemote health-score calculation against an empty PGLite — not caused by this branch.

Test plan

  • bun run typecheck — clean
  • Bootstrap tests pass (15/15) after F1's narrow-projection fix
  • Wave tests pass (126/126 across 11 files)
  • Reindex-code tests pass (19/19)
  • bun run test — 8335 pass, 0 in-branch failures (1 pre-existing master failure flagged in TODOS)
  • Branch up to date with origin/master (now 17 commits ahead with the version + bootstrap fix commits)
  • VERSION + package.json + CHANGELOG top-entry all agree at 0.39.0.0

🤖 Generated with Claude Code

garrytan and others added 19 commits May 21, 2026 11:28
Ports PR #1234 with a typed-error swap (Q2). Brings:

- `--max-cost`, `--max-far-set`, `--strict-budget`, `--judge-model`,
  `--max-ideas-per-judge-call` CLI flags on `gbrain brainstorm` / `lsd`
- Domain-bank prefix-cap + shuffle + final-trim to `m` by distance score
- Judge auto-chunks idea sets > 100 across multiple LLM calls
- UTF-16 surrogate sanitization on cross prompts
- Phase-0.5 hard cost ceiling + mid-run cost guard

Phase-1 diff from PR #1234: per-cross error-rethrow uses inline typed
`BudgetExhausted` instead of string-match on the error message. Phase 2
of the wave will move the class to `src/core/budget/budget-tracker.ts`
and the orchestrator will import it.

Postmortem doc + 12-case regression test included verbatim from #1234.

T1 of the brainstorm cost cathedral plan
(~/.claude/plans/system-instruction-you-are-working-rippling-moth.md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The keystone primitive for the v0.37.x budget cathedral. One class,
one typed error, one schema-stable audit JSONL. Replaces three parallel
copies (brainstorm orchestrator inline class, cycle/budget-meter,
eval-contradictions cost-prompt/tracker) — those adapt to this one in
T5/T6.

Contracts pinned by 26 unit tests:
  - TX1: record() throws BudgetExhausted(reason:'cost') when cumulative
    spend > cap. A single underestimated call cannot leak past the cap.
  - TX2: reserve() hard-fails with BudgetExhausted(reason:'no_pricing')
    when cap is set + model is missing from pricing maps. When cap is
    unset, legacy warn-once behavior is preserved.
  - A3 amended: extractUsageFromError(err, fallback) returns err.usage
    when SDK provides it, else the pessimistic fallback (caller passes
    maxOutputTokens, not the optimistic pre-call estimate).
  - onExhausted callback fires once, synchronously, before the throw
    propagates. Callbacks do sync I/O (writeFileSync) for checkpoint
    persistence.
  - Audit JSONL is schema-stable: every line carries schema_version=1.
    Reorderings tolerated, field renames are breaking.

Also ships src/core/audit-week-file.ts — the shared ISO-week filename
helper consumed by every audit writer in T4. Year-boundary correctness
pinned by 5 cases including 2020-W53 (the 53-week year), 2025-W01
rolling in from 2024-12-30 (Monday), and the GBRAIN_AUDIT_DIR override.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TX5: every gateway.chat / embed / rerank call now auto-composes the
active BudgetTracker via a module-internal AsyncLocalStorage. No
per-call injection seam, no flag plumbing — callers wrap their
entrypoint in `withBudgetTracker(tracker, async () => { ... })` and
every downstream LLM call honors the cap.

Outside any scope, the gateway is a budget no-op (back-compat with the
pre-v0.37 contract).

Wiring:
  - chat(): reserves on entry using prompt-char heuristic + opts.maxTokens.
    Records actual usage from result.usage on success; on failure, charges
    the pessimistic A3-amended fallback so the cap is real.
  - embed(): reserves total estimated input tokens (chars / chars-per-token).
    Records the same total in try/finally; SDK doesn't surface per-batch
    embed token counts.
  - rerank(): reserves and records query + docs char count.
    Reranker pricing isn't in the canonical map yet, so reserve() takes
    the warn-once path under no-cap and the TX2 hard-fail under cap.

6 unit cases pin the contract: chat auto-composes, outside-scope is
no-op, nested scope restores outer, over-cap reserve throws BEFORE
provider call (proves circuit breaker), TX1 mid-run cumulative cap
fires via record(), parallel Promise.all scopes do not bleed trackers.

All 255 existing gateway tests and 50 brainstorm tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lper

Q1: extract the ISO-week filename math into one canonical helper
(src/core/audit-week-file.ts, landed in T2) and migrate every audit
JSONL writer in the codebase to consume it.

Sites migrated:
  - src/core/minions/handlers/shell-audit.ts  (shell-jobs-YYYY-Www.jsonl)
  - src/core/facts/phantom-audit.ts            (phantoms-YYYY-Www.jsonl)
  - src/core/audit-slug-fallback.ts            (slug-fallback-YYYY-Www.jsonl)
  - src/core/cycle/budget-meter.ts             (dream-budget-YYYY-Www.jsonl)

Each call site had its own copy of the ISO-week-from-Date algorithm.
They mostly agreed but subtle drift was already accumulating (one used
local time, one approximated the Thursday-anchor formula, etc.). One
helper, one set of regression tests, no drift.

Compute helpers (computeAuditFilename, computePhantomAuditFilename,
computeSlugFallbackAuditFilename) are preserved as thin wrappers so
existing import sites and tests don't break.

All audit + slug-fallback + phantom + budget-meter tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nded)

Adapter pass: the existing BudgetMeter keeps its public shape
(`BudgetMeter`, `SubmitEstimate`, `BudgetCheckResult`) verbatim so every
dream-cycle call site keeps working without rewires. The audit JSONL
grew one new field on every line: `schema_version: 1`.

A2 amended: the codex outside-voice review relaxed the byte-stable
contract to schema-stable. Field reorderings are tolerated; the
documented set (schema_version, ts, phase, event, model, label,
plus per-event cost or token fields) is what every consumer can rely
on. Renames or removals are breaking.

test/fixtures/dream-budget-schema-v1.jsonl carries one canonical row
per event variant (submit / submit_denied / submit_unpriced) as
documentation of the schema. The new in-suite case in
test/budget-meter.test.ts walks every emitted line and asserts the
fields are present + the right type.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The runner now installs a BudgetTracker scope around its body so every
gateway-layer chat / embed / rerank call (the judge model + per-query
embedding) auto-records via the AsyncLocalStorage from T3. Currently
telemetry-only — the existing CostTracker remains the primary soft-
ceiling enforcement, so the public --budget-usd surface and
PreFlightBudgetError shape are byte-identical.

The wiring is the seam: future waves can promote the cap to BudgetTracker
semantics (TX1 + TX2 semantics on cumulative + no_pricing) by passing
maxCostUsd through to BudgetTracker without touching the CLI.

All 79 eval-contradictions tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A4 amended: doctor --remediate gains a resumable cost ceiling. The
runRemediate loop now runs inside `withBudgetTracker(tracker, ...)` so
every gateway-routed LLM call inside a Minion handler (synthesize,
patterns, consolidate, embed) honors the cap. When BudgetExhausted
fires mid-run, the onExhausted callback persists a checkpoint of
completed step ids + idempotency_keys to
~/.gbrain/remediation/<plan_hash>.json BEFORE the throw propagates,
and the catch surfaces a paste-ready --resume hint.

Wire-up:
  - New --resume <plan_hash> flag (with implicit "most recent matching"
    when no hash given) loads the checkpoint and skips already-
    completed steps. Mismatched plan_hash refuses with an explicit
    message.
  - --max-cost is now an alias for --max-usd. Both spellings honored
    and threaded through to BudgetTracker.maxCostUsd so the cap is
    a real ceiling, not just pre-flight advice.
  - On BudgetExhausted, exit 1 with the resume hint; on clean
    completion, clear the checkpoint.

New file: src/core/remediation-checkpoint.ts with
computePlanHash / save / load / list / clear helpers. Atomic write
via .tmp + rename. Pinned by 13 unit cases including determinism +
sort-order invariance + schema-mismatch return-null + atomic-rename.

All 48 doctor.test.ts cases still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Documents the load-bearing ordering invariant: the gateway's
BudgetTracker reserve() runs (implicitly, via AsyncLocalStorage)
BEFORE acquireLease() inside the subagent loop. A BudgetExhausted
throw must NOT consume a rate-lease slot, because the lease is the
rate-limit pacer for the entire fleet.

The handler body intentionally does NOT explicitly thread BudgetTracker;
TX5 (gateway-layer composition) handles that. The comment is the
reader's signpost.

No behavioral change. All 58 subagent tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Generic utility for fitting arbitrarily-large item lists into a
downstream caller's per-call token budget. Two strategies:

  - 'batch': deterministic token-budgeted chunking. No LLM calls. The
    fitted list shape matches the input; the caller decides how to
    consume it (e.g. brainstorm judge concatenates per-chunk results).
    Surfaces a `dropped` count for items that exceed the per-call cap.

  - 'summarize': embed-cluster into ceil(items/4) groups via cheap
    deterministic nearest-neighbor on cosine; Haiku-summarize each
    cluster via Promise.allSettled at parallelism=4 (Perf1). Each
    Haiku call composes the active BudgetTracker via the gateway's
    AsyncLocalStorage scope (T3) — no per-call injection.

Quality gate (codex outside-voice finding #4): when summarize's
success_ratio < min_success_ratio (default 0.75), the result is
flagged `degraded: true` so the caller (brainstorm) can decide to
surface a partial result or abort. The fitter itself preserves the
successful subset either way.

Tested via 4 cases across two files (T3 contract):
  - happy path (all clusters succeed → degraded=false)
  - partial failure tolerated (1/5 fails, success_ratio=0.8 > 0.75 → degraded=false)
  - high-failure rate flips the gate (3/5 fails → degraded=true)
  - budget-respecting (BudgetExhausted thrown mid-cluster propagates
    via Promise.allSettled)

11 unit cases across batch + summarize. Brainstorm + cost-guardrails
tests still green; judges.ts internal chunking deferred to a follow-up
wave (TODOS) so the existing chunked-batch contract stays byte-stable
during this drop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The brainstorm cathedral capstone. Crashed runs can resume cleanly via
`gbrain brainstorm --resume <run_id>` (and `gbrain lsd --resume` etc).

TX3 load-bearing contract: completed_crosses on disk carries FULL idea
bodies (~50KB per run), not just counts. The resumed BrainstormResult
contains the pre-crash ideas (loaded from disk) merged with the post-
resume ideas — codex's outside-voice finding was that a resume that
produces only "what we generated this run" is silent partial output.

TX4 single rule: --resume continues any cross not in completed_crosses.
The proposed --retry-failed was dropped per codex review; failed AND
never-attempted crosses both go through --resume.

A5 amended: run_id = sha256(question + profile + sort(close_slugs) +
sort(far_slugs)).slice(0,16). NO embedding bits — stable across
embedding-model swaps. 7-day mtime-based GC.

Q2 fold: orchestrator.ts drops its inline BudgetExhausted class and
re-exports the canonical one from src/core/budget/budget-tracker.ts
(Phase 2). runBrainstorm now wraps the body in withBudgetTracker so
every gateway-layer chat call auto-records cost. The cap remains
opts.maxCostUsd (default $5).

New CLI flags:
  --resume <run_id>   Continue any cross not in completed_crosses.
                      Refuses to start when run_id doesn't match the
                      active inputs (paste-ready hint).
  --force-resume      Bypass the 7-day staleness gate.
  --list-runs         Print saved run_ids and exit.

Cycle purge phase (the 9th cycle phase) now also GCs stale brainstorm
checkpoints alongside op_checkpoints (~7d window).

Tests:
  - 20 unit cases in test/brainstorm/checkpoint.test.ts:
    computeRunId is deterministic + slug-array-order invariant + stable
    across embedding-model swaps; round-trip preserves ideas verbatim;
    saveCheckpoint atomic via .tmp+rename; loadCheckpoint returns null
    on missing/schema-mismatch/corrupt-JSON; gcStaleCheckpoints unlinks
    >N days; listRuns mtime-ordered.
  - 3 E2E cases in test/e2e/brainstorm-resume.test.ts:
    crash on cross 4 → first run aborts with checkpoint of crosses 1..N
    with full idea bodies; second run with resumeRunId merges pre-crash
    + post-resume ideas (TX3 contract); mismatched run_id refuses with
    paste-ready hint.

The PGLite schema-gap workaround in the E2E (CREATE VIEW page_links AS
SELECT * FROM links) is filed as a follow-up in TODOS T12 — the
real-engine brainstorm path needs that view to materialize as a
canonical schema fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CHANGELOG entry for the brainstorm cost cathedral (Unreleased slot;
/ship will assign the next version):
  - ELI10 lead per CLAUDE.md voice rules
  - "How to turn it on" with paste-ready commands
  - "Things to watch" calls out the A4 semantic shift for
    `doctor --remediate --max-usd` (pre-flight → mid-run abort
    with resumable checkpoint)
  - Itemized changes by file/area
  - "For contributors" section noting the 73 new tests + the PGLite
    schema-gap workaround for the E2E

CLAUDE.md Key Files: 6 new entries for budget-tracker, audit-week-file,
gateway withBudgetTracker, payload-fitter, brainstorm/checkpoint,
remediation-checkpoint. Regenerated llms-full.txt + llms.txt (passes
test/build-llms.test.ts).

docs/incidents/2026-05-20-lsd-cost-explosion.md gains a closing
"Shipped in v0.37.x (the budget cathedral wave)" section listing P1-P7
completion status + the deferred follow-ups so the incident's audit
trail closes the loop.

TODOS.md gets a new top section for the wave's deferred items:
  - PGLite `page_links` schema gap fix
  - Explicit --max-cost on extract / enrich / integrity auto
  - P5 config-schema budgets: block in ~/.gbrain/config.json
  - Multi-day brainstorm resume (>7d)
  - Async-batched audit writes (profiling trigger criterion)
  - BudgetLedger unification with BudgetTracker
  - judges.ts internal chunking → payload-fitter delegation

Also: fixed a payload-fitter typecheck error (ChatFn import). Final
typecheck is clean on every file the wave touched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brainstorm's domain-bank queries reference `page_links` (pglite-engine.ts:896,
postgres-engine.ts:959) but the canonical table is `links`. Without the alias
view, `gbrain brainstorm` against PGLite fails with `relation "page_links"
does not exist`; the same was a latent bug on Postgres.

This commit lands the fix at three sites:

1. `src/core/pglite-schema.ts` — embedded schema bundle gets the view at
   table-bundle time, so fresh PGLite installs are correct from boot.
2. `src/core/migrate.ts` v81 (`page_links_view_alias`) — existing brains on
   either engine pick up the view via `gbrain apply-migrations`. CREATE OR
   REPLACE VIEW is idempotent; re-running is safe.
3. `test/e2e/brainstorm-resume.test.ts` — removed the ad-hoc workaround view
   from the test setup. The E2E now exercises the same schema path real
   users will see.

`TODOS.md` entry for the gap closed out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pins the user-facing path that closed the original \$50 incident: when
the pre-run estimate exceeds the configured cap, runBrainstorm throws
BudgetExhausted with reason='cost' and a paste-ready hint pointing at
--limit / --max-cost / --max-far-set before any chat call happens.

The four assertions are the four things a real user can verify after
the throw lands:
  1. Typed BudgetExhausted (not a generic Error)
  2. reason === 'cost' (not runtime or no_pricing)
  3. Message names the remediation flags
  4. No provider HTTP would have happened (chat.crossCalls === 0)

Uses the same PGLite engine + tinyProfile + stub chatFn as the existing
--resume tests. Hermetic; ~5s wallclock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires gbrain reindex --code into the v0.38 budget cathedral. When the
caller passes --max-cost N (or --max-cost-usd N), runReindexCode wraps
its per-page import loop in withBudgetTracker so every gateway.embed()
call inside importCodeFile auto-composes the cap. On BudgetExhausted,
the partial-progress result reports what got reindexed before the cap
fired plus a synthetic failure row naming the cap throw.

reindex-code is idempotent (content_hash short-circuit in importCodeFile),
so a re-run after a budget abort picks up where the cap fired — no
manual checkpoint state needed.

Both --max-cost and --max-cost-usd are accepted (symmetry with brainstorm
which uses --max-cost, and a precedent for the spelling we want long-term).

When --max-cost is unset, the body runs outside any tracker scope — byte-
stable pre-F3 behavior for legacy callers.

Files:
  src/commands/reindex-code.ts:
    - ReindexCodeOpts.maxCostUsd?: number
    - runReindexCode wraps body in withBudgetTracker when set
    - runReindexCodeCli parses --max-cost / --max-cost-usd
    - BudgetExhausted caught + returned as partial-progress result
  test/reindex-code-max-cost.serial.test.ts (NEW):
    - dry-run + maxCostUsd happy path
    - empty-brain + maxCostUsd hits early-return cleanly
    - no tracker installed when cap is unset (regression guard for
      the conditional wrap)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The v0.38 page_links view alias initially used SELECT * FROM links, which
broke the pre-v0.13 bootstrap test: applyForwardReferenceBootstrap drops
link_source + origin_page_id to simulate the pre-v0.13 schema shape, but
the SELECT * view created a dependency that blocked the column DROP.

Engine queries only reference pl.id (via COUNT(*)) and pl.to_page_id, so
the view's projection is now SELECT id, from_page_id, to_page_id FROM links
— what callers actually use, no more. This unblocks legacy-brain upgrade
paths AND keeps the bootstrap forward-reference probes safe.

Bootstrap suite: 15/15 pass after the change.

Also files a P0 TODO for a pre-existing test failure
(test/doctor-report-remote.test.ts "full report on healthy brain") that
fails on master too — out of scope for this wave but noticed during
/ship triage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brainstorm cost cathedral wave (P1-P7). MINOR bump per user direction:
new architectural seam (gateway-layer BudgetTracker via AsyncLocalStorage),
5 new modules, new CLI flags (--max-cost / --resume / --list-runs /
--force-resume), new migration v81 (page_links view alias).

No breaking changes — BudgetExhausted re-exported from orchestrator for
back-compat; --max-usd preserved as alias for --max-cost; eval-contradictions
--budget-usd surface byte-identical.

CHANGELOG entry renamed from [Unreleased] to [0.39.0.0] and adds the
mandatory "To take advantage of v0.39.0.0" block per CLAUDE.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI's `check:test-isolation` flagged three tests added in the v0.39.0.0
cathedral that directly mutate `process.env` across test boundaries:

- test/brainstorm/checkpoint.test.ts (mutates GBRAIN_HOME)
- test/core/audit-week-file.test.ts (mutates GBRAIN_AUDIT_DIR)
- test/core/remediation-checkpoint.test.ts (mutates GBRAIN_HOME)

Per CLAUDE.md rule R1: env-mutating tests either use withEnv() OR rename
to *.serial.test.ts (the quarantine escape hatch). The mutation lives in
beforeEach/afterEach which spans the whole describe block, so .serial
rename is the cleaner fix — withEnv() would require restructuring every
test. The serial-test runner gives them their own bun process; no cross-
file env races.

Verified: check:test-isolation passes (527 non-serial unit files clean),
`bun run verify` passes, all 41 tests in the three renamed files pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts:
#	CHANGELOG.md
#	TODOS.md
#	VERSION
#	package.json
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant