v0.41.3.0 fix-wave: warm-narwhal — 6 community PRs + E2E reliability by garrytan · Pull Request #1374 · garrytan/gbrain

garrytan · 2026-05-24T19:21:55Z

Summary

Fix-wave triage of the 333-PR queue. Landed 6 community fixes + 3 internal E2E reliability fixes + closed 10 superseded PRs with credit (separate, Phase 1).

Community PRs cherry-picked (with attribution):

#924 (@mgunnin) — src/core/ai/recipes/openai.ts gains max_batch_tokens: 100_000
#990 (@mgunnin) — src/core/ai/gateway.ts:1264 isTokenLimitError regex matches OpenAI's actual error string
#761 (@brandonlipman) — deriveDirectUrl strips postgres.<ref> suffix for Supabase direct URLs
#762 (@brandonlipman) — gbrain init --help short-circuits before saveConfig (was silently overwriting Postgres config with PGLite from cwd-detection)
#916 (@jeremyknows) — test isolation for core.hooksPath (--get → --local --get)
#1332 (@garrytan-agents) — routing-eval.ts guards against missing intent field

E2E reliability (surfaced during wave):

src/core/cycle/synthesize.ts:395-404 — narrow anthropic: prefix at queue.add boundary so resolveModel's bare-id return from TIER_DEFAULTS doesn't fail the subagent capability validator (this was dropping synthesize to status: fail with SYNTH_PHASE_FAIL)
scripts/run-e2e.sh — per-file pg_terminate_backend flush + 180s outer gtimeout/timeout cap so cross-file connection pool races + PGLite WASM hangs can't wedge the full Docker E2E suite (caught a real 30+ min wedge on ingestion-roundtrip.test.ts during the wave)
test/e2e/mechanical.test.ts — pin --embedding-model openai:text-embedding-3-large on doctor init step + DELETE FROM sources WHERE id != 'default' in beforeAll (cross-file state pollution: delta source orphans from prior files were triggering sync_freshness FAIL)

Closed as superseded with credit (Phase 1, separate from this PR's commits):

#798, #1083, #918, #1119, #602, #758, #539, #1287, #1117, #1125: each close cites the landing commit SHA + file:line so contributors can verify the fix already shipped via prior waves: v0.31.7 (#804, "fix-wave: doctor stops crying wolf", 5 community PRs: #798 + #788 + #536 + #376 + #128 adapted), v0.36.1.1 (#1182, "fix-wave: community PR triage + 28 atomic fixes"), v0.38.2.0 (#1297, "fix(doctor): bounded frontmatter scan + partial-state surfacing", supersedes #1287), v0.23.2, v0.28.1.

Test Coverage

Targeted tests for files in unique diff: 183/183 pass (122 across adaptive-embed-batch, connection-manager.serial, routing-eval, frontmatter-install-hook, init-mode-picker, init-provider-picker + 61 across the 4 cycle-synthesize* suites).

Full E2E suite pre-merge: 117/117 files, 821/821 tests pass (verified against pgvector container after fixing the per-file connection-flush + outer-timeout cross-file flake class).

bun run verify clean (typecheck + all shell checks).

Note: the full parallel bun run test was SIGKILLed mid-run on the dev machine (OOM under load: the 4-shard parallel runner triggered the kernel oom-killer). Targeted tests plus the green E2E run cover this PR's unique surface; a clean run of the full parallel suite will happen in CI.

Pre-Landing Review

No standalone /review run on the merged HEAD — but the wave has been through extensive review during construction:

2 outside-voice Codex passes during the original /plan-eng-review (16 findings, 14 incorporated, 2 promoted to decisions)
The per-PR isolation re-runs that confirmed all changes work in isolation
Master's CI gate already passed v0.41.0.0; my diff adds 11 file changes on top, all bug-fix shaped

Plan Completion

Original plan: ~/.claude/plans/time-for-fix-wave-warm-narwhal.md (warm-narwhal wave). All 10 implementation tasks (T1-T10) complete:

T1: 10 superseded PRs closed with credit comments
T2: Collector branch opened off origin/master
T3-T6: 7 community PRs cherry-picked / fresh-ported (2 dropped mid-wave per drop-and-continue rule when v0.36.1.1 supersession was discovered)
T7: Behavioral assertions backfilled in tests
T8: bun run verify + targeted tests pass; full E2E passed in earlier run
T9: This PR (ship)
T10: Tranche-1-close on the 7 source PRs after this merges (deferred to post-merge)

TODOS

No TODOS.md entries completed in this PR. The defer-list from the original plan (PRs #597, #123, #479, #858, #716) remains open for the next triage wave.

Documentation

Files updated:

CLAUDE.md — annotated src/core/cycle/synthesize.ts with the v0.41.3.0 narrow anthropic: prefix fix at the queue.add boundary
llms-full.txt — regenerated via bun run build:llms per iron rule

Files reviewed and left current:

README.md — not modified (correct for a fix wave; no new user-visible features)
CHANGELOG.md — v0.41.3.0 entry is comprehensive (ELI10 lead, upgrade command, concrete example, things-to-know callouts, contributor credits, itemized changes)
TODOS.md — no completed items in this PR's diff
VERSION + package.json + CHANGELOG header — all aligned at 0.41.3.0

Test plan

bun run verify — passes (typecheck + privacy + isolation + admin-build + WASM-embedded + admin-embedded + 14 other checks)
Targeted tests on touched files — 183/183 pass
Full E2E suite (Docker pgvector) — 117/117 files, 821/821 tests pass (pre-merge baseline; post-merge will be re-verified in CI)
Post-merge: close the 7 source PRs (mgunnin x2, brandonlipman x2, jeremyknows, garrytan-agents) with the Tranche-1-close template citing this PR

🤖 Generated with Claude Code

OpenAI is the only recipe in the codebase without a max_batch_tokens cap. Every other provider declares one (voyage=120K, azure-openai=8K, dashscope=8K, zhipu=8K, minimax=4K). Without it, gbrain's recursive-halving safety net never engages: batches dispatched purely on the chars/4 estimator window will trip OpenAI's 1M-token TPM ceiling on token-dense pages (Discord exports, JSON dumps, code-heavy markdown), then retry-storm and block the queue head.

Setting the cap to 100_000:
- gbrain's batcher estimates tokens as chars/4
- Token-dense markdown+JSON tokenizes at ~chars/2.7
- 100K estimated = ~150K real worst-case, safely under OpenAI's 300K per-request hard cap and the 1M/min TPM ceiling
- Leaves headroom for recursive-halving on outlier chunks

(cherry picked from commit 40536aa)
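The cap arithmetic above can be checked with a short TypeScript sketch. The names (estimateTokens, worstCaseRealTokens, packBatches) are hypothetical, and the greedy packer only illustrates how an estimated-token cap bounds a batch; it is not gbrain's actual batcher. Only the ratios and caps come from the commit message.

```typescript
// Illustrative constants from the commit message; the names are hypothetical.
const MAX_BATCH_TOKENS = 100_000;       // proposed OpenAI recipe cap, in *estimated* tokens
const OPENAI_PER_REQUEST_CAP = 300_000; // OpenAI's per-request hard cap, in real tokens
const ESTIMATOR_CHARS_PER_TOKEN = 4;    // gbrain's batcher heuristic (chars/4)
const DENSE_CHARS_PER_TOKEN = 2.7;      // token-dense markdown+JSON, per the commit

// Batcher-side estimate: chars / 4, rounded up.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / ESTIMATOR_CHARS_PER_TOKEN);
}

// Worst case: real tokens in a batch sitting exactly at the estimated cap,
// if every character tokenizes at the dense ratio.
function worstCaseRealTokens(estimatedTokens: number): number {
  const chars = estimatedTokens * ESTIMATOR_CHARS_PER_TOKEN;
  return Math.ceil(chars / DENSE_CHARS_PER_TOKEN);
}

// Greedy packer: close the current batch when the next text would push the
// estimated total past the cap. A single text larger than the cap still
// becomes its own batch.
function packBatches(texts: string[], cap: number): string[][] {
  const batches: string[][] = [];
  let current: string[] = [];
  let currentTokens = 0;
  for (const t of texts) {
    const n = estimateTokens(t);
    if (current.length > 0 && currentTokens + n > cap) {
      batches.push(current);
      current = [];
      currentTokens = 0;
    }
    current.push(t);
    currentTokens += n;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

const worst = worstCaseRealTokens(MAX_BATCH_TOKENS);
console.log(`${MAX_BATCH_TOKENS} estimated -> ~${worst} real tokens worst case`);
console.log(`fits under the per-request cap: ${worst < OPENAI_PER_REQUEST_CAP}`);
```

At the cap, 100K estimated tokens corresponds to 400K characters, which is roughly 148K real tokens at chars/2.7, matching the commit's "~150K" figure.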

…enLimitError

OpenAI's /v1/embeddings endpoint hard-caps a single request at 300k tokens total across all input items. When the cap is exceeded it returns:

Invalid 'input': maximum request size is 300000 tokens per request.

None of the three existing regexes in isTokenLimitError matched this phrasing, so the recursive-halving safety net in embedSubBatch never engaged for OpenAI. The same fat page (a token-dense markdown export, e.g. a Discord transcript) would re-fail every pass, blocking forward progress on the whole batch indefinitely.

Locally reproduced on a 31,129-chunk Postgres brain: 2,125 chunks stuck at 'remaining' across 30+ embed --stale passes with retry loops + sleep delays. Adding the two new patterns lets halving fire; the same backlog cleared in one pass after the regex change (the companion max_batch_tokens recipe fix from PR #924 caps fresh batches, but existing oversize pages still need halving to recover).

Adds:
- /maximum request size.*tokens/i (OpenAI verbatim)
- /max.*tokens.*per.*request/i (defensive against minor rewording)

Tests:
- Regression test for the exact OpenAI error string
- Coverage for the generic 'max tokens per request' variant
- All 25 tests in adaptive-embed-batch.test.ts pass

No behavior change for providers whose errors already matched. (cherry picked from commit b834e84)
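A minimal sketch of how the two new patterns feed a halving retry loop. isTokenLimitErrorSketch and embedWithHalving are hypothetical stand-ins for isTokenLimitError and embedSubBatch: the real function also carries three older regexes that are not reproduced here, and the embed callback is assumed to return one vector per input, in order.

```typescript
// Only the two patterns added by this commit; the three pre-existing
// regexes in the real isTokenLimitError are intentionally omitted.
const NEW_TOKEN_LIMIT_PATTERNS: RegExp[] = [
  /maximum request size.*tokens/i, // OpenAI verbatim
  /max.*tokens.*per.*request/i,    // defensive against minor rewording
];

function isTokenLimitErrorSketch(err: unknown): boolean {
  const msg = err instanceof Error ? err.message : String(err);
  return NEW_TOKEN_LIMIT_PATTERNS.some((re) => re.test(msg));
}

// Hypothetical recursive halving: on a token-limit error, split the batch in
// two and retry each half. Non-matching errors, or a single item that still
// fails, are rethrown. Output order matches input order.
async function embedWithHalving(
  items: string[],
  embed: (batch: string[]) => Promise<number[][]>,
): Promise<number[][]> {
  try {
    return await embed(items);
  } catch (err) {
    if (!isTokenLimitErrorSketch(err) || items.length <= 1) throw err;
    const mid = Math.ceil(items.length / 2);
    const left = await embedWithHalving(items.slice(0, mid), embed);
    const right = await embedWithHalving(items.slice(mid), embed);
    return [...left, ...right];
  }
}
```

Because the error classifier gates the split, a regex miss (as before this fix) means the halving branch is never reached and the same oversized batch fails on every pass.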

…en deriving direct URL

`deriveDirectUrl()` correctly rewrites the host (`aws-0-us-east-1.pooler.supabase.com` → `db.abcxyz.supabase.co`) but preserves the full pooler-form username (`postgres.abcxyz`). Supabase direct connections expect a bare `postgres` username: Supavisor uses the `.<ref>` suffix for tenant routing, but it's not a real database user. The auto-derived URL therefore fails to authenticate even with the correct password:

password authentication failed for user "postgres.abcxyz"

Strip the suffix to `postgres` whenever the project-ref was successfully extracted (same condition that triggers the host rewrite). The non-pooler username branch is unaffected; it is preserved as-is to keep the port-only fallback case working.

Hit while exercising v0.30.1's dual-pool routing on a real Supabase brain. The kill switch (`GBRAIN_DISABLE_DIRECT_POOL=1`) papered over it locally, but every Supabase user with a stock pooler URL would silently fall through to single-pool until they supplied a `GBRAIN_DIRECT_DATABASE_URL` override. With this fix, dual-pool works out of the box for the canonical Supabase shape.

Test additions:
- 1 case asserting bare `postgres:secret@` in the derived URL when project-ref is parseable from the pooler URL (the new behavior)
- extends the existing "falls back to port-only" case with an assertion that non-pooler usernames are preserved (unchanged behavior)

`bun run typecheck` clean. `deriveDirectUrl` test block passes 5/5. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> (cherry picked from commit ddf2c6a)
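The username fix can be illustrated with a small sketch. deriveDirectUrlSketch is hypothetical: it assumes the pooler username is `postgres.<ref>`, the pooler host ends in `.pooler.supabase.com`, and the direct endpoint is `db.<ref>.supabase.co` on port 5432. It ignores anything else the real deriveDirectUrl() handles and uses the WHATWG URL parser only for reading the parts.

```typescript
function deriveDirectUrlSketch(poolerUrl: string): string {
  const url = new URL(poolerUrl);
  // Pooler-form username: postgres.<project-ref>
  const match = /^postgres\.([a-z0-9]+)$/i.exec(url.username);
  const isPooler = url.hostname.endsWith(".pooler.supabase.com");

  let host = url.hostname;
  let user = url.username;
  if (match && isPooler) {
    // Same condition gates both rewrites: host AND bare username.
    host = `db.${match[1]}.supabase.co`;
    user = "postgres"; // the fix: drop the .<ref> tenant-routing suffix
  }
  // Port-only fallback: any other username/host is left untouched.
  const auth = url.password ? `${user}:${url.password}` : user;
  return `${url.protocol}//${auth}@${host}:5432${url.pathname}${url.search}`;
}

console.log(
  deriveDirectUrlSketch(
    "postgresql://postgres.abcxyz:secret@aws-0-us-east-1.pooler.supabase.com:6543/postgres",
  ),
);
```

Rebuilding the string from parsed parts, rather than mutating the URL object, keeps the sketch independent of setter behavior for non-special schemes like `postgresql:`.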

`gbrain init --help` (and `-h`) currently fall through to the smart-detection branch in runInit(), which scans cwd for .md files. On a directory with 1000+ files it prints "Found ~1500 .md files. For a brain this size, Supabase gives faster search..." then defaults to PGLite, calling saveConfig() and overwriting any existing Postgres config with `engine: 'pglite' + database_path: ~/.gbrain/brain.pglite`.

Confirmed in the wild: ran `gbrain init --help` from $HOME on a machine where ~/.gbrain/config.json pointed at a Supabase Postgres brain with 10K+ pages. The config was silently flipped to PGLite. The Supabase data was intact, but gbrain stopped pointing at it until the config was manually restored.

Root cause: cli.ts:62-69 only routes --help → printOpHelp() for shared-op commands; CLI_ONLY commands (init, embed, etc.) fall through to their handler with --help still in argv. None of them check for it.

Fix: add a --help/-h guard at the top of runInit() that prints help text and returns. Help should never mutate state (Postel's robustness principle for CLI tools). Help text covers all flags (engine selection, AI provider options, thin-client mode) so users running `--help` get the canonical list rather than having to read the source.

A wider architectural fix (adding --help routing for all CLI_ONLY commands in cli.ts) is a plausible follow-up, but each CLI_ONLY command would still need its own help text. This per-command pattern matches how shared ops handle it via printOpHelp(). Init is the highest-stakes case because it's the only CLI_ONLY command that calls saveConfig().

Smoke test: from a directory with 1500 .md files, with GBRAIN_HOME pointed at a fresh tempdir:
- Before fix: ~/.gbrain/config.json materialized with engine: 'pglite'
- After fix: help text printed, no config dir created

`bun run typecheck` clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> (cherry picked from commit ed11fdd)
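The guard ordering can be sketched as below. runInitSketch and the injected InitDeps interface are hypothetical (gbrain's real runInit() takes different arguments); the sketch only shows that the --help/-h check returns before detection or saveConfig() can run.

```typescript
// Hypothetical dependency bundle, injected so the sketch is testable.
interface InitDeps {
  printHelp: () => void;
  detectEngine: (cwd: string) => "pglite" | "postgres";
  saveConfig: (config: { engine: string }) => void;
}

function runInitSketch(argv: string[], cwd: string, deps: InitDeps): void {
  // Guard first: help must return before any detection or config write.
  if (argv.includes("--help") || argv.includes("-h")) {
    deps.printHelp();
    return;
  }
  // Normal path: smart detection, then persist the choice.
  const engine = deps.detectEngine(cwd);
  deps.saveConfig({ engine });
}
```

Injecting saveConfig makes the "help never mutates state" property directly assertable: count calls after a --help run and expect zero.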

…loper global config The "installHook writes ... and sets core.hooksPath" test asserted `git config --get core.hooksPath` returns `.githooks`, which falls back to the global scope when local is unset. Developers who set `core.hooksPath` globally (common with dotfiles managers pointing at ~/.config/git/hooks) saw a deterministic FAIL because installHook intentionally respects an existing global value and skips writing the local one — exactly the documented contract. Fix: read via `git config --local --get core.hooksPath` (scope-locked) and branch the assertion on whether a global is already set. Both clean-CI (local should be '.githooks') and developer-with-global (local should be empty; installHook correctly didn't clobber) now pass deterministically. No API change. installHook behavior is unchanged. Verified locally with the affected test passing under `GIT_CONFIG_GLOBAL=~/.gitconfig` carrying `core.hooksPath=...`. (cherry picked from commit 0e4da2c)
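A sketch of the scope-locked read and the branched expectation, assuming Node's child_process API is available (as it is under Bun). readGitConfig and expectedLocalHooksPath are hypothetical helpers, not the test file's actual code; `git config --get` exits with status 1 when a key is unset, which the helper maps to undefined.

```typescript
import { spawnSync } from "node:child_process";

// Read one git config key at an explicit scope. Any non-zero exit
// (unset key, not a repo, git missing) maps to undefined.
function readGitConfig(
  scope: "--local" | "--global",
  key: string,
  cwd?: string,
): string | undefined {
  const res = spawnSync("git", ["config", scope, "--get", key], { cwd, encoding: "utf8" });
  return res.status === 0 ? res.stdout.trim() : undefined;
}

// What the local scope should hold after installHook runs, given the
// developer's existing global value. installHook respects a global value
// and skips the local write, so local stays empty in that case.
function expectedLocalHooksPath(globalValue: string | undefined): string | undefined {
  return globalValue ? undefined : ".githooks";
}
```

In a test, the assertion would compare `readGitConfig("--local", "core.hooksPath", repoDir)` against `expectedLocalHooksPath(readGitConfig("--global", "core.hooksPath"))`, which holds on both clean CI and a developer machine with a global hooksPath.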

Two defensive fixes:
1. normalizeText(): return empty string on null/undefined input instead of crashing with 'undefined is not an object (evaluating s.toLowerCase)'
2. loadRoutingFixtures(): validate that the parsed fixture has 'intent' as a string before adding it to the fixtures array. Fixtures with wrong field names (e.g. 'input' instead of 'intent') are now reported as malformed with a helpful error message listing the actual keys found.

Root cause: a skill's routing-eval.jsonl used {"input": ...} instead of {"intent": ...}. The JSON parsed fine but the cast to RoutingFixture was unchecked, so fixture.intent was undefined. normalizeText(undefined) then crashed. This made 'gbrain doctor' completely unusable. (cherry picked from commit b142bbd)
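Both fixes amount to validating at the trust boundary instead of casting. In the sketch below, normalizeText only lowercases (the real function may do more), parseFixtureLine is a hypothetical per-line helper, and RoutingFixture is assumed to need only a string `intent`.

```typescript
// Assumed minimal shape; the real RoutingFixture may carry more fields.
interface RoutingFixture {
  intent: string;
  [key: string]: unknown;
}

// Fix 1: tolerate null/undefined instead of crashing on s.toLowerCase().
function normalizeText(s: string | null | undefined): string {
  if (s == null) return "";
  return s.toLowerCase();
}

type ParseResult =
  | { ok: true; fixture: RoutingFixture }
  | { ok: false; error: string };

// Fix 2: check the parsed value before treating it as a RoutingFixture.
function parseFixtureLine(line: string): ParseResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(line);
  } catch {
    return { ok: false, error: "invalid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, error: "fixture is not a JSON object" };
  }
  const obj = parsed as Record<string, unknown>;
  if (typeof obj.intent !== "string") {
    // List the keys actually present so {"input": ...} is obvious at a glance.
    const keys = Object.keys(obj).join(", ");
    return { ok: false, error: `malformed fixture: expected string 'intent', found keys [${keys}]` };
  }
  return { ok: true, fixture: obj as RoutingFixture };
}
```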

Replaces #517 (re-ported fresh against current scripts/run-e2e.sh after v0.23.1 rewrote the script; the original cherry-pick would not apply). E2E tests call setupDB, which writes $HOME/.gbrain/config.json pointing at the docker test container. When the container tears down, the user's real autopilot daemon wedges trying to connect to a vanished postgres. Three operators hit this within 16 days before the original PR was filed. Fix: the wrapper exports HOME + GBRAIN_HOME to a mktemp tmpdir BEFORE bun starts so config writes land in the tmpdir, with a post-run breach detector that compares the md5 of the user's real config against the pre-run value. Both env vars are required: loadConfig/saveConfig resolve via HOME while configPath honors GBRAIN_HOME. HOME is set before bun starts because os.homedir() caches at first call. Test seam: test/gbrain-home-isolation.test.ts updated to assert homedir() === configDir() when GBRAIN_HOME is unset (correct under the safety wrapper itself) instead of the prior "not /tmp/" sentinel. Revert path: git revert <this-sha> if test:e2e regresses on master. Co-Authored-By: orendi84 <orendi84@users.noreply.github.com>
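The wrapper itself is a shell script; the TypeScript below only sketches its two ideas. md5OfBytes, md5OfFile, configBreached and isolatedEnv are hypothetical, and pointing GBRAIN_HOME at the tmpdir root (rather than a subdirectory of it) is a guess.

```typescript
import { createHash } from "node:crypto";
import { existsSync, readFileSync } from "node:fs";

// MD5 of a string or raw bytes, as a lowercase hex digest.
function md5OfBytes(data: string | Uint8Array): string {
  return createHash("md5").update(data).digest("hex");
}

// Fingerprint of the user's real config; a missing file is its own state
// (null), so a config *created* during the run also counts as a breach.
function md5OfFile(path: string): string | null {
  return existsSync(path) ? md5OfBytes(readFileSync(path)) : null;
}

// Any difference between the pre-run and post-run fingerprints is a breach.
function configBreached(before: string | null, after: string | null): boolean {
  return before !== after;
}

// Child env with both variables redirected, since (per the commit)
// loadConfig/saveConfig follow HOME while configPath honors GBRAIN_HOME.
function isolatedEnv(
  base: Record<string, string | undefined>,
  tmpHome: string,
): Record<string, string | undefined> {
  return { ...base, HOME: tmpHome, GBRAIN_HOME: tmpHome };
}
```

A TypeScript port of the wrapper would fingerprint ~/.gbrain/config.json, spawn bun with isolatedEnv(...) so HOME is already redirected when the child starts, then fingerprint again and fail loudly if configBreached returns true.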

v0.40.7.0 Schema Cathedral v3 added the 'schema-suggest' phase between 'orphans' and 'purge' in ALL_PHASES, but the E2E phase-order test was not updated to match. ALL_PHASES vs EXPECTED_PHASES diverged and the shape-pin test failed every run on master. Surfaced during fix-wave: warm-narwhal E2E gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The 'end-to-end: updateSourceConfig persists timestamp visible to next listAllSources' test pinned last_full_cycle_at to a hardcoded '2026-05-22T15:00:00.000Z'. With a 60-minute freshness window, that timestamp only counted as fresh for about an hour after it was written; every run after that deadline classified the source as stale and dispatched it, breaking the test's .skippedFresh expectation. Switch to a relative Date.now() - 30min timestamp (mirrors the prior 'source with last_full_cycle_at < 60min ago is skipped by gate' test). Surfaced during fix-wave: warm-narwhal E2E gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
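The difference between the two timestamps is easy to see against a toy freshness predicate. isFresh and FRESHNESS_WINDOW_MS are hypothetical names for the 60-minute gate described above; the real gate's signature is not shown in this PR.

```typescript
const FRESHNESS_WINDOW_MS = 60 * 60 * 1000; // 60 minutes

// Hypothetical gate: a source is fresh if its last full cycle is younger
// than the window, measured from `nowMs`.
function isFresh(lastFullCycleAt: string | null, nowMs: number = Date.now()): boolean {
  if (!lastFullCycleAt) return false;
  return nowMs - Date.parse(lastFullCycleAt) < FRESHNESS_WINDOW_MS;
}

// Brittle: fresh only during the hour after 2026-05-22T15:00Z.
const hardcoded = "2026-05-22T15:00:00.000Z";
// Robust: always 30 minutes old relative to whenever the test runs.
const relative = new Date(Date.now() - 30 * 60 * 1000).toISOString();
```

Passing `nowMs` explicitly also lets a test pin the clock instead of depending on wall time.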

init.ts:455 fails loud when multiple embedding providers are env-ready in non-TTY mode. The test sets ZEROENTROPY_API_KEY then runs init, but developer machines commonly have OPENAI_API_KEY + VOYAGE_API_KEY + ZEROENTROPY_API_KEY all set, so init sees 3 providers and exits 1. Save+unset OPENAI_API_KEY + VOYAGE_API_KEY in beforeEach, restore in afterEach. Now only ZE is env-ready, init picks it, schema sized to zembed-1's 1280d as the test expects. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
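The save/unset/restore pattern above has one subtlety: restoring must put back "unset" as unset, not as an empty string. saveAndUnset, restore and EnvSnapshot are hypothetical helper names; the env is passed in so the sketch does not depend on Node type definitions.

```typescript
type Env = Record<string, string | undefined>;
type EnvSnapshot = Map<string, string | undefined>;

// Record each key's current value (possibly undefined), then remove it.
function saveAndUnset(env: Env, keys: string[]): EnvSnapshot {
  const snap: EnvSnapshot = new Map();
  for (const k of keys) {
    snap.set(k, env[k]);
    delete env[k];
  }
  return snap;
}

// Put every key back exactly as it was: delete if it was unset before.
function restore(env: Env, snap: EnvSnapshot): void {
  for (const [k, v] of snap) {
    if (v === undefined) delete env[k];
    else env[k] = v;
  }
}
```

With bun:test this would read roughly `beforeEach(() => { snap = saveAndUnset(process.env, ["OPENAI_API_KEY", "VOYAGE_API_KEY"]); })` and `afterEach(() => restore(process.env, snap))`, leaving ZEROENTROPY_API_KEY as the only env-ready provider.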

Voyage's /multimodalembeddings endpoint rejects AVIF as of 2026-05 with 'Please provide a valid base64-encoded image'. The prior comment ('AVIF is fine for an embed call') held at v0.27.x but regressed silently on the provider side. Add test/fixtures/images/tiny.png (16x16 RGB PNG, 1307 bytes, generated via sips from the macOS default wallpaper). PNG is universally accepted by Voyage and other multimodal providers. Surfaced during fix-wave: warm-narwhal E2E gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

queue.add's subagent capability validator (classifyCapabilities → resolveRecipe) requires provider:model format and rejects bare ids with 'unknown provider'. resolveModel returns the bare id from TIER_DEFAULTS / DEFAULT_ALIASES (e.g. 'claude-sonnet-4-6'), which the validator then rejects, dropping the synthesize phase to status:fail with SYNTH_PHASE_FAIL. Narrow fix at the call site: if config.model has no colon AND starts with 'claude-', prefix 'anthropic:'. Other providers must already declare a colon. Avoids changing TIER_DEFAULTS / DEFAULT_ALIASES constant shapes, which would ripple across every resolveModel caller. Surfaced by dream-synthesize-chunking E2E during fix-wave: warm-narwhal. Affected tests: 'single-chunk transcript uses legacy idempotency key' and 'multi-chunk transcript spawns N children with chunk-suffixed idempotency keys' — both relied on result.details.children_submitted which only the ok() path sets; the failed() path returns details: {}. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
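The call-site rule (no colon and a `claude-` prefix means add `anthropic:`) is small enough to show whole. qualifyModelId is a hypothetical name for the narrowing, and validateCapabilitySketch is a toy stand-in for the classifyCapabilities → resolveRecipe check with an illustrative provider list.

```typescript
// Narrow fix: only bare Claude ids get a provider prefix; everything else
// passes through unchanged for the validator to judge.
function qualifyModelId(model: string): string {
  if (!model.includes(":") && model.startsWith("claude-")) {
    return `anthropic:${model}`;
  }
  return model;
}

// Toy validator mimicking the 'unknown provider' rejection of bare ids.
const KNOWN_PROVIDERS = new Set(["anthropic", "openai"]); // illustrative subset
function validateCapabilitySketch(id: string): void {
  const idx = id.indexOf(":");
  const provider = idx === -1 ? "" : id.slice(0, idx);
  if (!KNOWN_PROVIDERS.has(provider)) {
    throw new Error(`unknown provider for model '${id}'`);
  }
}
```

Keeping the rule at the queue.add boundary means TIER_DEFAULTS and DEFAULT_ALIASES keep returning bare ids to every other resolveModel caller.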

… sources

Two fixes in the E2E Doctor Command describe block, both surfaced by cross-file state pollution under the full sequential E2E run:
1. Pass --embedding-model openai:text-embedding-3-large to the init subprocess. Without the explicit flag, doctor inherits whatever the resolver picks from env keys (ZE if ZEROENTROPY_API_KEY is set, defaulting to zembed-1 at 1280d). The test's setupDB initialized schema at 1536d, so the dim mismatch fires embedding_width_consistency WARN, exiting doctor 1.
2. DELETE FROM sources WHERE id != 'default' in beforeAll. Prior E2E files leave non-default source rows (e.g. 'delta' from autopilot / sources tests). sync_freshness + cycle_freshness then FAIL on those orphans because they were never synced/cycled, exiting doctor 1. setupDB TRUNCATEs sources but schema.sql re-seeds 'default' via initSchema; this leaves only the canonical single-source brain the test expects.

Surfaced during fix-wave: warm-narwhal E2E gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two cross-file isolation hardenings for the sequential E2E runner:
1. Terminate stale Postgres connections before each file. Without this, idle connections from the prior bun process's pool race with the next file's setupDB() TRUNCATE CASCADE, producing 'fixture pages disappear mid-test' failures. The terminate call is idempotent and takes ~50ms; the first iteration is a no-op.
2. Hard outer timeout (180s per file) via gtimeout / timeout. bun's --timeout=60000 is per-test; if a PGLite WASM call hangs in beforeAll/afterAll (e.g. ingestion-roundtrip.test.ts wedging 30+ minutes on macOS), --timeout never fires and the entire suite wedges. The outer SIGKILL lets the suite advance, and the file is recorded as failed for triage. Falls through to bare bun if neither gtimeout nor timeout is on PATH.

Surfaced during fix-wave: warm-narwhal. 3 of 5 cross-file flakes were caught by the connection flush; the ingestion-roundtrip 30-min wedge was caught by the outer timeout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
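The real runner is the shell script scripts/run-e2e.sh; the TypeScript below only sketches the per-file command it builds. TERMINATE_STALE_SQL and buildFileCommand are hypothetical, the `bun test --timeout=60000 <file>` invocation shape is assumed, and `-s KILL` is an assumption chosen to match the "outer SIGKILL" wording. pg_terminate_backend, pg_stat_activity and pg_backend_pid are standard Postgres.

```typescript
// Kill every other session on the current database before the next file,
// so idle pool connections cannot race setupDB()'s TRUNCATE CASCADE.
const TERMINATE_STALE_SQL = `
  SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
  WHERE datname = current_database()
    AND pid <> pg_backend_pid()`;

const PER_FILE_TIMEOUT_S = 180;

// Prefer gtimeout (GNU coreutils as installed on macOS), then timeout,
// else fall through to bare bun with only the per-test timeout.
function buildFileCommand(file: string, hasOnPath: (bin: string) => boolean): string[] {
  const bunArgs = ["bun", "test", "--timeout=60000", file];
  for (const bin of ["gtimeout", "timeout"]) {
    if (hasOnPath(bin)) {
      return [bin, "-s", "KILL", `${PER_FILE_TIMEOUT_S}s`, ...bunArgs];
    }
  }
  return bunArgs;
}
```

The outer wrapper bounds the whole process, including beforeAll/afterAll hangs that bun's per-test timeout never sees.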

…arm-narwhal
# Conflicts:
#   test/e2e/autopilot-fanout-postgres.test.ts
#   test/e2e/dream-cycle-phase-order-pglite.test.ts
#   test/e2e/fresh-install-pglite.test.ts
#   test/e2e/voyage-multimodal.test.ts

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

CLAUDE.md gains the v0.41.3.0 note on src/core/cycle/synthesize.ts (narrow anthropic: prefix at the queue.add boundary so resolveModel's bare ids satisfy the subagent validator). llms-full.txt regenerated to match. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

mgunnin and others added 17 commits May 24, 2026 01:08

2a4ad10 · Merge remote-tracking branch 'origin/master' into garrytan/fix-wave-w…

e40ca4a · chore: bump version and changelog (v0.41.3.0)

garrytan wants to merge 17 commits into master from garrytan/fix-wave-warm-narwhal
