diff --git a/CHANGELOG.md b/CHANGELOG.md
index 9efac2292..5525cabe4 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,45 @@

 All notable changes to GBrain will be documented in this file.

+## [0.41.5.0] - 2026-05-24
+
+**Six community bug-fix PRs land + the E2E suite stops lying about itself.** A fix-wave triage swept the 333-PR queue, closed 10 PRs as already-shipped (with credit, naming the commits + files), and bundled 6 real fixes from the community into one collector. Plus three E2E-suite reliability fixes that surfaced while getting the full Docker suite to 100% green.
+
+You can now run `gbrain init --help` from inside a directory with 1000+ markdown files without it silently overwriting your Supabase config with PGLite. Your Supabase brain stops auth-failing at the direct connection because the pooler-form `postgres.` username now gets stripped down to bare `postgres` before deriving the direct URL. OpenAI embedding batches that hit the 1M-token TPM ceiling now actually engage the recursive-halving safety net (the `Invalid 'input': maximum request size is 300000 tokens per request.` error message now matches the recognition regex; pre-fix it never fired). The dream-cycle's synthesize phase stops dying with `subagent job rejected: data.model "claude-sonnet-4-6" references an unknown provider` because the queue.add subagent validator now sees `anthropic:claude-sonnet-4-6` from a narrow prefix-fix at the call site.
+
+To turn it on: `gbrain upgrade`. The contributor closure comments include the exact commit SHA + file:line that already shipped each fix, so anyone who filed a duplicate or stale PR can verify the work landed.
+
+What you'd see in a concrete example. Pre-fix: `gbrain init --help` from `~/Documents` (with 1500+ `.md` files inferred as a brain candidate) writes `engine: 'pglite'` + `database_path: ~/.gbrain/brain.pglite` to your real config, silently disconnecting you from Supabase. Post-fix: `--help` short-circuits before any state write; help text prints; config untouched. Same shape for the other five fixes: documented bugs, real repros, real fixes, real tests.
+
+Things to know about. (1) Two cross-file E2E reliability fixes in `scripts/run-e2e.sh`: per-file `pg_terminate_backend` flush kills stale connections from the prior bun process before the next file's `setupDB()` TRUNCATE races them, AND a hard 180s outer `gtimeout`/`timeout` cap so a wedged PGLite WASM call in beforeAll/afterAll can't pin the entire suite (this caught a real 30+ min wedge on `ingestion-roundtrip.test.ts` during the wave). (2) The `gbrain doctor` test in `test/e2e/mechanical.test.ts` now pins `--embedding-model openai:text-embedding-3-large` on its init step (was inheriting whatever the resolver picked from env keys, producing dim-mismatch warnings under sequential E2E) and `DELETE FROM sources WHERE id != 'default'` in beforeAll (was inheriting orphan `delta` source rows from prior files, producing `sync_freshness FAIL`).
+
+Credit to the 4 community contributors whose 6 PRs landed: @mgunnin (x2: max_batch_tokens + isTokenLimitError regex), @brandonlipman (x2: connection-manager username strip + init --help guard), @jeremyknows (frontmatter-install-hook test isolation), @garrytan-agents (routing-eval intent-field guard). Plus 10 superseded PRs closed with credit (#798, #1083, #918, #1119, #602, #758, #539, #1287, #1117, #1125): each fix was already on master via prior waves (v0.31.7 #804 + v0.36.1.1 #1182 + v0.38.2.0 #1297 + others); contributor closures cite each landing commit + file location.
+
+### Itemized changes
+
+**The 6 community fix-wave cherry-picks:**
+
+- **#924 (mgunnin):** `src/core/ai/recipes/openai.ts` gains `max_batch_tokens: 100_000` on the embedding touchpoint. Pre-fix OpenAI was the only recipe missing this cap; the recursive-halving safety net never engaged on token-dense pages (Discord exports, JSON dumps, code-heavy markdown), so those batches would retry-storm and block the queue head. 100K estimated = ~150K real worst-case, safely under OpenAI's 300K per-request hard cap.
+- **#990 (mgunnin):** `src/core/ai/gateway.ts:1264` `isTokenLimitError` now matches `maximum request size.*tokens` so OpenAI's actual error string triggers recursive halving. Pre-fix the regex caught Voyage and generic shapes but not OpenAI's literal wording. Tests in `test/ai/adaptive-embed-batch.test.ts` pin the recognition.
+- **#761 (brandonlipman):** `src/core/connection-manager.ts:144-148` `deriveDirectUrl` now strips the `postgres.` pooler-form username down to bare `postgres` when synthesizing the Supabase direct URL. Pre-fix Supabase direct connections silently failed auth because they expect bare `postgres` (the `.` suffix is a pooler-routing-only thing). Tests in `test/connection-manager.serial.test.ts`.
+- **#762 (brandonlipman):** `src/commands/init.ts:13-16` adds a `--help`/`-h` short-circuit at the top of `runInit`. Pre-fix `gbrain init --help` from a directory with many `.md` files would fall through to smart-detection, scan cwd, then call `saveConfig()`, silently overwriting any existing Postgres config with PGLite defaults. Confirmed in the wild on a 10K-page Supabase brain.
+- **#916 (jeremyknows):** `test/frontmatter-install-hook.test.ts` test isolation fix: uses `--local --get` instead of `--get` (which falls back to global config). Without this, developers with `core.hooksPath` set globally (dotfiles managers pointing at `~/.config/git/hooks`) see a deterministic FAIL.
+- **#1332 (garrytan-agents):** `src/core/routing-eval.ts` adds a defensive guard so `loadRoutingFixtures({intent: undefined})` doesn't crash `gbrain doctor` with `undefined is not an object (evaluating s.toLowerCase)`. Fixture validation now reports malformed entries instead of crashing the whole doctor run.
+
+**Three E2E reliability fixes (surfaced during this wave):**
+
+- **`src/core/cycle/synthesize.ts:395-404`** narrow `anthropic:` prefix fix at the queue.add boundary. `resolveModel` returns the bare id from `TIER_DEFAULTS`/`DEFAULT_ALIASES` (e.g. `claude-sonnet-4-6`); the subagent validator requires `provider:model` and rejected with `unknown provider`, dropping synthesize to `status: fail` with `SYNTH_PHASE_FAIL`. Narrow conditional prefix at the call site (only when no colon AND starts with `claude-`) avoids changing the constants, which would ripple across every `resolveModel` caller.
+- **`scripts/run-e2e.sh` per-file connection flush + outer timeout.** Two cross-file isolation hardenings: (1) `psql -At -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE pid != pg_backend_pid() AND datname = current_database()"` before each file kills idle connections from the prior bun process's pool, which were racing with the next file's `TRUNCATE CASCADE` and producing 'fixture pages disappear mid-test' failures; (2) hard 180s outer `gtimeout`/`timeout` cap so a PGLite WASM hang in beforeAll/afterAll can't wedge the entire suite. Both surfaced during the wave: 3 of 5 cross-file flakes caught by the connection flush; `ingestion-roundtrip` 30-min wedge caught by the outer timeout.
+- **`test/e2e/mechanical.test.ts` doctor test hardening.** Two fixes: pin `--embedding-model openai:text-embedding-3-large` on the init subprocess (was inheriting env-resolver defaults that produced dim-mismatch under sequential E2E); `DELETE FROM sources WHERE id != 'default'` in beforeAll (was inheriting orphan `delta` source rows from prior files, producing `sync_freshness FAIL`).
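The conditional `anthropic:` prefix described in the `synthesize.ts` item can be sketched as follows. This is a minimal illustration under assumptions, not the code at lines 395-404: `withAnthropicPrefix` is a hypothetical helper name, and only the rule itself (prefix when the id has no colon and starts with `claude-`) comes from the changelog text.

```typescript
// Hypothetical helper illustrating the conditional prefix rule from the changelog.
// The name and signature are assumptions; the real call site is not shown in this diff.
function withAnthropicPrefix(modelId: string): string {
  const hasProvider = modelId.includes(":");          // already in provider:model form
  const looksAnthropic = modelId.startsWith("claude-");
  // Only bare Claude ids get the prefix; every other id passes through untouched.
  return !hasProvider && looksAnthropic ? `anthropic:${modelId}` : modelId;
}

console.log(withAnthropicPrefix("claude-sonnet-4-6"));             // bare id gains the prefix
console.log(withAnthropicPrefix("anthropic:claude-sonnet-4-6"));   // already qualified, unchanged
console.log(withAnthropicPrefix("openai:text-embedding-3-large")); // other provider, unchanged
```

Keeping the rule at one call site, as the item above describes, means ids that already carry a provider are never rewritten and the shared constants stay in their bare form.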
+
+### For contributors
+
+Wave triage process notes:
+- 333-PR queue evaluated via per-PR isolation runs + cross-reference against master HEAD (the load-bearing trick: read each PR's diff against its OWN base, not against current master, to see the actual intended change without v0.38-0.40 reverts contaminating the view).
+- 10 PRs closed-as-superseded with credit comments citing the landing commit SHA + file:line so contributors can verify the fix shipped. The contributor close template is captured in `~/.claude/plans/time-for-fix-wave-warm-narwhal.md`.
+- 2 mid-wave additional supersession discoveries (PR #1117 + PR #1125) caught via the `git log -S "configuredProviderIds" origin/master` pattern after master had already absorbed them via v0.36.1.1 #1182 (28-fix collector from 5 weeks ago); both closed with credit pointing at the absorbed commit.
+- Tests on the wave reached 117/117 files / 821/821 tests passing against a fresh Docker pgvector container after fixing the cross-file flake class.
+
 ## [0.41.1.0] - 2026-05-24

 **Your CI can now fail a PR when search retrieval gets worse.** Before this release, gbrain shipped the pieces you'd need to measure retrieval quality (capture, replay, nightly probe, cross-modal runner) but nothing connected them into a loop. You could see a quality drop on your screen but nothing automatically caught it. v0.41 closes the loop end-to-end. You publish a baseline once (a snapshot of how your brain performs on a set of real queries) and then `gbrain eval gate` runs against that baseline on every PR. If results get materially worse OR if the brain stops finding pages it used to find, the command exits non-zero and CI turns red. The wave also wires the nightly quality probe into the autopilot daemon (opt-in via config) so a brain you've left running notices its own degradation.
diff --git a/CLAUDE.md b/CLAUDE.md
index 52d77e4df..eefc14c32 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -248,7 +248,7 @@ strict behavior when unset.
- `src/core/import-checkpoint.ts` (v0.34.2.0) — `loadCheckpoint(brainDir)`, `saveCheckpoint(brainDir, completed)`, `resumeFilter(files, completed, brainDir)`, `clearCheckpoint()`, plus the `ImportCheckpoint` type. Path-set checkpoint format (`{schema_version, brainDir, completed: string[]}`) replaces the v0.33.x positional `{processedIndex: N}` format. Atomic write via `.tmp` + `rename()` so a mid-write crash never leaves a partial JSON. `loadCheckpoint` returns `null` on: missing file, malformed JSON, brainDir mismatch (you ran import against a different brain), and the old positional format (logged to stderr before being discarded). `resumeFilter` returns `{toProcess, skippedCount}` — pure, no I/O, deterministic. `clearCheckpoint` is no-op-on-missing for clean-exit cleanup. Honors `GBRAIN_HOME` via `gbrainPath()` so test isolation via `withEnv({GBRAIN_HOME: tmpdir})` works without monkey-patching the fs layer. Best-effort persistence — `saveCheckpoint` logs warnings on write errors but never throws, so import keeps making progress even if disk is full. - `src/core/sort-newest-first.ts` (v0.34.2.0) — single source of truth for the descending-lex sort that `gbrain import` and `gbrain sync` both apply. Mutates in place (Array.prototype.sort semantics), returns the same array reference for fluent chaining. Empty/single-element inputs short-circuit. Future ordering changes flip one line in this helper instead of touching two CLI commands. Pinned by `test/sort-newest-first.test.ts` (5 hermetic cases: descending order, mixed prefixes, empty input, single-element input, in-place-mutation contract). - `src/core/cycle.ts` — v0.17 brain maintenance cycle primitive (extended to **9 phases in v0.29**). `runCycle(engine: BrainEngine | null, opts: CycleOpts): Promise` composes phases in semantically-driven order: **lint → backlinks → sync → synthesize → extract → patterns → recompute_emotional_weight → embed → orphans**. 
v0.29 adds the `recompute_emotional_weight` phase between patterns and embed; it sees the union of `syncPagesAffected` + `synthesizeWrittenSlugs` for incremental mode, or all pages when neither anchor is set (full backfill via `gbrain dream --phase recompute_emotional_weight`). v0.29 also extends `CycleReport.totals` with `pages_emotional_weight_recomputed` (additive, schema_version stays "1"). v0.23's `synthesize` phase runs after sync (cross-references see fresh brain) and before extract (auto-link materializes its writes); `patterns` runs after extract so it reads a fresh graph (codex finding #7 — subagent put_page sets `ctx.remote=true` and skips auto-link/timeline by default; extract is the canonical materialization). Three callers: `gbrain dream` CLI, `gbrain autopilot` daemon's inline path, and the Minions `autopilot-cycle` handler. Coordination via `gbrain_cycle_locks` DB table + `~/.gbrain/cycle.lock` file lock with PID-liveness for PGLite. `CycleReport.schema_version: "1"` is stable; totals additively grew in v0.23 (`transcripts_processed`, `synth_pages_written`, `patterns_written`). `yieldBetweenPhases` runs between phases. **v0.23 added `yieldDuringPhase`** for in-phase keepalive — synthesize/patterns call it during long waits to renew the cycle-lock TTL. Engine nullable; lock-skip on read-only phase selections. v0.22.1 (#403): `CycleOpts.signal?: AbortSignal` propagates the worker's abort signal; `checkAborted()` fires between every phase. v0.22.1 (#417): `runPhaseSync` returns `pagesAffected` via `SyncPhaseResult`; `runCycle` captures it and threads to `runPhaseExtract` as the 4th arg. v0.22.1 (Codex F2): `runPhaseSync` takes `willRunExtractPhase: boolean` and sets `noExtract: phases.includes('extract')` so `gbrain dream --phase sync` doesn't silently lose extraction. 
v0.22.5 (#475): `resolveSourceForDir(engine, brainDir)` threads `sourceId` to `performSync()` so sync reads the per-source `sources.last_commit` anchor instead of the drift-prone global `config.sync.last_commit` key. -- `src/core/cycle/synthesize.ts` (v0.23) — Synthesize phase: conversation-transcript-to-brain pipeline. Reads from `dream.synthesize.session_corpus_dir`, runs cheap Haiku verdict (cached in `dream_verdicts`), then fans out one Sonnet subagent per worth-processing transcript with `allowed_slug_prefixes` (sourced from `skills/_brain-filing-rules.json` `dream_synthesize_paths.globs`). Orchestrator collects slugs from `subagent_tool_executions` (NOT `pages.updated_at` — codex finding #2) and reverse-renders DB → markdown via `serializeMarkdown`. Cooldown via `dream.synthesize.last_completion_ts`, written ONLY on success. Idempotency key `dream:synth::`. Auto-commit deferred to v1.1 (codex #5). `--dry-run` runs Haiku, skips Sonnet (codex #8). Subagent never gets fs-write access. **v0.23.2:** `renderPageToMarkdown` (now exported) stamps `dream_generated: true` and `dream_cycle_date` into every reverse-write's frontmatter; `writeSummaryPage` does the same on the dream-cycle summary index. The marker is the explicit identity surface checked by `isDreamOutput` in `transcript-discovery.ts` — replaces the v0.23.1 content-prefix heuristic that could miss real output (`serializeMarkdown` doesn't embed slugs in body) and false-positive on user transcripts citing brain pages. `judgeSignificance` and `JudgeClient` are exported; `judgeSignificance` accepts a `verdictModel` parameter (default `claude-haiku-4-5-20251001`) loaded from `dream.synthesize.verdict_model` via `loadSynthConfig`. **v0.30.2:** model-aware chunker `splitTranscriptByBudget(content, contentHash, maxChars)` splits oversized transcripts at paragraph boundaries (`## Topic:` → `---` → `\n` ladder) using a deterministic offset seeded from the first 32 bits of `contentHash` so retries chunk identically. 
Per-chunk char budget computed from `MODEL_CONTEXT_TOKENS[resolvedModel] × 0.9 × 3.5 chars/token`; non-Anthropic ids fall back to a 180K-token safe default with a once-per-process stderr warning. Operator overrides: `dream.synthesize.max_prompt_tokens` (floor 100K, wins when set) and `dream.synthesize.max_chunks_per_transcript` (default 24). Per-chunk idempotency keys `dream:synth:::cof`; single-chunk transcripts preserve the legacy `dream:synth::` key byte-for-byte (D8 lookup), so existing brains skip with `already_synthesized_legacy_single_chunk` instead of re-spending Sonnet on upgrade. `collectChildPutPageSlugs` raw-fetches every (job_id, slug) pair (not `SELECT DISTINCT`) and rewrites bare-hash6 slugs to `-c` for chunked children (D6 — orchestrator-side, zero Sonnet trust). Cap-hit skips don't write to `dream_verdicts`, so raising the cap on next run re-attempts cleanly. D7 scope: bounds INITIAL prompt size only; tool-loop turn-N accumulation is caught by the v0.30.2 terminal-error classification in `subagent.ts`, not bounded ahead of time. +- `src/core/cycle/synthesize.ts` (v0.23) — Synthesize phase: conversation-transcript-to-brain pipeline. Reads from `dream.synthesize.session_corpus_dir`, runs cheap Haiku verdict (cached in `dream_verdicts`), then fans out one Sonnet subagent per worth-processing transcript with `allowed_slug_prefixes` (sourced from `skills/_brain-filing-rules.json` `dream_synthesize_paths.globs`). Orchestrator collects slugs from `subagent_tool_executions` (NOT `pages.updated_at` — codex finding #2) and reverse-renders DB → markdown via `serializeMarkdown`. Cooldown via `dream.synthesize.last_completion_ts`, written ONLY on success. Idempotency key `dream:synth::`. Auto-commit deferred to v1.1 (codex #5). `--dry-run` runs Haiku, skips Sonnet (codex #8). Subagent never gets fs-write access. 
**v0.23.2:** `renderPageToMarkdown` (now exported) stamps `dream_generated: true` and `dream_cycle_date` into every reverse-write's frontmatter; `writeSummaryPage` does the same on the dream-cycle summary index. The marker is the explicit identity surface checked by `isDreamOutput` in `transcript-discovery.ts` — replaces the v0.23.1 content-prefix heuristic that could miss real output (`serializeMarkdown` doesn't embed slugs in body) and false-positive on user transcripts citing brain pages. `judgeSignificance` and `JudgeClient` are exported; `judgeSignificance` accepts a `verdictModel` parameter (default `claude-haiku-4-5-20251001`) loaded from `dream.synthesize.verdict_model` via `loadSynthConfig`. **v0.30.2:** model-aware chunker `splitTranscriptByBudget(content, contentHash, maxChars)` splits oversized transcripts at paragraph boundaries (`## Topic:` → `---` → `\n` ladder) using a deterministic offset seeded from the first 32 bits of `contentHash` so retries chunk identically. Per-chunk char budget computed from `MODEL_CONTEXT_TOKENS[resolvedModel] × 0.9 × 3.5 chars/token`; non-Anthropic ids fall back to a 180K-token safe default with a once-per-process stderr warning. Operator overrides: `dream.synthesize.max_prompt_tokens` (floor 100K, wins when set) and `dream.synthesize.max_chunks_per_transcript` (default 24). Per-chunk idempotency keys `dream:synth:::cof`; single-chunk transcripts preserve the legacy `dream:synth::` key byte-for-byte (D8 lookup), so existing brains skip with `already_synthesized_legacy_single_chunk` instead of re-spending Sonnet on upgrade. `collectChildPutPageSlugs` raw-fetches every (job_id, slug) pair (not `SELECT DISTINCT`) and rewrites bare-hash6 slugs to `-c` for chunked children (D6 — orchestrator-side, zero Sonnet trust). Cap-hit skips don't write to `dream_verdicts`, so raising the cap on next run re-attempts cleanly. 
D7 scope: bounds INITIAL prompt size only; tool-loop turn-N accumulation is caught by the v0.30.2 terminal-error classification in `subagent.ts`, not bounded ahead of time. **v0.41.5.0:** narrow `anthropic:` prefix fix at the queue.add boundary (lines 395-404). `resolveModel` returns bare ids from `TIER_DEFAULTS`/`DEFAULT_ALIASES` (e.g. `claude-sonnet-4-6`); the subagent validator requires `provider:model` form and was rejecting with `unknown provider`, dropping synthesize to `status: fail` with `SYNTH_PHASE_FAIL`. Conditional prefix at the call site (only when no colon AND starts with `claude-`) avoids changing the shared constants which would ripple across every `resolveModel` caller. - `src/core/cycle/patterns.ts` (v0.23) — Patterns phase: cross-session theme detection over reflections within `dream.patterns.lookback_days` (default 30). Names a pattern only when ≥`dream.patterns.min_evidence` (default 3) reflections support it. Single Sonnet subagent; same allow-list path as synthesize. Runs AFTER `extract` so the graph is fresh. - `src/core/cycle/extract-facts.ts` (v0.32.2, extended v0.35.6.0) — extract_facts cycle phase. v0.32.2 contract: fence is canonical; per-page wipe (`deleteFactsForPage`) + reinsert from `parseFactsFence` + `extractFactsFromFenceText` + `engine.insertFacts`. Empty-fence guard refuses when v0.31 legacy rows (`row_num IS NULL AND entity_slug IS NOT NULL`) pend the v0_32_2 backfill (status: warn, hint: `gbrain apply-migrations --yes`). **v0.35.6.0** adds a phantom-redirect pre-pass that runs AFTER the legacy-row guard, BEFORE the main reconcile loop. When `opts.brainDir` is set, `runPhantomRedirectPass(engine, brainDir, sourceId, dryRun)` walks unprefixed-slug pages capped by `GBRAIN_PHANTOM_REDIRECT_LIMIT` (default 50). 
The pass returns `touched_canonicals` — canonical slugs whose disk fence was merged with phantom rows; `runExtractFacts` UNIONs them into the main reconcile slug set so canonical's DB facts derive from the merged fence in the same cycle (round-14 scenario-B fix: phantom had only-on-disk fence, no DB facts). `ExtractFactsResult` gains six phantom fields: `phantomsScanned`, `phantomsRedirected`, `phantomsAmbiguous`, `phantomsSkippedDrift`, `phantomsLockBusy`, `phantomsMorePending`. Three of those bubble to `CycleReport.totals` (`phantoms_redirected`, `phantoms_ambiguous`, `phantoms_skipped_drift`). - `src/core/entities/resolve.ts` (v0.30+, extended v0.35.6.0) — Free-form entity name → canonical slug resolution. `resolveEntitySlug(engine, source_id, raw)`: exact slug → fuzzy (pg_trgm @ 0.4 threshold) → bare-name prefix expansion (`people/-%` then `companies/-%` using correlated-subquery `connection_count` for tiebreaker) → deterministic `slugify` fallback. **v0.35.6.0** exports two new helpers for the phantom-redirect pass: `resolvePhantomCanonical(engine, sourceId, phantomSlug)` — variant that SKIPS the exact-slug step (codex #1: phantom slug `'alice'` exact-matches itself, would make the redirect handler a no-op); returns the canonical only when result is non-null AND contains `/`. `findPrefixCandidates(engine, sourceId, token)` — standalone SQL query returning ALL candidates across `PREFIX_EXPANSION_DIRS` (currently hardcoded `['people', 'companies']`) using `slug LIKE ANY($N::text[])` over patterns `dir/token` + `dir/token-%`; cap of 10 ordered by `connection_count DESC, slug ASC`. NOT a wrapper around `tryPrefixExpansion` because that path returns per-dir top-1 and suppresses ambiguity by design (codex #11). Pinned by `test/phantom-redirect.test.ts` resolvePhantomCanonical describe (3 cases) + findPrefixCandidates describe (6 cases including multi-dir ambiguity and the `people/aliceberg`-doesn't-match-`alice` false-positive guard). 
diff --git a/VERSION b/VERSION index 3e8954862..d25dbca5c 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.41.1.0 \ No newline at end of file +0.41.5.0 \ No newline at end of file diff --git a/llms-full.txt b/llms-full.txt index 05c16ec80..ff4b2d268 100644 --- a/llms-full.txt +++ b/llms-full.txt @@ -390,7 +390,7 @@ strict behavior when unset. - `src/core/import-checkpoint.ts` (v0.34.2.0) — `loadCheckpoint(brainDir)`, `saveCheckpoint(brainDir, completed)`, `resumeFilter(files, completed, brainDir)`, `clearCheckpoint()`, plus the `ImportCheckpoint` type. Path-set checkpoint format (`{schema_version, brainDir, completed: string[]}`) replaces the v0.33.x positional `{processedIndex: N}` format. Atomic write via `.tmp` + `rename()` so a mid-write crash never leaves a partial JSON. `loadCheckpoint` returns `null` on: missing file, malformed JSON, brainDir mismatch (you ran import against a different brain), and the old positional format (logged to stderr before being discarded). `resumeFilter` returns `{toProcess, skippedCount}` — pure, no I/O, deterministic. `clearCheckpoint` is no-op-on-missing for clean-exit cleanup. Honors `GBRAIN_HOME` via `gbrainPath()` so test isolation via `withEnv({GBRAIN_HOME: tmpdir})` works without monkey-patching the fs layer. Best-effort persistence — `saveCheckpoint` logs warnings on write errors but never throws, so import keeps making progress even if disk is full. - `src/core/sort-newest-first.ts` (v0.34.2.0) — single source of truth for the descending-lex sort that `gbrain import` and `gbrain sync` both apply. Mutates in place (Array.prototype.sort semantics), returns the same array reference for fluent chaining. Empty/single-element inputs short-circuit. Future ordering changes flip one line in this helper instead of touching two CLI commands. Pinned by `test/sort-newest-first.test.ts` (5 hermetic cases: descending order, mixed prefixes, empty input, single-element input, in-place-mutation contract). 
- `src/core/cycle.ts` — v0.17 brain maintenance cycle primitive (extended to **9 phases in v0.29**). `runCycle(engine: BrainEngine | null, opts: CycleOpts): Promise` composes phases in semantically-driven order: **lint → backlinks → sync → synthesize → extract → patterns → recompute_emotional_weight → embed → orphans**. v0.29 adds the `recompute_emotional_weight` phase between patterns and embed; it sees the union of `syncPagesAffected` + `synthesizeWrittenSlugs` for incremental mode, or all pages when neither anchor is set (full backfill via `gbrain dream --phase recompute_emotional_weight`). v0.29 also extends `CycleReport.totals` with `pages_emotional_weight_recomputed` (additive, schema_version stays "1"). v0.23's `synthesize` phase runs after sync (cross-references see fresh brain) and before extract (auto-link materializes its writes); `patterns` runs after extract so it reads a fresh graph (codex finding #7 — subagent put_page sets `ctx.remote=true` and skips auto-link/timeline by default; extract is the canonical materialization). Three callers: `gbrain dream` CLI, `gbrain autopilot` daemon's inline path, and the Minions `autopilot-cycle` handler. Coordination via `gbrain_cycle_locks` DB table + `~/.gbrain/cycle.lock` file lock with PID-liveness for PGLite. `CycleReport.schema_version: "1"` is stable; totals additively grew in v0.23 (`transcripts_processed`, `synth_pages_written`, `patterns_written`). `yieldBetweenPhases` runs between phases. **v0.23 added `yieldDuringPhase`** for in-phase keepalive — synthesize/patterns call it during long waits to renew the cycle-lock TTL. Engine nullable; lock-skip on read-only phase selections. v0.22.1 (#403): `CycleOpts.signal?: AbortSignal` propagates the worker's abort signal; `checkAborted()` fires between every phase. v0.22.1 (#417): `runPhaseSync` returns `pagesAffected` via `SyncPhaseResult`; `runCycle` captures it and threads to `runPhaseExtract` as the 4th arg. 
v0.22.1 (Codex F2): `runPhaseSync` takes `willRunExtractPhase: boolean` and sets `noExtract: phases.includes('extract')` so `gbrain dream --phase sync` doesn't silently lose extraction. v0.22.5 (#475): `resolveSourceForDir(engine, brainDir)` threads `sourceId` to `performSync()` so sync reads the per-source `sources.last_commit` anchor instead of the drift-prone global `config.sync.last_commit` key. -- `src/core/cycle/synthesize.ts` (v0.23) — Synthesize phase: conversation-transcript-to-brain pipeline. Reads from `dream.synthesize.session_corpus_dir`, runs cheap Haiku verdict (cached in `dream_verdicts`), then fans out one Sonnet subagent per worth-processing transcript with `allowed_slug_prefixes` (sourced from `skills/_brain-filing-rules.json` `dream_synthesize_paths.globs`). Orchestrator collects slugs from `subagent_tool_executions` (NOT `pages.updated_at` — codex finding #2) and reverse-renders DB → markdown via `serializeMarkdown`. Cooldown via `dream.synthesize.last_completion_ts`, written ONLY on success. Idempotency key `dream:synth::`. Auto-commit deferred to v1.1 (codex #5). `--dry-run` runs Haiku, skips Sonnet (codex #8). Subagent never gets fs-write access. **v0.23.2:** `renderPageToMarkdown` (now exported) stamps `dream_generated: true` and `dream_cycle_date` into every reverse-write's frontmatter; `writeSummaryPage` does the same on the dream-cycle summary index. The marker is the explicit identity surface checked by `isDreamOutput` in `transcript-discovery.ts` — replaces the v0.23.1 content-prefix heuristic that could miss real output (`serializeMarkdown` doesn't embed slugs in body) and false-positive on user transcripts citing brain pages. `judgeSignificance` and `JudgeClient` are exported; `judgeSignificance` accepts a `verdictModel` parameter (default `claude-haiku-4-5-20251001`) loaded from `dream.synthesize.verdict_model` via `loadSynthConfig`. 
**v0.30.2:** model-aware chunker `splitTranscriptByBudget(content, contentHash, maxChars)` splits oversized transcripts at paragraph boundaries (`## Topic:` → `---` → `\n` ladder) using a deterministic offset seeded from the first 32 bits of `contentHash` so retries chunk identically. Per-chunk char budget computed from `MODEL_CONTEXT_TOKENS[resolvedModel] × 0.9 × 3.5 chars/token`; non-Anthropic ids fall back to a 180K-token safe default with a once-per-process stderr warning. Operator overrides: `dream.synthesize.max_prompt_tokens` (floor 100K, wins when set) and `dream.synthesize.max_chunks_per_transcript` (default 24). Per-chunk idempotency keys `dream:synth:::cof`; single-chunk transcripts preserve the legacy `dream:synth::` key byte-for-byte (D8 lookup), so existing brains skip with `already_synthesized_legacy_single_chunk` instead of re-spending Sonnet on upgrade. `collectChildPutPageSlugs` raw-fetches every (job_id, slug) pair (not `SELECT DISTINCT`) and rewrites bare-hash6 slugs to `-c` for chunked children (D6 — orchestrator-side, zero Sonnet trust). Cap-hit skips don't write to `dream_verdicts`, so raising the cap on next run re-attempts cleanly. D7 scope: bounds INITIAL prompt size only; tool-loop turn-N accumulation is caught by the v0.30.2 terminal-error classification in `subagent.ts`, not bounded ahead of time. +- `src/core/cycle/synthesize.ts` (v0.23) — Synthesize phase: conversation-transcript-to-brain pipeline. Reads from `dream.synthesize.session_corpus_dir`, runs cheap Haiku verdict (cached in `dream_verdicts`), then fans out one Sonnet subagent per worth-processing transcript with `allowed_slug_prefixes` (sourced from `skills/_brain-filing-rules.json` `dream_synthesize_paths.globs`). Orchestrator collects slugs from `subagent_tool_executions` (NOT `pages.updated_at` — codex finding #2) and reverse-renders DB → markdown via `serializeMarkdown`. Cooldown via `dream.synthesize.last_completion_ts`, written ONLY on success. 
Idempotency key `dream:synth::`. Auto-commit deferred to v1.1 (codex #5). `--dry-run` runs Haiku, skips Sonnet (codex #8). Subagent never gets fs-write access. **v0.23.2:** `renderPageToMarkdown` (now exported) stamps `dream_generated: true` and `dream_cycle_date` into every reverse-write's frontmatter; `writeSummaryPage` does the same on the dream-cycle summary index. The marker is the explicit identity surface checked by `isDreamOutput` in `transcript-discovery.ts` — replaces the v0.23.1 content-prefix heuristic that could miss real output (`serializeMarkdown` doesn't embed slugs in body) and false-positive on user transcripts citing brain pages. `judgeSignificance` and `JudgeClient` are exported; `judgeSignificance` accepts a `verdictModel` parameter (default `claude-haiku-4-5-20251001`) loaded from `dream.synthesize.verdict_model` via `loadSynthConfig`. **v0.30.2:** model-aware chunker `splitTranscriptByBudget(content, contentHash, maxChars)` splits oversized transcripts at paragraph boundaries (`## Topic:` → `---` → `\n` ladder) using a deterministic offset seeded from the first 32 bits of `contentHash` so retries chunk identically. Per-chunk char budget computed from `MODEL_CONTEXT_TOKENS[resolvedModel] × 0.9 × 3.5 chars/token`; non-Anthropic ids fall back to a 180K-token safe default with a once-per-process stderr warning. Operator overrides: `dream.synthesize.max_prompt_tokens` (floor 100K, wins when set) and `dream.synthesize.max_chunks_per_transcript` (default 24). Per-chunk idempotency keys `dream:synth:::cof`; single-chunk transcripts preserve the legacy `dream:synth::` key byte-for-byte (D8 lookup), so existing brains skip with `already_synthesized_legacy_single_chunk` instead of re-spending Sonnet on upgrade. `collectChildPutPageSlugs` raw-fetches every (job_id, slug) pair (not `SELECT DISTINCT`) and rewrites bare-hash6 slugs to `-c` for chunked children (D6 — orchestrator-side, zero Sonnet trust). 
Cap-hit skips don't write to `dream_verdicts`, so raising the cap on next run re-attempts cleanly. D7 scope: bounds INITIAL prompt size only; tool-loop turn-N accumulation is caught by the v0.30.2 terminal-error classification in `subagent.ts`, not bounded ahead of time. **v0.41.5.0:** narrow `anthropic:` prefix fix at the queue.add boundary (lines 395-404). `resolveModel` returns bare ids from `TIER_DEFAULTS`/`DEFAULT_ALIASES` (e.g. `claude-sonnet-4-6`); the subagent validator requires `provider:model` form and was rejecting with `unknown provider`, dropping synthesize to `status: fail` with `SYNTH_PHASE_FAIL`. Conditional prefix at the call site (only when no colon AND starts with `claude-`) avoids changing the shared constants which would ripple across every `resolveModel` caller. - `src/core/cycle/patterns.ts` (v0.23) — Patterns phase: cross-session theme detection over reflections within `dream.patterns.lookback_days` (default 30). Names a pattern only when ≥`dream.patterns.min_evidence` (default 3) reflections support it. Single Sonnet subagent; same allow-list path as synthesize. Runs AFTER `extract` so the graph is fresh. - `src/core/cycle/extract-facts.ts` (v0.32.2, extended v0.35.6.0) — extract_facts cycle phase. v0.32.2 contract: fence is canonical; per-page wipe (`deleteFactsForPage`) + reinsert from `parseFactsFence` + `extractFactsFromFenceText` + `engine.insertFacts`. Empty-fence guard refuses when v0.31 legacy rows (`row_num IS NULL AND entity_slug IS NOT NULL`) pend the v0_32_2 backfill (status: warn, hint: `gbrain apply-migrations --yes`). **v0.35.6.0** adds a phantom-redirect pre-pass that runs AFTER the legacy-row guard, BEFORE the main reconcile loop. When `opts.brainDir` is set, `runPhantomRedirectPass(engine, brainDir, sourceId, dryRun)` walks unprefixed-slug pages capped by `GBRAIN_PHANTOM_REDIRECT_LIMIT` (default 50). 
The pass returns `touched_canonicals` — canonical slugs whose disk fence was merged with phantom rows; `runExtractFacts` UNIONs them into the main reconcile slug set so canonical's DB facts derive from the merged fence in the same cycle (round-14 scenario-B fix: phantom had only-on-disk fence, no DB facts). `ExtractFactsResult` gains six phantom fields: `phantomsScanned`, `phantomsRedirected`, `phantomsAmbiguous`, `phantomsSkippedDrift`, `phantomsLockBusy`, `phantomsMorePending`. Three of those bubble to `CycleReport.totals` (`phantoms_redirected`, `phantoms_ambiguous`, `phantoms_skipped_drift`). - `src/core/entities/resolve.ts` (v0.30+, extended v0.35.6.0) — Free-form entity name → canonical slug resolution. `resolveEntitySlug(engine, source_id, raw)`: exact slug → fuzzy (pg_trgm @ 0.4 threshold) → bare-name prefix expansion (`people/-%` then `companies/-%` using correlated-subquery `connection_count` for tiebreaker) → deterministic `slugify` fallback. **v0.35.6.0** exports two new helpers for the phantom-redirect pass: `resolvePhantomCanonical(engine, sourceId, phantomSlug)` — variant that SKIPS the exact-slug step (codex #1: phantom slug `'alice'` exact-matches itself, would make the redirect handler a no-op); returns the canonical only when result is non-null AND contains `/`. `findPrefixCandidates(engine, sourceId, token)` — standalone SQL query returning ALL candidates across `PREFIX_EXPANSION_DIRS` (currently hardcoded `['people', 'companies']`) using `slug LIKE ANY($N::text[])` over patterns `dir/token` + `dir/token-%`; cap of 10 ordered by `connection_count DESC, slug ASC`. NOT a wrapper around `tryPrefixExpansion` because that path returns per-dir top-1 and suppresses ambiguity by design (codex #11). Pinned by `test/phantom-redirect.test.ts` resolvePhantomCanonical describe (3 cases) + findPrefixCandidates describe (6 cases including multi-dir ambiguity and the `people/aliceberg`-doesn't-match-`alice` false-positive guard). 
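A minimal sketch of the pattern shapes the `findPrefixCandidates` entry above describes, using an in-memory matcher as a stand-in for the real SQL. `buildPrefixPatterns`, `matchesPattern`, and `isPrefixCandidate` are hypothetical names chosen for illustration and are not functions from `resolve.ts`; only `PREFIX_EXPANSION_DIRS` and the `dir/token` + `dir/token-%` shapes come from the entry. It shows why the trailing hyphen in `dir/token-%` keeps `people/aliceberg` from matching the token `alice`.

```typescript
// Assumption: mirrors the hardcoded list named in the entry above.
const PREFIX_EXPANSION_DIRS: string[] = ['people', 'companies'];

// Hypothetical helper: build the two pattern shapes per directory,
// an exact `dir/token` and a hyphen-delimited `dir/token-%`.
function buildPrefixPatterns(token: string): string[] {
  const patterns: string[] = [];
  for (const dir of PREFIX_EXPANSION_DIRS) {
    patterns.push(`${dir}/${token}`);
    patterns.push(`${dir}/${token}-%`);
  }
  return patterns;
}

// Hypothetical stand-in for SQL LIKE, restricted to these two shapes:
// a trailing `%` means "starts with", anything else means "equals".
// Real LIKE also treats `_` as a single-character wildcard, so a token
// containing `_` or `%` would need escaping before reaching the database.
function matchesPattern(slug: string, pattern: string): boolean {
  if (pattern.endsWith('%')) {
    return slug.startsWith(pattern.slice(0, -1));
  }
  return slug === pattern;
}

// True when the slug matches any pattern generated for the token.
function isPrefixCandidate(slug: string, token: string): boolean {
  return buildPrefixPatterns(token).some((p) => matchesPattern(slug, p));
}

console.log(isPrefixCandidate('people/alice-smith', 'alice')); // true
console.log(isPrefixCandidate('people/aliceberg', 'alice')); // false
```

The sketch covers candidate matching only. The cap of 10 and the `connection_count DESC, slug ASC` ordering happen in the query and are not modeled here.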
diff --git a/package.json b/package.json
index bf66f075b..ebdf7e8b1 100644
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "gbrain",
-  "version": "0.41.1.0",
+  "version": "0.41.5.0",
   "description": "Postgres-native personal knowledge brain with hybrid RAG search",
   "type": "module",
   "main": "src/core/index.ts",
diff --git a/scripts/run-e2e.sh b/scripts/run-e2e.sh
index f93c7977c..d945ea2e2 100755
--- a/scripts/run-e2e.sh
+++ b/scripts/run-e2e.sh
@@ -134,7 +134,29 @@ for f in "${files[@]}"; do
   name=$(basename "$f")
   echo ""
   echo "=== $name ==="
-  if output=$(bun test --timeout=60000 "$f" 2>&1); then
+  # Cross-file isolation: terminate any stale connections from the prior
+  # file's pool before the next file's setupDB() runs. Without this,
+  # idle postgres connections from the previous bun process race with
+  # the next file's TRUNCATE CASCADE → cross-file fixture-state pollution
+  # (people/sarah-chen disappears mid-test, etc.). The terminate call is
+  # idempotent + fast (~50ms); on the first iteration there's nothing to
+  # terminate so it's effectively free.
+  if [ -n "${DATABASE_URL:-}" ]; then
+    psql "$DATABASE_URL" -At -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE pid != pg_backend_pid() AND datname = current_database()" >/dev/null 2>&1 || true
+  fi
+  # Hard outer timeout (180s per file). bun's --timeout is per-test; if a
+  # PGLite WASM call hangs in beforeAll/afterAll, --timeout never fires and
+  # the file wedges indefinitely. gtimeout/timeout SIGKILLs the file so the
+  # suite advances. gtimeout (macOS via coreutils) preferred; timeout (Linux)
+  # fallback; bare bun (no outer cap) if neither is installed.
+ if command -v gtimeout >/dev/null 2>&1; then + TIMEOUT_CMD="gtimeout 180" + elif command -v timeout >/dev/null 2>&1; then + TIMEOUT_CMD="timeout 180" + else + TIMEOUT_CMD="" + fi + if output=$($TIMEOUT_CMD bun test --timeout=60000 "$f" 2>&1); then pass_files=$((pass_files + 1)) # Extract pass/fail counts from bun's summary (e.g., "123 pass") p=$(echo "$output" | grep -oE '[0-9]+ pass' | tail -1 | grep -oE '[0-9]+' || echo 0) diff --git a/src/commands/dream.ts b/src/commands/dream.ts index ab63457a6..a82e67ca4 100644 --- a/src/commands/dream.ts +++ b/src/commands/dream.ts @@ -31,6 +31,7 @@ import { type CycleReport, } from '../core/cycle.ts'; import { existsSync } from 'fs'; +import { resolve } from 'node:path'; interface DreamArgs { json: boolean; @@ -144,13 +145,15 @@ async function resolveBrainDir( console.error(`--dir path does not exist: ${explicit}`); process.exit(1); } - return explicit; + // Resolve to absolute so downstream writeFileSync(join(brainDir, slug)) + // can't silently land at cwd when explicit is `.` / `./brain` / etc. + return resolve(explicit); } if (engine) { const configured = await engine.getConfig('sync.repo_path'); if (configured && existsSync(configured)) { - return configured; + return resolve(configured); } } diff --git a/src/commands/init.ts b/src/commands/init.ts index 3c108fcb8..4de5430d5 100644 --- a/src/commands/init.ts +++ b/src/commands/init.ts @@ -11,6 +11,20 @@ import { createEngine } from '../core/engine-factory.ts'; import { discoverOAuth, mintClientCredentialsToken, smokeTestMcp } from '../core/remote-mcp-probe.ts'; export async function runInit(args: string[]) { + // Help guard: cli.ts only routes --help to printOpHelp() for shared-op + // commands; CLI_ONLY commands (init, embed, etc.) fall through to their + // handler with --help in argv. Without this guard, `gbrain init --help` + // proceeds into the smart-detection branch below, scans cwd for .md files, + // and on a directory with 1000+ files (e.g. 
$HOME for someone whose brain + // and notes share a root) silently overwrites the existing Supabase config + // with a fresh PGLite brain at ~/.gbrain/brain.pglite. Confirmed in the + // wild — flipped a working `engine: postgres` config to `engine: pglite` + // on a brain with 10K+ pages. Help should never mutate state. + if (args.includes('--help') || args.includes('-h')) { + printInitHelp(); + return; + } + const isSupabase = args.includes('--supabase'); const isPGLite = args.includes('--pglite'); const isMcpOnly = args.includes('--mcp-only'); @@ -1401,3 +1415,48 @@ export function reportModStatus(): void { console.log('Soul audit: run `gbrain soul-audit` to customize agent identity'); console.log(''); } + +function printInitHelp() { + console.log(` +gbrain init — initialize a brain (PGLite or Supabase Postgres) + +USAGE + gbrain init [flags] + +ENGINE SELECTION (mutually exclusive) + --pglite Use embedded PGLite (zero-config, default for <1000 .md files) + --supabase Use Supabase Postgres (recommended for 1000+ files) + --url Use a manual Postgres connection string + --mcp-only Thin-client mode: connect to a remote gbrain MCP, no local engine + +OPTIONS + --force Overwrite an existing config (gated by default) + --non-interactive Don't prompt; use defaults + --migrate-only Apply pending schema migrations against the configured engine + without re-saving config (used by post-upgrade and orchestrators) + --json JSON output for status reporting + --path Override default brain path (PGLite only) + --key Provide an API key non-interactively (Supabase only) + --embedding-model + e.g. 
openai:text-embedding-3-large, voyage:voyage-multimodal-3 + --model Shorthand: pick recipe default for a provider + --embedding-dimensions + Embedding dimensions (must match the model) + --expansion-model + Model for query expansion (default: anthropic:claude-haiku) + --chat-model + Default subagent driver (v0.27+) + +EXAMPLES + gbrain init --pglite # Local-only, no API keys + gbrain init --supabase # Interactive Supabase setup + gbrain init --url postgresql://... # Use a custom Postgres + gbrain init --mcp-only --url https://... # Thin-client mode + +NOTES + - Bare \`gbrain init\` in a directory with 1000+ .md files defaults to Supabase + interactive setup. With <1000 files (or with --pglite explicitly), defaults + to PGLite at ~/.gbrain/brain.pglite. + - Existing config is preserved unless --force is passed. +`.trim()); +} diff --git a/src/core/ai/gateway.ts b/src/core/ai/gateway.ts index 05ac7a7b3..a4da36d7f 100644 --- a/src/core/ai/gateway.ts +++ b/src/core/ai/gateway.ts @@ -1259,7 +1259,10 @@ export function isTokenLimitError(err: unknown): boolean { return ( /max.*allowed.*tokens.*batch/i.test(msg) || /batch.*too.*many.*tokens/i.test(msg) || - /token.*limit.*exceeded/i.test(msg) + /token.*limit.*exceeded/i.test(msg) || + // OpenAI embeddings: "Invalid 'input': maximum request size is 300000 tokens per request." + /maximum request size.*tokens/i.test(msg) || + /max.*tokens.*per.*request/i.test(msg) ); } diff --git a/src/core/ai/recipes/openai.ts b/src/core/ai/recipes/openai.ts index 732638da6..d453a4db0 100644 --- a/src/core/ai/recipes/openai.ts +++ b/src/core/ai/recipes/openai.ts @@ -17,6 +17,12 @@ export const openai: Recipe = { dims_options: [256, 512, 768, 1024, 1536, 3072], cost_per_1m_tokens_usd: 0.13, price_last_verified: '2026-04-20', + // OpenAI per-request hard cap is 300K tokens. Free/Tier-1 TPM is 1M. 
+ // Cap batches conservatively at 100K to handle token-dense content + // (Discord/Slack markdown+JSON tokenizes at ~chars/2.7, not the chars/4 + // estimate the batcher uses). 100K estimated = ~150K real tokens worst-case, + // safely under both the 300K per-request and 1M TPM ceilings. + max_batch_tokens: 100_000, }, expansion: { models: ['gpt-5.2', 'gpt-4o-mini'], diff --git a/src/core/connection-manager.ts b/src/core/connection-manager.ts index 4a4a447a4..73cd913b8 100644 --- a/src/core/connection-manager.ts +++ b/src/core/connection-manager.ts @@ -142,16 +142,22 @@ export function deriveDirectUrl(url: string): string | null { const decodedUser = decodeURIComponent(user); const refMatch = decodedUser.match(/^postgres\.([a-z0-9]+)$/i); let directHost = hostname; + let directUser = parsed.username; if (refMatch && refMatch[1] && isPoolerHost) { directHost = `db.${refMatch[1]}.supabase.co`; + // Supabase direct connections use bare `postgres`; the `postgres.` + // form is pooler-only (Supavisor uses the suffix for tenant routing). + // Without this strip, direct auth fails with `password authentication + // failed for user "postgres."` even though the password is correct. + directUser = 'postgres'; } // Compose direct URL by swapping host + port. Preserve auth, db, query. parsed.hostname = directHost; parsed.port = '5432'; // Reconstruct with the original scheme. const scheme = url.match(/^postgres(?:ql)?:\/\//i)?.[0] ?? 'postgres://'; - const auth = parsed.username - ? `${parsed.username}${parsed.password ? `:${parsed.password}` : ''}@` + const auth = directUser + ? `${directUser}${parsed.password ? `:${parsed.password}` : ''}@` : ''; const search = parsed.search ?? ''; const path = parsed.pathname ?? 
''; diff --git a/src/core/cycle/synthesize.ts b/src/core/cycle/synthesize.ts index 9828a4f3a..ea86f3959 100644 --- a/src/core/cycle/synthesize.ts +++ b/src/core/cycle/synthesize.ts @@ -28,7 +28,7 @@ import Anthropic from '@anthropic-ai/sdk'; import { readFileSync, existsSync, writeFileSync, mkdirSync } from 'node:fs'; -import { join, dirname } from 'node:path'; +import { join, dirname, isAbsolute, resolve } from 'node:path'; import type { BrainEngine } from '../engine.ts'; import type { PhaseResult, PhaseError } from '../cycle.ts'; import { MinionQueue } from '../minions/queue.ts'; @@ -239,6 +239,23 @@ export async function runPhaseSynthesize( opts: SynthesizePhaseOpts, ): Promise { const start = Date.now(); + // Normalize brainDir to an absolute path BEFORE any reverse-write. Without + // this, a relative or empty brainDir flows down to writeReversePages → + // `join(brainDir, '${slug}.md')` → relative path → resolves against cwd at + // writeFileSync time, spilling synthesize output into whatever directory + // the cycle ran from (e.g., `companies/novamind.md` at the repo root). + // Surfaced by the warm-narwhal wave when E2E test cleanup found orphan + // synthesize pages at repo root from a `runCycle({brainDir: '.'})` call + // chain. Throw on empty (silent cwd-resolution is worse than a loud + // failure); resolve if relative (`.` / `./brain` / `../sibling` all valid + // inputs but must canonicalize before the write). + if (!opts.brainDir || opts.brainDir.trim() === '') { + return failed(makeError('InternalError', 'BRAINDIR_EMPTY', + 'opts.brainDir is empty; refusing to run synthesize. Pass an absolute path.')); + } + if (!isAbsolute(opts.brainDir)) { + opts.brainDir = resolve(opts.brainDir); + } try { const config = await loadSynthConfig(engine); @@ -393,10 +410,20 @@ export async function runPhaseSynthesize( } const isChunked = chunks.length > 1; + // queue.add subagent validator (classifyCapabilities → resolveRecipe) + // requires `provider:model`. 
resolveModel can return a bare id when + // TIER_DEFAULTS / DEFAULT_ALIASES carry a bare value; ensure the + // anthropic: prefix is present for known claude-* ids before passing + // to the queue. Non-anthropic providers must already declare a colon. + const subagentModel = config.model.includes(':') + ? config.model + : config.model.toLowerCase().startsWith('claude-') + ? `anthropic:${config.model}` + : config.model; for (let i = 0; i < chunks.length; i++) { const childData: SubagentHandlerData = { prompt: buildSynthesisPrompt(t, chunks[i], i, chunks.length, priorContradictionsBlock), - model: config.model, + model: subagentModel, max_turns: 30, allowed_slug_prefixes: allowedSlugPrefixes, }; diff --git a/src/core/routing-eval.ts b/src/core/routing-eval.ts index 240cc4134..9b567a14e 100644 --- a/src/core/routing-eval.ts +++ b/src/core/routing-eval.ts @@ -96,6 +96,7 @@ export interface RoutingCaseResult { * variants that agents emit in practice. */ export function normalizeText(s: string): string { + if (!s) return ''; return s.toLowerCase().replace(/[^a-z0-9]+/g, ' ').trim(); } @@ -298,6 +299,10 @@ export function loadRoutingFixtures(skillsDir: string): LoadResult { if (raw.startsWith('//') || raw.startsWith('#')) continue; try { const obj = JSON.parse(raw) as RoutingFixture; + if (typeof obj.intent !== 'string') { + malformed.push({ file: fixturePath, line: i + 1, raw, error: `missing required field 'intent' (found keys: ${Object.keys(obj).join(', ')})` }); + continue; + } fixtures.push({ ...obj, source: fixturePath }); } catch (err) { malformed.push({ diff --git a/test/ai/adaptive-embed-batch.test.ts b/test/ai/adaptive-embed-batch.test.ts index 8ad53df0d..33050d4d4 100644 --- a/test/ai/adaptive-embed-batch.test.ts +++ b/test/ai/adaptive-embed-batch.test.ts @@ -155,6 +155,21 @@ describe('isTokenLimitError (pure helper)', () => { expect(isTokenLimitError(new Error('Batch contains too many tokens'))).toBe(true); }); + test('matches OpenAI embeddings "maximum 
request size" error (regression: PR ###)', () => { + // Real error string returned by OpenAI's /v1/embeddings endpoint when the + // sum of all input items exceeds 300k tokens. Without this match, gbrain's + // recursive-halving safety net never engages on OpenAI and the queue stalls + // forever on token-dense pages. + const openaiErr = new Error( + "Invalid 'input': maximum request size is 300000 tokens per request.", + ); + expect(isTokenLimitError(openaiErr)).toBe(true); + }); + + test('matches generic "max tokens per request" phrasing', () => { + expect(isTokenLimitError(new Error('Exceeded 300000 max tokens per request'))).toBe(true); + }); + test('does not match unrelated errors', () => { expect(isTokenLimitError(new Error('Connection refused'))).toBe(false); expect(isTokenLimitError(new Error('Invalid API key'))).toBe(false); diff --git a/test/connection-manager.serial.test.ts b/test/connection-manager.serial.test.ts index 619961e2c..42c1fd737 100644 --- a/test/connection-manager.serial.test.ts +++ b/test/connection-manager.serial.test.ts @@ -44,6 +44,18 @@ describe('deriveDirectUrl', () => { expect(direct).toContain(':secret@'); // creds preserved }); + test('strips . suffix from username when going pooler→direct', () => { + // Supabase direct connections require bare `postgres`; the `postgres.` + // form is pooler-only (Supavisor uses the suffix for tenant routing). + // Without the strip, direct auth fails with "password authentication + // failed for user postgres." even with the correct password. 
+ const direct = deriveDirectUrl( + 'postgresql://postgres.abcxyz:secret@aws-0-us-east-1.pooler.supabase.com:6543/postgres' + ); + expect(direct).toContain('postgres:secret@'); // bare username + expect(direct).not.toContain('postgres.abcxyz:secret@'); // no pooler suffix + }); + test('falls back to port-only swap when project-ref unparseable', () => { const direct = deriveDirectUrl( 'postgresql://customuser:secret@some.pooler.supabase.com:6543/db' @@ -51,6 +63,7 @@ describe('deriveDirectUrl', () => { expect(direct).toBeTruthy(); expect(direct).toContain(':5432'); expect(direct).toContain('some.pooler.supabase.com'); // host preserved + expect(direct).toContain('customuser:secret@'); // non-pooler username preserved }); test('returns null for non-pooler URL', () => { diff --git a/test/cycle-synthesize-braindir-resolve.test.ts b/test/cycle-synthesize-braindir-resolve.test.ts new file mode 100644 index 000000000..ac6ddf3d4 --- /dev/null +++ b/test/cycle-synthesize-braindir-resolve.test.ts @@ -0,0 +1,79 @@ +/** + * Regression: synthesize phase MUST refuse to write reverse-pages to a + * relative brainDir. Pre-fix, `runCycle({brainDir: '.'})` or any caller + * passing a relative path (or empty string) would silently let + * writeFileSync resolve against cwd, spilling synthesize output into + * `/companies/novamind.md` etc. Surfaced by the warm-narwhal wave + * when E2E test cleanup found orphan synthesize pages at repo root. + * + * Two contracts pinned here: + * 1. Empty/whitespace-only brainDir → returns failed() with code + * `BRAINDIR_EMPTY` (loud, not silent cwd resolution). + * 2. Relative brainDir → resolved to absolute via path.resolve() before + * any reverse-write can use it. Verified by checking opts.brainDir + * after the call returns. + * + * Doesn't drive Anthropic — synthesize hits the "not_configured" skip + * branch first (no corpus dir set), which is sufficient to exercise the + * brainDir gate at function entry. 
+ */ + +import { describe, test, expect, beforeAll, afterAll } from 'bun:test'; +import { mkdtempSync, rmSync } from 'node:fs'; +import { tmpdir } from 'node:os'; +import { join, isAbsolute } from 'node:path'; +import { PGLiteEngine } from '../src/core/pglite-engine.ts'; +import { runPhaseSynthesize } from '../src/core/cycle/synthesize.ts'; + +let engine: PGLiteEngine; +let tmpDir: string; + +beforeAll(async () => { + engine = new PGLiteEngine(); + await engine.connect({}); + await engine.initSchema(); + tmpDir = mkdtempSync(join(tmpdir(), 'synth-braindir-')); +}); + +afterAll(async () => { + await engine.disconnect(); + try { rmSync(tmpDir, { recursive: true, force: true }); } catch { /* */ } +}); + +describe('runPhaseSynthesize brainDir resolution (regression)', () => { + test('empty brainDir returns failed(BRAINDIR_EMPTY) instead of silently resolving against cwd', async () => { + const result = await runPhaseSynthesize(engine, { + brainDir: '', + dryRun: true, + }); + expect(result.status).toBe('fail'); + expect((result as { error?: { code?: string } }).error?.code).toBe('BRAINDIR_EMPTY'); + }); + + test('whitespace-only brainDir also fails BRAINDIR_EMPTY', async () => { + const result = await runPhaseSynthesize(engine, { + brainDir: ' ', + dryRun: true, + }); + expect(result.status).toBe('fail'); + expect((result as { error?: { code?: string } }).error?.code).toBe('BRAINDIR_EMPTY'); + }); + + test('relative brainDir gets resolved to absolute before any reverse-write', async () => { + const opts = { brainDir: '.', dryRun: true }; + // The phase will return early ('not_configured' — no corpus dir set on + // this fresh engine) but the normalization runs unconditionally at entry. + await runPhaseSynthesize(engine, opts); + // After the call, opts.brainDir should be the resolved absolute path, + // proving the normalization fired. 
+ expect(isAbsolute(opts.brainDir)).toBe(true); + expect(opts.brainDir).not.toBe('.'); + }); + + test('absolute brainDir is preserved unchanged', async () => { + const opts = { brainDir: tmpDir, dryRun: true }; + await runPhaseSynthesize(engine, opts); + // Already absolute → no mutation. + expect(opts.brainDir).toBe(tmpDir); + }); +}); diff --git a/test/e2e/mechanical.test.ts b/test/e2e/mechanical.test.ts index 64ca221f3..5428e1d27 100644 --- a/test/e2e/mechanical.test.ts +++ b/test/e2e/mechanical.test.ts @@ -1231,6 +1231,14 @@ describeE2E('E2E: Doctor Command', () => { // migration entries from in-flight workspaces — and surfaces them as the // 'minions_migration' [FAIL] check, exiting with code 1. gbrainHome = mkdtempSync(join(tmpdir(), 'gbrain-doctor-e2e-')); + // Cross-file isolation: prior E2E files can leave non-default `sources` + // rows (e.g. 'delta' from autopilot/sources tests). Doctor's + // sync_freshness + cycle_freshness checks then FAIL on those orphans, + // exit 1, breaking 'doctor exits 0 on healthy DB'. setupDB TRUNCATEs + // sources but schema.sql re-seeds 'default' via initSchema; clean any + // other rows so the doctor sees a clean single-source brain. + const conn = getConn(); + await conn`DELETE FROM sources WHERE id != 'default'`; }, 30_000); afterAll(async () => { await teardownDB(); @@ -1246,9 +1254,15 @@ describeE2E('E2E: Doctor Command', () => { }); test('gbrain doctor exits 0 on healthy DB', () => { - // Init first so config exists for CLI + // Init first so config exists for CLI. Pin --embedding-model explicitly + // so the spawned doctor doesn't pick a different default (e.g. ZE-1280d + // when ZEROENTROPY_API_KEY is in env) that mismatches the 1536d schema + // setupDB initialized, producing a WARN-status embedding_width_consistency + // check and exit 1. Mirrors the same pattern in 'Setup Journey'. 
Bun.spawnSync({ - cmd: ['bun', 'run', 'src/cli.ts', 'init', '--non-interactive', '--url', process.env.DATABASE_URL!], + cmd: ['bun', 'run', 'src/cli.ts', 'init', '--non-interactive', + '--url', process.env.DATABASE_URL!, + '--embedding-model', 'openai:text-embedding-3-large'], cwd: cliCwd, env: cliEnv(), timeout: 15_000, }); const result = Bun.spawnSync({ diff --git a/test/frontmatter-install-hook.test.ts b/test/frontmatter-install-hook.test.ts index f714b7cdf..4f11177a0 100644 --- a/test/frontmatter-install-hook.test.ts +++ b/test/frontmatter-install-hook.test.ts @@ -31,9 +31,26 @@ describe('frontmatter install-hook (B13)', () => { const content = readFileSync(hookPath, 'utf8'); expect(content).toContain('gbrain frontmatter'); expect(content).toContain('git diff --cached'); - // Configured hooksPath - const hooksPath = execFileSync('git', ['-C', tmp, 'config', '--get', 'core.hooksPath'], { encoding: 'utf8' }).trim(); - expect(hooksPath).toBe('.githooks'); + // installHook's contract is "set core.hooksPath unless it's already set + // elsewhere". Test BOTH branches deterministically by reading the local + // scope only: clean CI → local should be `.githooks`; developer with a + // global core.hooksPath (e.g. dotfiles → ~/.config/git/hooks) → local + // should be empty because installHook correctly skipped clobbering. + // Reading via `--get` without `--local` falls back to global scope when + // local is unset, which made this test environmentally fragile. 
+ let globalHooksPath = ''; + try { + globalHooksPath = execFileSync('git', ['config', '--global', '--get', 'core.hooksPath'], { encoding: 'utf8' }).trim(); + } catch { /* unset is the expected clean-env case */ } + let localHooksPath = ''; + try { + localHooksPath = execFileSync('git', ['-C', tmp, 'config', '--local', '--get', 'core.hooksPath'], { encoding: 'utf8' }).trim(); + } catch { /* unset is fine when global was present */ } + if (globalHooksPath) { + expect(localHooksPath).toBe(''); + } else { + expect(localHooksPath).toBe('.githooks'); + } }); test('installHook refuses to clobber existing hook without --force', () => {