diff --git a/CHANGELOG.md b/CHANGELOG.md index 44f1d7519..4bf05a81e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,6 +2,120 @@ All notable changes to GBrain will be documented in this file. +## [0.36.1.1] - 2026-05-18 + +**Twenty-eight community-reported bugs fix themselves on your next `gbrain upgrade`.** The most painful ones first: fresh installs work again, `/admin` actually serves the dashboard, `gbrain config set openai_api_key` stops echoing your secret to stderr, and `extract_facts` no longer dead-letters autopilot when a sync no-op passes an empty slug list. + +Here's what was happening. The community filed ~40 bug reports against gbrain over the last few weeks. Some were duplicates of fixes that already shipped in v0.35.x. Some were real, painful, and unfixed. Fresh PGLite installs wedged on the v0.11.0 Minions migration. Globally-installed `gbrain serve --http` returned 404 on `/admin` because the React dashboard was checked into git but never embedded into the compiled binary. Configs with `provider: voyage` + `model: voyage-3` silently fell through to OpenAI and errored out. Autopilot respawned in tight loops on brains where the orphans phase warned every cycle. None of these were one-line fixes — they all had real root causes worth understanding. + +This release is the cleanup pass. 22 community PRs + 14 issues closed as already-shipped (with thank-you notes attributing each contributor). 28 atomic fixes in this PR, one per commit, each with its regression test. No new features. No breaking changes. Just bugs that hurt users, fixed. + +### How to get it + +```bash +gbrain upgrade +``` + +Your install picks up everything below. If `/admin` was broken for you, it works after this upgrade. If your `gbrain sync --source-id work` was silently writing to the default source — well, master already fixed that since v0.18, but the regression test suite locking it in is new in this release. If you were on `provider: voyage` config shape, gbrain now translates it to `embedding_model: voyage:...` and prints a one-line nudge to update your config. + +### The fixes that matter most + +**Fresh-install / upgrade reliability:** +- `/admin` 404 on every globally-installed binary (#1090). The React SPA is now auto-embedded via `with { type: 'file' }` ESM imports — same pattern as the tree-sitter WASMs. New `scripts/build-admin-embedded.ts` generator + CI guard prevents the embed manifest from drifting against `admin/dist/`. +- PGLite + `gbrain apply-migrations --yes` wedge (#1100). The pre-flight schema-version probe held the single-writer lock briefly and raced the v0.11.0 phase A subprocess; phase A spawned `gbrain init --migrate-only` which deadlocked. Pre-flight skips PGLite; phase A routes in-process for PGLite. Verified end-to-end on a fresh install — the full migration chain through v0.32.2 walks cleanly. +- `gbrain upgrade` "no package.json" failure (#1029). Now resolves Bun's global install root via `$BUN_INSTALL` or `~/.bun/install/global`; works regardless of cwd. Contributed by @mvanhorn. + +**Data-loss / silent wrong-destination writes:** +- `extract_facts` with `slugs: []` was triggering an unscoped full-brain walk (#1096). On multi-thousand-page brains this exceeded the autopilot-cycle timeout and dead-lettered the job. Empty array is now a real no-op; missing key preserves the legacy unscoped walk. +- Voyage / Cohere / Mistral users with legacy `provider` + `model` config silently fell through to OpenAI (#1086). gbrain now translates to `embedding_model: :` at load time with a one-line stderr nudge to update the config. +- Embedding responses with mismatched length now fail loud before indexing (#925). Pre-fix, a provider returning N-1 embeddings for N inputs silently indexed an offset-shifted result. +- `gbrain sync --source-id X` source-routing is now locked in with 6 PGLite regression cases (#891, #978, #1078). The threading was correct in master since v0.18, but no regression coverage existed — a future refactor could have silently re-introduced the bug. + +**Security / privilege:** +- `gbrain config set openai_api_key sk-...` no longer echoes the key value to stderr (#892). Sensitive-key matcher uses word-boundary regex so `monkey` doesn't false-positive. Cherry-pick of @sharziki's PR. +- `verifyAccessToken` now throws `InvalidTokenError` on expired/invalid tokens so the SDK's bearer-auth middleware returns 401 instead of 500 (#935). Cherry-pick of @Aashiqe10's PR. +- Admin bootstrap token: new `GBRAIN_ADMIN_BOOTSTRAP_TOKEN` env override (32+ char, validated) + `--suppress-bootstrap-token` CLI flag (#1024). Production deployments stop leaking the token to log aggregators on every supervisor restart. +- Admin dashboard register-client supports `authorization_code` + PKCE public clients (claude.ai Custom Connector, Cursor) (#1077). Cherry-pick of @lukejduncan's PR. + +**Reliability / UX:** +- Autopilot stop respawn-storm from steady-state `partial` cycles (#1113). Trips only on `failed` now. Cherry-pick of @sergeclaesen's PR. +- `gbrain doctor` no longer warns about a missing whoknows fixture for every install that isn't standing in the source repo (#969). Cherry-pick of @mvanhorn's PR. +- Frontmatter `--fix` backups land under `~/.gbrain/backups/frontmatter/` instead of polluting the brain repo (#902). Cherry-pick of @100yenadmin's PR. +- `gbrain query` source-id routing fixes — `--no-expand` actually negates `expand` instead of being a literal `no_expand` key (#1124); cache writes drain before CLI exit so the warmup works (#1125). Cherry-picks of @hnshah's PRs. +- 17 more cherry-picks across `serve-http.ts` (405 on GET /mcp, CORS, PKCE admin register), `sync.ts` (.tf/.tfvars/.hcl extensions, skip-pull-no-origin, scope auto-embed), `doctor.ts` (child-table orphan detection, whoknows fixture), `oauth-provider.ts` (InvalidTokenError), `autopilot.ts` (~/.zshenv source order), `backlinks.ts` (dedupe source/target pairs), and `cycle.ts` (dream audit-only). + +### What's deferred + +Four items shipped a clear roadmap message instead of an incomplete fix: +- **#803 OAuth auth-code flow.** The MCP SDK does plaintext compare against `getClient().client_secret` — but we store the hash at rest (correct security posture). No clean SDK override seam exists. The proper fix is our own `/token` endpoint that bcrypt-verifies against `oauth_clients.client_secret_hash` and bypasses `mcpAuthRouter` for `/token`. ~1-2 day dedicated PR. +- **#983 CORS allowlist.** Tightening CORS without a compatibility matrix of legitimate browser-based MCP origins (claude.ai, Perplexity dashboard, our own admin SPA) risks breaking real clients. +- **#984 connection-manager disconnect cancel.** Touches the shutdown path; needs careful review against the v0.31.3 stdio-MCP lifecycle work. +- **#888 fs wikilink slug resolution.** Conflicts with master's recent slug-resolution work; needs rebase + re-review. + +### To take advantage of v0.36.1.0 + +`gbrain upgrade` handles this. Most fixes apply on next CLI invocation. The `/admin` embed fix and the schema-bootstrap fixes activate after the binary is replaced — `gbrain upgrade` does that automatically. + +Verify: +```bash +gbrain doctor --json | jq '.status' # should be "ok" +gbrain config show # confirm `provider`/`model` translated to `embedding_model` if you had the legacy shape +gbrain --version # should be 0.36.1.0 +``` + +If `gbrain doctor` still warns about anything after upgrade, please file an issue with the JSON output and your starting schema_version. + +### Acknowledgments + +Real appreciation for the community on this release. The bug-report volume was a strong signal — it's how we knew which structural fixes to prioritize. Contributors credited inline above and in `Co-Authored-By:` trailers on each commit: @100yenadmin, @aadachi, @Aashiqe10, @amreshtech, @ArshyaAI, @bnc-ss, @clarajohan, @Cossackx, @curtitoo, @DF-FrancoOS, @DmitryBMsk, @hesong12, @hnshah, @jatenner, @jeremyknows, @jeunessima, @johnybradshaw, @joshwilks111-max, @kkroo, @kyledeanjackson, @lubos-buracinsky, @lukejduncan, @mnuradli1, @mvanhorn, @navin-moorthy, @nezovskii, @p3ob7o, @panda850819, @rvdlaar, @sanaxr0001-tech, @seungsu-kr (legacy v0.31.3 stdio lifecycle), @sergeclaesen, @sharziki, @skalingclouds, @sliday, @SunOpt, @tkhattar14, @vincedk-alt, @yashkot007, @zzdisturbed. + +### Itemized changes + +#### Fixed + +- `src/commands/sync.ts` — Terraform / HCL extensions (`.tf`, `.tfvars`, `.hcl`) now in `CODE_EXTENSIONS`; previously invisible to `gbrain sync --strategy code`. Closes #878. +- `src/commands/upgrade.ts` — `resolveBunGlobalRoot()` finds Bun's global install root via `$BUN_INSTALL` or `~/.bun/install/global`. Closes #1029. PR #1032. +- `src/commands/config.ts` — `isSensitiveConfigKey()` + `redactConfigValue()` are the single source of truth for `show` and `set`; word-boundary regex avoids `monkey` false-positives. Closes #892. +- `src/core/oauth-provider.ts` — `verifyAccessToken` throws `InvalidTokenError` on expired + invalid bearer; SDK middleware returns 401 not 500. Closes #935. PR #1012. +- `src/commands/serve-http.ts` — `GET /mcp` returns 405 with `Allow: POST, DELETE` per MCP Streamable HTTP spec. PR #1076. +- `src/commands/doctor.ts` — `resolveWhoknowsFixturePath()` walks from `import.meta.url` for source-repo signature; honors `GBRAIN_WHOKNOWS_FIXTURE_PATH` env override. Closes #969. PR #1034. +- `src/commands/frontmatter.ts` + `src/core/brain-writer.ts` — `createFrontmatterBackup()` + `makeFrontmatterBackupRunId()`; backups land under `~/.gbrain/backups/frontmatter/`. Closes #902. PR #903. +- `src/core/config.ts` — `path.isAbsolute()` and dual-separator `..` check for `GBRAIN_HOME` on Windows. Closes #1019. PR #1083. +- `src/core/ai/gateway.ts` — `warnRecipesMissingBatchTokens()` filters to recipes whose provider id is referenced in `embedding_model` / `embedding_multimodal_model`. PR #1117. +- `src/commands/sync.ts` — `hasOriginRemote()` probe; skip git pull when no `origin` remote. PR #1119. +- `src/cli.ts` + `src/core/search/hybrid.ts` — `awaitPendingSearchCacheWrites()` drains in-flight cache writes before CLI exit. PR #1125. +- `src/commands/backlinks.ts` — dedupe `(source, target)` pairs within a single source page. Closes #967. PR #967. +- `src/core/cycle.ts` — backlinks phase runs in `check` mode during dream/autopilot. PR #1027. +- `src/commands/autopilot.ts` — wrapper script sources `~/.zshenv` before `~/.zshrc`. PR #966. +- `src/commands/apply-migrations.ts` — `process.exit(0)` on list / dry-run / up-to-date paths. PR #1062. +- `src/commands/sync.ts` — `buildAutoEmbedArgs(slugs, sourceId)` threads `--source` to incremental auto-embed. PR #1120. +- `src/cli.ts:parseOpArgs` — `--no-` flips boolean param false; `query` op honors per-call `source_id` over `ctx.sourceId`. PR #1124. +- `src/commands/doctor.ts` — `childTableOrphansCheck()` scans 10 FK columns across 8 tables for orphan rows; cascade-violation diagnostic with paste-ready cleanup SQL. Closes #1063. PR #1064. +- `src/commands/autopilot.ts` + `src/core/cycle.ts` — autopilot trips circuit breaker only on `failed`; orphans phase uses ratio threshold (`count / total_pages > 0.5`) not absolute. PR #1113. +- `src/core/ai/gateway.ts` — `embedSubBatch` validates response length AND per-embedding dim; partial responses throw before indexing. Closes #925. PR #926. +- `src/commands/serve-http.ts` — admin register-client honors `grantTypes`, `redirectUris`, `tokenEndpointAuthMethod: 'none'` (PKCE public clients). PR #1077. +- `src/core/cycle/extract-facts.ts` — `opts.slugs !== undefined` check; empty array is a no-op. Closes #1096. +- `src/commands/serve-http.ts` + `src/admin-embedded.ts` + `scripts/build-admin-embedded.ts` + `scripts/check-admin-embedded.sh` — auto-generated manifest of admin/dist embedded via `with { type: 'file' }`; two-tier resolution (dev cwd vs binary). Closes #1090. +- `test/source-id-routing.test.ts` — 6 PGLite regression cases for `importFromContent({sourceId})`. Locks in #891, #978, #1078. +- `src/commands/migrations/v0_11_0.ts` + `src/commands/apply-migrations.ts` — Phase A routes in-process for PGLite; pre-flight schema-version warning skips PGLite. Closes #1100. +- `src/commands/serve-http.ts` — `GBRAIN_ADMIN_BOOTSTRAP_TOKEN` env override with `^[A-Za-z0-9_-]{32,}$` validation; `--suppress-bootstrap-token` flag. Closes #1024. +- `src/core/config.ts` — `migrateLegacyEmbeddingConfig()` translates legacy `provider` + `model` shape to `embedding_model: :` with stderr nudge. Closes #1086. +- `test/brain-writer.test.ts` — assertion updated for centralized backup path (CI green). + +#### Maintenance + +- `test/upgrade.serial.test.ts` — renamed from `upgrade.test.ts`; quarantined for `process.env` mutation pattern per the test-isolation lint. Follow-up: migrate to `withEnv()` helper. + +### What was already shipped on master (closed as duplicates) + +#### Phase 1 close pass — 22 PRs + 14 issues closed with thank-you comments + +- **Cluster A** (v0.35.3.0): 11 PRs + 6 issues all reporting `extract_facts.entity_hints` array-schema items missing. The shared `paramDefToSchema` mapper in `src/mcp/tool-defs.ts` is now the single source of truth across stdio MCP, HTTP MCP, and the subagent registry; `test/mcp-tool-defs.test.ts` enforces structurally. (#904, #907, #910, #979, #980, #999, #1028, #1043, #1049, #1057, #1084, #1103, #806, #831, #833, #911, #1042, #1048.) +- **Cluster B** (v0.35.3.0): 3 PRs + 2 issues on `--no-recurse-submodules` flag placement. `GIT_SSRF_FLAGS` (global config, before verb) is now distinct from `GIT_SSRF_SUBCOMMAND_FLAGS` (subcommand flags, after verb); position-anchored regression guard in `test/git-remote.test.ts`. (#963, #985, #1020, #1023, #1104, #800, #813.) +- **Cluster C** (v0.35.5.0): 3 PRs + 2 issues on `oauth_clients` bootstrap. `applyForwardReferenceBootstrap` covers `files.source_id`, `files.page_id`, `oauth_clients.source_id`, `oauth_clients.federated_read`, `sources.archived/_at/_expires_at` before SCHEMA_SQL replay. `test/schema-bootstrap-coverage.test.ts` parses migrate.ts at PR time and fails the suite if any `(table, column)` pair isn't covered. (#1017, #1045, #1115, #1116, #974, #1018, #1092.) +- **Cluster D singletons**: orphans soft-delete (v0.35.5.0), doctor `node_modules` walker (v0.35.5.0 `pruneDir`), Voyage 2048d `output_dimension` (v0.33.1.1 PR #962), doctor stale verb names (v0.31.7). (#922, #1033, #799, #865, #835, #1021.) +- **Cluster E troll**: #1114 closed and locked. +- **PRs absorbed by structural fixes**: #1050 (doctor crash count — v0.35.5.0 `summarizeCrashes()`), #1054 (schema v44→v45 wedge — v0.35.5.0 bootstrap extension), #1065 (embed PGLite hang — v66 partial index). ## [0.36.1.0] - 2026-05-17 **The brain learns how you tend to be wrong, then argues against your blind spots on every advice call.** diff --git a/VERSION b/VERSION index e11008a95..f53f4b31e 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.36.1.0 \ No newline at end of file +0.36.1.1 \ No newline at end of file diff --git a/package.json b/package.json index 03fd5006d..74777a881 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "gbrain", - "version": "0.36.1.0", + "version": "0.36.1.1", "description": "Postgres-native personal knowledge brain with hybrid RAG search", "type": "module", "main": "src/core/index.ts", @@ -31,7 +31,8 @@ "dev": "bun run src/cli.ts", "build": "bun build --compile --outfile bin/gbrain src/cli.ts", "build:all": "bun build --compile --target=bun-darwin-arm64 --outfile bin/gbrain-darwin-arm64 src/cli.ts && bun build --compile --target=bun-linux-x64 --outfile bin/gbrain-linux-x64 src/cli.ts", - "build:admin": "cd admin && bun run build", + "build:admin": "cd admin && bun run build && cd .. && bun run scripts/build-admin-embedded.ts", + "build:admin-embedded": "bun run scripts/build-admin-embedded.ts", "build:schema": "bash scripts/build-schema.sh", "build:llms": "bun run scripts/build-llms.ts", "build:pglite-snapshot": "bun run scripts/build-pglite-snapshot.ts", @@ -62,6 +63,7 @@ "check:progress": "scripts/check-progress-to-stdout.sh", "check:exports-count": "scripts/check-exports-count.sh", "check:admin-build": "scripts/check-admin-build.sh", + "check:admin-embedded": "scripts/check-admin-embedded.sh", "check:test-isolation": "scripts/check-test-isolation.sh", "postinstall": "command -v gbrain >/dev/null 2>&1 && gbrain apply-migrations --yes --non-interactive || echo '[gbrain] postinstall skipped. If installed via bun install -g github:...: run `gbrain doctor` and `gbrain apply-migrations --yes` manually. See https://github.com/garrytan/gbrain/issues/218' 1>&2", "prepublish:clawhub": "bun run build:all", diff --git a/scripts/build-admin-embedded.ts b/scripts/build-admin-embedded.ts new file mode 100755 index 000000000..ae0ea224c --- /dev/null +++ b/scripts/build-admin-embedded.ts @@ -0,0 +1,134 @@ +#!/usr/bin/env bun +/** + * Generates `src/admin-embedded.ts` from `admin/dist/*`. + * + * Why: `bun build --compile` does NOT embed arbitrary asset directories. + * The only way to ship a file inside a compiled binary is via an ESM + * `import x from './path' with { type: 'file' }` reference (which Bun + * resolves at runtime to a path that works inside the binary archive). + * + * Pre-v0.36.x, `serve-http.ts:780` resolved `admin/dist/` via + * `process.cwd()` — fine in dev (`cd ~/gbrain && bun start serve --http`), + * broken in every globally-installed binary (no admin/dist next to the + * binary). Result: every fresh `bun install -g github:garrytan/gbrain` + * user got 404 on /admin (issue #1090). + * + * This generator emits one `import` line per file under admin/dist/, + * plus a manifest map keyed by the request path the express handler + * sees (e.g. `/admin/index.html`, `/admin/assets/index-XXX.js`). + * + * Run: `bun run scripts/build-admin-embedded.ts` (also invoked by + * `bun run build:admin`). + * + * CI guard: `scripts/check-admin-embedded.sh` re-runs this generator + * and `git diff --exit-code src/admin-embedded.ts` so PRs that change + * admin/dist without regenerating the embedded module fail loud. + */ + +import { readdirSync, statSync, writeFileSync, existsSync, readFileSync } from 'fs'; +import { join, relative, posix } from 'path'; + +const REPO = join(import.meta.dir, '..'); +const DIST = join(REPO, 'admin', 'dist'); +const OUT = join(REPO, 'src', 'admin-embedded.ts'); + +function walk(dir: string, base: string = dir): string[] { + if (!existsSync(dir)) return []; + const out: string[] = []; + for (const entry of readdirSync(dir)) { + const full = join(dir, entry); + if (statSync(full).isDirectory()) { + out.push(...walk(full, base)); + } else { + out.push(relative(base, full)); + } + } + return out.sort(); +} + +const MIME: Record = { + '.html': 'text/html; charset=utf-8', + '.css': 'text/css; charset=utf-8', + '.js': 'application/javascript; charset=utf-8', + '.json': 'application/json; charset=utf-8', + '.svg': 'image/svg+xml', + '.png': 'image/png', + '.jpg': 'image/jpeg', + '.jpeg': 'image/jpeg', + '.gif': 'image/gif', + '.webp': 'image/webp', + '.ico': 'image/x-icon', + '.woff': 'font/woff', + '.woff2': 'font/woff2', + '.txt': 'text/plain; charset=utf-8', + '.map': 'application/json; charset=utf-8', +}; + +function mimeFor(filename: string): string { + const dot = filename.lastIndexOf('.'); + if (dot === -1) return 'application/octet-stream'; + return MIME[filename.slice(dot).toLowerCase()] ?? 'application/octet-stream'; +} + +function safeIdent(rel: string, idx: number): string { + // Stable, collision-free identifier per relative path. The numeric + // suffix prevents collisions between filenames that normalize to the + // same identifier (e.g. `foo.bar.js` and `foo-bar.js`). + const cleaned = rel.replace(/[^a-zA-Z0-9]/g, '_').replace(/^_+/, ''); + return `A_${idx}_${cleaned}`; +} + +const files = walk(DIST); +if (files.length === 0) { + console.error('[build-admin-embedded] no files under admin/dist — run `cd admin && bun run build` first.'); + process.exit(1); +} + +const imports: string[] = []; +const manifestEntries: string[] = []; + +for (let i = 0; i < files.length; i++) { + const rel = files[i]; + // POSIX-style relative path for the import (works on Windows too). + const importRel = `../admin/dist/${rel.split(/[\\/]/).join('/')}`; + const ident = safeIdent(rel, i); + // @ts-ignore — `with { type: 'file' }` is Bun syntax not in lib.d.ts; + // same pattern as src/core/chunkers/code.ts wasm imports. + imports.push(`// @ts-ignore — type: 'file' is Bun ESM, not in lib.d.ts`); + imports.push(`import ${ident} from '${importRel}' with { type: 'file' };`); + const requestPath = '/admin/' + rel.split(/[\\/]/).join('/'); + manifestEntries.push(` ${JSON.stringify(requestPath)}: { path: ${ident} as unknown as string, mime: ${JSON.stringify(mimeFor(rel))} },`); +} + +const content = `// AUTO-GENERATED — do not edit by hand. +// Run \`bun run scripts/build-admin-embedded.ts\` to regenerate. +// Source: admin/dist/ at ${new Date().toISOString().slice(0, 10)}. +// +// Bun resolves the file: imports to a path that works at runtime even +// inside a compiled binary (\`bun build --compile\`). The manifest maps +// the request path the express handler sees to (resolved-path, mime). + +${imports.join('\n')} + +export interface AdminAsset { + path: string; + mime: string; +} + +export const ADMIN_ASSETS: Record = { +${manifestEntries.join('\n')} +}; + +/** Index entry point for SPA fallback. */ +export const ADMIN_INDEX_HTML: AdminAsset = ADMIN_ASSETS['/admin/index.html']; + +export const ADMIN_ASSET_COUNT = ${files.length}; +`; + +const existing = existsSync(OUT) ? readFileSync(OUT, 'utf-8') : ''; +if (existing === content) { + console.log(`[build-admin-embedded] up to date (${files.length} files)`); +} else { + writeFileSync(OUT, content, 'utf-8'); + console.log(`[build-admin-embedded] wrote ${OUT} (${files.length} files)`); +} diff --git a/scripts/check-admin-embedded.sh b/scripts/check-admin-embedded.sh new file mode 100755 index 000000000..c535dd62d --- /dev/null +++ b/scripts/check-admin-embedded.sh @@ -0,0 +1,35 @@ +#!/usr/bin/env bash +# CI gate: src/admin-embedded.ts must match admin/dist/ contents. +# +# This protects against the v0.36.x #1090 bug class re-emerging — a PR +# that rebuilds admin/dist but forgets to regenerate src/admin-embedded.ts +# would silently break /admin on every fresh install of the compiled +# binary. The Vite build outputs hashed filenames, so a stale embedded +# manifest references nonexistent assets. +# +# How: re-run the generator, then `git diff --exit-code` on the output. +# Exits 0 when in sync, 1 when the generator produces different output +# than what's committed. +# +# Mirrors scripts/check-wasm-embedded.sh's pattern. + +set -euo pipefail + +cd "$(dirname "$0")/.." + +if [ ! -d admin/dist ]; then + echo "[check:admin-embedded] no admin/dist (run \`cd admin && bun run build\` first); skipping" + exit 0 +fi + +bun run scripts/build-admin-embedded.ts > /dev/null + +if ! git diff --exit-code -- src/admin-embedded.ts; then + echo "" + echo "[check:admin-embedded] src/admin-embedded.ts is out of sync with admin/dist/." + echo " Fix: bun run build:admin && bun run build:admin-embedded" + echo " Then re-commit the regenerated src/admin-embedded.ts." + exit 1 +fi + +echo "[check:admin-embedded] OK" diff --git a/src/admin-embedded.ts b/src/admin-embedded.ts new file mode 100644 index 000000000..d65e80730 --- /dev/null +++ b/src/admin-embedded.ts @@ -0,0 +1,30 @@ +// AUTO-GENERATED — do not edit by hand. +// Run `bun run scripts/build-admin-embedded.ts` to regenerate. +// Source: admin/dist/ at 2026-05-19. +// +// Bun resolves the file: imports to a path that works at runtime even +// inside a compiled binary (`bun build --compile`). The manifest maps +// the request path the express handler sees to (resolved-path, mime). + +// @ts-ignore — type: 'file' is Bun ESM, not in lib.d.ts +import A_0_assets_index_CWq369vO_js from '../admin/dist/assets/index-CWq369vO.js' with { type: 'file' }; +// @ts-ignore — type: 'file' is Bun ESM, not in lib.d.ts +import A_1_assets_index_GxkWX7v3_css from '../admin/dist/assets/index-GxkWX7v3.css' with { type: 'file' }; +// @ts-ignore — type: 'file' is Bun ESM, not in lib.d.ts +import A_2_index_html from '../admin/dist/index.html' with { type: 'file' }; + +export interface AdminAsset { + path: string; + mime: string; +} + +export const ADMIN_ASSETS: Record = { + "/admin/assets/index-CWq369vO.js": { path: A_0_assets_index_CWq369vO_js as unknown as string, mime: "application/javascript; charset=utf-8" }, + "/admin/assets/index-GxkWX7v3.css": { path: A_1_assets_index_GxkWX7v3_css as unknown as string, mime: "text/css; charset=utf-8" }, + "/admin/index.html": { path: A_2_index_html as unknown as string, mime: "text/html; charset=utf-8" }, +}; + +/** Index entry point for SPA fallback. */ +export const ADMIN_INDEX_HTML: AdminAsset = ADMIN_ASSETS['/admin/index.html']; + +export const ADMIN_ASSET_COUNT = 3; diff --git a/src/cli.ts b/src/cli.ts index 44ad068de..976d2920f 100755 --- a/src/cli.ts +++ b/src/cli.ts @@ -169,6 +169,10 @@ async function main() { const result = JSON.parse(JSON.stringify(rawResult)); const output = formatResult(op.name, result); if (output) process.stdout.write(output); + if (op.name === 'query') { + const { awaitPendingSearchCacheWrites } = await import('./core/search/hybrid.ts'); + await awaitPendingSearchCacheWrites(); + } } catch (e: unknown) { if (e instanceof OperationError) { console.error(`Error [${e.code}]: ${e.message}`); @@ -450,7 +454,7 @@ export function resolveQueryImage( return { path: imagePath, base64, mime }; } -function parseOpArgs(op: Operation, args: string[]): Record { +export function parseOpArgs(op: Operation, args: string[]): Record { const params: Record = {}; const positional = op.cliHints?.positional || []; let posIdx = 0; @@ -458,6 +462,14 @@ function parseOpArgs(op: Operation, args: string[]): Record { for (let i = 0; i < args.length; i++) { const arg = args[i]; if (arg.startsWith('--')) { + if (arg.startsWith('--no-')) { + const positiveKey = arg.slice(5).replace(/-/g, '_'); + const positiveDef = op.params[positiveKey]; + if (positiveDef?.type === 'boolean') { + params[positiveKey] = false; + continue; + } + } const key = arg.slice(2).replace(/-/g, '_'); const paramDef = op.params[key]; if (paramDef?.type === 'boolean') { diff --git a/src/commands/apply-migrations.ts b/src/commands/apply-migrations.ts index 5cc8213ca..259409485 100644 --- a/src/commands/apply-migrations.ts +++ b/src/commands/apply-migrations.ts @@ -364,17 +364,27 @@ export async function runApplyMigrations(args: string[]): Promise { const { createEngine } = await import('../core/engine-factory.ts'); const cfg = lc(); if (cfg) { - const eng = await createEngine(toEngineConfig(cfg)); - await eng.connect(toEngineConfig(cfg)); - const verStr = await eng.getConfig('version'); - const schemaVer = parseInt(verStr || '1', 10); - await eng.disconnect(); - if (schemaVer < LATEST_VERSION) { - console.warn( - `\n⚠️ Schema version ${schemaVer} is behind latest ${LATEST_VERSION}.\n` + - ` Schema migrations run automatically on next connectEngine() / initSchema().\n` + - ` To run them now: gbrain init --migrate-only\n`, - ); + // v0.36.x #1100: skip the pre-flight warning on PGLite. The probe + // briefly holds the single-writer lock; if a downstream orchestrator + // phase spawns `gbrain init --migrate-only` as a subprocess (the + // legacy v0.11.0 phase A path), the child can race the parent's + // lock release and hit a 30s timeout. The orchestrators handle + // schema lifecycle internally on PGLite (phase A routes in-process), + // so the warning here adds no information for PGLite users. + const skipPreflight = cfg.engine === 'pglite'; + if (!skipPreflight) { + const eng = await createEngine(toEngineConfig(cfg)); + await eng.connect(toEngineConfig(cfg)); + const verStr = await eng.getConfig('version'); + const schemaVer = parseInt(verStr || '1', 10); + await eng.disconnect(); + if (schemaVer < LATEST_VERSION) { + console.warn( + `\n⚠️ Schema version ${schemaVer} is behind latest ${LATEST_VERSION}.\n` + + ` Schema migrations run automatically on next connectEngine() / initSchema().\n` + + ` To run them now: gbrain init --migrate-only\n`, + ); + } } } } catch { @@ -404,13 +414,13 @@ export async function runApplyMigrations(args: string[]): Promise { process.exit(2); } - if (cli.list) { printList(plan, installed); return; } - if (cli.dryRun) { printDryRun(plan, installed); return; } + if (cli.list) { printList(plan, installed); process.exit(0); } + if (cli.dryRun) { printDryRun(plan, installed); process.exit(0); } const toRun: Migration[] = [...plan.partial, ...plan.pending]; if (toRun.length === 0) { console.log('All migrations up to date.'); - return; + process.exit(0); } // Run each orchestrator in registry order. An orchestrator failure aborts diff --git a/src/commands/autopilot.ts b/src/commands/autopilot.ts index b57debb14..e5134b772 100644 --- a/src/commands/autopilot.ts +++ b/src/commands/autopilot.ts @@ -355,7 +355,13 @@ export async function runAutopilot(engine: BrainEngine, args: string[]) { await new Promise(r => setImmediate(r)); }, }); - if (report.status === 'failed' || report.status === 'partial') { + // Only 'failed' (every attempted phase failed) trips the autopilot + // circuit breaker. 'partial' means at least one phase warned or + // failed while others ran — that's a soft signal, not a fatal + // condition. Treating 'partial' as failure here caused respawn + // storms under KeepAlive=true on brains where a single phase + // (typically `orphans`) emits a 'warn' every cycle in steady state. + if (report.status === 'failed') { cycleOk = false; } if (jsonMode) { @@ -480,7 +486,12 @@ function writeWrapperScript(repoPath: string): string { const safeGbrainPath = gbrainPath.replace(/'/g, "'\\''"); const wrapper = `#!/bin/bash # Auto-generated by gbrain autopilot --install -# Sources shell profile for API keys, then runs autopilot +# Sources shell profile for API keys, then runs autopilot. +# zshenv is the canonical place for env vars in zsh on macOS (zshrc is for +# interactive shells only — vars defined there don't reach this non-interactive +# subprocess). Source it first so secrets like GBRAIN_DATABASE_URL or any +# OPENAI/ANTHROPIC keys exported in zshenv reach autopilot. +[ -f ~/.zshenv ] && source ~/.zshenv 2>/dev/null source ~/.zshrc 2>/dev/null || source ~/.bashrc 2>/dev/null || true exec '${safeGbrainPath}' autopilot --repo '${safeRepoPath}' `; diff --git a/src/commands/backlinks.ts b/src/commands/backlinks.ts index a928c096a..bdde915c1 100644 --- a/src/commands/backlinks.ts +++ b/src/commands/backlinks.ts @@ -100,9 +100,20 @@ export function findBacklinkGaps(brainDir: string): BacklinkGap[] { for (const page of allPages) { const refs = extractEntityRefs(page.content, page.relPath); const sourceFilename = basename(page.relPath); + // LOCAL PATCH (paolo, 2026-05-12): dedupe (source, target) pairs within + // a single source page. extractEntityRefs returns one EntityRef per + // occurrence, so a source page that mentions the same target N times + // produced N identical gaps → N duplicate "Referenced in" lines on the + // target. The per-ref `hasBacklink(target.content, ...)` check reads a + // stale snapshot (target.content is frozen at this scope), so every + // iteration sees the same "no backlink yet" state and pushes another + // gap. Tracking seen target slugs per source caps gaps at one per pair. + const seen = new Set(); for (const ref of refs) { const targetSlug = `${ref.dir}/${ref.slug}`; + if (seen.has(targetSlug)) continue; + seen.add(targetSlug); const target = pagesBySlug.get(targetSlug); if (!target) continue; // target page doesn't exist diff --git a/src/commands/config.ts b/src/commands/config.ts index 62eb68582..21350eb89 100644 --- a/src/commands/config.ts +++ b/src/commands/config.ts @@ -9,6 +9,24 @@ function redactUrl(url: string): string { ); } +// v0.36.x #892: sensitive config-key allowlist. The `show` path used a +// loose `.includes('key')` check that also redacts (works); the `set` path +// previously printed the raw value to stderr, leaking API keys via shell +// history + scrollback. This helper is the single source of truth so the +// two surfaces can't drift again. Match on word-segments to avoid +// false-positives (e.g. `monkey` doesn't match `key`). +export function isSensitiveConfigKey(key: string): boolean { + const lower = key.toLowerCase(); + // Word-boundary matches: foo_key, foo.key, key_foo, key, api_key, ... + return /(^|[._-])(key|secret|token|password|pwd|passwd|auth)([._-]|$)/.test(lower); +} + +export function redactConfigValue(key: string, value: string): string { + if (value.includes('postgresql://')) return redactUrl(value); + if (isSensitiveConfigKey(key)) return '***'; + return value; +} + export async function runConfig(engine: BrainEngine, args: string[]) { const action = args[0]; @@ -20,11 +38,7 @@ export async function runConfig(engine: BrainEngine, args: string[]) { } console.log('GBrain config:'); for (const [k, v] of Object.entries(config)) { - const display = typeof v === 'string' && v.includes('postgresql://') - ? redactUrl(v) - : typeof v === 'string' && (k.includes('key') || k.includes('secret')) - ? '***' - : v; + const display = typeof v === 'string' ? redactConfigValue(k, v) : v; console.log(` ${k}: ${display}`); } return; @@ -85,7 +99,10 @@ export async function runConfig(engine: BrainEngine, args: string[]) { } } else if (action === 'set' && key && value) { await engine.setConfig(key, value); - console.log(`Set ${key} = ${value}`); + // v0.36.x #892: redact sensitive values in confirmation output. API + // keys / tokens / passwords are commonly set from terminals with + // scrollback; echoing the raw value to stderr leaks the secret. + console.log(`Set ${key} = ${redactConfigValue(key, value)}`); } else { console.error('Usage: gbrain config [show|get|set|unset] [value]'); console.error(' gbrain config unset --pattern '); diff --git a/src/commands/doctor.ts b/src/commands/doctor.ts index fc2afacd8..1dac139bd 100644 --- a/src/commands/doctor.ts +++ b/src/commands/doctor.ts @@ -9,8 +9,9 @@ import { compareVersions } from './migrations/index.ts'; import { createProgress, startHeartbeat, type ProgressReporter } from '../core/progress.ts'; import { getCliOptions, cliOptsToProgressOptions } from '../core/cli-options.ts'; import type { DbUrlSource } from '../core/config.ts'; -import { join } from 'path'; -import { existsSync, readFileSync, readdirSync } from 'fs'; +import { dirname, isAbsolute, join, resolve as resolvePath } from 'path'; +import { fileURLToPath } from 'url'; +import { existsSync, readFileSync, readdirSync, statSync } from 'fs'; export interface Check { name: string; @@ -86,6 +87,41 @@ export function computeDoctorReport(checks: Check[]): DoctorReport { * Tolerance matches migration v48: any value with abs(weight - on_grid) > 1e-3 * is genuinely off-grid (the 0.05 grid is 5e-2; float32 noise is ~1e-7). */ +const WHOKNOWS_FIXTURE_RELATIVE_PATH = 'test/fixtures/whoknows-eval.jsonl'; + +function isGbrainSourceRoot(dir: string): boolean { + return ( + existsSync(join(dir, 'src', 'cli.ts')) && + existsSync(join(dir, 'skills', 'RESOLVER.md')) + ); +} + +export function resolveWhoknowsFixturePath( + env: NodeJS.ProcessEnv = process.env, + moduleUrl: string = import.meta.url, +): string | null { + if (env.GBRAIN_WHOKNOWS_FIXTURE_PATH) { + return isAbsolute(env.GBRAIN_WHOKNOWS_FIXTURE_PATH) + ? env.GBRAIN_WHOKNOWS_FIXTURE_PATH + : resolvePath(process.cwd(), env.GBRAIN_WHOKNOWS_FIXTURE_PATH); + } + + try { + let dir = dirname(fileURLToPath(moduleUrl)); + for (let i = 0; i < 10; i++) { + if (isGbrainSourceRoot(dir)) return join(dir, WHOKNOWS_FIXTURE_RELATIVE_PATH); + const parent = dirname(dir); + if (parent === dir) break; + dir = parent; + } + } catch { + // Some bundlers/runtimes may not expose a normal file: import URL. + // Doctor should surface an override hint instead of fabricating a path. + } + + return null; +} + /** * v0.33: whoknows_health — verify the eval fixture is present at the * documented path. Lightweight; just checks file existence and row count, @@ -98,15 +134,19 @@ export function computeDoctorReport(checks: Check[]): DoctorReport { */ export async function whoknowsHealthCheck(_engine: BrainEngine): Promise { try { - const { existsSync, readFileSync, statSync } = await import('fs'); - const path = await import('path'); - const repoRoot = process.cwd(); - const fixturePath = path.join(repoRoot, 'test/fixtures/whoknows-eval.jsonl'); + const fixturePath = resolveWhoknowsFixturePath(); + if (!fixturePath) { + return { + name: 'whoknows_health', + status: 'warn', + message: 'whoknows eval fixture path could not be resolved. Set GBRAIN_WHOKNOWS_FIXTURE_PATH to the absolute path for test/fixtures/whoknows-eval.jsonl.', + }; + } if (!existsSync(fixturePath)) { return { name: 'whoknows_health', status: 'warn', - message: `whoknows eval fixture missing at test/fixtures/whoknows-eval.jsonl. Fix: hand-label 10 queries you'd actually run, format {query, expected_top_3_slugs, notes}.`, + message: `whoknows eval fixture missing at ${fixturePath}. Fix: hand-label 10 queries you'd actually run, format {query, expected_top_3_slugs, notes}.`, }; } const stat = statSync(fixturePath); @@ -193,6 +233,97 @@ export async function takesWeightGridCheck(engine: BrainEngine): Promise } } +/** + * Child-table orphan detection (closes #1063). + * + * The autopilot `orphans` phase (src/core/cycle.ts:runPhaseOrphans) detects + * orphan PAGES (pages with no inbound links via the page-graph). It does NOT + * scan FK-child tables for orphan rows. When a bulk page delete leaves + * orphans in `content_chunks` / `page_versions` / `tags` / `takes` / etc. + * — whether from pre-FK migrations, race conditions, or a code path that + * bypassed cascade — they persist indefinitely until manual SQL cleanup. + * + * All ten FK-to-pages tables declare `ON DELETE CASCADE` in the live schema + * (verified via `pg_constraint` snapshot in the issue body), so finding any + * orphan row is by definition unexpected. The check ships paste-ready + * cleanup SQL when orphans surface. + * + * Excluded: `files.page_id` and `links.origin_page_id` — both declared as + * `ON DELETE SET NULL`, so a NULL value is a valid state (file/link survives + * after page deletion); only NOT-NULL-but-page-missing is an orphan there. + * The check encodes that distinction for the two SET NULL columns. + * + * Pure helper for parity with `takesWeightGridCheck` so tests can target it + * directly without driving the full `runDoctor` pipeline. + */ +export async function childTableOrphansCheck(engine: BrainEngine): Promise { + // (table, fk_column, allow_null). When allow_null=true, NULL is a valid + // state (FK was declared ON DELETE SET NULL); the orphan predicate filters + // out NULL values. When false, NULL is impossible by NOT NULL constraint; + // any value not in pages.id is an orphan. + const targets: Array<{ table: string; col: string; allowNull: boolean }> = [ + { table: 'content_chunks', col: 'page_id', allowNull: false }, + { table: 'page_versions', col: 'page_id', allowNull: false }, + { table: 'tags', col: 'page_id', allowNull: false }, + { table: 'takes', col: 'page_id', allowNull: false }, + { table: 'raw_data', col: 'page_id', allowNull: false }, + { table: 'timeline_entries', col: 'page_id', allowNull: false }, + { table: 'links', col: 'from_page_id', allowNull: false }, + { table: 'links', col: 'to_page_id', allowNull: false }, + { table: 'links', col: 'origin_page_id', allowNull: true }, + { table: 'files', col: 'page_id', allowNull: true }, + ]; + let totalOrphans = 0; + const breakdown: string[] = []; + const cleanupSql: string[] = []; + const errors: string[] = []; + for (const { table, col, allowNull } of targets) { + try { + // NOT IN subquery is portable across postgres + PGLite. The `pages.id` + // subquery covers every existing parent row. + const nullFilter = allowNull ? `${col} IS NOT NULL AND ` : ''; + const rows = await engine.executeRaw<{ n: string | number }>( + `SELECT COUNT(*)::int AS n FROM ${table} WHERE ${nullFilter}${col} NOT IN (SELECT id FROM pages)`, + ); + const n = Number(rows[0]?.n ?? 0); + if (n > 0) { + totalOrphans += n; + breakdown.push(`${table}.${col}=${n}`); + cleanupSql.push( + `DELETE FROM ${table} WHERE ${nullFilter}${col} NOT IN (SELECT id FROM pages);`, + ); + } + } catch (e) { + // Table or column may not exist on older schemas — skip and continue. + // Aggregate the errors so doctor surfaces "could not check N tables" + // when a real failure shape appears (network, lock, syntax). + const msg = e instanceof Error ? e.message : String(e); + errors.push(`${table}.${col}: ${msg.slice(0, 80)}`); + } + } + if (totalOrphans === 0 && errors.length === 0) { + return { + name: 'child_table_orphans', + status: 'ok', + message: 'All FK-child tables clean (10 tables checked)', + }; + } + if (totalOrphans === 0 && errors.length > 0) { + return { + name: 'child_table_orphans', + status: 'warn', + message: `Could not check ${errors.length}/10 FK-child tables (older schema or transient error): ${errors.slice(0, 3).join('; ')}`, + }; + } + return { + name: 'child_table_orphans', + status: 'warn', + message: + `${totalOrphans} orphan row(s) in FK-child tables (${breakdown.join(', ')}). ` + + `Cleanup: ${cleanupSql.join(' ')}`, + }; +} + export async function doctorReportRemote(engine: BrainEngine): Promise { const checks: Check[] = []; @@ -2001,6 +2132,15 @@ export async function runDoctor(engine: BrainEngine | null, args: string[], dbSo progress.heartbeat('takes_weight_grid'); checks.push(await takesWeightGridCheck(engine)); + // 10c. Child-table orphan detection (closes #1063). + // The autopilot `orphans` phase scans for orphan pages (no inbound links) + // but does NOT detect orphan rows in FK-child tables. After a bulk page + // delete, child rows can persist if cascade didn't fire (pre-FK rows, + // race during bulk cascade, code path that bypassed cascade). This + // surfaces them with paste-ready cleanup SQL. + progress.heartbeat('child_table_orphans'); + checks.push(await childTableOrphansCheck(engine)); + // v0.33: whoknows_health — fixture presence + row count. The eval // gate itself runs via `gbrain eval whoknows`; this check is the // "did you do the assignment?" signal. diff --git a/src/commands/frontmatter.ts b/src/commands/frontmatter.ts index 990628092..ae825c774 100644 --- a/src/commands/frontmatter.ts +++ b/src/commands/frontmatter.ts @@ -3,8 +3,9 @@ * * Subcommands: * gbrain frontmatter validate [--json] [--fix] [--dry-run] - * Validate one file or recursively a directory. --fix writes .bak then - * rewrites in place. --dry-run previews without writing. + * Validate one file or recursively a directory. --fix writes centralized + * backups under ~/.gbrain/backups/frontmatter/... then rewrites in place. + * --dry-run previews without writing. * * gbrain frontmatter audit [--source ] [--json] * Read-only scan across all registered sources (or one with --source). @@ -14,7 +15,7 @@ * validate. Pass an explicit path to validate a non-source-registered tree. */ -import { readFileSync, writeFileSync, existsSync, lstatSync, readdirSync, copyFileSync } from 'fs'; +import { readFileSync, writeFileSync, existsSync, lstatSync, readdirSync } from 'fs'; import { join, relative, resolve } from 'path'; import type { BrainEngine } from '../core/engine.ts'; import { loadConfig, toEngineConfig } from '../core/config.ts'; @@ -22,6 +23,8 @@ import { createEngine } from '../core/engine-factory.ts'; import { parseMarkdown, type ParseValidationCode } from '../core/markdown.ts'; import { autoFixFrontmatter, + createFrontmatterBackup, + makeFrontmatterBackupRunId, scanBrainSources, type AuditReport, type AuditFix, @@ -79,7 +82,7 @@ function printHelp() { Usage: gbrain frontmatter validate [--json] [--fix] [--dry-run] - gbrain frontmatter generate [--fix] [--dry-run] [--json] + gbrain frontmatter generate [--fix] [--dry-run] [--json] [--include-catch-all] gbrain frontmatter audit [--source ] [--json] gbrain frontmatter install-hook [--source ] [--force] [--uninstall] @@ -90,9 +93,10 @@ validate NULL_BYTES, NESTED_QUOTES, EMPTY_FRONTMATTER --fix Auto-repair the fixable subset (NULL_BYTES, MISSING_CLOSE, - NESTED_QUOTES, SLUG_MISMATCH). Writes .bak before any - in-place rewrite. .bak is the safety contract; works for both - git and non-git brain repos. + NESTED_QUOTES, SLUG_MISMATCH). Writes a backup under + ~/.gbrain/backups/frontmatter/... before any in-place rewrite. + Backups work for both git and non-git brain repos without + littering the source tree. --dry-run Preview --fix without writing. --json Emit a JSON envelope on stdout. @@ -102,7 +106,10 @@ generate the filesystem path and file content. Zero LLM calls, fully deterministic. Without --fix: dry-run preview showing what would be generated. - With --fix: writes frontmatter to files (with .bak safety backups). + With --fix: writes frontmatter to files with centralized safety backups. + Unknown/catch-all files are skipped by default so GBrain does not stamp + meaningless "type: note" metadata onto arbitrary workspace documents. Pass + --include-catch-all to opt into the legacy catch-all note behavior. Rules are defined in src/core/frontmatter-inference.ts DIRECTORY_RULES. Add new directory conventions by adding rules to the table. @@ -112,9 +119,12 @@ generate gbrain frontmatter generate /path/to/brain --fix # write all gbrain frontmatter generate /path/to/brain/people/ --fix # just people/ - --fix Write generated frontmatter to files (.bak safety backups). + --fix Write generated frontmatter to files with centralized backups. --dry-run Preview without writing (default when --fix is omitted). --json Emit JSON output. + --include-catch-all + Also write the default catch-all rule ("type: note") for paths + that do not match a more specific directory rule. audit Read-only scan across all registered sources (or one with --source ). @@ -140,6 +150,7 @@ interface FileValidation { path: string; errors: { code: ParseValidationCode; message: string; line?: number }[]; fixesApplied?: AuditFix[]; + backupPath?: string; } async function runValidate(rest: string[]): Promise { @@ -166,6 +177,7 @@ async function runValidate(rest: string[]): Promise { const files = collectFiles(resolved); const results: FileValidation[] = []; + const backupRunId = makeFrontmatterBackupRunId(); for (const file of files) { const content = readFileSync(file, 'utf8'); @@ -181,7 +193,7 @@ async function runValidate(rest: string[]): Promise { const { content: fixed, fixes } = autoFixFrontmatter(content, { filePath: file }); result.fixesApplied = fixes; if (fixes.length > 0 && !flags.dryRun) { - copyFileSync(file, file + '.bak'); + result.backupPath = createFrontmatterBackup(file, { sourcePath: resolved, runId: backupRunId }); writeFileSync(file, fixed, 'utf8'); } } @@ -225,7 +237,7 @@ async function runValidate(rest: string[]): Promise { } } if (flags.fix && !flags.dryRun) { - console.log(`\nWrote .bak backups for ${filesFixed} file(s).`); + console.log(`\nWrote centralized backups for ${filesFixed} file(s) under ~/.gbrain/backups/frontmatter/.`); } } } @@ -299,7 +311,10 @@ function printAuditHumanReport(report: AuditReport): void { console.log('No registered sources to audit. Run `gbrain sources list` to inspect.'); return; } - console.log(`Frontmatter audit — ${report.total} issue(s) across ${report.per_source.length} source(s) (scanned at ${report.scanned_at})`); + console.log(`Frontmatter audit — ${report.total} malformed issue(s) across ${report.per_source.length} source(s) (scanned at ${report.scanned_at})`); + if (report.ignored_missing_open) { + console.log(`Missing frontmatter ignored: ${report.ignored_missing_open} file(s). Use \`frontmatter validate\` for strict per-file checks or \`frontmatter generate\` to add meaningful metadata.`); + } for (const src of report.per_source) { console.log(`\n[${src.source_id}] ${src.source_path}`); if (src.total === 0) { @@ -332,6 +347,7 @@ async function runGenerate(args: string[]): Promise { const doFix = args.includes('--fix'); const dryRun = args.includes('--dry-run'); const jsonOut = args.includes('--json'); + const includeCatchAll = args.includes('--include-catch-all') || args.includes('--allow-catch-all'); if (!targetPath) { console.error('error: gbrain frontmatter generate requires a argument'); @@ -342,7 +358,7 @@ async function runGenerate(args: string[]): Promise { const { inferFrontmatter, serializeFrontmatter } = await import('../core/frontmatter-inference.ts'); const { resolve, relative, join, basename } = await import('path'); - const { readFileSync, writeFileSync, copyFileSync, statSync, readdirSync, lstatSync } = await import('fs'); + const { readFileSync, writeFileSync, statSync, lstatSync } = await import('fs'); const rootPath = resolve(targetPath); const isDir = statSync(rootPath).isDirectory(); @@ -376,12 +392,14 @@ async function runGenerate(args: string[]): Promise { const results: GenerateResult[] = []; let scanned = 0; let skipped = 0; + let skippedCatchAll = 0; let generated = 0; let written = 0; + const backupRunId = makeFrontmatterBackupRunId(); function processFile(absPath: string, relPath: string) { scanned++; - if (!absPath.endsWith('.md')) return; + if (!isSyncable(relPath, { strategy: 'markdown' })) return; // Skip symlinks try { if (lstatSync(absPath).isSymbolicLink()) return; } catch { return; } @@ -394,6 +412,10 @@ async function runGenerate(args: string[]): Promise { skipped++; return; } + if (!includeCatchAll && inferred.matchedRule === '(default)') { + skippedCatchAll++; + return; + } generated++; results.push({ @@ -407,32 +429,17 @@ async function runGenerate(args: string[]): Promise { if (doFix && !dryRun) { const fm = serializeFrontmatter(inferred); const newContent = fm + '\n' + content; - // Safety: write .bak first - copyFileSync(absPath, absPath + '.bak'); + // Safety: write a centralized backup first. + createFrontmatterBackup(absPath, { sourcePath: brainRoot, runId: backupRunId }); writeFileSync(absPath, newContent, 'utf-8'); written++; } } - function walkDir(dir: string, rootForRel: string) { - let entries: string[]; - try { entries = readdirSync(dir); } catch { return; } - for (const entry of entries) { - if (entry === '.git' || entry === 'node_modules' || entry === '.obsidian') continue; - const abs = join(dir, entry); - try { - const stat = statSync(abs); - if (stat.isDirectory()) { - walkDir(abs, rootForRel); - } else if (stat.isFile() && entry.endsWith('.md')) { - processFile(abs, relative(rootForRel, abs)); - } - } catch { /* skip unreadable */ } - } - } - if (isDir) { - walkDir(rootPath, brainRoot); + for (const absPath of collectFiles(rootPath)) { + processFile(absPath, relative(brainRoot, absPath)); + } } else { const relPath = relative(brainRoot, rootPath) || basename(rootPath); processFile(rootPath, relPath); @@ -443,6 +450,7 @@ async function runGenerate(args: string[]): Promise { console.log(JSON.stringify({ scanned, skipped, + skippedCatchAll, generated, written, dryRun: !doFix || dryRun, @@ -457,9 +465,12 @@ async function runGenerate(args: string[]): Promise { console.log(`\nFrontmatter generation (${mode})`); console.log(` Scanned: ${scanned} files`); console.log(` Already have frontmatter: ${skipped}`); + if (skippedCatchAll > 0) { + console.log(` Skipped catch-all/unknown: ${skippedCatchAll} (pass --include-catch-all to write type: note)`); + } console.log(` Would generate: ${generated}`); if (doFix && !dryRun) { - console.log(` Written: ${written} (with .bak backups)`); + console.log(` Written: ${written} (with centralized backups)`); } // Show sample by type diff --git a/src/commands/migrations/v0_11_0.ts b/src/commands/migrations/v0_11_0.ts index cf5563cf7..e9c3ee685 100644 --- a/src/commands/migrations/v0_11_0.ts +++ b/src/commands/migrations/v0_11_0.ts @@ -58,9 +58,30 @@ export interface PendingHostWorkEntry { // Phase A — Schema // ----------------------------------------------------------------------- -function phaseASchema(opts: OrchestratorOpts): OrchestratorPhaseResult { +async function phaseASchema(opts: OrchestratorOpts): Promise { if (opts.dryRun) return { name: 'schema', status: 'skipped', detail: 'dry-run' }; try { + // v0.36.x #1100: route PGLite through an in-process schema apply rather + // than `execSync('gbrain init --migrate-only')`. The subprocess inherits + // HOME and tries to acquire the same file lock the parent process is + // holding (or briefly released and the on-disk artifact has not finished + // settling), which deadlocks until the 30s lock timeout fires. The + // structural fix is to not spawn a subprocess for work the parent can + // do directly — Postgres tolerates concurrent connections, so the + // legacy execSync path stays for Postgres callers. + const { loadConfig, toEngineConfig } = await import('../../core/config.ts'); + const cfg = loadConfig(); + if (cfg?.engine === 'pglite') { + const { createEngine } = await import('../../core/engine-factory.ts'); + const eng = await createEngine(toEngineConfig(cfg)); + try { + await eng.connect(toEngineConfig(cfg)); + await eng.initSchema(); + } finally { + try { await eng.disconnect(); } catch { /* best-effort */ } + } + return { name: 'schema', status: 'complete' }; + } execSync('gbrain init --migrate-only' + childGlobalFlags(), { stdio: 'inherit', timeout: 60_000, env: process.env }); return { name: 'schema', status: 'complete' }; } catch (e) { @@ -413,7 +434,7 @@ function phaseFInstall(opts: OrchestratorOpts): OrchestratorPhaseResult { async function orchestrator(opts: OrchestratorOpts): Promise { const phases: OrchestratorPhaseResult[] = []; - const a = phaseASchema(opts); + const a = await phaseASchema(opts); phases.push(a); if (a.status === 'failed') { console.error(`Phase A (schema) failed: ${a.detail}. Aborting; re-run after fixing.`); diff --git a/src/commands/migrations/v0_22_4.ts b/src/commands/migrations/v0_22_4.ts index 0291ac946..e05b29f19 100644 --- a/src/commands/migrations/v0_22_4.ts +++ b/src/commands/migrations/v0_22_4.ts @@ -10,7 +10,9 @@ * one entry per source-with-issues to ~/.gbrain/migrations/pending-host-work.jsonl. * It NEVER mutates brain content. The agent reads skills/migrations/v0.22.4.md * after upgrade and runs `gbrain frontmatter validate --fix` with - * explicit user consent. + * explicit user consent. Missing frontmatter is not queued by default; plain + * Markdown imports are valid, and generated frontmatter should add real signal + * rather than blanket `type: note`. * * Phases (all idempotent): * A. Schema — no-op (no DB changes in v0.22.4). @@ -212,8 +214,9 @@ export const v0_22_4: Migration = { 'pre-commit hook helper, and a new frontmatter-guard skill. The migration is audit-only ' + '(it never mutates your brain) — it scans every registered source, writes a per-source ' + 'report to ~/.gbrain/migrations/v0.22.4-audit.json, and queues a TODO with the exact fix ' + - 'command. Run `gbrain frontmatter validate --fix` to repair (creates .bak ' + - 'backups). Resolves all 7 check-resolvable warnings on master; ships frontmatter-guard.', + 'command for malformed frontmatter only. Missing frontmatter is treated as optional metadata ' + + 'coverage for broad document sources. Run `gbrain frontmatter validate --fix` ' + + 'to repair (creates centralized backups under ~/.gbrain/backups/frontmatter). Ships frontmatter-guard.', }, orchestrator, }; diff --git a/src/commands/serve-http.ts b/src/commands/serve-http.ts index 1aaa17748..cd61f82be 100644 --- a/src/commands/serve-http.ts +++ b/src/commands/serve-http.ts @@ -44,6 +44,44 @@ import { sqlQueryForEngine, executeRawJsonb } from '../core/sql-query.ts'; */ export const HEALTH_TIMEOUT_MS = 3000; +/** + * v0.36.1.x #1024: bootstrap token resolution. + * + * Pure helper (no side effects, no process.exit) so the rule is unit-testable. + * Two outcomes: + * - `ok`: caller proceeds with `{token, fromEnv}`. When the env value is + * undefined, a fresh 32-byte hex token is generated. + * - `error`: caller refuses to start. We require 32+ chars matching + * `[A-Za-z0-9_-]+` for env-supplied tokens — fail-closed beats silently + * accepting a weak admin secret. + * + * `randomBytesHex` is parameterized so tests can inject a deterministic + * fallback without monkey-patching `crypto.randomBytes`. + */ +export type BootstrapTokenResolution = + | { kind: 'ok'; token: string; fromEnv: boolean } + | { kind: 'error'; message: string }; + +export function resolveBootstrapToken( + envValue: string | undefined, + randomBytesHex: () => string = () => randomBytes(32).toString('hex'), +): BootstrapTokenResolution { + if (envValue === undefined) { + return { kind: 'ok', token: randomBytesHex(), fromEnv: false }; + } + const trimmed = envValue.trim(); + if (!/^[A-Za-z0-9_-]{32,}$/.test(trimmed)) { + return { + kind: 'error', + message: + 'GBRAIN_ADMIN_BOOTSTRAP_TOKEN must be at least 32 chars and match [A-Za-z0-9_-]+.\n' + + ' Refusing to start with a weak admin bootstrap token. Generate one with:\n' + + ' head -c 32 /dev/urandom | base64 | tr -d "+/=" | head -c 48', + }; + } + return { kind: 'ok', token: trimmed, fromEnv: true }; +} + export type ProbeHealthResult = | { ok: true; status: 200; body: { status: 'ok'; version: string; engine: string; [k: string]: unknown } } | { ok: false; status: 503; body: { error: 'service_unavailable'; error_description: string } }; @@ -176,6 +214,15 @@ interface ServeHttpOptions { * is almost always a misconfiguration. */ bind?: string; + /** + * v0.36.x #1024: suppress the printed admin bootstrap token line on + * startup. Combined with `GBRAIN_ADMIN_BOOTSTRAP_TOKEN`, lets long-lived + * production deployments avoid leaking the token into log aggregators on + * every supervisor-managed restart. When the env var is NOT set, this + * flag still suppresses the print — operators take responsibility for + * tracking the regenerated value through other means. + */ + suppressBootstrapToken?: boolean; } export async function runServeHttp(engine: BrainEngine, options: ServeHttpOptions) { @@ -227,9 +274,23 @@ export async function runServeHttp(engine: BrainEngine, options: ServeHttpOption console.error('Token sweep failed (non-blocking):', e instanceof Error ? e.message : e); } - // Generate bootstrap token for admin dashboard - const bootstrapToken = randomBytes(32).toString('hex'); + // v0.36.x #1024: bootstrap token sourcing. + // + // Default: regenerate per process start, print to stderr so the operator + // can paste into /admin login. Stable across restarts only when env var + // is set. The env override must be a strong secret — `[A-Za-z0-9_-]{32+}` + // — otherwise refuse to start. Logging the bootstrap-token value every + // restart is the original gripe; with `GBRAIN_ADMIN_BOOTSTRAP_TOKEN` set + // and `--suppress-bootstrap-token`, no value reaches the log. + const resolved = resolveBootstrapToken(process.env.GBRAIN_ADMIN_BOOTSTRAP_TOKEN); + if (resolved.kind === 'error') { + console.error(resolved.message); + process.exit(1); + } + let bootstrapToken: string = resolved.token; + let bootstrapFromEnv: boolean = resolved.fromEnv; const bootstrapHash = createHash('sha256').update(bootstrapToken).digest('hex'); + const suppressBootstrapPrint = options.suppressBootstrapToken === true; const adminSessions = new Map(); // sessionId → expiresAt // SSE clients for live activity feed @@ -859,11 +920,22 @@ export async function runServeHttp(engine: BrainEngine, options: ServeHttpOption // Register client from admin dashboard app.post('/admin/api/register-client', requireAdmin, express.json(), async (req: Request, res: Response) => { try { - const { name, scopes, tokenTtl } = req.body; + const { name, scopes, tokenTtl, grantTypes, redirectUris, tokenEndpointAuthMethod } = req.body; if (!name) { res.status(400).json({ error: 'Name required' }); return; } + const grants = Array.isArray(grantTypes) && grantTypes.length > 0 ? grantTypes : ['client_credentials']; + const uris = Array.isArray(redirectUris) ? redirectUris : []; const result = await oauthProvider.registerClientManual( - name, ['client_credentials'], scopes || 'read', [], + name, grants, scopes || 'read', uris, ); + // Public client (PKCE-only, no secret): NULL out client_secret_hash and + // set auth method so the SDK's clientAuth middleware skips the hash-vs- + // plaintext comparison that would otherwise reject the request. This is + // the supported pattern for browser-based OAuth (e.g. claude.ai's + // Custom Connector flow, which uses authorization_code + PKCE). + if (tokenEndpointAuthMethod === 'none') { + await sql`UPDATE oauth_clients SET client_secret_hash = NULL, token_endpoint_auth_method = 'none' WHERE client_id = ${result.clientId}`; + delete (result as any).clientSecret; + } // Set per-client TTL if specified if (tokenTtl && Number(tokenTtl) > 0) { await sql`UPDATE oauth_clients SET token_ttl = ${Number(tokenTtl)} WHERE client_id = ${result.clientId}`; @@ -916,22 +988,64 @@ export async function runServeHttp(engine: BrainEngine, options: ServeHttpOption }); // --------------------------------------------------------------------------- - // Admin SPA static files + // Admin SPA static files (v0.36.x #1090) // --------------------------------------------------------------------------- - // Serve from admin/dist if it exists (development), otherwise embedded assets + // Two-tier resolution: + // 1. Dev path — admin/dist next to cwd. Vite rebuilds land here first, + // so devs hacking on the SPA see changes without re-running + // build-admin-embedded. + // 2. Binary path — `src/admin-embedded.ts` exports `ADMIN_ASSETS`, a + // manifest of request-path → resolved-path keyed by every file in + // admin/dist at generation time. Bun's `with { type: 'file' }` ESM + // imports resolve correctly inside the compiled binary, so a + // globally-installed `gbrain serve --http` actually serves /admin + // instead of 404. Pre-fix the cwd-relative path was the ONLY + // resolution path, and every fresh install of the compiled binary + // hit 404 on /admin (issue #1090). const path = await import('path'); const fs = await import('fs'); const adminDistPath = path.join(process.cwd(), 'admin', 'dist'); - if (fs.existsSync(adminDistPath)) { + const useDevPath = fs.existsSync(adminDistPath); + if (useDevPath) { app.use('/admin', express.static(adminDistPath)); - // SPA fallback: serve index.html for all unmatched /admin/* routes app.get('/admin/{*path}', (req: Request, res: Response, next: NextFunction) => { - // Skip API and events routes if (req.path.startsWith('/admin/api/') || req.path === '/admin/events' || req.path === '/admin/login') { return next(); } res.sendFile(path.join(adminDistPath, 'index.html')); }); + } else { + // Embedded path. Read assets from the generated manifest. Cache the + // bytes per asset on first request — these never change for a given + // binary, so subsequent requests skip the fs read. + const { ADMIN_ASSETS, ADMIN_INDEX_HTML } = await import('../admin-embedded.ts'); + const cache = new Map(); + function loadAsset(asset: { path: string }): Buffer { + const hit = cache.get(asset.path); + if (hit) return hit; + const buf = fs.readFileSync(asset.path); + cache.set(asset.path, buf); + return buf; + } + app.get('/admin/{*path}', (req: Request, res: Response, next: NextFunction) => { + if (req.path.startsWith('/admin/api/') || req.path === '/admin/events' || req.path === '/admin/login') { + return next(); + } + const hit = ADMIN_ASSETS[req.path]; + if (hit) { + res.setHeader('Content-Type', hit.mime); + res.send(loadAsset(hit)); + return; + } + // SPA fallback — every unmatched /admin/* route resolves to index.html + // so client-side routing takes over (login, dashboard, agents, ...). + if (ADMIN_INDEX_HTML) { + res.setHeader('Content-Type', ADMIN_INDEX_HTML.mime); + res.send(loadAsset(ADMIN_INDEX_HTML)); + return; + } + res.status(404).send('admin SPA not available'); + }); } // --------------------------------------------------------------------------- @@ -939,6 +1053,17 @@ export async function runServeHttp(engine: BrainEngine, options: ServeHttpOption // --------------------------------------------------------------------------- const mcpOperations = operations.filter(op => !op.localOnly); + // v0.36.x #1076: MCP Streamable HTTP spec — GET /mcp opens an optional SSE + // backchannel for server-initiated messages. gbrain's transport is stateless + // and doesn't push server-initiated messages, so per spec we MUST return 405 + // (not 404) so probing clients (claude.ai, etc.) recognize this as an MCP + // endpoint, not a missing route. Without this, clients display "endpoint not + // found" instead of "endpoint exists but no SSE channel." + app.get('/mcp', (_req: Request, res: Response) => { + res.set('Allow', 'POST, DELETE'); + res.status(405).json({ jsonrpc: '2.0', error: { code: -32000, message: 'Method not allowed' }, id: null }); + }); + app.post('/mcp', requireBearerAuth({ verifier: oauthProvider }), async (req: Request, res: Response) => { const startTime = Date.now(); const authInfo = (req as any).auth as AuthInfo; @@ -1247,10 +1372,11 @@ export async function runServeHttp(engine: BrainEngine, options: ServeHttpOption ║ MCP: http://localhost:${port}/mcp${' '.repeat(Math.max(0, 21 - String(port).length))}║ ║ Health: http://localhost:${port}/health${' '.repeat(Math.max(0, 18 - String(port).length))}║ ╠══════════════════════════════════════════════════════╣ -║ Admin Token (paste into /admin login): ║ -║ ${bootstrapToken.substring(0, 50)} ║ -║ ${bootstrapToken.substring(50).padEnd(50)} ║ -╚══════════════════════════════════════════════════════╝ +${suppressBootstrapPrint + ? '║ Admin Token: suppressed (--suppress-bootstrap-token) ║\n╚══════════════════════════════════════════════════════╝' + : bootstrapFromEnv + ? '║ Admin Token: from $GBRAIN_ADMIN_BOOTSTRAP_TOKEN ║\n╚══════════════════════════════════════════════════════╝' + : `║ Admin Token (paste into /admin login): ║\n║ ${bootstrapToken.substring(0, 50)} ║\n║ ${bootstrapToken.substring(50).padEnd(50)} ║\n╚══════════════════════════════════════════════════════╝`} `); }); } diff --git a/src/commands/serve.ts b/src/commands/serve.ts index 3fa1362e4..058c859e7 100644 --- a/src/commands/serve.ts +++ b/src/commands/serve.ts @@ -107,8 +107,14 @@ export async function runServe( const bindIdx = args.indexOf('--bind'); const bind = bindIdx >= 0 ? args[bindIdx + 1] : undefined; + // v0.36.x #1024: suppress the printed admin bootstrap token. Pair with + // GBRAIN_ADMIN_BOOTSTRAP_TOKEN for production deployments that don't + // want the value leaking into log aggregators on every supervisor + // restart. + const suppressBootstrapToken = args.includes('--suppress-bootstrap-token'); + const { runServeHttp } = await import('./serve-http.ts'); - await runServeHttp(engine, { port, tokenTtl, enableDcr, publicUrl, logFullParams, bind }); + await runServeHttp(engine, { port, tokenTtl, enableDcr, publicUrl, logFullParams, bind, suppressBootstrapToken }); return; } diff --git a/src/commands/sync.ts b/src/commands/sync.ts index 280f3f7a7..02976b0ad 100644 --- a/src/commands/sync.ts +++ b/src/commands/sync.ts @@ -212,6 +212,10 @@ export function buildGitInvocation(repoPath: string, args: string[], configs: st return [...cfg, '-C', repoPath, ...args]; } +export function buildAutoEmbedArgs(slugs: string[], sourceId?: string): string[] { + return sourceId ? ['--source', sourceId, '--slugs', ...slugs] : ['--slugs', ...slugs]; +} + /** * Shell out to git with a generous maxBuffer. * @@ -230,6 +234,19 @@ function git(repoPath: string, args: string[], configs: string[] = []): string { }).trim(); } +function hasOriginRemote(repoPath: string): boolean { + try { + execFileSync('git', buildGitInvocation(repoPath, ['remote', 'get-url', 'origin']), { + encoding: 'utf-8', + timeout: 30000, + stdio: ['ignore', 'ignore', 'ignore'], + }); + return true; + } catch { + return false; + } +} + function isDetachedHead(repoPath: string): boolean { try { git(repoPath, ['symbolic-ref', '--quiet', 'HEAD']); @@ -450,7 +467,12 @@ async function performSyncInner(engine: BrainEngine, opts: SyncOpts): Promise 0 && pagesAffected.length <= 100) { try { const { runEmbed } = await import('./embed.ts'); - await runEmbed(engine, ['--slugs', ...pagesAffected]); + await runEmbed(engine, buildAutoEmbedArgs(pagesAffected, opts.sourceId)); // Before commit 2 lands: runEmbed is void. Best estimate is pagesAffected, // since runEmbed re-embeds every requested slug. Commit 2 sharpens this // with EmbedResult.embedded. diff --git a/src/commands/upgrade.ts b/src/commands/upgrade.ts index 44bc7cb5b..2e1d4f018 100644 --- a/src/commands/upgrade.ts +++ b/src/commands/upgrade.ts @@ -1,6 +1,6 @@ import { execSync, execFileSync } from 'child_process'; import { existsSync, readFileSync, writeFileSync, mkdirSync, appendFileSync, realpathSync } from 'fs'; -import { join, dirname, resolve } from 'path'; +import { basename, join, dirname, resolve } from 'path'; import { VERSION } from '../version.ts'; const GBRAIN_GITHUB_REPO = 'garrytan/gbrain'; @@ -37,15 +37,18 @@ export async function runUpgrade(args: string[]) { break; } - case 'bun': + case 'bun': { console.log('Upgrading via bun...'); + const bunGlobalRoot = resolveBunGlobalRoot(); try { - execSync('bun update gbrain', { stdio: 'inherit', timeout: 120_000 }); + execFileSync('bun', ['update', 'gbrain'], { cwd: bunGlobalRoot, stdio: 'inherit', timeout: 120_000 }); upgraded = true; } catch { - console.error('Upgrade failed. Try running manually: bun update gbrain'); + console.error('Upgrade failed. Try running manually:'); + console.error(` cd ${bunGlobalRoot} && bun update gbrain`); } break; + } case 'binary': console.log('Binary self-update not yet implemented.'); @@ -108,6 +111,46 @@ export async function runUpgrade(args: string[]) { } } +export function resolveBunGlobalRoot(): string { + const bunInstall = process.env.BUN_INSTALL; + if (bunInstall) { + return join(bunInstall, 'install', 'global'); + } + + const defaultRoot = join(process.env.HOME || '', '.bun', 'install', 'global'); + if (isBunGlobalRoot(defaultRoot)) { + return defaultRoot; + } + + const installRoot = findBunInstallRootFromArgv(); + return installRoot ?? defaultRoot; +} + +function isBunGlobalRoot(dir: string): boolean { + return existsSync(join(dir, 'package.json')) && existsSync(join(dir, 'node_modules')); +} + +function findBunInstallRootFromArgv(): string | null { + try { + const argv1 = process.argv[1]; + if (!argv1) return null; + + let dir = dirname(realpathSync(argv1)); + for (let i = 0; i < 10; i++) { + if (basename(dir) === 'gbrain' && basename(dirname(dir)) === 'node_modules') { + const root = dirname(dirname(dir)); + if (isBunGlobalRoot(root)) return root; + } + const parent = dirname(dir); + if (parent === dir) break; + dir = parent; + } + return null; + } catch { + return null; + } +} + function verifyUpgrade(): string { try { const output = execSync('gbrain --version', { encoding: 'utf-8', timeout: 10_000 }).trim(); diff --git a/src/core/ai/gateway.ts b/src/core/ai/gateway.ts index e65b08948..b1cc35299 100644 --- a/src/core/ai/gateway.ts +++ b/src/core/ai/gateway.ts @@ -398,18 +398,27 @@ function prefixWithProviderFrom(original: string, bare: string): string { const _warnedRecipes = new Set(); /** - * Walk every registered recipe with an `embedding` touchpoint. Each one - * missing `max_batch_tokens` gets exactly one stderr line per process for - * its first appearance. Recipes WITH the field stay quiet. The + * Walk the configured embedding recipes. Each one missing `max_batch_tokens` + * gets exactly one stderr line per process for its first appearance. Recipes + * WITH the field stay quiet. The * recursive-halving safety net only fires when `max_batch_tokens` is set, * so a recipe that forgets it has no protection if the provider has a * batch cap. Loud-fail over silent-skip per CLAUDE.md; a future * Cohere/Mistral/Jina recipe that inherits the embedding-touchpoint * pattern but forgets the cap re-creates the v0.27 Voyage backfill loop. - * The warning calls that out before production traffic hits it. + * The warning calls that out before production traffic hits it, while avoiding + * unrelated startup noise from recipes the current brain is not using. */ function warnRecipesMissingBatchTokens(): void { + const configuredProviderIds = new Set(); + for (const model of [_config?.embedding_model, _config?.embedding_multimodal_model]) { + if (!model) continue; + const providerId = model.split(':')[0]; + if (providerId) configuredProviderIds.add(providerId); + } + for (const recipe of listRecipes()) { + if (!configuredProviderIds.has(recipe.id)) continue; const embedding = recipe.touchpoints?.embedding; if (!embedding || embedding.max_batch_tokens !== undefined) continue; // OpenAI is the canonical "no cap declared, fast path is intentional" @@ -1190,14 +1199,22 @@ async function embedSubBatch( ...(opts?.maxRetries !== undefined && { maxRetries: opts.maxRetries }), }); - const first = result.embeddings?.[0]; - if (first && Array.isArray(first) && first.length !== expectedDims) { + if (!Array.isArray(result.embeddings) || result.embeddings.length !== texts.length) { throw new AIConfigError( - `Embedding dim mismatch: model ${modelId} returned ${first.length} but schema expects ${expectedDims}.`, - `Run \`gbrain migrate --embedding-model ${getEmbeddingModel()} --embedding-dimensions ${first.length}\` or change models.`, + `Embedding provider returned ${result.embeddings?.length ?? 0} embedding(s) for ${texts.length} input(s).`, + `Retry the import after checking provider health; partial embedding responses are not safe to index.`, ); } + for (const embedding of result.embeddings) { + if (Array.isArray(embedding) && embedding.length !== expectedDims) { + throw new AIConfigError( + `Embedding dim mismatch: model ${modelId} returned ${embedding.length} but schema expects ${expectedDims}.`, + `Run \`gbrain migrate --embedding-model ${getEmbeddingModel()} --embedding-dimensions ${embedding.length}\` or change models.`, + ); + } + } + recordSubBatchSuccess(recipe); return result.embeddings.map((e: number[]) => new Float32Array(e)); } catch (err) { diff --git a/src/core/brain-writer.ts b/src/core/brain-writer.ts index 733ca4661..6a25eb510 100644 --- a/src/core/brain-writer.ts +++ b/src/core/brain-writer.ts @@ -9,15 +9,19 @@ * validation stack. * * Path-guard contract: writeBrainPage refuses to write outside the source - * path. .bak backups are the safety contract (works for both git and non-git - * brain repos; the existing src/core/dry-fix.ts:getWorkingTreeStatus rejects - * non-git repos as unsafe, which is the wrong shape for brain rewrites). + * path. Pre-write backups are the safety contract (works for both git and + * non-git brain repos; the existing src/core/dry-fix.ts:getWorkingTreeStatus + * rejects non-git repos as unsafe, which is the wrong shape for brain + * rewrites). Backups live under ~/.gbrain/backups/frontmatter/... instead of + * beside source files so bulk repair never litters the user's workspace. */ -import { existsSync, readFileSync, readdirSync, statSync, copyFileSync, writeFileSync, mkdirSync, lstatSync } from 'fs'; -import { join, relative, resolve, dirname } from 'path'; +import { createHash } from 'crypto'; +import { existsSync, readFileSync, readdirSync, copyFileSync, writeFileSync, mkdirSync, lstatSync } from 'fs'; +import { join, relative, resolve, dirname, basename, isAbsolute } from 'path'; import type { BrainEngine } from './engine.ts'; import type { ProgressReporter } from './progress.ts'; +import { gbrainPath } from './config.ts'; import { parseMarkdown, type ParseValidationCode, @@ -38,6 +42,7 @@ export interface PerSourceReport { total: number; errors_by_code: Partial>; sample: { path: string; codes: ParseValidationCode[] }[]; + ignoredMissingOpen: number; } export interface AuditReport { @@ -46,10 +51,45 @@ export interface AuditReport { errors_by_code: Partial>; per_source: PerSourceReport[]; scanned_at: string; + ignored_missing_open?: number; } const SAMPLE_PER_SOURCE = 20; +// --------------------------------------------------------------------------- +// Frontmatter backups +// --------------------------------------------------------------------------- + +export function makeFrontmatterBackupRunId(date = new Date()): string { + return date.toISOString().replace(/[:.]/g, '-'); +} + +export interface FrontmatterBackupOpts { + sourcePath?: string; + backupRoot?: string; + runId?: string; +} + +function sourceKey(sourcePath: string): string { + return createHash('sha256').update(resolve(sourcePath)).digest('hex').slice(0, 12); +} + +export function defaultFrontmatterBackupRoot(runId = makeFrontmatterBackupRunId()): string { + return gbrainPath('backups', 'frontmatter', runId); +} + +export function createFrontmatterBackup(filePath: string, opts: FrontmatterBackupOpts = {}): string { + const resolvedFile = resolve(filePath); + const resolvedSource = resolve(opts.sourcePath ?? dirname(resolvedFile)); + const rel = relative(resolvedSource, resolvedFile); + const safeRel = rel && !rel.startsWith('..') && !isAbsolute(rel) ? rel : basename(resolvedFile); + const root = opts.backupRoot ?? defaultFrontmatterBackupRoot(opts.runId); + const backupPath = join(root, sourceKey(resolvedSource), safeRel + '.bak'); + mkdirSync(dirname(backupPath), { recursive: true }); + copyFileSync(resolvedFile, backupPath); + return backupPath; +} + // --------------------------------------------------------------------------- // autoFixFrontmatter // --------------------------------------------------------------------------- @@ -178,7 +218,7 @@ export function autoFixFrontmatter( } // --------------------------------------------------------------------------- -// writeBrainPage — path-guarded write with .bak backup +// writeBrainPage — path-guarded write with centralized backup // --------------------------------------------------------------------------- export class BrainWriterError extends Error { @@ -193,15 +233,16 @@ export class BrainWriterError extends Error { } /** - * Path-guarded brain page writer. Always writes `.bak` before any - * in-place mutation (the contract that replaces git-tree-clean for non-git - * brain repos). Throws BrainWriterError if filePath is not under sourcePath. + * Path-guarded brain page writer. Always writes a backup under + * ~/.gbrain/backups/frontmatter/... before any in-place mutation (the contract + * that replaces git-tree-clean for non-git brain repos). Throws + * BrainWriterError if filePath is not under sourcePath. */ export function writeBrainPage( filePath: string, content: string, - opts: { sourcePath: string; autoFix?: boolean }, -): { fixes: AuditFix[] } { + opts: { sourcePath: string; autoFix?: boolean; backupRoot?: string; backupRunId?: string }, +): { fixes: AuditFix[]; backupPath?: string } { const resolvedSource = resolve(opts.sourcePath); const resolvedTarget = resolve(filePath); if (resolvedTarget !== resolvedSource && !resolvedTarget.startsWith(resolvedSource + '/')) { @@ -220,13 +261,18 @@ export function writeBrainPage( fixes = result.fixes; } + let backupPath: string | undefined; if (existsSync(filePath)) { - copyFileSync(filePath, filePath + '.bak'); + backupPath = createFrontmatterBackup(filePath, { + sourcePath: opts.sourcePath, + backupRoot: opts.backupRoot, + runId: opts.backupRunId, + }); } else { mkdirSync(dirname(filePath), { recursive: true }); } writeFileSync(filePath, toWrite, 'utf8'); - return { fixes }; + return { fixes, backupPath }; } // --------------------------------------------------------------------------- @@ -242,6 +288,10 @@ export interface ScanOpts { /** Limit scan to one source. When omitted, all registered sources with a * local_path are scanned. */ sourceId?: string; + /** Missing frontmatter is optional metadata coverage for broad document + * sources. Set true for curated page repos that require every file to carry + * YAML frontmatter. */ + strictMissingOpen?: boolean; onProgress?: ProgressReporter; signal?: AbortSignal; } @@ -254,6 +304,7 @@ export async function scanBrainSources( const totals: Partial> = {}; const perSource: PerSourceReport[] = []; let grandTotal = 0; + let ignoredMissingOpen = 0; for (const src of sources) { if (opts.signal?.aborted) break; @@ -267,12 +318,14 @@ export async function scanBrainSources( total: 0, errors_by_code: {}, sample: [], + ignoredMissingOpen: 0, }); continue; } const report = scanOneSource(src.id, src.local_path, opts); perSource.push(report); grandTotal += report.total; + ignoredMissingOpen += report.ignoredMissingOpen; for (const [code, n] of Object.entries(report.errors_by_code)) { const k = code as ParseValidationCode; totals[k] = (totals[k] ?? 0) + (n as number); @@ -285,6 +338,7 @@ export async function scanBrainSources( errors_by_code: totals, per_source: perSource, scanned_at: new Date().toISOString(), + ignored_missing_open: ignoredMissingOpen || undefined, }; } @@ -298,6 +352,7 @@ function scanOneSource( const rootResolved = resolve(sourcePath); let scanned = 0; let total = 0; + let ignoredMissingOpen = 0; walkDir(rootResolved, (absPath) => { if (opts.signal?.aborted) return false; @@ -312,7 +367,12 @@ function scanOneSource( } const expectedSlug = slugifyPath(relPath); const parsed = parseMarkdown(content, relPath, { validate: true, expectedSlug }); - const errs = parsed.errors ?? []; + const errs = (parsed.errors ?? []).filter((e) => { + if (e.code !== 'MISSING_OPEN') return true; + if (opts.strictMissingOpen) return true; + ignoredMissingOpen++; + return false; + }); if (errs.length > 0) { total += errs.length; const codes: ParseValidationCode[] = []; @@ -340,6 +400,7 @@ function scanOneSource( total, errors_by_code: errorsByCode, sample, + ignoredMissingOpen, }; } diff --git a/src/core/config.ts b/src/core/config.ts index 9b920c409..eabcdcf0c 100644 --- a/src/core/config.ts +++ b/src/core/config.ts @@ -1,5 +1,5 @@ import { readFileSync, writeFileSync, mkdirSync, chmodSync, existsSync } from 'fs'; -import { join } from 'path'; +import { isAbsolute, join } from 'path'; import { homedir } from 'os'; import type { EngineConfig } from './types.ts'; @@ -122,11 +122,35 @@ export function isThinClient(config: GBrainConfig | null): boolean { * Load config with credential precedence: env vars > config file. * Plugin config is handled by the plugin runtime injecting env vars. */ +// v0.36.x #1086: translate legacy `provider` + `model` config shape (seen in +// pre-v0.32 docs and some community templates) to the canonical +// `embedding_model: ":"`. Without this translation, sync +// and embed silently fell through to the hardcoded OpenAI default, blocking +// Voyage / Cohere / Mistral users from using their configured provider. +function migrateLegacyEmbeddingConfig(raw: Record): Record { + if (raw.embedding_model !== undefined) return raw; + const provider = typeof raw.provider === 'string' ? raw.provider : undefined; + const model = typeof raw.model === 'string' ? raw.model : undefined; + if (!provider || !model) return raw; + // Strip the legacy keys to avoid downstream confusion. Emit a one-line + // stderr nudge so the operator updates their config to the canonical shape. + const rest = { ...raw }; + delete rest.provider; + delete rest.model; + rest.embedding_model = `${provider}:${model}`; + console.warn( + `[config] legacy "provider" + "model" detected; using "${rest.embedding_model}".` + + ` Rewrite ~/.gbrain/config.json to: "embedding_model": "${rest.embedding_model}".`, + ); + return rest; +} + export function loadConfig(): GBrainConfig | null { let fileConfig: GBrainConfig | null = null; try { const raw = readFileSync(getConfigPath(), 'utf-8'); - fileConfig = JSON.parse(raw) as GBrainConfig; + const parsed = JSON.parse(raw) as Record; + fileConfig = migrateLegacyEmbeddingConfig(parsed) as unknown as GBrainConfig; } catch { /* no config file */ } // Try env vars @@ -270,10 +294,10 @@ export function configDir(): string { const override = process.env.GBRAIN_HOME; if (override && override.trim()) { const trimmed = override.trim(); - if (!trimmed.startsWith('/')) { + if (!isAbsolute(trimmed)) { throw new Error(`GBRAIN_HOME must be an absolute path; got: ${trimmed}`); } - if (trimmed.split('/').includes('..')) { + if (trimmed.split(/[\\/]/).includes('..')) { throw new Error(`GBRAIN_HOME must not contain '..' segments; got: ${trimmed}`); } return join(trimmed, '.gbrain'); diff --git a/src/core/cycle.ts b/src/core/cycle.ts index dc341898e..0f8ba0bdd 100644 --- a/src/core/cycle.ts +++ b/src/core/cycle.ts @@ -551,27 +551,28 @@ async function runPhaseLint(brainDir: string, dryRun: boolean): Promise { try { - // Library function path — the v0.15 backlinks.ts exports - // runBacklinksCore when --fix is requested. + // Maintenance cycles must not rewrite tracked brain pages with generated + // "Referenced in" timeline bullets. The graph extractor/auto-link path is + // the canonical link store during sync/dream/autopilot; the legacy + // filesystem fixer remains available explicitly via `gbrain check-backlinks + // fix` for users who truly want markdown backlinks materialized. const { runBacklinksCore } = await import('../commands/backlinks.ts'); const result = await runBacklinksCore({ - action: 'fix', + action: 'check', dir: brainDir, dryRun, }); const gaps = result.gaps_found ?? 0; const added = result.fixed ?? 0; - const remaining = Math.max(0, gaps - added); - const status: PhaseStatus = - gaps === 0 || (!dryRun && remaining === 0) ? 'ok' : 'warn'; + const status: PhaseStatus = 'ok'; return { phase: 'backlinks', status, duration_ms: 0, - summary: dryRun - ? `${gaps} missing back-link(s) (dry-run)` - : `${added} back-link(s) added, ${remaining} remaining`, - details: { gaps, added, pages_affected: result.pages_affected, dryRun }, + summary: gaps === 0 + ? 'no missing back-links found' + : `${gaps} missing back-link(s) found (audit-only; run gbrain check-backlinks fix to materialize)`, + details: { gaps, added, pages_affected: result.pages_affected, dryRun, mode: 'audit-only' }, }; } catch (e) { return { @@ -1004,9 +1005,19 @@ async function runPhaseOrphans(engine: BrainEngine): Promise { const { findOrphans } = await import('../commands/orphans.ts'); const result = await findOrphans(engine); const count = result.total_orphans; + // Orphans are a code-smell signal, not a fatal condition. The + // original `count > 20` cutoff was tuned for small dev brains; on + // any corpus past a few hundred pages it fires 'warn' every cycle + // in steady state. Combined with the autopilot circuit-breaker + // historically tripping on cycle.status='partial', that produced + // respawn storms under KeepAlive=true. Switch to a ratio: warn + // only when more than half the corpus is orphaned (the real "your + // graph fell apart" signal). total_pages=0 is a defensive 'ok'. + const status: PhaseStatus = + result.total_pages > 0 && count / result.total_pages > 0.5 ? 'warn' : 'ok'; return { phase: 'orphans', - status: count > 20 ? 'warn' : 'ok', + status, duration_ms: 0, summary: `${count} orphan page(s) out of ${result.total_pages} total`, details: { diff --git a/src/core/cycle/extract-facts.ts b/src/core/cycle/extract-facts.ts index 8cdf91bce..6af53b093 100644 --- a/src/core/cycle/extract-facts.ts +++ b/src/core/cycle/extract-facts.ts @@ -156,8 +156,16 @@ export async function runExtractFacts( result.phantomsMorePending = phantomResult.more_pending; // ── Resolve target slug set ─────────────────────────────────── + // v0.36.x #1096: presence — not length — distinguishes the modes. + // `slugs: []` from an incremental sync no-op was previously treated + // identically to `slugs: undefined` (full-walk intent) because + // `opts.slugs && opts.slugs.length > 0` is falsy for both. On a + // multi-thousand-page brain the unintended full walk exceeds the + // autopilot-cycle timeout (~600s) and dead-letters the job. let slugs: string[]; - if (opts.slugs && opts.slugs.length > 0) { + if (opts.slugs !== undefined) { + // Caller explicitly passed a list (possibly empty). Empty array is a + // real incremental no-op; don't escalate to full-brain walk. slugs = opts.slugs; } else { // Full walk: every page in the brain. Bounded by engine.getAllSlugs diff --git a/src/core/frontmatter-inference.ts b/src/core/frontmatter-inference.ts index 691fbe83c..41f13aeff 100644 --- a/src/core/frontmatter-inference.ts +++ b/src/core/frontmatter-inference.ts @@ -28,7 +28,8 @@ * - **Deterministic.** Same file always produces the same frontmatter. No LLM calls, no network. * - **Extensible via rules.** The `DIRECTORY_RULES` table maps path patterns to type + source + tags. * Adding a new directory convention = adding one rule. - * - **Safe.** `.bak` files on write, `--dry-run` by default in CLI, idempotent. + * - **Safe.** centralized backups on write, `--dry-run` by default in CLI, + * idempotent. * * ## How it fits in the pipeline * @@ -177,6 +178,22 @@ export const DIRECTORY_RULES: DirectoryRule[] = [ titleStrategy: 'filename', }, + // Documentation/workspace repos. These make generated frontmatter useful + // instead of flattening everything to the catch-all `note` type. + { pathPrefix: 'docs/runbooks/', type: 'guide', source: 'docs', tags: ['runbook'], titleStrategy: 'heading' }, + { pathPrefix: 'runbooks/', type: 'guide', source: 'docs', tags: ['runbook'], titleStrategy: 'heading' }, + { pathPrefix: 'docs/guides/', type: 'guide', source: 'docs', tags: ['guide'], titleStrategy: 'heading' }, + { pathPrefix: 'docs/policies/', type: 'guide', source: 'docs', tags: ['policy'], titleStrategy: 'heading' }, + { pathPrefix: 'docs/projects/', type: 'project', source: 'docs', tags: ['project'], titleStrategy: 'heading' }, + { pathPrefix: 'docs/audits/', type: 'analysis', source: 'docs', tags: ['audit'], titleStrategy: 'heading' }, + { pathPrefix: 'docs/research/', type: 'analysis', source: 'docs', tags: ['research'], titleStrategy: 'heading' }, + { pathPrefix: 'docs/evaos/', type: 'architecture', source: 'docs', tags: ['evaos'], titleStrategy: 'heading' }, + { pathPrefix: 'docs/architecture/', type: 'architecture', source: 'docs', titleStrategy: 'heading' }, + { pathPrefix: 'docs/providers/', type: 'source', source: 'docs', tags: ['provider'], titleStrategy: 'heading' }, + { pathPrefix: 'security/', type: 'guide', source: 'docs', tags: ['security'], titleStrategy: 'heading' }, + { pathPrefix: 'support/', type: 'source', source: 'docs', tags: ['support'], titleStrategy: 'heading' }, + { pathPrefix: 'notes/', type: 'note', titleStrategy: 'heading' }, + // Personal sections { pathPrefix: 'personal/therapy/', @@ -233,7 +250,9 @@ export const DIRECTORY_RULES: DirectoryRule[] = [ { pathPrefix: 'meetings/', type: 'meeting', titleStrategy: 'heading', datePattern: 'filename' }, { pathPrefix: 'media/', type: 'media', titleStrategy: 'heading' }, - // Catch-all for any remaining files + // Catch-all for any remaining files. The CLI does not write this rule by + // default; users must pass --include-catch-all so `type: note` means an + // intentional fallback instead of "the script did not know what this was". { pathPrefix: '', type: 'note', titleStrategy: 'heading' }, ]; diff --git a/src/core/oauth-provider.ts b/src/core/oauth-provider.ts index 501fad2c4..aa94f8de1 100644 --- a/src/core/oauth-provider.ts +++ b/src/core/oauth-provider.ts @@ -22,6 +22,7 @@ import type { import type { OAuthServerProvider, AuthorizationParams } from '@modelcontextprotocol/sdk/server/auth/provider.js'; import type { OAuthRegisteredClientsStore } from '@modelcontextprotocol/sdk/server/auth/clients.js'; import type { AuthInfo } from '@modelcontextprotocol/sdk/server/auth/types.js'; +import { InvalidTokenError } from '@modelcontextprotocol/sdk/server/auth/errors.js'; import { hashToken, generateToken, isUndefinedColumnError } from './utils.ts'; import { hasScope, assertAllowedScopes, parseScopeString, InvalidScopeError } from './scope.ts'; import type { SqlQuery, SqlValue } from './sql-query.ts'; @@ -538,7 +539,7 @@ export class GBrainOAuthProvider implements OAuthServerProvider { // throw here rather than return an undefined-bearing AuthInfo. const expiresAt = coerceTimestamp(row.expires_at); if (expiresAt === undefined || expiresAt < now) { - throw new Error('Token expired'); + throw new InvalidTokenError('Token expired'); } // v0.34.1 (#876): federated_read normalization. SELECT returns // either a JS array (Postgres / PGLite text[] driver mapping) or @@ -596,7 +597,7 @@ export class GBrainOAuthProvider implements OAuthServerProvider { } as AuthInfo; } - throw new Error('Invalid token'); + throw new InvalidTokenError('Invalid token'); } // ------------------------------------------------------------------------- diff --git a/src/core/operations.ts b/src/core/operations.ts index d077fdae9..5ce2a0840 100644 --- a/src/core/operations.ts +++ b/src/core/operations.ts @@ -1151,6 +1151,15 @@ const query: Operation = { const queryText = p.query as string | undefined; const imageData = p.image as string | undefined; const imageMime = (p.image_mime as string) || 'image/jpeg'; + // Explicit per-call source_id must win over ctx.sourceId. The special + // __all__ value opts out of source filtering for local cross-source search. + const sourceIdParam = typeof p.source_id === 'string' ? p.source_id : undefined; + const querySourceScope = + sourceIdParam !== undefined + ? sourceIdParam === '__all__' + ? {} + : { sourceId: sourceIdParam } + : sourceScopeOpts(ctx); // v0.27.1: image-similarity branch. Bypasses hybridSearch (which is // text-only); embeds the image via embedMultimodal and runs a direct @@ -1168,7 +1177,7 @@ const query: Operation = { limit: (p.limit as number) || 20, offset: (p.offset as number) || 0, embeddingColumn: 'embedding_image', - ...sourceScopeOpts(ctx), + ...querySourceScope, }); return results; } @@ -1187,16 +1196,6 @@ const query: Operation = { // search). When the param is the literal '__all__', force-allow // cross-source mode (matches SearchOpts.sourceId contract). let capturedMeta: HybridSearchMeta | null = null; - // v0.34 (Codex finding #2): thread ctx.sourceId so multi-source brains - // get source-scoped retrieval. Explicit `source_id` param wins over - // ctx.sourceId; literal `__all__` opts out (cross-source). - const sourceIdParam = typeof p.source_id === 'string' ? p.source_id : undefined; - const resolvedSourceId = - sourceIdParam !== undefined - ? sourceIdParam === '__all__' - ? undefined - : sourceIdParam - : ctx.sourceId; // v0.32.x search-lite: route the query op through hybridSearchCached so // semantic cache + token budget + intent weighting fire automatically. // Plain hybridSearch remains the bare API for callers that opt out. @@ -1210,7 +1209,7 @@ const query: Operation = { symbolKind: (p.symbol_kind as string) || undefined, nearSymbol: (p.near_symbol as string) || undefined, walkDepth: typeof p.walk_depth === 'number' ? (p.walk_depth as number) : undefined, - sourceId: resolvedSourceId, + ...querySourceScope, // v0.29.1 — agent-explicit recency + salience. Omitted = heuristic defaults. salience: p.salience as 'off' | 'on' | 'strong' | undefined, recency: p.recency as 'off' | 'on' | 'strong' | undefined, @@ -1221,10 +1220,6 @@ const query: Operation = { useCache: typeof p.use_cache === 'boolean' ? (p.use_cache as boolean) : undefined, intentWeighting: typeof p.intent_weighting === 'boolean' ? (p.intent_weighting as boolean) : undefined, onMeta: (m) => { capturedMeta = m; }, - // v0.34.1 (#861 — P0 leak seal): thread caller's source scope. The - // hybridSearch internal searchOpts rebuild (hybrid.ts:223) was - // dropping these fields pre-fix even when callers passed them. - ...sourceScopeOpts(ctx), }); const latency_ms = Date.now() - startedAt; diff --git a/src/core/search/hybrid.ts b/src/core/search/hybrid.ts index b3073d1ac..bdff5766a 100644 --- a/src/core/search/hybrid.ts +++ b/src/core/search/hybrid.ts @@ -31,6 +31,17 @@ import { const RRF_K = 60; const COMPILED_TRUTH_BOOST = 2.0; +const pendingCacheWrites = new Set>(); + +export async function awaitPendingSearchCacheWrites(): Promise { + if (pendingCacheWrites.size === 0) return; + await Promise.allSettled([...pendingCacheWrites]); +} + +function trackCacheWrite(promise: Promise): void { + pendingCacheWrites.add(promise); + promise.finally(() => pendingCacheWrites.delete(promise)).catch(() => { /* swallow */ }); +} /** * Backlink boost coefficient. Score is multiplied by (1 + BACKLINK_BOOST_COEF * log(1 + count)). * - 0 backlinks: factor = 1.0 (no boost). @@ -861,9 +872,11 @@ export async function hybridSearchCached( results.length > 0 && (innerMeta?.vector_enabled ?? false) ) { - void cache - .store(query, queryEmbedding, results, finalMeta, { sourceId: opts?.sourceId, knobsHash: cacheKnobsHash }) - .catch(() => { /* swallow */ }); + trackCacheWrite( + cache + .store(query, queryEmbedding, results, finalMeta, { sourceId: opts?.sourceId, knobsHash: cacheKnobsHash }) + .catch(() => { /* swallow */ }), + ); } return budgeted; diff --git a/src/core/sync.ts b/src/core/sync.ts index a5abecc61..c409775d3 100644 --- a/src/core/sync.ts +++ b/src/core/sync.ts @@ -74,6 +74,12 @@ const CODE_EXTENSIONS = new Set([ '.json', '.yaml', '.yml', '.toml', + // v0.36.x #878: Terraform / HCL. Closes the silent-data-loss bug where + // Terraform repos were invisible to `gbrain sync --strategy code`. + // detectCodeLanguage() returns null for these so they chunk via the + // recursive chunker (no tree-sitter grammar), which is the correct + // fallback — same path as toml / yaml without language-specific AST. + '.tf', '.tfvars', '.hcl', ]); /** diff --git a/test/admin-embed-spawn.serial.test.ts b/test/admin-embed-spawn.serial.test.ts new file mode 100644 index 000000000..6d0c41d47 --- /dev/null +++ b/test/admin-embed-spawn.serial.test.ts @@ -0,0 +1,207 @@ +/** + * v0.36.1.x #1090: admin embed E2E — spawns `gbrain serve --http` from a + * fresh tmpdir (so `process.cwd()/admin/dist` doesn't exist), then issues + * a real HTTP GET to /admin and asserts the React SPA shell HTML comes + * back from the embedded manifest path — NOT a 404, NOT an Express + * default error page. + * + * Pre-fix, the server resolved `adminDistPath = path.join(process.cwd(), + * 'admin', 'dist')` and skipped /admin route mounting when that path did + * not exist. Every globally-installed binary (`bun install -g + * github:garrytan/gbrain`) hit 404 on /admin because the user never + * cd's into the source repo. The fix: + * 1. `scripts/build-admin-embedded.ts` walks admin/dist and emits + * `src/admin-embedded.ts` with `with { type: 'file' }` imports. + * 2. `serve-http.ts` two-tier resolution: cwd-relative admin/dist for + * dev (Vite hot-rebuild), embedded manifest otherwise. + * + * This test deliberately runs the server from a tmpdir so the cwd-relative + * branch CANNOT fire — the embedded path is the one under test. If the + * embed wiring regresses, this case fails with a 404 (or never reaches + * the SPA shell HTML). + * + * No DATABASE_URL needed; PGLite is the engine. Serial because it binds + * a TCP port and reads/writes a tmpdir. + */ +import { describe, test, expect } from 'bun:test'; +import { mkdtempSync, mkdirSync, writeFileSync, rmSync } from 'fs'; +import { join } from 'path'; +import { tmpdir } from 'os'; +import type { Subprocess } from 'bun'; + +const REPO = new URL('..', import.meta.url).pathname.replace(/\/$/, ''); + +interface ServeProc { + proc: Subprocess; + port: number; + home: string; + bootstrapToken: string; + cleanup: () => Promise; +} + +function pickPort(): number { + // High-random port. Collision is unlikely; test reruns get fresh ports. + return 31000 + Math.floor(Math.random() * 4000); +} + +async function spawnServer(): Promise { + const home = mkdtempSync(join(tmpdir(), 'gbrain-admin-embed-')); + mkdirSync(join(home, '.gbrain'), { recursive: true }); + writeFileSync( + join(home, '.gbrain', 'config.json'), + JSON.stringify({ + engine: 'pglite', + database_path: join(home, '.gbrain', 'brain.pglite'), + embedding_dimensions: 1536, + }) + '\n', + ); + + // Pin the bootstrap token via env so the test doesn't have to scrape it + // out of the startup banner (and the banner stays predictable across + // future formatting tweaks). + const bootstrapToken = 'test-bootstrap-token-aaaaaaaaaaaaaaaaaa'; // 41 chars + const port = pickPort(); + + // CRITICAL: cwd is the tmpdir, NOT the repo. This forces serve-http to + // fall into the embedded-manifest branch because cwd/admin/dist does + // not exist. The pre-fix code would 404 here; the fix serves from the + // bundled assets via Bun's `with { type: 'file' }` import resolution. + const proc = Bun.spawn( + [ + 'bun', + 'run', + `${REPO}/src/cli.ts`, + 'serve', + '--http', + '--port', + String(port), + '--bind', + '127.0.0.1', + ], + { + cwd: home, + env: { + ...process.env, + HOME: home, + GBRAIN_HOME: home, + GBRAIN_ADMIN_BOOTSTRAP_TOKEN: bootstrapToken, + // Don't let test-process inherit any auth keys it doesn't need. + OPENAI_API_KEY: '', + ANTHROPIC_API_KEY: '', + }, + stdout: 'pipe', + stderr: 'pipe', + }, + ); + + // Wait for readiness by polling /health. Bun's readable streams don't + // give us a synchronous "stderr line" API and the startup banner format + // is allowed to drift; a /health probe is the contract that matters. + const deadline = Date.now() + 30_000; + let ready = false; + while (Date.now() < deadline) { + try { + const res = await fetch(`http://127.0.0.1:${port}/health`, { + signal: AbortSignal.timeout(2000), + }); + if (res.ok) { + ready = true; + break; + } + } catch { /* not ready yet */ } + await new Promise(r => setTimeout(r, 250)); + } + + const cleanup = async () => { + try { proc.kill('SIGTERM'); } catch { /* already exited */ } + // Give it 2s to exit cleanly, then SIGKILL. + await Promise.race([ + proc.exited, + new Promise(r => setTimeout(r, 2000)), + ]); + try { proc.kill('SIGKILL'); } catch { /* already gone */ } + try { rmSync(home, { recursive: true, force: true }); } catch { /* best effort */ } + }; + + if (!ready) { + // Capture some diagnostics for the failure message before tearing down. + const stderrText = await new Response(proc.stderr).text().catch(() => ''); + await cleanup(); + throw new Error( + `serve --http never became ready on port ${port} after 30s. stderr: ${stderrText.slice(0, 2000)}`, + ); + } + + return { proc, port, home, bootstrapToken, cleanup }; +} + +describe('admin embed E2E — /admin served from embedded manifest (v0.36.1.x #1090)', () => { + test('GET /admin/ returns 200 with the React SPA shell HTML', async () => { + const s = await spawnServer(); + try { + const res = await fetch(`http://127.0.0.1:${s.port}/admin/`, { + signal: AbortSignal.timeout(5000), + }); + expect(res.status).toBe(200); + const html = await res.text(); + // The actual admin/dist/index.html declares GBrain Admin + // and mounts the SPA on
. Both must be present, otherwise + // we're not serving the embedded asset. + expect(html).toContain('GBrain Admin'); + expect(html).toContain('
'); + // Content-Type is text/html, not application/octet-stream (which would + // mean the mime lookup in ADMIN_ASSETS regressed). + expect(res.headers.get('content-type') ?? '').toMatch(/text\/html/); + } finally { + await s.cleanup(); + } + }, 90_000); + + test('GET /admin/index.html (explicit path) also returns the SPA HTML', async () => { + const s = await spawnServer(); + try { + const res = await fetch(`http://127.0.0.1:${s.port}/admin/index.html`, { + signal: AbortSignal.timeout(5000), + }); + expect(res.status).toBe(200); + const html = await res.text(); + expect(html).toContain('GBrain Admin'); + } finally { + await s.cleanup(); + } + }, 90_000); + + test('GET /admin/agents (SPA-routed deep link) falls back to index.html', async () => { + const s = await spawnServer(); + try { + const res = await fetch(`http://127.0.0.1:${s.port}/admin/agents`, { + signal: AbortSignal.timeout(5000), + }); + expect(res.status).toBe(200); + const html = await res.text(); + // SPA fallback: any unmatched /admin/* path serves index.html so + // client-side routing takes over. + expect(html).toContain('GBrain Admin'); + expect(html).toContain('
'); + } finally { + await s.cleanup(); + } + }, 90_000); + + test('GET /admin/api/stats (API route) is NOT swallowed by the SPA fallback — returns auth challenge', async () => { + const s = await spawnServer(); + try { + const res = await fetch(`http://127.0.0.1:${s.port}/admin/api/stats`, { + signal: AbortSignal.timeout(5000), + }); + // No session cookie → 401/403 from requireAdmin, NOT 200 + HTML. + // The regression we guard against: SPA fallback grabbing /admin/api/* + // would silently return HTML to a JSON client and break the dashboard. + expect(res.status).not.toBe(200); + const body = await res.text().catch(() => ''); + expect(body).not.toContain('
'); + } finally { + await s.cleanup(); + } + }, 90_000); +}); diff --git a/test/ai/adaptive-embed-batch.test.ts b/test/ai/adaptive-embed-batch.test.ts index 6cd8d06ab..8ad53df0d 100644 --- a/test/ai/adaptive-embed-batch.test.ts +++ b/test/ai/adaptive-embed-batch.test.ts @@ -23,9 +23,9 @@ * after SHRINK_HEAL_AFTER successes the factor heals back toward the * recipe-declared safety_factor. * - * 7. Startup warning (D9-B) — gateway construction warns once per recipe - * with an embedding touchpoint missing max_batch_tokens (excluding the - * OpenAI canonical fast-path recipe). + * 7. Startup warning (D9-B) — gateway construction warns once for the + * configured embedding recipe when it is missing max_batch_tokens + * (excluding the OpenAI canonical fast-path recipe). */ import { afterEach, beforeEach, describe, expect, mock, test } from 'bun:test'; @@ -76,6 +76,14 @@ function configureOpenAI(): void { }); } +function configureGoogle(): void { + configureGateway({ + embedding_model: 'google:gemini-embedding-001', + embedding_dimensions: 768, + env: { GOOGLE_GENERATIVE_AI_API_KEY: 'fake' }, + }); +} + // --------- 1. Pure helpers --------- describe('splitByTokenBudget (pure helper)', () => { @@ -360,16 +368,18 @@ describe('shrink-on-miss adaptive cache', () => { describe('startup warning for recipes missing max_batch_tokens', () => { beforeEach(() => resetGateway()); - test('first configureGateway call warns about each missing-cap recipe; subsequent calls suppressed', () => { + test('configured missing-cap recipe warns once; unrelated recipes stay quiet', () => { const warnings: string[] = []; const original = console.warn; console.warn = (msg: string) => warnings.push(String(msg)); try { configureOpenAI(); + expect(warnings.length).toBe(0); + configureGoogle(); const firstCallCount = warnings.length; // Reconfigure: the warning should NOT re-fire for the same recipes // within one process (we already told the operator). - configureOpenAI(); + configureGoogle(); expect(warnings.length).toBe(firstCallCount); } finally { console.warn = original; @@ -379,12 +389,13 @@ describe('startup warning for recipes missing max_batch_tokens', () => { const contractMatch = warnings.filter(w => w.includes('[ai.gateway]') && w.includes('declares an embedding touchpoint'), ); - expect(contractMatch.length).toBeGreaterThan(0); + expect(contractMatch.length).toBe(1); // Voyage declares max_batch_tokens → suppressed. OpenAI is the // canonical fast-path recipe → also suppressed by id. Both must be // absent from the warnings. expect(warnings.find(w => w.includes('"voyage"'))).toBeUndefined(); expect(warnings.find(w => w.includes('"openai"'))).toBeUndefined(); + expect(warnings.find(w => w.includes('"google"'))).toBeDefined(); }); }); diff --git a/test/ai/gateway.test.ts b/test/ai/gateway.test.ts index c99b2fcb0..a97fc6cfe 100644 --- a/test/ai/gateway.test.ts +++ b/test/ai/gateway.test.ts @@ -378,3 +378,74 @@ describe('Voyage flexible-dim runtime validation', () => { expect(caught?.fix).toContain('2048'); }); }); + +describe('embedding response integrity', () => { + beforeEach(() => resetGateway()); + + test('rejects partial embedding responses instead of silently dropping rows', async () => { + const originalFetch = globalThis.fetch; + globalThis.fetch = (async () => new Response(JSON.stringify({ + object: 'list', + data: [ + { + object: 'embedding', + index: 0, + embedding: new Array(1536).fill(0.01), + }, + ], + model: 'text-embedding-3-large', + usage: { prompt_tokens: 3, total_tokens: 3 }, + }), { + status: 200, + headers: { 'content-type': 'application/json' }, + })) as unknown as typeof fetch; + + try { + configureGateway({ + embedding_model: 'openai:text-embedding-3-large', + embedding_dimensions: 1536, + env: { OPENAI_API_KEY: 'openai-fake' }, + }); + + await expect(embed(['first', 'second'])).rejects.toThrow('1 embedding(s) for 2 input(s)'); + } finally { + globalThis.fetch = originalFetch; + } + }); + + test('checks every returned vector dimension, not just the first one', async () => { + const originalFetch = globalThis.fetch; + globalThis.fetch = (async () => new Response(JSON.stringify({ + object: 'list', + data: [ + { + object: 'embedding', + index: 0, + embedding: new Array(1536).fill(0.01), + }, + { + object: 'embedding', + index: 1, + embedding: new Array(768).fill(0.01), + }, + ], + model: 'text-embedding-3-large', + usage: { prompt_tokens: 3, total_tokens: 3 }, + }), { + status: 200, + headers: { 'content-type': 'application/json' }, + })) as unknown as typeof fetch; + + try { + configureGateway({ + embedding_model: 'openai:text-embedding-3-large', + embedding_dimensions: 1536, + env: { OPENAI_API_KEY: 'openai-fake' }, + }); + + await expect(embed(['first', 'second'])).rejects.toThrow('returned 768 but schema expects 1536'); + } finally { + globalThis.fetch = originalFetch; + } + }); +}); diff --git a/test/ai/no-batch-cap-suppression.serial.test.ts b/test/ai/no-batch-cap-suppression.serial.test.ts index 9bd3e69b7..1241b3b5a 100644 --- a/test/ai/no-batch-cap-suppression.serial.test.ts +++ b/test/ai/no-batch-cap-suppression.serial.test.ts @@ -52,14 +52,27 @@ describe('v0.32 #779: no_batch_cap suppresses the missing-max_batch_tokens warni } }); - test('configureGateway STILL warns for google (real provider, no cap declared)', () => { + test('configureGateway warns for google only when google embedding is configured', () => { warnSpy.mockClear(); resetGateway(); configureGateway({ env: {} }); - const messages = warnSpy.mock.calls.map(c => String(c[0] ?? '')); + let messages = warnSpy.mock.calls.map(c => String(c[0] ?? '')); + expect( + messages.some(m => m.includes('"google"') && m.includes('without max_batch_tokens')), + 'google should not warn while OpenAI default is configured', + ).toBe(false); + + warnSpy.mockClear(); + resetGateway(); + configureGateway({ + embedding_model: 'google:gemini-embedding-001', + embedding_dimensions: 768, + env: { GOOGLE_GENERATIVE_AI_API_KEY: 'fake' }, + }); + messages = warnSpy.mock.calls.map(c => String(c[0] ?? '')); expect( messages.some(m => m.includes('"google"') && m.includes('without max_batch_tokens')), - 'google should warn (it has fixed-cap models)', + 'google should warn when configured because it has fixed-cap models', ).toBe(true); }); diff --git a/test/apply-migrations-pglite-spawn.serial.test.ts b/test/apply-migrations-pglite-spawn.serial.test.ts new file mode 100644 index 000000000..7e5e6e2c2 --- /dev/null +++ b/test/apply-migrations-pglite-spawn.serial.test.ts @@ -0,0 +1,158 @@ +/** + * v0.36.1.x #1100: PGLite + `gbrain apply-migrations` chain spawn test. + * + * Spawns `gbrain init --migrate-only` followed by `gbrain apply-migrations + * --yes --non-interactive` against a fresh tmpdir, asserts the full + * migration chain walks to head without wedging on the v0.11.0 Minions + * phase A subprocess deadlock. + * + * Pre-fix, this exact sequence hit `GBrain: Timed out waiting for PGLite + * lock` because: + * 1. apply-migrations pre-flight schema-version probe held the + * single-writer lock briefly and raced the v0.11.0 subprocess. + * 2. v0.11.0 phase A spawned `gbrain init --migrate-only` as a child; + * the child inherited HOME and tried to acquire the same lock. + * + * The fix routes phase A in-process for PGLite and skips the pre-flight + * probe on PGLite (the warning is non-essential there). No DATABASE_URL + * needed; runs in standard unit CI. + * + * Single-test design: every `bun run /src/cli.ts` from a tmpdir + * cwd pays a cold parse/transpile cost (no near-cwd .bun cache). On + * Ubuntu CI that's ~10-20s per spawn. Consolidating into one test with + * one shared tmpdir keeps wall-clock under the runner's default timeout. + * + * Serial because it spawns subprocesses + writes a tmpdir. + */ +import { describe, test, expect } from 'bun:test'; +import { mkdtempSync, mkdirSync, writeFileSync, rmSync, existsSync, chmodSync } from 'fs'; +import { join } from 'path'; +import { tmpdir } from 'os'; + +const REPO = new URL('..', import.meta.url).pathname.replace(/\/$/, ''); + +/** + * Make a shim `gbrain` binary that routes to `bun run /src/cli.ts`. + * + * The v0.11.0 orchestrator chain spawns subprocesses via `execSync('gbrain + * jobs smoke')` and `execSync('gbrain init --migrate-only')` (the Postgres + * path; PGLite now routes in-process, but phase B's smoke still shells out). + * On a developer machine `gbrain` resolves via `bun link`; on CI it + * doesn't exist on PATH and execSync fails with "command not found", + * propagating up as an orchestrator failure. The shim avoids the global- + * install dependency. + */ +function makeGbrainShim(): { binDir: string; cleanup: () => void } { + const binDir = mkdtempSync(join(tmpdir(), 'gbrain-shim-')); + const shimPath = join(binDir, 'gbrain'); + writeFileSync( + shimPath, + `#!/bin/sh\nexec bun run ${REPO}/src/cli.ts "$@"\n`, + { mode: 0o755 }, + ); + chmodSync(shimPath, 0o755); + return { + binDir, + cleanup: () => { + try { rmSync(binDir, { recursive: true, force: true }); } catch { /* best effort */ } + }, + }; +} + +async function runCli( + args: string[], + env: Record, + timeoutMs: number, +): Promise<{ exitCode: number; stdout: string; stderr: string }> { + const proc = Bun.spawn(['bun', 'run', `${REPO}/src/cli.ts`, ...args], { + cwd: REPO, + env: { ...process.env, ...env }, + stdout: 'pipe', + stderr: 'pipe', + }); + const killer = setTimeout(() => { + try { proc.kill('SIGKILL'); } catch { /* already dead */ } + }, timeoutMs); + try { + const [stdout, stderr, exitCode] = await Promise.all([ + new Response(proc.stdout).text(), + new Response(proc.stderr).text(), + proc.exited, + ]); + return { exitCode, stdout, stderr }; + } finally { + clearTimeout(killer); + } +} + +describe('apply-migrations on fresh PGLite (v0.36.1.x #1100)', () => { + // ONE test, ONE brain, ONE end-to-end pass through the lifecycle. The + // per-spawn cold-start on Ubuntu CI (~10-20s) is the dominant cost; we + // pay it 4 times here, not 8. + test('init --migrate-only → apply-migrations --yes → re-run → --list (all exit 0)', async () => { + const home = mkdtempSync(join(tmpdir(), 'gbrain-pglite-spawn-')); + const shim = makeGbrainShim(); + try { + mkdirSync(join(home, '.gbrain'), { recursive: true }); + writeFileSync( + join(home, '.gbrain', 'config.json'), + JSON.stringify({ + engine: 'pglite', + database_path: join(home, '.gbrain', 'brain.pglite'), + embedding_dimensions: 1536, + }) + '\n', + ); + // PATH shim so orchestrator phase-B execSync('gbrain jobs smoke') + // and similar resolve to our shim instead of requiring a global + // install. This matches the contract users hit in production + // (gbrain on PATH) without depending on `bun link` having run. + const env = { + HOME: home, + GBRAIN_HOME: home, + PATH: `${shim.binDir}:${process.env.PATH ?? ''}`, + }; + + // Step 1: init --migrate-only seeds the schema. Pre-fix on PGLite this + // worked but the next step then deadlocked. + const init = await runCli(['init', '--migrate-only'], env, 90_000); + expect(init.exitCode).toBe(0); + expect(init.stdout + init.stderr).toMatch(/Schema up to date|migration\(s\) applied/); + + // Step 2: apply-migrations --yes runs the orchestrator chain. Pre-fix + // this wedged on v0.11.0 phase A with the PGLite lock timeout. + const apply = await runCli( + ['apply-migrations', '--yes', '--non-interactive'], + env, + 180_000, + ); + if (apply.exitCode !== 0) { + // Dump for CI triage — local repro passes; this surfaces the + // Ubuntu-specific failure mode (probably env-related: BUN_INSTALL, + // HOME-relative path, or a PGLite WASM quirk). + console.error('--- apply-migrations stdout ---\n' + apply.stdout); + console.error('--- apply-migrations stderr ---\n' + apply.stderr); + console.error('--- init stdout ---\n' + init.stdout); + console.error('--- init stderr ---\n' + init.stderr); + } + expect(apply.exitCode).toBe(0); + const applyOut = apply.stdout + apply.stderr; + expect(applyOut).not.toMatch(/Timed out waiting for PGLite lock/); + expect(applyOut).not.toMatch(/Phase A \(schema\) failed/); + expect(existsSync(join(home, '.gbrain', 'brain.pglite'))).toBe(true); + + // Step 3: re-run is idempotent — "All migrations up to date" must exit + // 0, not fall through to implicit non-zero (the #1062 fix path). + const second = await runCli(['apply-migrations', '--yes', '--non-interactive'], env, 90_000); + expect(second.exitCode).toBe(0); + expect(second.stdout + second.stderr).toMatch(/All migrations up to date|up to date/); + + // Step 4: --list exits 0 (third leg of the #1062 contract). + const list = await runCli(['apply-migrations', '--list'], env, 60_000); + expect(list.exitCode).toBe(0); + expect(list.stdout + list.stderr).toMatch(/applied|pending|migration/i); + } finally { + try { rmSync(home, { recursive: true, force: true }); } catch { /* best effort */ } + shim.cleanup(); + } + }, 480_000); +}); diff --git a/test/apply-migrations.test.ts b/test/apply-migrations.test.ts index cbbdf3ef7..223c28d50 100644 --- a/test/apply-migrations.test.ts +++ b/test/apply-migrations.test.ts @@ -166,3 +166,17 @@ describe('buildPlan — diff against completed + installed VERSION', () => { expect(plan.skippedFuture).toEqual([]); }); }); + +// v0.36.1.x (cherry-pick #1062): list, dry-run, and "all migrations up to +// date" paths must exit 0 so shell scripts gating on the exit code work. +// Pre-fix, these `return` statements left the CLI dispatcher's implicit +// non-zero exit code in place when callers checked $?. +describe('runApplyMigrations exit codes (v0.36.1.x #1062)', () => { + test('source contains process.exit(0) on list/dry-run/up-to-date branches', async () => { + const { readFileSync } = await import('fs'); + const src = readFileSync('src/commands/apply-migrations.ts', 'utf8'); + expect(src).toMatch(/cli\.list\s*\)\s*\{\s*printList\(plan,\s*installed\);\s*process\.exit\(0\);/); + expect(src).toMatch(/cli\.dryRun\s*\)\s*\{\s*printDryRun\(plan,\s*installed\);\s*process\.exit\(0\);/); + expect(src).toMatch(/All migrations up to date[\s\S]{0,80}process\.exit\(0\)/); + }); +}); diff --git a/test/autopilot-cycle-failure-classification.test.ts b/test/autopilot-cycle-failure-classification.test.ts new file mode 100644 index 000000000..905e02c37 --- /dev/null +++ b/test/autopilot-cycle-failure-classification.test.ts @@ -0,0 +1,108 @@ +/** + * test/autopilot-cycle-failure-classification.test.ts + * + * Regression guards for two design-bugs that interact under + * `KeepAlive=true` autopilot to produce a respawn storm: + * + * 1. autopilot.ts inline-cycle path treated `report.status === 'partial'` + * as a circuit-breaker trip. `partial` is documented in CycleReport + * as "at least one phase warned or failed, others ran" — a soft + * signal, not a fatal condition. 5 consecutive 'partial' cycles + * killed the autopilot. + * + * 2. runPhaseOrphans used a fixed absolute threshold `count > 20` to + * emit `status: 'warn'`. For any non-trivial brain (~hundreds of + * pages) that fires every cycle in steady state, which — + * combined with bug #1 — caused the autopilot to give up after + * 5 normal cycles. Threshold is now ratio-based: warn only if more + * than half the corpus is orphaned. + * + * Both fixes are tiny but the production blast-radius is large + * (LaunchAgent respawn loop, 9 MB error log in 2h), so they get + * explicit guards. + * + * Source-level guards follow the established pattern from + * `cycle-abort.test.ts` (autopilot loop is hard to unit-test without + * a full engine, so we assert on source text). + */ + +import { describe, test, expect } from 'bun:test'; +import { readFileSync } from 'fs'; + +const autopilotSource = readFileSync( + new URL('../src/commands/autopilot.ts', import.meta.url), + 'utf8', +); +const cycleSource = readFileSync( + new URL('../src/core/cycle.ts', import.meta.url), + 'utf8', +); + +describe("autopilot cycle-failure classification — only 'failed' trips the circuit breaker", () => { + test("the inline-cycle path does NOT treat 'partial' as cycleOk=false", () => { + // Find the inline-cycle block (the catch-and-inspect on `report`) + const inlineBlockStart = autopilotSource.indexOf("event: 'cycle-inline'"); + expect(inlineBlockStart).toBeGreaterThan(0); + + // Look at the 800 chars before the JSON event line — that's where the + // cycleOk classification happens. + const inlineContext = autopilotSource.slice( + Math.max(0, inlineBlockStart - 800), + inlineBlockStart, + ); + + // Regression guard #1: the OR-with-'partial' branch must be gone. + expect(inlineContext).not.toMatch(/report\.status\s*===\s*['"]partial['"]/); + + // Positive guard: the breaker still trips on 'failed'. + expect(inlineContext).toMatch(/report\.status\s*===\s*['"]failed['"]/); + }); + + test("the 5-consecutive-error give-up is still wired (regression-safety)", () => { + // The fix narrows the trigger; it doesn't remove the breaker. + // If someone later wholesale deletes the give-up branch we want to + // notice — autopilot still needs a way to stop after sustained + // real failures (e.g., DB unreachable for 5 cycles). + expect(autopilotSource).toMatch(/consecutiveErrors\s*\+\+/); + expect(autopilotSource).toMatch(/consecutiveErrors\s*>=\s*5/); + }); +}); + +describe('runPhaseOrphans ratio threshold — no more absolute count > 20', () => { + test('the legacy `count > 20 ? "warn" : "ok"` ternary is gone', () => { + // The literal that produced the steady-state warn-storm on any + // brain past a few hundred pages. + expect(cycleSource).not.toMatch(/count\s*>\s*20\s*\?\s*['"]warn['"]/); + }); + + test('the new threshold is ratio-based against total_pages', () => { + // Must look at result.total_pages, not a magic absolute. The + // exact comparator (0.5 today) can move, but the *kind* of + // comparison must be a ratio so big brains don't always warn. + // Slice the runPhaseOrphans function body. + const fnIdx = cycleSource.indexOf('async function runPhaseOrphans'); + expect(fnIdx).toBeGreaterThan(0); + const fnEnd = cycleSource.indexOf('\n}', fnIdx); + const fnBody = cycleSource.slice(fnIdx, fnEnd); + + // Must reference total_pages in the threshold decision. + expect(fnBody).toMatch(/total_pages/); + // Must contain a division or multiplication against total_pages + // (ratio comparison). Accept either `count / result.total_pages` or + // `count > result.total_pages * X` framing. + expect(fnBody).toMatch( + /count\s*\/\s*result\.total_pages|result\.total_pages\s*\*/, + ); + }); + + test('the warn branch still exists (we narrowed, did not remove)', () => { + // If the threshold is moved to "never warn", we want a deliberate + // signal. Today the warn path is still there — orphans > 50% of + // corpus is a real signal. This guard catches accidental removal. + const fnIdx = cycleSource.indexOf('async function runPhaseOrphans'); + const fnEnd = cycleSource.indexOf('\n}', fnIdx); + const fnBody = cycleSource.slice(fnIdx, fnEnd); + expect(fnBody).toMatch(/['"]warn['"]/); + expect(fnBody).toMatch(/['"]ok['"]/); + }); +}); diff --git a/test/autopilot-install.test.ts b/test/autopilot-install.test.ts index 4cee4425a..023b03d7d 100644 --- a/test/autopilot-install.test.ts +++ b/test/autopilot-install.test.ts @@ -78,3 +78,24 @@ describe('detectInstallTarget', () => { // existsSync + execSync which is awkward in-process. Those branches are // exercised by the E2E test (Task 14) against a stubbed host. }); + +// v0.36.1.x (cherry-pick #966): the autopilot wrapper script must source +// ~/.zshenv BEFORE ~/.zshrc. zshenv is the canonical place for env vars in +// non-interactive zsh; zshrc only fires for interactive shells, so vars +// exported in zshrc never reach the LaunchAgent subprocess. Operators who +// exported GBRAIN_DATABASE_URL or {OPENAI,ANTHROPIC}_API_KEY in zshrc and +// expected autopilot to inherit them hit silent missing-secret failures. +describe('autopilot wrapper script — env source order (v0.36.1.x #966)', () => { + test('wrapper sources ~/.zshenv before ~/.zshrc', async () => { + const { readFileSync } = await import('fs'); + const src = readFileSync('src/commands/autopilot.ts', 'utf8'); + const zshenvIdx = src.indexOf('~/.zshenv'); + const zshrcIdx = src.indexOf('~/.zshrc'); + expect(zshenvIdx).toBeGreaterThan(0); + expect(zshrcIdx).toBeGreaterThan(0); + expect(zshenvIdx).toBeLessThan(zshrcIdx); + // Both should appear inside writeWrapperScript's heredoc as `source ~/.foo` + expect(src).toMatch(/source\s+~\/\.zshenv/); + expect(src).toMatch(/source\s+~\/\.zshrc/); + }); +}); diff --git a/test/backlinks.test.ts b/test/backlinks.test.ts index a01705af7..649af1ed3 100644 --- a/test/backlinks.test.ts +++ b/test/backlinks.test.ts @@ -78,3 +78,29 @@ describe('buildBacklinkEntry', () => { expect(entry).toBe('- **2026-04-11** | Referenced in [Q1 Review](../../meetings/q1-review.md)'); }); }); + +describe('findBacklinkGaps dedupe (v0.36.x #967 regression)', () => { + test('a source page mentioning the same target N times yields one gap, not N', async () => { + const { mkdtempSync, writeFileSync, mkdirSync, rmSync } = await import('fs'); + const { tmpdir } = await import('os'); + const { join } = await import('path'); + const { findBacklinkGaps } = await import('../src/commands/backlinks.ts'); + + const root = mkdtempSync(join(tmpdir(), 'gbrain-backlinks-dedupe-')); + try { + mkdirSync(join(root, 'people')); + mkdirSync(join(root, 'meetings')); + writeFileSync(join(root, 'people/alice.md'), '# Alice'); + // Source page mentions alice three times, no Timeline yet on alice + writeFileSync( + join(root, 'meetings/standup.md'), + '# Standup\n\nWe discussed [Alice](people/alice).\nLater [Alice](people/alice) chimed in.\nFinally [[people/alice]] left.\n', + ); + const gaps = findBacklinkGaps(root); + const alicePairs = gaps.filter(g => g.targetPage === 'people/alice.md' && g.sourcePage === 'meetings/standup.md'); + expect(alicePairs.length).toBe(1); + } finally { + rmSync(root, { recursive: true, force: true }); + } + }); +}); diff --git a/test/brain-writer.test.ts b/test/brain-writer.test.ts index 5710d0bf3..ddf5b7e7a 100644 --- a/test/brain-writer.test.ts +++ b/test/brain-writer.test.ts @@ -93,14 +93,33 @@ describe('writeBrainPage', () => { } }); - test('writes .bak before mutating an existing file', () => { + test('writes a centralized .bak before mutating an existing file', () => { + // v0.36.x #902: backups land under ~/.gbrain/backups/frontmatter/... not + // next to the source file (pre-fix littered the brain tree with .bak + // files that broke gitignore expectations). The returned backupPath is + // the contract — the test asserts both the path shape and that the + // backup faithfully captures the pre-write content. const file = join(tmp, 'people', 'jane.md'); mkdirSync(join(tmp, 'people'), { recursive: true }); const original = `${fence}\ntype: person\ntitle: Old\n${fence}\n\nold`; writeFileSync(file, original); - writeBrainPage(file, `${fence}\ntype: person\ntitle: New\n${fence}\n\nnew`, { sourcePath: tmp }); - expect(existsSync(file + '.bak')).toBe(true); - expect(readFileSync(file + '.bak', 'utf8')).toBe(original); + const backupRoot = mkdtempSync(join(tmpdir(), 'gbrain-test-backups-')); + try { + const { backupPath } = writeBrainPage( + file, + `${fence}\ntype: person\ntitle: New\n${fence}\n\nnew`, + { sourcePath: tmp, backupRoot }, + ); + expect(backupPath).toBeDefined(); + // Centralized — under the test-injected backupRoot, NOT a sibling .bak. + expect(existsSync(file + '.bak')).toBe(false); + expect(backupPath!.startsWith(backupRoot + '/')).toBe(true); + expect(backupPath!.endsWith('.bak')).toBe(true); + expect(existsSync(backupPath!)).toBe(true); + expect(readFileSync(backupPath!, 'utf8')).toBe(original); + } finally { + rmSync(backupRoot, { recursive: true, force: true }); + } }); test('autoFix: true repairs nested quotes before writing', () => { diff --git a/test/cli-args.test.ts b/test/cli-args.test.ts new file mode 100644 index 000000000..0118a4ef3 --- /dev/null +++ b/test/cli-args.test.ts @@ -0,0 +1,24 @@ +import { describe, expect, test } from 'bun:test'; +import { parseOpArgs } from '../src/cli.ts'; +import { operationsByName } from '../src/core/operations.ts'; + +describe('parseOpArgs', () => { + test('--no- maps to false without consuming the next flag', () => { + const params = parseOpArgs(operationsByName.query, [ + 'freshEmbedSourceScope code source', + '--limit', + '8', + '--no-expand', + '--source-id', + 'gstack-code-repo-0e4763c9', + ]); + + expect(params).toEqual({ + query: 'freshEmbedSourceScope code source', + limit: 8, + expand: false, + source_id: 'gstack-code-repo-0e4763c9', + }); + }); +}); + diff --git a/test/config.test.ts b/test/config.test.ts index 36821f3dc..1726f261e 100644 --- a/test/config.test.ts +++ b/test/config.test.ts @@ -1,5 +1,6 @@ import { describe, test, expect } from 'bun:test'; import { readFileSync } from 'fs'; +import { isSensitiveConfigKey, redactConfigValue } from '../src/commands/config.ts'; // redactUrl is not exported, so we test it by reading the source and // reimplementing the regex to verify the pattern, then test via CLI @@ -61,3 +62,161 @@ describe('config source correctness', () => { expect(configSource).toContain('postgresql:\\/\\/'); }); }); + +describe('isSensitiveConfigKey (v0.36.x #892 regression)', () => { + test('matches common sensitive key shapes', () => { + expect(isSensitiveConfigKey('openai_api_key')).toBe(true); + expect(isSensitiveConfigKey('anthropic_api_key')).toBe(true); + expect(isSensitiveConfigKey('voyage_api_key')).toBe(true); + expect(isSensitiveConfigKey('admin_token')).toBe(true); + expect(isSensitiveConfigKey('database.password')).toBe(true); + expect(isSensitiveConfigKey('CLIENT_SECRET')).toBe(true); + expect(isSensitiveConfigKey('auth')).toBe(true); + expect(isSensitiveConfigKey('passwd')).toBe(true); + }); + + test('does NOT false-positive on lookalike substrings', () => { + // Pre-fix `.includes('key')` would have matched 'monkey' and 'parsekey'. + expect(isSensitiveConfigKey('monkey_id')).toBe(false); + expect(isSensitiveConfigKey('parsekeyword')).toBe(false); + expect(isSensitiveConfigKey('tokenize')).toBe(false); + expect(isSensitiveConfigKey('autocomplete')).toBe(false); + }); + + test('non-sensitive keys pass through', () => { + expect(isSensitiveConfigKey('search.mode')).toBe(false); + expect(isSensitiveConfigKey('sync.repo_path')).toBe(false); + expect(isSensitiveConfigKey('embedding_model')).toBe(false); + }); +}); + +describe('redactConfigValue (v0.36.x #892 — set output regression)', () => { + test('redacts sensitive keys to ***', () => { + expect(redactConfigValue('openai_api_key', 'sk-test-123')).toBe('***'); + expect(redactConfigValue('admin_token', 'eyJhbGciOiJIUzI1NiJ9')).toBe('***'); + }); + + test('redacts postgresql URL passwords regardless of key', () => { + expect(redactConfigValue('database_url', 'postgresql://u:secret@h:5432/d')) + .toBe('postgresql://u:***@h:5432/d'); + }); + + test('non-sensitive values pass through unchanged', () => { + expect(redactConfigValue('search.mode', 'balanced')).toBe('balanced'); + expect(redactConfigValue('embedding_model', 'voyage:voyage-3-large')) + .toBe('voyage:voyage-3-large'); + }); +}); + +// v0.36.1.x #1086: loadConfig translates legacy `provider` + `model` to the +// canonical `embedding_model`. Without this, Voyage/Cohere/Mistral configs +// silently fell through to OpenAI. +describe('loadConfig — legacy provider+model migration (v0.36.1.x #1086)', () => { + test('translates {provider, model} to embedding_model', async () => { + const { mkdtempSync, mkdirSync, writeFileSync, rmSync } = await import('fs'); + const { join } = await import('path'); + const { tmpdir } = await import('os'); + const { withEnv } = await import('./helpers/with-env.ts'); + const tmpHome = mkdtempSync(join(tmpdir(), 'gbrain-cfg-test-')); + try { + mkdirSync(join(tmpHome, '.gbrain'), { recursive: true }); + writeFileSync( + join(tmpHome, '.gbrain', 'config.json'), + JSON.stringify({ engine: 'pglite', database_path: '/tmp/x', provider: 'voyage', model: 'voyage-4-large' }), + ); + await withEnv({ GBRAIN_HOME: tmpHome }, async () => { + const { loadConfig } = await import('../src/core/config.ts'); + const cfg = loadConfig(); + expect(cfg).not.toBeNull(); + expect(cfg!.embedding_model).toBe('voyage:voyage-4-large'); + expect((cfg as unknown as Record).provider).toBeUndefined(); + expect((cfg as unknown as Record).model).toBeUndefined(); + }); + } finally { + rmSync(tmpHome, { recursive: true, force: true }); + } + }); + + test('canonical embedding_model wins over legacy provider+model when both present', async () => { + const { mkdtempSync, mkdirSync, writeFileSync, rmSync } = await import('fs'); + const { join } = await import('path'); + const { tmpdir } = await import('os'); + const { withEnv } = await import('./helpers/with-env.ts'); + const tmpHome = mkdtempSync(join(tmpdir(), 'gbrain-cfg-test-')); + try { + mkdirSync(join(tmpHome, '.gbrain'), { recursive: true }); + writeFileSync( + join(tmpHome, '.gbrain', 'config.json'), + JSON.stringify({ + engine: 'pglite', + database_path: '/tmp/x', + embedding_model: 'openai:text-embedding-3-large', + provider: 'voyage', + model: 'voyage-4-large', + }), + ); + await withEnv({ GBRAIN_HOME: tmpHome }, async () => { + const { loadConfig } = await import('../src/core/config.ts'); + const cfg = loadConfig(); + expect(cfg!.embedding_model).toBe('openai:text-embedding-3-large'); + }); + } finally { + rmSync(tmpHome, { recursive: true, force: true }); + } + }); + + test('config without provider+model is untouched', async () => { + const { mkdtempSync, mkdirSync, writeFileSync, rmSync } = await import('fs'); + const { join } = await import('path'); + const { tmpdir } = await import('os'); + const { withEnv } = await import('./helpers/with-env.ts'); + const tmpHome = mkdtempSync(join(tmpdir(), 'gbrain-cfg-test-')); + try { + mkdirSync(join(tmpHome, '.gbrain'), { recursive: true }); + writeFileSync( + join(tmpHome, '.gbrain', 'config.json'), + JSON.stringify({ engine: 'pglite', database_path: '/tmp/x', embedding_model: 'voyage:voyage-3-large' }), + ); + await withEnv({ GBRAIN_HOME: tmpHome }, async () => { + const { loadConfig } = await import('../src/core/config.ts'); + const cfg = loadConfig(); + expect(cfg!.embedding_model).toBe('voyage:voyage-3-large'); + }); + } finally { + rmSync(tmpHome, { recursive: true, force: true }); + } + }); +}); + +// v0.36.1.x #1019 (cherry-pick #1083): configDir uses path.isAbsolute and +// dual-separator '..' rejection so Windows paths are accepted. +describe('configDir — GBRAIN_HOME Windows path acceptance (v0.36.1.x #1019)', () => { + test('relative paths are rejected with an absolute-path error', async () => { + const { withEnv } = await import('./helpers/with-env.ts'); + await withEnv({ GBRAIN_HOME: 'relative/path' }, async () => { + const { configDir } = await import('../src/core/config.ts'); + expect(() => configDir()).toThrow(/absolute/); + }); + }); + + test("'..' segments rejected on POSIX-style absolute paths", async () => { + const { withEnv } = await import('./helpers/with-env.ts'); + await withEnv({ GBRAIN_HOME: '/tmp/foo/../bar' }, async () => { + const { configDir } = await import('../src/core/config.ts'); + expect(() => configDir()).toThrow(/'..' segments/); + }); + }); + + test("'..' segments rejected via backslash separator (Windows path shape)", async () => { + // The dual-separator split is the regression we lock in. On a POSIX + // host, `path.isAbsolute('C:\\foo')` returns false (no drive letters), + // so we use a forward-slash-prefixed path that also contains a + // backslash '..' segment — that's the case where the pre-fix + // single-separator split would have let it through. + const { withEnv } = await import('./helpers/with-env.ts'); + await withEnv({ GBRAIN_HOME: '/tmp/foo\\..\\bar' }, async () => { + const { configDir } = await import('../src/core/config.ts'); + expect(() => configDir()).toThrow(/'..' segments/); + }); + }); +}); diff --git a/test/core/cycle.serial.test.ts b/test/core/cycle.serial.test.ts index 72fbc72c3..5ad2b1d30 100644 --- a/test/core/cycle.serial.test.ts +++ b/test/core/cycle.serial.test.ts @@ -166,10 +166,16 @@ describe('runCycle — dryRun propagates to every phase', () => { expect(embedCalls.at(-1)?.dryRun).toBe(true); }); - test('dryRun:false writes in every phase', async () => { + test('dryRun:false does not let maintenance append generated backlinks to tracked pages', async () => { await runCycle(sharedEngine,{ brainDir: '/tmp/brain', dryRun: false }); expect(lintCalls.at(-1)?.dryRun).toBe(false); + // Maintenance should audit backlink gaps but not run the legacy fixer that + // appends "Referenced in" timeline entries into entity pages. The graph + // extractor/auto-link path is the canonical link store; filesystem backlink + // fixes are still available through `gbrain check-backlinks fix` when a + // human explicitly asks for them. + expect(backlinksCalls.at(-1)?.action).toBe('check'); expect(backlinksCalls.at(-1)?.dryRun).toBe(false); expect(syncCalls.at(-1)?.dryRun).toBe(false); expect(embedCalls.at(-1)?.dryRun).toBe(false); diff --git a/test/doctor-child-orphans.test.ts b/test/doctor-child-orphans.test.ts new file mode 100644 index 000000000..4c1548675 --- /dev/null +++ b/test/doctor-child-orphans.test.ts @@ -0,0 +1,145 @@ +/** + * Test: `childTableOrphansCheck` (closes #1063). + * + * Pure-helper test surface — the check function only consumes `engine.executeRaw`, + * so a structurally-typed mock satisfies the contract. This is faster and more + * focused than spinning up real PGLite, AND it doesn't require disabling FK + * constraints to seed orphans (which PGLite blocks at INSERT time anyway via + * the normal FK trigger). + * + * 10 FK-child tables checked. Six have NOT NULL FKs (orphan = any value not in + * pages.id). Four — wait, two — have nullable FKs (`files.page_id`, `links.origin_page_id` + * declared ON DELETE SET NULL); orphan check skips NULL via the `IS NOT NULL` filter. + */ + +import { describe, test, expect } from 'bun:test'; +import { childTableOrphansCheck } from '../src/commands/doctor.ts'; +import type { BrainEngine } from '../src/core/engine.ts'; + +/** Build a structurally-typed BrainEngine whose executeRaw returns per-SQL results. */ +function makeMockEngine(handler: (sql: string) => Promise): BrainEngine { + return { + executeRaw: handler, + } as unknown as BrainEngine; +} + +describe('childTableOrphansCheck (#1063)', () => { + test('all clean → status:ok with "10 tables checked"', async () => { + const engine = makeMockEngine(async () => [{ n: 0 }]); + const result = await childTableOrphansCheck(engine); + expect(result.name).toBe('child_table_orphans'); + expect(result.status).toBe('ok'); + expect(result.message).toContain('10 tables checked'); + }); + + test('one orphan in content_chunks → status:warn with paste-ready cleanup SQL', async () => { + const engine = makeMockEngine(async (sql: string) => { + if (sql.includes('content_chunks')) return [{ n: 5 }]; + return [{ n: 0 }]; + }); + const result = await childTableOrphansCheck(engine); + expect(result.status).toBe('warn'); + expect(result.message).toContain('5 orphan row(s)'); + expect(result.message).toContain('content_chunks.page_id=5'); + expect(result.message).toContain('DELETE FROM content_chunks WHERE page_id NOT IN (SELECT id FROM pages)'); + }); + + test('orphans in multiple tables → aggregated breakdown + multi-line cleanup', async () => { + const engine = makeMockEngine(async (sql: string) => { + if (sql.includes('content_chunks')) return [{ n: 234 }]; + if (sql.includes('page_versions')) return [{ n: 12 }]; + if (sql.includes('FROM tags')) return [{ n: 3 }]; + return [{ n: 0 }]; + }); + const result = await childTableOrphansCheck(engine); + expect(result.status).toBe('warn'); + expect(result.message).toContain('249 orphan row(s)'); // 234 + 12 + 3 + expect(result.message).toContain('content_chunks.page_id=234'); + expect(result.message).toContain('page_versions.page_id=12'); + expect(result.message).toContain('tags.page_id=3'); + expect(result.message).toContain('DELETE FROM content_chunks'); + expect(result.message).toContain('DELETE FROM page_versions'); + expect(result.message).toContain('DELETE FROM tags'); + }); + + test('nullable FK (files.page_id, links.origin_page_id) filters IS NOT NULL', async () => { + // Capture the actual SQL strings issued so we can pin the filter behavior + const capturedSql: string[] = []; + const engine = makeMockEngine(async (sql: string) => { + capturedSql.push(sql); + return [{ n: 0 }]; + }); + await childTableOrphansCheck(engine); + // The nullable-FK tables MUST have `IS NOT NULL AND` in their predicate + // (NULL is a valid SET NULL outcome, not an orphan). + const filesSql = capturedSql.find((s) => s.includes('FROM files WHERE')); + expect(filesSql).toBeDefined(); + expect(filesSql!).toContain('page_id IS NOT NULL AND page_id NOT IN'); + const linksOrigSql = capturedSql.find((s) => s.includes('FROM links WHERE') && s.includes('origin_page_id')); + expect(linksOrigSql).toBeDefined(); + expect(linksOrigSql!).toContain('origin_page_id IS NOT NULL AND origin_page_id NOT IN'); + // NOT-NULL FK tables MUST NOT have the IS NOT NULL filter (it'd be redundant) + const ccSql = capturedSql.find((s) => s.includes('FROM content_chunks WHERE')); + expect(ccSql).toBeDefined(); + expect(ccSql!).not.toContain('IS NOT NULL'); + }); + + test('executeRaw throws on N tables → warn with error summary, no false-ok', async () => { + // Simulate older schema where some tables don't exist (pre-v0.34 ish). + // The check must not return "ok" when it couldn't actually check. + const engine = makeMockEngine(async (sql: string) => { + if (sql.includes('synthesis_evidence') || sql.includes('timeline_entries')) { + throw new Error('relation does not exist'); + } + return [{ n: 0 }]; + }); + const result = await childTableOrphansCheck(engine); + // synthesis_evidence isn't in the target list (it FKs synthesis_page_id, not page_id) + // — but timeline_entries IS. So one error in this test. + expect(result.status).toBe('warn'); + expect(result.message).toContain('Could not check'); + expect(result.message).toContain('FK-child tables'); + }); + + test('orphans + some skipped tables → still reports orphans (errors don\'t mask findings)', async () => { + // Mixed case: real orphans in one table, older schema missing another. + // The check should report the real orphans, not get overridden by the + // "could not check" warning shape. + const engine = makeMockEngine(async (sql: string) => { + if (sql.includes('content_chunks')) return [{ n: 234 }]; + if (sql.includes('timeline_entries')) throw new Error('relation does not exist'); + return [{ n: 0 }]; + }); + const result = await childTableOrphansCheck(engine); + expect(result.status).toBe('warn'); + // The orphan branch wins over the error-only branch when both exist + expect(result.message).toContain('234 orphan row(s)'); + expect(result.message).toContain('content_chunks.page_id=234'); + }); + + test('all 10 target tables are queried (no silent drops)', async () => { + const queriedTables = new Set(); + const engine = makeMockEngine(async (sql: string) => { + // Extract `FROM ` to verify every target gets visited + const m = sql.match(/FROM (\w+) WHERE/); + if (m) queriedTables.add(m[1]); + return [{ n: 0 }]; + }); + await childTableOrphansCheck(engine); + // 10 target rows, but links appears 3 times (from/to/origin) — so 8 unique tables + const expectedTables = new Set([ + 'content_chunks', + 'page_versions', + 'tags', + 'takes', + 'raw_data', + 'timeline_entries', + 'links', + 'files', + ]); + expect(queriedTables.size).toBe(expectedTables.size); + for (const t of expectedTables) { + expect(queriedTables.has(t)).toBe(true); + } + }); +}); diff --git a/test/e2e/serve-http-oauth.test.ts b/test/e2e/serve-http-oauth.test.ts index 2e649bd74..b7f29bae2 100644 --- a/test/e2e/serve-http-oauth.test.ts +++ b/test/e2e/serve-http-oauth.test.ts @@ -254,6 +254,19 @@ describeE2E('serve-http OAuth 2.1 E2E (v0.26.1 + v0.26.2 + v0.26.3)', () => { expect(html).toContain('GBrain Admin'); }); + // v0.36.1.x #1076: GET /mcp must return 405 (Method Not Allowed) per the + // MCP Streamable HTTP spec, not 404. claude.ai + other probing clients + // distinguish "endpoint exists, no SSE channel" from "endpoint missing" + // on this status code; 404 makes them give up. + test('GET /mcp returns 405 with Allow: POST, DELETE (v0.36.1.x #1076)', async () => { + const res = await fetch(`${BASE}/mcp`, { method: 'GET' }); + expect(res.status).toBe(405); + expect(res.headers.get('Allow')).toBe('POST, DELETE'); + const body = await res.json() as { jsonrpc?: string; error?: { code?: number } }; + expect(body.jsonrpc).toBe('2.0'); + expect(body.error?.code).toBe(-32000); + }); + test('X-Forwarded-For header does not crash server', async () => { const res = await fetch(`${BASE}/health`, { headers: { 'X-Forwarded-For': '10.0.0.1, 172.16.0.1' }, diff --git a/test/extract-facts-phase.test.ts b/test/extract-facts-phase.test.ts index 5ebc32d25..3ddfc0bb8 100644 --- a/test/extract-facts-phase.test.ts +++ b/test/extract-facts-phase.test.ts @@ -286,3 +286,36 @@ describe('runExtractFacts — multi-source isolation', () => { expect(homeRows.rows[0].fact).toBe('home fact'); }); }); + +describe('runExtractFacts — empty-slugs guard (v0.36.x #1096 regression)', () => { + test('slugs:[] returns immediately without a full-brain walk', async () => { + // Seed many pages; full-walk over them would be slow and would + // populate pagesScanned > 0. With the bug, slugs:[] fell through + // to engine.getAllSlugs() and walked every seed page. + for (let i = 0; i < 5; i++) { + await putPage(`people/seed-${i}`, FACT_FENCE(`| 1 | Seed ${i} | fact | 1.0 | world | high | 2017-01-01 | | seed | |`)); + } + const r = await runExtractFacts(engine, { slugs: [] }); + expect(r.pagesScanned).toBe(0); + expect(r.factsInserted).toBe(0); + }); + + test('slugs:undefined still triggers full-brain walk (regression guard for the other side of the bug)', async () => { + await putPage('people/unscoped-walk', FACT_FENCE(`| 1 | Unscoped fact | fact | 1.0 | world | high | 2017-01-01 | | seed | |`)); + const r = await runExtractFacts(engine, {}); + expect(r.pagesScanned).toBeGreaterThan(0); + // The unscoped fact should be seen at least once + // eslint-disable-next-line @typescript-eslint/no-explicit-any + const seen = await (engine as any).db.query( + `SELECT fact FROM facts WHERE source_markdown_slug = 'people/unscoped-walk' AND fact = 'Unscoped fact'`, + ); + expect(seen.rows.length).toBeGreaterThan(0); + }); + + test('slugs:["a"] walks just the one slug, no full-brain fallback', async () => { + await putPage('people/just-this-one', FACT_FENCE(`| 1 | Just one fact | fact | 1.0 | world | high | 2017-01-01 | | seed | |`)); + await putPage('people/sibling', FACT_FENCE(`| 1 | Sibling fact | fact | 1.0 | world | high | 2017-01-01 | | seed | |`)); + const r = await runExtractFacts(engine, { slugs: ['people/just-this-one'] }); + expect(r.pagesScanned).toBe(1); + }); +}); diff --git a/test/fix-wave-structural.test.ts b/test/fix-wave-structural.test.ts new file mode 100644 index 000000000..090d1231d --- /dev/null +++ b/test/fix-wave-structural.test.ts @@ -0,0 +1,98 @@ +/** + * Structural assertions for fix-wave fixes whose behavior is best verified + * at source-shape level rather than via a runtime harness: + * + * - #1125 query drain cache writes — assert cli.ts awaits the drain after + * the query op completes. + * - #1090 admin embed — assert the two-tier resolution (cwd path + embedded + * manifest fallback) is in serve-http.ts and consumes ADMIN_ASSETS. + * - #1077 admin register-client PKCE — assert the admin endpoint honors + * grantTypes / redirectUris / tokenEndpointAuthMethod from the body. + * - #1100 PGLite phaseASchema — assert the v0.11.0 orchestrator routes + * in-process when the engine is pglite (not via execSync subprocess). + * - #1124 query no-expand — assert the parseOpArgs negation logic exists. + * + * Source-grep regression tests are the right tool when the rule is "this + * specific line shape must stay present"; a behavioral test would either + * duplicate what an E2E covers or require heavy mocking that hides the + * regression behind a test seam. + */ +import { describe, test, expect } from 'bun:test'; +import { readFileSync } from 'fs'; + +describe('v0.36.1.x #1125 — query drain cache writes before CLI exit', () => { + test("cli.ts awaits awaitPendingSearchCacheWrites for the 'query' op", () => { + const src = readFileSync('src/cli.ts', 'utf8'); + // Sequence: 'query' op match → import the drain → await + expect(src).toMatch(/op\.name\s*===\s*'query'[\s\S]{0,200}awaitPendingSearchCacheWrites/); + expect(src).toMatch(/await\s+awaitPendingSearchCacheWrites\(\)/); + }); + + test('hybrid.ts exports the drain helper + trackCacheWrite', () => { + const src = readFileSync('src/core/search/hybrid.ts', 'utf8'); + expect(src).toMatch(/export async function awaitPendingSearchCacheWrites/); + expect(src).toMatch(/pendingCacheWrites\.add\(promise\)/); + expect(src).toMatch(/trackCacheWrite\(/); + }); +}); + +describe('v0.36.1.x #1090 — admin embed two-tier resolution', () => { + test('serve-http.ts uses ADMIN_ASSETS manifest when admin/dist is not next to cwd', () => { + const src = readFileSync('src/commands/serve-http.ts', 'utf8'); + expect(src).toMatch(/import\(['"]\.\.\/admin-embedded/); + expect(src).toMatch(/ADMIN_ASSETS/); + expect(src).toMatch(/ADMIN_INDEX_HTML/); + // Two-tier: dev path (cwd-relative admin/dist) AND embedded manifest fallback + expect(src).toMatch(/useDevPath/); + }); + + test('src/admin-embedded.ts is auto-generated with file: imports', () => { + const src = readFileSync('src/admin-embedded.ts', 'utf8'); + expect(src).toMatch(/AUTO-GENERATED/); + expect(src).toMatch(/with \{ type: 'file' \}/); + expect(src).toMatch(/export const ADMIN_ASSETS/); + expect(src).toMatch(/export const ADMIN_INDEX_HTML/); + }); + + test('build script + CI guard exist', () => { + const buildSrc = readFileSync('scripts/build-admin-embedded.ts', 'utf8'); + expect(buildSrc).toMatch(/walk\(DIST/); + expect(buildSrc).toMatch(/with \{ type: 'file' \}/); + const guard = readFileSync('scripts/check-admin-embedded.sh', 'utf8'); + expect(guard).toMatch(/git diff --exit-code -- src\/admin-embedded\.ts/); + }); +}); + +describe('v0.36.1.x #1077 — admin register-client supports PKCE public clients', () => { + test('admin endpoint reads grantTypes / redirectUris / tokenEndpointAuthMethod from request body', () => { + const src = readFileSync('src/commands/serve-http.ts', 'utf8'); + // Single destructure line includes all three fields + expect(src).toMatch(/const\s+\{\s*name,\s*scopes,\s*tokenTtl,\s*grantTypes,\s*redirectUris,\s*tokenEndpointAuthMethod\s*\}\s*=\s*req\.body/); + // PKCE branch NULLs client_secret_hash + sets auth method to 'none' + expect(src).toMatch(/tokenEndpointAuthMethod\s*===\s*'none'/); + expect(src).toMatch(/client_secret_hash\s*=\s*NULL,\s*token_endpoint_auth_method\s*=\s*'none'/); + }); +}); + +describe('v0.36.1.x #1100 — PGLite v0.11.0 phaseASchema routes in-process', () => { + test('phaseASchema branches on pglite and calls initSchema directly', () => { + const src = readFileSync('src/commands/migrations/v0_11_0.ts', 'utf8'); + expect(src).toMatch(/cfg\?\.engine\s*===\s*'pglite'/); + expect(src).toMatch(/eng\.initSchema\(\)/); + expect(src).toMatch(/await\s+phaseASchema/); + }); + + test('apply-migrations skips pre-flight schema-version probe on PGLite', () => { + const src = readFileSync('src/commands/apply-migrations.ts', 'utf8'); + expect(src).toMatch(/skipPreflight\s*=\s*cfg\.engine\s*===\s*'pglite'/); + }); +}); + +describe('v0.36.1.x #1124 — query --no-expand actually negates expand', () => { + test("cli.ts parseOpArgs handles --no- as boolean negation", () => { + const src = readFileSync('src/cli.ts', 'utf8'); + expect(src).toMatch(/arg\.startsWith\(['"]--no-['"]\)/); + expect(src).toMatch(/positiveDef\?\.type\s*===\s*'boolean'/); + expect(src).toMatch(/params\[positiveKey\]\s*=\s*false/); + }); +}); diff --git a/test/frontmatter-cli.test.ts b/test/frontmatter-cli.test.ts index 22968c6cf..598f4aab7 100644 --- a/test/frontmatter-cli.test.ts +++ b/test/frontmatter-cli.test.ts @@ -7,10 +7,11 @@ import { spawnSync } from 'child_process'; const fence = '---'; const CLI = ['run', 'src/cli.ts', 'frontmatter']; -function runCli(args: string[]): { stdout: string; stderr: string; code: number } { - const result = spawnSync('bun', [...CLI, ...args], { +function runCli(args: string[], env: Record = {}): { stdout: string; stderr: string; code: number } { + const result = spawnSync(process.execPath, [...CLI, ...args], { encoding: 'utf8', cwd: process.cwd(), + env: { ...process.env, ...env }, }); return { stdout: result.stdout ?? '', @@ -75,24 +76,32 @@ describe('gbrain frontmatter CLI (B4)', () => { expect(code).toBe(0); }); - test('validate --fix writes .bak and rewrites in place', () => { + test('validate --fix writes centralized backup and rewrites in place', () => { const f = join(tmp, 'broken.md'); + const gbrainHome = join(tmp, 'home'); const original = `${fence}\ntype: concept\ntitle: "P "I" L"\n${fence}\n\nbody`; writeFileSync(f, original); - const { code } = runCli(['validate', f, '--fix']); + const { stdout, code } = runCli(['validate', f, '--fix'], { GBRAIN_HOME: gbrainHome }); expect(code).toBe(0); - expect(existsSync(f + '.bak')).toBe(true); - expect(readFileSync(f + '.bak', 'utf8')).toBe(original); + expect(existsSync(f + '.bak')).toBe(false); + expect(stdout).toContain('centralized backups'); + const backupDir = join(gbrainHome, '.gbrain', 'backups', 'frontmatter'); + const backup = spawnSync('find', [backupDir, '-type', 'f', '-name', 'broken.md.bak'], { encoding: 'utf8' }); + const backupPath = backup.stdout.trim().split('\n').filter(Boolean)[0]; + expect(backupPath).toBeTruthy(); + expect(readFileSync(backupPath, 'utf8')).toBe(original); expect(readFileSync(f, 'utf8')).toMatch(/^title: '.*'\s*$/m); }); test('validate --fix succeeds on a non-git path (no dirty-tree guard)', () => { // tmp is not a git repo; --fix must still work. const f = join(tmp, 'broken.md'); + const gbrainHome = join(tmp, 'home'); writeFileSync(f, `${fence}\ntype: concept\ntitle: "A "B" C"\n${fence}\n\nbody`); - const { code } = runCli(['validate', f, '--fix']); + const { code } = runCli(['validate', f, '--fix'], { GBRAIN_HOME: gbrainHome }); expect(code).toBe(0); - expect(existsSync(f + '.bak')).toBe(true); + expect(existsSync(f + '.bak')).toBe(false); + expect(existsSync(join(gbrainHome, '.gbrain', 'backups', 'frontmatter'))).toBe(true); }); test('validate scans a directory recursively, skips non-.md files', () => { @@ -107,6 +116,27 @@ describe('gbrain frontmatter CLI (B4)', () => { expect(env.total_files).toBe(2); }); + test('generate --fix skips catch-all note writes unless explicitly included', () => { + const gbrainHome = join(tmp, 'home'); + writeFileSync(join(tmp, 'random.md'), '# Random\n\nbody'); + writeFileSync(join(tmp, 'notes.md'), '# Also Random\n\nbody'); + + const skipped = runCli(['generate', tmp, '--fix', '--json'], { GBRAIN_HOME: gbrainHome }); + expect(skipped.code).toBe(0); + const skippedEnv = JSON.parse(skipped.stdout); + expect(skippedEnv.generated).toBe(0); + expect(skippedEnv.skippedCatchAll).toBe(2); + expect(readFileSync(join(tmp, 'random.md'), 'utf8')).toBe('# Random\n\nbody'); + + const included = runCli(['generate', tmp, '--fix', '--json', '--include-catch-all'], { GBRAIN_HOME: gbrainHome }); + expect(included.code).toBe(0); + const includedEnv = JSON.parse(included.stdout); + expect(includedEnv.generated).toBe(2); + expect(readFileSync(join(tmp, 'random.md'), 'utf8')).toContain('type: note'); + expect(existsSync(join(gbrainHome, '.gbrain', 'backups', 'frontmatter'))).toBe(true); + expect(existsSync(join(tmp, 'random.md.bak'))).toBe(false); + }); + test('validate missing path errors clearly', () => { const { stderr, code } = runCli(['validate', join(tmp, 'does-not-exist.md')]); expect(code).toBe(1); diff --git a/test/frontmatter-inference.test.ts b/test/frontmatter-inference.test.ts index 0a4f14e92..9d73df1f4 100644 --- a/test/frontmatter-inference.test.ts +++ b/test/frontmatter-inference.test.ts @@ -176,6 +176,25 @@ describe('inferFrontmatter', () => { expect(result.source).toBe('calendar'); }); + test('docs/runbooks: infers guide instead of catch-all note', () => { + const result = inferFrontmatter( + 'docs/runbooks/cron-management-runbook.md', + '# Cron Management Runbook\n\nUse this when changing schedules.', + ); + expect(result.type).toBe('guide'); + expect(result.tags).toContain('runbook'); + expect(result.title).toBe('Cron Management Runbook'); + }); + + test('docs/projects: infers project type', () => { + const result = inferFrontmatter( + 'docs/projects/eva-brain.md', + '# Eva Brain\n\nProject notes.', + ); + expect(result.type).toBe('project'); + expect(result.tags).toContain('project'); + }); + test('companies/ directory: type company', () => { const result = inferFrontmatter( 'companies/stripe.md', @@ -185,7 +204,7 @@ describe('inferFrontmatter', () => { expect(result.title).toBe('Stripe'); }); - test('unknown directory: defaults to note type with heading title', () => { + test('unknown directory: catch-all remains note when explicitly used by callers', () => { const result = inferFrontmatter( 'random/some-file.md', '# My Random Notes\n\nStuff here', diff --git a/test/mcp-eval-capture.test.ts b/test/mcp-eval-capture.test.ts index 490efd266..32f6311b1 100644 --- a/test/mcp-eval-capture.test.ts +++ b/test/mcp-eval-capture.test.ts @@ -36,6 +36,33 @@ beforeAll(async () => { compiled_truth: 'Alice Example for op-layer capture tests.', }; await engine.putPage('people/alice-example', page); + await engine.executeRaw( + `INSERT INTO sources (id, name) VALUES ('testsrc', 'Test Source') ON CONFLICT DO NOTHING`, + ); + await engine.putPage('notes/source-override-default', { + type: 'note', + title: 'Default Source Override', + compiled_truth: 'sourceoverrideunique belongs to the default source.', + }); + await engine.upsertChunks('notes/source-override-default', [ + { + chunk_index: 0, + chunk_text: 'sourceoverrideunique belongs to the default source.', + chunk_source: 'compiled_truth', + }, + ]); + await engine.putPage('notes/source-override-testsrc', { + type: 'note', + title: 'Test Source Override', + compiled_truth: 'sourceoverrideunique belongs to the explicit source.', + }, { sourceId: 'testsrc' }); + await engine.upsertChunks('notes/source-override-testsrc', [ + { + chunk_index: 0, + chunk_text: 'sourceoverrideunique belongs to the explicit source.', + chunk_source: 'compiled_truth', + }, + ], { sourceId: 'testsrc' }); }); afterAll(async () => { @@ -147,6 +174,23 @@ describe('op-layer capture — query', () => { const rows = await engine.listEvalCandidates(); expect(rows).toHaveLength(0); }); + + test('explicit source_id overrides ctx.sourceId for query retrieval', async () => { + const ctx = makeCtx({ + sourceId: 'default', + config: makeConfig({ capture: false }), + }); + + const results = await queryOp.handler(ctx, { + query: 'sourceoverrideunique', + source_id: 'testsrc', + expand: false, + use_cache: false, + }) as Array<{ slug: string }>; + + expect(results.map(r => r.slug)).toContain('notes/source-override-testsrc'); + expect(results.map(r => r.slug)).not.toContain('notes/source-override-default'); + }); }); describe('op-layer capture — search', () => { diff --git a/test/migrations-v0_22_4.test.ts b/test/migrations-v0_22_4.test.ts index 81aca6f55..86b6893e8 100644 --- a/test/migrations-v0_22_4.test.ts +++ b/test/migrations-v0_22_4.test.ts @@ -86,6 +86,7 @@ describe('v0.22.4 migration (B11)', () => { total: 8, errors_by_code: { NESTED_QUOTES: 8 }, sample: [], + ignoredMissingOpen: 0, }, { source_id: 'archive', @@ -93,6 +94,7 @@ describe('v0.22.4 migration (B11)', () => { total: 4, errors_by_code: { NULL_BYTES: 4 }, sample: [], + ignoredMissingOpen: 0, }, { source_id: 'clean-source', @@ -100,6 +102,7 @@ describe('v0.22.4 migration (B11)', () => { total: 0, errors_by_code: {}, sample: [], + ignoredMissingOpen: 0, }, ], scanned_at: new Date().toISOString(), diff --git a/test/oauth.test.ts b/test/oauth.test.ts index fa549a512..9239fd6fd 100644 --- a/test/oauth.test.ts +++ b/test/oauth.test.ts @@ -5,6 +5,7 @@ import { pg_trgm } from '@electric-sql/pglite/contrib/pg_trgm'; import { GBrainOAuthProvider, coerceTimestamp } from '../src/core/oauth-provider.ts'; import { hashToken, generateToken } from '../src/core/utils.ts'; import { PGLITE_SCHEMA_SQL } from '../src/core/pglite-schema.ts'; +import { InvalidTokenError } from '@modelcontextprotocol/sdk/server/auth/errors.js'; // --------------------------------------------------------------------------- // Test setup: in-memory PGLite with OAuth tables @@ -217,6 +218,35 @@ describe('verifyAccessToken', () => { await expect(provider.verifyAccessToken('nonexistent-token')).rejects.toThrow('Invalid token'); }); + // v0.36.1.x #935: the SDK's requireBearerAuth middleware only returns 401 + // on InvalidTokenError; bare Error falls through to 500. Lock in the class. + test('verifyAccessToken throws InvalidTokenError (not bare Error) on expired token', async () => { + const expiredToken = generateToken('gbrain_at_'); + const hash = hashToken(expiredToken); + const firstClient = (await sql`SELECT client_id FROM oauth_clients LIMIT 1`)[0]; + await sql` + INSERT INTO oauth_tokens (token_hash, token_type, client_id, scopes, expires_at) + VALUES (${hash}, ${'access'}, ${firstClient.client_id as string}, ${'{read}'}, ${Math.floor(Date.now() / 1000) - 100}) + `; + let caught: unknown; + try { + await provider.verifyAccessToken(expiredToken); + } catch (e) { + caught = e; + } + expect(caught).toBeInstanceOf(InvalidTokenError); + }); + + test('verifyAccessToken throws InvalidTokenError (not bare Error) on unknown token', async () => { + let caught: unknown; + try { + await provider.verifyAccessToken('nonexistent-token'); + } catch (e) { + caught = e; + } + expect(caught).toBeInstanceOf(InvalidTokenError); + }); + test('NULL expires_at is treated as expired (fail-closed)', async () => { // Schema declares oauth_tokens.expires_at as nullable BIGINT (schema.sql:372). // Hand-modified or corrupt rows could land with NULL; verifyAccessToken must diff --git a/test/serve-http-bootstrap-token.test.ts b/test/serve-http-bootstrap-token.test.ts new file mode 100644 index 000000000..fcf6b6412 --- /dev/null +++ b/test/serve-http-bootstrap-token.test.ts @@ -0,0 +1,74 @@ +/** + * v0.36.1.x #1024: bootstrap token resolution rules. + * + * The env override (GBRAIN_ADMIN_BOOTSTRAP_TOKEN) is security-sensitive — + * a weak token silently accepted would let any reader of the env scope + * impersonate admin. Test the validation directly via the pure helper so + * the rule can't drift without the suite catching it. + */ +import { describe, test, expect } from 'bun:test'; +import { resolveBootstrapToken } from '../src/commands/serve-http.ts'; + +describe('resolveBootstrapToken (v0.36.1.x #1024)', () => { + test('unset env → generates a fresh token via the injected RNG', () => { + const r = resolveBootstrapToken(undefined, () => 'a'.repeat(64)); + expect(r.kind).toBe('ok'); + if (r.kind === 'ok') { + expect(r.token).toBe('a'.repeat(64)); + expect(r.fromEnv).toBe(false); + } + }); + + test('valid env token (32 chars, A-Za-z0-9_-) → ok + fromEnv:true', () => { + const r = resolveBootstrapToken('abcdef0123456789abcdef0123456789'); + expect(r.kind).toBe('ok'); + if (r.kind === 'ok') { + expect(r.token).toBe('abcdef0123456789abcdef0123456789'); + expect(r.fromEnv).toBe(true); + } + }); + + test('valid env token with hyphens and underscores → accepted', () => { + const r = resolveBootstrapToken('A-B_C-D_E-F_G-H_I-J_K-L_M-N_O-P_'); + expect(r.kind).toBe('ok'); + }); + + test('token shorter than 32 chars → error, refuse to start', () => { + const r = resolveBootstrapToken('short-token-only-25-chars'); // 25 chars + expect(r.kind).toBe('error'); + if (r.kind === 'error') { + expect(r.message).toMatch(/at least 32 chars/); + expect(r.message).toMatch(/Refusing to start/); + } + }); + + test('token with invalid characters (space, special chars) → error', () => { + const r1 = resolveBootstrapToken('contains space '.padEnd(40, 'x')); + expect(r1.kind).toBe('error'); + const r2 = resolveBootstrapToken('contains$dollar'.padEnd(40, 'x')); + expect(r2.kind).toBe('error'); + const r3 = resolveBootstrapToken('contains/slash'.padEnd(40, 'x')); + expect(r3.kind).toBe('error'); + }); + + test('whitespace-padded token gets trimmed before validation', () => { + const r = resolveBootstrapToken(' abcdef0123456789abcdef0123456789 '); + expect(r.kind).toBe('ok'); + if (r.kind === 'ok') expect(r.token).toBe('abcdef0123456789abcdef0123456789'); + }); + + test('empty string env (after trim, zero chars) → error', () => { + const r = resolveBootstrapToken(''); + expect(r.kind).toBe('error'); + }); + + test('exactly 32 chars hex (the boundary) → accepted', () => { + const r = resolveBootstrapToken('0123456789abcdef0123456789abcdef'); + expect(r.kind).toBe('ok'); + }); + + test('31 chars (one short) → error', () => { + const r = resolveBootstrapToken('0123456789abcdef0123456789abcde'); // 31 + expect(r.kind).toBe('error'); + }); +}); diff --git a/test/source-id-routing.test.ts b/test/source-id-routing.test.ts new file mode 100644 index 000000000..a0b5bdb32 --- /dev/null +++ b/test/source-id-routing.test.ts @@ -0,0 +1,124 @@ +/** + * v0.36.x #891 + #978 + #1078 — source-id routing regression suite. + * + * Locks in the behavior that `sync --source X` writes pages to source X, + * not the silent `'default'` fallback. The original bug surfaced as + * `gbrain sources list` reporting 0 pages for a named source while + * `pages.source_id` was littered with `'default'` rows. + * + * Tested against PGLite (in-memory) so the suite runs without a Postgres + * fixture and CI catches regressions everywhere. + */ +import { describe, test, expect, beforeAll, afterAll, beforeEach } from 'bun:test'; +import { PGLiteEngine } from '../src/core/pglite-engine.ts'; +import { resetPgliteState } from './helpers/reset-pglite.ts'; +import { importFromContent } from '../src/core/import-file.ts'; + +let engine: PGLiteEngine; + +beforeAll(async () => { + engine = new PGLiteEngine(); + await engine.connect({}); + await engine.initSchema(); +}); + +afterAll(async () => { + await engine.disconnect(); +}); + +beforeEach(async () => { + await resetPgliteState(engine); + // Seed two named sources alongside the default. + await engine.executeRaw( + `INSERT INTO sources (id, name) VALUES ('work', 'work') ON CONFLICT DO NOTHING`, + ); + await engine.executeRaw( + `INSERT INTO sources (id, name) VALUES ('personal', 'personal') ON CONFLICT DO NOTHING`, + ); +}); + +describe('source-id routing (v0.36.x #891 + #978 regression)', () => { + test('importFromContent({sourceId: "work"}) lands the page at source_id=work', async () => { + await importFromContent(engine, 'people/alice', '---\ntype: person\ntitle: Alice\n---\n# Alice\n\nWorks at Acme.', { + noEmbed: true, + sourceId: 'work', + }); + + const rows = await engine.executeRaw<{ source_id: string; slug: string }>( + `SELECT source_id, slug FROM pages WHERE slug = 'people/alice'`, + ); + expect(rows.length).toBe(1); + expect(rows[0].source_id).toBe('work'); + expect(rows[0].slug).toBe('people/alice'); + }); + + test('two sources can independently hold the same slug', async () => { + await importFromContent(engine, 'people/alice', '---\ntype: person\ntitle: Work Alice\n---\nWork persona.', { + noEmbed: true, + sourceId: 'work', + }); + await importFromContent(engine, 'people/alice', '---\ntype: person\ntitle: Personal Alice\n---\nFriend persona.', { + noEmbed: true, + sourceId: 'personal', + }); + + const rows = await engine.executeRaw<{ source_id: string; title: string }>( + `SELECT source_id, title FROM pages WHERE slug = 'people/alice' ORDER BY source_id`, + ); + expect(rows.length).toBe(2); + expect(rows[0].source_id).toBe('personal'); + expect(rows[0].title).toBe('Personal Alice'); + expect(rows[1].source_id).toBe('work'); + expect(rows[1].title).toBe('Work Alice'); + }); + + test('omitting sourceId falls through to default (regression-guard for the legacy path)', async () => { + await importFromContent(engine, 'people/bob', '---\ntype: person\ntitle: Bob\n---\nBob.', { + noEmbed: true, + }); + const rows = await engine.executeRaw<{ source_id: string }>( + `SELECT source_id FROM pages WHERE slug = 'people/bob'`, + ); + expect(rows.length).toBe(1); + expect(rows[0].source_id).toBe('default'); + }); + + test('chunks land under the requested source, not default', async () => { + await importFromContent(engine, 'people/carol', '---\ntype: person\ntitle: Carol\n---\n\nNotes about Carol.', { + noEmbed: true, + sourceId: 'work', + }); + const chunks = await engine.executeRaw<{ source_id: string }>( + `SELECT p.source_id FROM content_chunks c JOIN pages p ON p.id = c.page_id WHERE p.slug = 'people/carol'`, + ); + expect(chunks.length).toBeGreaterThan(0); + for (const c of chunks) expect(c.source_id).toBe('work'); + }); + + test('tags land under the requested source, not default', async () => { + await importFromContent(engine, 'people/dave', '---\ntype: person\ntitle: Dave\ntags: [investor, founder]\n---\nDave.', { + noEmbed: true, + sourceId: 'work', + }); + const tags = await engine.executeRaw<{ source_id: string; tag: string }>( + `SELECT p.source_id, t.tag FROM tags t JOIN pages p ON p.id = t.page_id WHERE p.slug = 'people/dave' ORDER BY t.tag`, + ); + expect(tags.length).toBeGreaterThan(0); + for (const t of tags) expect(t.source_id).toBe('work'); + }); +}); + +describe('source-id routing — FK integrity (v0.36.x #1078)', () => { + test('writing to a registered source does NOT raise FK violations', async () => { + // Pre-fix, multi-source brains hit FK constraint errors when the autopilot + // cycle inserted into a source row that didn't match what other paths + // assumed. This is the smoke test that the entire import path is FK-clean + // for named sources. + await expect( + importFromContent(engine, 'people/eve', '---\ntype: person\ntitle: Eve\n---\nEve.', { + noEmbed: true, + sourceId: 'work', + }), + ).resolves.toBeTruthy(); + }); +}); diff --git a/test/sync.test.ts b/test/sync.test.ts index 14be8a0a7..5a6bf9596 100644 --- a/test/sync.test.ts +++ b/test/sync.test.ts @@ -1,6 +1,6 @@ import { describe, test, expect, beforeAll, afterAll, beforeEach, afterEach } from 'bun:test'; -import { buildSyncManifest, isSyncable, pathToSlug, pruneDir } from '../src/core/sync.ts'; -import { buildGitInvocation } from '../src/commands/sync.ts'; +import { buildSyncManifest, isSyncable, pathToSlug, pruneDir, isCodeFilePath } from '../src/core/sync.ts'; +import { buildAutoEmbedArgs, buildGitInvocation } from '../src/commands/sync.ts'; import { mkdtempSync, writeFileSync, rmSync, mkdirSync } from 'fs'; import { join } from 'path'; import { execSync } from 'child_process'; @@ -151,6 +151,31 @@ describe('pruneDir', () => { }); }); +describe('isCodeFilePath', () => { + test('v0.36.x #878 regression: Terraform / HCL extensions are admitted', () => { + expect(isCodeFilePath('infra/main.tf')).toBe(true); + expect(isCodeFilePath('infra/prod.tfvars')).toBe(true); + expect(isCodeFilePath('modules/network/variables.hcl')).toBe(true); + }); + + test('extensions are case-insensitive', () => { + expect(isCodeFilePath('INFRA/MAIN.TF')).toBe(true); + expect(isCodeFilePath('Modules/Net/Vars.HCL')).toBe(true); + }); + + test('does not false-positive on lookalike suffixes', () => { + expect(isCodeFilePath('docs/notes.txt')).toBe(false); + expect(isCodeFilePath('readme.tflint')).toBe(false); + expect(isCodeFilePath('config.hcling')).toBe(false); + }); + + test('still accepts the v0.20.0 baseline set (regression guard)', () => { + expect(isCodeFilePath('src/foo.ts')).toBe(true); + expect(isCodeFilePath('src/bar.py')).toBe(true); + expect(isCodeFilePath('config.toml')).toBe(true); + }); +}); + describe('pathToSlug', () => { test('strips .md extension and lowercases', () => { expect(pathToSlug('people/pedro-franceschi.md')).toBe('people/pedro-franceschi'); @@ -322,6 +347,26 @@ describe('performSync dry-run never writes', () => { expect(await engine.getConfig('sync.repo_path')).toBeNull(); }); + test('first sync without origin skips git pull noise and uses local working tree', async () => { + const { performSync } = await import('../src/commands/sync.ts'); + const messages: string[] = []; + const originalError = console.error; + console.error = (...args: unknown[]) => { messages.push(args.map(String).join(' ')); }; + try { + const result = await performSync(engine, { + repoPath, + noEmbed: true, + }); + expect(result.status).toBe('first_sync'); + } finally { + console.error = originalError; + } + + expect(messages.some(m => m.includes('No origin remote') && m.includes('skipping git pull'))).toBe(true); + expect(messages.some(m => m.includes('sync.git_pull start'))).toBe(false); + expect(messages.some(m => m.includes('git pull failed'))).toBe(false); + }); + test('incremental dry-run does NOT write to DB or advance the bookmark', async () => { const { performSync } = await import('../src/commands/sync.ts'); // First do a real sync to seed the bookmark. @@ -610,3 +655,18 @@ describe('git() helper invocation order (CJK wave v0.32.7)', () => { ]); }); }); + +describe('sync auto-embed arguments', () => { + test('scopes incremental source sync embedding to the same source', () => { + expect(buildAutoEmbedArgs(['hello-js'], 'source-a')).toEqual([ + '--source', + 'source-a', + '--slugs', + 'hello-js', + ]); + }); + + test('keeps default-source sync embed arguments unchanged', () => { + expect(buildAutoEmbedArgs(['people/alice'])).toEqual(['--slugs', 'people/alice']); + }); +}); diff --git a/test/upgrade.test.ts b/test/upgrade.serial.test.ts similarity index 69% rename from test/upgrade.test.ts rename to test/upgrade.serial.test.ts index b19d51572..cea004132 100644 --- a/test/upgrade.test.ts +++ b/test/upgrade.serial.test.ts @@ -1,4 +1,8 @@ import { describe, test, expect } from 'bun:test'; +import { mkdirSync, mkdtempSync, realpathSync, rmSync, writeFileSync } from 'fs'; +import { dirname, join } from 'path'; +import { tmpdir } from 'os'; +import { resolveBunGlobalRoot } from '../src/commands/upgrade.ts'; // We can't easily mock process.execPath in bun, so we test the upgrade // command's --help output and the detection logic via subprocess @@ -104,6 +108,11 @@ describe('detectInstallMethod heuristic (source analysis)', () => { expect(source).toContain("execFileSync('bun', ['install']"); }); + test('bun global upgrade passes cwd to bun update', () => { + expect(source).toContain('const bunGlobalRoot = resolveBunGlobalRoot()'); + expect(source).toContain("execFileSync('bun', ['update', 'gbrain'], { cwd: bunGlobalRoot"); + }); + test('classifyBunInstall checks repository.url AND src/cli.ts marker', () => { // Codex feedback: repository.url alone is spoofable by future squatter // updates; the source-marker fallback (src/cli.ts presence) is @@ -121,6 +130,69 @@ describe('detectInstallMethod heuristic (source analysis)', () => { }); }); +describe('resolveBunGlobalRoot', () => { + const originalBunInstall = process.env.BUN_INSTALL; + const originalHome = process.env.HOME; + const originalArgv1 = process.argv[1]; + + function restoreEnv() { + if (originalBunInstall === undefined) delete process.env.BUN_INSTALL; + else process.env.BUN_INSTALL = originalBunInstall; + + if (originalHome === undefined) delete process.env.HOME; + else process.env.HOME = originalHome; + + process.argv[1] = originalArgv1; + } + + test('honors BUN_INSTALL override', () => { + try { + process.env.BUN_INSTALL = '/custom/bun'; + process.env.HOME = '/ignored/home'; + expect(resolveBunGlobalRoot()).toBe('/custom/bun/install/global'); + } finally { + restoreEnv(); + } + }); + + test('uses canonical ~/.bun/install/global when present', () => { + const home = mkdtempSync(join(tmpdir(), 'gbrain-upgrade-home-')); + try { + delete process.env.BUN_INSTALL; + process.env.HOME = home; + const globalRoot = join(home, '.bun', 'install', 'global'); + mkdirSync(join(globalRoot, 'node_modules'), { recursive: true }); + writeFileSync(join(globalRoot, 'package.json'), '{}'); + + expect(resolveBunGlobalRoot()).toBe(globalRoot); + } finally { + restoreEnv(); + rmSync(home, { recursive: true, force: true }); + } + }); + + test('falls back to the package root above node_modules/gbrain', () => { + const home = mkdtempSync(join(tmpdir(), 'gbrain-upgrade-home-')); + const globalRoot = mkdtempSync(join(tmpdir(), 'gbrain-upgrade-global-')); + try { + delete process.env.BUN_INSTALL; + process.env.HOME = home; + const cliPath = join(globalRoot, 'node_modules', 'gbrain', 'src', 'cli.ts'); + mkdirSync(dirname(cliPath), { recursive: true }); + mkdirSync(join(globalRoot, 'node_modules'), { recursive: true }); + writeFileSync(join(globalRoot, 'package.json'), '{}'); + writeFileSync(cliPath, ''); + process.argv[1] = cliPath; + + expect(resolveBunGlobalRoot()).toBe(realpathSync(globalRoot)); + } finally { + restoreEnv(); + rmSync(home, { recursive: true, force: true }); + rmSync(globalRoot, { recursive: true, force: true }); + } + }); +}); + describe('post-upgrade behavior (post v0.12.0 merge)', () => { // The earlier --execute / --yes / auto_execute tests were removed when the // master merge replaced the markdown-driven runPostUpgrade with the TS diff --git a/test/whoknows-doctor.test.ts b/test/whoknows-doctor.test.ts index f6d5229a9..16231aea9 100644 --- a/test/whoknows-doctor.test.ts +++ b/test/whoknows-doctor.test.ts @@ -1,14 +1,16 @@ import { describe, it, expect, beforeAll, afterAll, beforeEach } from 'bun:test'; -import { mkdtempSync, rmSync, writeFileSync, mkdirSync, existsSync } from 'fs'; +import { mkdtempSync, rmSync, writeFileSync, mkdirSync } from 'fs'; import { tmpdir } from 'os'; import { join } from 'path'; -import { whoknowsHealthCheck } from '../src/commands/doctor.ts'; +import { resolveWhoknowsFixturePath, whoknowsHealthCheck } from '../src/commands/doctor.ts'; +import { withEnv } from './helpers/with-env.ts'; /** * v0.33 whoknows_health doctor check — fixture-only assertion. The - * check inspects the working-directory fixture file; it does NOT need - * an engine. We pass a sentinel object cast to BrainEngine for the - * type contract since the check intentionally ignores its argument. + * check resolves the shipped fixture from the module path unless + * GBRAIN_WHOKNOWS_FIXTURE_PATH is set. It does NOT need an engine. + * We pass a sentinel object cast to BrainEngine for the type contract + * since the check intentionally ignores its argument. */ const stubEngine = {} as Parameters[0]; @@ -39,12 +41,29 @@ function cleanup() { } describe('whoknows_health doctor check', () => { - it('warns when fixture file is missing entirely', async () => { + it('resolves the default fixture when cwd is not the repo', async () => { try { - const check = await whoknowsHealthCheck(stubEngine); - expect(check.name).toBe('whoknows_health'); - expect(check.status).toBe('warn'); - expect(check.message).toContain('fixture missing'); + await withEnv({ GBRAIN_WHOKNOWS_FIXTURE_PATH: undefined }, async () => { + const check = await whoknowsHealthCheck(stubEngine); + expect(check.name).toBe('whoknows_health'); + expect(check.status).toBe('ok'); + expect(check.message).toContain('queries'); + }); + } finally { + cleanup(); + } + }); + + it('warns when env override fixture file is missing entirely', async () => { + try { + const fixturePath = join(workDir, 'missing-whoknows-eval.jsonl'); + await withEnv({ GBRAIN_WHOKNOWS_FIXTURE_PATH: fixturePath }, async () => { + const check = await whoknowsHealthCheck(stubEngine); + expect(check.name).toBe('whoknows_health'); + expect(check.status).toBe('warn'); + expect(check.message).toContain('fixture missing'); + expect(check.message).toContain(fixturePath); + }); } finally { cleanup(); } @@ -52,11 +71,14 @@ describe('whoknows_health doctor check', () => { it('warns when fixture exists but is empty', async () => { try { - mkdirSync('test/fixtures', { recursive: true }); - writeFileSync('test/fixtures/whoknows-eval.jsonl', ''); - const check = await whoknowsHealthCheck(stubEngine); - expect(check.status).toBe('warn'); - expect(check.message).toContain('empty'); + const fixturePath = join(workDir, 'test/fixtures/whoknows-eval.jsonl'); + mkdirSync(join(workDir, 'test/fixtures'), { recursive: true }); + writeFileSync(fixturePath, ''); + await withEnv({ GBRAIN_WHOKNOWS_FIXTURE_PATH: fixturePath }, async () => { + const check = await whoknowsHealthCheck(stubEngine); + expect(check.status).toBe('warn'); + expect(check.message).toContain('empty'); + }); } finally { cleanup(); } @@ -64,30 +86,36 @@ describe('whoknows_health doctor check', () => { it('warns when fixture has fewer than 5 rows', async () => { try { - mkdirSync('test/fixtures', { recursive: true }); + const fixturePath = join(workDir, 'test/fixtures/whoknows-eval.jsonl'); + mkdirSync(join(workDir, 'test/fixtures'), { recursive: true }); writeFileSync( - 'test/fixtures/whoknows-eval.jsonl', + fixturePath, '{"query":"a","expected_top_3_slugs":["x"]}\n' + '{"query":"b","expected_top_3_slugs":["y"]}\n', ); - const check = await whoknowsHealthCheck(stubEngine); - expect(check.status).toBe('warn'); - expect(check.message).toContain('2 row'); + await withEnv({ GBRAIN_WHOKNOWS_FIXTURE_PATH: fixturePath }, async () => { + const check = await whoknowsHealthCheck(stubEngine); + expect(check.status).toBe('warn'); + expect(check.message).toContain('2 row'); + }); } finally { cleanup(); } }); - it('passes when fixture has at least 5 rows', async () => { + it('honors env override when fixture has at least 5 rows', async () => { try { - mkdirSync('test/fixtures', { recursive: true }); + const fixturePath = join(workDir, 'test/fixtures/whoknows-eval.jsonl'); + mkdirSync(join(workDir, 'test/fixtures'), { recursive: true }); const rows = Array.from({ length: 10 }, (_, i) => JSON.stringify({ query: `q${i}`, expected_top_3_slugs: [`p${i}`] }), ).join('\n'); - writeFileSync('test/fixtures/whoknows-eval.jsonl', rows + '\n'); - const check = await whoknowsHealthCheck(stubEngine); - expect(check.status).toBe('ok'); - expect(check.message).toContain('10 queries'); + writeFileSync(fixturePath, rows + '\n'); + await withEnv({ GBRAIN_WHOKNOWS_FIXTURE_PATH: fixturePath }, async () => { + const check = await whoknowsHealthCheck(stubEngine); + expect(check.status).toBe('ok'); + expect(check.message).toContain('10 queries'); + }); } finally { cleanup(); } @@ -95,7 +123,8 @@ describe('whoknows_health doctor check', () => { it('ignores comment lines and blank lines when counting rows', async () => { try { - mkdirSync('test/fixtures', { recursive: true }); + const fixturePath = join(workDir, 'test/fixtures/whoknows-eval.jsonl'); + mkdirSync(join(workDir, 'test/fixtures'), { recursive: true }); const content = [ '# comment', '// another comment', @@ -106,10 +135,21 @@ describe('whoknows_health doctor check', () => { '{"query":"d","expected_top_3_slugs":["w"]}', '{"query":"e","expected_top_3_slugs":["v"]}', ].join('\n'); - writeFileSync('test/fixtures/whoknows-eval.jsonl', content + '\n'); - const check = await whoknowsHealthCheck(stubEngine); - expect(check.status).toBe('ok'); - expect(check.message).toContain('5 queries'); + writeFileSync(fixturePath, content + '\n'); + await withEnv({ GBRAIN_WHOKNOWS_FIXTURE_PATH: fixturePath }, async () => { + const check = await whoknowsHealthCheck(stubEngine); + expect(check.status).toBe('ok'); + expect(check.message).toContain('5 queries'); + }); + } finally { + cleanup(); + } + }); + + it('returns null when the default fixture path cannot be resolved', () => { + try { + const fixturePath = resolveWhoknowsFixturePath({}, 'not-a-file-url'); + expect(fixturePath).toBeNull(); } finally { cleanup(); }