v0.37.11.0: fresh-install PGLite embedding setup fix wave #1286
Merged
… keep working
The v0.37 fix wave changes the canonical gateway defaults to zeroentropyai:zembed-1 / 1280 (matching what v0.36 already chose as the system default). 20+ test files have hardcoded new Float32Array(1536) fixtures that match the OLD schema default. Without this preload, those tests fail with a vector-dim-mismatch on insert.
The preload is gateway-only: it doesn't change which model gbrain ships to production users. Tests that want the new ZE/1280 defaults call configureGateway() explicitly in their own beforeAll.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…registry
Closes the v0.36 defaults drift bug class. The gateway shipped zeroentropyai:zembed-1 / 1280 as the system default in v0.36 but eight other places kept hardcoding 1536 / text-embedding-3-large. Fresh gbrain init --pglite sized the column to 1536, the embed pipeline used ZE/1280, and every page failed with dim mismatch.
- New src/core/ai/defaults.ts leaf module is the canonical source for DEFAULT_EMBEDDING_MODEL / DEFAULT_EMBEDDING_DIMENSIONS. Schema and registry helpers import from this lean module instead of pulling the full gateway (which loads every provider SDK).
- src/core/ai/gateway.ts re-exports the constants for back-compat.
- src/core/pglite-schema.ts getPGLiteSchema() defaults track gateway.
- src/core/postgres-engine.ts getPostgresSchema() default args track gateway (same drift on the Postgres path; codex round 1 CDX-1).
- Both engine.initSchema() fallbacks track gateway constants (no more stale OpenAI/1536 catch-block defaults).
- Schema seed stops stripping the provider prefix; full provider:model is stored in the DB config table (codex round 1 CDX-4).
- Chunk-row INSERT defaults track gateway (codex round 2 CDX2-4: pglite-engine:1611 + postgres-engine:1647 were production write sites previously hardcoded to text-embedding-3-large).
- src/core/search/embedding-column.ts loadRegistry + isCacheSafe gain the cfg > gateway > DEFAULT resolution chain (codex round 2 CDX2-3). The gateway tier matters because callers that configure the gateway (init paths, tests, programmatic SDK) expect the registry to mirror that state when cfg doesn't have an explicit embedding_model.
Tests:
- schema-templating: default expectation flips to ZE/1280 (v0.37 truth).
- embedding-dim-check: 3 new engine-kind branching cases + updated fresh-brain expectation (under legacy preload).
- embedding-column: registry + isCacheSafe expectations match new chain.
- v0_28_5-fix-wave E2E: engineKind required arg propagated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
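The `cfg > gateway > DEFAULT` resolution chain described in the commit above can be sketched as follows. This is a minimal illustration, not the code in `src/core/search/embedding-column.ts`: the `EmbeddingChoice` shape and the `resolveEmbeddingChoice` name are hypothetical, and only the two default values are taken from the commit message.

```typescript
// Hypothetical shape; the real gbrain types are not shown in this PR text.
interface EmbeddingChoice {
  model: string; // full provider:model id, e.g. "zeroentropyai:zembed-1"
  dimensions: number;
}

// Values quoted in the commit message as the system default since v0.36.
const DEFAULT_EMBEDDING_MODEL = "zeroentropyai:zembed-1";
const DEFAULT_EMBEDDING_DIMENSIONS = 1280;

// Resolve each field with the chain cfg > gateway > DEFAULT.
// A tier that is null, or that lacks a field, falls through to the next tier.
function resolveEmbeddingChoice(
  cfg: Partial<EmbeddingChoice> | null,
  gateway: Partial<EmbeddingChoice> | null,
): EmbeddingChoice {
  return {
    model: cfg?.model ?? gateway?.model ?? DEFAULT_EMBEDDING_MODEL,
    dimensions:
      cfg?.dimensions ?? gateway?.dimensions ?? DEFAULT_EMBEDDING_DIMENSIONS,
  };
}
```

One caveat of resolving per field: a tier that sets only the model can end up paired with dimensions from a lower tier, so a real implementation would likely still need a dimension check afterwards.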
…nest config-set, sync/reinit help
Closes the "fresh init doesn't work + config-set silently lies" bug
class end-to-end. Six related changes that ship together because the
file-plane/DB-plane contract only holds when init paths, config-set,
the gateway env mapping, and the recipe text all agree.
Lane B (init paths):
- initPGLite, initPostgres, initMigrateOnly always configureGateway()
before engine.initSchema(). Pre-fix the call was gated on flags, so
bare `gbrain init --pglite` left the gateway unconfigured and the
engine fell through to stale OpenAI/1536 defaults instead of the
ZE/1280 the gateway would have resolved.
- New configureGatewayWithMergedPrecedence() helper applies the locked
precedence chain `CLI > env > existing file > gateway internal`.
- printResolvedAIChoice() shows the resolved model/dim at init time +
surfaces a ZE setup hint inline when the API key is missing.
- B.4: saveConfig merge uses loadConfigFileOnly() so transient env
state (DATABASE_URL, etc.) never poisons ~/.gbrain/config.json
(codex round 2 CDX-5).
- B.5: extend the v0.28.5 dim-mismatch detector so it fires when the
gateway-resolved dim differs from the existing column, not only
when --embedding-dimensions is explicit (codex round 2 CDX-6).
Lane C (config plane):
- New `loadConfigFileOnly()` reads ~/.gbrain/config.json only — no env
merge, no DATABASE_URL inference. Safe write-back source for init.
- GBrainConfig gains `zeroentropy_api_key?: string`. loadConfig merges
process.env.ZEROENTROPY_API_KEY. buildGatewayConfig at cli.ts:1401
maps it into env.ZEROENTROPY_API_KEY so ZE recipes finally see it
(codex round 2 CDX2-5+6 — the v1 fix landed in the wrong file).
- `gbrain config set embedding_model` and `... embedding_dimensions`
refuse unconditionally and print a paste-ready wipe-and-reinit
recipe. No --force escape (codex round 2 CDX2-13).
- migrate-engine.ts adds a contract comment at the DB-plane write
site documenting "DB stores schema-applied metadata; file plane is
canonical for runtime gateway config" + preserves the existing
file-plane config across engine migration.
Lane D.1 (recipe text):
- embeddingMismatchMessage() takes an `engineKind` arg. PGLite branch
emits a wipe-and-reinit recipe using gbrainPath('brain.pglite') or
the caller's databasePath override. Postgres branch keeps the SQL
ALTER recipe.
- The PGLite recipe recommends `gbrain reinit-pglite` (new sugar
command below) as the one-line path before falling back to the
by-hand mv + init + sync sequence.
Lane D.4 (sync help dispatch):
- `sync` and `reinit-pglite` added to CLI_ONLY_SELF_HELP so their own
--help branches reach the user (pre-fix the generic short-circuit
fired first and the dedicated usage was unreachable; codex round 2
CDX2-12).
- `gbrain sync --help` short-circuits BEFORE engine bind so users on
a fresh tmpdir (no config) can read the help without hitting
no-such-config errors.
Sugar:
- New `gbrain reinit-pglite --embedding-model X --embedding-dimensions N`
wraps the wipe + init + sync dance into one command. Backs up the
brain to <path>.bak. TTY confirmation unless --yes. --no-sync to
defer the resync. --json for scripts.
Tests:
- test/cli.test.ts sync-help test rewritten for the new
per-command-usage output (lists --no-embed which is the v0.37
user-visible flag the wave wanted to surface).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
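The B.4 point above (write back from file-only state so env values never leak into `config.json`) can be illustrated with a small sketch. Everything here is an assumption for illustration: `FileConfig`, `mergeEnvForRuntime`, and `buildWriteBack` are hypothetical names, and the real `loadConfig` / `loadConfigFileOnly` / `saveConfig` signatures are not shown in this PR text.

```typescript
// Hypothetical subset of the config file shape.
interface FileConfig {
  database_url?: string;
  embedding_model?: string;
  zeroentropy_api_key?: string;
}

type EnvMap = Record<string, string | undefined>;

// Runtime view: file values overlaid with env (env wins).
// This object must never be written back to disk.
function mergeEnvForRuntime(file: FileConfig, env: EnvMap): FileConfig {
  return {
    ...file,
    database_url: env["DATABASE_URL"] ?? file.database_url,
    zeroentropy_api_key: env["ZEROENTROPY_API_KEY"] ?? file.zeroentropy_api_key,
  };
}

// Write-back view: start from the file-only state and apply only the
// fields this invocation explicitly resolved, so env-only values stay out.
function buildWriteBack(fileOnly: FileConfig, updates: FileConfig): FileConfig {
  return { ...fileOnly, ...updates };
}
```

The design choice is that the two views are built from the same file-only source but never from each other: the runtime view may contain transient state, the write-back view cannot.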
…atch sites embedding-pipeline error UX.
Pre-fix, a fresh-install dim mismatch produced raw Postgres "expected N dimensions, not M" errors page after page, surfacing only after the worker pool drained the entire corpus. Sync swallowed embed errors at TWO catch sites and never surfaced the recovery recipe.
embed.ts:
- New `EmbeddingDimMismatchError` tagged class with the paste-ready recipe baked in.
- `runEmbedCore` pre-flights via `readContentChunksEmbeddingDim` + gateway.getEmbeddingDimensions() before the worker pool spins up. On mismatch, throws the typed error which the CLI wrapper catches and prints. Dry-run skips the check (no embed risk).
- Catches the headline fresh-install bug class at first call instead of letting it hammer N parallel API calls into dim-rejected inserts.
sync.ts:
- Both embed catches at sync.ts:990 (incremental) and sync.ts:1129 (first-sync) detect EmbeddingDimMismatchError and surface the recipe + a `--no-embed` tip on stderr (codex round 2 CDX2-8: incremental path was previously silent; only the first-sync path was flagged).
- Non-mismatch embed failures still stay best-effort (rate limits, transient network); those shouldn't break sync.
- Sync calls runEmbedCore directly instead of runEmbed (which calls process.exit on error and bypasses sync's catch).
- Sync gets a proper --help block listing every meaningful flag: --no-embed, --workers, --source, --skip-failed, --retry-failed, --watch, --interval, --no-pull, --all, --json, --yes, --dry-run.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
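A rough sketch of the tagged-error plus pre-flight pattern this commit describes. It assumes the column dimension and the gateway dimension have already been read; the field names, the `isDimMismatch` guard, and the `preflightEmbeddingDim` helper are hypothetical and not taken from `embed.ts`.

```typescript
// Tagged error: catch sites can check `tag` instead of relying on instanceof.
class EmbeddingDimMismatchError extends Error {
  readonly tag = "EmbeddingDimMismatchError";
  readonly columnDim: number;
  readonly gatewayDim: number;
  readonly recipe: string;

  constructor(columnDim: number, gatewayDim: number, recipe: string) {
    super(
      `embedding column is vector(${columnDim}) but the gateway resolves to ${gatewayDim} dimensions`,
    );
    this.name = "EmbeddingDimMismatchError";
    this.columnDim = columnDim;
    this.gatewayDim = gatewayDim;
    this.recipe = recipe;
  }
}

// Type guard for catch sites that must tell a mismatch from transient failures.
function isDimMismatch(err: unknown): err is EmbeddingDimMismatchError {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { tag?: unknown }).tag === "EmbeddingDimMismatchError"
  );
}

// Pre-flight: runs once, before any worker or provider call is started.
function preflightEmbeddingDim(
  columnDim: number | null, // null = no embedding column found yet
  gatewayDim: number,
  recipe: string,
  dryRun = false,
): void {
  if (dryRun) return; // dry-run never inserts, so there is nothing to protect
  if (columnDim === null) return;
  if (columnDim !== gatewayDim) {
    throw new EmbeddingDimMismatchError(columnDim, gatewayDim, recipe);
  }
}
```

A sync-side catch would then print `err.recipe` when `isDimMismatch(err)` is true and keep treating every other embed failure as best-effort.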
…key lookup
Doctor's embedding checks were reading the DB config table for embedding_model / embedding_dimensions / zeroentropy_api_key. Post v0.37 the file plane is canonical (the DB plane is schema-applied metadata, not runtime gateway config) so those reads produced stale verdicts on fresh installs whose DB row hadn't been written.
- checkEmbeddingWidthConsistency reads gateway.getEmbeddingDimensions() and gateway.getEmbeddingModel() instead of engine.getConfig(...). Reuses readContentChunksEmbeddingDim from the same shared helper init + embed use. On mismatch, the fix hint threads engineKind + databasePath into the new branched recipe (codex round 1 CDX-8 + Lane E.1/E.2).
- checkZeEmbeddingHealth reads gateway for the model + loadConfigFileOnly for the key. Fires when (a) resolved model starts with zeroentropyai: AND (b) ZEROENTROPY_API_KEY is unset in env AND (c) file plane has no zeroentropy_api_key (codex round 2 CDX2-10).
- loadRecommendationContext reads gateway for both fields and recognizes the ZE key alongside OpenAI/Anthropic in the hasEmbeddingApiKey check, so brains on ZE no longer look "healthy" just because OPENAI_API_KEY happens to be set (codex round 2 CDX2-11).
Tests rewritten for the gateway-source-of-truth contract via configureGateway() in beforeAll. Added a "gateway unconfigured: skips with ok" case so doctor doesn't false-warn on cold-boot brains.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
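The three-part firing condition for `checkZeEmbeddingHealth` reduces to a small predicate. The sketch below only mirrors conditions (a), (b), (c) and the "gateway unconfigured" skip as stated in the commit message; the `ZeHealthInput` shape and the function name are hypothetical.

```typescript
// Hypothetical input shape: values the caller has already read from the
// gateway, the process env, and the file-only config.
interface ZeHealthInput {
  resolvedModel: string | null; // null = gateway not configured yet
  envKey: string | undefined; // ZEROENTROPY_API_KEY from env
  fileKey: string | undefined; // zeroentropy_api_key from the file plane
}

// True when the doctor should warn about a missing ZeroEntropy key.
function shouldWarnMissingZeKey(input: ZeHealthInput): boolean {
  // Gateway unconfigured: skip with ok rather than false-warn on cold boot.
  if (input.resolvedModel === null) return false;
  const usesZe = input.resolvedModel.startsWith("zeroentropyai:"); // (a)
  const envUnset = !input.envKey; // (b)
  const fileUnset = !input.fileKey; // (c)
  return usesZe && envUnset && fileUnset;
}
```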
…ipe + TODOS
Lands the v0.37 PGLite fresh-install fix wave's structural tests and the user-facing migration recipe overhaul.
test/v0_37_fix_wave.test.ts (new): 22 unit cases pinning the lanes:
- Lane A: defaults module exports, getPGLiteSchema/getPostgresSchema default-args, registry + isCacheSafe under the `cfg > gateway > DEFAULT` chain (both gateway-set and gateway-reset branches).
- Lane B: loadConfigFileOnly env isolation + DATABASE_URL inference refusal + null-on-missing.
- Lane C.3: buildGatewayConfig maps zeroentropy_api_key + process.env wins over config (operator escape hatch contract).
- Lane D.2: EmbeddingDimMismatchError shape + tag.
- Lane D.4: structural assertion that `sync` is in CLI_ONLY_SELF_HELP.
- Deferred-TODO ship: reinit-pglite is registered correctly + embeddingMismatchMessage PGLite branch recommends it.
docs/embedding-migrations.md: PGLite section moved to top (the default install). The recommended path is `gbrain reinit-pglite` one-liner; the by-hand mv + init + sync sequence stays as the fallback recipe. Postgres SQL ALTER recipe preserved. New section on `gbrain config set` refusal explains the file-plane vs DB-plane contract so users don't follow stale documentation.
TODOS.md: 4 deferred follow-ups filed with concrete file pointers:
- gbrain embed --try-fallback (provider auto-switch with consent gate)
- Full plane unification for non-schema-sizing fields
- Worker-pool shared AbortController for mid-run dim drift
- Cleanup of back-compat constants in src/core/embedding.ts
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The structural fix-wave tests in test/v0_37_fix_wave.test.ts pin lane-level
invariants (exports, registry chain, signature shapes). The audit found 10+
END-TO-END behaviors that the structural tests didn't actually reach.
This file fills the highest-leverage gaps.
Unit coverage (test/v0_37_gap_fill.test.ts, 12 cases):
- Lane A.7: chunk-row INSERT default tracks DEFAULT_EMBEDDING_MODEL
constant (pre-fix this was the literal 'text-embedding-3-large' at
pglite-engine.ts:1611 + postgres-engine.ts:1647 — production write
sites that were never directly tested; codex round 2 CDX2-4).
- Lane A.8: schema seed stores full provider:model in DB config
(pre-fix the .split(':') strip dropped the prefix; codex round 1
CDX-4). Asserts a fresh ZE init stores `zeroentropyai:zembed-1`
in the config table, not bare `zembed-1`.
- Lane B precedence: explicit CLI > env > existing file > default
test (codex round 2 CDX2-7 contradiction guard).
- Lane C.3 env merge: process.env.ZEROENTROPY_API_KEY threads through
loadConfig → cfg.zeroentropy_api_key; loadConfigFileOnly does NOT.
- Lane D.2 end-to-end: schema=1536 + gateway=1280 →
EmbeddingDimMismatchError fires AND the embed transport is never
called (the whole point of pre-flight). Plus dry-run skips the
check.
- Lane D.3 source-text grep: both sync.ts catch sites detect the
typed error + the `--no-embed` tip is present (CDX2-8).
- Lane E.4 source-text grep: loadRecommendationContext is
provider-aware (reads gateway + branches on ZE/OpenAI key).
- reinit-pglite contract: refuses on non-PGLite engines + refuses
when required flags are missing.
E2E (test/e2e/fresh-install-pglite.test.ts, 2 cases):
- Bare `gbrain init --pglite` produces a `vector(1280)` schema, prints
the resolved choice, persists defaults to config.json — the headline
scenario that v0.37 ships to fix.
- init → seed page → embed end-to-end: chunks have non-null
embeddings; no dim mismatch despite the wave's defaults change.
Both E2E cases are IN-PROCESS (per CDX2-12: CLI-subprocess E2E can't
inherit `__setEmbedTransportForTests`). They run with stubbed transport
returning synthetic 1280-dim vectors so we never hit real provider APIs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
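For readers unfamiliar with the stubbed-transport approach, here is a minimal sketch of a deterministic fake that returns 1280-dim vectors without touching a provider API. The `EmbedTransport` type, `syntheticVector`, and `stubTransport` are assumptions for illustration; the actual signature accepted by `__setEmbedTransportForTests` is not shown in this PR text, so the sketch does not call it.

```typescript
// Hypothetical transport shape: texts in, one vector per text out.
type EmbedTransport = (texts: string[]) => Promise<Float32Array[]>;

// Build a deterministic pseudo-embedding from the text's character codes.
// The values carry no semantic meaning; only the length and stability matter.
function syntheticVector(text: string, dimensions: number): Float32Array {
  let seed = 0;
  for (let i = 0; i < text.length; i++) {
    seed = (seed * 31 + text.charCodeAt(i)) % 1000003;
  }
  const vector = new Float32Array(dimensions);
  for (let i = 0; i < dimensions; i++) {
    vector[i] = ((seed + i) % 997) / 997; // always in [0, 1)
  }
  return vector;
}

// Stub transport sized to the ZE/1280 default quoted in the PR.
const stubTransport: EmbedTransport = async (texts) =>
  texts.map((text) => syntheticVector(text, 1280));
```

Because the vectors are derived only from the input text, repeated test runs insert identical data, which keeps assertions on stored chunks stable.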
Adds an afterAll that restores the gateway to OpenAI/1536 (matching the bunfig preload) at the end of the reinit-pglite describe. Belt-and-suspenders: earlier describe blocks in this file already restore, but if the reinit-pglite tests ever start mutating the gateway in the future, this protects downstream test files in the same bun-test shard from inheriting a non-default state.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
README + topologies + embedding-providers were still pointing users at `gbrain config set embedding_model X` / `embedding_dimensions N`. As of v0.37.10.0 those writes are refused because the schema column has to resize alongside the config. Point at `gbrain reinit-pglite` (PGLite) and the SQL recipe in `docs/embedding-migrations.md` (Postgres) instead.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
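The refusal behavior referenced above can be pictured as a guard that runs before any config write. This is a sketch under assumptions: `guardConfigSet` and `ConfigSetOutcome` are hypothetical names, and the message text is a placeholder rather than the recipe gbrain actually prints. Only the two refused keys, the exit code 1, and the two recovery pointers come from the PR.

```typescript
// Keys that size the embedding column and therefore cannot be changed
// by a plain config write.
const SCHEMA_SIZING_KEYS: ReadonlySet<string> = new Set([
  "embedding_model",
  "embedding_dimensions",
]);

interface ConfigSetOutcome {
  exitCode: 0 | 1;
  message: string;
}

// Decide whether a `config set` request may proceed. No force flag exists:
// schema-sizing keys are refused unconditionally.
function guardConfigSet(key: string, value: string): ConfigSetOutcome {
  if (SCHEMA_SIZING_KEYS.has(key)) {
    return {
      exitCode: 1,
      // Placeholder wording; the real recipe text is not quoted in the PR.
      message: [
        `Refusing to set ${key}=${value}: the embedding column must be resized alongside the config.`,
        "PGLite:   gbrain reinit-pglite --embedding-model <model> --embedding-dimensions <n>",
        "Postgres: see the SQL recipe in docs/embedding-migrations.md",
      ].join("\n"),
    };
  }
  return { exitCode: 0, message: `ok: ${key} may be written` };
}
```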
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Resolves conflicts in:
- VERSION (kept 0.37.11.0)
- package.json
- CHANGELOG.md (kept both v0.37.11.0 and master's v0.37.10.0 entries)
- src/commands/config.ts (combined hard-refuse for schema-sizing fields + master's general unknown-key gate)
- src/commands/embed.ts (run master's noEmbedding gate first, then my dim-mismatch preflight)
- src/commands/init.ts (took master's preflight + configureGateway + post-init invariant flow, kept my Lane B.4 file-plane merge via loadConfigFileOnly so existing user fields aren't clobbered, added engineKind/databasePath args to the post-init mismatch recipe, inlined the ZE-key setup hint)
- src/core/embedding-dim-check.ts (kept both gbrainPath import and master's dim-validation + EmbeddingDisabledError surface)
Quarantines three pre-existing flakes to .serial.test.ts (HTTP server module state + gateway state contamination under parallel shards):
- test/doctor-remote.test.ts
- test/cross-modal-hybrid-integration.test.ts
- test/search/hybrid-reranker-integration.test.ts
Regenerates llms.txt + llms-full.txt for the merged CLAUDE.md.
Verified: typecheck clean, 8375 pass / 0 fail on full unit suite.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CI's `check:test-isolation` lint flagged R1 violations (direct `process.env.GBRAIN_HOME` mutation) in both new fix-wave test files. Per the documented quarantine pattern in CLAUDE.md, rename to `*.serial.test.ts` instead of refactoring through `withEnv()`, since both files use beforeEach/afterEach env wiring that's already serial-safe.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
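For context on the alternative this commit chose not to take, a scoped env helper in the spirit of `withEnv()` might look like the sketch below. The real helper's signature is not shown in this PR text, so this version is hypothetical: it takes the env object as a parameter (in practice `process.env`) and handles synchronous callbacks only.

```typescript
type EnvMap = Record<string, string | undefined>;

// Apply overrides to `env`, run `fn`, then restore the previous values
// even if `fn` throws. An override of undefined removes the key.
function withEnv<T>(env: EnvMap, overrides: EnvMap, fn: () => T): T {
  const saved: EnvMap = {};
  for (const key of Object.keys(overrides)) {
    saved[key] = env[key];
    const value = overrides[key];
    if (value === undefined) {
      delete env[key];
    } else {
      env[key] = value;
    }
  }
  try {
    return fn();
  } finally {
    for (const key of Object.keys(saved)) {
      const previous = saved[key];
      if (previous === undefined) {
        delete env[key];
      } else {
        env[key] = previous;
      }
    }
  }
}
```

An async callback would need an awaited variant, otherwise the `finally` block restores the env before the promise settles.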
garrytan added a commit that referenced this pull request on May 22, 2026
Conflicts resolved:
- VERSION → 0.38.1.0 (higher semver wins; master bumped 0.37.10.0 → 0.37.11.0)
- package.json → 0.38.1.0 (trio agreement)
- CHANGELOG.md → my v0.38.1.0 entry stays on top; master's new v0.37.11.0 entry inserted between mine and v0.37.10.0
- src/cli.ts CLI_ONLY Set → union of master's `reinit-pglite` and my `capture` CLI verbs
Master's v0.37.11.0 brings the fresh-install PGLite embedding setup fix wave (#1286): default vector(1280) schema matching the gateway's zembed-1 default, `gbrain reinit-pglite` wipe-and-reinit command, and proper ZE API key plumbing. No collisions with v0.38 ingestion substrate beyond the cli.ts dispatcher Set.
bun install + bun run typecheck → clean.
Summary
Fresh `gbrain init --pglite` produced a brain where no embedding provider worked out of the box: schema sized to 1536 (the OpenAI default), gateway resolving to ZeroEntropy at 1280, first `gbrain embed --stale` dim-mismatching, `gbrain config set embedding_model` silently no-op, ZeroEntropy keys had no config plane. 9 reported bugs; 2 rounds of codex outside-voice review caught 26 more. All 26 folded into the plan; full wave shipped.
ELI10: if you ran `gbrain init --pglite` in v0.37.x and tried to embed, you got opaque dimension errors with no clear way out. v0.37.11.0 makes the fresh-install path actually work, makes the failure mode loud + paste-ready when you do hit a mismatch, and adds `gbrain reinit-pglite` for one-command brain rebuilds.
What's in this PR
Five lanes plus a `gbrain reinit-pglite` sugar command. Each lane is a coherent commit set; the wave was structured so a future bisect can land on the right slice cleanly.
Lane A: Single source of truth for defaults. New module `src/core/ai/defaults.ts` exports `DEFAULT_EMBEDDING_MODEL` + `DEFAULT_EMBEDDING_DIMENSIONS`. Every hardcoded `1536` / `text-embedding-3-large` literal in production code paths (PGLite schema, Postgres schema, both engine fallbacks, embedding-column registry, isCacheSafe baseline, chunk-row INSERT defaults) replaced with the import. Schema seed no longer strips the provider prefix; DB config stores `provider:model` end-to-end.
Lane B: Init paths configure gateway and merge config. `initPGLite`, `initPostgres`, and `initMigrateOnly` all call `configureGateway()` unconditionally before `engine.initSchema()`. Resolution precedence locked across the codebase: `CLI flags this invocation > existing file plane > resolved defaults from gateway`. Resolved embedding model + dimensions get printed at init and persisted to `config.json` even when the user passes no flags, so `gbrain config show` reflects the active default. New `loadConfigFileOnly()` helper (in `src/core/config.ts`): a read-back source for safe merge that doesn't poison `config.json` with env-only state. v0.28.5 dim-mismatch guard extended to fire on re-init even when no explicit `--embedding-dimensions` is passed.
Lane C: Config plane honesty. `gbrain config set embedding_model` / `embedding_dimensions` refused unconditionally (no `--force` escape) with paste-ready wipe-and-reinit recipe. ZeroEntropy credentials get a real config plane: `zeroentropy_api_key` field on `GBrainConfig`, env merge in `loadConfig`, mapping in `cli.ts:buildGatewayConfig`. Internal DB-write sites (ze-switch, migrate-engine) gained contract comments documenting the file-plane-is-canonical invariant.
Lane D: Error UX and recipe correctness. New tagged `EmbeddingDimMismatchError` class in `src/commands/embed.ts`; pre-flight check fires loud + structured before the embed loop begins. Both sync-side embed catch sites (incremental `:990` and first-sync `:1129`) detect the tagged error and print the recipe + `--no-embed` tip. `embeddingMismatchMessage` extended with engine kind + database path so PGLite emits `gbrain reinit-pglite ...` and Postgres emits the SQL ALTER recipe. `sync` added to `CLI_ONLY_SELF_HELP` so `gbrain sync --help` reaches its dispatch with a comprehensive usage block. `docs/embedding-migrations.md` restructured PGLite-first.
Lane E: Doctor correctness. `checkEmbeddingWidthConsistency`, `checkZeEmbeddingHealth`, and `loadRecommendationContext` all read gateway state instead of DB plane (the canonical schema-sizing source post-Lane C). Provider-aware key check recognizes ZeroEntropy alongside OpenAI / Anthropic, so there are no more false-warns when ZE is the active provider and OpenAI keys aren't configured.
`gbrain reinit-pglite` is a new one-command wrapper: backs up the existing brain to `<path>.bak`, runs `gbrain init` with the new flags (preserving every other config field: chat model, expansion model, API keys), and re-syncs the brain repo. `--no-sync` skips the resync, `--yes` skips the TTY confirmation, `--json` for scripts. Refuses non-PGLite engines (Postgres has the in-place SQL recipe). 293-line CLI in `src/commands/reinit-pglite.ts`.
Tests
- 22 in `test/v0_37_fix_wave.test.ts` (structural lane assertions) + 12 in `test/v0_37_gap_fill.test.ts` (end-to-end behavior + reinit-pglite contracts), plus updates to `test/embedding-dim-check.test.ts`, `test/ai/schema-templating.test.ts`, `test/search/embedding-column.test.ts`, `test/cli.test.ts`, `test/doctor-ze-checks.test.ts`, `test/e2e/v0_28_5-fix-wave.test.ts`.
- `test/e2e/fresh-install-pglite.test.ts` (in-process, no DATABASE_URL needed) exercises the headline path: bare `gbrain init --pglite` → import → embed (via `__setEmbedTransportForTests` injection) → chunks have non-null embeddings.
- `test/helpers/legacy-embedding-preload.ts` registered via `bunfig.toml` `preload` array so 1536-dim test fixtures keep working under the new ZE-default world without per-file mutation.
What's deferred
Four follow-up TODOs filed in `TODOS.md`:
- `embed --try-fallback` for auto-switching providers on quota / auth failures (silent provider switching = silent vector-space corruption; needs explicit consent design).
- `gbrain reinit-pglite` analog for Postgres (currently SQL recipe).
- `embedAll()` shared `AbortController` so worker-pool dim-mismatches stop within 1-2 in-flight pages instead of draining the queue (current behavior: catches per-page; the top-level still emits the recipe exactly once).
Test plan
- `bun run typecheck` clean
- `bun run test:e2e`
- `gbrain reinit-pglite --help` and `gbrain sync --help` reach dispatch and print usage
- `gbrain config set embedding_model openai:text-embedding-3-large` exits 1 with the wipe-and-reinit recipe
- `gbrain init --pglite` configures ZE / 1280 end-to-end; `gbrain config show` reflects the active default
- `gbrain doctor` reads gateway-resolved values (not DB plane) for schema-sizing checks
🤖 Generated with Claude Code