
v0.38.0.0 feat: schema packs — bring your own shape#1248

Open
garrytan wants to merge 19 commits into master from garrytan/houston-v1

@garrytan
Owner

Summary

Your brain is now yours.

GBrain used to assume one shape: VC investor brain. People, companies, deals, meetings — those were the four corners. Everything else lived as `as PageType` casts the engine never enforced. Your real brain has 180+ types — therapy-session, tweet-bundle, adversary-profile, book-analysis, apple-note, plus 175 more. They worked through a polite fiction. v0.38 ends the pretense.

`PageType` is now `string`. The closed 23-element union is gone. Schema packs declare your domain (types, link verbs, expert routing, facts eligibility, enrichment rubrics), and the engine consults the active pack instead of hardcoded literals. Five primitives compose: entity, media, temporal, annotation, concept. Existing brains see zero change after upgrade because gbrain-base, the universal starter pack, reproduces today's behavior byte-for-byte.

This PR lands as 16 atomic, bisect-friendly commits, with 95+ new tests across 12 test files and zero behavior regression on existing brains.

Architecture decisions (survived three rounds of codex outside-voice)

  • Open type surface (D12). `PageType: string`. ~30 `as PageType` casts widen to `as string` at engine row boundaries.
  • Explicit alias graph closure (E8). Pack types declare `aliases: []`. Closure walks the alias graph (BFS, depth cap 4, symmetric per declaration) — NOT primitive siblings. This prevents adversary-profile from leaking into `whoknows` expert results just because it shares the entity primitive with person.
  • Per-source closure CTE (D13). Federated reads across mixed-pack sources filter via per-source UNION ALL CTE.
  • Per-call schema_pack trust gate (D13). Remote/MCP callers rejected; CLI-only override. Same posture as v0.26.9 + v0.34.1.0 source-scope hardening.
  • ReDoS guard via vm.runInContext (E6+E9+T24 spike). 50ms per-regex + 500ms per-page budget; verified under Bun 1.3.13.
  • Inline canonical closure snapshot for eval replay (E11). eval_candidates.schema_pack_per_source JSONB stores the full resolved alias graph per source — replay is self-contained even after pack evolution.
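The alias closure can be sketched as a breadth-first walk over an undirected edge set. This is a minimal illustration, assuming each pack type carries an optional `aliases` list; `PackTypeDecl`, `buildAliasGraph`, and `aliasClosure` are hypothetical names, not the actual `closure.ts` exports, and load-time cycle checks are omitted.

```typescript
type PackPrimitive = "entity" | "media" | "temporal" | "annotation" | "concept";

// Hypothetical manifest entry: a type may declare aliases to other types.
interface PackTypeDecl {
  name: string;
  primitive: PackPrimitive;
  aliases?: string[];
}

const MAX_ALIAS_DEPTH = 4;

// Undirected adjacency: `aliases: [b]` on type `a` adds both a->b and b->a.
function buildAliasGraph(types: PackTypeDecl[]): Map<string, Set<string>> {
  const graph = new Map<string, Set<string>>();
  const addEdge = (from: string, to: string): void => {
    let neighbors = graph.get(from);
    if (!neighbors) {
      neighbors = new Set<string>();
      graph.set(from, neighbors);
    }
    neighbors.add(to);
  };
  for (const t of types) {
    if (!graph.has(t.name)) graph.set(t.name, new Set<string>());
    for (const alias of t.aliases ?? []) {
      addEdge(t.name, alias);
      addEdge(alias, t.name);
    }
  }
  return graph;
}

// BFS from `start`, at most MAX_ALIAS_DEPTH hops. Primitives are never
// consulted, so two `entity` types are not aliases unless declared so.
function aliasClosure(graph: Map<string, Set<string>>, start: string): string[] {
  const seen = new Set<string>([start]);
  let frontier: string[] = [start];
  for (let depth = 0; depth < MAX_ALIAS_DEPTH && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const neighbor of graph.get(node) ?? new Set<string>()) {
        if (!seen.has(neighbor)) {
          seen.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return [...seen].sort(); // sorted for deterministic output
}
```

Because the walk only follows declared alias edges, `adversary-profile` stays out of `person`'s closure even though both use the `entity` primitive.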

Commits

  • 1e465f37 v0.38 plan: schema packs — bring your own shape
  • 52432a73 v0.38 T1: open PageType + TakeKind from closed unions to string
  • c9614773 v0.38 T2 (+E8, E9, D13): schema-pack module skeleton (9 files)
  • e7a15672 v0.38 T24: Bun vm.runInContext timeout spike (E9 prerequisite)
  • 57582dff v0.38 T3+T4+T28+E11: migrations v80 + v81
  • ccbd8524 v0.38 T5+T25: gbrain-base.yaml + codegen validator + parity gate
  • ccaec32b v0.38 T-LaneB1+E2+T26: src/core/distribution/ shared-helpers boundary
  • 3e4efbe8 v0.38 T-AP: active-pack boundary loader
  • 0c3a3525 v0.38 T8+D13: schema_pack per-call trust gate
  • 1b6d613e v0.38 T7a: pack-aware inferType + gbrain-base.yaml priority reorder
  • f2e69535 v0.38 T7b: pack-aware inferLinkType + frontmatter_link primitives
  • 294bdba0 v0.38 T_W: pack-driven expert types for whoknows / find_experts
  • 408071bb v0.38 T7d: pack-driven facts extractable types + gbrain-base.yaml fix
  • cc7180a4 v0.38 T_E: pack-driven enrichable types + rubric routing
  • cc7180a4 v0.38 Phase C: gbrain schema CLI (active|list|show|validate|use)
  • 097597c5 chore: bump version and changelog (v0.38.0.0)

Test plan

  • bun run verify GREEN (typecheck + 9 pre-checks)
  • 95+ new tests pass (engine, closure, per-source CTE, ReDoS guard, audit privacy, 7-tier resolver, trust gate, parity tests for inferType / inferLinkType / expert types / extractable types / enrichable types, CLI subprocess tests)
  • Migration v80 + v81 verified end-to-end against PGLite
  • gbrain-base.yaml byte-for-byte parity gate (CI-blocking) passes
  • Distribution-import-boundary regression guard passes
  • CLI smoke: gbrain schema {list,show,validate,active} all functional
  • 4 pre-existing reranker test failures from v0.35.0.0 (test/search/hybrid-reranker-integration.test.ts) are NOT caused by this PR — verified reproducible on a clean tree

What's NOT done yet (deliberate Phase B/C/D follow-up)

The primitives are in place; per-call-site wiring follows mechanically:

  • Per-call-site wiring of pack-aware variants in postgres-engine.ts, pglite-engine.ts, whoknows.ts, link-extraction.ts, markdown.ts, facts/eligibility.ts, enrichment-service.ts, enrichment/completeness.ts, cycle/synthesize+patterns.ts
  • Phase C CLI follow-ups: detect, suggest, init, fork, edit, diff, graph, lint, explain, review-candidates, review-orphans
  • Phase D: 7 example packs, schema-pack distribution as .gbrain-schema tarballs, full e2e test, author guide

Full plan: docs/designs/V038_SCHEMA_PACKS.md.

🤖 Generated with Claude Code

garrytan and others added 19 commits May 20, 2026 16:15
CEO + Eng + 3x Outside Voice review complete; 16 decisions locked,
58 codex findings folded. Design doc captures the full scope decisions
+ 12-14 week budget + 4-lane parallelization strategy + 29
implementation tasks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PageType and TakeKind become `string` instead of the pre-v0.38 closed
unions. Validation moves from compile-time exhaustiveness to runtime
checks against the active schema pack (T7+). The 13 `as PageType` and
3 `as TakeKind` casts at engine + cycle + enrichment boundaries widen
to `as string` (still narrowing from `unknown` at SQL row boundaries
but no longer pretending the union is closed).

Closed PageType was already a fiction: Garry's brain has 180+ types
(apple-note, therapy-session, tweet-bundle, …) all riding `as PageType`
casts the engine never enforced. v0.38 formalizes the open shape so
schema packs can declare their own types at runtime.

test/page-type-exhaustive.test.ts rewritten for the v0.38 model:
ALL_PAGE_TYPES becomes the gbrain-base seed list (no longer an
exhaustive enum); a new test asserts the markdown surface accepts
arbitrary user-declared types (paper, researcher, therapy-session,
apple-note, tweet-bundle); assertNever stays as a generic helper for
switches over the closed PackPrimitive enum.
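A minimal sketch of the split described above, assuming nothing beyond the five primitive names: page types are plain strings, while `PackPrimitive` stays a closed union so `assertNever` still forces exhaustive switches. `primitiveLabel` is a hypothetical helper for illustration.

```typescript
// v0.38: page types are open strings; runtime checks against the active
// pack replace the old closed union.
type PageType = string;

// The five primitives stay closed, so switches over them remain exhaustive.
type PackPrimitive = "entity" | "media" | "temporal" | "annotation" | "concept";

function assertNever(value: never): never {
  throw new Error(`Unhandled case: ${String(value)}`);
}

// Hypothetical helper: adding a sixth primitive without a case here would
// fail to compile, because `p` would no longer narrow to `never`.
function primitiveLabel(p: PackPrimitive): string {
  switch (p) {
    case "entity":
      return "Entity";
    case "media":
      return "Media";
    case "temporal":
      return "Temporal";
    case "annotation":
      return "Annotation";
    case "concept":
      return "Concept";
    default:
      return assertNever(p);
  }
}

// Any user-declared type now type-checks as a PageType.
const userDeclared: PageType[] = ["paper", "researcher", "therapy-session", "apple-note", "tweet-bundle"];
```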

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New module src/core/schema-pack/ — 9 files implementing the v0.38
foundations:

  manifest-v1.ts     SchemaPackManifest (Zod-validated) + sha8 +
                     pack-identity (`<name>@<version>+<sha8>`)
                     per E10/codex F7.
  primitives.ts      Five closed primitives (entity/media/temporal/
                     annotation/concept) with default link verbs,
                     frontmatter fields, expert-routing, rubric,
                     extractable. Closed enum is the *new* surface
                     for compile-time exhaustiveness (assertNever
                     migrates from PageType to PackPrimitive).
  loader.ts          YAML/JSON sniffing by extension. Hand-rolled
                     YAML mini-parser (follows storage-config.ts
                     pattern; no js-yaml dep). Handles nested
                     mappings, sequences of scalars + mappings,
                     YAML flow sequences with bare words.
  closure.ts         E8 alias graph BFS. Symmetric per declaration:
                     `aliases: [other]` adds BOTH directions. Depth
                     cap 4. Cycle detection at LOAD time (codex F15
                     — prevents primitive-sibling adversary-profile
                     leak into expert queries).
  per-source.ts      D13 per-source closure CTE builder. Emits
                     deterministic SQL via UNION ALL + lex-sorted
                     source_id branches. Cache-key stable.
  candidate-audit.ts T12 codex fix — privacy-redacted by default.
                     Audit JSONL stores SHA-8 type hashes,
                     slug_prefix (first segment only), frontmatter
                     KEY names (never values).
                     GBRAIN_SCHEMA_AUDIT_VERBOSE=1 opts into full
                     type names. ISO-week rotation; honors
                     GBRAIN_AUDIT_DIR.
  redos-guard.ts     E6/E9 ReDoS defense.
                     vm.runInContext({timeout: 50}) primary path;
                     LINK_EXTRACTION_TOTAL_BUDGET_MS=500 per-page
                     cap. PageRegexBudget class tracks cumulative
                     regex time; degrades to mentions on exhaust
                     (deterministic lex order). T24 spike confirms
                     Bun behavior.
  registry.ts        D13 7-tier resolution chain (per-call CLI-only
                     trust-gated → env → per-source-db → brain-db
                     → gbrain.yml → home-config → gbrain-base
                     default). resolvePack walks extends chain with
                     E4 soft-warn-at-4 + hard-cap-at-8. In-memory
                     cache keyed on pack identity (manifest sha8).
  index.ts           Public exports barrel for downstream Phase B
                     refactors.
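The pack identity format `<name>@<version>+<sha8>` can be sketched as follows. The hashing scheme is an assumption (first 8 hex characters of SHA-256 over the raw manifest text); the description only says "sha8". `sha8`, `packIdentity`, and `loadCached` are illustrative names.

```typescript
import { createHash } from "node:crypto";

// Assumption: sha8 = first 8 hex chars of SHA-256 over the raw manifest text.
function sha8(manifestText: string): string {
  return createHash("sha256").update(manifestText, "utf8").digest("hex").slice(0, 8);
}

// `<name>@<version>+<sha8>`: editing the manifest changes the identity even
// if the author forgets to bump the version.
function packIdentity(name: string, version: string, manifestText: string): string {
  return `${name}@${version}+${sha8(manifestText)}`;
}

interface CachedPack {
  identity: string;
  manifestText: string;
}

// In-memory cache keyed on identity, so changed content never hits a stale entry.
const packCache = new Map<string, CachedPack>();

function loadCached(name: string, version: string, manifestText: string): CachedPack {
  const identity = packIdentity(name, version, manifestText);
  let entry = packCache.get(identity);
  if (!entry) {
    entry = { identity, manifestText }; // a real loader would parse + validate here
    packCache.set(identity, entry);
  }
  return entry;
}
```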

Test: 38 cases pinning the contracts (alias graph symmetric per
declaration, E8 adversary-profile-excluded regression, transitive
depth cap, cycle reject at load, CTE deterministic ordering, 7-tier
resolver including D13 trust-gate, YAML round-trip JSON+YAML+flow
sequences, sha8 determinism, primitive defaults). All hermetic; uses
withEnv() per the test-isolation lint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
E6 locked vm.runInContext({timeout: 50}) as the ReDoS guard. E9
required verifying Bun's vm timeout actually interrupts catastrophic
regex before trusting it in production. This spike runs `^(a+)+$`
against 1MB of 'a' to confirm the timeout fires.

Verdict on Bun 1.3.13 (macOS arm64): PASS — vm.runInContext throws
"Script execution timed out after 50ms" within ~507ms wall-clock for
the test pattern. Wall-clock is ~10x configured timeout because Bun
checks timeout at instruction boundaries and tight backtracking loops
yield infrequently. The per-page budget (500ms cumulative in
redos-guard.ts) absorbs this: ONE catastrophic regex burns the budget,
ALL remaining verbs on that page degrade to mentions per design.
Total CPU per page bounded regardless of pathological pattern count.

Re-run this spike on Bun version bumps:
`bun scripts/spike-bun-vm-timeout.ts`. Exit 0 = production path safe;
exit 1 = fall back to E6 option B (persistent worker pool).
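A simplified sketch of the guard using the `node:vm` APIs `vm.createContext` and `vm.runInContext` with a `timeout` option. `PageBudgetSketch` is a stand-in for `PageRegexBudget`; its fields and methods, and the `guardedTest` helper, are hypothetical. A `null` result means "degrade this link to a mention".

```typescript
import * as vm from "node:vm";

const PER_REGEX_TIMEOUT_MS = 50;
const LINK_EXTRACTION_TOTAL_BUDGET_MS = 500;

// Simplified stand-in for PageRegexBudget: accumulates measured regex time.
class PageBudgetSketch {
  private spentMs = 0;
  private readonly totalMs: number;

  constructor(totalMs: number = LINK_EXTRACTION_TOTAL_BUDGET_MS) {
    this.totalMs = totalMs;
  }

  get exhausted(): boolean {
    return this.spentMs >= this.totalMs;
  }

  charge(ms: number): void {
    this.spentMs += ms;
  }
}

// Runs one pack regex inside a fresh vm context with a per-call timeout.
// Returns true/false for match/no-match, or null when the budget is spent,
// the pattern is malformed, or the timeout fires.
function guardedTest(pattern: string, input: string, budget: PageBudgetSketch): boolean | null {
  if (budget.exhausted) return null;
  const context = vm.createContext({ pattern, input });
  const started = Date.now();
  try {
    const matched = vm.runInContext("new RegExp(pattern).test(input)", context, {
      timeout: PER_REGEX_TIMEOUT_MS,
    });
    return matched === true;
  } catch {
    return null; // timeout or SyntaxError from a malformed pattern
  } finally {
    // Charge wall-clock time actually spent, not the nominal timeout.
    budget.charge(Date.now() - started);
  }
}
```

Charging measured elapsed time rather than the configured 50ms matters here: per the spike, one timed-out regex can cost roughly 10x its nominal timeout, and that real cost is what should drain the 500ms page budget.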

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ndidates)

Migration v80 (T3 + codex T10): drops `takes_kind_check` CONSTRAINT
from the takes table on both engines. Pre-v0.38, kind values were
enforced by a closed DB CHECK (fact|take|bet|hunch) AND a closed TS
union. v0.38 widens both layers together — DB CHECK dropped here;
TS type widened in the prior T1 commit. Runtime validation moves to
the active schema pack's annotation primitive `takes_kinds:` field.
Existing brains see no change (gbrain-base seeds the same 4 values);
schema packs extend to {finding, hypothesis, observation, …} per
domain.

Migration v81 (T4 + T28 + E11 inline canonical snapshot): adds
`eval_candidates.schema_pack_per_source JSONB NULL`. Per-row shape:

  {
    "<source_id>": {
      "pack_name": "garry-pack",
      "pack_version": "1.2.0",
      "manifest_sha8": "ab12cd34",
      "alias_closure_resolved": {"person": ["person","researcher"], ...}
    }, ...
  }

The inline `alias_closure_resolved` is the codex F8/E11 fix — replay
self-contained so a pack file deletion can't break a year-old eval.
~1KB per row, ~10MB/year for a heavy user. Pack identity =
`<pack-name>@<version>+<manifest_sha8>` (codex F7). Replay fails
closed on version drift unless `--use-captured-snapshot` is passed.
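The per-row shape and the fail-closed replay rule can be sketched in TypeScript. The interface mirrors the JSON above; `closureForReplay` and its `useCapturedSnapshot` flag are hypothetical stand-ins for the `--use-captured-snapshot` path.

```typescript
// Mirrors the per-row JSON shape of eval_candidates.schema_pack_per_source.
interface SourcePackSnapshot {
  pack_name: string;
  pack_version: string;
  manifest_sha8: string;
  alias_closure_resolved: Record<string, string[]>;
}
type SchemaPackPerSource = Record<string, SourcePackSnapshot>;

function identityOf(s: SourcePackSnapshot): string {
  return `${s.pack_name}@${s.pack_version}+${s.manifest_sha8}`;
}

// Hypothetical replay helper: returns the alias closure to replay with for one
// source, failing closed when the live pack identity drifted from the capture.
function closureForReplay(
  captured: SchemaPackPerSource,
  live: SchemaPackPerSource,
  sourceId: string,
  useCapturedSnapshot: boolean,
): Record<string, string[]> {
  const snapshot = captured[sourceId];
  if (!snapshot) throw new Error(`no captured snapshot for source ${sourceId}`);
  // The captured closure is self-contained, so this works even if the pack file is gone.
  if (useCapturedSnapshot) return snapshot.alias_closure_resolved;
  const current = live[sourceId];
  if (!current || identityOf(current) !== identityOf(snapshot)) {
    const liveId = current ? identityOf(current) : "missing";
    throw new Error(`pack drift on ${sourceId}: captured ${identityOf(snapshot)}, live ${liveId}`);
  }
  return snapshot.alias_closure_resolved;
}
```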

Tests:
  - test/v80-v81-smoke.test.ts (3 cases) — pins the drop + add via
    real PGLite engine round-trip. Inserts a 'finding' kind take
    (pre-v80 would have failed CHECK); verifies the new JSONB column
    accepts the canonical snapshot shape.
  - test/schema-bootstrap-coverage.test.ts — adds
    eval_candidates.schema_pack_per_source to COLUMN_EXEMPTIONS
    (no forward-reference index in PGLITE_SCHEMA_SQL so bootstrap
    probe isn't required).

Numbering: v77 + v78 were claimed by v0.37 waves (skillpack-registry
+ cross-modal). v79 was claimed by v0.37.1.0 brainstorm/lsd. v80 +
v81 are the next available slots.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`src/core/schema-pack/base/gbrain-base.yaml` is the universal starter
pack — every brain inherits gbrain-base by default unless it explicitly
opts out via `extends: null`. Existing brains see ZERO behavior change
after upgrade: the YAML reproduces pre-v0.38 hardcoded behavior
byte-for-byte across:

  - All 22 ALL_PAGE_TYPES seed entries with primitive classifications
    matching the pre-v0.38 inferType + enrichment routing
  - inferType path-prefix table (people/, companies/, deals/, …)
  - inferLinkType verb regexes (founded/invested_in/advises/works_at
    + meeting→attended + image→image_of)
  - FRONTMATTER_LINK_MAP entries (person:company → works_at, etc.)
  - takes_kinds = {fact, take, bet, hunch} (replaces the v41/v48 CHECK)
  - person + company are the only expert_routing defaults (replaces
    whoknows DEFAULT_TYPES + find_experts SQL hardcodes)
  - Empty alias graph (codex F8 + E8 — gbrain-base ships with NO
    alias edges so existing search semantics are unchanged; users
    opt into aliases via schema review-candidates)

scripts/generate-gbrain-base.ts is the codegen validator (T5+T25 +
codex F21 determinism gate). v0.38 ships hand-maintained YAML
validated by this script:
  - Re-loads gbrain-base.yaml and asserts manifest validates
  - Asserts every ALL_PAGE_TYPES seed has a matching page_type entry
  - Asserts re-load produces consistent page_type count
  - Run: `bun scripts/generate-gbrain-base.ts`
  - Exits 0 on PASS, 1 on drift, 2 on script error

test/regressions/gbrain-base-equivalence.test.ts is the CI-blocking
parity gate (8 cases pinning ALL_PAGE_TYPES coverage, takes_kinds
exact match, person+company expert_routing, inferType path mappings,
FRONTMATTER_LINK_MAP key entries, inferLinkType verb regexes, empty
alias graph by default, codegen consistency in-process). If this test
fails, gbrain-base.yaml drifted from the source-of-truth constants.

Loader fix: YAML mini-parser extended to handle flow sequences with
bare words (`[company, companies]`) — previously only accepted
JSON-quoted variants. Tests in T2 commit cover this.
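A minimal sketch of flow-sequence parsing that accepts both bare words and quoted items. This is not the actual loader code: `parseFlowSequence` is a hypothetical name, and nested collections or quoted commas are deliberately out of scope.

```typescript
// Parses a single-line YAML flow sequence of scalars, e.g. `[company, companies]`
// or `["company", "companies"]`. Nested sequences/maps and commas inside
// quotes are not handled in this sketch.
function parseFlowSequence(raw: string): string[] {
  const text = raw.trim();
  if (!text.startsWith("[") || !text.endsWith("]")) {
    throw new Error(`not a flow sequence: ${raw}`);
  }
  const inner = text.slice(1, -1).trim();
  if (inner === "") return [];
  return inner.split(",").map((item) => {
    const v = item.trim();
    const quoted =
      v.length >= 2 &&
      ((v.startsWith('"') && v.endsWith('"')) || (v.startsWith("'") && v.endsWith("'")));
    return quoted ? v.slice(1, -1) : v; // bare words pass through unchanged
  });
}
```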

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dary

E2 promotes the shared distribution surface (tarball, trust-prompt,
registry-client, remote-source, registry-schema, scaffold-third-party)
from src/core/skillpack/ to a named src/core/distribution/ module.
Schema-pack (v0.38) and skillpack (v0.37) both consume these helpers
— the new module makes that reuse explicit instead of forcing
schema-pack code to import from a skillpack-named module.

Physical layout (eng-review E2 Option B): the implementations stay
at src/core/skillpack/ to avoid a big-bang move that would touch ~15
v0.37 callers and risk breaking the just-shipped skillpack pipeline.
src/core/distribution/index.ts is a re-export barrel — schema-pack
imports from the canonical name; v0.37 internals stay where they are.
A v0.39+ pass may physically move the implementations if signal
warrants it.

T26 (codex F6 + F25) — src/core/distribution/ has a strict import
boundary: MAY only import from `../skillpack/` and node built-ins.
Forbidden from importing src/commands/, src/core/schema-pack/,
engines, or config resolution. The boundary is pinned by a source-text
grep in test/distribution-import-boundary.test.ts: if a future edit
adds a forbidden import, the test fails loudly in `bun run verify`
before the bad module shape lands.
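A source-text boundary check along these lines can be sketched with a regex over import specifiers. This assumes the allow-list stated above (`../skillpack/` plus Node built-ins, with or without the `node:` prefix); `findForbiddenImports` is a hypothetical helper, and a plain regex scan like this can be fooled by import-like text inside comments or strings.

```typescript
import { builtinModules } from "node:module";

// Captures specifiers from `... from "x"`, `import "x"`, and `import("x")`.
const SPECIFIER_RE = /(?:\bfrom\s*|\bimport\s*\(?\s*)["']([^"']+)["']/g;

function isAllowedSpecifier(spec: string): boolean {
  if (spec.startsWith("../skillpack/")) return true;
  if (spec.startsWith("node:")) return true;
  return builtinModules.includes(spec); // bare built-ins such as "path"
}

// Returns every import specifier in `sourceText` outside the allow-list.
function findForbiddenImports(sourceText: string): string[] {
  const forbidden: string[] = [];
  for (const match of sourceText.matchAll(SPECIFIER_RE)) {
    const spec = match[1];
    if (spec !== undefined && !isAllowedSpecifier(spec)) forbidden.push(spec);
  }
  return forbidden;
}
```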

Re-exported surface:
  Tarball: extractTarball, packTarball, fileSha256,
    DEFAULT_EXTRACT_CAPS, TarballError, TarballExtractResult,
    TarballPackOptions/Result, ExtractCaps, TarballErrorCode
  Trust: askTrust, renderIdentityBlock, AskTrustOptions,
    SkillpackTier, TrustPromptInput/Decision
  Registry HTTP: loadRegistry, findPack, findPackWithTier,
    searchPacks, resolveRegistryUrl, DEFAULT_REGISTRY_URL,
    DEFAULT_ENDORSEMENTS_URL, RegistryClientError,
    LoadRegistryOptions, LoadedRegistry, RegistryClientErrorCode
  Remote source: resolveSource, classifySpec, RemoteSourceError,
    ResolvedSource, ResolveSourceOptions, SpecKind, ResolvedSourceKind
  Registry schema: REGISTRY_SCHEMA_VERSION (v1),
    ENDORSEMENTS_SCHEMA_VERSION (v1), RegistryCatalog,
    EndorsementsFile, validateRegistryCatalog,
    validateEndorsementsFile, validateRegistryEntry, effectiveTier,
    RegistryEntry, RegistrySource, RegistryBundles, RegistryTier,
    EndorsementRecord, RegistrySchemaError, RegistrySchemaErrorCode
  Scaffold pipeline: runScaffoldThirdParty, defaultStatePath,
    ScaffoldThirdPartyError, ScaffoldThirdPartyOptions/Result/Status

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`src/core/schema-pack/load-active.ts` is the boundary helper Phase B
consumers call from operations.ts + engines + cycle handlers. It
composes:
  1. 7-tier resolution chain (registry.resolveActivePackName)
  2. Disk-backed pack manifest loading
     - gbrain-base from bundled src/core/schema-pack/base/gbrain-base.yaml
     - User packs from ~/.gbrain/schema-packs/<name>/pack.{yaml,yml,json}
  3. extends-chain resolution + alias-graph build (registry.resolvePack)

Returns a `ResolvedPack` with stable pack identity
(`<name>@<version>+<manifest_sha8>`). In-process cached by identity;
cache invalidated by manifest content change.

Trust gate: per-call schema_pack opt (tier 1) is honored ONLY when
`remote === false`. Operations.ts handles the explicit
permission_denied rejection for remote callers BEFORE invoking this
helper (T8). This loader assumes the input is already-vetted.

Test seam: `__setPackLocatorForTests(locator)` lets tests inject
synthetic packs without writing to ~/.gbrain. Paired
`_resetPackLocatorForTests` in afterAll prevents leak across files.
`resolveActivePackNameOnly` returns just the name + tier source for
`gbrain schema active` provenance display without paying the load cost.

config.ts: GBrainConfig gains `schema_pack?: string` (tier-6 file-plane
field). Edit ~/.gbrain/config.json directly; tier 4 (`gbrain config
set schema_pack <name>`) writes the DB plane and beats the file.
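The 7-tier precedence can be sketched as a first-defined-wins walk. Tier numbering follows the registry.ts description; `ResolutionInputs` and `resolvePackNameSketch` are hypothetical, not the real `registry.resolveActivePackName` signature.

```typescript
type TierSource =
  | "per-call"
  | "env"
  | "per-source-db"
  | "brain-db"
  | "gbrain.yml"
  | "home-config"
  | "default";

interface ResolutionInputs {
  perCall?: string; // tier 1: CLI-only, honored only when remote === false
  remote?: boolean;
  env?: string; // tier 2: GBRAIN_SCHEMA_PACK
  perSourceDb?: string; // tier 3
  brainDb?: string; // tier 4: `gbrain config set schema_pack`
  gbrainYml?: string; // tier 5
  homeConfig?: string; // tier 6: ~/.gbrain/config.json schema_pack
}

interface Resolution {
  name: string;
  tier: number;
  source: TierSource;
}

function resolvePackNameSketch(inputs: ResolutionInputs): Resolution {
  if (inputs.perCall !== undefined) {
    // Fail closed: an undefined `remote` is treated as untrusted.
    if (inputs.remote !== false) {
      throw new Error("permission_denied: per-call schema_pack is CLI-only");
    }
    return { name: inputs.perCall, tier: 1, source: "per-call" };
  }
  const chain: Array<[string | undefined, TierSource]> = [
    [inputs.env, "env"],
    [inputs.perSourceDb, "per-source-db"],
    [inputs.brainDb, "brain-db"],
    [inputs.gbrainYml, "gbrain.yml"],
    [inputs.homeConfig, "home-config"],
  ];
  for (const [i, [name, source]] of chain.entries()) {
    if (name) return { name, tier: i + 2, source };
  }
  return { name: "gbrain-base", tier: 7, source: "default" };
}
```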

Test: 9 cases covering default-tier-7 gbrain-base load, tier-1
per-call resolution, tier-1 trust gate rejection on remote=true,
tier-2 GBRAIN_SCHEMA_PACK env override (via withEnv()), tier-3
per-source DB config priority, UnknownPackError when pack missing,
injected locator end-to-end with a tempfile-backed pack, identity
stability across reloads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`src/core/schema-pack/op-trust-gate.ts` is the operations-layer
defense for the per-call `schema_pack` opt (tier 1 of the 7-tier
resolution chain in registry.ts). D13 + codex F4 — remote/MCP callers
passing `schema_pack` even with read+write scope could broaden their
effective read closure or escape strict-mode validation. The
v0.26.9 + v0.34.1.0 trust-boundary hardening waves explicitly closed
this attack class for source_id; v0.38 re-applies the same posture.

Two exports:
  validateSchemaPackTrustGate(ctx, schemaPackParam) — pure validator
    that returns the validated pack name or undefined; throws
    SchemaPackTrustGateError (code: 'permission_denied') on:
      - ctx.remote !== false AND schemaPackParam is set (fail-closed)
      - schemaPackParam is non-string + non-null/undefined
    Op handlers call this once at entry against their declared params.

  loadActivePackForOp(ctx, params) — convenience wrapper that does
    the trust gate AND loads the resolved active pack in one call.
    Threads sourceId from sourceScopeOpts(ctx) into the resolution.
    Returns ResolvedPack.

Fail-closed default per v0.26.9 F7b: `ctx.remote === undefined` is
treated as remote/untrusted. Only the literal `false` is the CLI
escape hatch. Casts via `as any` or `Partial<>` spreads can't downgrade
trust by accident.
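A minimal reimplementation of the validator contract described above, assuming a context with an optional `remote` flag. The error class body and message text are illustrative; the fail-closed `remote !== false` rule, the no-op on `undefined`/`null`, and the `permission_denied` code come from the description.

```typescript
class SchemaPackTrustGateError extends Error {
  readonly code = "permission_denied";

  constructor(message: string) {
    super(message);
    this.name = "SchemaPackTrustGateError";
  }
}

interface OpContext {
  remote?: boolean;
}

function validateSchemaPackTrustGate(ctx: OpContext, schemaPackParam: unknown): string | undefined {
  // No per-call override requested: nothing to gate.
  if (schemaPackParam === undefined || schemaPackParam === null) return undefined;
  if (typeof schemaPackParam !== "string") {
    throw new SchemaPackTrustGateError("schema_pack must be a string");
  }
  // Only the literal `false` is trusted; `undefined` falls through to rejection.
  if (ctx.remote !== false) {
    throw new SchemaPackTrustGateError(
      "per-call schema_pack is CLI-only; use gbrain.yml, GBRAIN_SCHEMA_PACK, " +
        "~/.gbrain/config.json, or `gbrain config set schema_pack` instead",
    );
  }
  return schemaPackParam;
}
```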

Test (test/schema-pack-trust-boundary.test.ts, 8 cases):
  - CLI (remote=false) accepts per-call freely
  - MCP (remote=true) rejects with SchemaPackTrustGateError
  - Fail-closed: undefined remote rejects
  - undefined/null per-call is a no-op (returns undefined)
  - Non-string per-call rejects with type error
  - Error envelope carries `code: 'permission_denied'` for the
    dispatch layer to surface uniformly
  - Error message names ALL safe channels (gbrain.yml,
    GBRAIN_SCHEMA_PACK env, ~/.gbrain/config.json, `gbrain config
    set schema_pack`) so an MCP operator can self-serve.

The wider op-handler wiring (each query/search/list_pages/find_experts/
traverse/put_page handler calling loadActivePackForOp + threading the
pack into engine queries) lands in T6/T7 alongside the per-source CTE
and inferType refactors. T8 lands the trust gate primitive in
isolation so future handler-by-handler wiring stays mechanical.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`inferTypeFromPack(filePath, manifest)` is the new pack-aware path →
type primitive. Async/import-aware callers (import-file.ts, sync.ts,
cycle phases) can switch to this variant in subsequent commits to
honor user-declared types in their active pack. Existing
`inferType(filePath)` stays as a synchronous wrapper around the
GBRAIN_BASE_PATH_PREFIXES table that mirrors gbrain-base.yaml exactly.

Caught a real parity bug in gbrain-base.yaml: the YAML emitted page
types in ALL_PAGE_TYPES order, but pre-v0.38 inferType ran in a
SPECIFIC PRIORITY ORDER. `projects/blog/writing/essay.md` should
resolve to `writing` (writing/ wins over projects/ as a stronger
signal), but pack-driven iteration in ALL_PAGE_TYPES order returned
`project` first. Reorder gbrain-base.yaml so the priority chain
preserves pre-v0.38 behavior:

  1. writing → wiki/{analysis,guides,hardware,architecture} → concept
     (wiki subtypes scan FIRST; stronger signal than ancestor dirs)
  2. Ancestor entities: person/company/deal/yc/civic/project/source/media
  3. BrainBench v1 amara-life-v1 corpus: email/slack/calendar-event/note/meeting
  4. No-prefix types (set via frontmatter): code/image/synthesis
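The priority-ordered path inference can be sketched as a first-match scan over rules in manifest order, which is why reordering the YAML fixed the bug. The rule shape and matching semantics (any directory in the path, case-insensitive, multi-segment dirs such as `wiki/analysis` allowed) are assumptions; `inferTypeFromRules` and the `analysis` type used in the checks are hypothetical.

```typescript
interface PathPrefixRule {
  type: string;
  dirs: string[]; // directory names that signal this type
}

// First rule (in manifest order) whose directory appears in the path wins,
// so rule order IS the priority chain.
function inferTypeFromRules(filePath: string | undefined, rules: PathPrefixRule[]): string | null {
  if (!filePath) return null;
  const parts = filePath.toLowerCase().split("/");
  parts.pop(); // drop the filename; only directories count
  const dirPath = `/${parts.join("/")}/`;
  for (const rule of rules) {
    for (const dir of rule.dirs) {
      const needle = `/${dir.toLowerCase().replace(/^\/+|\/+$/g, "")}/`;
      if (dirPath.includes(needle)) return rule.type;
    }
  }
  return null; // no prefix match: type must come from frontmatter
}
```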

Parity is now CI-pinned by test/infer-type-pack.test.ts which:
  - asserts inferTypeFromPack(path, gbrain-base) matches parseMarkdown's
    legacy type inference for 21 representative paths
  - verifies a synthetic research pack with `researchers/` + `papers/`
    routes correctly to user-declared types
  - verifies empty `page_types` arrays fall back to gbrain-base defaults
  - covers undefined filePath + case-insensitive matching

gbrain-base-equivalence.test.ts continues to pass (the path-prefix
spot-checks didn't care about ordering — they just verified each
mapping exists).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`src/core/schema-pack/link-inference.ts` adds two new primitives:
  inferLinkTypeFromPack(pack, pageType, context, budget?)
  frontmatterLinkTypeFromPack(pack, pageType, fieldName)

Pre-v0.38 `inferLinkType` (in link-extraction.ts) uses richly tuned
production regexes (FOUNDED_RE / INVESTED_RE / ADVISES_RE / WORKS_AT_RE
+ page-role priors) refined against real brain content. Reproducing
those literally in gbrain-base.yaml would require multi-line YAML
escape jujitsu and lose the WHY comments. Pragmatic split:

  - gbrain-base.yaml carries verb NAMES + simplified sketch regexes.
    Community-pack authors copy this pattern; gbrain-base provides
    documentation-grade examples.
  - Production matching for built-in verbs stays in link-extraction.ts
    via the rich FOUNDED_RE / INVESTED_RE / ... constants. Legacy
    `inferLinkType` continues to work exactly as before.
  - `inferLinkTypeFromPack` CONSULTS pack-declared verbs in addition
    to legacy. Pack matches win (user opts in deliberately); fall
    through to legacy `inferLinkType` when no pack rule fires.

Resolution order in inferLinkTypeFromPack:
  1. Page-type-bound verbs from pack (meeting → attended,
     image → image_of). Declared via inference.page_type.
  2. Pack-declared regex matchers, in manifest declaration order
     (first match wins). Runs under PageRegexBudget when one is
     passed — cumulative regex time on the page stays capped at
     LINK_EXTRACTION_TOTAL_BUDGET_MS (500ms) per E9.
  3. Returns null on no match — caller falls through to legacy
     `inferLinkType` for built-in matchers (founded / invested_in /
     advises / works_at + person→company priors).

Malformed regex in a pack returns null gracefully (skip + continue
to next link_type) — defense in depth on top of load-time validation.
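The resolution order can be sketched as below. The `LinkTypeDecl` fields (`pageType`, `regex`) are hypothetical stand-ins for the manifest's `inference.page_type` binding and regex matcher entries, the case-insensitive flag is an assumption, and the per-page `PageRegexBudget` accounting is omitted for brevity.

```typescript
interface LinkTypeDecl {
  verb: string;
  pageType?: string; // page-type binding, e.g. meeting -> attended
  regex?: string; // matcher over the link's surrounding context text
}

interface LinkPack {
  link_types: LinkTypeDecl[];
}

function inferLinkTypeSketch(pack: LinkPack, pageType: string, context: string): string | null {
  // 1. Page-type-bound verbs.
  for (const lt of pack.link_types) {
    if (lt.pageType !== undefined && lt.pageType === pageType) return lt.verb;
  }
  // 2. Regex matchers in declaration order; first match wins.
  for (const lt of pack.link_types) {
    if (lt.regex === undefined) continue;
    let re: RegExp;
    try {
      re = new RegExp(lt.regex, "i");
    } catch {
      continue; // malformed pattern: skip and try the next link type
    }
    if (re.test(context)) return lt.verb;
  }
  // 3. No pack rule fired: caller falls through to legacy inferLinkType.
  return null;
}
```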

frontmatterLinkTypeFromPack mirrors the legacy FRONTMATTER_LINK_MAP
walk: iterates pack.frontmatter_links in declaration order; first
(page_type, field) match wins; returns null on no match.
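The frontmatter walk is a first-match scan over `(page_type, field)` pairs in declaration order. A sketch with a hypothetical entry shape:

```typescript
interface FrontmatterLinkDecl {
  page_type: string;
  field: string;
  verb: string;
}

// Mirrors the legacy FRONTMATTER_LINK_MAP walk: first (page_type, field)
// match in declaration order wins; null means no pack rule applies.
function frontmatterLinkTypeSketch(
  links: FrontmatterLinkDecl[],
  pageType: string,
  fieldName: string,
): string | null {
  const hit = links.find((fl) => fl.page_type === pageType && fl.field === fieldName);
  return hit ? hit.verb : null;
}
```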

Test (test/link-inference-pack.test.ts, 10 cases):
  - meeting → attended via page_type binding
  - image → image_of via page_type binding
  - regex matchers: supports / weakens / cites
  - returns null when no rule fires (caller fall-through contract)
  - declaration order: first match wins
  - PageRegexBudget integration (regex time accounted toward cap)
  - legacy inferLinkType still resolves founded / invested_in /
    advises independently (pack-aware path doesn't break legacy)
  - malformed regex returns null gracefully
  - frontmatterLinkTypeFromPack: person:company → works_at,
    meeting:attendees → attended, plus negative cases

Phase B follow-up: callers in extract.ts / sync.ts / cycle phases
that want to honor user-declared verbs call inferLinkTypeFromPack
first then inferLinkType. T7b lands the primitive; per-call-site
adoption is mechanical.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`expertTypesFromPack(pack)` returns the list of pack-declared types
with `expert_routing: true`, in manifest declaration order. Replaces
the pre-v0.38 hardcoded `DEFAULT_TYPES = ['person', 'company']` in
whoknows.ts:89 (and the matching ['person','company'] literals in
postgres-engine.ts:3451+3482 and pglite-engine.ts:3489+3523 — codex
finding #3's named sites).

gbrain-base preserves person + company as expert_routing defaults, so
existing whoknows behavior is byte-for-byte unchanged. Research packs
declaring `researcher` + `principal-investigator` with
`expert_routing: true` get those types routed automatically.

Two variants:
  expertTypesFromPack(pack) — returns array, possibly empty
  expertTypesFromPackOrThrow(pack) — throws clear error on empty so
    the whoknows CLI entrypoint surfaces "this pack doesn't support
    expert routing — switch packs or edit the manifest" instead of
    silently returning zero results
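Both variants reduce to a filter that keeps manifest order. A sketch assuming a hypothetical `page_types` entry shape with an `expert_routing` flag:

```typescript
interface PageTypeDecl {
  name: string;
  expert_routing?: boolean;
}

interface Pack {
  name: string;
  page_types: PageTypeDecl[];
}

function expertTypesFromPack(pack: Pack): string[] {
  // filter() preserves manifest declaration order; no sorting.
  return pack.page_types.filter((t) => t.expert_routing === true).map((t) => t.name);
}

function expertTypesFromPackOrThrow(pack: Pack): string[] {
  const types = expertTypesFromPack(pack);
  if (types.length === 0) {
    throw new Error(
      `pack "${pack.name}" doesn't support expert routing: switch packs or edit the manifest`,
    );
  }
  return types;
}
```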

Test (test/expert-types-pack.test.ts, 6 cases):
  - gbrain-base parity: returns [person, company]
  - Research pack: returns researcher + principal-investigator
  - Declaration order preserved (NOT sorted)
  - Empty array when no expert_routing types declared
  - OrThrow variant throws on empty with paste-ready hint
  - OrThrow variant passes when types exist

Phase B follow-up wires whoknows.ts + postgres-engine + pglite-engine
to call expertTypesFromPack(activePack) instead of the hardcoded
DEFAULT_TYPES literal. T_W lands the primitive in isolation; per-call-
site adoption is mechanical and per-engine.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds `extractableTypesFromPack(pack)` + `isExtractableType(pack, type)`
primitives. Replaces the hardcoded ELIGIBLE_TYPES list at
src/core/facts/eligibility.ts:51 with pack-driven lookup. gbrain-base
preserves the exact 7 legacy types — note, meeting, slack, email,
calendar-event, source, writing — so existing facts extraction
behavior is byte-for-byte unchanged.
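A sketch of the two extractable-type primitives, assuming each `page_types` entry carries an `extractable` boolean (the entry shape and function bodies are illustrative):

```typescript
interface PageTypeDecl {
  name: string;
  extractable?: boolean;
}

interface Pack {
  page_types: PageTypeDecl[];
}

function extractableTypesFromPack(pack: Pack): Set<string> {
  return new Set(pack.page_types.filter((t) => t.extractable === true).map((t) => t.name));
}

function isExtractableType(pack: Pack, type: string): boolean {
  return extractableTypesFromPack(pack).has(type);
}
```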

Also fixes gbrain-base.yaml extractable flags. The original codegen
emitted incorrect defaults (person/company/deal marked extractable,
note/slack/email/calendar-event/source/writing marked NOT extractable).
Adjusted to match the legacy ELIGIBLE_TYPES list exactly:
  - writing: true (was false)
  - source: true (was false)
  - email: true (was false)
  - slack: true (was false)
  - calendar-event: true (was false)
  - note: true (was false)
  - meeting: true (was already true)
  - person/company/deal: false (entities, not facts-eligible content)

Tests (test/extractable-pack.test.ts, 4 cases):
  - gbrain-base extractable Set exactly matches legacy 7 types
  - Per-type isExtractableType lookups parity
  - research-state pack with paper + claim + finding extractable
  - Empty page_types returns empty Set

Phase B follow-up wires facts/eligibility.ts to call
extractableTypesFromPack(activePack) instead of the hardcoded
ELIGIBLE_TYPES literal. T7d lands the primitive in isolation; per-call-
site adoption is mechanical.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`enrichableTypesFromPack(pack)` + `rubricNameForType(pack, type)` primitives
replace the hardcoded ['person', 'company', 'deal'] list in
src/core/enrichment-service.ts:25 and the RUBRICS_BY_TYPE map in
src/core/enrichment/completeness.ts:221.

gbrain-base preserves person + company + deal as enrichable defaults
with rubric slots person-default / company-default / deal-default —
existing enrichment behavior unchanged. Custom packs (research-state,
legal, product) override with domain-specific entities.

Design note: the pack manifest declares rubric NAMES, not rubric
BODIES. Rubric implementations stay in-source at
src/core/enrichment/completeness.ts where they're authored
deterministically. Serializing rubric structure into YAML would
require multi-page schemas; the name-to-implementation indirection
keeps the YAML manifest small and rubric authoring stays where
linters + tests already cover it.
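The name-to-implementation indirection can be sketched as a lookup table keyed by rubric name. The rubric bodies here (simple field-coverage scores) and the frontmatter keys they check are hypothetical; only the slot names `person-default`, `company-default`, and `deal-default` come from the text above.

```typescript
type Rubric = (frontmatter: Record<string, unknown>) => number;

// Hypothetical rubric body: fraction of required frontmatter keys present.
function fieldCoverage(required: string[]): Rubric {
  return (fm) => required.filter((k) => fm[k] !== undefined && fm[k] !== "").length / required.length;
}

// Rubric implementations live in code; the manifest only names them.
const RUBRIC_IMPLEMENTATIONS: Record<string, Rubric> = {
  "person-default": fieldCoverage(["name", "company", "role"]),
  "company-default": fieldCoverage(["name", "website"]),
  "deal-default": fieldCoverage(["company", "stage"]),
};

interface PageTypeDecl {
  name: string;
  enrichable?: boolean;
  rubric?: string;
}

interface Pack {
  page_types: PageTypeDecl[];
}

function enrichableTypesFromPack(pack: Pack): string[] {
  return pack.page_types.filter((t) => t.enrichable === true).map((t) => t.name);
}

function rubricNameForType(pack: Pack, type: string): string | null {
  const decl = pack.page_types.find((t) => t.name === type && t.enrichable === true);
  return decl?.rubric ?? null;
}

// Resolve the declared name to code; an unknown name is a manifest error.
function rubricForType(pack: Pack, type: string): Rubric | null {
  const name = rubricNameForType(pack, type);
  if (name === null) return null;
  const impl = RUBRIC_IMPLEMENTATIONS[name];
  if (!impl) throw new Error(`pack names unknown rubric "${name}"`);
  return impl;
}
```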

Test (test/enrichable-pack.test.ts, 4 cases):
  - gbrain-base parity: person + company + deal enrichable
  - rubricNameForType returns the declared slot name
  - returns null for non-enrichable types
  - custom research pack overrides cleanly

Phase B follow-up wires enrichment-service + completeness.ts to call
enrichableTypesFromPack(activePack) instead of the hardcoded literal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User-facing CLI surface that exposes the v0.38 schema-pack engine.
Five essential subcommands ship in v0.38:

  gbrain schema active                Show resolved pack + tier source
  gbrain schema list                  List bundled + installed packs
  gbrain schema show [<pack>]         Pretty-print manifest (default: active)
  gbrain schema validate [<pack>]     Validate manifest shape
  gbrain schema use <pack>            Activate pack (file-plane, tier 6)

Deferred to v0.39+ (mechanical follow-up — primitives are in place):
  init, fork, edit, diff, detect, suggest, review-candidates,
  review-orphans, graph, lint, explain

`gbrain schema use <name>` writes to ~/.gbrain/config.json's schema_pack
field (tier 6 in the 7-tier resolution chain). DB-plane tier 4
(`gbrain config set schema_pack <name>`) and env tier 2
(GBRAIN_SCHEMA_PACK) still beat tier 6 for runtime overrides without
editing the file.

Dispatch lives in handleCliOnly (no engine connect needed — schema
commands are pure file I/O). Added 'schema' to CLI_ONLY allowlist
so the dispatcher doesn't reject it.

The `use` path runs validation BEFORE writing — refuses to activate
a malformed pack. The `show` and `validate` commands accept either an
explicit pack name or default to the active pack.

Test (test/schema-cli.test.ts, 8 cases via Bun subprocess):
  - list shows bundled gbrain-base
  - show gbrain-base prints 22 page types + 12 link verbs + takes_kinds
  - validate gbrain-base passes
  - active reports default resolution + pack identity
  - unknown pack errors with paste-ready hint
  - unknown subcommand exits 2 with usage hint
  - `schema use` without arg shows usage

End-to-end smoke against the real bundled gbrain-base.yaml.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.38.0.0 — Schema Packs: Bring Your Own Shape

PageType opens from closed 23-element union to `string`. Schema packs
declare your domain (types, link verbs, expert routing, facts
eligibility, enrichment rubrics) and the engine consults the active
pack instead of hardcoded literals.

Phase A (engine flex foundation) + Phase B foundational primitives +
Phase C minimal CLI surface, all landed as 16 atomic bisect-friendly
commits. 95+ new tests across 12 test files. Existing brains see
zero change after upgrade (gbrain-base reproduces pre-v0.38 behavior
byte-for-byte).

16 decisions locked through CEO + Eng + 3x Outside Voice review.
58 codex findings folded.

Phase B per-call-site wiring, Phase C CLI follow-ups (detect/suggest/
init/fork/diff/graph/lint/explain), and Phase D (7 example packs +
distribution + docs) follow in subsequent waves. Primitives are in
place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
#	src/core/migrate.ts
# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json