v0.37.3.0 feat: skill_brain_first doctor check + auto-fix + declarative opt-out (supersedes #1206)#1215
Merged
Merged
Conversation
…ve opt-out Cathedral wave superseding PR #1206. Doctor now scans every SKILL.md for external-lookup tools (web_search / web_fetch / exa / perplexity / happenstance / crustdata / captain_api / firecrawl) and warns when the skill has no brain- first compliance signal. gbrain doctor --fix auto-inserts the canonical > **Convention:** see [conventions/brain-first.md](...) callout via the dry-fix.ts MISSING_RULE_PATTERNS extension (sharing safety gates with the existing REPLACE patterns). Motivated by the 2026-05-19 tweet-shield incident: cross-modal eval flagged Garry's Palantir tweet as risky because no model knew he built it, but the brain already had "designed the entire Finance product UI" and "150+ PSDs from April-December 2006." Static check catches authorship; v0.37+ runtime gate (filed in TODOS.md) closes the dispatch side. Key design decisions locked via /plan-eng-review + codex outside-voice review: - A1: frontmatter ships only brain_first: exempt (no required/n/a enum) - A2: snapshot+diff audit at ~/.gbrain/audit/skill-brain-first-YYYY-Www.jsonl with transition-only writes (stable brains = 0 lines/run) - A3: scaffold template pre-inserts callout; skillify check fails (exit 1) on external + no callout + no exempt - A4: position-relative gate is BODY-ONLY (frontmatter tools: [web_search] declaration doesn't false-flag the skill) - Q1: single pure analyzeSkillBrainFirst() helper consumed by 3 surfaces - CMT1: no upgrade migration — doctor surfaces hint, --fix applies via dry-fix safety gates (user stays in loop) - CMT2: dropped tools+writes_pages auto-exemption (was hiding mixed-class skills like idea-ingest/meeting-ingestion/data-research) Trio: VERSION + package.json + CHANGELOG aligned at 0.37.1.0. 56 unit cases + 12 E2E cases pass. 170 related existing tests pass unchanged. Self-dogfood: gbrain doctor against this repo's skills/ reports skill_brain_first: ok across 43 skills (compliant or exempt). functional-area-resolver and strategic-reading skills gained brain_first: exempt to validate the declarative opt-out in production code (both name perplexity in dispatcher prose without calling it). Co-Authored-By: garrytan-agents <noreply@github.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reconciled v0.37.1.0 bump against master's v0.37.0.0 skillpack-ecosystem wave. VERSION + package.json + CHANGELOG.md keep brain-first wave's claim (0.37.1.0) above master's entries; llms-full.txt regenerated against the merged docs. Verify gate + brain-first unit + E2E suites green post-merge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Added Key Files entries for the four new modules: - src/core/skill-frontmatter.ts (shared parser) - src/core/skill-brain-first.ts (analyzer + FORMERLY_HARDCODED_EXEMPT) - src/core/skill-fix-gates.ts (extracted safety primitives) - src/core/audit-skill-brain-first.ts (snapshot+diff JSONL) Extended existing entries: - src/core/filing-audit.ts: rewired to shared parser - src/core/dry-fix.ts: MISSING_RULE_PATTERNS INSERT pattern type - src/commands/doctor.ts: skill_brain_first check + tweet-shield framing - src/commands/skillify-check.ts: required item 12 + scaffold pre-insert Added test inventory entries: - test/skill-brain-first.test.ts (56 unit cases) - test/e2e/skill-brain-first.test.ts (12 E2E cases) Regenerated llms-full.txt via bun run build:llms. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nnect CI run #76881161092 failed because scripts/check-skill-brain-first.sh invoked plain `gbrain doctor --json`, which routes through connectEngine(). With no ~/.gbrain/config.json present (CI's case — runner is bun-only, no brain init), connectEngine() exits 1 with "No brain configured." and emits zero stdout. The python parser sees an empty file and returns parse_error, failing the verify gate. Fix: pass --fast to doctor. --fast routes through runDoctor(null, ...) which runs the filesystem-only check set (resolver_health, skill_conformance, skill_brain_first) and emits the standard single-line JSON envelope the parser expects. skill_brain_first is filesystem-only by design (scans SKILL.md, no DB touch), so --fast is the correct knob, not a workaround. Verified by reproducing the CI failure mode locally with GBRAIN_HOME=/tmp/empty-... — gate now passes both with and without a configured brain. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #1214 (brainstorm + lsd) claimed v0.37.1.0 concurrently with #1215. Skipping 0.37.2.0 leaves a buffer for #1214's adjacent slot. Trio (VERSION + package.json + CHANGELOG header + inline "To take advantage of v0.37.3.0" block) aligned at 0.37.3.0. No behavior changes — version metadata only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Master landed PR #1214 (brainstorm + lsd) as v0.37.1.0. Wave's v0.37.3.0 claim sits cleanly above it in the CHANGELOG with both entries preserved. VERSION + package.json + CHANGELOG header all at 0.37.3.0; master's v0.37.1.0 entry slots in as the second-most- recent release. llms-full.txt regenerated against the merged docs. Verify gate + 114 brain-first + adjacent unit/E2E cases green post-merge under CI-simulated GBRAIN_HOME=/tmp/empty-... env. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan
added a commit
that referenced
this pull request
May 20, 2026
…1211) * v0.36.1.1 hotfix: takes_resolution_consistency CHECK accepts 'unresolvable' Unblocks production grading scripts that write the judge's 4th verdict type. Before this fix, every quality='unresolvable' INSERT/UPDATE hit a CHECK violation — 0 of 34 writes landed in a recent prod run. Migration v74 widens BOTH: - takes_resolution_consistency (table-level CHECK) — admits the ('unresolvable', NULL) pair alongside the existing 4 legal shapes - resolved_quality column-level CHECK — drops the auto-generated name from v37, re-adds as takes_resolved_quality_values with the 4-state enum Backward compatible. Existing rows with quality IN (NULL, 'correct', 'incorrect', 'partial') all satisfy the new CHECKs unchanged. TakesScorecard gains sibling fields unresolvable_count + unresolvable_rate; the existing `resolved` field deliberately keeps its 3-state meaning so historical scorecards compare apples-to-apples (T1c sibling-field design from the eng review). Pinned by: - test/takes-resolution.test.ts — R1-R5 round-trip - test/migrate.test.ts — v74 structural assertions + PGLite E2E suite exercising all valid + invalid (quality, outcome) shapes Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(e2e/schema-drift): reset public schema in beforeAll to isolate from caller bootstrap state Previously the test trusted caller-provided DATABASE_URL to point at a fresh database. CLAUDE.md's E2E lifecycle prescribes 'gbrain doctor --json' as the bootstrap step (needed by oauth-related tests for table creation), but doctor configures the gateway and bakes the configured embedding model into content_chunks.model DEFAULT during the initial CREATE TABLE. On re-run, CREATE TABLE IF NOT EXISTS is a no-op and the bootstrapped default sticks. PGLite (always fresh-in-memory) gets the unconfigured-gateway fallback 'text-embedding-3-large'. The test reported phantom drift: pg.default="'zembed-1'::text" pglite.default="'text-embedding-3-large'::text" Fix: DROP SCHEMA public CASCADE + CREATE SCHEMA public before pg.initSchema. Resets every table/index/sequence/constraint added by prior tooling. The PGLite side is already fresh-per-test by construction. Verified order-independent: - Fresh DB → 6/0 pass - After 'gbrain doctor' bootstrap → 6/0 pass - Full E2E suite (mechanical + schema-drift) → 84/0 pass Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(safety): gate schema-drift DROP SCHEMA + relax TakesScorecard interface Two findings from Codex adversarial review on the v0.37.0.1 hotfix: 1. **DROP SCHEMA safety gate (P0).** test/e2e/schema-drift.test.ts had an unguarded DROP SCHEMA public CASCADE. A developer running the E2E with DATABASE_URL pointing at a real brain or staging DB would lose the entire public schema. The fix: triple-check before destruction. - Parse the DATABASE_URL hostname + db name - Allow reset only when: explicit GBRAIN_TEST_DB=1 OR (localhost host AND test-shaped db name like gbrain_test, *_test, test_*, *_e2e) - Refuse otherwise with a loud paste-ready warning - The test still proceeds (the parity check is the fail-safe — if the caller already had a fresh DB, parity passes; if not, parity fails LOUDLY instead of nuking their data) Verified all three branches: localhost+gbrain_test resets (6/0 pass); localhost+production_brain refuses + warns (6/0 pass against pre-existing schema); GBRAIN_TEST_DB=1 override on production_brain name allows reset. 2. **TakesScorecard interface compat.** Making `unresolvable_count` + `unresolvable_rate` required fields on the public TakesScorecard interface broke downstream SDK consumers who construct scorecard fixtures (gbrain-evals, custom engines). The hotfix shouldn't impose a compile-break on hotfix users. Fix: make both fields optional (`?: number` / `?: number | null`). `finalizeScorecard` still always populates them, so all internal code sees the real values. External fixtures that omit them compile cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(safety,ux): tighten DROP SCHEMA gate + surface unresolvable in scorecard CLI Second codex adversarial pass on v0.37.0.1 surfaced two residual findings. **P0 — Safety gate still bypassable.** First-pass safety gate used `explicitOptIn || (isLocalhost && looksLikeTestDb)` — meaning `GBRAIN_TEST_DB=1` bypassed BOTH the host check AND the db-name check. Someone running the E2E with that env set against a production DATABASE_URL would still nuke their schema. Codex re-flagged it as P0. Tightened logic: `looksLikeTestDb && (isLocalhost || ciOptIn)`. The db-name pattern is now the hard floor — `gbrain_test`, `*_test`, `test_*`, `*_e2e`. GBRAIN_TEST_DB=1 only relaxes the localhost requirement (for CI service-name hosts). Setting the env on a DATABASE_URL pointing at `production_data` is explicitly refused with a paste-ready message naming the failed check. Verified 3 ways: - gbrain_test + localhost → resets (6/0 pass) - production_data + GBRAIN_TEST_DB=1 → REFUSES with clear message - foo_e2e + GBRAIN_TEST_DB=1 → resets (test-shaped name passes) **P2 — gbrain takes scorecard hides the unresolvable signal.** Early-return on `resolved === 0` was triggered before the new sibling fields rendered. A brain with only `quality='unresolvable'` verdicts — the spec's whole production case — printed "No resolved bets yet" and exited. The unresolvable_rate field was unreachable from the human CLI unless the user knew to pass `--json`. Fix: gate the early-return on `resolved === 0 AND unresolvable_count === 0`. Render `unresolvable` count + `unresolvable_rate` alongside `partial_rate` when present. Threshold warn at 30% (mirrors PARTIAL_RATE_WARNING_THRESHOLD) pointing at retrieval coverage, not prediction accuracy — the actionable read for high-unresolvable brains. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: renumber v0.37.0.1 → v0.37.2.0 (v0.37.1.0 claimed by other PRs) PRs #1214 and #1215 both claim v0.37.1.0; bumping past to the next free slot. Migration v79 renamed `takes_unresolvable_quality_v0_37_0_1` → `takes_unresolvable_quality_v0_37_2_0`. VERSION + package.json + CHANGELOG + llms bundles + inline doc references all swept. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Master landed PR #1211 (takes_resolution_consistency CHECK accepts 'unresolvable') as v0.37.2.0. Wave's v0.37.3.0 claim still sits one slot above. CHANGELOG preserves both entries with master's v0.37.2.0 slotting in as the second-most-recent release. llms-full.txt regenerated against merged docs. Verify gate + 114 brain-first + adjacent unit/E2E cases green post-merge under CI-simulated GBRAIN_HOME=/tmp/empty-... env. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
9 tasks
garrytan
added a commit
that referenced
this pull request
May 21, 2026
…view fix (#1267) * v0.37.6.0 feat: voyage-code-3 discoverability + reindex-code cost-preview fix For agents indexing source code with gbrain, the right embedding model is now obvious — and the brain tells you so out loud. Decision tree + Topology 3 doc + Topology 3 setup-skill pointer + runtime stderr nudge from `gbrain reindex --code` against non-code-tuned models. Same diff fixes the stale hardcoded `text-embedding-3-large` in the cost preview that would have made the nudge land badly. Tests: 3 new files (recipe regression, nudge logic + CLI integration, cost-preview IRON-RULE regression). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * todos: file pre-existing doctor-report-remote score regression (#1215 follow-up) Noticed during /ship of v0.37.6.0. The skill_brain_first check added in v0.37.3.0 (#1215) appears to tank the doctor health score on fresh PGLite test brains, causing test/doctor-report-remote.test.ts:65 to fail with health_score: 50 (expects >=70). Pre-existing on master; not in this branch's touch surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Supersedes PR #1206 (cathedral wave). Doctor now scans every SKILL.md for external-lookup tools (
web_search/web_fetch/exa/perplexity/happenstance/crustdata/captain_api/firecrawl) and warns when the skill has no brain-first compliance signal.gbrain doctor --fixauto-inserts the canonical> **Convention:** see [conventions/brain-first.md](...)callout via a newdry-fix.ts MISSING_RULE_PATTERNSextension (sharing safety gates with the existing REPLACE patterns).Motivated by the 2026-05-19 tweet-shield incident: cross-modal eval flagged Garry's Palantir tweet as risky because no model knew he built it — but the brain already had "designed the entire Finance product UI" and "150+ PSDs from April-December 2006."
Structural changes (vs PR #1206's minimal version):
EXEMPT_SKILLSallowlist → structural-signal inference + declarativebrain_first: exemptfrontmatter opt-outautoDetectSkillsDirReadOnly()4-tier fallbackCheck.issues[]array with richfixpayloaddry-fix.tsMISSING_RULE_PATTERNS with full safety gates (working-tree-clean, code-fence, install-path, proximity)~/.gbrain/audit/skill-brain-first-YYYY-Www.jsonl(transition-only writes; stable brains = 0 lines/run)Key design decisions (locked via /plan-eng-review + codex outside-voice review):
brain_first: exemptships (norequired/n/a)tools: [web_search]doesn't false-flag)analyzeSkillBrainFirst()helper, three consumers--fixapplies via dry-fix safety gates (user stays in loop)tools+writes_pagesexemption rule (was hiding mixed-class skills likeidea-ingest,meeting-ingestion,data-research)Test Coverage
--fixdry-run/apply/idempotency cycle, shape assertions, audit transition signalPre-Landing Review
CLEARED via /plan-eng-review on this branch — 9 findings resolved (4 architecture, 3 code quality, 2 test). 0 critical gaps flagged. Outside-voice via codex (gpt-5.5 high-reasoning) found 18 issues — 15 absorbed as plan bake-ins; 3 elevated to cross-model tensions resolved by user (CMT1/CMT2/CMT3, all picked codex direction).
Design Review
No frontend files changed — design review skipped.
Eval Results
No prompt-related files changed — evals skipped.
Greptile Review
No Greptile comments yet (PR being created).
Scope Drift
CLEAN — intent was "implement skill_brain_first cathedral wave"; delivered exactly that. No out-of-scope changes. 2 mid-implementation skill frontmatter edits (
functional-area-resolver,strategic-readinggainedbrain_first: exempt) validate the declarative opt-out mechanism in production code.Plan Completion
Verification Results
Skipped — no dev server running, no UI surface. The wave is CLI-only (
gbrain doctor).TODOS
Three new wave follow-ups added to
TODOS.md:skill_brain_first_trend(~30 LOC; ships once audit data accumulates weeks of history)Documentation
CLAUDE.md updated for the v0.37.1.0 skill_brain_first wave: four new Key Files entries (
skill-frontmatter.ts,skill-brain-first.ts,skill-fix-gates.ts,audit-skill-brain-first.ts); four extended entries (filing-audit.ts,dry-fix.ts,doctor.ts,skillify-check.ts); two new test inventory lines covering the unit + E2E suites.llms-full.txtregenerated per the mandatory chaser rule.README, CHANGELOG, TODOS, and VERSION required no edits — CHANGELOG already shipped with the canonical ELI10 lead + "To take advantage of v0.37.1.0" block, TODOS already carried the three wave follow-ups, README only references
gbrain doctorat a high level (the new check is a sub-surface).Test plan
bun run typecheckcleanbun run verifyclean (15 checks + typecheck, including newcheck:skill-brain-firstgate)bun test test/skill-brain-first.test.ts— 56 unit cases passbun test test/e2e/skill-brain-first.test.ts— 12 E2E cases passbun test test/filing-audit.test.ts test/dry-fix.test.ts test/doctor*.test.ts— 170 cases pass unchangedbun run test— 8053 unit tests across 8 shards + 22 serial files, 0 failuresGBRAIN_SKILLS_DIR=$(pwd)/skills gbrain doctor --jsonreportsskill_brain_first: ok(43 skills compliant or exempt)Post-merge actions
Co-Authored-By: garrytan-agentscredit + comment naming this commit (T13 from plan, deferred per CMT3)🤖 Generated with Claude Code