v0.37.3.0 feat: skill_brain_first doctor check + auto-fix + declarative opt-out (supersedes #1206) by garrytan · Pull Request #1215 · garrytan/gbrain

garrytan · 2026-05-20T03:23:53Z

Summary

Supersedes PR #1206 (cathedral wave). Doctor now scans every SKILL.md for external-lookup tools (web_search / web_fetch / exa / perplexity / happenstance / crustdata / captain_api / firecrawl) and warns when the skill has no brain-first compliance signal. gbrain doctor --fix auto-inserts the canonical > **Convention:** see [conventions/brain-first.md](...) callout via a new dry-fix.ts MISSING_RULE_PATTERNS extension (sharing safety gates with the existing REPLACE patterns).

Motivated by the 2026-05-19 tweet-shield incident: cross-modal eval flagged Garry's Palantir tweet as risky because no model knew he built it — but the brain already had "designed the entire Finance product UI" and "150+ PSDs from April-December 2006."

Structural changes (vs PR #1206's minimal version):

Hardcoded 40-name EXEMPT_SKILLS allowlist → structural-signal inference + declarative brain_first: exempt frontmatter opt-out
Bypassed resolver → canonical autoDetectSkillsDirReadOnly() 4-tier fallback
Flat message → structured Check.issues[] array with rich fix payload
No auto-fix → dry-fix.ts MISSING_RULE_PATTERNS with full safety gates (working-tree-clean, code-fence, install-path, proximity)
No regression prevention → scaffold pre-inserts canonical callout + skillify check fails (exit 1) on external+no-callout+no-exempt
No audit → snapshot+diff JSONL at ~/.gbrain/audit/skill-brain-first-YYYY-Www.jsonl (transition-only writes; stable brains = 0 lines/run)

Key design decisions (locked via /plan-eng-review + codex outside-voice review):

A1: only brain_first: exempt ships (no required / n/a)
A2: snapshot+diff audit, transition-only
A3: scaffold pre-insert + skillify check fail
A4: position-relative gate is BODY-ONLY (frontmatter tools: [web_search] doesn't false-flag)
Q1: one pure analyzeSkillBrainFirst() helper, three consumers
CMT1: no upgrade migration — doctor surfaces hint, --fix applies via dry-fix safety gates (user stays in loop)
CMT2: dropped tools+writes_pages exemption rule (was hiding mixed-class skills like idea-ingest, meeting-ingestion, data-research)

Test Coverage

COVERAGE: 62/62 paths tested (100%)  |  Code paths: 50  |  E2E: 12  |  EVAL: 0 (structural)
QUALITY: ★★★ on the position-relative gate, snapshot+diff, regression set
GAPS: 0

56 unit cases over 9 fixtures absorb PR feat: add skill_brain_first doctor check #1206's 10 inline test strings (IRON-RULE regression preservation)
12 E2E cases cover the --fix dry-run/apply/idempotency cycle, shape assertions, audit transition signal
170 related existing tests (filing-audit, dry-fix, doctor) pass unchanged
8053 unit tests across 8 shards + 22 serial files pass post-merge; 0 failures
Tests: 1247 → 1259 (+12 new test files including 9 fixture SKILL.md)

Pre-Landing Review

CLEARED via /plan-eng-review on this branch — 9 findings resolved (4 architecture, 3 code quality, 2 test). 0 critical gaps flagged. Outside-voice via codex (gpt-5.5 high-reasoning) found 18 issues — 15 absorbed as plan bake-ins; 3 elevated to cross-model tensions resolved by user (CMT1/CMT2/CMT3, all picked codex direction).

Design Review

No frontend files changed — design review skipped.

Eval Results

No prompt-related files changed — evals skipped.

Greptile Review

No Greptile comments yet (PR being created).

Scope Drift

CLEAN — intent was "implement skill_brain_first cathedral wave"; delivered exactly that. No out-of-scope changes. 2 mid-implementation skill frontmatter edits (functional-area-resolver, strategic-reading gained brain_first: exempt) validate the declarative opt-out mechanism in production code.

Plan Completion

PLAN COMPLETION AUDIT
═══════════════════════════════
Plan: ~/.claude/plans/system-instruction-you-are-working-humming-quail.md

## Implementation Tasks
  [DONE] T1 — src/core/skill-frontmatter.ts shared parser
  [DONE] T2 — src/core/skill-brain-first.ts pure analyzer + FORMERLY_HARDCODED_EXEMPT
  [DONE] T3 — src/core/skill-fix-gates.ts safety primitives extracted
  [DONE] T4 — src/core/dry-fix.ts MISSING_RULE_PATTERNS + INSERT expander
  [DONE] T5 — src/core/audit-skill-brain-first.ts snapshot+diff JSONL
  [DONE] T6 — src/commands/doctor.ts skillBrainFirstCheck orchestrator
  [DONE] T7 — skillify required item 12 + scaffold pre-insert
  [DONE] T8 — skills/conventions/brain-first.md declarative opt-out section
  [DONE] T9 — fixture corpus (9 SKILL.md) + 56-case unit suite
  [DONE] T10 — E2E + live OpenClaw script + CI guard wired into verify
  [DONE] T11 — TODOS.md (3 wave follow-ups: runtime gate, trend doctor, regex tightening)
  [DONE] T12 — VERSION + package.json + CHANGELOG bumped to v0.37.1.0
  [DEFERRED] T13 — close PR #1206 (happens after this merges per CMT3)

─────────────────────────────────
COMPLETION: 12/13 DONE, 1 DEFERRED (post-merge action), 0 NOT DONE, 0 UNVERIFIABLE
─────────────────────────────────

Verification Results

Skipped — no dev server running, no UI surface. The wave is CLI-only (gbrain doctor).

TODOS

Three new wave follow-ups added to TODOS.md:

v0.37+: Runtime brain-first gate at MCP dispatch (full wave; closes tweet-shield root cause at enforcement layer)
v0.36.x: Audit trend doctor check skill_brain_first_trend (~30 LOC; ships once audit data accumulates weeks of history)
v0.36.x: Tighten external-lookup regex to reduce false-positive rate on name mentions

Documentation

CLAUDE.md updated for the v0.37.1.0 skill_brain_first wave: four new Key Files entries (skill-frontmatter.ts, skill-brain-first.ts, skill-fix-gates.ts, audit-skill-brain-first.ts); four extended entries (filing-audit.ts, dry-fix.ts, doctor.ts, skillify-check.ts); two new test inventory lines covering the unit + E2E suites. llms-full.txt regenerated per the mandatory chaser rule.

README, CHANGELOG, TODOS, and VERSION required no edits — CHANGELOG already shipped with the canonical ELI10 lead + "To take advantage of v0.37.1.0" block, TODOS already carried the three wave follow-ups, README only references gbrain doctor at a high level (the new check is a sub-surface).

Test plan

bun run typecheck clean
bun run verify clean (15 checks + typecheck, including new check:skill-brain-first gate)
bun test test/skill-brain-first.test.ts — 56 unit cases pass
bun test test/e2e/skill-brain-first.test.ts — 12 E2E cases pass
bun test test/filing-audit.test.ts test/dry-fix.test.ts test/doctor*.test.ts — 170 cases pass unchanged
bun run test — 8053 unit tests across 8 shards + 22 serial files, 0 failures
Self-dogfood: GBRAIN_SKILLS_DIR=$(pwd)/skills gbrain doctor --json reports skill_brain_first: ok (43 skills compliant or exempt)
Master merged in (v0.37.0.0 skillpack-ecosystem wave) post-conflict-resolution

Post-merge actions

Close PR feat: add skill_brain_first doctor check #1206 with Co-Authored-By: garrytan-agents credit + comment naming this commit (T13 from plan, deferred per CMT3)

🤖 Generated with Claude Code

…ve opt-out Cathedral wave superseding PR #1206. Doctor now scans every SKILL.md for external-lookup tools (web_search / web_fetch / exa / perplexity / happenstance / crustdata / captain_api / firecrawl) and warns when the skill has no brain- first compliance signal. gbrain doctor --fix auto-inserts the canonical > **Convention:** see [conventions/brain-first.md](...) callout via the dry-fix.ts MISSING_RULE_PATTERNS extension (sharing safety gates with the existing REPLACE patterns). Motivated by the 2026-05-19 tweet-shield incident: cross-modal eval flagged Garry's Palantir tweet as risky because no model knew he built it, but the brain already had "designed the entire Finance product UI" and "150+ PSDs from April-December 2006." Static check catches authorship; v0.37+ runtime gate (filed in TODOS.md) closes the dispatch side. Key design decisions locked via /plan-eng-review + codex outside-voice review: - A1: frontmatter ships only brain_first: exempt (no required/n/a enum) - A2: snapshot+diff audit at ~/.gbrain/audit/skill-brain-first-YYYY-Www.jsonl with transition-only writes (stable brains = 0 lines/run) - A3: scaffold template pre-inserts callout; skillify check fails (exit 1) on external + no callout + no exempt - A4: position-relative gate is BODY-ONLY (frontmatter tools: [web_search] declaration doesn't false-flag the skill) - Q1: single pure analyzeSkillBrainFirst() helper consumed by 3 surfaces - CMT1: no upgrade migration — doctor surfaces hint, --fix applies via dry-fix safety gates (user stays in loop) - CMT2: dropped tools+writes_pages auto-exemption (was hiding mixed-class skills like idea-ingest/meeting-ingestion/data-research) Trio: VERSION + package.json + CHANGELOG aligned at 0.37.1.0. 56 unit cases + 12 E2E cases pass. 170 related existing tests pass unchanged. Self-dogfood: gbrain doctor against this repo's skills/ reports skill_brain_first: ok across 43 skills (compliant or exempt). functional-area-resolver and strategic-reading skills gained brain_first: exempt to validate the declarative opt-out in production code (both name perplexity in dispatcher prose without calling it). Co-Authored-By: garrytan-agents <noreply@github.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Reconciled v0.37.1.0 bump against master's v0.37.0.0 skillpack-ecosystem wave. VERSION + package.json + CHANGELOG.md keep brain-first wave's claim (0.37.1.0) above master's entries; llms-full.txt regenerated against the merged docs. Verify gate + brain-first unit + E2E suites green post-merge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Added Key Files entries for the four new modules: - src/core/skill-frontmatter.ts (shared parser) - src/core/skill-brain-first.ts (analyzer + FORMERLY_HARDCODED_EXEMPT) - src/core/skill-fix-gates.ts (extracted safety primitives) - src/core/audit-skill-brain-first.ts (snapshot+diff JSONL) Extended existing entries: - src/core/filing-audit.ts: rewired to shared parser - src/core/dry-fix.ts: MISSING_RULE_PATTERNS INSERT pattern type - src/commands/doctor.ts: skill_brain_first check + tweet-shield framing - src/commands/skillify-check.ts: required item 12 + scaffold pre-insert Added test inventory entries: - test/skill-brain-first.test.ts (56 unit cases) - test/e2e/skill-brain-first.test.ts (12 E2E cases) Regenerated llms-full.txt via bun run build:llms. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nnect CI run #76881161092 failed because scripts/check-skill-brain-first.sh invoked plain `gbrain doctor --json`, which routes through connectEngine(). With no ~/.gbrain/config.json present (CI's case — runner is bun-only, no brain init), connectEngine() exits 1 with "No brain configured." and emits zero stdout. The python parser sees an empty file and returns parse_error, failing the verify gate. Fix: pass --fast to doctor. --fast routes through runDoctor(null, ...) which runs the filesystem-only check set (resolver_health, skill_conformance, skill_brain_first) and emits the standard single-line JSON envelope the parser expects. skill_brain_first is filesystem-only by design (scans SKILL.md, no DB touch), so --fast is the correct knob, not a workaround. Verified by reproducing the CI failure mode locally with GBRAIN_HOME=/tmp/empty-... — gate now passes both with and without a configured brain. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR #1214 (brainstorm + lsd) claimed v0.37.1.0 concurrently with #1215. Skipping 0.37.2.0 leaves a buffer for #1214's adjacent slot. Trio (VERSION + package.json + CHANGELOG header + inline "To take advantage of v0.37.3.0" block) aligned at 0.37.3.0. No behavior changes — version metadata only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PRs #1214 and #1215 both claim v0.37.1.0; bumping past to the next free slot. Migration v79 renamed `takes_unresolvable_quality_v0_37_0_1` → `takes_unresolvable_quality_v0_37_2_0`. VERSION + package.json + CHANGELOG + llms bundles + inline doc references all swept.

Master landed PR #1214 (brainstorm + lsd) as v0.37.1.0. Wave's v0.37.3.0 claim sits cleanly above it in the CHANGELOG with both entries preserved. VERSION + package.json + CHANGELOG header all at 0.37.3.0; master's v0.37.1.0 entry slots in as the second-most- recent release. llms-full.txt regenerated against the merged docs. Verify gate + 114 brain-first + adjacent unit/E2E cases green post-merge under CI-simulated GBRAIN_HOME=/tmp/empty-... env. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…1211) * v0.36.1.1 hotfix: takes_resolution_consistency CHECK accepts 'unresolvable' Unblocks production grading scripts that write the judge's 4th verdict type. Before this fix, every quality='unresolvable' INSERT/UPDATE hit a CHECK violation — 0 of 34 writes landed in a recent prod run. Migration v74 widens BOTH: - takes_resolution_consistency (table-level CHECK) — admits the ('unresolvable', NULL) pair alongside the existing 4 legal shapes - resolved_quality column-level CHECK — drops the auto-generated name from v37, re-adds as takes_resolved_quality_values with the 4-state enum Backward compatible. Existing rows with quality IN (NULL, 'correct', 'incorrect', 'partial') all satisfy the new CHECKs unchanged. TakesScorecard gains sibling fields unresolvable_count + unresolvable_rate; the existing `resolved` field deliberately keeps its 3-state meaning so historical scorecards compare apples-to-apples (T1c sibling-field design from the eng review). Pinned by: - test/takes-resolution.test.ts — R1-R5 round-trip - test/migrate.test.ts — v74 structural assertions + PGLite E2E suite exercising all valid + invalid (quality, outcome) shapes Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(e2e/schema-drift): reset public schema in beforeAll to isolate from caller bootstrap state Previously the test trusted caller-provided DATABASE_URL to point at a fresh database. CLAUDE.md's E2E lifecycle prescribes 'gbrain doctor --json' as the bootstrap step (needed by oauth-related tests for table creation), but doctor configures the gateway and bakes the configured embedding model into content_chunks.model DEFAULT during the initial CREATE TABLE. On re-run, CREATE TABLE IF NOT EXISTS is a no-op and the bootstrapped default sticks. PGLite (always fresh-in-memory) gets the unconfigured-gateway fallback 'text-embedding-3-large'. The test reported phantom drift: pg.default="'zembed-1'::text" pglite.default="'text-embedding-3-large'::text" Fix: DROP SCHEMA public CASCADE + CREATE SCHEMA public before pg.initSchema. Resets every table/index/sequence/constraint added by prior tooling. The PGLite side is already fresh-per-test by construction. Verified order-independent: - Fresh DB → 6/0 pass - After 'gbrain doctor' bootstrap → 6/0 pass - Full E2E suite (mechanical + schema-drift) → 84/0 pass Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(safety): gate schema-drift DROP SCHEMA + relax TakesScorecard interface Two findings from Codex adversarial review on the v0.37.0.1 hotfix: 1. **DROP SCHEMA safety gate (P0).** test/e2e/schema-drift.test.ts had an unguarded DROP SCHEMA public CASCADE. A developer running the E2E with DATABASE_URL pointing at a real brain or staging DB would lose the entire public schema. The fix: triple-check before destruction. - Parse the DATABASE_URL hostname + db name - Allow reset only when: explicit GBRAIN_TEST_DB=1 OR (localhost host AND test-shaped db name like gbrain_test, *_test, test_*, *_e2e) - Refuse otherwise with a loud paste-ready warning - The test still proceeds (the parity check is the fail-safe — if the caller already had a fresh DB, parity passes; if not, parity fails LOUDLY instead of nuking their data) Verified all three branches: localhost+gbrain_test resets (6/0 pass); localhost+production_brain refuses + warns (6/0 pass against pre-existing schema); GBRAIN_TEST_DB=1 override on production_brain name allows reset. 2. **TakesScorecard interface compat.** Making `unresolvable_count` + `unresolvable_rate` required fields on the public TakesScorecard interface broke downstream SDK consumers who construct scorecard fixtures (gbrain-evals, custom engines). The hotfix shouldn't impose a compile-break on hotfix users. Fix: make both fields optional (`?: number` / `?: number | null`). `finalizeScorecard` still always populates them, so all internal code sees the real values. External fixtures that omit them compile cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(safety,ux): tighten DROP SCHEMA gate + surface unresolvable in scorecard CLI Second codex adversarial pass on v0.37.0.1 surfaced two residual findings. **P0 — Safety gate still bypassable.** First-pass safety gate used `explicitOptIn || (isLocalhost && looksLikeTestDb)` — meaning `GBRAIN_TEST_DB=1` bypassed BOTH the host check AND the db-name check. Someone running the E2E with that env set against a production DATABASE_URL would still nuke their schema. Codex re-flagged it as P0. Tightened logic: `looksLikeTestDb && (isLocalhost || ciOptIn)`. The db-name pattern is now the hard floor — `gbrain_test`, `*_test`, `test_*`, `*_e2e`. GBRAIN_TEST_DB=1 only relaxes the localhost requirement (for CI service-name hosts). Setting the env on a DATABASE_URL pointing at `production_data` is explicitly refused with a paste-ready message naming the failed check. Verified 3 ways: - gbrain_test + localhost → resets (6/0 pass) - production_data + GBRAIN_TEST_DB=1 → REFUSES with clear message - foo_e2e + GBRAIN_TEST_DB=1 → resets (test-shaped name passes) **P2 — gbrain takes scorecard hides the unresolvable signal.** Early-return on `resolved === 0` was triggered before the new sibling fields rendered. A brain with only `quality='unresolvable'` verdicts — the spec's whole production case — printed "No resolved bets yet" and exited. The unresolvable_rate field was unreachable from the human CLI unless the user knew to pass `--json`. Fix: gate the early-return on `resolved === 0 AND unresolvable_count === 0`. Render `unresolvable` count + `unresolvable_rate` alongside `partial_rate` when present. Threshold warn at 30% (mirrors PARTIAL_RATE_WARNING_THRESHOLD) pointing at retrieval coverage, not prediction accuracy — the actionable read for high-unresolvable brains. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: renumber v0.37.0.1 → v0.37.2.0 (v0.37.1.0 claimed by other PRs) PRs #1214 and #1215 both claim v0.37.1.0; bumping past to the next free slot. Migration v79 renamed `takes_unresolvable_quality_v0_37_0_1` → `takes_unresolvable_quality_v0_37_2_0`. VERSION + package.json + CHANGELOG + llms bundles + inline doc references all swept. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Master landed PR #1211 (takes_resolution_consistency CHECK accepts 'unresolvable') as v0.37.2.0. Wave's v0.37.3.0 claim still sits one slot above. CHANGELOG preserves both entries with master's v0.37.2.0 slotting in as the second-most-recent release. llms-full.txt regenerated against merged docs. Verify gate + 114 brain-first + adjacent unit/E2E cases green post-merge under CI-simulated GBRAIN_HOME=/tmp/empty-... env. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…view fix (#1267) * v0.37.6.0 feat: voyage-code-3 discoverability + reindex-code cost-preview fix For agents indexing source code with gbrain, the right embedding model is now obvious — and the brain tells you so out loud. Decision tree + Topology 3 doc + Topology 3 setup-skill pointer + runtime stderr nudge from `gbrain reindex --code` against non-code-tuned models. Same diff fixes the stale hardcoded `text-embedding-3-large` in the cost preview that would have made the nudge land badly. Tests: 3 new files (recipe regression, nudge logic + CLI integration, cost-preview IRON-RULE regression). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * todos: file pre-existing doctor-report-remote score regression (#1215 follow-up) Noticed during /ship of v0.37.6.0. The skill_brain_first check added in v0.37.3.0 (#1215) appears to tank the doctor health score on fresh PGLite test brains, causing test/doctor-report-remote.test.ts:65 to fail with health_score: 50 (expects >=70). Pre-existing on master; not in this branch's touch surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

garrytan and others added 3 commits May 19, 2026 20:06

garrytan mentioned this pull request May 20, 2026

feat: add skill_brain_first doctor check #1206

Closed

garrytan and others added 2 commits May 19, 2026 20:29

garrytan changed the title ~~v0.37.1.0 feat: skill_brain_first doctor check + auto-fix + declarative opt-out (supersedes #1206)~~ v0.37.3.0 feat: skill_brain_first doctor check + auto-fix + declarative opt-out (supersedes #1206) May 20, 2026

garrytan merged commit 772253e into master May 20, 2026
7 checks passed

garrytan mentioned this pull request May 21, 2026

v0.37.8.0 feat: voyage-code-3 discoverability + reindex-code cost-preview fix #1267

Merged

9 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.37.3.0 feat: skill_brain_first doctor check + auto-fix + declarative opt-out (supersedes #1206)#1215

v0.37.3.0 feat: skill_brain_first doctor check + auto-fix + declarative opt-out (supersedes #1206)#1215
garrytan merged 7 commits into
masterfrom
garrytan/islamabad-v3

garrytan commented May 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

garrytan commented May 20, 2026

Summary

Test Coverage

Pre-Landing Review

Design Review

Eval Results

Greptile Review

Scope Drift

Plan Completion

Verification Results

TODOS

Documentation

Test plan

Post-merge actions

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant