Skip to content

v0.37.3.0 feat: skill_brain_first doctor check + auto-fix + declarative opt-out (supersedes #1206)#1215

Merged
garrytan merged 7 commits into
masterfrom
garrytan/islamabad-v3
May 20, 2026
Merged

v0.37.3.0 feat: skill_brain_first doctor check + auto-fix + declarative opt-out (supersedes #1206)#1215
garrytan merged 7 commits into
masterfrom
garrytan/islamabad-v3

Conversation

@garrytan
Copy link
Copy Markdown
Owner

Summary

Supersedes PR #1206 (cathedral wave). Doctor now scans every SKILL.md for external-lookup tools (web_search / web_fetch / exa / perplexity / happenstance / crustdata / captain_api / firecrawl) and warns when the skill has no brain-first compliance signal. gbrain doctor --fix auto-inserts the canonical > **Convention:** see [conventions/brain-first.md](...) callout via a new dry-fix.ts MISSING_RULE_PATTERNS extension (sharing safety gates with the existing REPLACE patterns).

Motivated by the 2026-05-19 tweet-shield incident: cross-modal eval flagged Garry's Palantir tweet as risky because no model knew he built it — but the brain already had "designed the entire Finance product UI" and "150+ PSDs from April-December 2006."

Structural changes (vs PR #1206's minimal version):

  • Hardcoded 40-name EXEMPT_SKILLS allowlist → structural-signal inference + declarative brain_first: exempt frontmatter opt-out
  • Bypassed resolver → canonical autoDetectSkillsDirReadOnly() 4-tier fallback
  • Flat message → structured Check.issues[] array with rich fix payload
  • No auto-fix → dry-fix.ts MISSING_RULE_PATTERNS with full safety gates (working-tree-clean, code-fence, install-path, proximity)
  • No regression prevention → scaffold pre-inserts canonical callout + skillify check fails (exit 1) on external+no-callout+no-exempt
  • No audit → snapshot+diff JSONL at ~/.gbrain/audit/skill-brain-first-YYYY-Www.jsonl (transition-only writes; stable brains = 0 lines/run)

Key design decisions (locked via /plan-eng-review + codex outside-voice review):

  • A1: only brain_first: exempt ships (no required / n/a)
  • A2: snapshot+diff audit, transition-only
  • A3: scaffold pre-insert + skillify check fail
  • A4: position-relative gate is BODY-ONLY (frontmatter tools: [web_search] doesn't false-flag)
  • Q1: one pure analyzeSkillBrainFirst() helper, three consumers
  • CMT1: no upgrade migration — doctor surfaces hint, --fix applies via dry-fix safety gates (user stays in loop)
  • CMT2: dropped tools+writes_pages exemption rule (was hiding mixed-class skills like idea-ingest, meeting-ingestion, data-research)

Test Coverage

COVERAGE: 62/62 paths tested (100%)  |  Code paths: 50  |  E2E: 12  |  EVAL: 0 (structural)
QUALITY: ★★★ on the position-relative gate, snapshot+diff, regression set
GAPS: 0
  • 56 unit cases over 9 fixtures absorb PR feat: add skill_brain_first doctor check #1206's 10 inline test strings (IRON-RULE regression preservation)
  • 12 E2E cases cover the --fix dry-run/apply/idempotency cycle, shape assertions, audit transition signal
  • 170 related existing tests (filing-audit, dry-fix, doctor) pass unchanged
  • 8053 unit tests across 8 shards + 22 serial files pass post-merge; 0 failures
  • Tests: 1247 → 1259 (+12 new test files including 9 fixture SKILL.md)

Pre-Landing Review

CLEARED via /plan-eng-review on this branch — 9 findings resolved (4 architecture, 3 code quality, 2 test). 0 critical gaps flagged. Outside-voice via codex (gpt-5.5 high-reasoning) found 18 issues — 15 absorbed as plan bake-ins; 3 elevated to cross-model tensions resolved by user (CMT1/CMT2/CMT3, all picked codex direction).

Design Review

No frontend files changed — design review skipped.

Eval Results

No prompt-related files changed — evals skipped.

Greptile Review

No Greptile comments yet (PR being created).

Scope Drift

CLEAN — intent was "implement skill_brain_first cathedral wave"; delivered exactly that. No out-of-scope changes. 2 mid-implementation skill frontmatter edits (functional-area-resolver, strategic-reading gained brain_first: exempt) validate the declarative opt-out mechanism in production code.

Plan Completion

PLAN COMPLETION AUDIT
═══════════════════════════════
Plan: ~/.claude/plans/system-instruction-you-are-working-humming-quail.md

## Implementation Tasks
  [DONE] T1 — src/core/skill-frontmatter.ts shared parser
  [DONE] T2 — src/core/skill-brain-first.ts pure analyzer + FORMERLY_HARDCODED_EXEMPT
  [DONE] T3 — src/core/skill-fix-gates.ts safety primitives extracted
  [DONE] T4 — src/core/dry-fix.ts MISSING_RULE_PATTERNS + INSERT expander
  [DONE] T5 — src/core/audit-skill-brain-first.ts snapshot+diff JSONL
  [DONE] T6 — src/commands/doctor.ts skillBrainFirstCheck orchestrator
  [DONE] T7 — skillify required item 12 + scaffold pre-insert
  [DONE] T8 — skills/conventions/brain-first.md declarative opt-out section
  [DONE] T9 — fixture corpus (9 SKILL.md) + 56-case unit suite
  [DONE] T10 — E2E + live OpenClaw script + CI guard wired into verify
  [DONE] T11 — TODOS.md (3 wave follow-ups: runtime gate, trend doctor, regex tightening)
  [DONE] T12 — VERSION + package.json + CHANGELOG bumped to v0.37.1.0
  [DEFERRED] T13 — close PR #1206 (happens after this merges per CMT3)

─────────────────────────────────
COMPLETION: 12/13 DONE, 1 DEFERRED (post-merge action), 0 NOT DONE, 0 UNVERIFIABLE
─────────────────────────────────

Verification Results

Skipped — no dev server running, no UI surface. The wave is CLI-only (gbrain doctor).

TODOS

Three new wave follow-ups added to TODOS.md:

  • v0.37+: Runtime brain-first gate at MCP dispatch (full wave; closes tweet-shield root cause at enforcement layer)
  • v0.36.x: Audit trend doctor check skill_brain_first_trend (~30 LOC; ships once audit data accumulates weeks of history)
  • v0.36.x: Tighten external-lookup regex to reduce false-positive rate on name mentions

Documentation

CLAUDE.md updated for the v0.37.1.0 skill_brain_first wave: four new Key Files entries (skill-frontmatter.ts, skill-brain-first.ts, skill-fix-gates.ts, audit-skill-brain-first.ts); four extended entries (filing-audit.ts, dry-fix.ts, doctor.ts, skillify-check.ts); two new test inventory lines covering the unit + E2E suites. llms-full.txt regenerated per the mandatory chaser rule.

README, CHANGELOG, TODOS, and VERSION required no edits — CHANGELOG already shipped with the canonical ELI10 lead + "To take advantage of v0.37.1.0" block, TODOS already carried the three wave follow-ups, README only references gbrain doctor at a high level (the new check is a sub-surface).

Test plan

  • bun run typecheck clean
  • bun run verify clean (15 checks + typecheck, including new check:skill-brain-first gate)
  • bun test test/skill-brain-first.test.ts — 56 unit cases pass
  • bun test test/e2e/skill-brain-first.test.ts — 12 E2E cases pass
  • bun test test/filing-audit.test.ts test/dry-fix.test.ts test/doctor*.test.ts — 170 cases pass unchanged
  • bun run test — 8053 unit tests across 8 shards + 22 serial files, 0 failures
  • Self-dogfood: GBRAIN_SKILLS_DIR=$(pwd)/skills gbrain doctor --json reports skill_brain_first: ok (43 skills compliant or exempt)
  • Master merged in (v0.37.0.0 skillpack-ecosystem wave) post-conflict-resolution

Post-merge actions

🤖 Generated with Claude Code

garrytan and others added 3 commits May 19, 2026 20:06
…ve opt-out

Cathedral wave superseding PR #1206. Doctor now scans every SKILL.md for
external-lookup tools (web_search / web_fetch / exa / perplexity / happenstance
/ crustdata / captain_api / firecrawl) and warns when the skill has no brain-
first compliance signal. gbrain doctor --fix auto-inserts the canonical
> **Convention:** see [conventions/brain-first.md](...) callout via the
dry-fix.ts MISSING_RULE_PATTERNS extension (sharing safety gates with the
existing REPLACE patterns).

Motivated by the 2026-05-19 tweet-shield incident: cross-modal eval flagged
Garry's Palantir tweet as risky because no model knew he built it, but the
brain already had "designed the entire Finance product UI" and "150+ PSDs
from April-December 2006." Static check catches authorship; v0.37+ runtime
gate (filed in TODOS.md) closes the dispatch side.

Key design decisions locked via /plan-eng-review + codex outside-voice review:
- A1: frontmatter ships only brain_first: exempt (no required/n/a enum)
- A2: snapshot+diff audit at ~/.gbrain/audit/skill-brain-first-YYYY-Www.jsonl
  with transition-only writes (stable brains = 0 lines/run)
- A3: scaffold template pre-inserts callout; skillify check fails (exit 1)
  on external + no callout + no exempt
- A4: position-relative gate is BODY-ONLY (frontmatter tools: [web_search]
  declaration doesn't false-flag the skill)
- Q1: single pure analyzeSkillBrainFirst() helper consumed by 3 surfaces
- CMT1: no upgrade migration — doctor surfaces hint, --fix applies via
  dry-fix safety gates (user stays in loop)
- CMT2: dropped tools+writes_pages auto-exemption (was hiding mixed-class
  skills like idea-ingest/meeting-ingestion/data-research)

Trio: VERSION + package.json + CHANGELOG aligned at 0.37.1.0. 56 unit cases
+ 12 E2E cases pass. 170 related existing tests pass unchanged. Self-dogfood:
gbrain doctor against this repo's skills/ reports skill_brain_first: ok
across 43 skills (compliant or exempt). functional-area-resolver and
strategic-reading skills gained brain_first: exempt to validate the
declarative opt-out in production code (both name perplexity in dispatcher
prose without calling it).

Co-Authored-By: garrytan-agents <noreply@github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reconciled v0.37.1.0 bump against master's v0.37.0.0 skillpack-ecosystem
wave. VERSION + package.json + CHANGELOG.md keep brain-first wave's claim
(0.37.1.0) above master's entries; llms-full.txt regenerated against the
merged docs. Verify gate + brain-first unit + E2E suites green post-merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Added Key Files entries for the four new modules:
- src/core/skill-frontmatter.ts (shared parser)
- src/core/skill-brain-first.ts (analyzer + FORMERLY_HARDCODED_EXEMPT)
- src/core/skill-fix-gates.ts (extracted safety primitives)
- src/core/audit-skill-brain-first.ts (snapshot+diff JSONL)

Extended existing entries:
- src/core/filing-audit.ts: rewired to shared parser
- src/core/dry-fix.ts: MISSING_RULE_PATTERNS INSERT pattern type
- src/commands/doctor.ts: skill_brain_first check + tweet-shield framing
- src/commands/skillify-check.ts: required item 12 + scaffold pre-insert

Added test inventory entries:
- test/skill-brain-first.test.ts (56 unit cases)
- test/e2e/skill-brain-first.test.ts (12 E2E cases)

Regenerated llms-full.txt via bun run build:llms.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan and others added 2 commits May 19, 2026 20:29
…nnect

CI run #76881161092 failed because scripts/check-skill-brain-first.sh
invoked plain `gbrain doctor --json`, which routes through connectEngine().
With no ~/.gbrain/config.json present (CI's case — runner is bun-only,
no brain init), connectEngine() exits 1 with "No brain configured." and
emits zero stdout. The python parser sees an empty file and returns
parse_error, failing the verify gate.

Fix: pass --fast to doctor. --fast routes through runDoctor(null, ...)
which runs the filesystem-only check set (resolver_health,
skill_conformance, skill_brain_first) and emits the standard
single-line JSON envelope the parser expects. skill_brain_first is
filesystem-only by design (scans SKILL.md, no DB touch), so --fast is
the correct knob, not a workaround.

Verified by reproducing the CI failure mode locally with
GBRAIN_HOME=/tmp/empty-... — gate now passes both with and without
a configured brain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #1214 (brainstorm + lsd) claimed v0.37.1.0 concurrently with #1215.
Skipping 0.37.2.0 leaves a buffer for #1214's adjacent slot. Trio
(VERSION + package.json + CHANGELOG header + inline "To take advantage
of v0.37.3.0" block) aligned at 0.37.3.0.

No behavior changes — version metadata only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@garrytan garrytan changed the title v0.37.1.0 feat: skill_brain_first doctor check + auto-fix + declarative opt-out (supersedes #1206) v0.37.3.0 feat: skill_brain_first doctor check + auto-fix + declarative opt-out (supersedes #1206) May 20, 2026
garrytan added a commit that referenced this pull request May 20, 2026
PRs #1214 and #1215 both claim v0.37.1.0; bumping past to the next free
slot. Migration v79 renamed `takes_unresolvable_quality_v0_37_0_1` →
`takes_unresolvable_quality_v0_37_2_0`. VERSION + package.json +
CHANGELOG + llms bundles + inline doc references all swept.
Master landed PR #1214 (brainstorm + lsd) as v0.37.1.0. Wave's
v0.37.3.0 claim sits cleanly above it in the CHANGELOG with both
entries preserved. VERSION + package.json + CHANGELOG header all
at 0.37.3.0; master's v0.37.1.0 entry slots in as the second-most-
recent release. llms-full.txt regenerated against the merged docs.

Verify gate + 114 brain-first + adjacent unit/E2E cases green
post-merge under CI-simulated GBRAIN_HOME=/tmp/empty-... env.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request May 20, 2026
…1211)

* v0.36.1.1 hotfix: takes_resolution_consistency CHECK accepts 'unresolvable'

Unblocks production grading scripts that write the judge's 4th verdict
type. Before this fix, every quality='unresolvable' INSERT/UPDATE hit
a CHECK violation — 0 of 34 writes landed in a recent prod run.

Migration v74 widens BOTH:
  - takes_resolution_consistency (table-level CHECK) — admits the
    ('unresolvable', NULL) pair alongside the existing 4 legal shapes
  - resolved_quality column-level CHECK — drops the auto-generated
    name from v37, re-adds as takes_resolved_quality_values with the
    4-state enum

Backward compatible. Existing rows with quality IN (NULL, 'correct',
'incorrect', 'partial') all satisfy the new CHECKs unchanged.

TakesScorecard gains sibling fields unresolvable_count + unresolvable_rate;
the existing `resolved` field deliberately keeps its 3-state meaning
so historical scorecards compare apples-to-apples (T1c sibling-field
design from the eng review).

Pinned by:
  - test/takes-resolution.test.ts — R1-R5 round-trip
  - test/migrate.test.ts — v74 structural assertions + PGLite E2E
    suite exercising all valid + invalid (quality, outcome) shapes

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e/schema-drift): reset public schema in beforeAll to isolate from caller bootstrap state

Previously the test trusted caller-provided DATABASE_URL to point at a fresh
database. CLAUDE.md's E2E lifecycle prescribes 'gbrain doctor --json' as the
bootstrap step (needed by oauth-related tests for table creation), but doctor
configures the gateway and bakes the configured embedding model into
content_chunks.model DEFAULT during the initial CREATE TABLE.

On re-run, CREATE TABLE IF NOT EXISTS is a no-op and the bootstrapped default
sticks. PGLite (always fresh-in-memory) gets the unconfigured-gateway fallback
'text-embedding-3-large'. The test reported phantom drift:
  pg.default="'zembed-1'::text"  pglite.default="'text-embedding-3-large'::text"

Fix: DROP SCHEMA public CASCADE + CREATE SCHEMA public before pg.initSchema.
Resets every table/index/sequence/constraint added by prior tooling. The PGLite
side is already fresh-per-test by construction.

Verified order-independent:
  - Fresh DB → 6/0 pass
  - After 'gbrain doctor' bootstrap → 6/0 pass
  - Full E2E suite (mechanical + schema-drift) → 84/0 pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(safety): gate schema-drift DROP SCHEMA + relax TakesScorecard interface

Two findings from Codex adversarial review on the v0.37.0.1 hotfix:

1. **DROP SCHEMA safety gate (P0).** test/e2e/schema-drift.test.ts had an
   unguarded DROP SCHEMA public CASCADE. A developer running the E2E with
   DATABASE_URL pointing at a real brain or staging DB would lose the entire
   public schema. The fix: triple-check before destruction.
   - Parse the DATABASE_URL hostname + db name
   - Allow reset only when: explicit GBRAIN_TEST_DB=1 OR (localhost host AND
     test-shaped db name like gbrain_test, *_test, test_*, *_e2e)
   - Refuse otherwise with a loud paste-ready warning
   - The test still proceeds (the parity check is the fail-safe — if the
     caller already had a fresh DB, parity passes; if not, parity fails
     LOUDLY instead of nuking their data)

   Verified all three branches: localhost+gbrain_test resets (6/0 pass);
   localhost+production_brain refuses + warns (6/0 pass against pre-existing
   schema); GBRAIN_TEST_DB=1 override on production_brain name allows reset.

2. **TakesScorecard interface compat.** Making `unresolvable_count` +
   `unresolvable_rate` required fields on the public TakesScorecard
   interface broke downstream SDK consumers who construct scorecard
   fixtures (gbrain-evals, custom engines). The hotfix shouldn't impose
   a compile-break on hotfix users.

   Fix: make both fields optional (`?: number` / `?: number | null`).
   `finalizeScorecard` still always populates them, so all internal code
   sees the real values. External fixtures that omit them compile cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(safety,ux): tighten DROP SCHEMA gate + surface unresolvable in scorecard CLI

Second codex adversarial pass on v0.37.0.1 surfaced two residual findings.

**P0 — Safety gate still bypassable.** First-pass safety gate used
`explicitOptIn || (isLocalhost && looksLikeTestDb)` — meaning
`GBRAIN_TEST_DB=1` bypassed BOTH the host check AND the db-name check.
Someone running the E2E with that env set against a production DATABASE_URL
would still nuke their schema. Codex re-flagged it as P0.

Tightened logic: `looksLikeTestDb && (isLocalhost || ciOptIn)`. The db-name
pattern is now the hard floor — `gbrain_test`, `*_test`, `test_*`, `*_e2e`.
GBRAIN_TEST_DB=1 only relaxes the localhost requirement (for CI service-name
hosts). Setting the env on a DATABASE_URL pointing at `production_data` is
explicitly refused with a paste-ready message naming the failed check.

Verified 3 ways:
  - gbrain_test + localhost → resets (6/0 pass)
  - production_data + GBRAIN_TEST_DB=1 → REFUSES with clear message
  - foo_e2e + GBRAIN_TEST_DB=1 → resets (test-shaped name passes)

**P2 — gbrain takes scorecard hides the unresolvable signal.** Early-return
on `resolved === 0` was triggered before the new sibling fields rendered.
A brain with only `quality='unresolvable'` verdicts — the spec's whole
production case — printed "No resolved bets yet" and exited. The
unresolvable_rate field was unreachable from the human CLI unless the user
knew to pass `--json`.

Fix: gate the early-return on `resolved === 0 AND unresolvable_count === 0`.
Render `unresolvable` count + `unresolvable_rate` alongside `partial_rate`
when present. Threshold warn at 30% (mirrors PARTIAL_RATE_WARNING_THRESHOLD)
pointing at retrieval coverage, not prediction accuracy — the actionable
read for high-unresolvable brains.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: renumber v0.37.0.1 → v0.37.2.0 (v0.37.1.0 claimed by other PRs)

PRs #1214 and #1215 both claim v0.37.1.0; bumping past to the next free
slot. Migration v79 renamed `takes_unresolvable_quality_v0_37_0_1` →
`takes_unresolvable_quality_v0_37_2_0`. VERSION + package.json +
CHANGELOG + llms bundles + inline doc references all swept.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Master landed PR #1211 (takes_resolution_consistency CHECK accepts
'unresolvable') as v0.37.2.0. Wave's v0.37.3.0 claim still sits one
slot above. CHANGELOG preserves both entries with master's v0.37.2.0
slotting in as the second-most-recent release. llms-full.txt
regenerated against merged docs.

Verify gate + 114 brain-first + adjacent unit/E2E cases green
post-merge under CI-simulated GBRAIN_HOME=/tmp/empty-... env.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@garrytan garrytan merged commit 772253e into master May 20, 2026
7 checks passed
garrytan added a commit that referenced this pull request May 21, 2026
…view fix (#1267)

* v0.37.6.0 feat: voyage-code-3 discoverability + reindex-code cost-preview fix

For agents indexing source code with gbrain, the right embedding model
is now obvious — and the brain tells you so out loud. Decision tree +
Topology 3 doc + Topology 3 setup-skill pointer + runtime stderr nudge
from `gbrain reindex --code` against non-code-tuned models. Same diff
fixes the stale hardcoded `text-embedding-3-large` in the cost preview
that would have made the nudge land badly.

Tests: 3 new files (recipe regression, nudge logic + CLI integration,
cost-preview IRON-RULE regression).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* todos: file pre-existing doctor-report-remote score regression (#1215 follow-up)

Noticed during /ship of v0.37.6.0. The skill_brain_first check added in
v0.37.3.0 (#1215) appears to tank the doctor health score on fresh PGLite
test brains, causing test/doctor-report-remote.test.ts:65 to fail with
health_score: 50 (expects >=70). Pre-existing on master; not in this
branch's touch surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant