Merged
Commits
41 commits
109b716
fix(link-extraction): v0.10.5 drive works_at + advises accuracy on ri…
garrytan Apr 18, 2026
52ba00f
feat(eval): type-accuracy runner on rich-prose corpus + wire into all.ts
garrytan Apr 18, 2026
629ba85
feat(eval): Phase 2 adapter interface + EXT-1 ripgrep+BM25 baseline
garrytan Apr 18, 2026
633be38
feat(eval): Phase 2 EXT-2 vector-only RAG adapter
garrytan Apr 18, 2026
bfa8564
feat(eval): Phase 2 EXT-3 hybrid-without-graph adapter — graph isolated
garrytan Apr 18, 2026
e2a5dc4
feat(eval): Phase 2 query validator + Tier 5 Fuzzy + Tier 5.5 synthet…
garrytan Apr 18, 2026
f0649e2
feat(eval): Phase 3 world.html explorer + eval:* CLI surface
garrytan Apr 18, 2026
b81373d
docs(eval): Phase 3 contributor docs + CI workflow for eval/ tests
garrytan Apr 18, 2026
c6b308a
fix(eval): teardown PGLite engines so bun run eval:run exits 0
garrytan Apr 19, 2026
4c885d0
docs(bench): 2026-04-19 multi-adapter scorecard
garrytan Apr 19, 2026
679a3f9
docs(bench): 2026-04-19 gbrain v0.11.1 vs v0.12.1 regression comparison
garrytan Apr 19, 2026
d0d3cf0
Merge remote-tracking branch 'origin/master' into garrytan/gbrain-evals
garrytan Apr 19, 2026
739c50a
Merge remote-tracking branch 'origin/master' into garrytan/gbrain-evals
garrytan Apr 20, 2026
18baca6
feat(eval): BrainBench v1 portable JSON schemas + gold templates
garrytan Apr 20, 2026
53e32ee
feat(eval): amara-life-v1 skeleton + Page.type enum for email/slack/c…
garrytan Apr 20, 2026
1f38c4f
feat(eval): amara-life-gen.ts with structured cache key + $20 cost gate
garrytan Apr 20, 2026
d4a2d51
chore: bump version and changelog (v0.15.0)
garrytan Apr 20, 2026
8f326cb
docs: update CLAUDE.md + README + eval/README for v0.15.0 BrainBench
garrytan Apr 20, 2026
934d7ea
feat(eval): Day 4 — pdf-parse + flight-recorder + tool-bridge (dry_ru…
garrytan Apr 20, 2026
05aec3f
feat(eval): Day 5 — agent adapter + judge with structured evidence co…
garrytan Apr 20, 2026
839258d
feat(eval): Day 6 — adversarial-injections + Cat 6 prose-scale + Cat …
garrytan Apr 20, 2026
33e2e73
feat(eval): Day 7 — Cat 5 provenance runner + structured classify_cla…
garrytan Apr 20, 2026
a4cdb41
feat(eval): Day 8 — Cat 8 skill compliance + Cat 9 end-to-end workflows
garrytan Apr 20, 2026
e1e699f
feat(eval): Day 9 — sealed qrels via PublicPage + PublicQuery at adap…
garrytan Apr 20, 2026
7e98739
feat(eval): Day 10 — all.ts rewrite + llm-budget + BrainBench N tiers
garrytan Apr 20, 2026
3dc7d69
Merge remote-tracking branch 'origin/master' into garrytan/gbrain-evals
garrytan Apr 20, 2026
cb5df8f
fix(eval): drop top_p from amara-life-gen Opus params + gitignore _ca…
garrytan Apr 20, 2026
03f49b6
Merge remote-tracking branch 'origin/master' into garrytan/gbrain-evals
garrytan Apr 21, 2026
d02acd6
chore: bump version to 0.20.0
garrytan Apr 21, 2026
ac51765
refactor: extract BrainBench to sibling gbrain-evals repo
garrytan Apr 22, 2026
27762ab
Merge origin/master into garrytan/gbrain-evals
garrytan Apr 23, 2026
871227c
chore: bump to v0.20.0
garrytan Apr 23, 2026
f26bc0d
Merge remote-tracking branch 'origin/master' into garrytan/gbrain-evals
garrytan Apr 23, 2026
8124821
ci: remove eval-tests workflow (moved to gbrain-evals)
garrytan Apr 23, 2026
423eba6
Merge remote-tracking branch 'origin/master' into garrytan/gbrain-evals
garrytan Apr 24, 2026
626aebf
fix(tests): bump PGLite hook timeouts to 60s for parallel-load stability
garrytan Apr 24, 2026
6f21719
test: coverage for inferType() BrainBench corpus dirs
garrytan Apr 24, 2026
d035a72
docs(TODOS): mark BrainBench Cats 5/6/8/9/11 + v0.10.5 inferLinkType …
garrytan Apr 24, 2026
9e567bb
docs: sync CLAUDE.md + polish CHANGELOG voice for v0.20.0
garrytan Apr 24, 2026
96852c0
docs: regenerate llms-full.txt after CLAUDE.md + CHANGELOG edits (fix…
garrytan Apr 24, 2026
60ee8ed
Merge remote-tracking branch 'origin/master' into garrytan/gbrain-evals
garrytan Apr 24, 2026
39 changes: 39 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE/tier5-queries.md
@@ -0,0 +1,39 @@
<!--
Tier 5.5 Externally-Authored Query Submission template
See eval/CONTRIBUTING.md for the full workflow.
-->

## Summary

Submitting **N** Tier 5.5 queries for BrainBench.

- Author handle: `@your-handle`
- File location: `eval/external-authors/your-handle/queries.json`
- Queries authored fresh (not copy-pasted from a model output)
- Slugs verified against `eval/data/world-v1/` (via `bun run eval:world:view`)

## Checklist

- [ ] `bun run eval:query:validate eval/external-authors/your-handle/queries.json` passes
- [ ] At least 20 queries
- [ ] Each query has either `gold.relevant` (with real slugs) or `gold.expected_abstention: true`
- [ ] Temporal queries have `as_of_date` set (`corpus-end` | `per-source` | ISO-8601)
- [ ] Phrasing is varied (not all the same template)
- [ ] `author` field matches my handle
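
The mechanical items in this checklist can be approximated in code before opening the PR. The following is a minimal sketch, not the real `eval:query:validate` implementation: the `Tier5Query` shape models only the fields named above (`gold.relevant`, `gold.expected_abstention`, `as_of_date`, `author`), and `checkBatch` and `isValidAsOfDate` are hypothetical helper names. It does not verify that slugs exist in the corpus.

```typescript
// Hypothetical shape: only fields named in the checklist are modeled.
interface Tier5Query {
  author: string;
  as_of_date?: string;
  gold: { relevant?: string[]; expected_abstention?: boolean };
}

// Loose ISO-8601 date or date-time pattern (an assumption, not the real schema).
const ISO_8601 =
  /^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})?)?$/;

function isValidAsOfDate(value: string): boolean {
  if (value === "corpus-end" || value === "per-source") return true;
  return ISO_8601.test(value) && !Number.isNaN(Date.parse(value));
}

// Returns a list of human-readable problems; an empty list means the
// mechanical checklist items pass.
function checkBatch(
  queries: Tier5Query[],
  handle: string,
  minQueries = 20,
): string[] {
  const problems: string[] = [];
  if (queries.length < minQueries) {
    problems.push(`expected at least ${minQueries} queries, got ${queries.length}`);
  }
  queries.forEach((q, i) => {
    const hasRelevant = Array.isArray(q.gold.relevant) && q.gold.relevant.length > 0;
    const abstains = q.gold.expected_abstention === true;
    if (!hasRelevant && !abstains) {
      problems.push(`query ${i}: needs gold.relevant or gold.expected_abstention: true`);
    }
    if (q.as_of_date !== undefined && !isValidAsOfDate(q.as_of_date)) {
      problems.push(`query ${i}: invalid as_of_date "${q.as_of_date}"`);
    }
    if (q.author !== handle) {
      problems.push(`query ${i}: author "${q.author}" does not match "${handle}"`);
    }
  });
  return problems;
}
```

Returning a list of problems rather than throwing on the first one lets an author fix a whole batch in one pass.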

## Phrasing variety (optional self-audit)

Tick the styles represented in your batch:

- [ ] Full sentence questions
- [ ] Fragment-style ("crypto founder Goldman Sachs background")
- [ ] Comparison ("X vs Y")
- [ ] Follow-up ("And who else...")
- [ ] Imperative ("Pull up Alice Davis")
- [ ] Trait-based ("the demanding engineering leader")
- [ ] Abstention bait (answer is "not in corpus")

## Notes to reviewer

Anything worth flagging — ambiguous cases, corpus gaps you found, specific
phrasings you were uncertain about.
4 changes: 4 additions & 0 deletions .gitignore
@@ -13,3 +13,7 @@ supabase/.temp/
.claude/skills/
.idea
eval/reports/
eval/data/world-v1/world.html

# BrainBench amara-life-v1 Opus cache (regenerate via eval:generate-amara-life)
eval/data/amara-life-v1/_cache/
61 changes: 61 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,67 @@

All notable changes to GBrain will be documented in this file.

## [0.20.0] - 2026-04-23

## **BrainBench moves out. gbrain gets its install surface back.**
## **The eval harness + 5MB fictional corpus now live in a sibling repo; gbrain exposes a clean public API they consume.**

BrainBench is gbrain's benchmark harness. 10/12 Cats, 4-adapter scorecard, 418-item fictional corpus, 314 tests. Previously it lived inside this repo. Every `bun install` pulled down the eval tree, `docs/benchmarks/*.md` reports, `pdf-parse` devDep, and auxiliary test fixtures whether or not you ever ran a benchmark. For the 99% of gbrain users who want a knowledge-brain CLI, that's ~5MB of noise.

v0.20 moves BrainBench to [github.com/garrytan/gbrain-evals](https://github.com/garrytan/gbrain-evals). gbrain stays the knowledge-brain CLI + library. `gbrain-evals` depends on gbrain via GitHub URL and consumes it through the public exports map. Same benchmarks, same scorecards, same Cat runners, same 418-item fictional amara-life corpus, just a separate install. Folks who don't care about evals never download them. Folks who do clone one extra repo.

The clean separation also gives gbrain a first real public API surface. `package.json` adds 11 new subpath exports (`gbrain/pglite-engine`, `gbrain/search/hybrid`, `gbrain/link-extraction`, `gbrain/extract`, and so on) covering every gbrain internal the eval harness reaches into. Third-party tools (not just BrainBench) now have a stable contract to consume. Removing any of these exports is a breaking change going forward.

### What moved where

| Stays in gbrain | Moves to gbrain-evals |
|-----------------|----------------------|
| `src/` (CLI, MCP, engines, operations, skills runtime) | `eval/` (runners, adapters, generators, schemas, gold, cli) |
| `Page.type` enum including `email/slack/calendar-event/note/meeting` (useful for any ingested format, not just evals) | `test/eval/` (314 tests across 14 files) |
| `inferType()` heuristics for the new directory patterns | `docs/benchmarks/*.md` (all scorecards + regression reports) |
| Public exports map (11 new subpaths gbrain-evals consumes) | `pdf-parse` devDep (only eval/runner/loaders/pdf.ts used it) |
| `src/core/` test suite (1696 tests) | `eval:*` scripts (run from gbrain-evals now) |

### What this means for you

If you install gbrain via `git clone + bun install` or via npm/clawhub, you get a smaller, cleaner checkout. No eval corpus. No benchmark reports. No pdf-parse. `bun test` runs only gbrain's own test suite, not eval tests.

If you want to run BrainBench: `git clone https://github.com/garrytan/gbrain-evals && cd gbrain-evals && bun install && bun run eval:run`. gbrain-evals fetches gbrain from GitHub via `"gbrain": "github:garrytan/gbrain#master"` so you always benchmark against the latest source.

If you're a third-party library author importing gbrain internals: the new exports map is now your stable contract. Pin `gbrain/<subpath>` imports against a version, not a file path.
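
To make the "stable contract" idea concrete, here is a rough sketch of how an `exports` map gates imports: a listed subpath resolves, and a deep file path does not. The subpath keys are taken from this changelog, but the target paths are assumed placeholders. `resolveSubpath` is a hypothetical helper, not Node's or Bun's actual resolver, and it ignores export conditions and wildcard patterns.

```typescript
type ExportsMap = Record<string, string>;

// Keys come from the changelog; the right-hand targets are assumptions.
const exportsMap: ExportsMap = {
  "./markdown": "./src/core/markdown.ts",
  "./backoff": "./src/core/backoff.ts",
  "./embedding": "./src/core/embedding.ts",
};

// Resolve an import specifier such as "gbrain/markdown" against the map.
// Returns the mapped target, or null when the subpath is not exported.
function resolveSubpath(
  pkgName: string,
  specifier: string,
  map: ExportsMap,
): string | null {
  const isRoot = specifier === pkgName;
  if (!isRoot && !specifier.startsWith(pkgName + "/")) return null;
  const subpath = isRoot ? "." : "." + specifier.slice(pkgName.length);
  if (!Object.prototype.hasOwnProperty.call(map, subpath)) return null;
  return map[subpath] ?? null;
}

// An exported subpath resolves; a deep file path is rejected.
const viaExport = resolveSubpath("gbrain", "gbrain/markdown", exportsMap);
const viaFilePath = resolveSubpath("gbrain", "gbrain/src/core/markdown.ts", exportsMap);
```

Because consumers only ever name the subpath, files behind it can move without breaking them, which is why removing a subpath key is the breaking change.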

### Itemized changes

**Extracted to [gbrain-evals](https://github.com/garrytan/gbrain-evals):**
- `eval/` ... schemas, runners, adapters, generators, queries, CLI tools, docs (CONTRIBUTING, RUNBOOK, CREDITS).
- `test/eval/` ... 14 test files, 314 tests covering schemas, sealed qrels, tool-bridge, agent adapter, judge, recorder, Cat 5/6/8/9/11, amara-life skeleton, adversarial-injections, pdf loader.
- `docs/benchmarks/` ... all scorecards and regression reports (4-adapter, v0.11 vs v0.12, Minions production/lab, tweet ingestion, knowledge runtime v0.13, BrainBench v1).
- `pdf-parse` devDep ... only consumed by `eval/runner/loaders/pdf.ts`.
- `eval:*` package.json scripts ... now live in gbrain-evals's `package.json` and run from there.

**Kept in gbrain (useful beyond evals):**
- `Page.type` enum extensions in `src/core/types.ts`: `email | slack | calendar-event | note | meeting`. Any user ingesting an inbox dump, Slack export, iCal file, or meeting transcript benefits from first-class types.
- `inferType()` heuristics in `src/core/markdown.ts` for `/emails/`, `/slack/`, `/cal/`, `/notes/`, `/meetings/` directory patterns.
- 11 new public `exports` in `package.json`: `./pglite-engine`, `./link-extraction`, `./import-file`, `./transcription`, `./embedding`, `./config`, `./markdown`, `./backoff`, `./search/hybrid`, `./search/expansion`, `./extract`. These form gbrain's public-API contract for downstream consumers.
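
The directory-based inference mentioned in the `inferType()` bullet can be sketched roughly as below. This is an illustration under assumptions, not the code in `src/core/markdown.ts`: the pairing of each directory with a type is inferred from the order of the two lists above, and `inferTypeFromPath`, `IngestPageType`, and the `null` fallback are hypothetical.

```typescript
type IngestPageType = "email" | "slack" | "calendar-event" | "note" | "meeting";

// Assumed pairing of directory pattern to page type.
const DIRECTORY_TYPES: ReadonlyArray<[string, IngestPageType]> = [
  ["/emails/", "email"],
  ["/slack/", "slack"],
  ["/cal/", "calendar-event"],
  ["/notes/", "note"],
  ["/meetings/", "meeting"],
];

function inferTypeFromPath(filePath: string): IngestPageType | null {
  // Normalize Windows separators and force a single leading slash so a
  // relative path like "emails/a.md" still matches "/emails/".
  const normalized = "/" + filePath.replace(/\\/g, "/").replace(/^\/+/, "");
  for (const [segment, type] of DIRECTORY_TYPES) {
    if (normalized.includes(segment)) return type;
  }
  return null; // no directory pattern matched; the caller picks a default
}
```

Matching on whole `/segment/` boundaries keeps a directory such as `calendar/` from being mistaken for `cal/`.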

**Docs synced:**
- `README.md` ... benchmark references now point at the gbrain-evals repo.
- `CLAUDE.md` ... BrainBench section replaced with a pointer to gbrain-evals + the list of public exports that consumers depend on.
- `src/commands/migrations/v0_12_0.ts` ... migration banner text references `github.com/garrytan/gbrain-evals` instead of a local `docs/benchmarks/*.md` path that no longer resolves.

**Tests:** 1717 gbrain tests pass, 0 failures, 174 skipped (E2E requiring `DATABASE_URL`). The full eval suite (314 tests) moves with `gbrain-evals` and runs from there.

### To take advantage of v0.20

For gbrain users:
1. `gbrain upgrade` ... no action required. The extraction is transparent.
2. If you previously ran `bun run eval:*` scripts from this repo: those scripts no longer exist here. `git clone https://github.com/garrytan/gbrain-evals && cd gbrain-evals && bun install` to get them.

For gbrain-evals consumers:
1. Clone the sibling repo: `git clone https://github.com/garrytan/gbrain-evals`
2. `cd gbrain-evals && bun install && bun run eval:run`
3. Follow `gbrain-evals/eval/RUNBOOK.md` for full category runs and scorecard reproduction.

## [0.19.1] - 2026-04-24

### Added
39 changes: 35 additions & 4 deletions CLAUDE.md
@@ -44,7 +44,14 @@ strict behavior when unset.
- `src/core/embedding.ts` — OpenAI text-embedding-3-large, batch, retry, backoff
- `src/core/check-resolvable.ts` — Resolver validation: reachability, MECE overlap, DRY checks, structured fix objects. v0.14.1: `CROSS_CUTTING_PATTERNS.conventions` is an array (notability gate accepts both `conventions/quality.md` and `_brain-filing-rules.md`). New `extractDelegationTargets()` parses `> **Convention:**`, `> **Filing rule:**`, and inline backtick references. DRY suppression is proximity-based via `DRY_PROXIMITY_LINES = 40`.
- `src/core/repo-root.ts` — Shared `findRepoRoot(startDir?)` (v0.16.4): walks up from `startDir` (default `process.cwd()`) looking for `skills/RESOLVER.md`. Zero-dependency module imported by both `doctor.ts` and `check-resolvable.ts`. Parameterized `startDir` makes tests hermetic.
- `src/commands/check-resolvable.ts` — Standalone CLI wrapper (v0.16.4) over `checkResolvable()`. Exports `parseFlags`, `resolveSkillsDir`, `DEFERRED`, `runCheckResolvable`. Exit rule: **1 on any issue (warnings OR errors)**, stricter than doctor's `ok` flag — honors README:259. Stable JSON envelope `{ok, skillsDir, report, autoFix, deferred, error, message}` — same shape on success and error paths. `--fix` path runs `autoFixDryViolations` BEFORE `checkResolvable` (same ordering as doctor). `deferred[]` array surfaces pending Checks 5 (trigger routing eval) and 6 (brain filing) with issue URLs. `scripts/skillify-check.ts` subprocess-calls `gbrain check-resolvable --json` (cached per process) and fails loud on binary-missing — no silent false-pass.
- `src/commands/check-resolvable.ts` — Standalone CLI wrapper (v0.16.4) over `checkResolvable()`. Exports `parseFlags`, `resolveSkillsDir`, `DEFERRED`, `runCheckResolvable`. Exit rule: **1 on any issue (warnings OR errors)**, stricter than doctor's `ok` flag — honors README:259. Stable JSON envelope `{ok, skillsDir, report, autoFix, deferred, error, message}` — same shape on success and error paths. `--fix` path runs `autoFixDryViolations` BEFORE `checkResolvable` (same ordering as doctor). `scripts/skillify-check.ts` subprocess-calls `gbrain check-resolvable --json` (cached per process) and fails loud on binary-missing — no silent false-pass. **v0.19:** AGENTS.md workspaces now resolve natively (see `src/core/resolver-filenames.ts`) — gbrain inspects the 107-skill OpenClaw deployment whether the routing file is `RESOLVER.md` or `AGENTS.md`. `DEFERRED[]` is empty — Checks 5 + 6 shipped as real code, not issue URLs.
- `src/core/resolver-filenames.ts` (v0.19) — central list of accepted routing filenames (`RESOLVER.md`, `AGENTS.md`). Shared by `findRepoRoot`, `check-resolvable`, and skillpack install so every code path walks the same fallback chain.
- `src/commands/skillify.ts` + `src/core/skillify/{generator,templates}.ts` (v0.19) — `gbrain skillify scaffold <name>` creates all stubs for a new skill in one command: SKILL.md, script, tests, routing-eval.jsonl, resolver entry, filing-rules pointer. `gbrain skillify check <script>` runs the 10-step checklist (LLM evals, routing evals, check-resolvable gate, filing audit) against a candidate skill before it lands.
- `src/commands/skillify-check.ts` (v0.19) — `gbrain skillpack-check` agent-readable health report. Exit 0/1/2 for CI pipeline gating; JSON for debugging. Wraps `check-resolvable --json`, `doctor --json`, and migration ledger into one payload so agents can decide whether a human action is required.
- `src/commands/skillpack.ts` + `src/core/skillpack/{bundle,installer}.ts` (v0.19) — `gbrain skillpack install` drops gbrain's curated 25-skill bundle into a host workspace, managed-block style. Never clobbers local edits; tracks a skill manifest so subsequent `install --update` diffs cleanly. Bundle builder (`skillpack/bundle.ts`) packages the set from `skills/` into a versioned payload.
- `src/core/skill-manifest.ts` (v0.19) — parser for `skill-manifest.json` records. Used by skillpack installer to detect drift between the shipped bundle and the user's local edits, so updates merge instead of overwriting.
- `src/commands/routing-eval.ts` + `src/core/routing-eval.ts` (v0.19) — `gbrain routing-eval` catches user phrasings that route to the wrong skill. Reads `skills/<name>/routing-eval.jsonl` fixtures (`{intent, expected_skill, ambiguous_with?}`). Structural layer runs in `check-resolvable` by default (zero API cost); `--llm` opts into a Haiku tie-break layer for CI. False positives surface before users hit them.
- `src/core/filing-audit.ts` + `skills/_brain-filing-rules.json` (v0.19) — Check 6 of `check-resolvable`. Parses new `writes_pages:` / `writes_to:` frontmatter on skills and audits their filing claims against the filing-rules JSON. Warning-only in v0.19, upgrades to error in v0.20.
- `src/core/dry-fix.ts` — `gbrain doctor --fix` engine. `autoFixDryViolations(fixes, {dryRun})` rewrites inlined rules to `> **Convention:** see [path](path).` callouts via three shape-aware expanders (bullet / blockquote / paragraph). Five guards: working-tree-dirty (`getWorkingTreeStatus()` returns 3-state `'clean' | 'dirty' | 'not_a_repo'`), no-git-backup, inside-code-fence, already-delegated (40-line proximity, consistent with detector), ambiguous-multi-match, block-is-callout. `execFileSync` array args (no shell — no injection surface). EOF newline preserved.
- `src/core/backoff.ts` — Adaptive load-aware throttling: CPU/memory checks, exponential backoff, active hours multiplier
- `src/core/fail-improve.ts` — Deterministic-first, LLM-fallback loop with JSONL failure logging and auto-test generation
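
The `routing-eval.jsonl` fixtures described in the list above use the shape `{intent, expected_skill, ambiguous_with?}`. A minimal parsing sketch follows, assuming `ambiguous_with` is a list of skill names (the source does not say). `RoutingFixture` and `parseRoutingFixtures` are hypothetical names, not the API of `src/core/routing-eval.ts`.

```typescript
// Hypothetical fixture shape, modeled on {intent, expected_skill, ambiguous_with?}.
interface RoutingFixture {
  intent: string;
  expected_skill: string;
  ambiguous_with?: string[]; // assumption: a list of skill names
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Parse JSONL text: one JSON object per non-empty line.
// Throws with a 1-based line number on the first malformed record.
function parseRoutingFixtures(jsonl: string): RoutingFixture[] {
  const fixtures: RoutingFixture[] = [];
  jsonl.split(/\r?\n/).forEach((rawLine, index) => {
    const line = rawLine.trim();
    if (line === "") return; // tolerate blank lines and a trailing newline
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      throw new Error(`line ${index + 1}: not valid JSON`);
    }
    if (typeof parsed !== "object" || parsed === null) {
      throw new Error(`line ${index + 1}: expected a JSON object`);
    }
    const { intent, expected_skill, ambiguous_with } = parsed as Record<string, unknown>;
    if (typeof intent !== "string" || typeof expected_skill !== "string") {
      throw new Error(`line ${index + 1}: intent and expected_skill must be strings`);
    }
    if (ambiguous_with !== undefined && !isStringArray(ambiguous_with)) {
      throw new Error(`line ${index + 1}: ambiguous_with must be a list of strings`);
    }
    fixtures.push(
      ambiguous_with === undefined
        ? { intent, expected_skill }
        : { intent, expected_skill, ambiguous_with },
    );
  });
  return fixtures;
}
```

Failing loudly with a line number, instead of skipping bad records, means a broken fixture cannot quietly shrink the eval set.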
@@ -113,7 +120,7 @@ strict behavior when unset.
- `docs/guides/diligence-ingestion.md` — Data room to brain pages pipeline
- `docs/designs/HOMEBREW_FOR_PERSONAL_AI.md` — 10-star vision for integration system
- `docs/mcp/` — Per-client setup guides (Claude Desktop, Code, Cowork, Perplexity)
- `docs/benchmarks/` — Search quality benchmark results (reproducible, fictional data)
- BrainBench (benchmark suite + corpus): lives in the separate [gbrain-evals](https://github.com/garrytan/gbrain-evals) repo. Not installed alongside gbrain.
- `skills/_brain-filing-rules.md` — Cross-cutting brain filing rules (referenced by all brain-writing skills)
- `skills/RESOLVER.md` — Skill routing table (based on the agent-fork AGENTS.md pattern)
- `skills/conventions/` — Cross-cutting rules (quality, brain-first, model-routing, test-before-bulk, cross-modal)
@@ -144,6 +151,21 @@ strict behavior when unset.
- `src/commands/report.ts` — Structured report saver (audit trail for maintenance/enrichment)
- `openclaw.plugin.json` — ClawHub bundle plugin manifest

### BrainBench — in a sibling repo (v0.20+)

BrainBench — the public benchmark for personal-knowledge agent stacks — lives in
[github.com/garrytan/gbrain-evals](https://github.com/garrytan/gbrain-evals). It
depends on gbrain as a consumer; gbrain never pulls in the ~5MB eval corpus or
the pdf-parse dev dep at install time.

gbrain's public API surface (the exports map in `package.json`) is what
gbrain-evals consumes: `gbrain/engine`, `gbrain/types`, `gbrain/operations`,
`gbrain/pglite-engine`, `gbrain/link-extraction`, `gbrain/import-file`,
`gbrain/transcription`, `gbrain/embedding`, `gbrain/config`, `gbrain/markdown`,
`gbrain/backoff`, `gbrain/search/hybrid`, `gbrain/search/expansion`,
`gbrain/extract`. Removing any of these is a breaking change for the
gbrain-evals consumer.

## Commands

Run `gbrain --help` or `gbrain --tools-json` for full command reference.
@@ -236,7 +258,15 @@ parity), `test/cli.test.ts` (CLI structure), `test/config.test.ts` (config redac
`test/sync.test.ts` (sync logic + v0.12.3 regression guard asserting top-level `engine.transaction` is not called),
`test/doctor.test.ts` (doctor command + v0.12.3 assertions that `jsonb_integrity` scans the four v0.12.0 write sites and `markdown_body_completeness` is present),
`test/utils.test.ts` (shared SQL utilities + `tryParseEmbedding` null-return and single-warn semantics),
`test/build-llms.test.ts` (llms.txt/llms-full.txt generator: path resolution, idempotence, spec shape, regen-drift guard, content contract, AGENTS.md install-path mirror, size-budget enforcement — 7 cases).
`test/build-llms.test.ts` (llms.txt/llms-full.txt generator: path resolution, idempotence, spec shape, regen-drift guard, content contract, AGENTS.md install-path mirror, size-budget enforcement — 7 cases),
`test/check-resolvable-cli.test.ts` (v0.19 CLI wrapper: exit codes, JSON envelope shape, AGENTS.md fallback chain),
`test/regression-v0_16_4.test.ts` (findRepoRoot regression guard — hermetic startDir parameterization),
`test/filing-audit.test.ts` (v0.19 Check 6: `writes_pages` / `writes_to` frontmatter, filing-rules JSON validation),
`test/routing-eval.test.ts` (v0.19 Check 5: fixture parsing, structural routing, ambiguous_with, Haiku tie-break layer),
`test/skill-manifest.test.ts` (v0.19 skill manifest parser: drift detection, managed-block markers),
`test/skillify-scaffold.test.ts` (v0.19 `gbrain skillify scaffold` stubs: SKILL.md, script, tests, routing-eval fixtures),
`test/skillpack-install.test.ts` (v0.19 `gbrain skillpack install` managed-block install / update / no-clobber semantics),
`test/skillpack-sync-guard.test.ts` (v0.19 sync-guard: bundled skills stay byte-identical to `skills/` source).

E2E tests (`test/e2e/`): Run against real Postgres+pgvector. Require `DATABASE_URL`.
- `bun run test:e2e` runs Tier 1 (mechanical, all operations, no API keys). Includes 9 dedicated cases for the postgres-engine `addLinksBatch` / `addTimelineEntriesBatch` bind path — postgres-js's `unnest()` binding is structurally different from PGLite's and gets its own coverage.
@@ -245,6 +275,7 @@ E2E tests (`test/e2e/`): Run against real Postgres+pgvector. Require `DATABASE_U
- `test/e2e/postgres-jsonb.test.ts` — v0.12.2 regression test. Round-trips all 5 JSONB write sites (pages.frontmatter, raw_data.data, ingest_log.pages_updated, files.metadata, page_versions.frontmatter) against real Postgres and asserts `jsonb_typeof='object'` plus `->>'key'` returns the expected scalar. The test that should have caught the original double-encode bug.
- `test/e2e/jsonb-roundtrip.test.ts` — v0.12.3 companion regression against the 4 doctor-scanned JSONB sites. Assertion-level overlap with `postgres-jsonb.test.ts` is intentional defense-in-depth: if doctor's scan surface ever drifts from the actual write surface, one of these tests catches it.
- `test/e2e/upgrade.test.ts` runs check-update E2E against real GitHub API (network required)
- `test/e2e/openclaw-reference-compat.test.ts` (v0.19) — exercises `check-resolvable` + `skillpack install` against a minimal AGENTS.md workspace fixture (`test/fixtures/openclaw-reference-minimal/`), regression guard for the 107-skill OpenClaw deployment shape
- Tier 2 (`skills.test.ts`) requires OpenClaw + API keys, runs nightly in CI
- If `.env.testing` doesn't exist in this directory, check sibling worktrees for one:
`find ../ -maxdepth 2 -name .env.testing -print -quit` and copy it here if found.
@@ -465,7 +496,7 @@ Voice rules:

Source material to pull from:
- CHANGELOG.md previous entry for prior context
- `docs/benchmarks/[latest].md` for the headline numbers
- Latest `gbrain-evals/docs/benchmarks/[latest].md` for headline numbers (sibling repo)
- Recent commits (`git log <prev-version>..HEAD --oneline`) for what shipped
- Don't make up numbers. If a metric isn't in a benchmark or production data, don't
include it. Say "no measurement yet" if asked.