diff --git a/CHANGELOG.md b/CHANGELOG.md
index 00b3dc4d6..7cf5100a5 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,54 @@
 
 All notable changes to GBrain will be documented in this file.
 
+## [0.37.5.0] - 2026-05-20
+
+**`gbrain doctor` stops flagging your tags as broken when they're not.**
+
+If you have a tag line like `tags: ["yc", "w2025", "ai"]` in your frontmatter, that's perfectly valid YAML. The doctor used to flag it anyway. One user with a 105K-page brain saw 6,981 of these flagged at once. The fix: the validator now parses suspicious values with a real YAML parser before complaining. Valid YAML stops getting flagged. Genuinely broken titles like `title: "Foo "bar" baz"` still get caught.
+
+**How to use it**
+
+```bash
+gbrain upgrade
+gbrain doctor --json | jq '.checks[]
+  | select(.name=="frontmatter_integrity")
+  | .breakdown.NESTED_QUOTES'
+```
+
+The NESTED_QUOTES count on your brain should drop toward zero on the next `gbrain doctor` run. No data rewrite needed. No `gbrain frontmatter generate --fix` sweep. The existing files are already valid YAML.
+
+**Why this is the right layer**
+
+`@garrytan-agents` opened PR #1217 with a one-line fix to the emitter side: switch tag serialization from `JSON.stringify` (double-quoted) to single-quoted YAML. That made the headline 6,981-error case go away by changing what new writes look like. But Codex's outside-voice review during planning caught a deeper bug: even with a perfect emitter, the validator at `src/core/markdown.ts:219-238` was a raw quote counter (`count(unescaped ") >= 3 => error`) that didn't understand YAML at all. It would still flag a clean single-quoted scalar like `title: 'a: "b" "c"'` (4 unescaped `"` characters, but valid YAML). The fix had to land on the validator side, not the emitter side.
+
+PR #1217 was closed; thanks to @garrytan-agents for the 6,981-error signal that exposed the underlying bug class.
+
+**What's safe to know about**
+
+- The fix is additive. Lines with `< 3` unescaped quotes still pass instantly (existing fast path). Only suspicious lines (count >= 3, rare on healthy data) pay for a per-line YAML parse.
+- `js-yaml@3.14.2` is now a direct dependency (was transitive via gray-matter). The direct pin means a future gray-matter major bump can't yank the import.
+- The frontmatter emitter at `src/core/frontmatter-inference.ts` is unchanged. Existing tag style (`tags: ["yc"]`) is now correctly recognized as valid; matching `brain-writer.ts:184`'s single-quote repair style is a cosmetic follow-up TODO, not part of this fix.
+
+### Itemized changes
+
+#### Fixed
+
+- `src/core/markdown.ts:219-238` — NESTED_QUOTES validator now disambiguates via `js-yaml.safeLoad`. The count-of-quotes heuristic stays as a fast path; suspicious lines (count >= 3) are parsed before being flagged. Closes the 6,981-error class for any brain whose frontmatter has been valid all along.
+- `js-yaml` declared as a direct dependency in `package.json`; `@types/js-yaml` added to devDependencies. `bun.lock` re-resolves the transitive entry to a top-level pin (no version change).
+
+#### Added
+
+- 5 new YAML-aware regression cases in `test/markdown-validation.test.ts`:
+  - flow sequence with quoted tags does NOT trigger (6,981-error regression guard)
+  - single-quoted scalar with literal inner double quotes does NOT trigger
+  - escaped-as-`''` quotes inside flow seq do NOT trigger
+  - genuinely broken nested quotes STILL trigger
+  - unclosed bracket STILL surfaces NESTED_QUOTES or YAML_PARSE (never silent)
+
+#### For contributors
+
+- `js-yaml` is now an explicit direct dep. New code that needs YAML emission/parsing should import from it directly rather than relying on gray-matter's transitive resolution.
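The validator change in this entry is a two-stage check: a cheap quote count, then a real parse only for suspicious values. Below is a minimal TypeScript sketch of that logic. It is not the actual `src/core/markdown.ts` code. It assumes hypothetical helper names (`countUnescapedDoubleQuotes`, `isNestedQuotesError`), treats "unescaped" as "not directly preceded by a backslash", and takes the YAML parser as an injected function (the entry says gbrain uses `js-yaml`'s `safeLoad`).

```typescript
// Hypothetical sketch of a "fast path, then parse" NESTED_QUOTES check.
// Assumption: "unescaped" means a `"` not directly preceded by a backslash.
// The YAML parser is injected; per the changelog, gbrain uses js-yaml's safeLoad.

type YamlParse = (source: string) => unknown;

// Count `"` characters that are not directly preceded by a backslash.
function countUnescapedDoubleQuotes(value: string): number {
  let count = 0;
  for (let i = 0; i < value.length; i++) {
    if (value[i] === '"' && (i === 0 || value[i - 1] !== "\\")) {
      count++;
    }
  }
  return count;
}

// Flag a frontmatter value only if it looks suspicious AND fails to parse.
function isNestedQuotesError(value: string, parse: YamlParse): boolean {
  // Fast path: fewer than 3 unescaped quotes never pays for a parse.
  if (countUnescapedDoubleQuotes(value) < 3) return false;
  try {
    parse(value);
    return false; // parsed cleanly, e.g. a flow sequence of quoted tags
  } catch {
    return true; // parse failed, e.g. "Foo "bar" baz"
  }
}
```

Injecting the parser keeps the sketch dependency-free. With js-yaml 3.x installed, passing its `safeLoad` should give the YAML-aware behavior: `'["yc", "w2025", "ai"]'` parses to an array and is not flagged, while `'"Foo "bar" baz"'` throws and is flagged.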
 ## [0.37.4.0] - 2026-05-20
 
 **A nightly safety net for the bug class that bit gbrain 10 times in 2 years.**
diff --git a/TODOS.md b/TODOS.md
index 4c72dfe5b..42bdfedb0 100644
--- a/TODOS.md
+++ b/TODOS.md
@@ -1,6 +1,9 @@
 # TODOS
 
+## v0.37.5.0 NESTED_QUOTES validator follow-up
+
+- [ ] **v0.37.x+: unify `serializeFrontmatter` tag/title quoting with `brain-writer.ts:184`'s single-quote-with-`''`-escape style for consistency.** Cosmetic only, now that the validator at `src/core/markdown.ts:219-238` is YAML-aware (v0.37.5.0). Today the emitter still produces `tags: ["yc"]` (double-quoted via `JSON.stringify`) while the repair path produces `tags: ['yc']` (single-quoted). Both are valid YAML and the validator accepts both, but new writes that drift from repair-side output read as inconsistent. Original signal: PR #1217 by @garrytan-agents (closed in favor of the validator fix). Touch only `src/core/frontmatter-inference.ts:391-416`; expect ~5 LOC plus an update to the existing test at `test/frontmatter-inference.test.ts:239`.
 
 ## v0.37.4.0 pgGraph CI scaffolding follow-ups (v0.37.x+)
 
 - [ ] **T8 truncation signal — defer until dedupe-then-cap SQL + Postgres parity E2E.** v0.37.4.0 ships `frontierCap` as the actually-useful protection but strips the `onTruncation` callback after /review adversarial pass (Claude + Codex both flagged). Two bugs in the v1 algorithm: (a) FALSE POSITIVE — `count == cap` at a depth fires the callback even when the graph organically has exactly cap unique nodes at that depth with no truncation; (b) FALSE NEGATIVE — recursive `LIMIT N` runs BEFORE outer `SELECT DISTINCT`, so diamond graphs (one parent fans out to N+5 candidates with duplicates) can have the LIMIT eat its slots on dupes, then DISTINCT collapses to = 3) {
+      if (count < 3) continue;
+
+      // 3+ unescaped quotes — could be valid YAML (flow seq, single-quoted
+      // scalar with inner quotes, bare scalar with embedded quotes) or
+      // genuinely broken. Parse the value to disambiguate.
+      let isValidYaml = false;
+      try {
+        yamlSafeLoad(value);
+        isValidYaml = true;
+      } catch {
+        // YAML parse failed — line is genuinely broken
+      }
+
+      if (!isValidYaml) {
         errors.push({
           code: 'NESTED_QUOTES',
           message: 'Nested double quotes in YAML value (use single quotes for the outer)',
diff --git a/test/markdown-validation.test.ts b/test/markdown-validation.test.ts
index 0b2b03d0e..9e5f8aab0 100644
--- a/test/markdown-validation.test.ts
+++ b/test/markdown-validation.test.ts
@@ -135,6 +135,50 @@ describe('parseMarkdown validation surface', () => {
     });
   });
 
+  // The validator's count-of-quotes heuristic is too dumb: it flagged
+  // valid YAML flow sequences (the v0.x 6,981-error class on Garry's
+  // brain) and single-quoted scalars with literal inner quotes. The
+  // fallback runs js-yaml.safeLoad on suspicious values; only flags
+  // genuinely unparseable lines.
+  describe('NESTED_QUOTES — YAML-aware fallback', () => {
+    test('flow sequence with quoted tags does NOT trigger (6,981-error regression guard)', () => {
+      const md = `${fence}\ntype: concept\ntitle: x\ntags: ["yc", "w2025", "ai"]\n${fence}\n\nbody`;
+      const parsed = parseMarkdown(md, undefined, { validate: true });
+      expect(parsed.errors!.filter(e => e.code === 'NESTED_QUOTES')).toHaveLength(0);
+    });
+
+    test('single-quoted scalar with literal inner double quotes does NOT trigger', () => {
+      // value: 'a: "b" "c" "d"' — 6 unescaped " by raw count, but valid YAML
+      const md = `${fence}\ntype: concept\ntitle: 'a: "b" "c" "d"'\n${fence}\n\nbody`;
+      const parsed = parseMarkdown(md, undefined, { validate: true });
+      expect(parsed.errors!.filter(e => e.code === 'NESTED_QUOTES')).toHaveLength(0);
+    });
+
+    test('escaped-as-single-pair quotes inside flow seq do NOT trigger', () => {
+      const md = `${fence}\ntype: concept\ntitle: x\ntags: ["Men''s Fashion", "yc"]\n${fence}\n\nbody`;
+      const parsed = parseMarkdown(md, undefined, { validate: true });
+      expect(parsed.errors!.filter(e => e.code === 'NESTED_QUOTES')).toHaveLength(0);
+    });
+
+    test('genuinely broken nested quotes STILL trigger', () => {
+      // Outer " followed by stray inner " — yaml.safeLoad throws.
+      const md = `${fence}\ntype: concept\ntitle: "Foo "bar" baz "qux" end"\n${fence}\n\nbody`;
+      const parsed = parseMarkdown(md, undefined, { validate: true });
+      expect(parsed.errors!.map(e => e.code)).toContain('NESTED_QUOTES');
+    });
+
+    test('unclosed bracket on a suspicious line STILL surfaces some parse error', () => {
+      // Either NESTED_QUOTES (line-level parse fail) or YAML_PARSE
+      // (whole-frontmatter parse fail) — never silent.
+      const md = `${fence}\ntype: concept\ntitle: x\ntags: ["yc", "w2025"\n${fence}\n\nbody`;
+      const parsed = parseMarkdown(md, undefined, { validate: true });
+      const broken = parsed.errors!.filter(
+        e => e.code === 'NESTED_QUOTES' || e.code === 'YAML_PARSE'
+      );
+      expect(broken.length).toBeGreaterThan(0);
+    });
+  });
+
   describe('EMPTY_FRONTMATTER', () => {
     test('--- --- with nothing between', () => {
       const md = `${fence}\n${fence}\n\nbody`;
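The v0.37.5.0 TODO contrasts two tag-quoting styles: the emitter's double-quoted `JSON.stringify` output and the repair path's single quotes with `''` escaping. Below is a minimal sketch of both. It assumes hypothetical helpers `singleQuoteYaml`, `formatTagsLine`, and `formatTagsLineJson`, which are not the actual `serializeFrontmatter` or `brain-writer.ts` code.

```typescript
// Hypothetical helpers comparing the two tag-quoting styles from the TODO.
// Not the actual serializeFrontmatter or brain-writer.ts code.

// YAML single-quoted scalar: the only escape is doubling the quote (' -> '').
function singleQuoteYaml(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// Repair-path style: a flow sequence of single-quoted tags.
function formatTagsLine(tags: string[]): string {
  return `tags: [${tags.map(singleQuoteYaml).join(", ")}]`;
}

// Emitter style, for comparison: each tag double-quoted via JSON.stringify.
function formatTagsLineJson(tags: string[]): string {
  return `tags: [${tags.map((t) => JSON.stringify(t)).join(", ")}]`;
}
```

For example, `formatTagsLine(["yc", "Men's Fashion"])` yields `tags: ['yc', 'Men''s Fashion']`. The `''` pair is only an escape inside single quotes. In a double-quoted scalar such as `"Men''s Fashion"` (the flow-sequence test case), it is simply two literal apostrophes, and the value is still valid YAML.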