48 changes: 48 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,54 @@

All notable changes to GBrain will be documented in this file.

## [0.37.5.0] - 2026-05-20

**`gbrain doctor` stops flagging your tags as broken when they're not.**

If you have a tag line like `tags: ["yc", "w2025", "ai"]` in your frontmatter, that's perfectly valid YAML. The doctor used to flag it anyway. One user with a 105K-page brain saw 6,981 of these flagged at once. The fix: the validator now parses suspicious values with a real YAML parser before complaining. Valid YAML stops getting flagged. Genuinely broken titles like `title: "Foo "bar" baz"` still get caught.

**How to use it**

```bash
gbrain upgrade
gbrain doctor --json | jq '.checks[]
| select(.name=="frontmatter_integrity")
| .breakdown.NESTED_QUOTES'
```

The NESTED_QUOTES count on your brain should drop toward zero on next `gbrain doctor`. No data rewrite needed. No `gbrain frontmatter generate --fix` sweep. The existing files are already valid YAML.
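For scripts that consume the doctor output directly rather than through `jq`, the same lookup can be sketched in TypeScript. This is a minimal sketch: the `DoctorReport` and `DoctorCheck` types and the `nestedQuotesCount` helper are hypothetical names, modeling only the fields the `jq` filter above touches, and the sample payload is invented for illustration.

```typescript
// Hypothetical types: only the fields used by the jq filter are modeled.
interface DoctorCheck {
  name: string;
  breakdown?: Record<string, number>;
}

interface DoctorReport {
  checks: DoctorCheck[];
}

// Returns the NESTED_QUOTES count, or undefined if the check or key is absent.
function nestedQuotesCount(report: DoctorReport): number | undefined {
  const check = report.checks.find(c => c.name === 'frontmatter_integrity');
  return check?.breakdown?.NESTED_QUOTES;
}

// Invented sample payload; real `gbrain doctor --json` output has more fields.
const sample: DoctorReport = JSON.parse(
  '{"checks":[{"name":"frontmatter_integrity","breakdown":{"NESTED_QUOTES":0}}]}'
);
console.log(nestedQuotesCount(sample));
```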

**Why this is the right layer**

`@garrytan-agents` opened PR #1217 with a one-line fix to the emitter side: switch tag serialization from `JSON.stringify` (double-quoted) to single-quoted YAML. That made the headline 6,981-error case go away by changing what new writes look like. But Codex's outside-voice review during planning caught a deeper bug: even with a perfect emitter, the validator at `src/core/markdown.ts:219-238` was a raw quote counter (`count(unescaped ") >= 3 => error`) that doesn't understand YAML at all. It would still flag a clean single-quoted scalar like `title: 'a: "b" "c"'` (4 unescaped `"` characters, but valid YAML). The fix had to land on the validator (the dumb side), not the emitter.

PR #1217 was closed; thanks to @garrytan-agents for the 6,981-error signal that exposed the underlying class.

**What's safe to know about**

- The fix is additive. Lines with fewer than 3 unescaped quotes still pass instantly (existing fast path). Only suspicious lines (count >= 3, rare on healthy data) pay the per-line YAML parse.
- `js-yaml` (`^3.14.2`) is now a direct dependency (it was transitive via gray-matter). Declaring it directly means a future gray-matter major bump can't yank the import.
- The frontmatter emitter at `src/core/frontmatter-inference.ts` is unchanged. Existing tag style (`tags: ["yc"]`) is now correctly recognized as valid; the cosmetic consistency with `brain-writer.ts:184`'s single-quote repair style is a follow-up TODO, not part of this fix.
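
The fast path plus parse fallback described in the list above can be sketched as follows. This is an illustration, not the code in `src/core/markdown.ts`: `countUnescapedQuotes`, `shouldFlagNestedQuotes`, and the injected `parse` parameter are hypothetical names, and `stubParse` is a crude stand-in for a YAML parser so the block runs without dependencies.

```typescript
// Count double quotes that are not immediately preceded by a backslash.
function countUnescapedQuotes(value: string): number {
  let count = 0;
  for (let j = 0; j < value.length; j++) {
    if (value[j] === '"' && (j === 0 || value[j - 1] !== '\\')) count++;
  }
  return count;
}

// `parse` is any function that throws on invalid input (hypothetical parameter).
function shouldFlagNestedQuotes(
  value: string,
  parse: (v: string) => unknown
): boolean {
  if (countUnescapedQuotes(value) < 3) return false; // fast path: never parses
  try {
    parse(value);
    return false; // parses cleanly, so the line is valid
  } catch {
    return true; // parse failed, so the line is genuinely broken
  }
}

// Stand-in parser for this sketch only: accepts bracketed flow sequences and
// rejects everything else. It also counts how often it is called.
let parseCalls = 0;
const stubParse = (v: string): unknown => {
  parseCalls++;
  if (!(v.startsWith('[') && v.endsWith(']'))) throw new Error('invalid');
  return v;
};

const plain = shouldFlagNestedQuotes('"plain title"', stubParse);
const callsAfterPlain = parseCalls; // still 0: two quotes take the fast path
const tagLine = shouldFlagNestedQuotes('["yc", "w2025", "ai"]', stubParse);
const broken = shouldFlagNestedQuotes('"Foo "bar" baz"', stubParse);
console.log({ plain, callsAfterPlain, tagLine, broken });
```

In the shipped validator the parser is `js-yaml`'s `safeLoad`, per the `src/core/markdown.ts` diff; injecting it as a parameter here only keeps the sketch self-contained.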

### Itemized changes

#### Fixed

- `src/core/markdown.ts:219-238` — NESTED_QUOTES validator now disambiguates via `js-yaml.safeLoad`. The count-of-quotes heuristic stays as a fast path; suspicious lines (count >= 3) are parsed before being flagged. Closes the 6,981-error class for any brain whose frontmatter has been valid all along.
- `js-yaml` declared as a direct dependency in `package.json`; `@types/js-yaml` added to devDependencies. `bun.lock` re-resolves the transitive entry to a top-level pin (no version change).

#### Added

- 5 new YAML-aware regression cases in `test/markdown-validation.test.ts`:
  - flow sequence with quoted tags does NOT trigger (6,981-error regression guard)
  - single-quoted scalar with literal inner double quotes does NOT trigger
  - escaped-as-`''` quotes inside flow seq do NOT trigger
  - genuinely broken nested quotes STILL trigger
  - unclosed bracket STILL surfaces NESTED_QUOTES or YAML_PARSE (never silent)

#### For contributors

- `js-yaml` is now an explicit direct dep. New code that needs YAML emission/parsing should import from it directly rather than relying on gray-matter's transitive resolution.
## [0.37.4.0] - 2026-05-20

**A nightly safety net for the bug class that bit gbrain 10 times in 2 years.**
3 changes: 3 additions & 0 deletions TODOS.md
@@ -1,6 +1,9 @@
# TODOS


## v0.37.5.0 NESTED_QUOTES validator follow-up

- [ ] **v0.37.x+: unify `serializeFrontmatter` tag/title quoting with `brain-writer.ts:184`'s single-quote-with-`''`-escape style for consistency.** Cosmetic only now that the validator at `src/core/markdown.ts:219-238` is YAML-aware (v0.37.5.0). Today the emitter still produces `tags: ["yc"]` (double-quoted via `JSON.stringify`) while the repair path produces `tags: ['yc']` (single-quoted). Both are valid YAML and the validator accepts both, but new writes that drift from repair-side output read as inconsistent. Original signal: PR #1217 by @garrytan-agents (closed in favor of the validator fix). Touch `src/core/frontmatter-inference.ts:391-416` only; should be ~5 LOC plus an update to the existing test at `test/frontmatter-inference.test.ts:239`.
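
The two quoting styles named in this TODO can be compared with a small sketch. The helper names (`quoteYamlSingle`, `emitTagsSingle`, `emitTagsJson`) are hypothetical and are not the functions in `frontmatter-inference.ts` or `brain-writer.ts`; the sketch also assumes tag values are single-line strings.

```typescript
// Wrap a string as a YAML single-quoted scalar. Inside single quotes the only
// escape is a doubled single quote ('').
function quoteYamlSingle(s: string): string {
  return "'" + s.replace(/'/g, "''") + "'";
}

// Repair-side style per the TODO: single-quoted items in a flow sequence.
function emitTagsSingle(tags: string[]): string {
  return `tags: [${tags.map(quoteYamlSingle).join(', ')}]`;
}

// Emitter-side style per the TODO: JSON.stringify yields double-quoted items.
function emitTagsJson(tags: string[]): string {
  return `tags: [${tags.map(t => JSON.stringify(t)).join(', ')}]`;
}

console.log(emitTagsSingle(['yc', "men's"]));
console.log(emitTagsJson(['yc', "men's"]));
```

Both outputs are valid YAML flow sequences; the difference is only which quote character surrounds each item.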
## v0.37.4.0 pgGraph CI scaffolding follow-ups (v0.37.x+)

- [ ] **T8 truncation signal — defer until dedupe-then-cap SQL + Postgres parity E2E.** v0.37.4.0 ships `frontierCap` as the actually-useful protection but strips the `onTruncation` callback after /review adversarial pass (Claude + Codex both flagged). Two bugs in the v1 algorithm: (a) FALSE POSITIVE — `count == cap` at a depth fires the callback even when the graph organically has exactly cap unique nodes at that depth with no truncation; (b) FALSE NEGATIVE — recursive `LIMIT N` runs BEFORE outer `SELECT DISTINCT`, so diamond graphs (one parent fans out to N+5 candidates with duplicates) can have the LIMIT eat its slots on dupes, then DISTINCT collapses to <cap unique nodes, missing real truncation. Fix shape: rewrite both engine impls to dedupe candidates (by `(slug, id)` or page id, source-scoped) BEFORE applying the LIMIT — i.e., `(SELECT DISTINCT ON ... ORDER BY slug, id LIMIT N)` inside the recursive term instead of post-CTE DISTINCT. Then write the missing `test/e2e/engine-parity-frontier-cap.test.ts` (Postgres against PGLite, identical chosen slugs when cap fires + stable ordering). Restore `TruncationInfo` + `opts.onTruncation` to `TraverseGraphOpts` with the cap-after-dedupe shape. Callers that need truncation visibility in the interim can compare `result.length` against expected fanout bounds. /review found it; not a blocker for v0.37.4.0 because the cap itself works correctly and is back-compat (default unset = no behavior change).
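
The ordering problem in this item (cap before dedupe versus dedupe before cap) can be shown on plain arrays. This is a minimal in-memory sketch, not the SQL in either engine: `capThenDedupe`, `dedupeThenCap`, and `CapResult` are hypothetical names, and string slugs stand in for `(slug, id)` rows.

```typescript
interface CapResult {
  nodes: string[];
  truncated: boolean;
}

// Mirrors `LIMIT N` running before the outer DISTINCT: duplicates can use up
// the cap slots, leaving fewer than `cap` unique nodes.
function capThenDedupe(candidates: string[], cap: number): string[] {
  return Array.from(new Set(candidates.slice(0, cap)));
}

// Dedupe first, order deterministically, then cap. Truncation is reported only
// when more unique candidates existed than the cap allows.
function dedupeThenCap(candidates: string[], cap: number): CapResult {
  const unique = Array.from(new Set(candidates)).sort();
  return { nodes: unique.slice(0, cap), truncated: unique.length > cap };
}

// Diamond-style fanout: the first three candidates are all the same node.
const frontier = ['a', 'a', 'a', 'b', 'c', 'd'];
const lossy = capThenDedupe(frontier, 3);
const fixed = dedupeThenCap(frontier, 3);
const exact = dedupeThenCap(['a', 'b', 'c'], 3);
console.log(lossy, fixed, exact);
```

With the cap applied first, only one unique node survives and a `count == cap` check would miss the truncation. With dedupe first, the cap fills with three unique nodes and truncation is flagged, while a frontier with exactly `cap` unique nodes is not flagged.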
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-0.37.4.0
+0.37.5.0
4 changes: 4 additions & 0 deletions bun.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 3 additions & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "gbrain",
-"version": "0.37.4.0",
+"version": "0.37.5.0",
"description": "Postgres-native personal knowledge brain with hybrid RAG search",
"type": "module",
"main": "src/core/index.ts",
@@ -101,6 +101,7 @@
"express-rate-limit": "^7.5.0",
"gray-matter": "^4.0.3",
"heic-decode": "^2.1.0",
"js-yaml": "^3.14.2",
"marked": "^18.0.0",
"openai": "^4.0.0",
"pgvector": "^0.2.0",
@@ -114,6 +115,7 @@
"@types/cookie-parser": "^1.4.7",
"@types/cors": "^2.8.19",
"@types/express": "^5.0.6",
"@types/js-yaml": "^3.12.10",
"bun-types": "^1.3.13",
"fast-check": "^4.8.0",
"typescript": "^5.6.0"
26 changes: 23 additions & 3 deletions src/core/markdown.ts
@@ -1,4 +1,5 @@
import matter from 'gray-matter';
import { safeLoad as yamlSafeLoad } from 'js-yaml';
import type { PageType } from './types.ts';
import { slugifyPath } from './sync.ts';

@@ -217,8 +218,14 @@ function collectValidationErrors(
}

// 5. NESTED_QUOTES — common breakage pattern: `title: "Name "Nick" Last"`.
-// Detect any frontmatter `key: ...` line whose value contains 3 or more
-// unescaped double-quote characters. A clean quoted value has 2.
+// The heuristic: a frontmatter `key: value` line with 3+ unescaped
+// double-quote characters is suspicious. But raw quote-counting is
+// too dumb: a YAML flow sequence like `tags: ["yc", "w2025"]` has
+// 4 unescaped `"` by design (valid), and a single-quoted scalar
+// like `title: 'a: "b" "c"'` has literal inner `"` (also valid).
+// Disambiguate by running js-yaml on just the value; only flag
+// lines that genuinely fail to parse. The full-frontmatter YAML
+// parse error is caught separately by check 6 (YAML_PARSE) below.
for (let i = firstNonEmpty + 1; i < closeLine; i++) {
const line = lines[i];
const m = line.match(/^\s*[A-Za-z_][\w-]*\s*:\s*(.*)$/);
@@ -228,7 +235,20 @@ function collectValidationErrors(
for (let j = 0; j < value.length; j++) {
if (value[j] === '"' && (j === 0 || value[j - 1] !== '\\')) count++;
}
-if (count >= 3) {
+if (count < 3) continue;

// 3+ unescaped quotes — could be valid YAML (flow seq, single-quoted
// scalar with inner quotes, bare scalar with embedded quotes) or
// genuinely broken. Parse the value to disambiguate.
let isValidYaml = false;
try {
yamlSafeLoad(value);
isValidYaml = true;
} catch {
// YAML parse failed — line is genuinely broken
}

if (!isValidYaml) {
errors.push({
code: 'NESTED_QUOTES',
message: 'Nested double quotes in YAML value (use single quotes for the outer)',
44 changes: 44 additions & 0 deletions test/markdown-validation.test.ts
@@ -135,6 +135,50 @@ describe('parseMarkdown validation surface', () => {
});
});

// The validator's count-of-quotes heuristic is too dumb: it flagged
// valid YAML flow sequences (the v0.x 6,981-error class on Garry's
// brain) and single-quoted scalars with literal inner quotes. The
// fallback runs js-yaml.safeLoad on suspicious values; only flags
// genuinely unparseable lines.
describe('NESTED_QUOTES — YAML-aware fallback', () => {
test('flow sequence with quoted tags does NOT trigger (6,981-error regression guard)', () => {
const md = `${fence}\ntype: concept\ntitle: x\ntags: ["yc", "w2025", "ai"]\n${fence}\n\nbody`;
const parsed = parseMarkdown(md, undefined, { validate: true });
expect(parsed.errors!.filter(e => e.code === 'NESTED_QUOTES')).toHaveLength(0);
});

test('single-quoted scalar with literal inner double quotes does NOT trigger', () => {
// value: 'a: "b" "c" "d"' — 6 unescaped " by raw count, but valid YAML
const md = `${fence}\ntype: concept\ntitle: 'a: "b" "c" "d"'\n${fence}\n\nbody`;
const parsed = parseMarkdown(md, undefined, { validate: true });
expect(parsed.errors!.filter(e => e.code === 'NESTED_QUOTES')).toHaveLength(0);
});

test('escaped-as-single-pair quotes inside flow seq do NOT trigger', () => {
const md = `${fence}\ntype: concept\ntitle: x\ntags: ["Men''s Fashion", "yc"]\n${fence}\n\nbody`;
const parsed = parseMarkdown(md, undefined, { validate: true });
expect(parsed.errors!.filter(e => e.code === 'NESTED_QUOTES')).toHaveLength(0);
});

test('genuinely broken nested quotes STILL trigger', () => {
// Outer " followed by stray inner " — yaml.safeLoad throws.
const md = `${fence}\ntype: concept\ntitle: "Foo "bar" baz "qux" end"\n${fence}\n\nbody`;
const parsed = parseMarkdown(md, undefined, { validate: true });
expect(parsed.errors!.map(e => e.code)).toContain('NESTED_QUOTES');
});

test('unclosed bracket on a suspicious line STILL surfaces some parse error', () => {
// Either NESTED_QUOTES (line-level parse fail) or YAML_PARSE
// (whole-frontmatter parse fail) — never silent.
const md = `${fence}\ntype: concept\ntitle: x\ntags: ["yc", "w2025"\n${fence}\n\nbody`;
const parsed = parseMarkdown(md, undefined, { validate: true });
const broken = parsed.errors!.filter(
e => e.code === 'NESTED_QUOTES' || e.code === 'YAML_PARSE'
);
expect(broken.length).toBeGreaterThan(0);
});
});

describe('EMPTY_FRONTMATTER', () => {
test('--- --- with nothing between', () => {
const md = `${fence}\n${fence}\n\nbody`;