
fix(frontmatter): defense-in-depth against JSON-style arrays in YAML#1238

Closed
garrytan-agents wants to merge 1 commit into garrytan:master from garrytan-agents:fix/frontmatter-json-array-guard

@garrytan-agents (Contributor)

Problem

The #1 source of NESTED_QUOTES frontmatter errors is JSON-style arrays in YAML:

```yaml
# This is what LLMs and ingestion code produce:
tags: ["yc", "w2025"]

# This is what YAML needs:
tags: ['yc', 'w2025']
```

`JSON.stringify()` wraps every value in double quotes. Code like `tags: [${items.map(t => JSON.stringify(t)).join(', ')}]` therefore produces the broken YAML shown above. This caused 6,981 validation errors across a 105K-page brain. It was the single largest factor dragging down the frontmatter-integrity health score.
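Below is a minimal TypeScript reproduction of the pattern. It assumes a plain `string[]` of tags. The `items` name comes from the snippet above; everything else is illustrative.

```typescript
// Reproduces the problematic pattern: JSON.stringify wraps each item in double quotes.
const items: string[] = ["yc", "w2025"];

// The template literal quoted in the description above.
const line = `tags: [${items.map((t) => JSON.stringify(t)).join(", ")}]`;

console.log(line); // tags: ["yc", "w2025"]
```

Strictly speaking, YAML 1.2 allows double-quoted scalars inside a flow sequence, so a spec-compliant parser accepts this line. What it trips is the brain's NESTED_QUOTES check, which is what this PR targets.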

Fix: Four Layers

1. Auto-fix on `frontmatter validate --fix` (brain-writer.ts)

New step 3a in `autoFixFrontmatter()` detects JSON-style arrays and rewrites them as single-quoted YAML. Values that contain an apostrophe fall back to double quotes.
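Here is a rough sketch of what step 3a could look like. It assumes the step works line by line on the frontmatter text. `fixJsonStyleArrays`, `parseStringArray` and `quoteItem` are hypothetical names, not the actual `autoFixFrontmatter()` code in brain-writer.ts. The key pattern is the broad `[A-Za-z_][\w-]*` regex from this PR.

```typescript
// Hypothetical sketch of "step 3a"; not the real brain-writer.ts implementation.
// Matches `key: [ ... ]` and captures the prefix and the bracket contents.
const JSON_ARRAY_LINE = /^(\s*[A-Za-z_][\w-]*:\s*)\[(.*)\]\s*$/;

// Parses the bracket contents as a JSON array; returns null unless every item is a string.
function parseStringArray(inner: string): string[] | null {
  try {
    const parsed: unknown = JSON.parse(`[${inner}]`);
    return Array.isArray(parsed) && parsed.every((x) => typeof x === "string")
      ? (parsed as string[])
      : null;
  } catch {
    return null; // not valid JSON; leave the line for other fix steps
  }
}

// Single quotes by default; fall back to double quotes when the value has an apostrophe.
// JSON string escapes are also valid inside YAML 1.2 double-quoted scalars.
function quoteItem(item: string): string {
  return item.includes("'") ? JSON.stringify(item) : `'${item}'`;
}

function fixJsonStyleArrayLine(line: string): string {
  const m = JSON_ARRAY_LINE.exec(line);
  if (!m || !m[2].includes('"')) return line; // not a JSON-style array
  const items = parseStringArray(m[2]);
  if (items === null) return line;
  return `${m[1]}[${items.map(quoteItem).join(", ")}]`;
}

function fixJsonStyleArrays(frontmatter: string): string {
  return frontmatter.split("\n").map(fixJsonStyleArrayLine).join("\n");
}
```

Only arrays whose items all parse as JSON strings are rewritten. A line like `scores: [1, 2]` passes through unchanged.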

2. Better detection in validator (markdown.ts)

NESTED_QUOTES detection now has two sub-patterns:

  • 5a: JSON-style arrays (clearer message: "use single quotes")
  • 5b: Original nested scalar quotes
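One way to tell the two sub-patterns apart is sketched below. It assumes the check sees one `key: value` frontmatter line at a time. `classifyNestedQuotes` and the message strings are illustrative, not the validator's actual code in markdown.ts.

```typescript
// Hypothetical classifier for the two NESTED_QUOTES sub-patterns.
type NestedQuotesKind = "5a-json-array" | "5b-nested-scalar";

const MESSAGES: Record<NestedQuotesKind, string> = {
  "5a-json-array": "JSON-style array: use single quotes, e.g. ['a', 'b']",
  "5b-nested-scalar": "nested double quotes inside a quoted value",
};

function classifyNestedQuotes(line: string): NestedQuotesKind | null {
  const m = /^[A-Za-z_][\w-]*:\s*(.*?)\s*$/.exec(line);
  if (!m) return null;
  const value = m[1];
  // 5a: a bracketed list containing double quotes, e.g. ["yc", "w2025"].
  if (value.startsWith("[") && value.endsWith("]") && value.includes('"')) {
    return "5a-json-array";
  }
  // 5b: a double-quoted scalar with an unescaped double quote inside it.
  // Simplified: does not account for escaped backslashes before a quote.
  if (value.length >= 2 && value.startsWith('"') && value.endsWith('"')) {
    if (/(^|[^\\])"/.test(value.slice(1, -1))) return "5b-nested-scalar";
  }
  return null;
}
```

Checking 5a first means a JSON-style array always gets the more actionable "use single quotes" message.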

3. Auto-normalize on put_page (operations.ts)

Every put_page call now runs autoFixFrontmatter() on incoming content before import. Normalization is non-blocking: if it throws, the original content passes through unchanged. Agent-written pages with JSON arrays are silently fixed on write.
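The non-blocking wrapper can be sketched as a try/catch around the fixer. For illustration only, this assumes a fixer that takes and returns the page content as a string. The real `autoFixFrontmatter()` signature is not shown in this PR and may differ, and `normalizeForPut` is a hypothetical name.

```typescript
// Stand-in type for autoFixFrontmatter(); the real signature is an assumption here.
type Fixer = (content: string) => string;

// Hypothetical put_page normalization step.
function normalizeForPut(content: string, autoFix: Fixer): string {
  try {
    return autoFix(content);
  } catch {
    // Normalization must never block a write: fall back to the original content.
    return content;
  }
}
```

Swallowing the error keeps writes available, at the cost of letting an unfixed page through. The validator still reports it later.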

4. Agent guidance (frontmatter-guard SKILL.md)

New "Prevention" section with correct/incorrect YAML examples, explaining WHY JSON.stringify causes the bug and what to do instead.
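As an illustration of building the array without `JSON.stringify`, here is a hypothetical helper; `yamlFlowSequence` is not part of the codebase. The auto-fix falls back to double quotes for apostrophes. This helper instead doubles them, which is the YAML 1.2 escape for single-quoted scalars. Whether the brain's validator accepts `''` is an assumption.

```typescript
// Hypothetical helper: quote a value as a YAML single-quoted scalar.
// In YAML single-quoted style, an apostrophe is written as two apostrophes ('').
function yamlQuote(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// Builds a `key: ['a', 'b']` flow-sequence line without JSON.stringify.
function yamlFlowSequence(key: string, items: string[]): string {
  return `${key}: [${items.map(yamlQuote).join(", ")}]`;
}
```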

Companion PRs

Impact

  • Prevents the ~7K validation errors this pattern caused (6,981 on a 105K-page brain)
  • Silently heals agent-written pages on ingest
  • Teaches agents to avoid the pattern in the first place

Four layers to stop NESTED_QUOTES from recurring:

1. **autoFixFrontmatter (brain-writer.ts):** New step 3a detects and
   rewrites JSON-style arrays (`["x", "y"]` → `['x', 'y']`) before
   the existing nested-quote scalar fix. Handles apostrophes in values
   by falling back to double quotes. Runs on `frontmatter validate --fix`
   and `writeBrainPage({autoFix: true})`.

2. **Validator (markdown.ts):** NESTED_QUOTES detection now has two
   sub-patterns — 5a catches JSON-style arrays specifically (with a
   clearer error message: "use single quotes") and 5b catches the
   original nested scalar quotes.

3. **put_page normalization (operations.ts):** Every `put_page` call now
   runs `autoFixFrontmatter()` on incoming content before import.
   Non-blocking — if normalization throws, original content is used.
   This means agent-written pages with JSON arrays are silently fixed
   on write instead of accumulating thousands of validation errors.

4. **Agent guidance (frontmatter-guard SKILL.md):** New "Prevention"
   section with correct/incorrect YAML examples, explaining WHY
   JSON.stringify causes the bug and what to do instead. Agents that
   read this skill before writing frontmatter will avoid the pattern.

Root cause: LLMs and ingestion code use JSON.stringify for YAML array
items, producing `tags: ["yc", "w2025"]` which breaks YAML parsing.
This caused 6,981 errors across a 105K-page brain.

Companion to PR garrytan#1217 (serializer fix in frontmatter-inference.ts).

@garrytan (Owner)

Superseded by #1252 (v0.37.6.0 wave).

This PR's four-layer defense-in-depth was reviewed against the already-merged PR #1229 (validator fix, shipped as v0.37.5.0) and absorbed into the wave with two changes:

Kept (with refinements):

  • ✅ Layer 1 (auto-fix engine): narrowed the allow-list to the `tags:` / `aliases:` keys only. The original broad regex (`[A-Za-z_][\w-]*`) would have rewritten typed-numeric arrays like `scores: ["1", "2"]` into string arrays (caught by codex outside-voice review).
  • ✅ Layer 1: shared nestedQuotesFixed dedup gate with existing step 3 so a file with both JSON-array AND nested-scalar rewrites surfaces as ONE NESTED_QUOTES audit entry, not two.
  • ✅ Layer 4 (SKILL.md Prevention section): absorbed verbatim with v0.37.5.0-aware framing.
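Below is a hedged sketch of the two Layer 1 refinements together. The JSON-array rewrite runs only on allow-listed keys. Steps 3a and 3 share one `nestedQuotesFixed` flag, so at most one NESTED_QUOTES audit entry is recorded. The fixers are passed in as parameters. `autoFixSketch` and `AuditEntry` are hypothetical, not the code merged in #1252.

```typescript
// Hypothetical sketch; only the allow-list keys and the shared-flag idea come from the PR.
const ARRAY_FIX_KEYS = new Set(["tags", "aliases"]);

interface AuditEntry {
  code: string;
}

function autoFixSketch(
  lines: string[],
  fixArrayLine: (line: string) => string, // step 3a: JSON-style arrays
  fixScalarLine: (line: string) => string, // step 3: nested scalar quotes
): { lines: string[]; audit: AuditEntry[] } {
  let nestedQuotesFixed = false; // shared dedup gate for steps 3a and 3

  const out = lines.map((line) => {
    const key = /^([A-Za-z_][\w-]*):/.exec(line)?.[1];
    let next = line;
    // Step 3a runs only on allow-listed keys, so e.g. `scores:` is never touched.
    if (key !== undefined && ARRAY_FIX_KEYS.has(key)) next = fixArrayLine(next);
    next = fixScalarLine(next);
    if (next !== line) nestedQuotesFixed = true;
    return next;
  });

  // One audit entry, however many lines either step rewrote.
  const audit: AuditEntry[] = nestedQuotesFixed ? [{ code: "NESTED_QUOTES" }] : [];
  return { lines: out, audit };
}
```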

Dropped:

Thank you @garrytan-agents — Layer 1 + Layer 4 + the original framing made it into v0.37.6.0 via #1252. Attribution preserved via Co-Authored-By: trailer on the wave commit.

@garrytan closed this on May 21, 2026