
v0.37.4.0 feat: pgGraph-inspired CI scaffolding wave (heavy tests + fuzz + RSS gate + frontier cap)#1228

Merged
garrytan merged 2 commits into master from garrytan/austin on May 21, 2026

Conversation

@garrytan
Owner

Summary

Adopts CI/test scaffolding patterns from pgGraph, a Postgres-extension project we evaluated and rejected as a runtime dep, but whose tests/heavy/ directory closes bug classes that have bitten gbrain in production. Also includes one opt-in production change: a BFS frontier cap on traverseGraph.

The wave landed in one commit after three plan-review passes (CEO scope + Eng dual-voice + Codex 2nd-pass verification of the revised plan).

Performance / robustness:

  • tests/heavy/measure_rss.sh — peak-RSS measurement against a 200-page synthetic workload; informational-only on macOS, baseline refresh gated to Linux CI runners
  • tests/heavy/read_latency_under_sync.sh — search p50/p95/p99 with baseline (no writes) vs under-load (parallel writers); delta_pct reported per percentile
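
The percentile and delta_pct reporting above can be sketched as follows. This is a minimal illustration, assuming a nearest-rank percentile and a simple relative-change formula; the function names (`percentile`, `summarize`, `deltaPct`) and the sample latencies are hypothetical and not taken from read_latency_under_sync.sh.

```typescript
// Nearest-rank percentile over latency samples (milliseconds).
// Assumption: the real script may use a different percentile definition.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

interface LatencySummary {
  p50: number;
  p95: number;
  p99: number;
}

function summarize(samples: number[]): LatencySummary {
  return {
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
  };
}

// Relative change of under-load latency against the baseline, in percent.
function deltaPct(baseline: number, underLoad: number): number {
  return ((underLoad - baseline) / baseline) * 100;
}

// Made-up samples: phase A (no writes) and phase B (parallel writers).
const baseline = summarize([10, 12, 11, 13, 40, 12, 11, 10, 12, 14]);
const underLoad = summarize([15, 18, 17, 19, 60, 18, 16, 15, 18, 21]);

const report = {
  p50: deltaPct(baseline.p50, underLoad.p50),
  p95: deltaPct(baseline.p95, underLoad.p95),
  p99: deltaPct(baseline.p99, underLoad.p99),
};
```

Reporting the delta per percentile rather than as a single number keeps tail regressions (p99) visible even when the median barely moves.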

Bug-class prevention (catches issues that historically surfaced only in production):

  • tests/heavy/pg_upgrade_matrix.sh — walks pre-v0.13 and pre-v0.18 simulated brain shapes forward to head via the engine's bootstrap → SCHEMA_SQL → migrations → verifySchema chain. Catches whole-system upgrade wedges. Honest contract: multi-layer healing means single-probe regressions aren't isolated here — test/schema-bootstrap-coverage.test.ts covers that.
  • test/fuzz/ — fast-check property tests over 8 trust-boundary validators. 2 are bundle-pure (escapeLikePattern, parseFactsFence) and guarded by scripts/check-fuzz-purity.sh (wired into verify). 5 are property-tested without the purity guarantee. 1 (validateUploadPath) gets its own fs-backed file with temp-dir confinement.
  • tests/heavy/sync_lock_regression.sh — N concurrent gbrain sync against one DB; asserts 1 winner + N-1 fast-fail lock-busy + zero leaked gbrain_cycle_locks rows. These are the correct semantics: eng review caught that the original plan asserted "wait + queue", but performSync actually fails fast.
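
The fail-fast contract asserted by the sync lock regression (1 winner, N-1 lock-busy, zero leaked rows) can be illustrated with an in-memory stand-in. This is only a sketch: `LockTable`, `fakeSync`, and `runContenders` are hypothetical names, and the real test drives concurrent gbrain sync processes against a database rather than promises in one process.

```typescript
// Hypothetical in-memory stand-in for a lock table.
// This is NOT the real gbrain_cycle_locks schema.
class LockTable {
  private holders = new Set<string>();

  tryAcquire(name: string): boolean {
    if (this.holders.has(name)) return false;
    this.holders.add(name);
    return true;
  }

  release(name: string): void {
    this.holders.delete(name);
  }

  get size(): number {
    return this.holders.size;
  }
}

type SyncResult = "synced" | "lock-busy";

async function fakeSync(locks: LockTable): Promise<SyncResult> {
  // Fail fast: a contender that cannot take the lock returns at once
  // instead of waiting in a queue.
  if (!locks.tryAcquire("sync")) return "lock-busy";
  try {
    await Promise.resolve(); // simulated work; yields once to the microtask queue
    return "synced";
  } finally {
    locks.release("sync"); // always release so no lock entry leaks
  }
}

async function runContenders(
  n: number,
): Promise<{ winners: number; busy: number; leaked: number }> {
  const locks = new LockTable();
  const results = await Promise.all(
    Array.from({ length: n }, () => fakeSync(locks)),
  );
  return {
    winners: results.filter((r) => r === "synced").length,
    busy: results.filter((r) => r === "lock-busy").length,
    leaked: locks.size,
  };
}
```

A "wait + queue" design would instead report N winners, which is why the assertion shape matters: it pins the behavior the code actually has.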

Engine API addition (back-compat):

  • BrainEngine.traverseGraph(slug, depth, opts?): opts gain frontierCap?: number and onTruncation?: (info: TruncationInfo) => void. Both Postgres and PGLite implement a parenthesized ORDER BY (slug, id) LIMIT N inside the recursive CTE. The return shape is unchanged: Promise<GraphNode[]> is preserved for MCP wire stability. The callback is a per-call closure (NOT engine-instance state), so concurrent traversals don't cross-talk.
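
To make the frontier-cap semantics concrete, here is a minimal in-memory BFS sketch. It is an assumption-laden illustration, not the engine's implementation: the real traversal runs as a recursive CTE, and the field names on `GraphNode` and `TruncationInfo`, the `Adjacency` type, and the `traverse` function are all hypothetical.

```typescript
// Hypothetical shapes; the real GraphNode and TruncationInfo fields may differ.
interface GraphNode {
  slug: string;
  depth: number;
}
interface TruncationInfo {
  depth: number;
  frontierSize: number;
  cap: number;
}
interface TraverseOpts {
  frontierCap?: number;
  onTruncation?: (info: TruncationInfo) => void;
}

type Adjacency = Record<string, string[]>;

function traverse(
  graph: Adjacency,
  start: string,
  maxDepth: number,
  opts: TraverseOpts = {},
): GraphNode[] {
  const visited = new Set<string>([start]);
  const result: GraphNode[] = [{ slug: start, depth: 0 }];
  let frontier = [start];

  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = new Set<string>();
    for (const slug of frontier) {
      for (const neighbor of graph[slug] ?? []) {
        if (!visited.has(neighbor)) next.add(neighbor);
      }
    }
    // Sort before cutting so the truncation is deterministic,
    // analogous to ordering before applying a limit in SQL.
    let level = [...next].sort();
    if (opts.frontierCap !== undefined && level.length > opts.frontierCap) {
      // The callback is passed per call, so no state lives on a shared engine.
      opts.onTruncation?.({ depth, frontierSize: level.length, cap: opts.frontierCap });
      level = level.slice(0, opts.frontierCap);
    }
    for (const slug of level) {
      visited.add(slug);
      result.push({ slug, depth });
    }
    frontier = level;
  }
  return result;
}

const graph: Adjacency = { root: ["a", "b", "c", "d"], a: ["e"], b: ["f"] };
const truncations: TruncationInfo[] = [];
const capped = traverse(graph, "root", 2, {
  frontierCap: 2,
  onTruncation: (info) => truncations.push(info),
});
const uncapped = traverse(graph, "root", 2);
```

Because each call supplies its own `onTruncation` closure, two traversals running at the same time each collect only their own truncation events, and callers that omit the options see the same array-of-nodes return value as before.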

Infra / dev experience:

  • tests/heavy/ directory convention + scripts/run-heavy.sh + bun run test:heavy. Helper files prefixed with _ are skipped by the runner.
  • .github/workflows/heavy-tests.yml — cron '17 8 * * *' + pull_request labeled trigger (heavy-tests) + Postgres service + artifact upload on failure. Pinned action SHAs per CLAUDE.md convention.
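
The underscore-prefix convention for helpers can be sketched as a small selection rule. This is an assumption about how a runner might pick scripts, not a transcription of scripts/run-heavy.sh; `selectHeavyScripts` and the file name `_common.sh` are hypothetical.

```typescript
// Hypothetical selection rule: run every *.sh file in the directory,
// but skip helper files whose names start with an underscore.
function selectHeavyScripts(fileNames: string[]): string[] {
  return fileNames
    .filter((name) => name.endsWith(".sh") && !name.startsWith("_"))
    .sort();
}

const selected = selectHeavyScripts([
  "measure_rss.sh",
  "_common.sh", // hypothetical helper, skipped
  "README.md", // not a script, skipped
  "sync_lock_regression.sh",
]);
```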

Test Coverage

| Layer | What | Status |
|---|---|---|
| Unit tests | 8052 pass / 0 fail across 8 shards (345s wallclock) | |
| T8 regression | 5 contracts pinned (cap-unset, cap-hit, cap-not-hit, MCP wire-shape, concurrency) | ✅ all pass |
| Fuzz suite | 12 properties × 1000 runs across 3 test files | ✅ ~3s, runs in default bun test |
| bun run verify | All 14 pre-checks + typecheck | ✅ green |
| bun run test:heavy (PGLite-only paths) | RSS gate + read-latency under sync | ✅ green |
| bun run test:heavy (Postgres paths) | pg_upgrade_matrix + sync_lock_regression (smoke-tested locally) | ✅ green |

The new check:fuzz-purity gate ran against the pure-target list; both targets verified bundle-pure (zero transitive node:fs / node:child_process / engine imports).
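
The idea behind the purity gate, scanning bundled output for forbidden module references, can be sketched in TypeScript. This is only an illustration of the grep-like step: the actual guard is a shell script, and the `FORBIDDEN` list, the `findImpureImports` function, and the sample bundle strings here are assumptions.

```typescript
// Module specifiers treated as impure in this sketch.
// Assumption: the real guard's forbidden list may differ.
const FORBIDDEN = ["node:fs", "node:child_process"];

// Returns the forbidden specifiers that the bundled text imports or requires.
function findImpureImports(
  bundleText: string,
  forbidden: string[] = FORBIDDEN,
): string[] {
  return forbidden.filter((spec) => {
    // Escape regex metacharacters in the specifier.
    const escaped = spec.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    // Match static imports, dynamic imports, and require calls.
    const pattern = new RegExp(
      `(?:from\\s*|require\\(\\s*|import\\(\\s*|import\\s*)["']${escaped}["']`,
    );
    return pattern.test(bundleText);
  });
}

// Made-up bundle contents for the demo.
const pureBundle = `export function escapeLikePattern(s) { return s; }`;
const impureBundle = `import { readFileSync } from "node:fs";\nexport const x = 1;`;

const pureHits = findImpureImports(pureBundle);
const impureHits = findImpureImports(impureBundle);
```

Checking the bundle rather than the source file is what catches transitive imports: a validator can look pure on its own while pulling in a filesystem module through a dependency.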

Pre-Landing Review

Three review passes before any code:

  1. CEO scope review — selected Approach C (full sweep, 9 tasks) over Approach B
  2. Eng dual-voice (Claude subagent + Codex) — 8 convergent CRITICAL/HIGH findings, all corrected in the plan before implementation
  3. Codex 2nd-pass verification on the revised plan — caught 3 NEW issues the first pass missed:
    • lastTraverseTruncation as engine-instance state would have been concurrency-unsafe. Switched to per-call onTruncation callback.
    • require.cache snapshot for the fuzz purity guard is theatrical under Bun's ESM loader. Switched to bun build --target=bun + grep (bundle-true verification).
    • Committed 50K-page PGLite fixture would have been a repo-size risk and contradicted T1's no-blobs principle. Switched to in-process synthesis.

All three issues were addressed before this PR.

Plan Completion

All 9 tasks complete (T1-T9). Plan file: ~/.claude/plans/system-instruction-you-are-working-sorted-gizmo.md.

| Task | Description | Verified |
|---|---|---|
| T1 | Schema-migration matrix (deterministic builder) | |
| T2 | Fuzz harness + purity guard | |
| T3 | RSS budget gate (Linux-only baseline) | |
| T4 | tests/heavy convention + runner | |
| T5 | heavy-tests.yml CI workflow | |
| T6 | read-latency-under-sync | |
| T7 | sync lock regression | |
| T8 | BFS frontier cap on traverseGraph (prototype-first design) | ✅ 5 contracts pinned |
| T9 | docs + CHANGELOG | |

TODOS

No items closed by this wave. The "6GB RES on query" observation in TODOS.md gets observability from T3's RSS gate but isn't fixed by it, so the entry stays open.

Documentation

CLAUDE.md updated:

  • File taxonomy section gains tests/heavy/*.sh entry (with underscore-prefix helper convention)
  • File taxonomy section gains test/fuzz/*.test.ts entry (with the purity-guard mechanism explanation)
  • traverseGraph entry notes the new TraverseGraphOpts + TruncationInfo exports and the per-call callback design

llms-full.txt regenerated.

Test plan

  • All 8052 unit tests pass
  • bun run verify green (including new check:fuzz-purity)
  • bun run test:heavy green locally (PGLite paths and Postgres paths via gbrain-test-pg container)
  • T8 regression test pins 5 contracts including concurrency independence
  • gh workflow view heavy-tests will validate the cron once the workflow file lands on master

🤖 Generated with Claude Code

@garrytan changed the title from "v0.40.1.0 feat: pgGraph-inspired CI scaffolding wave (heavy tests + fuzz + RSS gate + frontier cap)" to "v0.37.4.0 feat: pgGraph-inspired CI scaffolding wave (heavy tests + fuzz + RSS gate + frontier cap)" on May 20, 2026
Schema-migration matrix + fuzz harness + RSS budget gate + read-latency
under sync + sync lock regression + tests/heavy convention + nightly CI
workflow + BFS frontier cap on traverseGraph.

CI infra (T1-T7):
- tests/heavy/ directory convention + scripts/run-heavy.sh + bun run test:heavy
- tests/heavy/pg_upgrade_matrix.sh: walk pre-v0.13 + pre-v0.18 brain shapes
  forward to head via bootstrap → SCHEMA_SQL → migrations → verifySchema
- test/fuzz/{pure,mixed,filesystem}-validators.test.ts: 1000-run fast-check
  property tests across 8 trust-boundary validators
- scripts/check-fuzz-purity.sh: bun-bundle + grep guard, wired into verify
- tests/heavy/measure_rss.sh: in-memory PGLite workload + peak RSS measurement
  via /proc/self/status (Linux) or process.memoryUsage().rss fallback (macOS,
  refuses to write baseline)
- tests/heavy/read_latency_under_sync.sh: phase A baseline + phase B under
  parallel writer load, reports p50/p95/p99 + delta_pct
- tests/heavy/sync_lock_regression.sh: N concurrent gbrain sync against one
  DB, asserts 1 winner + N-1 lock-busy + zero leaked gbrain_cycle_locks rows
- .github/workflows/heavy-tests.yml: cron '17 8 * * *' + heavy-tests label
  trigger + Postgres service + artifact upload on failure

Engine (T8):
- BrainEngine.traverseGraph opts gain frontierCap?: number + onTruncation?:
  (info: TruncationInfo) => void callback. Return shape preserved
  (Promise<GraphNode[]>) for MCP wire stability.
- Postgres CTE: parenthesized ORDER BY (slug, id) LIMIT N inside recursive term.
- PGLite: same SQL with positional params.
- Per-call callback closure — not engine-instance state — so concurrent
  traversals on the same engine don't cross-talk. 5 contracts pinned in
  test/regressions/v0_36_frontier_cap.test.ts.

Three plan-review passes ran before any code: CEO scope review (Approach C),
Eng dual-voice review (Claude subagent + Codex), and Codex 2nd-pass against
the revised plan. The 2nd pass caught issues the first two missed (Bun ESM
vs require.cache; engine-instance metadata stomping under concurrency;
fixture-size inconsistency). All addressed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Resolves:
- VERSION: keep branch's 0.37.4.0 (master at 0.37.3.0; my slot is next)
- package.json: keep 0.37.4.0; merge `verify` to include BOTH new gates —
  master's check:skill-brain-first AND branch's check:fuzz-purity
- CHANGELOG.md: strip markers; both sides' entries kept
  (0.37.4.0 above master's 0.37.3.0 + 0.37.2.0)
- TODOS.md: strip markers; both sides' new follow-up sections kept
  (branch's pgGraph follow-ups + master's skill_brain_first follow-ups)

Trio agrees: VERSION=package.json=CHANGELOG=0.37.4.0.
Verify + typecheck clean. T8 + fuzz tests still pass on merged state.
garrytan added a commit that referenced this pull request May 21, 2026
Same content, different slot in the version queue. v0.40.0.1 was the
queue allocator's default safe slot (bumped past PR #1128's claimed
0.40.0.0). v0.37.5.0 is a PATCH above #1228's claimed 0.37.4.0 and
sits closer to current master (0.37.1.0) in CHANGELOG ordering.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@garrytan garrytan merged commit 9a3ef3c into master May 21, 2026
8 checks passed
garrytan added a commit that referenced this pull request May 21, 2026
…agging valid YAML) (#1229)

* fix(markdown): YAML-aware NESTED_QUOTES validator

The validator at src/core/markdown.ts:219-238 was a syntactic
count-of-quotes heuristic that flagged any frontmatter line with 3+
unescaped " characters. That heuristic is too dumb: valid YAML flow
sequences like `tags: ["yc", "w2025"]` and single-quoted scalars like
`title: 'a: "b" "c"'` both have 3+ unescaped " by design.

Fix: keep the count fast path, then disambiguate with js-yaml.safeLoad
on the value. Only flag lines that genuinely fail to parse. The
full-frontmatter YAML_PARSE check (check 6) still catches structural
failures.

Closes the 6,981-error class on Garry's 105K-page brain in one ~10
LOC change — existing data on disk was already valid YAML; the
validator was wrong about it. No `gbrain frontmatter generate --fix`
sweep needed.

js-yaml@3.14.2 promoted from transitive (via gray-matter) to direct
dependency. @types/js-yaml@3.12.10 added to devDependencies.

5 new YAML-aware test cases in test/markdown-validation.test.ts:
- flow sequence with quoted tags does NOT trigger (6,981 regression guard)
- single-quoted scalar with literal inner double quotes does NOT trigger
- escaped-as-'' quotes inside flow seq do NOT trigger
- genuinely broken nested quotes STILL trigger
- unclosed bracket STILL surfaces NESTED_QUOTES or YAML_PARSE

Closes PR #1217 — outside-voice (codex) review caught that the bug
was the validator, not the emitter. Original 6,981-error signal from
@garrytan-agents.

* chore: bump version and changelog (v0.40.0.1)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: retarget version slot v0.40.0.1 -> v0.37.5.0

Same content, different slot in the version queue. v0.40.0.1 was the
queue allocator's default safe slot (bumped past PR #1128's claimed
0.40.0.0). v0.37.5.0 is a PATCH above #1228's claimed 0.37.4.0 and
sits closer to current master (0.37.1.0) in CHANGELOG ordering.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
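
The two-stage NESTED_QUOTES check described in the commit above (a cheap quote count first, then a real parse to disambiguate) can be sketched as follows. This is a hedged illustration, not the code in src/core/markdown.ts: `countUnescapedQuotes` and `hasNestedQuoteError` are hypothetical names, and the demo injects a JSON-based stand-in where the actual validator uses a YAML parser.

```typescript
// Count double quotes that are not preceded by a backslash.
function countUnescapedQuotes(value: string): number {
  let count = 0;
  for (let i = 0; i < value.length; i++) {
    if (value[i] === '"' && value[i - 1] !== "\\") count++;
  }
  return count;
}

// `parses` is injected so the sketch stays dependency-free.
// In the real validator this role is played by a YAML parser.
function hasNestedQuoteError(
  value: string,
  parses: (v: string) => boolean,
): boolean {
  // Fast path: fewer than 3 unescaped quotes is never suspicious.
  if (countUnescapedQuotes(value) < 3) return false;
  // Slow path: only flag values that genuinely fail to parse.
  return !parses(value);
}

// Stand-in parser for the demo only. JSON accepts the flow-sequence
// example below, but it is NOT a YAML parser and rejects valid YAML
// such as single-quoted scalars.
const jsonParses = (v: string): boolean => {
  try {
    JSON.parse(v);
    return true;
  } catch {
    return false;
  }
};

const flowSequence = hasNestedQuoteError('["yc", "w2025"]', jsonParses);
const brokenQuotes = hasNestedQuoteError('"a "b" c"', jsonParses);
const plainValue = hasNestedQuoteError("plain", jsonParses);
```

Keeping the count as a fast path means most frontmatter lines never reach the parser; only lines that look suspicious pay the parsing cost.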