fix: add timeout to doctor frontmatter_integrity check #1287
Open
garrytan-agents wants to merge 1 commit into
On brains with 200K+ pages, the frontmatter scan walks every .md file on disk across all registered sources. This synchronous FS walk can take minutes (observed: >60s on a 216K-page brain with 3 sources), causing the doctor command to appear hung. scanBrainSources already supports an AbortSignal via opts.signal — the walkDir callback checks signal.aborted on every file, and the source loop breaks on abort. This commit passes AbortSignal.timeout (default 30s) from the doctor caller so the check degrades gracefully instead of blocking the entire health report. Configurable via GBRAIN_DOCTOR_FM_TIMEOUT_MS for brains that need more or less time. When the timeout fires, doctor reports a warn with instructions to run the full scan directly.
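The mechanism described above can be sketched as follows: per-file `signal.aborted` checks, a source loop that stops on abort, and a doctor caller that passes `AbortSignal.timeout` and degrades to `warn`. This is a minimal sketch assuming Node 17.3+ (for `AbortSignal.timeout`). `scanSources`, `checkFrontmatterIntegrity`, `resolveFmTimeoutMs`, the `CheckResult`/`ScanResult` shapes, and the in-memory `sources` array are hypothetical stand-ins, not gbrain's actual `scanBrainSources` or doctor code. The stand-in walk awaits between files because a timeout signal can only flip to `aborted` when the event loop gets a turn. Note also that `AbortSignal.timeout()` aborts with a `DOMException` named `TimeoutError`, not `AbortError`. For that reason the sketch has the scan report whether it stopped early, rather than matching on an error name.

```typescript
// Hypothetical shapes; gbrain's real doctor/scan types may differ.
type CheckStatus = "ok" | "warn" | "fail";
interface CheckResult {
  name: string;
  status: CheckStatus;
  message: string;
}
interface ScanResult {
  filesChecked: number;
  issues: number;
  aborted: boolean;
}

const DEFAULT_FM_TIMEOUT_MS = 30_000;

// Read GBRAIN_DOCTOR_FM_TIMEOUT_MS; fall back to 30s when unset or not a positive integer.
function resolveFmTimeoutMs(env: Record<string, string | undefined>): number {
  const parsed = Number(env.GBRAIN_DOCTOR_FM_TIMEOUT_MS);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_FM_TIMEOUT_MS;
}

// Stand-in for scanBrainSources: each source is an in-memory list of file contents.
async function scanSources(
  sources: string[][],
  opts: { signal?: AbortSignal } = {},
): Promise<ScanResult> {
  let filesChecked = 0;
  let issues = 0;
  let aborted = false;
  for (const files of sources) {
    // Source loop: do not start another source once aborted.
    if (opts.signal?.aborted) {
      aborted = true;
      break;
    }
    for (const content of files) {
      // Per-file check, mirroring the walkDir callback.
      if (opts.signal?.aborted) {
        aborted = true;
        break;
      }
      filesChecked++;
      if (!content.startsWith("---\n")) issues++; // toy frontmatter check
      // Yield to the event loop so the timeout timer can flip signal.aborted.
      await new Promise<void>((resolve) => setTimeout(resolve, 0));
    }
  }
  return { filesChecked, issues, aborted };
}

// Doctor-side caller: bound the scan and degrade to "warn" on timeout.
async function checkFrontmatterIntegrity(
  sources: string[][],
  env: Record<string, string | undefined>,
): Promise<CheckResult> {
  const name = "frontmatter_integrity";
  const fmTimeoutMs = resolveFmTimeoutMs(env);
  const fmAbort = AbortSignal.timeout(fmTimeoutMs);
  try {
    const { filesChecked, issues, aborted } = await scanSources(sources, { signal: fmAbort });
    if (aborted) {
      return {
        name,
        status: "warn",
        message:
          `scan stopped after ${fmTimeoutMs}ms (${filesChecked} files checked); ` +
          "run `gbrain frontmatter validate` directly for a full scan",
      };
    }
    return issues === 0
      ? { name, status: "ok", message: `${filesChecked} file(s) checked` }
      : { name, status: "warn", message: `${issues} file(s) missing frontmatter` };
  } catch (err) {
    // Any other failure is reported instead of taking down the whole health report.
    return { name, status: "fail", message: String(err) };
  }
}
```

In a Node CLI the caller would pass `process.env` as `env`. Yielding on every file keeps the sketch short. A real walker would more likely yield every few hundred files to limit the overhead.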
Problem
On brains with 200K+ pages, `gbrain doctor` appears to hang during the `frontmatter_integrity` check. The check calls `scanBrainSources`, which synchronously walks every `.md` file on disk across all registered federated sources. On a production brain with 216K pages and 3 sources (default + zion-brain + media-corpus), this walk takes >60s, so the doctor command looks hung.

The monitoring system that calls `gbrain doctor` on a cron has a 60s timeout, so the frontmatter check causes the entire health report to fail, masking all other checks.

Root Cause
`scanBrainSources` already supports an `AbortSignal` via `opts.signal`: the `walkDir` callback checks `signal.aborted` on every file (line 430 of brain-writer.ts), and the source loop breaks on abort (line 382). However, the doctor caller in `doctor.ts` never passes a signal, so the scan runs unbounded.

Solution
Pass `AbortSignal.timeout(30000)` from the doctor caller to `scanBrainSources`. When the timeout fires:
- doctor reports a `warn` status with instructions to run the full scan directly

The timeout is configurable via `GBRAIN_DOCTOR_FM_TIMEOUT_MS` (default: 30s) for brains that need more or less time.

Changes
`src/commands/doctor.ts` (frontmatter_integrity section):
- Create `AbortSignal.timeout(fmTimeoutMs)` before the scan
- Call `scanBrainSources(engine, { signal: fmAbort })`
- Handle `AbortError` in the catch block and report an actionable message

Results
- `gbrain frontmatter validate` directly

Testing
- `gbrain frontmatter validate <path>` still works independently for targeted repair
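When testing the timeout path, confirm that the walk actually gives the event loop a turn, because `AbortSignal.timeout` sets `signal.aborted` from a timer callback. The sketch below assumes Node 17.3+. `syncLoopSeesAbort`, `asyncLoopItemsBeforeAbort`, and `main` are hypothetical helpers, not part of gbrain. Both loops check `signal.aborted` on every item, but only the one that awaits between items observes a 10ms timeout firing mid-loop.

```typescript
// Busy synchronous loop: checks signal.aborted on every iteration for
// `runMs` milliseconds without ever yielding to the event loop.
function syncLoopSeesAbort(signal: AbortSignal, runMs: number): boolean {
  const start = Date.now();
  while (Date.now() - start < runMs) {
    if (signal.aborted) return true;
  }
  return false;
}

// Same per-item check, but awaiting a macrotask between items so timers can run.
async function asyncLoopItemsBeforeAbort(signal: AbortSignal, items: number): Promise<number> {
  let processed = 0;
  for (let i = 0; i < items; i++) {
    if (signal.aborted) break;
    processed++;
    await new Promise<void>((resolve) => setTimeout(resolve, 0));
  }
  return processed;
}

async function main(): Promise<{ syncSaw: boolean; asyncProcessed: number }> {
  // 10ms budget, but the loop spins for 50ms: the abort timer cannot fire meanwhile.
  const syncSaw = syncLoopSeesAbort(AbortSignal.timeout(10), 50);
  // 10ms budget over 10,000 yielding items: the loop stops long before the end.
  const asyncProcessed = await asyncLoopItemsBeforeAbort(AbortSignal.timeout(10), 10_000);
  return { syncSaw, asyncProcessed };
}

main().then(({ syncSaw, asyncProcessed }) => {
  console.log(`sync loop saw abort: ${syncSaw}; async loop processed ${asyncProcessed} of 10000 items`);
});
```

The first value should be `false` and the second a small count far below 10,000. The exact count depends on timer resolution.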