autopilot: cached sync anchor commit invalidated by upstream force-push / hard-reset triggers indefinite full-reimport loop; cycle-failure-cap fails to exit because launchd plist sets KeepAlive=true #1205

@bwarminski

Environment

  • gbrain 0.26.0 (installed globally with bun install -g gbrain)
  • macOS, PGLite engine (~/.gbrain/config.json engine: pglite)
  • Autopilot installed via gbrain autopilot --install → launchd plist at ~/Library/LaunchAgents/com.gbrain.autopilot.plist with KeepAlive=true and RunAtLoad=true
  • Autopilot repo: a git working tree that another tool may force-push or hard-reset (in my case, ~/.claude/skills/gstack which gets git reset --hard origin/main during gstack auto-upgrades)

Trigger

  1. gbrain autopilot is running normally against --repo <X>. It has cached a sync anchor commit somewhere in its sync state (this case: 17d8df4d).
  2. Something outside gbrain does git reset --hard origin/main (or any other history-rewriting operation) inside <X>. The cached anchor commit is no longer reachable.
  3. Next autopilot cycle: git cat-file <anchor> fails. Autopilot logs:
    fatal: git cat-file: could not get object info
    Sync anchor commit 17d8df4d missing (force push?). Running full reimport.
    
  4. Full reimport starts, never converges. Cycle counter eventually hits "cycle-failure-cap" and logs Autopilot stopping (cycle-failure-cap).
  5. Process exits.
  6. launchd restarts the process within seconds (KeepAlive=true).
  7. New process inherits the same stale anchor, repeats from step 3.

The loop is CPU-bound (~99%) because each reimport is a hot loop that reads, chunks, and extracts 100+ markdown files.
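
Steps 3 to 7 repeat because the stale anchor is never replaced. The sketch below shows one way the anchor handling could be structured so that a missing commit causes exactly one full reimport. It is not gbrain's actual code: `GitOps`, `SyncState`, `SyncPlan`, `planSync`, and `anchorAfter` are hypothetical names, and the git commands in the comments are only suggestions for how the two operations might be backed.

```typescript
// Hypothetical shapes; none of these names come from gbrain itself.
interface GitOps {
  // Could be backed by `git cat-file -e <sha>^{commit}` (assumption).
  commitExists(sha: string): boolean;
  // Could be backed by `git rev-parse HEAD` (assumption).
  head(): string;
}

interface SyncState {
  anchor: string | null;
}

type SyncPlan =
  | { kind: "incremental"; from: string; to: string }
  | { kind: "full-reimport"; newAnchor: string };

// Decide what the next cycle should do. A missing anchor yields a full
// reimport that is pinned to the current HEAD, never to the old anchor.
function planSync(state: SyncState, git: GitOps): SyncPlan {
  const head = git.head();
  if (state.anchor !== null && git.commitExists(state.anchor)) {
    return { kind: "incremental", from: state.anchor, to: head };
  }
  return { kind: "full-reimport", newAnchor: head };
}

// State to persist once the planned work has completed successfully.
function anchorAfter(plan: SyncPlan): SyncState {
  return { anchor: plan.kind === "incremental" ? plan.to : plan.newAnchor };
}
```

If the state returned by `anchorAfter` is persisted to disk after the reimport, a restarted process would start from the new anchor rather than from 17d8df4d.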

What actually broke for me

Because PGLite is single-process and autopilot held the database file open continuously, gbrain serve (the MCP entry point) could not acquire the database and blocked at startup. Claude Code's MCP client timed out the connection, which surfaced as mcp__gbrain__* disconnected in every session for hours. The autopilot loop was invisible to gbrain doctor (--fast --json showed status: warnings, health_score: 90) because the doctor checks don't try to acquire the DB.

This is the lock-contention case that #677 anticipates, with autopilot as the antagonist holding the lock.

Evidence

~/.gbrain/autopilot.err (excerpt across multiple respawns):

fatal: git cat-file: could not get object info
Sync anchor commit 17d8df4d missing (force push?). Running full reimport.
[gbrain import] Skipping symlink: …
[import.files] start
[import.files] 4/123 (3%) imported=4 skipped=0 errors=0
…
[import.files] 78/123 (63%) imported=78 skipped=0 errors=0
[cycle.lint] start
[cycle.lint] done
[cycle.backlinks] start
…
[cycle.sync] start
fatal: git cat-file: could not get object info
Sync anchor commit 17d8df4d missing (force push?). Running full reimport.
…

~/.gbrain/autopilot.log (showing cycle-failure-cap exit and immediate respawn):

[cycle-inline partial] lint=0 backlinks=0 synced=0 extracted=0 embedded=0 orphans=124
[cycle] score=10 elapsed=1s next=150s
Autopilot stopping (cycle-failure-cap).
Autopilot starting. Repo: /Users/…/.claude/skills/gstack, interval: 300s
[autopilot] running steps inline (engine=pglite)
Running full import of /Users/…/.claude/skills/gstack...
Found 123 markdown files
Stale lock file found (>10 min). Taking over.

Process info during the loop:

PID    %CPU  ETIME    COMMAND
7995   99.0  15:21    bun /Users/…/.bun/bin/gbrain autopilot --repo /Users/…/.claude/skills/gstack

Expected behavior

  • A missing sync anchor commit should trigger at most one full reimport, after which a new anchor is established at current HEAD.
  • If full reimport cannot converge, autopilot should exit with a non-zero status and a clear log message; under KeepAlive=true the OS will restart it, but at the very least the loop should make forward progress.
  • Alternatively: cycle-failure-cap should write a sentinel file or set state such that autopilot self-disables on next launch until manually re-enabled, instead of being respawned indefinitely by launchd.
  • gbrain doctor should surface "autopilot loop" or "DB held by long-running process" as a check (related to the cooperative single-owner direction in #677, "PGLite MCP server and maintenance commands need a cooperative single-owner mode").
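
As a rough illustration of the "autopilot loop" check in the last bullet, a doctor check could scan ~/.gbrain/autopilot.log for a cycle-failure-cap stop that is later followed by a fresh start. The two log strings below are copied from the excerpt in this issue; the function name `countRespawnsAfterCap` and the counting logic are hypothetical, not an existing gbrain API.

```typescript
// Log lines as they appear in the autopilot.log excerpt in this issue.
const STOP_LINE = "Autopilot stopping (cycle-failure-cap).";
const START_PREFIX = "Autopilot starting.";

// Count how many times a cycle-failure-cap stop was followed by a new
// "Autopilot starting." line, i.e. how often a supervisor restarted
// autopilot after it had given up. Hypothetical helper.
function countRespawnsAfterCap(logText: string): number {
  let respawns = 0;
  let sawStop = false;
  for (const raw of logText.split("\n")) {
    const line = raw.trim();
    if (line === STOP_LINE) {
      sawStop = true;
    } else if (line.startsWith(START_PREFIX)) {
      if (sawStop) respawns += 1;
      sawStop = false;
    }
  }
  return respawns;
}
```

A check built on this could warn once the count within a recent window exceeds a small threshold; the threshold and the window are left open here.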

Actual behavior

  • Full reimport runs every cycle but never updates the sync anchor (or updates it to a value that doesn't survive a process restart).
  • cycle-failure-cap exits the process but launchd respawns it. The "stopping" message is misleading: autopilot is not actually stopped, because launchd immediately starts it again.
  • MCP server is silently unavailable for as long as the loop runs.
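
The sentinel idea from Expected behavior could make the cycle-failure-cap exit stick across respawns. A minimal sketch follows, assuming a sentinel at ~/.gbrain/autopilot-disabled (the issue only proposes that path "or similar"). All three function names are hypothetical and do not exist in gbrain.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical default location; the path is only a suggestion.
const DEFAULT_SENTINEL = path.join(os.homedir(), ".gbrain", "autopilot-disabled");

// Called when cycle-failure-cap is hit: record when and why autopilot gave up.
function disableAutopilot(reason: string, sentinel: string = DEFAULT_SENTINEL): void {
  fs.mkdirSync(path.dirname(sentinel), { recursive: true });
  fs.writeFileSync(sentinel, `${new Date().toISOString()} ${reason}\n`);
}

// Called first thing in the entrypoint, before the database is opened.
function isAutopilotDisabled(sentinel: string = DEFAULT_SENTINEL): boolean {
  return fs.existsSync(sentinel);
}

// Manual re-enable step; does nothing if the sentinel is already absent.
function clearAutopilotDisabled(sentinel: string = DEFAULT_SENTINEL): void {
  fs.rmSync(sentinel, { force: true });
}
```

With KeepAlive=true, launchd would presumably still relaunch the process, but each launch would return before touching PGLite, so gbrain serve would no longer be locked out.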

Workaround

launchctl unload ~/Library/LaunchAgents/com.gbrain.autopilot.plist
pkill -9 -f "gbrain autopilot"
rm -f ~/.gbrain/autopilot.lock

Once autopilot is unloaded, gbrain serve acquires the DB cleanly and MCP works. Re-enabling autopilot would re-trigger the loop, so I'm leaving it off pending this fix.

Suggested fixes (non-prescriptive)

  1. When git cat-file <anchor> fails and the full reimport completes, write the current HEAD of the repo as the new anchor instead of keeping the previously cached value. The current code seems to either keep the old anchor across reimports or never write a new one outside the incremental sync path.
  2. Make cycle-failure-cap write a persistent sentinel (~/.gbrain/autopilot-disabled or similar) that the autopilot entrypoint checks before doing any work. Refuse to run until cleared manually. This breaks the launchd respawn loop without requiring the user to unload the plist.
  3. Consider whether gbrain autopilot --install should generate a plist with KeepAlive set to a dictionary (KeepAlive = { SuccessfulExit = false }) instead of true, so that a clean cycle-failure-cap exit (which is intentional, not a crash) doesn't get respawned.
  4. Address #677 ("PGLite MCP server and maintenance commands need a cooperative single-owner mode") so that gbrain serve can coexist with autopilot on PGLite.
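
For fix 3, a sketch of how an installer could render the plist with the dictionary form of KeepAlive. The real output of gbrain autopilot --install is not shown in this issue, so `renderAutopilotPlist`, `PlistOptions`, and the argument list are hypothetical; only the KeepAlive, SuccessfulExit, and RunAtLoad keys are standard launchd keys.

```typescript
interface PlistOptions {
  label: string;              // e.g. "com.gbrain.autopilot" (assumed from the plist filename)
  programArguments: string[]; // hypothetical; the real arguments are not shown in the issue
}

// Escape the characters that are special in XML text nodes.
function escapeXml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Render a launchd plist whose KeepAlive is { SuccessfulExit = false }.
function renderAutopilotPlist(opts: PlistOptions): string {
  const args = opts.programArguments
    .map((a) => `    <string>${escapeXml(a)}</string>`)
    .join("\n");
  return `<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>${escapeXml(opts.label)}</string>
  <key>ProgramArguments</key>
  <array>
${args}
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <dict>
    <key>SuccessfulExit</key>
    <false/>
  </dict>
</dict>
</plist>
`;
}
```

With SuccessfulExit set to false, launchd restarts the job only after a non-zero exit. The cycle-failure-cap path would therefore need to exit with status 0 to stay down, while genuine crashes would still be restarted.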
