autopilot: cached sync anchor commit invalidated by upstream force-push / hard-reset triggers indefinite full-reimport loop; cycle-failure-cap exit does not stop it because launchd plist sets KeepAlive=true

## Environment

- gbrain `0.26.0` (bun script install via `bun install -g gbrain`)
- macOS, PGLite engine (`~/.gbrain/config.json` `engine: pglite`)
- Autopilot installed via `gbrain autopilot --install` → launchd plist at `~/Library/LaunchAgents/com.gbrain.autopilot.plist` with `KeepAlive=true` and `RunAtLoad=true`
- Autopilot repo: a git working tree that another tool may force-push or hard-reset (in my case, `~/.claude/skills/gstack` which gets `git reset --hard origin/main` during gstack auto-upgrades)

## Trigger

1. `gbrain autopilot` is running normally against `--repo <X>`. It has cached a sync anchor commit somewhere in its sync state (in this case, `17d8df4d`).
2. Something outside gbrain does `git reset --hard origin/main` (or any other history-rewriting operation) inside `<X>`. The cached anchor commit is no longer reachable.
3. Next autopilot cycle: `git cat-file <anchor>` fails. Autopilot logs:
   ```
   fatal: git cat-file: could not get object info
   Sync anchor commit 17d8df4d missing (force push?). Running full reimport.
   ```
4. The full reimport starts but never converges. The cycle counter eventually hits `cycle-failure-cap` and autopilot logs `Autopilot stopping (cycle-failure-cap).`
5. Process exits.
6. launchd restarts the process within seconds (`KeepAlive=true`).
7. New process inherits the same stale anchor, repeats from step 3.

The loop is CPU-bound (~99%) because each reimport is a hot loop that reads, chunks, and extracts 100+ markdown files (123 in this repo).
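To confirm steps 2 and 3 by hand, a check along these lines can tell whether a given commit is gone from the object store or merely no longer in HEAD's history. This is a diagnostic sketch, not gbrain code: `anchor_status` is a name I made up, and the anchor SHA has to be supplied by hand (for example from the log line above), since I don't know where gbrain persists it.

```shell
#!/usr/bin/env bash
# Classify a commit relative to a repo's current HEAD.
# Usage: anchor_status <repo-path> <commit-sha>
# (hypothetical helper, not part of gbrain)
anchor_status() {
  local repo="$1" sha="$2"
  # cat-file -e exits non-zero when the object cannot be resolved as a commit.
  if ! git -C "$repo" cat-file -e "${sha}^{commit}" 2>/dev/null; then
    echo "missing"
  # merge-base --is-ancestor exits 0 only if sha is in HEAD's history.
  elif ! git -C "$repo" merge-base --is-ancestor "$sha" HEAD 2>/dev/null; then
    echo "unreachable"
  else
    echo "ok"
  fi
}
```

For example, `anchor_status ~/.claude/skills/gstack 17d8df4d` would print `missing` if the object is really gone, which is the case the autopilot log message points at.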

## What actually broke for me

Because PGLite is single-process and autopilot held the database file open continuously, **`gbrain serve` (the MCP entry point) could not acquire the database and blocked at startup**. Claude Code's MCP client timed out the connection, which surfaced as `mcp__gbrain__* disconnected` in every session for hours. The autopilot loop was invisible to `gbrain doctor` (`--fast --json` showed `status: warnings, health_score: 90`) because the doctor checks didn't try to acquire the DB.

This is the lock-contention case that #677 anticipates, with autopilot as the antagonist holding the lock.

## Evidence

`~/.gbrain/autopilot.err` (excerpt across multiple respawns):

```
fatal: git cat-file: could not get object info
Sync anchor commit 17d8df4d missing (force push?). Running full reimport.
[gbrain import] Skipping symlink: …
[import.files] start
[import.files] 4/123 (3%) imported=4 skipped=0 errors=0
…
[import.files] 78/123 (63%) imported=78 skipped=0 errors=0
[cycle.lint] start
[cycle.lint] done
[cycle.backlinks] start
…
[cycle.sync] start
fatal: git cat-file: could not get object info
Sync anchor commit 17d8df4d missing (force push?). Running full reimport.
…
```

`~/.gbrain/autopilot.log` (showing cycle-failure-cap exit and immediate respawn):

```
[cycle-inline partial] lint=0 backlinks=0 synced=0 extracted=0 embedded=0 orphans=124
[cycle] score=10 elapsed=1s next=150s
Autopilot stopping (cycle-failure-cap).
Autopilot starting. Repo: /Users/…/.claude/skills/gstack, interval: 300s
[autopilot] running steps inline (engine=pglite)
Running full import of /Users/…/.claude/skills/gstack...
Found 123 markdown files
Stale lock file found (>10 min). Taking over.
```

Process info during the loop:

```
PID    %CPU  ETIME    COMMAND
7995   99.0  15:21    bun /Users/…/.bun/bin/gbrain autopilot --repo /Users/…/.claude/skills/gstack
```

## Expected behavior

- A missing sync anchor commit should trigger **at most one** full reimport, after which a new anchor is established at current HEAD.
- If the full reimport cannot converge, autopilot should exit with a non-zero status and a clear log message. Under `KeepAlive=true` the OS will restart it, but each restart should at least make forward progress instead of repeating the same failed reimport.
- Alternatively: `cycle-failure-cap` should write a sentinel file or set state such that autopilot self-disables on next launch until manually re-enabled, instead of being respawned indefinitely by launchd.
- `gbrain doctor` should surface "autopilot loop" or "DB held by long-running process" as a check (related to #677's cooperative-single-owner direction).
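The first expectation above can be sketched roughly as follows. This is only an illustration of the intended state transitions, under loud assumptions: `sync_once`, `full_reimport`, and `incremental_sync` are hypothetical names, the import steps are stubs, and the anchor is kept in a plain file, which is probably not how gbrain stores its sync state.

```shell
#!/usr/bin/env bash
# Stubs standing in for gbrain's real import steps (hypothetical names).
full_reimport()    { echo "full"; }
incremental_sync() { echo "incremental from $2"; }

# sync_once <repo-path> <anchor-file>
# Runs one sync cycle and records the new anchor on success.
sync_once() {
  local repo="$1" anchor_file="$2" anchor="" head
  # Capture HEAD before importing, so commits that land mid-import
  # are picked up by the next cycle rather than skipped.
  head="$(git -C "$repo" rev-parse HEAD)" || return 1
  [ -f "$anchor_file" ] && anchor="$(cat "$anchor_file")"
  if [ -n "$anchor" ] && git -C "$repo" cat-file -e "${anchor}^{commit}" 2>/dev/null; then
    incremental_sync "$repo" "$anchor" || return 1
  else
    # Anchor missing (force push / hard reset): reimport everything once.
    full_reimport "$repo" || return 1
  fi
  # Persist the new anchor only after the import step succeeded.
  printf '%s\n' "$head" > "$anchor_file"
}
```

With this shape, a stale anchor costs exactly one full reimport: the next cycle finds the freshly written HEAD and goes back to incremental sync, even across a process restart.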

## Actual behavior

- Full reimport runs every cycle but never updates the sync anchor (or updates it to a value that doesn't survive a process restart).
- `cycle-failure-cap` exits the process but launchd respawns it. The "stopping" message is misleading: autopilot is not actually stopped, because the OS immediately restarts it.
- MCP server is silently unavailable for as long as the loop runs.

## Workaround

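```bash
# Stop launchd from respawning autopilot, then kill the running process.
launchctl unload ~/Library/LaunchAgents/com.gbrain.autopilot.plist
pkill -9 -f "gbrain autopilot"
# Remove the lock file left behind by the killed process.
rm -f ~/.gbrain/autopilot.lock
```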

Once autopilot is unloaded, `gbrain serve` acquires the DB cleanly and MCP works. Re-enabling autopilot would re-trigger the loop, so I'm leaving it off pending this fix.

## Suggested fixes (non-prescriptive)

1. When `git cat-file <anchor>` fails and the resulting full reimport completes, write the new anchor as the **current HEAD of the repo**, not the previously cached value. The current code seems to keep the old anchor across reimports (or does not write a new one until an incremental sync would have).
2. Make `cycle-failure-cap` write a persistent sentinel (`~/.gbrain/autopilot-disabled` or similar) that the autopilot entrypoint checks before doing any work. Refuse to run until cleared manually. This breaks the launchd respawn loop without requiring the user to unload the plist.
3. Consider whether `gbrain autopilot --install` should generate a plist with `KeepAlive` set to a dictionary (`KeepAlive = { SuccessfulExit = false }`) instead of `true`, so that a clean `cycle-failure-cap` exit (which is intentional, not a crash) doesn't get respawned.
4. Address #677 so that `gbrain serve` can coexist with autopilot on PGLite.
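For fix 2, the sentinel logic could look something like the sketch below. Everything here is an assumption for illustration: the path `~/.gbrain/autopilot-disabled`, the `GBRAIN_AUTOPILOT_SENTINEL` override, and the function names `disable_autopilot` and `autopilot_allowed` do not exist in gbrain today.

```shell
#!/usr/bin/env bash
# Hypothetical sentinel path; gbrain does not create this file today.
SENTINEL="${GBRAIN_AUTOPILOT_SENTINEL:-$HOME/.gbrain/autopilot-disabled}"

# Would be called where autopilot currently logs
# "Autopilot stopping (cycle-failure-cap)."
disable_autopilot() {
  local reason="$1"
  mkdir -p "$(dirname "$SENTINEL")"
  printf '%s\n' "$reason" > "$SENTINEL"
}

# Would be called at the top of the entrypoint, before opening the database.
# Returns non-zero (and explains why on stderr) while the sentinel exists.
autopilot_allowed() {
  if [ -e "$SENTINEL" ]; then
    echo "autopilot disabled: $(cat "$SENTINEL"); remove $SENTINEL to re-enable" >&2
    return 1
  fi
  return 0
}
```

Under `KeepAlive=true` launchd may still relaunch the process, but each launch would exit before touching the database, so `gbrain serve` would no longer be starved. Re-enabling would be a manual `rm` of the sentinel file.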

## Related

- #677 PGLite MCP server and maintenance commands need a cooperative single-owner mode (the architectural root)
- #1162 autopilot: reconnect loop after 5 consecutive failures (sibling failure mode, different trigger)
- #1078 Autopilot/dream sync fails with FK constraint error (adjacent)
