This file defines what the agent checks on a regular cadence to stay healthy. Think of it as a cron job for agent self-maintenance.
- Frequency: [every run | daily | on-demand]
- Trigger: [automatic at session start | manual invocation | cron]
- Memory files are current (MEMORY/ directory)
- No stale progress files from aborted runs
- Git working tree is clean: no uncommitted changes left over from previous sessions
- Required tools are available (node, git, etc.)
- Dependencies are installed and up to date
- Configuration files are valid (no syntax errors)
- No stuck or orphaned workflow runs
- SQLite tracking database is consistent
- All agents can communicate through their expected channels
- Test pass rate over the last N runs: [track and report]
- Average stories per run: [track and report]
- Retry rate: [track; a high retry rate usually points to prompt or criteria issues]
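
Three of the checks listed above (required tools, a clean working tree, SQLite consistency) can be sketched in Python as follows. This is a minimal illustration, not the agent's actual implementation: the function names are hypothetical, and `PRAGMA integrity_check` only verifies the database file's structural integrity, so any workflow-level consistency rules would need their own queries.

```python
import shutil
import sqlite3
import subprocess


def missing_tools(tools):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def working_tree_is_clean(repo_dir="."):
    """True when `git status --porcelain` prints nothing for repo_dir."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == ""


def sqlite_is_consistent(db_path):
    """True when SQLite's built-in integrity check reports 'ok'."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute("PRAGMA integrity_check").fetchone()
    finally:
        conn.close()
    return row is not None and row[0] == "ok"
```

A heartbeat could call `missing_tools(["node", "git"])` first and skip the remaining checks if the list is non-empty, since the git check depends on `git` being installed.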
If memory files are stale or inconsistent:
- Re-read the current git state
- Update MEMORY/ files to reflect reality
- Log the drift in the action log
If a previous run was interrupted:
- Check for orphaned branches
- Clean up partial progress files
- Reset the workflow state in SQLite
- Document what happened
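
The recovery steps above might look like the sketch below. It assumes a hypothetical `workflow_runs` table with a `status` column and a base branch named `main`; neither is specified in this file, so adjust both to the real schema and branching model. Unmerged branches are only candidates for orphans and should be reviewed before deletion.

```python
import sqlite3
import subprocess


def unmerged_branches(base="main", repo_dir="."):
    """List local branches not merged into `base` (orphan candidates)."""
    result = subprocess.run(
        ["git", "branch", "--no-merged", base, "--format=%(refname:short)"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def reset_interrupted_runs(conn):
    """Mark runs still flagged 'running' as 'interrupted'; return the count.

    Assumes a hypothetical table: workflow_runs(id, status).
    """
    cursor = conn.execute(
        "UPDATE workflow_runs SET status = 'interrupted' WHERE status = 'running'"
    )
    conn.commit()
    return cursor.rowcount
```

Returning the number of rows changed gives the "document what happened" step something concrete to record.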
If dependencies are outdated:
- Run dependency audit
- Flag security vulnerabilities
- Create a story for the update (do not auto-update during maintenance)
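
For a Node project, the audit-and-flag steps could be sketched as below. This assumes npm is the package manager (the file only mentions node) and that the report carries per-severity counts under `metadata.vulnerabilities`, as recent npm versions emit; verify against the output of your npm version. The helper names are hypothetical.

```python
import json
import subprocess


def run_npm_audit(project_dir="."):
    """Run `npm audit --json` and return the parsed report.

    npm exits non-zero when vulnerabilities are found, so the exit
    code is deliberately not treated as an error here.
    """
    result = subprocess.run(
        ["npm", "audit", "--json"],
        cwd=project_dir,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def vulnerability_counts(report):
    """Return non-zero per-severity counts from a parsed audit report."""
    counts = report.get("metadata", {}).get("vulnerabilities", {})
    return {sev: n for sev, n in counts.items() if sev != "total" and n > 0}


def needs_update_story(report):
    """True when the report lists at least one vulnerability."""
    return bool(vulnerability_counts(report))
```

The sketch only reports; in line with the rule above, it never installs or upgrades anything.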
After each heartbeat, append a status line to the action log:
[HEARTBEAT] 2026-02-25T10:00:00Z | status: healthy | checks: 4/4 | actions: 0
[HEARTBEAT] 2026-02-25T10:00:00Z | status: degraded | checks: 3/4 | actions: 1 | note: cleaned orphaned branch
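
A small helper can produce lines in this format. The sketch below assumes that `healthy` means every check passed and `degraded` means anything less, which is inferred from the two examples rather than stated; the function names and the log path argument are hypothetical.

```python
from datetime import datetime, timezone


def format_heartbeat(when, passed, total, actions, note=None):
    """Build one status line; `when` must be a timezone-aware datetime."""
    # Assumption: any failed check downgrades the status to 'degraded'.
    status = "healthy" if passed == total else "degraded"
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    line = (
        f"[HEARTBEAT] {stamp} | status: {status} "
        f"| checks: {passed}/{total} | actions: {actions}"
    )
    if note:
        line += f" | note: {note}"
    return line


def append_heartbeat(log_path, line):
    """Append the status line to the action log, one entry per line."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(line + "\n")
```

Keeping formatting separate from file I/O makes the line format easy to test without touching the real action log.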
If any health check fails repeatedly:
- Log the failure pattern
- Notify the human operator
- Pause automated runs until resolved
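
One way to make "repeatedly" concrete is to count consecutive failures per check and escalate at a threshold. The sketch below uses a default threshold of 3, which is an arbitrary assumption and not a value defined in this file; `FailureTracker` is a hypothetical name, and the streaks are held in memory, so a real agent would need to persist them between sessions.

```python
from collections import defaultdict


class FailureTracker:
    """Track consecutive failures per health check."""

    def __init__(self, threshold=3):
        # Assumed default; tune to how noisy each check is.
        self.threshold = threshold
        self.streaks = defaultdict(int)

    def record(self, check_name, passed):
        """Update the streak for one check; return True when escalation is due."""
        if passed:
            self.streaks[check_name] = 0
            return False
        self.streaks[check_name] += 1
        return self.streaks[check_name] >= self.threshold
```

When `record` returns `True`, the caller would log the pattern, notify the operator, and pause automated runs, in that order.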