Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -87,7 +87,7 @@ state/ volatile runtime signals; gitignored
.wake-queue durable queued wakes: epoch<TAB>seq<TAB>kind<TAB>key<TAB>payload
.afk durable away-mode flag; present = sub-supervisor may inject escalations (set by /afk, cleared on user return)
.watch.lock .wake-queue.lock watcher singleton and queue serialization locks
.hash-* .count-* .stale-* .seen-* .last-* .heartbeat-streak watcher internals; never touch
.hash-* .count-* .stale-* .seen-* .babysit-* .escalated-* .last-* .heartbeat-streak watcher internals; never touch
.last-watcher-beat watcher liveness beacon, touched every poll; fm-guard.sh reads it
.subsuper-* .supervise-daemon.* sub-supervisor internals (stale markers, escalation buffer, inject-wedged marker, seen-status dedup, log, lock, pid); never touch
.no-mistakes/ local validation state and evidence; gitignored
Expand Down Expand Up @@ -439,7 +439,7 @@ Use chat for yes/no decisions; use lavish-axi when there are multiple findings o
For PR-based ship tasks, the ready signal depends on mode: `no-mistakes` reports `done: PR <url> checks green` after CI is green, while `direct-PR` reports `done: PR <url>` after opening the PR.
Run `bin/fm-pr-check.sh <id> <PR url>` - it records `pr=` in the task's meta and arms the watcher's merge poll.
Tell the captain: the PR's full URL (always the complete `https://...` link, never a bare `#number` - the captain's terminal makes a full URL clickable), a one-paragraph summary, and, for `no-mistakes`, the risk level it emitted.
(The check contract, for any custom `state/<id>.check.sh` you write yourself: print one line only when firstmate should wake, print nothing otherwise, and finish before `FM_CHECK_TIMEOUT`.)
(The check contract, for any custom `state/<id>.check.sh` you write yourself: prefer printing the current terminal state every run, such as `echo "merged"` while merged. The watcher dedups repeated output with `.seen-check-<name>` and enqueues to the durable queue before advancing that marker, so a lost stdout or crashed watcher cannot swallow a wake. Edge-triggered checks that self-suppress with `.babysit-*.seen` still work, but a swallowed terminal transition then relies on the watcher catch-all backstop. Finish before `FM_CHECK_TIMEOUT`.)

If the captain says "merge it", run `gh-axi pr merge` yourself; that instruction is the explicit approval. If `yolo=on`, merge a green/approved PR yourself and post the required FYI.

Expand Down Expand Up @@ -485,7 +485,7 @@ From there the task is an ordinary ship task through its mode-specific validatio
The watcher is the backbone.
Whenever at least one task is in flight, `bin/fm-watch.sh` must be running as a background task.
It costs zero tokens while running and exits with one reason line when something needs you.
It also writes each detected wake to the durable queue at `state/.wake-queue` before advancing suppression markers such as `.seen-*`, `.stale-*`, `.last-check`, or `.last-heartbeat`.
It also writes each detected wake to the durable queue at `state/.wake-queue` before advancing suppression markers such as `.seen-*`, `.stale-*`, `.seen-check-*`, `.escalated-*`, `.last-check`, or `.last-heartbeat`.
At the start of every wake-handling turn and every recovery turn, run `bin/fm-wake-drain.sh` before peeking panes, reading status files beyond the reason line, or starting new work.
The printed one-shot reason line is still useful, but the drained queue is the lossless backlog.
After handling drained wakes, re-arm `bin/fm-watch.sh` before you end the turn.
Expand Down
7 changes: 4 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -116,6 +116,7 @@ firstmate works from any terminal - outside tmux, crewmates land in a detached `

- **Event-driven supervision** - a zero-token bash watcher (`bin/fm-watch.sh`) sleeps on the fleet and wakes the first mate only when a crewmate reports, stalls, a PR merges, or an internal heartbeat review is due.
Detected wakes are also written to a durable local queue (`state/.wake-queue`) before detector state advances, so a missed one-shot process exit can be recovered by draining the queue.
Custom slow checks should print their current terminal state idempotently; the watcher dedups repeated output and keeps a catch-all backstop for legacy self-suppressing `.babysit-*.seen` checks.
Routine watcher polling, restarts, elapsed waiting time, and unchanged heartbeat reviews stay silent; an idle crew costs you nothing.
A pull-based guard (`bin/fm-guard.sh`) warns through supervision tool output if tasks are in flight and that watcher stops running or queued wakes are waiting to be drained.
A presence-gated sub-supervisor (`bin/fm-supervise-daemon.sh`) extends this for walk-away supervision: the `/afk` skill activates it, after which it self-handles routine wakes in bash and escalates only captain-relevant events as one batched, single-line digest (prefixed with an in-band sentinel marker so firstmate can tell daemon injections apart from real messages).
Expand Down Expand Up @@ -229,20 +230,20 @@ Tracked changes to firstmate itself, including `AGENTS.md`, `README.md`, `CONTRI
When supervising live crewmates, keep long validation or build work in the background so watcher wakes can still be handled.
Human-authored pull requests targeting `main` must be raised through `git push no-mistakes`; see `CONTRIBUTING.md` for the enforced contributor workflow.
Local `.no-mistakes/` state and test evidence stay out of this repo; `.no-mistakes.yaml` keeps evidence in a temp directory instead.
The current watcher reliability work keeps the one-shot process model and adds a durable queue plus singleton lock.
The current watcher reliability work keeps the one-shot process model and adds a durable queue, singleton lock, and lossless check-output dedup.
The presence-gated sub-supervisor (`bin/fm-supervise-daemon.sh`) provides proactive wake routing for walk-away supervision via the `/afk` skill; a blocking-waiter split remains a deferred follow-up phase.

```sh
bash -n bin/*.sh # syntax-check the toolbelt
shellcheck bin/*.sh tests/*.sh # lint the toolbelt and behavior tests; CI enforces this
for test_script in tests/*.test.sh; do "$test_script"; done # behavior tests, matching CI
tests/fm-wake-queue.test.sh # durable wake queue, singleton behavior, sub-supervisor classifier, /afk presence-gating, border-aware composer, max-defer, and fm-send submit tests
tests/fm-wake-queue.test.sh # durable wake queue, lossless check dedup, singleton behavior, sub-supervisor classifier, /afk presence-gating, border-aware composer, max-defer, and fm-send submit tests
tests/fm-composer-ghost.test.sh # dim-ghost stripping, ghost-only composer detection, and escape-free peek tests
tests/fm-afk-inject-e2e.test.sh # private-socket end-to-end test of the afk injection path (partial-input deferral, swallowed-Enter retry)
tests/fm-bootstrap.test.sh # bootstrap dependency and feature-probe tests
tests/fm-update.test.sh # fast-forward-only self-update, reread, nudge, dedup, and skip-safety tests
tests/fm-secondmate.test.sh # persistent secondmate routing, seeding, idle charter, backlog handoff, spawn, recovery, teardown, and FM_HOME tests
tests/fm-teardown.test.sh # fm-teardown.sh safety and reminder checks: local-only fork-remote allow, truly-unpushed refuse, merged-to-main allow, no-mistakes regression, tasks-axi reminder, --force override
tests/fm-teardown.test.sh # fm-teardown.sh safety and reminder checks: local-only fork-remote allow, truly-unpushed refuse, merged-to-main allow, no-mistakes regression, tasks-axi reminder, check-dedup cleanup, --force override
[ "$(readlink CLAUDE.md)" = "AGENTS.md" ]
[ "$(readlink .claude/skills)" = "../.agents/skills" ]
FM_HEARTBEAT=2 FM_POLL=1 bin/fm-watch.sh # watcher smoke test (prints "heartbeat")
Expand Down
7 changes: 4 additions & 3 deletions bin/fm-pr-check.sh
Original file line number Diff line number Diff line change
@@ -1,8 +1,9 @@
#!/usr/bin/env bash
# Record a PR-ready task: appends pr=<url> to state/<id>.meta and arms the
# watcher's merge poll by writing state/<id>.check.sh, which prints one line iff
# the PR is merged (the watcher's check contract: output = wake firstmate,
# silence = keep sleeping).
# watcher's merge poll by writing state/<id>.check.sh. Once the PR is merged,
# the check prints the current terminal state every run; the watcher dedups the
# repeated output and enqueues the first delta before advancing suppression.
# Silence means "no current terminal state; keep sleeping."
# Usage: fm-pr-check.sh <task-id> <pr-url>
set -eu

Expand Down
19 changes: 18 additions & 1 deletion bin/fm-teardown.sh
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,23 @@ meta_value() {
grep "^$key=" "$meta" | cut -d= -f2- || true
}

sanitize_state_name() { printf '%s' "$1" | LC_ALL=C tr -c 'A-Za-z0-9._-' '_'; }

cleanup_task_state() {
local state=$1 id=$2 check_name sidecar_name
check_name=$(sanitize_state_name "$id.check.sh")
sidecar_name=$(sanitize_state_name ".babysit-$id.seen")
rm -f \
"$state/$id.status" \
"$state/$id.turn-ended" \
"$state/$id.check.sh" \
"$state/$id.meta" \
"$state/$id.pi-ext.ts" \
"$state/.seen-check-$check_name" \
"$state/.babysit-$id.seen" \
"$state/.escalated-$sidecar_name"
}

backlog_refresh_reminder() {
local pr done_cmd report_path
if fm_tasks_axi_compatible; then
Expand Down Expand Up @@ -479,7 +496,7 @@ if [ "$KIND" = secondmate ]; then
remove_firstmate_home "$HOME_PATH" "secondmate home" "$ID"
remove_secondmate_registry_entry "$ID"
fi
rm -f "$STATE/$ID.status" "$STATE/$ID.turn-ended" "$STATE/$ID.check.sh" "$STATE/$ID.meta" "$STATE/$ID.pi-ext.ts"
cleanup_task_state "$STATE" "$ID"
if [ "$KIND" != scout ] && [ "$KIND" != secondmate ] && [ "$MODE" != local-only ]; then
"$FM_ROOT/bin/fm-fleet-sync.sh" "$PROJ" || true
fi
Expand Down
61 changes: 59 additions & 2 deletions bin/fm-watch.sh
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,9 @@
# signal: <file>... a crewmate wrote a status line or a turn-end hook fired; signals
# landing within FM_SIGNAL_GRACE of each other coalesce into one wake
# stale: <window> a crewmate pane stopped changing and shows no busy signature
# check: <script>: <out> a per-task check script (e.g. merged-PR poll) produced output
# check: <script>: <out> a per-task check produced output (deduped by the watcher;
# enqueued before suppression so the wake is lossless), or the
# catch-all force-escalated a swallowed terminal transition
# heartbeat fleet review due; starts at FM_HEARTBEAT and backs off to FM_HEARTBEAT_MAX
# Run as a background task. Re-arm it after handling each wake; duplicate
# invocations no-op through the watcher singleton lock.
Expand Down Expand Up @@ -101,6 +103,29 @@ recorded_windows() {
done
}

# Collapse a check/sidecar basename into a safe suffix for watcher-side state
# files (.seen-check-*, .escalated-*). LC_ALL=C keeps the mapping byte-stable.
sanitize_name() { printf '%s' "$1" | LC_ALL=C tr -c 'A-Za-z0-9._-' '_'; }

mark_matching_babysit_escalated() {
local c=$1 base id sidecar terminal state ec
base=$(basename "$c")
case "$base" in
*.check.sh) id=${base%.check.sh} ;;
*) return 0 ;;
esac
sidecar="$STATE/.babysit-$id.seen"
[ -e "$sidecar" ] || return 0
terminal=$(cat "$sidecar" 2>/dev/null || true)
state=${terminal%%|*}
case "$state" in
MERGED|CLOSED) ;;
*) return 0 ;;
esac
ec="$STATE/.escalated-$(sanitize_name "$(basename "$sidecar")")"
printf '%s' "$terminal" > "$ec"
}

# Exit reporting a wake. Consecutive heartbeats with no other wake in between
# mean an idle fleet, so the heartbeat interval backs off exponentially
# (base * 2^streak, capped at HEARTBEAT_MAX); any real wake resets the cadence.
Expand Down Expand Up @@ -168,17 +193,49 @@ while :; do
# keeps producing signals - the slow poll (e.g. merge detection) would then
# never run until the fleet went quiet. Checks are due only every
# CHECK_INTERVAL, so most cycles skip this block and fall straight through.
#
# Lossless check wakes: checks used to be the sole wake source whose
# suppression could live inside an opaque script. If an edge-triggered check
# advanced its own .babysit-*.seen marker before stdout became a delivered
# wake, a timeout, crash, or concurrent run could permanently swallow the
# transition. Suppression now lives in the watcher for stateless checks:
# enqueue first, then advance .seen-check-<name>. Old self-suppressing checks
# still work, and the catch-all below force-escalates terminal sidecars.
if [ "$(age_of "$STATE/.last-check")" -ge "$CHECK_INTERVAL" ]; then
for c in "$STATE"/*.check.sh; do
[ -e "$c" ] || continue
out=$(run_check "$c")
if [ -n "$out" ]; then
[ -n "$out" ] || continue
sf="$STATE/.seen-check-$(sanitize_name "$(basename "$c")")"
if [ "$out" != "$(cat "$sf" 2>/dev/null || true)" ]; then
reason="check: $c: $out"
fm_wake_append check "$c" "$reason" || exit 1
# Test-only hook: prove the wake is durable by simulating a crash
# between enqueue and suppress. Never set outside the test suite.
[ -n "${FM_WATCH_BREAK_AFTER_CHECK_ENQUEUE:-}" ] && exit 99
mark_matching_babysit_escalated "$c"
printf '%s' "$out" > "$sf"
touch "$STATE/.last-check"
wake "$reason"
fi
done

for sf in "$STATE"/.babysit-*.seen; do
[ -e "$sf" ] || continue
terminal=$(cat "$sf" 2>/dev/null || true)
state=${terminal%%|*}
case "$state" in
MERGED|CLOSED) ;;
*) continue ;;
esac
ec="$STATE/.escalated-$(sanitize_name "$(basename "$sf")")"
[ "$terminal" = "$(cat "$ec" 2>/dev/null || true)" ] && continue
reason="check: catch-all: $sf: $state"
fm_wake_append check "$sf" "$reason" || exit 1
printf '%s' "$terminal" > "$ec"
touch "$STATE/.last-check"
wake "$reason"
done
touch "$STATE/.last-check"
fi

Expand Down
26 changes: 26 additions & 0 deletions tests/fm-teardown.test.sh
Original file line number Diff line number Diff line change
Expand Up @@ -273,10 +273,36 @@ test_local_only_force_overrides_unpushed() {
pass "local-only worktree with unpushed work is torn down under --force (escape hatch)"
}

test_teardown_removes_check_dedup_state() {
local case_dir rc
case_dir=$(make_case check-dedup-cleanup)
write_meta "$case_dir" no-mistakes ship
wt_commit "$case_dir" "shippable work"
git -C "$case_dir/wt" push -q origin fm/task-x1
git -C "$case_dir/project" fetch -q origin
printf '#!/usr/bin/env bash\nprintf merged\\n\n' > "$case_dir/state/task-x1.check.sh"
printf 'merged\n' > "$case_dir/state/.seen-check-task-x1.check.sh"
printf 'MERGED|https://example.test/pr/1\n' > "$case_dir/state/.babysit-task-x1.seen"
printf 'MERGED|https://example.test/pr/1\n' > "$case_dir/state/.escalated-.babysit-task-x1.seen"

set +e
run_teardown "$case_dir" > "$case_dir/stdout" 2> "$case_dir/stderr"
rc=$?
set -e

expect_code 0 "$rc" "check-cleanup: teardown should succeed"
[ ! -e "$case_dir/state/task-x1.check.sh" ] || fail "check-cleanup: check script remained"
[ ! -e "$case_dir/state/.seen-check-task-x1.check.sh" ] || fail "check-cleanup: seen-check marker remained"
[ ! -e "$case_dir/state/.babysit-task-x1.seen" ] || fail "check-cleanup: babysit sidecar remained"
[ ! -e "$case_dir/state/.escalated-.babysit-task-x1.seen" ] || fail "check-cleanup: escalated marker remained"
pass "teardown removes watcher check dedup state"
}

test_local_only_fork_remote_allows
test_teardown_prompts_tasks_axi_done_when_compatible
test_local_only_truly_unpushed_refuses
test_local_only_merged_to_local_main_allows
test_no_mistakes_origin_remote_allows
test_no_mistakes_truly_unpushed_refuses
test_local_only_force_overrides_unpushed
test_teardown_removes_check_dedup_state
Loading