Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -85,7 +85,7 @@ state/ volatile runtime signals; gitignored
.wake-queue durable queued wakes: epoch<TAB>seq<TAB>kind<TAB>key<TAB>payload
.afk durable away-mode flag; present = sub-supervisor may inject escalations (set by /afk, cleared on user return)
.watch.lock .wake-queue.lock watcher singleton and queue serialization locks
.hash-* .count-* .stale-* .seen-* .last-* .heartbeat-streak watcher internals; never touch
.hash-* .count-* .stale-* .seen-* .babysit-* .escalated-* .last-* .heartbeat-streak watcher internals; never touch
.last-watcher-beat watcher liveness beacon, touched every poll; fm-guard.sh reads it
.subsuper-* .supervise-daemon.* sub-supervisor internals (stale markers, escalation buffer, seen-status dedup, log, lock, pid); never touch
.no-mistakes/ local validation state and evidence; gitignored
Expand Down Expand Up @@ -425,7 +425,7 @@ Use chat for yes/no decisions; use lavish-axi when there are multiple findings o
For PR-based ship tasks, the ready signal depends on mode: `no-mistakes` reports `done: PR <url> checks green` after CI is green, while `direct-PR` reports `done: PR <url>` after opening the PR.
Run `bin/fm-pr-check.sh <id> <PR url>` - it records `pr=` in the task's meta and arms the watcher's merge poll.
Tell the captain: the PR's full URL (always the complete `https://...` link, never a bare `#number` - the captain's terminal makes a full URL clickable), a one-paragraph summary, and, for `no-mistakes`, the risk level it emitted.
(The check contract, for any custom `state/<id>.check.sh` you write yourself: print one line only when firstmate should wake, print nothing otherwise, and finish before `FM_CHECK_TIMEOUT`.)
(The check contract, for any custom `state/<id>.check.sh` you write yourself: **print the current state every run** (idempotent), e.g. `echo "merged"` while merged. The watcher dedups against `.seen-check-<name>` and enqueues to the durable queue *before* advancing that marker, so a lost stdout or a crashed watcher can never swallow a wake - the same lossless guarantee signals enjoy. Edge-triggered checks that self-suppress via their own `.babysit-*.seen` are tolerated (empty stdout = no wake), but a swallowed transition is only recovered by the watcher's catch-all backstop, so prefer the stateless "always print current state" form. Finish before `FM_CHECK_TIMEOUT`.)

If the captain says "merge it", run `gh-axi pr merge` yourself; that instruction is the explicit approval. If `yolo=on`, merge a green/approved PR yourself and post the required FYI.

Expand Down Expand Up @@ -470,7 +470,7 @@ From there the task is an ordinary ship task through its mode-specific validatio
The watcher is the backbone.
Whenever at least one task is in flight, `bin/fm-watch.sh` must be running as a background task.
It costs zero tokens while running and exits with one reason line when something needs you.
It also writes each detected wake to the durable queue at `state/.wake-queue` before advancing suppression markers such as `.seen-*`, `.stale-*`, `.last-check`, or `.last-heartbeat`.
It also writes each detected wake to the durable queue at `state/.wake-queue` before advancing suppression markers such as `.seen-*`, `.stale-*`, `.seen-check-*`, `.escalated-*`, `.last-check`, or `.last-heartbeat`.
At the start of every wake-handling turn and every recovery turn, run `bin/fm-wake-drain.sh` before peeking panes, reading status files beyond the reason line, or starting new work.
The printed one-shot reason line is still useful, but the drained queue is the lossless backlog.
After handling drained wakes, re-arm `bin/fm-watch.sh` before you end the turn.
Expand Down
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -115,6 +115,7 @@ firstmate works from any terminal - outside tmux, crewmates land in a detached `

- **Event-driven supervision** - a zero-token bash watcher (`bin/fm-watch.sh`) sleeps on the fleet and wakes the first mate only when a crewmate reports, stalls, a PR merges, or an internal heartbeat review is due.
Detected wakes are also written to a durable local queue (`state/.wake-queue`) before detector state advances, so a missed one-shot process exit can be recovered by draining the queue.
Custom slow checks should print their current state idempotently; the watcher dedups repeated output and keeps a catch-all backstop for legacy self-suppressing `.babysit-*.seen` checks.
Routine watcher polling, restarts, elapsed waiting time, and unchanged heartbeat reviews stay silent; an idle crew costs you nothing.
A pull-based guard (`bin/fm-guard.sh`) warns through supervision tool output if tasks are in flight and that watcher stops running or queued wakes are waiting to be drained.
A presence-gated sub-supervisor (`bin/fm-supervise-daemon.sh`) extends this for walk-away supervision: the `/afk` skill activates it, after which it self-handles routine wakes in bash and escalates only captain-relevant events as one batched, single-line digest (prefixed with an in-band sentinel marker so firstmate can tell daemon injections apart from real messages).
Expand Down
7 changes: 4 additions & 3 deletions bin/fm-pr-check.sh
Original file line number Diff line number Diff line change
@@ -1,8 +1,9 @@
#!/usr/bin/env bash
# Record a PR-ready task: appends pr=<url> to state/<id>.meta and arms the
# watcher's merge poll by writing state/<id>.check.sh, which prints one line iff
# the PR is merged (the watcher's check contract: output = wake firstmate,
# silence = keep sleeping).
# watcher's merge poll by writing state/<id>.check.sh. Once the PR is merged,
# the check prints the current terminal state every run; the watcher dedups the
# repeated output and enqueues the first delta before advancing suppression.
# Silence means "no current terminal state; keep sleeping."
# Usage: fm-pr-check.sh <task-id> <pr-url>
set -eu

Expand Down
21 changes: 19 additions & 2 deletions bin/fm-teardown.sh
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,23 @@ meta_value() {
grep "^$key=" "$meta" | cut -d= -f2- || true
}

sanitize_state_name() { printf '%s' "$1" | LC_ALL=C tr -c 'A-Za-z0-9._-' '_'; }

cleanup_task_state() {
local state=$1 id=$2 check_name sidecar_name
check_name=$(sanitize_state_name "$id.check.sh")
sidecar_name=$(sanitize_state_name ".babysit-$id.seen")
rm -f \
"$state/$id.status" \
"$state/$id.turn-ended" \
"$state/$id.check.sh" \
"$state/$id.meta" \
"$state/$id.pi-ext.ts" \
"$state/.seen-check-$check_name" \
"$state/.babysit-$id.seen" \
"$state/.escalated-$sidecar_name"
}

registry_home_for_line() {
sed -n 's/^[^(]*(home: \([^;)]*\);.*/\1/p'
}
Expand Down Expand Up @@ -346,7 +363,7 @@ cleanup_firstmate_home_children() {
safe_rm_rf_child_worktree "$child_wt" "$child_proj"
fi
fi
rm -f "$sub_state/$child_id.status" "$sub_state/$child_id.turn-ended" "$sub_state/$child_id.check.sh" "$sub_state/$child_id.meta" "$sub_state/$child_id.pi-ext.ts"
cleanup_task_state "$sub_state" "$child_id"
done
}

Expand Down Expand Up @@ -446,7 +463,7 @@ if [ "$KIND" = secondmate ]; then
remove_firstmate_home "$HOME_PATH" "secondmate home" "$ID"
remove_secondmate_registry_entry "$ID"
fi
rm -f "$STATE/$ID.status" "$STATE/$ID.turn-ended" "$STATE/$ID.check.sh" "$STATE/$ID.meta" "$STATE/$ID.pi-ext.ts"
cleanup_task_state "$STATE" "$ID"
if [ "$KIND" != scout ] && [ "$KIND" != secondmate ] && [ "$MODE" != local-only ]; then
"$FM_ROOT/bin/fm-fleet-sync.sh" "$PROJ" || true
fi
Expand Down
58 changes: 56 additions & 2 deletions bin/fm-watch.sh
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,9 @@
# signal: <file>... a crewmate wrote a status line or a turn-end hook fired; signals
# landing within FM_SIGNAL_GRACE of each other coalesce into one wake
# stale: <window> a crewmate pane stopped changing and shows no busy signature
# check: <script>: <out> a per-task check script (e.g. merged-PR poll) produced output
# check: <script>: <out> a per-task check produced output (deduped by the watcher;
# enqueued before suppression so the wake is lossless), or the
# catch-all force-escalated a swallowed terminal transition
# heartbeat fleet review due; starts at FM_HEARTBEAT and backs off to FM_HEARTBEAT_MAX
# Run as a background task. Re-arm it after handling each wake; duplicate
# invocations no-op through the watcher singleton lock.
Expand Down Expand Up @@ -101,6 +103,10 @@ recorded_windows() {
done
}

# Collapse a check/sidecar basename into a safe suffix for watcher-side state
# files (.seen-check-*, .escalated-*). LC_ALL=C so the complement is byte-stable.
sanitize_name() { printf '%s' "$1" | LC_ALL=C tr -c 'A-Za-z0-9._-' '_'; }

# Exit reporting a wake. Consecutive heartbeats with no other wake in between
# mean an idle fleet, so the heartbeat interval backs off exponentially
# (base * 2^streak, capped at HEARTBEAT_MAX); any real wake resets the cadence.
Expand Down Expand Up @@ -168,17 +174,65 @@ while :; do
# keeps producing signals - the slow poll (e.g. merge detection) would then
# never run until the fleet went quiet. Checks are due only every
# CHECK_INTERVAL, so most cycles skip this block and fall straight through.
#
# LOSSLESS CHECK WAKES (the #29 invariant, ported to checks). Checks were the
# sole wake source whose suppression lived inside an opaque script: an
# edge-triggered check advanced its own .babysit-*.seen marker BEFORE the
# print could become a wake, so a lost stdout (timeout / concurrent run /
# crash) permanently swallowed the transition - the root cause of the missed
# PR #3095 merge. Suppression now lives HERE, in the watcher, with
# enqueue-before-suppress - exactly the pattern scan_signals uses:
# * the check always prints its current state (idempotent); the watcher
# dedups against .seen-check-<name> and only wakes on a delta;
# * fm_wake_append (durable queue) happens BEFORE the .seen-check marker
# advances, so a crash between detect and suppress leaves the wake in the
# queue (recovered next turn) and the marker un-advanced (re-detected
# next cycle). A lost check wake is now impossible.
# Backward-compatible with old edge-triggered checks: empty stdout never
# produces a wake, so they keep their quiet behavior. Any transition they
# swallow is caught by the catch-all scan at the end of this block.
if [ "$(age_of "$STATE/.last-check")" -ge "$CHECK_INTERVAL" ]; then
for c in "$STATE"/*.check.sh; do
[ -e "$c" ] || continue
out=$(run_check "$c")
if [ -n "$out" ]; then
[ -n "$out" ] || continue
sf="$STATE/.seen-check-$(sanitize_name "$(basename "$c")")"
if [ "$out" != "$(cat "$sf" 2>/dev/null || true)" ]; then
reason="check: $c: $out"
fm_wake_append check "$c" "$reason" || exit 1
# Test-only hook: prove the wake is durable by simulating a crash
# between enqueue and suppress. Never set outside the test suite.
[ -n "${FM_WATCH_BREAK_AFTER_CHECK_ENQUEUE:-}" ] && exit 99
printf '%s' "$out" > "$sf"
touch "$STATE/.last-check"
wake "$reason"
fi
done

# Catch-all backstop: force-escalate any edge-triggered check whose own
# .babysit-*.seen sidecar shows a terminal state the watcher never
# delivered a wake for. This catches a swallowed transition (sidecar
# advanced inside the script but stdout lost) within one sweep - the
# belt-and-suspenders safety net for checks that have not migrated to the
# lossless "always print current state" contract. Deduped via
# .escalated-<sidecar> so each terminal transition fires at most once.
for sf in "$STATE"/.babysit-*.seen; do
[ -e "$sf" ] || continue
terminal=$(cat "$sf" 2>/dev/null || true)
state=${terminal%%|*}
case "$state" in
MERGED|CLOSED) ;;
*) continue ;;
esac
ec="$STATE/.escalated-$(sanitize_name "$(basename "$sf")")"
[ "$terminal" = "$(cat "$ec" 2>/dev/null || true)" ] && continue
reason="check: catch-all: $sf: $state"
fm_wake_append check "$sf" "$reason" || exit 1
printf '%s' "$terminal" > "$ec"
touch "$STATE/.last-check"
wake "$reason"
done

touch "$STATE/.last-check"
fi

Expand Down
26 changes: 26 additions & 0 deletions tests/fm-teardown.test.sh
Original file line number Diff line number Diff line change
Expand Up @@ -240,9 +240,35 @@ test_local_only_force_overrides_unpushed() {
pass "local-only worktree with unpushed work is torn down under --force (escape hatch)"
}

test_teardown_removes_check_dedup_state() {
local case_dir rc
case_dir=$(make_case check-dedup-cleanup)
write_meta "$case_dir" no-mistakes ship
wt_commit "$case_dir" "shippable work"
git -C "$case_dir/wt" push -q origin fm/task-x1
git -C "$case_dir/project" fetch -q origin
printf '#!/usr/bin/env bash\nprintf merged\\n\n' > "$case_dir/state/task-x1.check.sh"
printf 'merged\n' > "$case_dir/state/.seen-check-task-x1.check.sh"
printf 'MERGED|https://example.test/pr/1\n' > "$case_dir/state/.babysit-task-x1.seen"
printf 'MERGED|https://example.test/pr/1\n' > "$case_dir/state/.escalated-.babysit-task-x1.seen"

set +e
run_teardown "$case_dir" > "$case_dir/stdout" 2> "$case_dir/stderr"
rc=$?
set -e

expect_code 0 "$rc" "check-cleanup: teardown should succeed"
[ ! -e "$case_dir/state/task-x1.check.sh" ] || fail "check-cleanup: check script remained"
[ ! -e "$case_dir/state/.seen-check-task-x1.check.sh" ] || fail "check-cleanup: seen-check marker remained"
[ ! -e "$case_dir/state/.babysit-task-x1.seen" ] || fail "check-cleanup: babysit sidecar remained"
[ ! -e "$case_dir/state/.escalated-.babysit-task-x1.seen" ] || fail "check-cleanup: escalated marker remained"
pass "teardown removes watcher check dedup state"
}

test_local_only_fork_remote_allows
test_local_only_truly_unpushed_refuses
test_local_only_merged_to_local_main_allows
test_no_mistakes_origin_remote_allows
test_no_mistakes_truly_unpushed_refuses
test_local_only_force_overrides_unpushed
test_teardown_removes_check_dedup_state
Loading