Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -355,7 +355,7 @@ Use chat for yes/no decisions; use lavish-axi when there are multiple findings o
For PR-based ship tasks, the ready signal depends on mode: `no-mistakes` reports `done: PR <url> checks green` after CI is green, while `direct-PR` reports `done: PR <url>` after opening the PR.
Run `bin/fm-pr-check.sh <id> <PR url>` - it records `pr=` in the task's meta and arms the watcher's merge poll.
Tell the captain: the PR's full URL (always the complete `https://...` link, never a bare `#number` - the captain's terminal makes a full URL clickable), a one-paragraph summary, and, for `no-mistakes`, the risk level it emitted.
(The check contract, for any custom `state/<id>.check.sh` you write yourself: print one line only when firstmate should wake, print nothing otherwise, and finish before `FM_CHECK_TIMEOUT`.)
(The check contract, for any custom `state/<id>.check.sh` you write yourself: **print the current state every run** (idempotent), e.g. `echo "merged"` while merged. The watcher dedups against `.seen-check-<name>` and enqueues to the durable queue *before* advancing that marker, so a lost stdout or a crashed watcher can never swallow a wake - the same lossless guarantee signals enjoy. Edge-triggered checks that self-suppress via their own `.babysit-*.seen` are tolerated (empty stdout = no wake), but a swallowed transition is only recovered by the watcher's catch-all backstop, so prefer the stateless "always print current state" form. Finish before `FM_CHECK_TIMEOUT`.)

If the captain says "merge it", run `gh-axi pr merge` yourself; that instruction is the explicit approval. If `yolo=on`, merge a green/approved PR yourself and post the required FYI.

Expand Down Expand Up @@ -389,7 +389,7 @@ From there the task is an ordinary ship task through its mode-specific validatio
The watcher is the backbone.
Whenever at least one task is in flight, `bin/fm-watch.sh` must be running as a background task.
It costs zero tokens while running and exits with one reason line when something needs you.
It also writes each detected wake to the durable queue at `state/.wake-queue` before advancing suppression markers such as `.seen-*`, `.stale-*`, `.last-check`, or `.last-heartbeat`.
It also writes each detected wake to the durable queue at `state/.wake-queue` before advancing suppression markers such as `.seen-*`, `.stale-*`, `.seen-check-*`, `.escalated-*`, `.last-check`, or `.last-heartbeat`.
At the start of every wake-handling turn and every recovery turn, run `bin/fm-wake-drain.sh` before peeking panes, reading status files beyond the reason line, or starting new work.
The printed one-shot reason line is still useful, but the drained queue is the lossless backlog.
After handling drained wakes, re-arm `bin/fm-watch.sh` before you end the turn.
Expand Down
58 changes: 56 additions & 2 deletions bin/fm-watch.sh
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,9 @@
# signal: <file>... a crewmate wrote a status line or a turn-end hook fired; signals
# landing within FM_SIGNAL_GRACE of each other coalesce into one wake
# stale: <window> a crewmate pane stopped changing and shows no busy signature
# check: <script>: <out> a per-task check script (e.g. merged-PR poll) produced output
# check: <script>: <out> a per-task check produced output (deduped by the watcher;
# enqueued before suppression so the wake is lossless), or the
# catch-all force-escalated a swallowed terminal transition
# heartbeat fleet review due; starts at FM_HEARTBEAT and backs off to FM_HEARTBEAT_MAX
# Run as a background task. Re-arm it after handling each wake; duplicate
# invocations no-op through the watcher singleton lock.
Expand Down Expand Up @@ -72,6 +74,10 @@ hash_pane() {
if command -v md5 >/dev/null 2>&1; then md5 -q; else md5sum | cut -d' ' -f1; fi
}

# Collapse a check/sidecar basename into a safe suffix for watcher-side state
# files (.seen-check-*, .escalated-*). LC_ALL=C so the complement is byte-stable.
sanitize_name() { printf '%s' "$1" | LC_ALL=C tr -c 'A-Za-z0-9._-' '_'; }

# Exit reporting a wake. Consecutive heartbeats with no other wake in between
# mean an idle fleet, so the heartbeat interval backs off exponentially
# (base * 2^streak, capped at HEARTBEAT_MAX); any real wake resets the cadence.
Expand Down Expand Up @@ -139,17 +145,65 @@ while :; do
# keeps producing signals - the slow poll (e.g. merge detection) would then
# never run until the fleet went quiet. Checks are due only every
# CHECK_INTERVAL, so most cycles skip this block and fall straight through.
#
# LOSSLESS CHECK WAKES (the #29 invariant, ported to checks). Checks were the
# sole wake source whose suppression lived inside an opaque script: an
# edge-triggered check advanced its own .babysit-*.seen marker BEFORE the
# print could become a wake, so a lost stdout (timeout / concurrent run /
# crash) permanently swallowed the transition - the root cause of the missed
# PR #3095 merge. Suppression now lives HERE, in the watcher, with
# enqueue-before-suppress - exactly the pattern scan_signals uses:
# * the check always prints its current state (idempotent); the watcher
# dedups against .seen-check-<name> and only wakes on a delta;
# * fm_wake_append (durable queue) happens BEFORE the .seen-check marker
# advances, so a crash between detect and suppress leaves the wake in the
# queue (recovered next turn) and the marker un-advanced (re-detected
# next cycle). A lost check wake is now impossible.
# Backward-compatible with old edge-triggered checks: empty stdout never
# produces a wake, so they keep their quiet behavior. Any transition they
# swallow is caught by the catch-all scan at the end of this block.
if [ "$(age_of "$STATE/.last-check")" -ge "$CHECK_INTERVAL" ]; then
for c in "$STATE"/*.check.sh; do
[ -e "$c" ] || continue
out=$(run_check "$c")
if [ -n "$out" ]; then
[ -n "$out" ] || continue
sf="$STATE/.seen-check-$(sanitize_name "$(basename "$c")")"
if [ "$out" != "$(cat "$sf" 2>/dev/null || true)" ]; then
reason="check: $c: $out"
fm_wake_append check "$c" "$reason" || exit 1
# Test-only hook: prove the wake is durable by simulating a crash
# between enqueue and suppress. Never set outside the test suite.
[ -n "${FM_WATCH_BREAK_AFTER_CHECK_ENQUEUE:-}" ] && exit 99
printf '%s' "$out" > "$sf"
touch "$STATE/.last-check"
wake "$reason"
fi
done

# Catch-all backstop: force-escalate any edge-triggered check whose own
# .babysit-*.seen sidecar shows a terminal state the watcher never
# delivered a wake for. This catches a swallowed transition (sidecar
# advanced inside the script but stdout lost) within one sweep - the
# belt-and-suspenders safety net for checks that have not migrated to the
# lossless "always print current state" contract. Deduped via
# .escalated-<sidecar> so each terminal transition fires at most once.
for sf in "$STATE"/.babysit-*.seen; do
[ -e "$sf" ] || continue
terminal=$(cat "$sf" 2>/dev/null || true)
state=${terminal%%|*}
case "$state" in
MERGED|CLOSED) ;;
*) continue ;;
esac
ec="$STATE/.escalated-$(sanitize_name "$(basename "$sf")")"
[ "$terminal" = "$(cat "$ec" 2>/dev/null || true)" ] && continue
reason="check: catch-all: $sf: $state"
fm_wake_append check "$sf" "$reason" || exit 1
printf '%s' "$terminal" > "$ec"
touch "$STATE/.last-check"
wake "$reason"
done

touch "$STATE/.last-check"
fi

Expand Down
123 changes: 123 additions & 0 deletions tests/fm-wake-queue.test.sh
Original file line number Diff line number Diff line change
Expand Up @@ -193,6 +193,126 @@ SH
pass "check output is queued before cadence suppression"
}

# Regression for the missed PR #3095 merge: a check wake must survive a
# crash/race between enqueue and suppress. The watcher enqueues to the durable
# queue BEFORE advancing the .seen-check marker, so even if the marker is never
# advanced (simulated here via FM_WATCH_BREAK_AFTER_CHECK_ENQUEUE), the wake is
# already in the queue and is re-detected next cycle.
test_check_wake_survives_lost_delivery() {
local dir state fakebin out drain_out check_file seen_check
dir=$(make_case check-lossless)
state="$dir/state"
fakebin="$dir/fakebin"
out="$dir/watch.out"
drain_out="$dir/drain.out"
check_file="$state/pr-3095.check.sh"
seen_check="$state/.seen-check-pr-3095.check.sh"
cat > "$check_file" <<'SH'
#!/usr/bin/env bash
printf 'merged: https://example.test/pr/3095\n'
SH
chmod +x "$check_file"
# Cycle 1: enqueue, then simulate a crash before the suppression marker
# advances. The wake is already durable; the marker is still absent.
PATH="$fakebin:$PATH" FM_WATCH_BREAK_AFTER_CHECK_ENQUEUE=1 FM_STATE_OVERRIDE="$state" FM_POLL=1 FM_SIGNAL_GRACE=1 FM_CHECK_INTERVAL=0 FM_HEARTBEAT=999999 "$WATCH" > "$out" 2>/dev/null &
wait_for_exit "$!" 40
local crash_status=$?
[ "$crash_status" -eq 99 ] || fail "crashing watcher did not simulate the post-enqueue crash (exit $crash_status)"
[ ! -e "$seen_check" ] || fail "suppression marker advanced before the wake was durable (enqueue-before-suppress broken)"
# The wake survived the crash in the durable queue.
FM_STATE_OVERRIDE="$state" "$DRAIN" > "$drain_out" || fail "drain after crash failed"
grep "$(printf '\tcheck\t')" "$drain_out" | grep -F "$check_file" | grep -F 'pr/3095' >/dev/null || fail "check wake was lost when delivery failed (the #3095 regression)"
# Cycle 2: re-run normally. The un-advanced marker means the delta is
# re-detected, enqueued again (deduped downstream), and now suppressed.
: > "$out"
PATH="$fakebin:$PATH" FM_STATE_OVERRIDE="$state" FM_POLL=1 FM_SIGNAL_GRACE=1 FM_CHECK_INTERVAL=0 FM_HEARTBEAT=999999 "$WATCH" > "$out" &
wait_for_exit "$!" 40 || fail "watcher did not exit on re-detection"
grep -F "check: $check_file: merged: https://example.test/pr/3095" "$out" >/dev/null || fail "watcher did not re-detect the delta after the crash"
[ -e "$seen_check" ] || fail "suppression marker was not advanced after a clean delivery"
pass "check wake is enqueued before suppression so a lost delivery is recovered"
}

# Dedup: a stateless check that always prints the same state must wake exactly
# once; subsequent identical output is suppressed by the watcher's .seen-check.
test_check_dedup_suppresses_repeats() {
local dir state fakebin out drain_out check_file count
dir=$(make_case check-dedup)
state="$dir/state"
fakebin="$dir/fakebin"
out="$dir/watch.out"
drain_out="$dir/drain.out"
check_file="$state/steady.check.sh"
cat > "$check_file" <<'SH'
#!/usr/bin/env bash
printf 'merged\n'
SH
chmod +x "$check_file"
# Cycle 1: empty -> "merged" is a delta -> wake.
PATH="$fakebin:$PATH" FM_STATE_OVERRIDE="$state" FM_POLL=1 FM_SIGNAL_GRACE=1 FM_CHECK_INTERVAL=0 FM_HEARTBEAT=999999 "$WATCH" > "$out" &
wait_for_exit "$!" 40 || fail "watcher did not exit for first check wake"
grep -Fx "check: $check_file: merged" "$out" >/dev/null || fail "watcher did not wake on first check output"
# Cycle 2: same output -> no delta -> no wake (watcher loops until the
# timeout kills it; the background poll is what lets wait_for_exit return).
: > "$out"
PATH="$fakebin:$PATH" FM_STATE_OVERRIDE="$state" FM_POLL=1 FM_SIGNAL_GRACE=1 FM_CHECK_INTERVAL=0 FM_HEARTBEAT=999999 "$WATCH" > "$out" &
local pid=$!
sleep 2
if ! kill -0 "$pid" 2>/dev/null; then
fail "watcher woke on a duplicate check output (dedup broken)"
fi
kill "$pid" 2>/dev/null || true
wait "$pid" 2>/dev/null || true
# Only the first wake should be in the queue.
FM_STATE_OVERRIDE="$state" "$DRAIN" > "$drain_out" || fail "drain failed"
count=$(grep -c "$(printf '\tcheck\t')" "$drain_out" || true)
[ "$count" -eq 1 ] || fail "expected 1 deduped check wake, got $count"
pass "watcher-side .seen-check dedup suppresses repeated identical check output"
}

# Catch-all backstop: an old edge-triggered check that swallowed its own
# transition (advanced .babysit-*.seen inside the script but lost stdout) is
# force-escalated within one sweep by the watcher's catch-all scanning the
# sidecar. This is the belt-and-suspenders safety net that would have surfaced
# PR #3095 within one scan even without the lossless primary path.
test_catch_all_escalates_swallowed_transition() {
local dir state fakebin out drain_out check_file sidecar escalated
dir=$(make_case check-catchall)
state="$dir/state"
fakebin="$dir/fakebin"
out="$dir/watch.out"
drain_out="$dir/drain.out"
check_file="$state/legacy-pr.check.sh"
sidecar="$state/.babysit-legacy-pr.seen"
escalated="$state/.escalated-.babysit-legacy-pr.seen"
# The check already self-suppressed: its sidecar says MERGED, but the script
# prints nothing (the transition's stdout was lost - exactly the #3095 mode).
printf 'MERGED|comment-a comment-b\n' > "$sidecar"
cat > "$check_file" <<'SH'
#!/usr/bin/env bash
# Edge-triggered babysit that already advanced its own .seen; prints nothing.
exit 0
SH
chmod +x "$check_file"
PATH="$fakebin:$PATH" FM_STATE_OVERRIDE="$state" FM_POLL=1 FM_SIGNAL_GRACE=1 FM_CHECK_INTERVAL=0 FM_HEARTBEAT=999999 "$WATCH" > "$out" &
wait_for_exit "$!" 40 || fail "watcher did not exit for catch-all escalation"
grep -F "catch-all" "$out" >/dev/null || fail "catch-all did not force-escalate the swallowed transition"
grep -F "MERGED" "$out" >/dev/null || fail "catch-all wake did not report the terminal state"
FM_STATE_OVERRIDE="$state" "$DRAIN" > "$drain_out" || fail "drain after catch-all failed"
grep "$(printf '\tcheck\t')" "$drain_out" | grep -F "$sidecar" >/dev/null || fail "catch-all wake was not queued"
[ -e "$escalated" ] || fail "catch-all did not record its dedup marker"
# A second sweep must NOT re-escalate (deduped by .escalated-*).
: > "$out"
PATH="$fakebin:$PATH" FM_STATE_OVERRIDE="$state" FM_POLL=1 FM_SIGNAL_GRACE=1 FM_CHECK_INTERVAL=0 FM_HEARTBEAT=999999 "$WATCH" > "$out" &
local pid=$!
sleep 2
if ! kill -0 "$pid" 2>/dev/null; then
fail "catch-all re-escalated an already-handled terminal state (dedup broken)"
fi
kill "$pid" 2>/dev/null || true
wait "$pid" 2>/dev/null || true
pass "catch-all force-escalates a swallowed terminal transition within one sweep"
}

test_singleton_start() {
local dir state fakebin out1 out2 pid1 pid2 live
dir=$(make_case singleton)
Expand Down Expand Up @@ -329,6 +449,9 @@ test_concurrent_append_and_drain
test_signal_catchup_without_running_watcher
test_stale_enqueue_before_suppressor
test_check_output_is_queued
test_check_wake_survives_lost_delivery
test_check_dedup_suppresses_repeats
test_catch_all_escalates_swallowed_transition
test_singleton_start
test_atomic_double_drain
test_drain_dedupes_obvious_duplicates
Expand Down
Loading