feat(supervision): GitHub events watcher (comments/CI/reviews) with CLI filter knobs #38
e-jung wants to merge 16 commits into
Conversation
Address review findings from the no-mistakes pipeline:
- Emit events per-PR (print, then advance that PR's seen marker) instead of buffering everything until the end: a watcher 30s timeout now surfaces partial progress rather than killing the poll with zero output every cycle.
- atomic_write (temp + rename) so a read-only or crashing state dir never leaves a partial seen file; bash's builtin printf write()s immediately, keeping the per-PR ordering lossless even under SIGKILL.
- Fix open_basenames membership (space-padded) so the last open PR is skipped in detect_left_open instead of always being re-checked.
- usage() stops before set -u so --help no longer leaks code.
- Add review-detection and ci-detection tests.
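The space-padded membership fix can be illustrated with a minimal sketch. The function name `is_open_basename` and the sample basenames are hypothetical; only the padding idea comes from the commit message. Padding both the list and the needle with spaces lets the last entry match the same way as the others.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: membership test over a space-separated list.
# Padding both sides with spaces makes first, middle, and last entries
# match identically, and prevents partial-name matches.

# is_open_basename "<space-separated basenames>" "<candidate>"
is_open_basename() {
  case " $1 " in
    *" $2 "*) return 0 ;;   # candidate is a whole entry in the list
    *) return 1 ;;          # absent, or only a substring of an entry
  esac
}

open_basenames="owner-repo-1 owner-repo-2 owner-repo-30"

if is_open_basename "$open_basenames" "owner-repo-30"; then
  echo "owner-repo-30 is still open: skip it in the left-open scan"
fi
```

Without the padding, a pattern such as `*" owner-repo-30 "*` never matches the final entry because nothing follows it, which is the failure mode the bullet describes.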
Address the second review pass:
- detect_left_open now honors the merge filter (filter merge off suppresses merge/close events instead of always emitting them).
- build_seen carries the CI signature forward across a transiently-empty fetch (new commit whose check-runs have not populated), so a later status change still fires instead of being silently dropped.
- Header: note --daemon for large fleets (check-script timeout) and that ci reads the Checks API only.
- Tests: fix a vacuous config assertion; add regression tests for the merge-filter suppression and the CI carry-forward window.
…docs
Address the third review pass:
- Distinguish a configured-empty filters= (all filters off -> watcher muted) from a missing key (defaults to all on); previously 'all off' reset to defaults, so the captain could not actually mute the watcher.
- Derive the contributor from `gh auth` when unset instead of hardcoding a username in a shared public-repo script; FM_GH_CONTRIBUTOR and the configured value still take precedence.
- count_reviews now excludes the contributor's own reviews (keeps bot and maintainer reviews), matching count_comments.
- Document state/.github-watch-config and state/.github-watch-seen/ in the AGENTS.md state layout; add a README toolbelt row + env knobs.
- Fix a stale test comment (apply_pending -> atomic_write); add a regression test for the all-filters-off mute.
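A minimal sketch of the empty-versus-missing distinction follows. It assumes a `key=value` config file and a space-separated filter list; the function name `read_filters` and the file layout are assumptions for illustration, not the script's actual parser.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: a present-but-empty "filters=" mutes the watcher,
# while a missing key falls back to all filters on.
# Assumes a key=value config file (an assumption, not confirmed by the PR).

# read_filters <config-file>: print the active filters on one line.
read_filters() {
  local cfg=$1 line
  if [ -f "$cfg" ] && line=$(grep '^filters=' "$cfg" | tail -n 1) && [ -n "$line" ]; then
    printf '%s\n' "${line#filters=}"          # key present: honor it, even if empty
  else
    printf '%s\n' "comments ci reviews merge" # key missing: default to all on
  fi
}

cfg=$(mktemp)
printf 'filters=\n' > "$cfg"
echo "configured-empty -> '$(read_filters "$cfg")'"
printf 'contributor=someone\n' > "$cfg"
echo "missing key      -> '$(read_filters "$cfg")'"
rm -f "$cfg"
```

The key point is that the test is on whether the `filters=` line exists, not on whether its value is non-empty.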
Address the fourth review pass:
- Append ?per_page=100 to the comments, reviews, and check-runs API calls. GitHub list endpoints default to 30 items per page, so counts and the CI signature silently capped at 30 on active PRs; per_page=100 lifts that.
- The comment event said 'new maintainer comment(s)' but the count includes bot comments (coderabbit, greptile) by design, so relabel to 'comment(s)'.
- Update the fake gh in tests to strip the query string before matching.
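As a rough illustration of the two path manipulations mentioned above, the sketch below appends `per_page=100` to an API path and strips the query string the way a test double might before matching a fixture. Both helper names are hypothetical and the paths are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: add per_page=100 to an API path, and strip the
# query string again (as a fake gh could do before fixture lookup).

with_per_page() {
  case "$1" in
    *\?*) printf '%s&per_page=100\n' "$1" ;;  # path already has a query string
    *)    printf '%s?per_page=100\n' "$1" ;;
  esac
}

strip_query() {
  printf '%s\n' "${1%%\?*}"   # drop everything from the first '?' onward
}

path=$(with_per_page "repos/OWNER/REPO/pulls/1/reviews")
echo "request path: $path"
echo "fixture key:  $(strip_query "$path")"
```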
Address the fifth review pass:
- discover_prs skips discovery when the contributor resolves empty (gh missing/unauthed), so an empty --author is never passed to the search (an empty author qualifier would match open PRs across every repo).
- atomic_write now stages its temp in a hidden SEEN_DIR/.tmp subdir (same filesystem, so the rename stays atomic) so a crash-leaked temp never matches detect_left_open's glob and triggers a duplicate merge/close event.
- Tests pin the contributor via FM_GH_CONTRIBUTOR so they no longer depend on the fake gh implementing `api user`.
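The temp-plus-rename write with a hidden staging directory might look roughly like the sketch below. The function body, the temp-file naming, and the `key=value` seen content are assumptions; only the technique (stage in `.tmp` on the same filesystem, then rename) comes from the commit message.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of an atomic seen-file write: stage the content in
# <seen-dir>/.tmp (same filesystem), then rename into place.

# atomic_write <dest-file> <content>
atomic_write() {
  local dest=$1 content=$2 dir tmp
  dir=$(dirname "$dest")
  mkdir -p "$dir/.tmp" || return 1
  tmp=$(mktemp "$dir/.tmp/seen.XXXXXX") || return 1
  if printf '%s\n' "$content" > "$tmp" && mv -f "$tmp" "$dest"; then
    return 0                 # readers see the old file or the new one, never a partial
  fi
  rm -f "$tmp"               # best-effort cleanup on failure
  return 1
}

seen_dir=$(mktemp -d)
atomic_write "$seen_dir/owner-repo-1" "state=OPEN"
ls "$seen_dir"               # the hidden .tmp dir is not listed and not matched by "$seen_dir"/*
```

Because the staging directory is hidden, a plain `"$seen_dir"/*` glob skips any temp file left behind by a crash.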
Address the sixth review pass:
- discover_prs passes --limit 1000 so open PRs beyond the gh search default of 30 are still discovered (the header advertises large-fleet use).
- Document that comment/review/check-run counts cap at 100 per type per PR (per_page=100, no pagination) alongside the existing Checks-API caveat.
…count
Address the seventh review pass:
- detect_left_open now treats only MERGED as terminal. CLOSED PRs are re-probed each cycle, so a close->reopen->merge between polls still emits MERGED instead of being swallowed. Repeat CLOSED events are suppressed by skipping when p_state equals seen_state.
- cmd_status excludes the .tmp staging subdir so leaked temps never inflate the 'seen PRs' count; detect_left_open also skips *.tmp.* defensively.
- Add a regression test for the close->merge path.
Address the eighth review pass:
- build_seen carries the prior state forward when pr_state returns empty (transient gh failure), matching the ci/comments/reviews carry-forward, so a single failed state fetch never erases the tracked OPEN/CLOSED/MERGED.
- detect_left_open skips the rewrite entirely when p_state is empty, so it cannot clobber a real state with an empty value and trigger a duplicate.
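The carry-forward rule reduces to "keep the previous value when the fresh fetch is empty". A minimal sketch, with a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: prefer the freshly fetched value, but fall back to
# the previously seen value when the fetch came back empty.

# carry_forward <fetched> <prior>
carry_forward() {
  if [ -n "$1" ]; then
    printf '%s\n' "$1"   # fresh value wins
  else
    printf '%s\n' "$2"   # transient failure: keep what was tracked before
  fi
}

echo "fetch ok:     $(carry_forward MERGED OPEN)"
echo "fetch failed: $(carry_forward '' OPEN)"
```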
Address the ninth review pass:
- A closed_at timestamp (set when CLOSED is emitted) bounds how long a CLOSED PR is re-probed for a close->reopen->merge. Past FM_GH_CLOSE_REPROBE_SECS (default 7200s) it is treated as settled, so accumulated closed PRs cannot push the fleet past GitHub's rate limit. The default window is generous enough that a prompt close->merge still fires.
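The re-probe window can be sketched as a simple age check. The function name and the choice to treat an age exactly equal to the window as settled are assumptions; the variable name and the 7200s default come from the commit message.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: decide whether a CLOSED PR is still inside its
# re-probe window. Timestamps are epoch seconds.

# should_reprobe <closed_at> <now>: succeeds while the PR is inside the window.
should_reprobe() {
  local closed_at=$1 now=$2
  local window=${FM_GH_CLOSE_REPROBE_SECS:-7200}
  [ $((now - closed_at)) -lt "$window" ]
}

now=$(date +%s)
if should_reprobe "$((now - 60))" "$now"; then
  echo "closed 60s ago: re-probe for a possible reopen->merge"
fi
if ! should_reprobe "$((now - 86400))" "$now"; then
  echo "closed a day ago: treat as settled"
fi
```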
…SC2015
CI's shellcheck (v0.10.0) flags SC2015 on `A && B || C` patterns that local 0.11.0 missed; the existing repo scripts avoid the pattern. Rewrite the five guard/assert occurrences in the new files to explicit if-statements.
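For readers unfamiliar with SC2015, the sketch below shows why `A && B || C` is not an if/else: when `A` succeeds but `B` fails, `C` still runs. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Illustration of the SC2015 pitfall with hypothetical function names.

flaky() { return 1; }   # stands in for a "then" command that can fail

# Pattern flagged by SC2015: the fallback runs even though the condition held.
pattern_chain() {
  true && flaky || echo "fallback ran"
}

# Explicit if-statement: the else branch runs only when the condition fails.
pattern_if() {
  if true; then
    flaky
  else
    echo "fallback ran"
  fi
}

echo "chain: $(pattern_chain)"
echo "if:    $(pattern_if)"
```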
A --once sweep polled each PR serially: up to 5 gh calls per PR, one PR at a time. At the captain's ~22 open PRs that cost ~47s, over the watcher's 30s check-script kill limit, so a daemon check-script plugin got killed every cycle and delivered nothing.

Parallelize the per-PR loop in poll_once with a bounded counting semaphore (FM_GH_CONCURRENCY, default 8; >=1, 0/non-numeric falls back to 8). Each worker is a subshell running process_pr and owns its own per-PR seen file (seen_file is keyed by owner/repo/pr), so concurrent seen writes never collide. The losslessness invariant (print before seen) holds per-worker exactly, and each worker emits its whole event block in a single printf (atomic under PIPE_BUF), so stdout lines never interleave. A final wait settles all workers before detect_left_open scans the seen dir.

Measured on the live fleet (~22 PRs): 47.6s -> ~8.6s (5.5x), comfortably under both the 15s target and the 30s kill limit.

Adds a parallel-mode regression test asserting events still print before seen advances (via the read-only-seen-dir trick) across 12 concurrently-polled PRs, and that workers never cross-contaminate each other's seen files.
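A bounded parallel sweep of this kind could be sketched as follows. This is not the script's implementation: `fake_process_pr`, `poll_bounded`, and `normalize_concurrency` are hypothetical names, and the sketch uses bash's `wait -n` (bash 4.3 or later) as the counting mechanism, which may differ from what poll_once actually does.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a bounded parallel sweep. Requires bash >= 4.3 for wait -n.

# Map a raw setting to a usable worker cap: 0 or non-numeric falls back to 8.
normalize_concurrency() {
  case "${1:-}" in
    '' | *[!0-9]*) echo 8 ;;
    *) if [ "$((10#$1))" -ge 1 ]; then echo "$((10#$1))"; else echo 8; fi ;;
  esac
}

# Stand-in for process_pr: emit the whole event block in a single printf.
fake_process_pr() {
  printf 'EVENT: repo#%s checked\n' "$1"
}

# poll_bounded <concurrency> <pr>...: run at most <concurrency> workers at once.
poll_bounded() {
  local limit running=0 pr
  limit=$(normalize_concurrency "$1")
  shift
  for pr in "$@"; do
    if [ "$running" -ge "$limit" ]; then
      wait -n                      # block until any one worker exits
      running=$((running - 1))
    fi
    ( fake_process_pr "$pr" ) &    # each worker is its own subshell
    running=$((running + 1))
  done
  wait                             # settle every worker before any follow-up scan
}

out=$(poll_bounded "${FM_GH_CONCURRENCY:-8}" 1 2 3 4 5 6 7 8 9 10 11 12)
printf '%s\n' "$out" | sort -t'#' -k2 -n
```

Output order across workers is not deterministic, which is why the usage line sorts before printing; each individual line stays intact because it is written with one `printf`.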
Parallelized the per-PR sweep in poll_once.
Measured on the live fleet (~22 open PRs): serial 47.6s -> ~8.6s (5.5x). This now fits the watcher's 30s check-script kill limit with comfortable headroom, so the script is finally usable as a daemon check-script plugin.
Tests: extended with a parallel-mode regression test (events still print before seen advances across 12 concurrently-polled PRs).
No behavioral changes beyond speed: same filters, same seen-state shape, same CLI. $FM_GH_CONCURRENCY tunes the worker cap (0/non-numeric falls back to 8).
…ition
The ci filter built a sorted multiset of every check-run conclusion and fired
whenever that multiset changed, so a PR whose checks complete at staggered
times fired once per check landing (observed live: no-mistakes#312 fired
several times as its 7 check-runs trickled in). The captain wants "is the PR
green or not", not per-check noise.
Replace the multiset with a single rolled-up overall state per PR, computed in
one jq pass over the check-runs:
  success  every non-neutral check passed (success/skipped), none still running
  failure  at least one non-neutral check failed (failure/timed_out/cancelled/
           action_required/stale)
  pending  at least one non-neutral check is still queued/in_progress
  neutral  only neutral check-runs present
  (empty)  no check-runs reported yet
Fire one event only when this state flips vs the seen marker
(`CI: <repo>#<pr> -> green|failure|pending|neutral`); stay silent while it is
unchanged. Failure beats pending (a red check already settles the outcome),
matching GitHub's own combined-status precedence.
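The roll-up precedence and the fire-on-flip rule can be sketched in plain bash. The commit computes the roll-up in a jq pass; this sketch deliberately swaps in a bash loop over `status conclusion` pairs so it runs without jq, and the function names and input format are assumptions. It also assumes the caller has already baselined the PR.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the CI roll-up (bash stand-in for the jq pass).
# Input on stdin: one "<status> <conclusion>" pair per line.

rollup() {
  local seen_any=0 has_fail=0 has_pending=0 has_pass=0 status conclusion
  while read -r status conclusion; do
    [ -n "$status" ] || continue
    seen_any=1
    case "$status" in
      queued | in_progress) has_pending=1; continue ;;
    esac
    case "$conclusion" in
      failure | timed_out | cancelled | action_required | stale) has_fail=1 ;;
      success | skipped) has_pass=1 ;;
      neutral) ;;                       # neutral checks never decide the state
    esac
  done
  if [ "$has_fail" -eq 1 ]; then echo failure        # failure beats pending
  elif [ "$has_pending" -eq 1 ]; then echo pending
  elif [ "$has_pass" -eq 1 ]; then echo success
  elif [ "$seen_any" -eq 1 ]; then echo neutral
  fi                                                 # no check-runs: print nothing
}

# ci_transition <repo#pr> <seen-state> <new-state>: print one event on a flip.
ci_transition() {
  local pr=$1 prev=$2 new=$3 label
  if [ -z "$new" ] || [ "$new" = "$prev" ]; then return 0; fi
  label=$new
  if [ "$new" = success ]; then label=green; fi
  printf 'CI: %s -> %s\n' "$pr" "$label"
}

state=$(printf 'completed success\nin_progress none\ncompleted failure\n' | rollup)
ci_transition "owner/repo#1" pending "$state"
```

With seven checks landing one at a time, the rolled-up state stays `pending` until the last one completes, so `ci_transition` prints a single line for the whole sequence.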
Losslessness is preserved exactly: the seen marker is still written AFTER the
event prints, the empty-response carry-forward for a just-pushed SHA whose
checks have not populated yet still holds (so a later transition fires), and
per-PR seen files stay independently owned by each parallel worker. The bounded
concurrent poll is untouched.
The test fake gh now runs the watcher's real --jq roll-up over JSON check-run
fixtures (via jq), so the roll-up logic itself is exercised, not just the
comparison. New tests cover the staggered-checks debounce (7 checks trickling
pending->success fires exactly once, not once per check), the
pending->green / green->green / green->failure transitions, and roll-up
precedence (failure beats pending; neutral checks ignored). Existing CI tests
moved to the rolled-up state model; the parallel losslessness test is unchanged
and still passes.
Measured on the live fleet (23 open PRs): --once = 7.74s, well under the 15s
target and the 30s check-script kill limit.
… use
Two refinements that make ghwatch safe to leave wired in as a daemon check-script plugin, plus strict-mode hardening.

1. Silent re-baseline on seen-schema migration. A SEEN_SCHEMA version is now stamped into each seen file. When a prior file's schema does not match the current version, the first poll rewrites it at the current schema with carried-forward values and emits NOTHING -- so deploying a schema change (e.g. the ci debounce) no longer floods once as every PR appears to "transition" off the old format. Applied in both process_pr (open PRs) and detect_left_open (PRs that left the open set). Only subsequent real transitions fire.

2. Exclude FM_GH_IGNORE_CHECKS names from the CI roll-up (default: the known fork-routing signature gap #293, "PR must be raised via no-mistakes"). A PR that fails only that check now rolls up to green when its real checks pass, instead of a false failure. Only the roll-up applies the filter; the raw check list and the other filters are unchanged. Set FM_GH_IGNORE_CHECKS to a custom regex, or empty to disable. The regex is embedded into the jq program (gh api has no --arg binding for --jq); a malformed regex fails open to empty (carried forward), never crashing the poll.

Also adopt `set -euo pipefail`; the fail-open design is preserved via the existing `|| true` / `|| return 0` guards (full suite green under strict mode).

Tests: silent-baseline-on-migration (no flood on schema change; a real transition still fires afterward); gap-excluded PR rolls up green while a real failure still surfaces. All 17 prior ghwatch tests stay green.
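The schema check behind the silent re-baseline might look like the sketch below. The `schema=` line, the `key=value` seen-file layout, the version number, and the function name are all assumptions made for illustration; only the idea of comparing a stamped version against the current one comes from the commit message.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: detect a seen file written under an older schema so the
# caller can rewrite it silently instead of emitting events.

SEEN_SCHEMA=2   # illustrative version number

# needs_rebaseline <seen-file>: succeeds when the file exists but was stamped
# with a different (or no) schema version.
needs_rebaseline() {
  local file=$1 stamped
  [ -f "$file" ] || return 1                 # first sight is a separate code path
  stamped=$(sed -n 's/^schema=//p' "$file" | head -n 1)
  [ "$stamped" != "$SEEN_SCHEMA" ]
}

seen=$(mktemp)
printf 'schema=1\nci=success,success\n' > "$seen"
if needs_rebaseline "$seen"; then
  echo "old schema: rewrite at schema $SEEN_SCHEMA, emit nothing this cycle"
fi
rm -f "$seen"
```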
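Update: two refinements that make ghwatch safe to leave wired in as a daemon check-script plugin:
1. Silent re-baseline on seen-schema migration, with a regression test (silent-baseline-on-migration).
2. Exclude FM_GH_IGNORE_CHECKS names from the CI roll-up.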
…sing them as data
ghc() did `command gh "$@" 2>/dev/null || true`: it swallowed stderr and the exit code, but on an API error (e.g. a transient 401 "Bad credentials") gh api writes the error JSON to stdout (bypassing --jq), which the script then parsed as CI data and fired as an event (observed: "CI: manaflow-ai/cmux#6570 -> { \"message\": \"Bad credentials\" ... }").

Detect the error and SKIP that PR for the cycle: never surface an API error as an event. ghc() now captures gh's stdout and returns non-zero (suppressing the body) when gh exits non-zero OR its output is a GitHub error body (a JSON object with top-level "message" + "documentation_url"). process_pr treats any non-zero probe return as "skip this PR this cycle": emit nothing, do not write seen, so the next cycle re-evaluates from the same baseline (lossless).

Also guards the discovery fetch (abort the cycle on failure instead of misreading an empty list as "everyone merged") and get_contributor's best-effort user lookup (no set -e trip on a blip).

Tests: new test injects a 401 error JSON via the fake gh (verified it fails on the old ghc with the exact bogus CI event, passes with the fix) plus a recovery cycle proving the real transition still fires. shellcheck clean; all 20 green.
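A wrapper in the spirit of the fixed ghc() can be sketched as below. It is a simplification under stated assumptions: the wrapper takes an arbitrary command rather than calling `gh`, the error-body check is a string match rather than a real JSON parse, and every name here is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: run a command, and return non-zero with no output when
# it fails OR when its stdout looks like a GitHub API error body.

# Simplified detection: a JSON-object-looking string that contains both keys.
is_error_body() {
  case "$1" in \{*) ;; *) return 1 ;; esac
  case "$1" in *'"message"'*) ;; *) return 1 ;; esac
  case "$1" in *'"documentation_url"'*) return 0 ;; *) return 1 ;; esac
}

# probe <command> [args...]
probe() {
  local out rc
  if out=$("$@" 2>/dev/null); then rc=0; else rc=$?; fi
  if [ "$rc" -ne 0 ] || is_error_body "$out"; then
    return 1                      # suppress the body: callers skip this PR this cycle
  fi
  printf '%s\n' "$out"
}

# Fakes standing in for gh responses.
fake_ok()  { echo "pending"; }
fake_401() { printf '{"message":"Bad credentials","documentation_url":"https://docs.github.com/rest","status":"401"}\n'; }

if state=$(probe fake_ok); then echo "probe ok: $state"; fi
if ! probe fake_401 > /dev/null; then echo "probe failed: skip PR, leave seen untouched"; fi
```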
Summary
Adds `bin/fm-github-watch.sh`, a standalone GitHub events poller that surfaces new comments, CI status changes, reviews, and merge/close transitions on the contributor's open PRs into the same batched-escalation path the watcher already uses. Closes the gap where firstmate is blind to maintainer comments, review submissions, and CI state unless the captain manually points at them.

Design
A standalone check script that feeds the existing wake queue:
- `gh search prs --author=<contributor> --state=open --limit 1000` finds every open PR across all repos. The contributor is configurable (`contributor <user>`, `FM_GH_CONTRIBUTOR` env, or derived from `gh auth`; no hardcoded default in a shared tool).
- Filters: `comments` (non-contributor, high-water), `ci` (check-runs signature change, carried forward across a new commit's transient empty window), `reviews` (non-contributor, high-water), `merge` (open→merged/closed, with CLOSED treated as non-terminal so close→reopen→merge still fires).
- Runs as a `state/*.check.sh` that the existing `bin/fm-watch.sh` already sweeps every `FM_CHECK_INTERVAL`, or as `--daemon` for very large fleets (the watcher's 30s check-script timeout is documented).

CLI knobs
Losslessness
Per-PR ordering: events are emitted (bash's builtin `printf` write()s to the capture pipe immediately) before that PR's seen marker advances. A crash or a failing seen write (atomic temp + rename) at worst causes a redundant re-detect next cycle, never a permanent swallow. First sight of a PR baselines silently.

Config: `state/.github-watch-config`. Seen state: `state/.github-watch-seen/<owner>-<repo>-<pr>`.

Tests
`tests/fm-github-watch.test.sh` (TAP, 13 cases) with a file-driven fake `gh`: filter toggling + persistence, silent baseline, comment/review/CI detection, merge via left-open, close→merge non-terminal, CLOSED re-probe window bounding, losslessness under a failing seen write, CI carry-forward across an empty window, merge-filter suppression, all-filters-off mute, config roundtrip.
`shellcheck bin/*.sh tests/*.sh` clean; all existing tests still pass.

Notes
- Run through the `no-mistakes` pipeline (review/test/lint/document all passed); the pipeline's final push step targeted the parent repo due to a fork-routing gap, so the branch is pushed to the fork directly here.
- `ci` reads the Checks API only, and comment/review/check-run counts cap at 100 per type per PR (`per_page=100`). Both caveats are documented in the script header.

🤖 AI disclosure: Human-reviewed; generated with AI assistance and reviewed/iterated through the no-mistakes pipeline plus multiple human-directed review passes.