
feat(supervision): GitHub events watcher (comments/CI/reviews) with CLI filter knobs #38

Open
e-jung wants to merge 16 commits into kunchenguid:main from e-jung:fm-ghwatch-k6

e-jung commented Jun 22, 2026

Contributor

Summary

Adds bin/fm-github-watch.sh, a standalone GitHub events poller that surfaces new comments, CI status changes, reviews, and merge/close transitions on the contributor's open PRs into the same batched-escalation path the watcher already uses. Closes the gap where firstmate is blind to maintainer comments, review submissions, and CI state unless the captain manually points at them.

Design

A standalone check script that feeds the existing wake queue:

  • Discovery: gh search prs --author=<contributor> --state=open --limit 1000 finds every open PR across all repos. The contributor is configurable (contributor <user>, FM_GH_CONTRIBUTOR env, or derived from gh auth — no hardcoded default in a shared tool).
  • Filters (each toggleable): comments (non-contributor, high-water), ci (check-runs signature change, carried forward across a new commit's transient empty window), reviews (non-contributor, high-water), merge (open→merged/closed, with CLOSED treated as non-terminal so close→reopen→merge still fires).
  • Integration: run as a state/*.check.sh the existing bin/fm-watch.sh already sweeps every FM_CHECK_INTERVAL, or as --daemon for very large fleets (the watcher's 30s check-script timeout is documented).

CLI knobs

fm-github-watch.sh [--once|--daemon]
fm-github-watch.sh filter list | <name> on|off
fm-github-watch.sh contributor [<user>]
fm-github-watch.sh status

Losslessness

Per-PR ordering: events are emitted (bash's builtin printf write()s to the capture pipe immediately) before that PR's seen marker advances. A crash or a failing seen write (atomic temp + rename) at worst causes a redundant re-detect next cycle — never a permanent swallow. First sight of a PR baselines silently.
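A minimal sketch of the print-then-mark ordering described above, assuming bash. The names `SEEN_DIR`, `atomic_write`, and `emit_then_mark` are illustrative stand-ins here, not copied from bin/fm-github-watch.sh:

```shell
# Hypothetical sketch: names and seen-file contents are illustrative only.
SEEN_DIR="${SEEN_DIR:-$(mktemp -d)}"

# Stage the content in a temp file under SEEN_DIR (same filesystem), then
# rename it into place, so a reader never observes a half-written seen file.
atomic_write() {
  local dest="$1" content="$2" tmp
  mkdir -p "$SEEN_DIR/.tmp" 2>/dev/null || return 1
  tmp="$(mktemp "$SEEN_DIR/.tmp/seen.XXXXXX" 2>/dev/null)" || return 1
  printf '%s\n' "$content" > "$tmp" || { rm -f "$tmp"; return 1; }
  mv -f "$tmp" "$dest" 2>/dev/null || { rm -f "$tmp"; return 1; }
}

# Print the event first and advance the seen marker second. If the write
# fails, the event was still delivered and is re-detected next cycle.
emit_then_mark() {
  local key="$1" event="$2" new_seen="$3"
  printf '%s\n' "$event"
  atomic_write "$SEEN_DIR/$key" "$new_seen" || true
}
```

With this ordering, the worst case after a crash or a failed write is a repeated event on the next cycle, which is the trade-off the paragraph above describes.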

Config: state/.github-watch-config. Seen state: state/.github-watch-seen/<owner>-<repo>-<pr>.

Tests

tests/fm-github-watch.test.sh (TAP, 13 cases) with a file-driven fake gh:
filter toggling + persistence, silent baseline, comment/review/CI detection, merge via left-open, close→merge non-terminal, CLOSED re-probe window bounding, losslessness under a failing seen write, CI carry-forward across an empty window, merge-filter suppression, all-filters-off mute, config roundtrip.

shellcheck bin/*.sh tests/*.sh clean; all existing tests still pass.

Notes

  • This branch went through the no-mistakes pipeline (review/test/lint/document all passed); the pipeline's final push step targeted the parent repo due to a fork-routing gap, so the branch is pushed to the fork directly here.
  • The ci filter reads the Checks API only; comment/review/check-run counts cap at 100 per type per PR (per_page=100). Both caveats are documented in the script header.

🤖 AI disclosure: Human-reviewed — generated with AI assistance and reviewed/iterated through the no-mistakes pipeline plus multiple human-directed review passes.

e-jung added 13 commits June 22, 2026 05:02
Address review findings from the no-mistakes pipeline:
- Emit events per-PR (print then advance that PR's seen) instead of
  buffering all-at-end: a watcher 30s timeout now surfaces partial progress
  rather than killing the poll with zero output every cycle.
- atomic_write (temp + rename) so a read-only/crashing state dir never leaves
  a partial seen file; bash builtin printf write()s immediately, keeping the
  per-PR ordering lossless even under SIGKILL.
- Fix open_basenames membership (space-padded) so the last open PR is skipped
  in detect_left_open instead of always re-checked.
- usage() stops before set -u so --help no longer leaks code.
- Add review-detection and ci-detection tests.
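The space-padded membership fix can be illustrated roughly as follows. This is a sketch under assumptions: `is_open_basename` and the sample basenames are hypothetical, and the real script may structure the check differently.

```shell
# Hypothetical helper: padding both the list and the needle with spaces lets
# the first and last elements match the same way as the middle ones, and
# avoids prefix matches such as "acme-docs" matching "acme-docs-3".
is_open_basename() {
  local needle="$1" list="$2"
  case " $list " in
    *" $needle "*) return 0 ;;
    *) return 1 ;;
  esac
}

open_basenames="acme-api-12 acme-web-7 acme-docs-3"
if is_open_basename "acme-docs-3" "$open_basenames"; then
  echo "still open: skip the left-open probe"
fi
```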
Address the second review pass:
- detect_left_open now honors the merge filter (filter merge off suppresses
  merge/close events instead of always emitting them).
- build_seen carries the CI signature forward across a transiently-empty
  fetch (new commit whose check-runs have not populated), so a later status
  change still fires instead of being silently dropped.
- Header: note --daemon for large fleets (check-script timeout) and that ci
  reads the Checks API only.
- Tests: fix a vacuous config assertion; add regression tests for the
  merge-filter suppression and the CI carry-forward window.
…docs

Address the third review pass:
- Distinguish a configured-empty filters= (all filters off -> watcher muted)
  from a missing key (defaults to all on); previously 'all off' reset to
  defaults, so the captain could not actually mute the watcher.
- Derive the contributor from `gh auth` when unset instead of hardcoding a
  username in a shared public-repo script; FM_GH_CONTRIBUTOR and the
  configured value still take precedence.
- count_reviews now excludes the contributor's own reviews (keeps bot and
  maintainer reviews), matching count_comments.
- Document state/.github-watch-config and state/.github-watch-seen/ in the
  AGENTS.md state layout; add a README toolbelt row + env knobs.
- Fix a stale test comment (apply_pending -> atomic_write); add a regression
  test for the all-filters-off mute.
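To make the configured-empty versus missing-key distinction concrete, here is a small sketch. It assumes a key=value config file; `read_filters` and `DEFAULT_FILTERS` are hypothetical names, not the script's own.

```shell
# Hypothetical sketch: the config layout (key=value lines) is an assumption.
DEFAULT_FILTERS="comments ci reviews merge"

# Print the active filters as one space-separated line.
#   key missing        -> defaults (all on)
#   "filters=" (empty) -> empty output (watcher muted)
read_filters() {
  local config="$1" line
  if [ -f "$config" ] && line="$(grep -m1 '^filters=' "$config")"; then
    printf '%s\n' "${line#filters=}"
  else
    printf '%s\n' "$DEFAULT_FILTERS"
  fi
}
```

The key point is that the presence of the `filters=` line, not the emptiness of its value, decides whether defaults apply.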
Address the fourth review pass:
- Append ?per_page=100 to the comments, reviews, and check-runs API calls.
  GitHub list endpoints default to 30 items per page, so counts and the CI
  signature silently capped at 30 on active PRs; per_page=100 lifts that.
- The comment event said 'new maintainer comment(s)' but the count includes
  bot comments (coderabbit, greptile) by design, so relabel to 'comment(s)'.
- Update the fake gh in tests to strip the query string before matching.

Address the fifth review pass:
- discover_prs skips discovery when the contributor resolves empty (gh
  missing/unauthed), so an empty --author is never passed to the search
  (an empty author qualifier would match open PRs across every repo).
- atomic_write now stages its temp in a hidden SEEN_DIR/.tmp subdir (same
  filesystem, so the rename stays atomic) so a crash-leaked temp never
  matches detect_left_open's glob and causes a duplicate merge/close event.
- Tests pin the contributor via FM_GH_CONTRIBUTOR so they no longer depend
  on the fake gh implementing `api user`.

Address the sixth review pass:
- discover_prs passes --limit 1000 so open PRs beyond the gh search default
  of 30 are still discovered (the header advertises large-fleet use).
- Document that comment/review/check-run counts cap at 100 per type per PR
  (per_page=100, no pagination) alongside the existing Checks-API caveat.
…count

Address the seventh review pass:
- detect_left_open now treats only MERGED as terminal. CLOSED PRs are
  re-probed each cycle, so a close->reopen->merge between polls still emits
  MERGED instead of being swallowed. Repeat CLOSED events are suppressed by
  skipping when p_state equals seen_state.
- cmd_status excludes the .tmp staging subdir so leaked temps never inflate
  the 'seen PRs' count; detect_left_open also skips *.tmp.* defensively.
- Add a regression test for the close->merge path.

Address the eighth review pass:
- build_seen carries the prior state forward when pr_state returns empty
  (transient gh failure), matching the ci/comments/reviews carry-forward,
  so a single failed state fetch never erases the tracked OPEN/CLOSED/MERGED.
- detect_left_open skips the rewrite entirely when p_state is empty, so it
  cannot clobber a real state with an empty value and trigger a duplicate.

Address the ninth review pass:
- A closed_at timestamp (set when CLOSED is emitted) bounds how long a
  CLOSED PR is re-probed for a close->reopen->merge. Past FM_GH_CLOSE_REPROBE_SECS
  (default 7200s) it is treated as settled, so accumulated closed PRs cannot
  push the fleet past GitHub's rate limit. The default window is generous
  enough that a prompt close->merge still fires.
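The re-probe window reduces to a timestamp comparison. A sketch, assuming epoch-second timestamps: `FM_GH_CLOSE_REPROBE_SECS` is the knob named above, while `should_reprobe_closed` and the choice to keep probing when no timestamp exists are assumptions for illustration.

```shell
# Hypothetical sketch: returns 0 while a CLOSED PR is still inside the
# re-probe window, 1 once it should be treated as settled.
should_reprobe_closed() {
  local closed_at="$1" now="$2"
  local window="${FM_GH_CLOSE_REPROBE_SECS:-7200}"
  # Assumption: with no recorded timestamp, keep probing rather than risk
  # swallowing a later merge.
  [ -n "$closed_at" ] || return 0
  [ $(( now - closed_at )) -lt "$window" ]
}
```

A caller would pass the stored closed_at value and `$(date +%s)` as the current time.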
…SC2015

CI's shellcheck (v0.10.0) flags SC2015 on `A && B || C` patterns that local
0.11.0 missed; the existing repo scripts avoid the pattern. Rewrite the five
guard/assert occurrences in the new files to explicit if-statements.

A --once sweep polled each PR serially: up to 5 gh calls per PR, one PR at a
time. At the captain's ~22 open PRs that cost ~47s, over the watcher's 30s
check-script kill limit, so a daemon check-script plugin got killed every
cycle and delivered nothing.

Parallelize the per-PR loop in poll_once with a bounded counting semaphore
(FM_GH_CONCURRENCY, default 8; >=1, 0/non-numeric falls back to 8). Each
worker is a subshell running process_pr and owns its own per-PR seen file
(seen_file is keyed by owner/repo/pr), so concurrent seen writes never collide.
The losslessness invariant (print before seen) holds per-worker exactly, and
each worker emits its whole event block in a single printf (atomic under
PIPE_BUF), so stdout lines never interleave. A final wait settles all workers
before detect_left_open scans the seen dir.

Measured on the live fleet (~22 PRs): 47.6s -> ~8.6s (5.5x), comfortably under
both the 15s target and the 30s kill limit.

Adds a parallel-mode regression test asserting events still print before seen
advances (via the read-only-seen-dir trick) across 12 concurrently-polled PRs,
and that workers never cross-contaminate each other's seen files.
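One way to picture the bounded worker pool is the sketch below. It assumes bash 4.3 or newer for `wait -n`; `process_pr` here is a stand-in worker and `poll_all` is a hypothetical name, so the real poll_once may be organized differently.

```shell
# Hypothetical sketch of a bounded worker pool; requires bash >= 4.3.
process_pr() {
  # Build the whole event block first, then emit it with a single printf so
  # concurrent workers do not interleave partial lines.
  local pr="$1" block
  block="event: $pr"
  printf '%s\n' "$block"
}

poll_all() {
  local max="${FM_GH_CONCURRENCY:-8}" running=0 pr
  # 0 or non-numeric falls back to 8, as described above.
  case "$max" in ''|*[!0-9]*) max=8 ;; esac
  [ "$max" -ge 1 ] || max=8
  for pr in "$@"; do
    if [ "$running" -ge "$max" ]; then
      wait -n || true            # block until any one worker exits
      running=$(( running - 1 ))
    fi
    ( process_pr "$pr" ) &       # each worker runs in its own subshell
    running=$(( running + 1 ))
  done
  wait                           # settle all workers before any follow-up scan
}
```

Output order across workers is not deterministic, so anything comparing results should sort them first.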

e-jung commented Jun 23, 2026

Contributor Author

Parallelized the per-PR sweep

--once now polls PRs concurrently with a bounded counting semaphore (FM_GH_CONCURRENCY, default 8) instead of serially. Each worker owns its own per-PR seen file, so the losslessness invariant (events print before the seen marker advances) holds per-worker exactly; a final wait settles all workers before detect_left_open scans the seen dir.

Measured on the live fleet (~22 open PRs): serial --once was ~47.6s → parallel is ~8.6s (5.5x faster). Stable across repeated runs (8.2–8.9s).

This now fits the watcher's 30s check-script kill limit with comfortable headroom, so the script is finally usable as a daemon state/*.check.sh plugin (previously it was timeout-killed every cycle and delivered nothing).

Tests: extended tests/fm-github-watch.test.sh with a parallel-mode losslessness case (12 concurrently-polled PRs, read-only-seen-dir trick asserts events still print before seen advances, plus no cross-contamination of seen files). shellcheck clean; all suites green.

No behavioral changes beyond speed — same filters, same seen-state shape, same CLI. $FM_GH_CONCURRENCY tunes the worker cap (0/non-numeric falls back to 8).

e-jung added 2 commits June 24, 2026 03:45
…ition

The ci filter built a sorted multiset of every check-run conclusion and fired
whenever that multiset changed, so a PR whose checks complete at staggered
times fired once per check landing (observed live: no-mistakes#312 fired
several times as its 7 check-runs trickled in). The captain wants "is the PR
green or not", not per-check noise.

Replace the multiset with a single rolled-up overall state per PR, computed in
one jq pass over the check-runs:
  success  every non-neutral check passed (success/skipped), none still running
  failure  at least one non-neutral check failed (failure/timed_out/cancelled/
           action_required/stale)
  pending  at least one non-neutral check is still queued/in_progress
  neutral  only neutral check-runs present
  (empty)  no check-runs reported yet
Fire one event only when this state flips vs the seen marker
(`CI: <repo>#<pr> -> green|failure|pending|neutral`); stay silent while it is
unchanged. Failure beats pending (a red check already settles the outcome),
matching GitHub's own combined-status precedence.
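The commit computes the roll-up in one jq pass. Purely as an illustration of the precedence rules, the sketch below restates them in plain bash over simplified input lines of the form `<status> <conclusion>` (conclusion `-` when the check has not completed). The function name and input format are assumptions, not the script's own.

```shell
# Hypothetical sketch: reads "<status> <conclusion>" lines on stdin and
# prints success, failure, pending, or neutral. Prints nothing when no
# check-runs are reported.
rollup_state() {
  local status conclusion seen_any=0 real=0 failed=0 pending=0
  while read -r status conclusion; do
    [ -n "$status" ] || continue
    seen_any=1
    if [ "$status" != "completed" ]; then
      pending=1; real=1            # queued or in_progress
      continue
    fi
    case "$conclusion" in
      neutral) ;;                  # ignored by the roll-up
      success|skipped) real=1 ;;
      failure|timed_out|cancelled|action_required|stale) failed=1; real=1 ;;
      *) ;;                        # assumption: unknown conclusions are ignored
    esac
  done
  if   [ "$seen_any" -eq 0 ]; then return 0        # (empty)
  elif [ "$failed"   -eq 1 ]; then echo failure    # failure beats pending
  elif [ "$pending"  -eq 1 ]; then echo pending
  elif [ "$real"     -eq 1 ]; then echo success
  else echo neutral
  fi
}
```

For example, `printf 'completed failure\nin_progress -\n' | rollup_state` yields failure, because a red check already settles the outcome.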

Losslessness is preserved exactly: the seen marker is still written AFTER the
event prints, the empty-response carry-forward for a just-pushed SHA whose
checks have not populated yet still holds (so a later transition fires), and
per-PR seen files stay independently owned by each parallel worker. The bounded
concurrent poll is untouched.

The test fake gh now runs the watcher's real --jq roll-up over JSON check-run
fixtures (via jq), so the roll-up logic itself is exercised, not just the
comparison. New tests cover the staggered-checks debounce (7 checks trickling
pending->success fires exactly once, not once per check), the
pending->green / green->green / green->failure transitions, and roll-up
precedence (failure beats pending; neutral checks ignored). Existing CI tests
moved to the rolled-up state model; the parallel losslessness test is unchanged
and still passes.

Measured on the live fleet (23 open PRs): --once = 7.74s, well under the 15s
target and the 30s check-script kill limit.
… use

Two refinements that make ghwatch safe to leave wired in as a daemon
check-script plugin, plus strict-mode hardening.

1. Silent re-baseline on seen-schema migration. A SEEN_SCHEMA version is now
   stamped into each seen file. When a prior file's schema does not match the
   current version, the first poll rewrites it at the current schema with
   carried-forward values and emits NOTHING -- so deploying a schema change
   (e.g. the ci debounce) no longer floods once as every PR appears to
   "transition" off the old format. Applied in both process_pr (open PRs) and
   detect_left_open (PRs that left the open set). Only subsequent real
   transitions fire.

2. Exclude FM_GH_IGNORE_CHECKS names from the CI roll-up (default: the known
   fork-routing signature gap #293, "PR must be raised via no-mistakes"). A PR
   that fails only that check now rolls up to green when its real checks pass,
   instead of a false failure. Only the roll-up applies the filter; the raw
   check list and the other filters are unchanged. Set FM_GH_IGNORE_CHECKS to a
   custom regex, or empty to disable. The regex is embedded into the jq program
   (gh api has no --arg binding for --jq); a malformed regex fails open to empty
   (carried forward), never crashing the poll.

Also adopt `set -euo pipefail`; the fail-open design is preserved via the
existing `|| true` / `|| return 0` guards (full suite green under strict mode).

Tests: silent-baseline-on-migration (no flood on schema change; a real
transition still fires afterward); gap-excluded PR rolls up green while a real
failure still surfaces. All 17 prior ghwatch tests stay green.

e-jung commented Jun 24, 2026

Contributor Author

Update: two refinements that make fm-github-watch.sh viable to leave wired in as a daemon check-script plugin (pushed in 9fe2029).

1. Silent re-baseline on seen-schema migration

A SEEN_SCHEMA version is now stamped into each seen-state file. When a prior file's schema doesn't match the current version, the first poll rewrites it at the current schema (carrying forward prior values) and emits nothing — so deploying a schema change (e.g. the CI debounce) no longer floods once as every PR appears to "transition" off the old format. Applied in both process_pr (open PRs) and detect_left_open (PRs that left the open set). Only subsequent real transitions fire.

Regression test: test_silent_baseline_on_schema_migration — an old-format seen file (stale multiset ci=, no schema=) baselines silently on first poll, then a real success -> failure transition still fires.
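A rough sketch of the schema check, assuming the seen file is a list of key=value lines with a `schema=` entry. The version number, `seen_schema_matches`, and `rebaseline` are hypothetical and only illustrate the idea of rewriting silently on a mismatch.

```shell
# Hypothetical sketch: file layout and version number are assumptions.
SEEN_SCHEMA=2

# True only when the file carries exactly the current schema stamp.
seen_schema_matches() {
  local file="$1"
  [ -f "$file" ] && grep -qx "schema=$SEEN_SCHEMA" "$file"
}

# Rewrite the seen file at the current schema from the given key=value
# lines. Prints nothing, so a migration never surfaces as an event.
rebaseline() {
  local file="$1"
  shift
  { printf 'schema=%s\n' "$SEEN_SCHEMA"; printf '%s\n' "$@"; } > "$file"
}
```

A poll loop would call `rebaseline` and skip event emission whenever `seen_schema_matches` fails for an existing file, then compare normally on the following cycle.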

2. Exclude FM_GH_IGNORE_CHECKS from the CI roll-up

New config (default: "PR must be raised via no-mistakes"): a regex of check-run names dropped from the roll-up before it's computed. A PR that fails only that known fork-routing signature gap (#293) now rolls up to green when its real checks pass, instead of reporting a false failure. Only the roll-up applies the filter; the raw check list and the other filters (comments/reviews/merge) are unchanged. Set it to a custom regex, or empty to disable.

The regex is embedded into the jq program (escaped for a JSON string literal) because gh api has no --arg binding for its --jq filter; a malformed regex fails open to empty (carried forward), never crashing the poll.

Regression test: test_ci_ignore_excludes_known_gap_check — 3 real checks pass + the gap check fails → rolls up to success; flipping a real check to failure still surfaces failure.

Other

  • Adopted set -euo pipefail; fail-open design preserved via existing || true / || return 0 guards.
  • shellcheck-clean; all 19 ghwatch tests green (17 prior + 2 new).
  • Parallel poll (<15s) and per-PR losslessness unchanged.

AI disclosure: Human-reviewed. The "PR must be raised via no-mistakes" CI check fails on the known fork-routing gap (#293); the 3 real checks should pass.

…sing them as data

ghc() did `command gh "$@" 2>/dev/null || true`: it swallowed stderr and the
exit code, but on an API error (e.g. a transient 401 "Bad credentials") gh api
writes the error JSON to stdout — bypassing --jq — which the script then parsed
as CI data and fired as an event (observed:
"CI: manaflow-ai/cmux#6570 -> { \"message\": \"Bad credentials\" ... }").

Detect the error and SKIP that PR for the cycle: never surface an API error as
an event. ghc() now captures gh's stdout and returns non-zero (suppressing the
body) when gh exits non-zero OR its output is a GitHub error body (a JSON object
with top-level "message" + "documentation_url"). process_pr treats any
non-zero probe return as "skip this PR this cycle": emit nothing, do not write
seen, so the next cycle re-evaluates from the same baseline (lossless).
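The guard can be sketched as a wrapper that refuses to pass an error body downstream. This is an illustration under assumptions: `looks_like_gh_error` and `probe` are hypothetical names, and the substring match is a simplification (a JSON parser such as jq would be stricter).

```shell
# Hypothetical sketch: treats a JSON object containing both "message" and
# "documentation_url" as a GitHub error body.
looks_like_gh_error() {
  local body="$1"
  case "$body" in '{'*) ;; *) return 1 ;; esac
  case "$body" in *'"message"'*) ;; *) return 1 ;; esac
  case "$body" in *'"documentation_url"'*) return 0 ;; *) return 1 ;; esac
}

# Run a command and print its stdout only when it exited zero and the output
# is not an error body. A non-zero return means "skip this PR this cycle".
probe() {
  local out
  out="$("$@" 2>/dev/null)" || return 1
  if looks_like_gh_error "$out"; then
    return 1
  fi
  printf '%s\n' "$out"
}
```

A caller that sees a non-zero return from `probe` would emit nothing and leave the seen file untouched, so the next cycle compares against the same baseline.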

Also guards the discovery fetch (abort the cycle on failure instead of misreading
an empty list as "everyone merged") and get_contributor's best-effort user
lookup (no set -e trip on a blip).

Tests: new test injects a 401 error JSON via the fake gh (verified it fails on
the old ghc with the exact bogus CI event, passes with the fix) plus a recovery
cycle proving the real transition still fires. shellcheck clean; all 20 green.