Skip to content

fix(sdlc): close the dead-lane-after-PR loop — merge-watcher MERGED + headless teardown + dispatch idempotency#3831

Merged
ryanklee merged 1 commit into
mainfrom
epsilon/reform-improve-deadlane-closeout-20260601
Jun 1, 2026
Merged

fix(sdlc): close the dead-lane-after-PR loop — merge-watcher MERGED + headless teardown + dispatch idempotency#3831
ryanklee merged 1 commit into
mainfrom
epsilon/reform-improve-deadlane-closeout-20260601

Conversation

@ryanklee
Copy link
Copy Markdown
Collaborator

@ryanklee ryanklee commented Jun 1, 2026

Problem

Three compounding bugs left dead-lane work stranded and stacked zombie launchers (audit wv7nc21ym, witnessed repeatedly this session):

  1. merge-watcher reconcile skipped MERGED. reconcile_stale_pr_states only acted on CLOSED PRs — a pr_open/merge_queue task whose PR merged outside the run_watcher cursor window never self-healed (live: soundcloud fix(music): SoundCloud adapter 401-resilience — fresh client_id scrape; never empty playlist on auth failure #3740, meeting-in-screwm feat(screwm): outbound Meet camera — live render to /dev/video50 #3751 stranded for days). A pr: null + pr_open note matched no reconciler at all.
  2. headless loop had no merge-aware teardown. hapax-claude-headless ran an unconditional while true that re-injected a generic restart prompt forever, never checking whether its task had closed.
  3. no dispatch idempotency. A second hapax-claude-headless for a live lane clobbered the lane-keyed $ROLE.stdin / $ROLE.pid and stacked zombies (the reboot storm; the lane-supervisor firing during a restart-backoff window).

Fix

  1. reconcile_stale_pr_states: MERGED -> cc-close (cursor-window independent); pr: null -> re-derive the PR from branch: via gh pr list --head, write it back and act on its state, or block with a reason. Adds closed/repaired counters.
  2. task_is_terminal decides, before each respawn, whether the task is closed (claim cleared / note left active/ / terminal status) or its PR merged — the loop then exits cleanly and the cleanup trap reaps the pipe + lock. The lane-supervisor brings the lane back read-only into idle-await on its next tick, not a zombie respawn.
  3. A lane-keyed flock makes the wrapper the single launcher per lane; the lock auto-releases on process exit, so the supervisor stays the sole legitimate respawn authority and a dead launcher needs no stale-pid reaping.

The three compose: #1 closes a merged task -> clears its claim -> #2 tears the loop down -> #3 prevents a duplicate from ever racing it.

Tests

  • tests/test_cc_pr_merge_watcher.py: MERGED auto-close, close-failure non-fatal, dry-run, CLOSED-still-blocks, OPEN-left-alone, pr:null repair (rederive+close / no-PR-blocks / no-branch-blocks) — 8 new.
  • tests/scripts/test_hapax_claude_headless.py: duplicate-launcher refusal (flock), lock-acquire-when-free, teardown on claim-clear / terminal-status / PR-merged + source guards — 7 new.

All green; ruff check/format, bash -n, shellcheck clean; adjacent suites (lane-supervisor, methodology-dispatch, lane-watchdog: 58) pass.

Task: reform-improve-deadlane-closeout-20260601
AuthorityCase: CASE-CROSS-RUNTIME-COMMS-001

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Automatic detection and closure of merged pull requests in task management
    • Launch de-duplication to prevent duplicate lane launchers
  • Bug Fixes

    • Improved stale PR state reconciliation and self-healing for orphaned tasks
    • Merge-aware respawn logic to stop retrying terminal tasks
  • Tests

    • Added test coverage for launcher idempotency and merge-aware teardown behavior

…headless teardown + dispatch idempotency)

Three compounding bugs stranded merged work and stacked zombie launchers:

1. merge-watcher reconcile skipped MERGED. reconcile_stale_pr_states only
   acted on CLOSED PRs, so a pr_open/merge_queue task whose PR merged outside
   the run_watcher cursor window never self-healed (soundcloud #3740,
   meeting-in-screwm #3751 stranded for days). Now MERGED auto-closes
   (cursor-window independent) and pr:null re-derives from branch via
   gh pr list --head, or blocks with a reason.

2. headless loop had no merge-aware teardown. The while-true respawn loop
   re-injected a generic restart prompt forever, never checking whether the
   task closed. Added task_is_terminal (claim cleared / note left active/ /
   terminal status / PR merged) so the loop exits cleanly; the lane-supervisor
   then brings the lane back read-only into idle-await, not a zombie respawn.

3. no dispatch idempotency. A second hapax-claude-headless for a live lane
   clobbered the lane-keyed stdin pipe + pidfile and stacked zombies (the
   reboot storm + the supervisor firing during a restart-backoff window). A
   lane-keyed flock makes the wrapper the single launcher per lane; the lock
   auto-releases on exit, so the supervisor stays the sole legitimate respawn
   authority.

Tests: reconcile MERGED auto-close + pr:null repair (8); wrapper teardown on
claim-clear / terminal-status / PR-merged + duplicate-launcher refusal (7).

Task: reform-improve-deadlane-closeout-20260601
AuthorityCase: CASE-CROSS-RUNTIME-COMMS-001

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 1, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR expands task-note state reconciliation and adds merge-aware launcher termination detection. The merge watcher gains new query helpers, state-transition logic, and a reworked reconciliation function that scans all active notes, resolves PR states, and repairs null-PR entries. The headless launcher adds per-lane de-duplication via flock and merge-aware task termination to avoid infinite respawn loops. Both systems receive comprehensive test coverage.

Changes

Task PR Lifecycle Reconciliation and Launcher Awareness

Layer / File(s) Summary
Merge watcher PR state query infrastructure
scripts/cc-pr-merge-watcher.py
Module-level regex patterns, _task_id_from_note, _query_pr_state (via gh pr view), and _list_prs_for_branch (via gh pr list --head) to support PR state lookups and re-derivation from task branches.
Merge watcher note state transitions
scripts/cc-pr-merge-watcher.py
_block_stale_note, _close_merged_note, _apply_pr_state, and _repair_pr_null_note handle note-state changes: blocking open notes, closing merged tasks, and auto-repairing pr: null entries by querying the branch.
Merge watcher reconciliation orchestration
scripts/cc-pr-merge-watcher.py
Reworked reconcile_stale_pr_states function performs cursor-independent, full-scope reconciliation: scans all active notes with open-ish status, resolves live PR state, closes merged tasks, blocks closed ones, repairs pr: null notes, and returns expanded outcome counters (scanned, stale, closed, repaired).
Headless launcher de-duplication
scripts/hapax-claude-headless
Per-lane flock-protected launcher.lock and launcher.pid mechanism prevents duplicate live launchers; refuses second launcher when lock is held; cleans up pidfile on exit.
Headless merge-aware task termination detection
scripts/hapax-claude-headless
find_active_note and task_is_terminal functions check claim binding, note status, and live PR state (via gh pr view) to determine if a task is closed or merged; respawn loop breaks instead of re-injecting restart prompts when task becomes terminal.
Merge watcher reconciliation test coverage
tests/test_cc_pr_merge_watcher.py
Test helpers and suites (TestReconcileMerged, TestReconcilePrNullRepair) validate auto-closing of merged PRs, blocking of CLOSED PRs, leaving OPEN PRs unchanged, handling cc-close failures as non-fatal, dry-run behavior, and pr: null repair via branch re-derivation with closure and note update on merged derived PRs.
Headless launcher and merge-aware test coverage
tests/scripts/test_hapax_claude_headless.py
Test helpers (_stub_bin, _headless_env) and suites validate launcher idempotency via flock-based duplicate refusal, successful acquisition when lane is free, and merge-aware respawn termination on claim cleared, terminal note status, and merged PR detection, each confirming single claude invocation and "stopping respawn loop" output.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • hapax-systems/hapax-council#3693: Both PRs modify the stale-PR reconciliation in scripts/cc-pr-merge-watcher.py by scanning active task notes with PR states and transitioning/repairing task notes accordingly.
  • hapax-systems/hapax-council#3632: The merge watcher's new reconciliation calls close_linked_task for merged PRs, and that retrieved PR modifies close_linked_task's subprocess environment for cc-close invocation.

Poem

🐰 The watchers reconcile with grace,
PR states in every task's space,
Launchers lock to guard the way,
No duplicates shall launch today,
And when a PR's merged, we rest—
No endless respawn, just the best!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 35.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The description comprehensively covers the problem statement, fixes, and test coverage. However, the required AuthorityCase section is incomplete: the Case and Slice fields are filled in the PR objectives but not properly documented in the PR description itself. Ensure the PR description explicitly includes the AuthorityCase section with Case and Slice values formatted as required by the template (e.g., 'Case: CASE-CROSS-RUNTIME-COMMS-001 / Slice: ').
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically summarizes the main changes: fixing three compounding bugs in the dead-lane workflow by adding merge-aware PR state reconciliation, headless task teardown, and launcher idempotency.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch epsilon/reform-improve-deadlane-closeout-20260601

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

stale_counters = reconcile_stale_pr_states(
vault_root=args.vault_root,
repo_root=args.repo_root,
dry_run=args.dry_run,
)

P2 Badge Nudge dispatch after stale PR closes

When a task is closed by the new cursor-independent reconcile_stale_pr_states path (the missed-window pr_open/merge_queue case this change is meant to fix), trigger_reform_dispatch is not called because the only dispatch nudge happens before this call and only checks run_watcher's counters["closed"]. In runs where no in-cursor PRs closed but this reconciler closes a merged PR, the task is moved to closed without the event-driven dispatcher nudge, so dependent work remains idle until the periodic RTE poll instead of advancing immediately.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +506 to +510
if pr_state == "MERGED":
if _close_merged_note(
note, text, pr_num, repo_root=repo_root, dry_run=dry_run, runner=runner
):
counts["closed"] += 1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Honor the hygiene killswitch before closing reconciled tasks

When HAPAX_CC_HYGIENE_OFF=1 is set, run_watcher skips, but main() still calls reconcile_stale_pr_states; with this new MERGED branch that means a supposedly disabled hygiene run can still invoke cc-close and move active task notes to closed/. The file header describes this killswitch as skipping the watcher entirely, so operationally disabling cc hygiene no longer prevents mutations for any active pr_open/merge_queue task whose PR has already merged.

Useful? React with 👍 / 👎.

Comment on lines +91 to +94
if ! flock -n "$LAUNCHER_LOCK_FD"; then
existing="$(cat "$LAUNCHER_PIDFILE" 2>/dev/null || echo '?')"
echo "hapax-claude-headless: refusing duplicate launcher for lane '$ROLE' (live launcher pid=$existing holds $LAUNCHER_LOCK)" >&2
exit 16
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Count a locked launcher as live for supervisor respawns

When the lane supervisor fires while an incumbent wrapper is in restart backoff, scripts/hapax-lane-supervisor still sees the lane as dead because it only checks the child $lane.pid/tmux state; this new lock then makes the duplicate launcher exit 16. The supervisor backgrounds headless respawns and records the attempt anyway, so repeated backoff-window ticks can consume the burst budget and later suppress a real restart even though no replacement launcher was started.

Useful? React with 👍 / 👎.

@ryanklee ryanklee added this pull request to the merge queue Jun 1, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/scripts/test_hapax_claude_headless.py (1)

12-16: ⚡ Quick win

Add explicit encoding for consistency.

The existing code at line 95 uses encoding="utf-8" explicitly, but this helper omits it. For consistency and to avoid locale-dependent behavior:

 def _stub_bin(bin_dir: Path, name: str, body: str) -> None:
     path = bin_dir / name
-    path.write_text("#!/usr/bin/env bash\n" + textwrap.dedent(body))
+    path.write_text("#!/usr/bin/env bash\n" + textwrap.dedent(body), encoding="utf-8")
     path.chmod(0o755)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/scripts/test_hapax_claude_headless.py` around lines 12 - 16, The helper
function _stub_bin writes script files without an explicit encoding; update the
Path.write_text call inside _stub_bin to include encoding="utf-8" so the file is
written deterministically across locales—locate the _stub_bin(bin_dir: Path,
name: str, body: str) function and change the path.write_text invocation to pass
encoding="utf-8" along with the script content.
scripts/hapax-claude-headless (1)

186-194: 💤 Low value

Consider avoiding ls in command substitution for glob matching.

The ls ... | head -n1 pattern is fragile with filenames containing newlines or special characters. A safer alternative uses bash glob expansion directly:

find_active_note() {
  local task="$1" note=""
  if [[ -f "$CC_TASK_ACTIVE/$task.md" ]]; then
    note="$CC_TASK_ACTIVE/$task.md"
  else
    local matches=("$CC_TASK_ACTIVE/$task-"*.md)
    [[ -f "${matches[0]}" ]] && note="${matches[0]}"
  fi
  printf '%s' "$note"
}

Low risk here since task IDs are controlled, but worth aligning with idiomatic bash.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/hapax-claude-headless` around lines 186 - 194, The find_active_note
function uses ls piped to head for glob matching which breaks on special/newline
characters; replace that command-substitution with safe bash glob expansion: in
the else branch create an array like matches=("$CC_TASK_ACTIVE/$task-"*.md) and
if [[ -f "${matches[0]}" ]] set note="${matches[0]}", leaving the existing
if-check for "$CC_TASK_ACTIVE/$task.md" intact so filenames with spaces/newlines
are handled safely; update references to CC_TASK_ACTIVE, task, and note inside
find_active_note accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/cc-pr-merge-watcher.py`:
- Around line 563-565: The regex used in the re.sub call only matches the
literal "null" and misses other YAML null representations; update the
substitution to match the full set of nullish values (use the existing
_PR_NULL_NULLISH token or expand to a non-capturing group like
(?:~|none|null|''|\"\")) so the replacement targets any nullish value; modify
the re.sub invocation that produces new_text (the current line with
re.sub(r"^pr:\s*null\s*$", f"pr: {pr_number}", text, count=1,
flags=re.MULTILINE)) to use the new pattern (e.g.
re.sub(rf"^pr:\s*(?:{_PR_NULL_NULLISH})\s*$", f"pr: {pr_number}", text, count=1,
flags=re.MULTILINE)) so note.write_text and LOG.info behavior remain unchanged.

---

Nitpick comments:
In `@scripts/hapax-claude-headless`:
- Around line 186-194: The find_active_note function uses ls piped to head for
glob matching which breaks on special/newline characters; replace that
command-substitution with safe bash glob expansion: in the else branch create an
array like matches=("$CC_TASK_ACTIVE/$task-"*.md) and if [[ -f "${matches[0]}"
]] set note="${matches[0]}", leaving the existing if-check for
"$CC_TASK_ACTIVE/$task.md" intact so filenames with spaces/newlines are handled
safely; update references to CC_TASK_ACTIVE, task, and note inside
find_active_note accordingly.

In `@tests/scripts/test_hapax_claude_headless.py`:
- Around line 12-16: The helper function _stub_bin writes script files without
an explicit encoding; update the Path.write_text call inside _stub_bin to
include encoding="utf-8" so the file is written deterministically across
locales—locate the _stub_bin(bin_dir: Path, name: str, body: str) function and
change the path.write_text invocation to pass encoding="utf-8" along with the
script content.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: de0df1e4-9354-47b1-af4f-3e7c0b5f3d68

📥 Commits

Reviewing files that changed from the base of the PR and between 3f6bcbc and 88062b8.

📒 Files selected for processing (4)
  • scripts/cc-pr-merge-watcher.py
  • scripts/hapax-claude-headless
  • tests/scripts/test_hapax_claude_headless.py
  • tests/test_cc_pr_merge_watcher.py

Comment on lines +563 to +565
new_text = re.sub(r"^pr:\s*null\s*$", f"pr: {pr_number}", text, count=1, flags=re.MULTILINE)
note.write_text(new_text, encoding="utf-8")
LOG.info("stale PR drain: %s -> re-derived PR #%s from branch %s", note.stem, pr_number, branch)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Regex only handles pr: null, not other nullish values.

The substitution pattern ^pr:\s*null\s*$ only matches the literal string "null", but _PR_NULL_NULLISH includes other YAML null representations like ~, none, and empty string. If a note has pr: ~, the PR number won't be written back correctly.

Proposed fix
-    new_text = re.sub(r"^pr:\s*null\s*$", f"pr: {pr_number}", text, count=1, flags=re.MULTILINE)
+    # Handle all YAML null representations (null, ~, none, empty)
+    new_text = re.sub(r"^pr:\s*(null|~|none|)\s*$", f"pr: {pr_number}", text, count=1, flags=re.MULTILINE)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
new_text = re.sub(r"^pr:\s*null\s*$", f"pr: {pr_number}", text, count=1, flags=re.MULTILINE)
note.write_text(new_text, encoding="utf-8")
LOG.info("stale PR drain: %s -> re-derived PR #%s from branch %s", note.stem, pr_number, branch)
# Handle all YAML null representations (null, ~, none, empty)
new_text = re.sub(r"^pr:\s*(null|~|none|)\s*$", f"pr: {pr_number}", text, count=1, flags=re.MULTILINE)
note.write_text(new_text, encoding="utf-8")
LOG.info("stale PR drain: %s -> re-derived PR #%s from branch %s", note.stem, pr_number, branch)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/cc-pr-merge-watcher.py` around lines 563 - 565, The regex used in the
re.sub call only matches the literal "null" and misses other YAML null
representations; update the substitution to match the full set of nullish values
(use the existing _PR_NULL_NULLISH token or expand to a non-capturing group like
(?:~|none|null|''|\"\")) so the replacement targets any nullish value; modify
the re.sub invocation that produces new_text (the current line with
re.sub(r"^pr:\s*null\s*$", f"pr: {pr_number}", text, count=1,
flags=re.MULTILINE)) to use the new pattern (e.g.
re.sub(rf"^pr:\s*(?:{_PR_NULL_NULLISH})\s*$", f"pr: {pr_number}", text, count=1,
flags=re.MULTILINE)) so note.write_text and LOG.info behavior remain unchanged.

Merged via the queue into main with commit 5078246 Jun 1, 2026
37 checks passed
@ryanklee ryanklee deleted the epsilon/reform-improve-deadlane-closeout-20260601 branch June 1, 2026 14:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant