fix(sdlc): reform parity completion — codex-headless + broadcast groups + executor contract + gate-manifest CI + flake-quarantine wiring by ryanklee · Pull Request #3811 · hapax-systems/hapax-council

ryanklee · 2026-06-01T00:56:56Z

Reform parity completion (Phase 6, CASE-CROSS-RUNTIME-COMMS-001)

Closes five wired-but-incomplete parity/queue gaps from the completion audit — all "feature sleeping" or producer-without-consumer.

cc-task: reform-fix-parity-completion-20260531
Parent spec: coordination-reform-master-design-2026-05-30.md (Phase 6 §6)

What changed

hapax-codex-headless — genuine codex exec headless lane (not a tmux pane). JSONL → ~/.cache/hapax/codex-headless/<lane>/output.jsonl, PID file for the staleness watchdog, same identity-env + task-gate + hook wiring as hapax-codex, default-deny like hapax-claude-headless. launch_codex_headless repointed to it; launch_codex_interactive added for the tmux path.
Broadcast groups — expand_recipients gains antigrav / vibe / workers (was ValueError), so every runtime is reachable by a *:group broadcast.
Executor adapter contract — scripts/executor_contract.py (capability registry: modes/profiles/mutates/claims/hooks_wired/headless/read_only). Dispatch consumes supports_route() via a launcher table instead of a hard (platform,mode) if-ladder. New hapax-executor-capabilities probe + hapax-methodology-dispatch --capabilities emit it as JSON; antigrav's IDE hook-gap is now machine-legible.
gate-manifest-check in CI + pre-commit — CI lint step (--skip-claude-settings, self-listed in ci.run_markers); pre-commit hook (--require-claude-settings, the live ~/.claude/settings.json — the most drift-prone wiring CI can't see).
flake-quarantine / bisection write-side — reconcile_flake_quarantines opens/lifts quarantines and writes them on --apply; bisection_plan_for_failed_runs halves failed merge-group batches. Count-freeze names renamed to advisory (DEFAULT_ADVISORY_OPEN_PR_COUNT, ADVISORY_OPEN_PR_COUNT, --advisory-open-pr-count) with deprecated aliases.

Acceptance criteria

hapax-codex-headless exists (codex exec, streamed JSONL + PID) and headless dispatch uses it
expand_recipients supports antigrav/vibe/workers (tested)
Executor adapter contract + capabilities() probe exists; dispatch consumes it
gate-manifest-check runs in CI + pre-commit (live claude-settings via pre-commit)
flake-quarantine/bisection write-side wired into the live autoqueue; count constants renamed to advisory
Ruff + pyrefly clean; new tests for all 5 parts pass

Tests / evidence

tests/shared/test_relay_mq.py, tests/test_merge_queue_lineage.py, tests/test_cc_pr_autoqueue.py, tests/hooks/test_gate_manifest_check.py, tests/test_executor_contract.py — 163 passed in the affected set; full tests/shared sweep 6140 passed (7 pre-existing googleapiclient failures unrelated). ruff check + pyrefly check clean; gate-manifest-check --skip-claude-settings and --require-claude-settings both green.

Deferred (scope)

systemd/units/hapax-cc-pr-autoqueue.service comment fix is outside this task's mutation_scope_refs; the --storm-open-pr-threshold deprecated alias keeps the unit functional.

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Automated flake-quarantine for managing flaky PRs with persisted reconciliation and reporting.
- Capability-aware executor registry and routing, plus a CLI to inspect executor capabilities.
- Headless Codex launch workflow and governed-launch safeguards.
- CI and pre-commit gate-manifest validation to keep hook/capability wiring consistent.
Tests
- Added extensive tests for quarantine behavior, executor capabilities and CLI, relay recipient expansion, headless dispatch, and merge-queue bisection/reconciliation.

…s, executor contract, gate-manifest CI, flake-quarantine wiring Five wired-but-incomplete Phase-6 parity gaps (CASE-CROSS-RUNTIME-COMMS-001), closing "feature sleeping" / producer-without-consumer defects. 1. hapax-codex-headless: genuine `codex exec` headless lane (NOT a tmux pane). Streams JSONL to ~/.cache/hapax/codex-headless/<lane>/output.jsonl, writes a PID file for the staleness watchdog, same identity-env + task-gate + hook wiring as hapax-codex, default-deny like hapax-claude-headless. Repoints launch_codex_headless to it; adds launch_codex_interactive for the tmux path. 2. relay broadcast groups: expand_recipients gains antigrav / vibe / workers so every runtime is reachable by a `*:group` broadcast (was: ValueError). 3. Executor adapter contract: scripts/executor_contract.py defines the common adapter CLI + a capability registry. Dispatch consumes supports_route() via a launcher table instead of a hard (platform,mode) if-ladder. New probe `hapax-executor-capabilities` + `hapax-methodology-dispatch --capabilities` emit it as JSON (antigrav's IDE hook-gap is machine-legible). 4. gate-manifest-check wired into CI (lint step, --skip-claude-settings, self- listed in ci.run_markers) and pre-commit (--require-claude-settings, the live ~/.claude/settings.json — the most drift-prone wiring CI cannot see). 5. flake-quarantine / bisection WRITE side wired into the live autoqueue: reconcile_flake_quarantines opens/lifts quarantines and writes on apply; bisection_plan_for_failed_runs halves failed merge-group batches. Renamed the misleading count-freeze names to advisory (DEFAULT_ADVISORY_OPEN_PR_COUNT, ADVISORY_OPEN_PR_COUNT, --advisory-open-pr-count) with deprecated aliases. Deferred: the systemd cc-pr-autoqueue.service comment is outside this task's mutation_scope_refs; the --storm-open-pr-threshold deprecated alias keeps the unit functional. Task: reform-fix-parity-completion-20260531 Parent spec: coordination-reform-master-design-2026-05-30.md (Phase 6) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai · 2026-06-01T00:57:13Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 348a5fc6-b8d5-4ca5-9edd-f786afc2efac

📥 Commits

Reviewing files that changed from the base of the PR and between b18d3a7 and 1205979.

📒 Files selected for processing (2)

scripts/cc-pr-autoqueue.py
tests/test_cc_pr_autoqueue.py

🚧 Files skipped from review as they are similar to previous changes (2)

tests/test_cc_pr_autoqueue.py
scripts/cc-pr-autoqueue.py

📝 Walkthrough

Walkthrough

This PR integrates five runtime executors (Claude, Codex, Gemini, Vibe, Antigrav) with a shared capability registry, implements flake-quarantine reconciliation for PR admission, adds a Codex headless launcher, and refactors dispatch routing for capability awareness. It also extends relay broadcast groups and updates advisory thresholds for queue throttling.

Changes

Executor Governance & Quarantine Integration

Layer / File(s)	Summary
Gate Manifest CI & Pre-commit Wiring `.github/workflows/ci.yml`, `.pre-commit-config.yaml`, `hooks/gate-manifest.yaml`, `tests/hooks/test_gate_manifest_check.py`	Adds `gate-manifest-check.py` validation steps to CI workflow and pre-commit hooks with `--skip-claude-settings` (CI) and `--require-claude-settings` (local), plus test confirming wiring across all three locations.
Executor Capability Registry & Contract `scripts/executor_contract.py`, `tests/test_executor_contract.py`	Introduces `ExecutorCapabilities` Pydantic model and `EXECUTOR_REGISTRY` with five runtime platforms mapping modes, profiles, governance flags, hook wiring, and headless availability. Provides `capabilities()`, `supports_route()`, and `capabilities_payload()` query functions. Comprehensive tests validate registry structure, derived fields, and launchability predicates.
Executor Capabilities CLI Script `scripts/hapax-executor-capabilities`	New Python CLI outputs executor registry as JSON; supports `--contract` mode for canonical CLI contract, per-platform mode for single executor details, and default mode for full registry. Enables capability-based tooling and dispatch decisions.
Codex Headless Executor Launcher `scripts/hapax-codex-headless`	New Bash script launches Codex in headless non-interactive mode with JSONL event logging. Enforces session/task governance via claim files, prevents credential leakage, manages lifecycle (logging, PID files, concurrency guard), prepends SDLC preamble, and wires hook adapter for session/tool/audit/stop phases and MCP server invocation.
Flake Quarantine Reconciliation Data Model & Logic `shared/merge_queue_lineage.py`, `tests/test_merge_queue_lineage.py`	Adds `QuarantineReconciliation` model and `reconcile_flake_quarantines()` that lifts expired quarantines and opens new ones for PRs exceeding failure thresholds. Introduces `bisection_plan_for_failed_runs()` to group failed merge-group runs and propose per-run bisection splits. Comprehensive tests cover threshold crossing, non-duplication, cooldown lifting, and deterministic ordering.
PR Admission Advisory Threshold Rename `scripts/hapax-pr-admission`	Establishes canonical `ADVISORY_OPEN_PR_COUNT` constant for fleet throttle policy; keeps `AUTO_FREEZE_THRESHOLD` as deprecated alias for backward compatibility.
PR Autoqueue Quarantine & Bisection Integration `scripts/cc-pr-autoqueue.py`, `tests/test_cc_pr_autoqueue.py`	Extends `StormMode` with `recommended_bisections` field computed from recent failed merge-group runs. Integrates `reconcile_flake_quarantines()` into `run_reconciler()` to persist quarantine state, compute newly quarantined/lifted PRs, and emit quarantine status in JSON output. Updates CLI to accept `--advisory-open-pr-count` (with `--storm-open-pr-threshold` as deprecated alias). Test verifies quarantine persistence across reconciliation runs.
Relay MQ Broadcast Group Expansion `shared/relay_mq.py`, `tests/shared/test_relay_mq.py`	Adds lane-name predicate functions to classify peers across all runtimes. Refactors `expand_recipients()` to use new predicates for `claude`, `codex`, `gemini` groups; introduces `:antigrav`, `:vibe`, and `*:workers` broadcast groups. Tests verify correct recipient expansion and exclusion of non-worker coordinators.
Methodology Dispatch Capability-Aware Routing `scripts/hapax-methodology-dispatch`, `tests/scripts/test_hapax_methodology_dispatch.py`	Refactors launch path selection from hard-coded if-ladder to table-driven `launchers` map keyed by `(platform, mode)` and guarded by `supports_route()` capability checks. Adds `--capabilities` flag to emit registry JSON. Replaces `launch_codex_headless` with configurable headless launcher invocation using `--task`/`--force` and Spark-specific model/reasoning flags. On unsupported routes, prints detailed capability guidance (modes, mutates, hooks_wired, headless) and supported governed routes. Updated tests inject the headless launcher env var and assert the new invocation shape.

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

hapax-systems/hapax-council#3789: The main PR wires scripts/gate-manifest-check.py into CI/pre-commit and updates hooks/gate-manifest.yaml CI run_markers, directly extending the same gate-manifest drift checker introduced in the retrieved PR (#3789).
hapax-systems/hapax-council#3460: Both PRs modify scripts/cc-pr-autoqueue.py’s storm-mode logic—extending/using the StormMode dataclass and updating run_reconciler’s storm-related threshold/parameter wiring—so the changes are code-level related.
hapax-systems/hapax-council#3772: The main PR’s updates to scripts/cc-pr-autoqueue.py (using advisory open-PR count and new flake quarantine reconciliation feeding the same throttle/freeze logic) and scripts/hapax-pr-admission (advisory open-PR count renamed/aliased into failure-rate throttling) directly align with the retrieved PR’s failure-rate throttle + quarantine/throttle-policy refactor in those same modules.

"🐰 I hopped through code and built a gate,
Capabilities listed for every slate.
Headless Codex wakes with a governed claim,
Quarantine keeps flaky PRs tame.
Routes now check what each runtime can take."

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 12.70% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically summarizes the main changes: five parity completion gaps including codex-headless, broadcast groups, executor contract, gate-manifest CI, and flake-quarantine wiring.
Description check	✅ Passed	The description follows the required template with all critical sections: Summary (comprehensive overview of five changes), AuthorityCase (CASE-SDLC-REFORM-001 / SLICE-NNN), Test plan (detailed test results), and CLAUDE.md hygiene checklist. All acceptance criteria and test evidence provided.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch eta/reform-fix-parity-completion-20260531

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 083a6377bd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-01T01:03:43Z

+        modes=("headless", "interactive"),
+        profiles=("full", "spark"),


Register a Codex interactive platform path

Advertising Codex interactive support here does not make the route launchable because hapax-methodology-dispatch still calls route_for(args.platform, args.mode, args.profile) against PLATFORM_PATHS before it reaches the new launcher table, and that table has no ("codex", "interactive", "full") entry. As a result, --platform codex --mode interactive --launch is reported as supported by supports_route()/the capabilities CLI but is refused earlier as an unsupported legacy route, so the new launch_codex_interactive path is unreachable.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-01T01:03:43Z

+WORKDIR="${HAPAX_CODEX_HEADLESS_WORKDIR:-$HOME/projects/hapax-council--$SESSION}"
+if [[ ! -d "$WORKDIR" ]]; then
+  WORKDIR="$COUNCIL_DIR"
+fi


Create the Codex headless worktree instead of falling back

When a new cx-* headless lane has no ~/projects/hapax-council--$SESSION yet, this falls back to $COUNCIL_DIR and launches an autonomous mutating codex exec in the primary checkout. Unlike hapax-codex, which creates the codex-native worktree before launch, the first headless dispatch for a fresh lane will share and dirty the main repo rather than its isolated lane worktree.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-01T01:03:43Z

+    status = ""
+    if validation.task is not None:
+        status = strip_scalar(str(validation.task.fields.get("status", "")))
+    args = [str(launcher), "--task", task_id, "--force"]


Stop forcing duplicate Codex headless launches

Every dispatch to the new headless launcher includes --force, but hapax-codex-headless only honors its live-PID guard when FORCE != 1. If a dispatcher targets a lane that already has a running headless Codex process, this bypasses the duplicate-launch check, overwrites the PID file, and starts a second autonomous mutator in the same lane/worktree/log instead of refusing or requiring an explicit operator relaunch.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-01T01:03:43Z

+        profiles=("full", "worker", "flash", "lite"),
+        mutates=False,
+        claims=False,


Expose Gemini worker mutation in capabilities

This capability record lists the worker profile but still reports mutates=False and claims=False, even though hapax-methodology-dispatch has a mutable gemini/headless/worker route and launches it with HAPAX_GEMINI_APPROVAL_MODE=auto_edit. Any cockpit or scheduler using the new machine-readable capabilities will incorrectly treat the governed Gemini worker path as read-only/non-claiming and can route or display it unsafely.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-01T01:03:43Z

+        # operator machine where the live settings exist; CI checks the rest.
+        entry: uv run python scripts/gate-manifest-check.py --require-claude-settings
+        language: system
+        files: ^(hooks/gate-manifest\.yaml|hooks/scripts/(codex|antigrav)-hook-adapter\.sh|scripts/hapax-(codex|antigrav|vibe|gemini)|config/codex/config\.toml|\.github/workflows/ci\.yml)$


Run the live settings pre-commit check unconditionally

Because this hook has a files: filter and no always_run: true, pre-commit only invokes the --require-claude-settings live-settings check when one of these repo paths is staged. The drift-prone file it is meant to protect (~/.claude/settings.json) is outside the repo, so settings drift followed by an ordinary source/docs commit skips the check entirely instead of enforcing the advertised live wiring validation.

Useful? React with 👍 / 👎.

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (2)

.pre-commit-config.yaml (1)

58-66: ⚡ Quick win

Include self-change paths in the hook matcher.

gate-manifest-check currently won’t self-trigger when .pre-commit-config.yaml or scripts/gate-manifest-check.py changes, which weakens local drift detection.

Proposed update

-        files: ^(hooks/gate-manifest\.yaml|hooks/scripts/(codex|antigrav)-hook-adapter\.sh|scripts/hapax-(codex|antigrav|vibe|gemini)|config/codex/config\.toml|\.github/workflows/ci\.yml)$
+        files: ^(hooks/gate-manifest\.yaml|hooks/scripts/(codex|antigrav)-hook-adapter\.sh|scripts/hapax-(codex|antigrav|vibe|gemini)|scripts/gate-manifest-check\.py|config/codex/config\.toml|\.github/workflows/ci\.yml|\.pre-commit-config\.yaml)$

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.pre-commit-config.yaml around lines 58 - 66, The gate-manifest-check hook
(id: gate-manifest-check) won't run when its own config or the checker script
changes; update the hook's files matcher (the files: pattern in the
gate-manifest-check entry) to also match .pre-commit-config.yaml and
scripts/gate-manifest-check.py so the hook self-triggers on those self-change
paths while keeping pass_filenames: false and the existing matches intact.

tests/hooks/test_gate_manifest_check.py (1)

146-160: ⚡ Quick win

Assert wiring structurally instead of substring matching.

This test can false-pass if the command appears in comments/text but not the actual hook/step entry.

Proposed update

 def test_gate_manifest_check_is_wired_in_ci_and_precommit() -> None:
-    ci_text = (REPO_ROOT / ".github" / "workflows" / "ci.yml").read_text(encoding="utf-8")
-    precommit_text = (REPO_ROOT / ".pre-commit-config.yaml").read_text(encoding="utf-8")
-    # The checker runs in CI (codex/antigrav/vibe/gemini/ci wiring) and in
-    # pre-commit (the live ~/.claude/settings.json, the most drift-prone wiring).
-    assert "scripts/gate-manifest-check.py" in ci_text
-    assert "scripts/gate-manifest-check.py" in precommit_text
-    assert "--require-claude-settings" in precommit_text  # live check in pre-commit
-    assert "--skip-claude-settings" in ci_text  # CI runners have no live settings
+    ci = yaml.safe_load((REPO_ROOT / ".github" / "workflows" / "ci.yml").read_text(encoding="utf-8"))
+    precommit = yaml.safe_load((REPO_ROOT / ".pre-commit-config.yaml").read_text(encoding="utf-8"))
+    # The checker runs in CI (codex/antigrav/vibe/gemini/ci wiring) and in
+    # pre-commit (the live ~/.claude/settings.json, the most drift-prone wiring).
+    lint_steps = ci["jobs"]["lint"]["steps"]
+    assert any(
+        step.get("name") == "Gate manifest wiring check"
+        and "scripts/gate-manifest-check.py" in step.get("run", "")
+        and "--skip-claude-settings" in step.get("run", "")
+        for step in lint_steps
+    )
+    local_hooks = [
+        hook
+        for repo in precommit["repos"]
+        if repo.get("repo") == "local"
+        for hook in repo.get("hooks", [])
+    ]
+    gate_hook = next(hook for hook in local_hooks if hook.get("id") == "gate-manifest-check")
+    assert "scripts/gate-manifest-check.py" in gate_hook.get("entry", "")
+    assert "--require-claude-settings" in gate_hook.get("entry", "")
     # The manifest lists its own CI invocation so the checker self-verifies the
     # CI wiring is present (check_ci run-marker round-trip).
     manifest = _manifest()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/hooks/test_gate_manifest_check.py` around lines 146 - 160, The test
test_gate_manifest_check_is_wired_in_ci_and_precommit should parse the YAML
structures instead of substring-searching: load the CI YAML (the content
currently read into ci_text) with yaml.safe_load and walk to the job steps to
assert there is a step whose "run" or "uses" invokes
"scripts/gate-manifest-check.py" and that its arguments include
"--skip-claude-settings"; likewise load the pre-commit YAML (precommit_text) and
assert one of the defined hooks in the hooks list has an "entry" that invokes
"scripts/gate-manifest-check.py" and includes "--require-claude-settings"; keep
the manifest check using _manifest() but change the run_markers assertion in
test_gate_manifest_check_is_wired_in_ci_and_precommit to compare markers for the
exact substring "gate-manifest-check.py" within each marker element (or exact
equality) rather than relying on raw file substring presence.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/cc-pr-autoqueue.py`:
- Around line 987-997: The read/modify/write of the quarantine ledger is racy:
the sequence using read_quarantine(...), reconcile_flake_quarantines(...), and
write_quarantine(...) must be serialized so two concurrent --apply runs don't
clobber each other; wrap the full read → reconcile → write sequence with an
exclusive file lock on quarantine_path (or implement an optimistic
read-verify-retry loop that re-reads the file before writing and retries if it
changed) so that the code that calls read_quarantine,
reconcile_flake_quarantines, and write_quarantine holds the lock until
write_quarantine completes; ensure the lock is acquired before calling
read_quarantine and released only after write_quarantine returns (use a
cross-platform file lock helper or retry-on-conflict loop).

In `@scripts/hapax-codex-headless`:
- Line 225: The script currently hardcodes
mcp_servers.hapax.env.LOGOS_BASE_URL="http://localhost:8051/api"; replace that
literal with the configured/logical LOGOS endpoint used elsewhere in the script
(the same environment variable set earlier) so non-default environments honor
the contract; specifically, remove the hardcoded URL and reference the existing
configured value (e.g., expand the LOGOS_BASE_URL/LOGOS endpoint variable used
around the earlier configuration block) so mcp_servers.hapax.env.LOGOS_BASE_URL
is assigned from the configured variable instead of the literal.

In `@scripts/hapax-methodology-dispatch`:
- Around line 906-909: The args list currently unconditionally appends "--force"
which bypasses the live-lane PID safety check; change the construction so
"--force" is only appended when an explicit intent exists (e.g., a boolean flag
like force_relaunch or a CLI/env var), e.g., build args = [str(launcher),
"--task", task_id] then if force_relaunch: args.append("--force"); retain the
existing conditional that adds "--no-claim" when status in
{"claimed","in_progress"} and still extend with [lane, prompt]; update any
callers or flag parsing to expose the new force_relaunch control so relaunches
are not forced by default.

In `@shared/merge_queue_lineage.py`:
- Around line 1053-1059: The loop currently does an unguarded int(pr) which will
raise on malformed values; wrap the conversion in a try/except (catching
ValueError and TypeError) around int(pr) inside the block that handles run_id/pr
(the same scope using run_id, pr, bucket and by_run), and on conversion failure
simply continue so malformed pr values are skipped as the docstring promises.
- Around line 1003-1029: The reconciliation currently can lift a quarantine
(lift_quarantine) and then immediately re-open it in the same pass; to prevent
that, exclude PRs that were lifted in this pass from being re-quarantined by
adding a guard in the candidate loop: after computing active_after_lift and
before calling should_quarantine_pr/open_quarantine, check if pr_number is in
the lifted collection (the lifted list populated when calling lift_quarantine)
and skip it (continue) if so; keep the existing active_after_lift and newly
handling unchanged.

---

Nitpick comments:
In @.pre-commit-config.yaml:
- Around line 58-66: The gate-manifest-check hook (id: gate-manifest-check)
won't run when its own config or the checker script changes; update the hook's
files matcher (the files: pattern in the gate-manifest-check entry) to also
match .pre-commit-config.yaml and scripts/gate-manifest-check.py so the hook
self-triggers on those self-change paths while keeping pass_filenames: false and
the existing matches intact.

In `@tests/hooks/test_gate_manifest_check.py`:
- Around line 146-160: The test
test_gate_manifest_check_is_wired_in_ci_and_precommit should parse the YAML
structures instead of substring-searching: load the CI YAML (the content
currently read into ci_text) with yaml.safe_load and walk to the job steps to
assert there is a step whose "run" or "uses" invokes
"scripts/gate-manifest-check.py" and that its arguments include
"--skip-claude-settings"; likewise load the pre-commit YAML (precommit_text) and
assert one of the defined hooks in the hooks list has an "entry" that invokes
"scripts/gate-manifest-check.py" and includes "--require-claude-settings"; keep
the manifest check using _manifest() but change the run_markers assertion in
test_gate_manifest_check_is_wired_in_ci_and_precommit to compare markers for the
exact substring "gate-manifest-check.py" within each marker element (or exact
equality) rather than relying on raw file substring presence.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: c920dfa4-c2b4-4938-ae55-53a0b592dae1

📥 Commits

Reviewing files that changed from the base of the PR and between 40bcb2e and 083a637.

📒 Files selected for processing (16)

.github/workflows/ci.yml
.pre-commit-config.yaml
hooks/gate-manifest.yaml
scripts/cc-pr-autoqueue.py
scripts/executor_contract.py
scripts/hapax-codex-headless
scripts/hapax-executor-capabilities
scripts/hapax-methodology-dispatch
scripts/hapax-pr-admission
shared/merge_queue_lineage.py
shared/relay_mq.py
tests/hooks/test_gate_manifest_check.py
tests/shared/test_relay_mq.py
tests/test_cc_pr_autoqueue.py
tests/test_executor_contract.py
tests/test_merge_queue_lineage.py

coderabbitai · 2026-06-01T01:05:35Z

+    existing_quarantine = read_quarantine(quarantine_path)
+    quarantined_prs = active_quarantined_pr_numbers(existing_quarantine, now=now)
+    quarantine_reconciliation = reconcile_flake_quarantines(
+        existing_quarantine,
+        lineage_records,
+        candidate_prs={pr.number for pr in prs},
+        policy=throttle_policy,
+        now=now,
    )
-    throttle_policy = FleetThrottlePolicy(advisory_open_pr_count=storm_open_pr_threshold)
+    if apply and (quarantine_reconciliation.newly_quarantined or quarantine_reconciliation.lifted):
+        write_quarantine(quarantine_path, quarantine_reconciliation.records)


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Serialize the quarantine read/modify/write cycle.

This new write-side path reads the ledger, reconciles in memory, then overwrites the whole file without any lock. Two overlapping --apply runs can start from the same snapshot and the later write_quarantine() will silently drop the earlier run's newly opened/lifted records.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/cc-pr-autoqueue.py` around lines 987 - 997, The read/modify/write of the quarantine ledger is racy: the sequence using read_quarantine(...), reconcile_flake_quarantines(...), and write_quarantine(...) must be serialized so two concurrent --apply runs don't clobber each other; wrap the full read → reconcile → write sequence with an exclusive file lock on quarantine_path (or implement an optimistic read-verify-retry loop that re-reads the file before writing and retries if it changed) so that the code that calls read_quarantine, reconcile_flake_quarantines, and write_quarantine holds the lock until write_quarantine completes; ensure the lock is acquired before calling read_quarantine and released only after write_quarantine returns (use a cross-platform file lock helper or retry-on-conflict loop).

coderabbitai · 2026-06-01T01:05:35Z

+  -c "hooks.Stop=[{command=\"$HOOK\",timeout=20,statusMessage=\"Writing Hapax session summary\"}]"
+  -c "mcp_servers.hapax.command=\"$HOME/.local/bin/uv\""
+  -c "mcp_servers.hapax.args=[\"--directory\",\"$HOME/projects/hapax-mcp\",\"run\",\"hapax-mcp\"]"
+  -c 'mcp_servers.hapax.env.LOGOS_BASE_URL="http://localhost:8051/api"'


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the configured LOGOS endpoint instead of a hardcoded URL.

Line 225 hardcodes the MCP LOGOS_BASE_URL, which ignores the environment contract set at Line 141 and breaks non-default environments.

Proposed fix

- -c 'mcp_servers.hapax.env.LOGOS_BASE_URL="http://localhost:8051/api"' + -c "mcp_servers.hapax.env.LOGOS_BASE_URL=\"$LOGOS_BASE_URL\""

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-c 'mcp_servers.hapax.env.LOGOS_BASE_URL="http://localhost:8051/api"'

-c "mcp_servers.hapax.env.LOGOS_BASE_URL=\"$LOGOS_BASE_URL\""

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/hapax-codex-headless` at line 225, The script currently hardcodes mcp_servers.hapax.env.LOGOS_BASE_URL="http://localhost:8051/api"; replace that literal with the configured/logical LOGOS endpoint used elsewhere in the script (the same environment variable set earlier) so non-default environments honor the contract; specifically, remove the hardcoded URL and reference the existing configured value (e.g., expand the LOGOS_BASE_URL/LOGOS endpoint variable used around the earlier configuration block) so mcp_servers.hapax.env.LOGOS_BASE_URL is assigned from the configured variable instead of the literal.

coderabbitai · 2026-06-01T01:05:35Z

+    for record in existing:
+        if record.released_at is None and not quarantine_active(record, now=now):
+            out.append(lift_quarantine(record, now=now))
+            lifted.append(record.pr_number)
+        else:
+            out.append(record)
+
+    active_after_lift = active_quarantined_pr_numbers(out, now=now)
+    newly: list[int] = []
+    window_hours = policy.window_seconds / 3600
+    for pr_number in sorted(set(candidate_prs)):
+        if pr_number in active_after_lift:
+            continue
+        if should_quarantine_pr(pr_number, lineage, policy=policy, now=now):
+            out.append(
+                open_quarantine(
+                    pr_number,
+                    reason=(
+                        f"{counts.get(pr_number, 0)} CI failures within "
+                        f"{window_hours:.0f}h (>= {policy.quarantine_failures}); "
+                        "reversible flake quarantine"
+                    ),
+                    now=now,
+                    cooldown_seconds=policy.quarantine_cooldown_seconds,
+                )
+            )
+            newly.append(pr_number)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't reopen a quarantine in the same reconciliation pass after lifting it.

If quarantine_cooldown_seconds is shorter than window_seconds, this pass can lift an expired record at Lines 1004-1006 and then immediately reopen it from the same historical failures at Lines 1013-1028. That makes the cooldown ineffective for supported non-default policies.

Suggested guard

active_after_lift = active_quarantined_pr_numbers(out, now=now) newly: list[int] = [] + lifted_set = set(lifted) window_hours = policy.window_seconds / 3600 for pr_number in sorted(set(candidate_prs)): + if pr_number in lifted_set: + continue if pr_number in active_after_lift: continue if should_quarantine_pr(pr_number, lineage, policy=policy, now=now):

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

for record in existing:

if record.released_at is None and not quarantine_active(record, now=now):

out.append(lift_quarantine(record, now=now))

lifted.append(record.pr_number)

else:

out.append(record)

active_after_lift = active_quarantined_pr_numbers(out, now=now)

newly: list[int] = []

window_hours = policy.window_seconds / 3600

for pr_number in sorted(set(candidate_prs)):

if pr_number in active_after_lift:

continue

if should_quarantine_pr(pr_number, lineage, policy=policy, now=now):

out.append(

open_quarantine(

pr_number,

reason=(

f"{counts.get(pr_number, 0)} CI failures within "

f"{window_hours:.0f}h (>= {policy.quarantine_failures}); "

"reversible flake quarantine"

),

now=now,

cooldown_seconds=policy.quarantine_cooldown_seconds,

)

)

newly.append(pr_number)

for record in existing:

if record.released_at is None and not quarantine_active(record, now=now):

out.append(lift_quarantine(record, now=now))

lifted.append(record.pr_number)

else:

out.append(record)

active_after_lift = active_quarantined_pr_numbers(out, now=now)

newly: list[int] = []

lifted_set = set(lifted)

window_hours = policy.window_seconds / 3600

for pr_number in sorted(set(candidate_prs)):

if pr_number in lifted_set:

continue

if pr_number in active_after_lift:

continue

if should_quarantine_pr(pr_number, lineage, policy=policy, now=now):

out.append(

open_quarantine(

pr_number,

reason=(

f"{counts.get(pr_number, 0)} CI failures within "

f"{window_hours:.0f}h (>= {policy.quarantine_failures}); "

"reversible flake quarantine"

),

now=now,

cooldown_seconds=policy.quarantine_cooldown_seconds,

)

)

newly.append(pr_number)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@shared/merge_queue_lineage.py` around lines 1003 - 1029, The reconciliation currently can lift a quarantine (lift_quarantine) and then immediately re-open it in the same pass; to prevent that, exclude PRs that were lifted in this pass from being re-quarantined by adding a guard in the candidate loop: after computing active_after_lift and before calling should_quarantine_pr/open_quarantine, check if pr_number is in the lifted collection (the lifted list populated when calling lift_quarantine) and skip it (continue) if so; keep the existing active_after_lift and newly handling unchanged.

coderabbitai · 2026-06-01T01:05:35Z

+        run_id = run.get("run_id")
+        pr = run.get("pr")
+        if run_id is None or pr is None:
+            continue
+        bucket = by_run[str(run_id)]
+        if int(pr) not in bucket:
+            bucket.append(int(pr))


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Skip malformed pr values instead of crashing.

The docstring says malformed rows are skipped, but int(pr) here will still raise on values like "abc" and take down the reconciler. Guard the conversion and continue on failure.

Suggested hardening

for run in failed_runs: run_id = run.get("run_id") pr = run.get("pr") if run_id is None or pr is None: continue + try: + pr_number = int(pr) + except (TypeError, ValueError): + continue bucket = by_run[str(run_id)] - if int(pr) not in bucket: - bucket.append(int(pr)) + if pr_number not in bucket: + bucket.append(pr_number)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@shared/merge_queue_lineage.py` around lines 1053 - 1059, The loop currently does an unguarded int(pr) which will raise on malformed values; wrap the conversion in a try/except (catching ValueError and TypeError) around int(pr) inside the block that handles run_id/pr (the same scope using run_id, pr, bucket and by_run), and on conversion failure simply continue so malformed pr values are skipped as the docstring promises.

…eractive path The merge-queue full-suite run caught a regression the PR's lighter test job missed: tests/scripts/test_hapax_methodology_dispatch.py stubbed the codex launcher via the old HAPAX_METHODOLOGY_CODEX_LAUNCHER env var, but the repointed launch_codex_headless now reads HAPAX_METHODOLOGY_CODEX_HEADLESS → the real default-deny hapax-codex-headless ran and exited 77. - Tests: stub HAPAX_METHODOLOGY_CODEX_HEADLESS; assert the new inline CLI (--task <id> --force <lane> <prompt>) instead of the old --bootstrap form. - Drop launch_codex_interactive + the ("codex","interactive") launcher entry: codex-interactive has no PLATFORM_PATHS route (route-validation rejects it before launch), so it was unreachable dead code that also made the capability registry inconsistent. write_codex_bootstrap (only its caller) removed too. - executor_contract: codex modes -> ("headless",) to match the dispatch's launchable routes; update the contract tests. Task: reform-fix-parity-completion-20260531 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…-completion-20260531 # Conflicts: # tests/test_cc_pr_autoqueue.py

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b18d3a7d9a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-01T13:02:23Z

+    existing_quarantine = read_quarantine(quarantine_path)
+    quarantined_prs = active_quarantined_pr_numbers(existing_quarantine, now=now)
+    quarantine_reconciliation = reconcile_flake_quarantines(


Apply quarantines to PR admission decisions

Please feed the active flake quarantine into classify_pr/the mutation loop, not just the fleet-rate calculation. With an existing active quarantine for a PR whose checks and task metadata are otherwise ready, these new lines only exclude it from decide_fleet_throttle/recommend_max_entries_to_build; the later classify_pr(...) calls receive no quarantine context and can still return queue or enable_auto_merge, so apply=True will call gh pr merge for the quarantined flaky PR and defeat the quarantine.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-01T13:02:23Z

+ADAPTER_CLI_CONTRACT: tuple[str, ...] = (
+    "--lane",
+    "--task",
+    "--mode",  # headless | interactive | receipt-only
+    "--prompt",
+    "--no-claim",
+    "--force",


Align the advertised adapter flags with launchers

The contract emitted by hapax-executor-capabilities --contract is not accepted by the new Codex headless adapter: hapax-codex-headless only parses --task, --no-claim, and --force before positional <cx-session> <governed-initial-message>, so a client following this machine-readable contract with --lane cx-red --prompt ... is rejected before launch. Either implement these flags in the adapters or stop advertising them as accepted by every launcher.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-01T13:02:23Z

+    return peer.startswith("antigrav")
+
+
+def _is_vibe_lane(peer: str) -> bool:
+    return peer.startswith("vbe") or peer.startswith("vibe")


Exclude relay status files from worker broadcasts

When the relay directory contains status YAMLs such as antigrav-status.yaml (the retire helper checks <role>-status.yaml), these prefix checks include antigrav-status/vibe... as recipients for *:antigrav, *:vibe, and *:workers. Those stems are not live consumers, so group dispatches can leave unconsumable offered messages and make delivery look incomplete; restrict the predicates to actual lane names or filter *-status.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-01T13:02:23Z

+        bucket = by_run[str(run_id)]
+        if int(pr) not in bucket:
+            bucket.append(int(pr))


Build bisection input from full merge-group batches

The new bisection plan expects multiple rows with the same run_id, but the live lineage store is keyed by merge_group_run_id in write_jsonl_records, so normal failed_recent_merge_group_runs contains at most one PR per failed run. In that case every bucket is a single-PR batch and bisect_failed_batch() returns no split, meaning the reported recommended_bisections stays empty precisely for the failed merge-group batches it is meant to isolate.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1205979d9c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-01T13:10:00Z

    if status in {"claimed", "in_progress"}:
        args.append("--no-claim")


Require the Codex lane claim before launch

When dispatching a task whose note is already claimed or in_progress, this always passes --no-claim to the new headless launcher. That launcher only claims when CLAIM_TASK=1; if the target cx-* lane has no existing ~/.cache/hapax/cc-active-task-$SESSION file (for example after a stale note, reboot, or reassignment where the frontmatter says assigned to this lane but the lease is absent), it still starts codex exec with just --task and no session claim. That bypasses the mandatory cc-task lease for an autonomous mutating headless run; either verify the lane claim exists before adding --no-claim or let the launcher run cc-claim and fail closed.

Useful? React with 👍 / 👎.

chatgpt-codex-connector Bot reviewed Jun 1, 2026

View reviewed changes