Skip to content

feat: add Codex session supervisor runtime#213

Open
ndycode wants to merge 7 commits intodevfrom
git-clean/pr147-supervisor-runtime
Open

feat: add Codex session supervisor runtime#213
ndycode wants to merge 7 commits intodevfrom
git-clean/pr147-supervisor-runtime

Conversation

@ndycode
Copy link
Owner

@ndycode ndycode commented Mar 21, 2026

Summary

  • move the supervisor wrapper and monitor runtime out of feat: add experimental Codex session supervisor #207 into a stacked follow-up PR
  • wire the codex wrapper through the new routing helpers and supervisor entrypoint
  • add focused wrapper and supervisor smoke coverage for the runtime behavior

Validation

  • npm run test:session-supervisor:smoke
  • npm run typecheck
  • npm run build

Notes

note: greptile review for oc-chatgpt-multi-auth. cite files like lib/foo.ts:123. confirm regression tests + windows concurrency/token redaction coverage.

Greptile Summary

this pr adds the codex session supervisor runtime — a process-level wrapper that monitors quota pressure on the active account while an interactive codex session is running, and transparently rotates to a healthy account by restarting the child and resuming the session. the approach is sound: a file-based storage lock with heartbeat renewal serialises all account mutations, pre-warm selection probes a successor account before the current one exhausts its quota, and buildCodexChildEnv correctly strips auth tokens (using case-insensitive comparison, addressing the windows env-casing risk from the prior review). abort signal propagation through fetchCodexQuotaSnapshot and the shared-probe isolation fix (skipAccountCooldown: true when an upstream caller's signal aborts) are both in and covered by targeted tests.

key changes:

  • scripts/codex-supervisor.js — 2,500-line supervisor with lock/heartbeat, probe cache, prewarm selection, interactive supervision loop, and windows-safe retry on EPERM/EBUSY for both unlink and heartbeat writes
  • scripts/codex-routing.js — new findPrimaryCodexCommand, hasTopLevelHelpOrVersionFlag, splitCodexCommandArgs helpers consumed by both the supervisor and the wrapper
  • scripts/codex.js — wires runCodexSupervisorIfEnabled ahead of the plain forward path; contains a dead-code guard (!supervisorDidStartupSync && supervisorDidLaunchSession) that is unreachable because syncBeforeLaunch always fires before a session binding can be established
  • lib/quota-probe.ts — adds signal?: AbortSignal to ProbeCodexQuotaOptions with proper listener cleanup and redundant-but-harmless consecutive abort checks
  • test coverage is comprehensive across lock serialisation, prewarm overlap, concurrency, restart limits, and windows filesystem paths

Confidence Score: 4/5

  • safe to merge after resolving the dead-code sync guard in codex.js; no correctness regressions, no token-material leaks, no data-loss paths identified
  • all prior review concerns are addressed in this pass: windows heartbeat write retry is in, case-insensitive env stripping is correct, and the shared-probe abort isolation properly sets skipAccountCooldown rather than coolDownAccount. one new logic issue found — the post-session autoSync branch in codex.js is unreachable dead code. this is not a production safety risk (the sync still runs before every spawn) but it is a logic correctness concern worth a targeted fix before merge.
  • scripts/codex.js — dead post-session sync condition; lib/quota-probe.ts — redundant consecutive abort check (minor)

Important Files Changed

Filename Overview
scripts/codex-supervisor.js large new supervisor runtime (~2,500 loc): lock heartbeat retry, case-insensitive env stripping, and abort-probe isolation are all correctly implemented; the module-level cache singletons are safe in cli use and tests reset them in afterEach
scripts/codex.js wires supervisor entrypoint correctly but the post-session sync guard (!supervisorDidStartupSync && supervisorDidLaunchSession) is logically unreachable — the autoSyncManagerActiveSelectionIfEnabled() inside it is dead code
scripts/codex-routing.js clean addition of findPrimaryCodexCommand, hasTopLevelHelpOrVersionFlag, and splitCodexCommandArgs; flag/value parsing and -- stop-token handling look correct
lib/quota-probe.ts abort signal support added correctly with proper listener cleanup; contains one redundant consecutive throwIfQuotaProbeAborted call before and inside the same try block
test/codex-supervisor.test.ts broad coverage of lock, probe, prewarm, restart, and signal behaviors; some tests pass buildForwardArgs to runInteractiveSupervision which silently ignores it; env var mutations within test bodies rely on afterEach for cleanup

Sequence Diagram

sequenceDiagram
    participant C as codex.js (main)
    participant S as runCodexSupervisorIfEnabled
    participant R as loadSupervisorRuntime
    participant A as ensureLaunchableAccount
    participant M as withLockedManager
    participant P as probeAccountSnapshot
    participant I as runInteractiveSupervision
    participant X as codex child process

    C->>S: rawArgs, buildForwardArgs, forwardToRealCodex
    S->>R: import dist/lib/{config,accounts,quota-probe,storage}
    R-->>S: runtime (or null → fallback to normal forward)
    S->>A: probe all accounts, select healthy one
    A->>M: lock storage, load AccountManager
    M-->>A: manager + lockSignal
    A->>P: fetchCodexQuotaSnapshot (with abort signal)
    P-->>A: snapshot or QuotaProbeUnavailableError
    A-->>S: { ok, account, manager }
    S->>I: initialArgs, runtime, pluginConfig, manager
    loop each launch (initial + restarts)
        I->>C: syncBeforeLaunch()
        I->>X: spawnRealCodex(codexBin, args)
        I->>I: waitForSessionBinding (polls sessions/*.jsonl)
        I->>P: monitor: probeAccountSnapshot(currentAccount)
        alt quota near exhaustion
            I->>I: maybeStartPreparedResumeSelection (prewarm)
            I->>X: requestChildRestart (SIGINT → SIGTERM → SIGKILL)
            I->>A: ensureLaunchableAccount (or commitPreparedSelection)
            I->>I: buildResumeArgs(sessionId, launchArgs)
        end
        X-->>I: exit(code, signal)
    end
    I-->>S: exitCode
    S-->>C: exitCode (or null if runtime unavailable)
Loading

Fix All in Codex

Prompt To Fix All With AI
This is a comment left during a code review.
Path: scripts/codex.js
Line: 554-562

Comment:
**Unreachable post-session sync branch**

the triple condition `!supervisorDidForward && !supervisorDidStartupSync && supervisorDidLaunchSession` is logically impossible to satisfy simultaneously.

in `runInteractiveSupervision`, `syncBeforeLaunch()` (which is `syncBeforeSupervisorLaunch`) is always called before the first `spawnChild()` call, so `supervisorDidStartupSync` is `true` before any session binding can be established. `onLaunch` (which sets `supervisorDidLaunchSession = true`) only fires after `findBinding` resolves a valid binding — which happens after the spawn. so `supervisorDidStartupSync` is always `true` by the time `supervisorDidLaunchSession` becomes `true`.

the `autoSyncManagerActiveSelectionIfEnabled()` inside this block is dead code and never executes. if the intent was to run a post-session sync after interactive supervised sessions complete, this block won't cover it. if the sync-before-each-spawn (via `syncBeforeLaunch`) is the full contract, the dead condition should be removed to avoid confusion.

```suggestion
	if (supervisedExitCode !== null) {
		if (supervisedExitCode === 130) {
			return 130;
		}
		return supervisedExitCode;
	}
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: lib/quota-probe.ts
Line: 344-348

Comment:
**Redundant consecutive abort checks**

`throwIfQuotaProbeAborted(options.signal)` is called twice at the same synchronous point — once before the `try` block and immediately again as the first statement inside it. since nothing async runs between them, the second call can never see a state the first didn't already see. the outer call throws synchronously (escaping the loop entirely); the inner call would be caught and re-thrown via the signal check in the `catch` block — same outcome, extra noise.

```suggestion
	for (const model of models) {
		try {
			throwIfQuotaProbeAborted(options.signal);
			const instructions = await getCodexInstructions(model);
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: test/codex-supervisor.test.ts
Line: 1

Comment:
**`buildForwardArgs` silently ignored by `runInteractiveSupervision`**

several tests (e.g. `"preserves caller CLI options when rebuilding resume args after rotation"`) pass a `buildForwardArgs` key to `runInteractiveSupervision`, but that parameter is not in the function's destructured signature — it is silently dropped. the function builds resume args internally via `buildResumeArgs` using `launchArgs` directly.

the test still passes because it checks `spawnedArgs` captured by the `spawnChild` mock, and those args come from the internal `launchArgs` state. but the `buildForwardArgs` key is misleading — it suggests an override hook that doesn't exist on `runInteractiveSupervision`. consider either removing it from the test calls, or adding a note that the pre-wired `initialArgs` is what drives the resume-args rebuild.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (7): Last reviewed commit: "Tighten supervisor test teardown" | Re-trigger Greptile

@chatgpt-codex-connector
Copy link

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 21, 2026

📝 Walkthrough

Walkthrough

introduces a session supervisor system for managing multi-account quota rotation and resumable codex sessions. supervisor runs before real codex execution, probes account quota states, selects eligible accounts, monitors session binding metadata from jsonl files, and triggers child restarts when quota exhaustion approaches and session idle time permits. integrates with storage locking, abort-aware control flow, and optional prewarming.

Changes

Cohort / File(s) Summary
Argument Parsing Utilities
scripts/codex-routing.js
Added findPrimaryCodexCommand, hasTopLevelHelpOrVersionFlag, and splitCodexCommandArgs functions to parse and normalize command-line arguments, including handling of option flags with values, -- termination, and configuration flag detection.
Session Supervisor Implementation
scripts/codex-supervisor.js
New comprehensive module implementing quota-aware session supervision: dynamically loads runtime config/accounts/quota-probe/storage modules, probes account quota with TTL caching and de-duplication, maintains supervisor lock with heartbeat TTL, polls session binding metadata from jsonl files, spawns child codex process, monitors quota pressure in loop, decides on child restart based on quota exhaustion and session idle time, supports prewarming and selection preparation, handles abort signals, and exposes test-only internal API.
CLI Integration
scripts/codex.js
Integrated supervisor into main codex entry point: replaced unconditional autoSyncManagerActiveSelectionIfEnabled() with conditional runCodexSupervisorIfEnabled(...), wrapped real codex forwarding in forwardToRealCodexWithStartupSync to handle startup sync, added state flags to track supervisor delegation (supervisorDidForward, supervisorDidLaunchSession, supervisorDidStartupSync), and conditionally re-run auto-sync for interactive supervisor commands.
Test Infrastructure & Coverage
test/codex-supervisor.test.ts
Comprehensive test suite (2805 lines) covering supervisor internals via test-only API: session metadata parsing, binding lookup with caching, quota probe caching/de-duplication, concurrent caller sharing, restart control flow with platform-specific SIG escalation, supervisor storage locking with TTL heartbeat and stale detection, critical-section serialization, interactive monitoring loop, cooldown pacing, and account gating bypass for auth/help/version.
Integration Test Updates
test/codex-bin-wrapper.test.ts
Extended wrapper fixture to copy codex-supervisor.js and generate runtime stubs (config.js, accounts.js, quota-probe.js, storage.js, codex-manager.js). Added tests validating supervisor non-interactive forwarding, auto-sync cardinality (exactly once per supervisor command), startup sync skip on abort sentinel 130, post-launch auto-sync for interactive runs, double-sync avoidance, and config flag injection edge cases.
Test Script Addition
package.json
Added npm script test:session-supervisor:smoke running vitest against supervisor and related test files.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI Entry
    participant Supervisor as Supervisor<br/>(codex-supervisor.js)
    participant AccountMgr as Account Manager<br/>(config/accounts)
    participant QuotaProbe as Quota Probe<br/>(quota-probe)
    participant Storage as Storage<br/>(persistent)
    participant Child as Child Codex<br/>Process
    participant SessionLog as Session Log<br/>(JSONL)

    CLI->>Supervisor: runCodexSupervisorIfEnabled(args)
    Note over Supervisor: Parse args, check if interactive
    
    Supervisor->>Storage: Acquire supervisor lock<br/>(with TTL heartbeat)
    Storage-->>Supervisor: Lock acquired
    
    Supervisor->>AccountMgr: Get eligible accounts
    AccountMgr-->>Supervisor: Account list
    
    Supervisor->>QuotaProbe: Probe quota for account<br/>(with TTL cache)
    QuotaProbe-->>Supervisor: Quota snapshot
    
    alt Quota OK
        Supervisor->>Storage: Store selected account<br/>state
    else Quota Exhausted
        Supervisor->>Storage: Mark account cooling down
        Supervisor->>AccountMgr: Rotate to next account
    end
    
    Supervisor->>Child: Spawn child codex process
    
    Note over Supervisor,Child: Interactive monitoring loop
    loop Until child exits
        Supervisor->>SessionLog: Poll session binding<br/>metadata (JSONL)
        SessionLog-->>Supervisor: session_meta line
        
        Supervisor->>QuotaProbe: Probe current account<br/>quota (with de-dup)
        QuotaProbe-->>Supervisor: Quota pressure
        
        alt Quota near exhaustion<br/>AND session idle
            Supervisor->>Child: Send signal / restart
            Child->>Child: Exit
            Supervisor->>Child: Spawn replacement codex
        else Continue
            Supervisor->>Storage: Heartbeat lock TTL
        end
    end
    
    Child-->>CLI: Exit code
    Supervisor-->>CLI: Forwarded exit code
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

substantial new feature spanning 2448 lines in supervisor core logic with dense, multi-concern interactions: quota polling with in-memory ttl caching and in-flight de-duplication, persistent supervisor lock acquisition/heartbeat/stale-detection, abortable async control flow, session binding metadata polling with rollout-path caching, child process restart decisions, and platform-specific signal escalation. integration touch in scripts/codex.js:46 restructures startup sync flow with three state flags. test coverage is comprehensive (2805 lines) but flag missing explicit regression tests for: (1) windows SIGKILL escalation timeout edge cases when child hangs during lock loss (scripts/codex-supervisor.js:~1900), (2) concurrent supervisor instances racing on lock acquisition—heartbeat renewal during long critical sections could lose lease mid-restart, (3) quota probe abortion semantics when caller aborts shared in-flight probe, (4) jsonl session binding corruption or truncated lines causing unbounded scan, and (5) stale lock cleanup on windows transient EPERM/EBUSY retry logic (test/codex-supervisor.test.ts:1200+) only covers synchronous happy path. concurrency risks: probe cache eviction under load, lock heartbeat failure modes during quota-driven restart cascades, and session log reader assumptions about atomic writes.

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive PR description is substantial and covers summary, changes, and validation steps, but omits most docs/governance checklist items and risk assessment details required by template. Add risk level, rollback plan, and clarify whether docs updates (README, docs/getting-started.md, etc.) are needed for the supervisor feature.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed Title follows conventional commits format with 'feat' type, lowercase summary, and is well under 72-char limit at 42 chars.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch git-clean/pr147-supervisor-runtime
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch git-clean/pr147-supervisor-runtime

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ndycode ndycode added the needs-human-review Flagged by automated PR quality screening; maintainer review required. label Mar 21, 2026
coderabbitai[bot]
coderabbitai bot previously requested changes Mar 21, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/codex-routing.js`:
- Around line 37-69: The function findPrimaryCodexCommand currently returns the
lowercased token (normalizedArg); change its return shape to preserve the
original token by returning the raw token (use args[index] trimmed but not
lowercased) as command and add a separate normalized (or commandNormalized)
field with the lowercased value for comparisons; apply the same change to the
other occurrence around the split helper (splitCodexCommandArgs / the function
at lines ~109-123) so leadingArgs/trailingArgs keep original casing while
consumers get a normalized field for case-insensitive checks.

In `@scripts/codex-supervisor.js`:
- Around line 1921-1922: The wrapper currently calls onLaunch immediately (see
onLaunch = () => {}) before any resumable session is confirmed; change the
behavior so onLaunch is invoked only after a successful binding.sessionId is
observed (i.e., after findBinding() or waitForBinding() returns a binding with a
sessionId), and ensure spawnChild = spawnRealCodex remains the child spawner;
update the wrapper logic that currently marks “launched” at startup (lines
referencing the wrapper launch flag) to instead wait for the first successful
binding.sessionId capture before firing onLaunch, and add a regression test that
uses the real supervisor path (replacing the current stubs in the bin wrapper
test) to confirm the wrapper won't take post-session auto-sync if the child dies
immediately.
- Around line 153-155: getMaxSessionRestarts currently treats the env var
CODEX_AUTH_CLI_SESSION_MAX_RESTARTS as "max launches" because the first spawn
consumes one unit; change the logic so the env value represents allowed restarts
only: keep getMaxSessionRestarts as the parsed value but update the
spawning/resume logic that uses it (the code path around the initial
spawn/resume handling) to not decrement or consume a restart count on the very
first launch and only decrement when attempting to resume/restart a session;
ensure checks permit an initial launch when getMaxSessionRestarts() === 0 and
only block further resume attempts when the remaining restarts count is
exhausted.

In `@test/codex-supervisor.test.ts`:
- Around line 49-66: The test defines a duplicate directory-removal backoff
helper removeDirectoryWithRetry with its own retry codes/timing; instead,
extract and reuse the centralized windows cleanup retry helper used by the other
suite: move the shared logic into a test utility (e.g., export a function like
removeDirectoryWithRetry or reuse the existing helper function signature),
update this test to import and call that shared helper instead of its local
removeDirectoryWithRetry, and ensure the retryableCodes and backoff timing are
taken from the single shared implementation so both suites use identical
semantics.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 807682f1-86cc-4607-936c-c74c54c45cb9

📥 Commits

Reviewing files that changed from the base of the PR and between dd077ff and f14e7db.

📒 Files selected for processing (6)
  • package.json
  • scripts/codex-routing.js
  • scripts/codex-supervisor.js
  • scripts/codex.js
  • test/codex-bin-wrapper.test.ts
  • test/codex-supervisor.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (1)
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/codex-bin-wrapper.test.ts
  • test/codex-supervisor.test.ts

@ndycode ndycode changed the base branch from git-clean/pr147-supervisor-plumbing to dev March 21, 2026 19:57
@ndycode ndycode removed the needs-human-review Flagged by automated PR quality screening; maintainer review required. label Mar 22, 2026
@ndycode ndycode dismissed coderabbitai[bot]’s stale review March 22, 2026 15:50

All review threads are resolved and later commits addressed this stale automated change request.

@ndycode ndycode deleted the branch dev March 24, 2026 18:37
@ndycode ndycode closed this Mar 24, 2026
@ndycode ndycode deleted the git-clean/pr147-supervisor-runtime branch March 24, 2026 18:37
@ndycode ndycode restored the git-clean/pr147-supervisor-runtime branch March 24, 2026 18:40
@ndycode ndycode reopened this Mar 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant