
[codex] Reconcile stale running jobs when pid exits #225

Draft
LawrenceChiu95 wants to merge 1 commit into openai:main from LawrenceChiu95:codex/reconcile-stale-running-jobs


Summary

  • auto-reconcile stale running jobs when the tracked pid no longer exists
  • persist the reconciliation to both the state index (state.json) and the per-job file (jobs/<id>.json)
  • append a log line so users can see why a job switched to failed

Root Cause

/codex:status trusted the persisted state for running jobs without validating whether the worker process was still alive. If a worker died unexpectedly, its job could stay marked as running indefinitely and appear stuck.
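
To illustrate the missing validation, here is a minimal sketch of a pid liveness probe built on `process.kill(pid, 0)`. The helper name `isPidAlive` and the decision to treat `EPERM` as "alive" are assumptions made for illustration, not necessarily what the PR does.

```javascript
// Hypothetical helper: the name and the EPERM handling are assumptions.
function isPidAlive(pid) {
  // Reject missing or non-positive pids. A pid of 0 or a negative value
  // would target a process group rather than a single process.
  if (!Number.isInteger(pid) || pid <= 0) return false;
  try {
    // Signal 0 delivers nothing; it only checks that the process exists
    // and that we are allowed to signal it.
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to someone else, so count it as alive.
    // ESRCH (no such process) and anything else: treat as dead.
    return err.code === "EPERM";
  }
}
```

One caveat with any pid-based check: the operating system can reuse a pid, so a live pid does not strictly prove that the original worker is still running.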

Changes

  • plugins/codex/scripts/lib/state.mjs
    • add pid liveness check (process.kill(pid, 0))
    • add reconciliation path in listJobs() for stale running jobs with dead pids
    • mark stale jobs as failed with errorMessage, completedAt, and pid: null
    • mirror updates into per-job json and append a log annotation
  • tests/state.test.mjs
    • add regression test: listJobs auto-reconciles stale running jobs when pid is no longer alive
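
The reconciliation step listed above could look roughly like the following. This is a sketch under stated assumptions: the function name `reconcileStaleJobs`, the `status` field name, and the error message text are hypothetical. It is written as a pure function that takes the liveness check as a parameter, and it leaves out the writes to state.json, jobs/<id>.json, and the job log.

```javascript
// Hypothetical sketch: names and the error text are assumptions, not the PR's code.
// `isAlive` is any function (pid) => boolean, e.g. one built on process.kill(pid, 0).
function reconcileStaleJobs(jobs, isAlive, now = new Date()) {
  const reconciled = [];
  const next = jobs.map((job) => {
    // Only running jobs with a tracked pid that is no longer alive are stale.
    const stale =
      job.status === "running" && job.pid != null && !isAlive(job.pid);
    if (!stale) return job;

    const updated = {
      ...job,
      status: "failed",
      errorMessage: `Worker process ${job.pid} is no longer running.`,
      completedAt: now.toISOString(),
      pid: null,
    };
    reconciled.push(updated);
    return updated;
  });
  // The caller would persist `next` to the state index and mirror each entry
  // in `reconciled` into its per-job file and log.
  return { jobs: next, reconciled };
}
```

Passing the liveness check in as a parameter keeps the function easy to test: a regression test can supply a stub that reports a chosen pid as dead, with no need to spawn and kill a real process.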

Validation

  • node --test tests/state.test.mjs
  • node --test tests/runtime.test.mjs --test-name-pattern "status shows phases|status preserves adversarial review kind labels|status --wait times out cleanly|cancel without a job id ignores active jobs from other Claude sessions|cancel with a job id can still target an active job from another Claude session|stop hook logs running tasks to stderr without blocking when the review gate is disabled"
