fix: handle zombie jobs when process crashes but status remains "running" #218
D2758695161 wants to merge 3 commits into openai:main
Conversation
Before this fix, if a Codex task process crashed or was killed externally, the job record would remain in status "running" permanently, blocking all subsequent task calls. Now we check whether the PID is actually alive using process.kill(pid, 0). If the process is dead but the status is still "running", we mark the job as "failed" via upsertJob() and clean up the zombie state.

Fixes: openai#202
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4075eb3d8f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
```js
} else {
  throw new Error(`Task ${activeTask.id} is still running. Use /codex:status before continuing it.`);
}
return null;
```
Continue thread lookup after cleaning zombie task
After the zombie cleanup path marks activeTask as failed, the function immediately returns null, which skips the normal fallback logic (trackedTask lookup and findLatestTaskThread). In a task --resume-last flow where a stale running job is cleaned up but an earlier resumable thread exists, this now raises "No previous Codex task thread was found" instead of resuming, so the recovery path still breaks for resumed tasks.
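The restructuring this review suggests could look like the sketch below: sweep the zombie, then fall through to the normal lookup chain instead of returning null. The helper names (markJobFailed, the findLatestTaskThread fallback) and job shape are assumptions modeled on the snippets in this PR, not the repository's actual code.

```javascript
// Stub helpers standing in for this repo's real functions (assumptions):
function isProcessAlive(pid) {
  try { process.kill(pid, 0); return true; }
  catch (err) { return err.code === "EPERM"; } // EPERM: exists, not signalable
}
function markJobFailed(job) { job.status = "failed"; job.endedAt = Date.now(); }

// Restructured lookup: clean up the zombie, then continue to the fallbacks.
function resolveResumableThread(activeTask, trackedTask, findLatestTaskThread) {
  if (activeTask && activeTask.status === "running") {
    if (isProcessAlive(activeTask.pid)) {
      throw new Error(`Task ${activeTask.id} is still running. Use /codex:status before continuing it.`);
    }
    markJobFailed(activeTask); // zombie swept; do NOT return null here
  }
  // A stale running job should not hide earlier resumable threads.
  if (trackedTask) return trackedTask.threadId;
  return findLatestTaskThread(); // last-resort lookup still runs after a sweep
}
```

With this shape, a task --resume-last flow that sweeps a stale job still reaches findLatestTaskThread and can resume an earlier thread.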
```diff
  model: request.model,
  effort: request.effort,
- sandbox: request.write ? "workspace-write" : "read-only",
+ sandbox: request.fullAccess ? "danger-full-access" : (request.write ? "workspace-write" : "read-only"),
```
Wire --full-access through task request pipeline
The sandbox selection now checks request.fullAccess, but handleTask never reads options["full-access"] and never passes a fullAccess field into executeTaskRun/buildTaskRequest, so this condition is always false. As a result, the newly accepted --full-access flag is silently ignored and users cannot actually request danger-full-access mode.
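A minimal sketch of the wiring this review asks for: the option must be read and threaded into the request before the sandbox ternary can ever see it. The option key ("full-access") and field names are assumptions based on the diff above.

```javascript
// Hypothetical request builder: reads the CLI option and carries it into the
// request object, so the sandbox selection below it is no longer dead code.
function buildTaskRequest(options) {
  const request = {
    model: options.model,
    effort: options.effort,
    write: Boolean(options.write),
    fullAccess: Boolean(options["full-access"]), // previously never read
  };
  request.sandbox = request.fullAccess
    ? "danger-full-access"
    : request.write
      ? "workspace-write"
      : "read-only";
  return request;
}
```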
Fixes openai#216. When a Codex task process crashes/is killed externally, the job record stays in state.json with status: "running". This change:
- Adds isProcessAlive(pid) using process.kill(pid, 0)
- Adds sweepZombieJobs(cwd) that marks dead PIDs as failed
- All job status reads now clean up zombie entries automatically
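The sweep described in that commit message could be sketched as follows. This is an illustration only: the real sweepZombieJobs reads and writes state.json through this repo's helpers, which are replaced here by an in-memory job array, and the EPERM handling reflects the reviewer's later note rather than the commit as pushed.

```javascript
// Probe liveness without sending a real signal; EPERM means the PID exists
// but belongs to another user, so it still counts as alive.
function isProcessAlive(pid) {
  try { process.kill(pid, 0); return true; }
  catch (err) { return err.code === "EPERM"; }
}

// Mark any "running" job whose process is gone as failed (zombie sweep).
function sweepZombieJobs(jobs) {
  const now = new Date().toISOString();
  for (const job of jobs) {
    if (job.status === "running" && !isProcessAlive(job.pid)) {
      job.status = "failed";
      job.endedAt = now;
      job.error = "Process died unexpectedly — marked failed by zombie sweep";
    }
  }
  return jobs;
}
```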
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 49301547f9
```js
  process.kill(activeTask.pid, 0);
} catch (err) {
  // Process is dead but job status wasn't updated — mark it as failed and clean up
  if (err.code === "ESRCH" || err.code === "EPERM") {
```
Treat EPERM as active when probing task PID
The zombie cleanup branch currently treats EPERM from process.kill(pid, 0) as if the process were dead, but EPERM means the PID exists and is just not signalable by this user. In that case this path will incorrectly mark an active task as failed and allow another task/resume flow to start concurrently. This is especially risky in mixed-privilege environments (e.g., task started under a different uid).
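The corrected probe per this review note would treat only ESRCH as dead. A sketch, with the helper name assumed from the commit message:

```javascript
// Signal 0 performs an existence/permission check without delivering a signal.
function isProcessAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    if (err.code === "EPERM") return true; // exists under another uid: alive
    return false;                          // ESRCH: no such process
  }
}
```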
```js
job.status = "failed";
job.endedAt = now;
job.error = "Process died unexpectedly — marked failed by zombie sweep";
```
Persist standard failure fields in zombie sweep
When a zombie job is marked failed here, the code writes endedAt/error but does not set the standard failure fields used elsewhere (phase, pid, completedAt, errorMessage). As a result, downstream status/result rendering can show stale runtime metadata and omit the failure reason/timestamp for swept jobs. The zombie path should emit the same failure shape as other failure updates.
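Emitting the full failure shape this review asks for might look like the sketch below. The extra field names (phase, completedAt, errorMessage) come from the review comment itself; the overall record layout is an assumption, not the repo's actual schema.

```javascript
// Write the same failure fields other failure paths write, so downstream
// status/result rendering sees no stale runtime metadata for swept jobs.
function markSweptJobFailed(job, now = new Date().toISOString()) {
  job.status = "failed";
  job.phase = "failed";         // assumed phase value used by other failure updates
  job.pid = null;               // clear stale runtime metadata
  job.endedAt = now;
  job.completedAt = now;        // failure timestamp for result rendering
  job.error = "Process died unexpectedly — marked failed by zombie sweep";
  job.errorMessage = job.error; // keep both error fields consistent
  return job;
}
```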
Summary
When a Codex task process crashes or is killed externally, its job record remains in status "running" permanently. All subsequent task calls in the same Claude session then fail with the "Task ... is still running" error, which effectively blocks all Codex usage in that session.
Fix
Before blocking on an active task, we now verify the process is actually alive using process.kill(pid, 0). If the process is dead but the job status is still "running", we mark the job as "failed" via upsertJob() and clean up the zombie state.
Root Cause
resolveLatestTrackedTaskThread() was checking only the status field in the job record, not whether the actual process (stored in job.pid) was alive.
Testing
Fixes: #202
Fixes: #216