🤖 fix: restore parallel sibling tool execution (revert #2906) #3576

Open
ethanndickson wants to merge 7 commits into main from parallel-tools-k5nv

@ethanndickson ethanndickson commented Jun 16, 2026

Member

Summary

Restore the intended parallel execution of sibling tool calls within a single stream by reverting #2906, and close the concurrency gaps that the revert re-exposes so that each shared resource owns its own safety: file_edit_insert's create-on-missing path, background bash process-ID allocation, and config.json/providers.jsonc/secrets.json writes (now serialized by a single per-file lock shared across the config tool, the Config class, and ProviderService). Under #2906, when the model planned multiple tool calls in one assistant turn (most visibly parallel task sub-agent spawns), every call after the first had to wait for the previous one to finish. This change removes the per-stream mutex that caused the regression and makes each mutating path safe on its own.

Background

#2906 ("serialize sibling tool execution per stream") wrapped every tool's execute() in a single shared per-stream AsyncMutex, held across the entire execute body. Provider-level parallelToolCalls: true stayed on, so the LLM still planned parallel calls, but each wrapped handler blocked on the shared lock — serializing siblings. For task, a foreground call holds the lock across its long waitForAgentReport, so a second task cannot even reach taskService.create() until the first finishes.

That PR had no linked issue and no concrete repro; it was speculative hardening against races on shared mutable state. Its own implementation plan notes that the genuine race (bash background-process output) was already fixed at the runtime boundary by backgroundProcessManager's per-process outputLock. The right shape is parallel-by-default at the framework level, with each shared resource owning its own concurrency safety.

The audit that justified the revert surfaced the places where that ownership was incomplete — the blanket mutex had been masking each:

  • file_edit_insert's create-on-missing fast path did fileExists → writeFileString outside the per-file MutexMap lock that the normal edit path (executeFileEditOperation) holds. Two concurrent sibling inserts targeting the same not-yet-existing file could both observe "missing", both create, and the last write would silently drop the other. Insert semantics are accumulative, so both bodies should survive.
  • Background bash spawns allocate a process ID from the display_name, but BackgroundProcessManager.generateUniqueProcessId only scanned the registered process map, and registration happens only after spawnProcess + writeMeta awaits. Two concurrent spawns sharing a display name could both pass the collision check, pick the same ID and output directory, and the second processes.set would orphan the first — leaving bash_output/termination pointed at the wrong process.
  • config.json had two independent serializers that didn't coordinate: Config.editConfigQueue (instance-scoped, serializing editConfig against itself) and mux_config_write's own per-path lock (serializing tool calls against themselves). So a mux_config_write(config) could still interleave with Config.editConfig — which TaskService.create() uses to persist workspace entries — and clobber it. The shared resource is the on-disk file, so the lock belonged at the file path, owned by neither layer.

(The other unlocked writers, agent_skill_write and mux_agents_write, only read old content to render a diff; their bytes come entirely from tool args, so concurrent execution is indistinguishable from sequential last-writer-wins and needs no lock. file_read is read-only.)

Implementation

Revert #2906 in full:

  • Delete src/node/services/tools/withSequentialExecution.ts and its test.
  • Restore tools: finalTools in StreamManager.buildStreamRequestConfig (drop the wrapper import and call).
  • Remove the now-stale StreamManager - sequential tool execution test that asserted the wrapper was applied.

Close the create-on-missing race at the resource level:

  • Add withFileEditLock(runtime, resolvedPath, fn) in file_edit_operation.ts, exposing the same per-file MutexMap (keyed by Runtime) that executeFileEditOperation already uses. Intentionally non-re-entrant (documented).
  • Wrap file_edit_insert's existence-check + create-on-missing logic in it. If the file now exists, it releases and falls through to the normal guarded edit path; otherwise it creates under the lock.
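
The create-on-missing race and its fix can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: `withKeyLock`, the in-memory `files` map, and `insert` are hypothetical stand-ins for the per-file MutexMap, the runtime filesystem, and file_edit_insert, and the "append when the file exists" branch simplifies the real guarded edit path.

```typescript
// Hypothetical per-key async lock: callers sharing a key run one at a time, in FIFO order.
const tails = new Map<string, Promise<void>>();

async function withKeyLock<T>(key: string, fn: () => Promise<T>): Promise<T> {
  const prev = tails.get(key) ?? Promise.resolve();
  let release!: () => void;
  const gate = new Promise<void>((resolve) => (release = resolve));
  const tail = prev.then(() => gate);
  tails.set(key, tail);
  await prev; // wait for the previous holder to release
  try {
    return await fn();
  } finally {
    release();
    if (tails.get(key) === tail) tails.delete(key);
  }
}

const files = new Map<string, string>(); // stand-in for the filesystem
const tick = () => new Promise<void>((resolve) => setTimeout(resolve, 0));

// Create the file if it is missing, otherwise append to it.
async function insert(path: string, body: string, locked: boolean): Promise<void> {
  const step = async () => {
    const existing = files.get(path); // existence check
    await tick(); // simulated I/O gap between the check and the write
    files.set(path, existing === undefined ? body : existing + body);
  };
  await (locked ? withKeyLock(path, step) : step());
}

// Runs two concurrent inserts against the same missing path and returns the final content.
async function demo(locked: boolean): Promise<string> {
  files.clear();
  await Promise.all([
    insert("notes.txt", "one\n", locked),
    insert("notes.txt", "two\n", locked),
  ]);
  return files.get("notes.txt") ?? "";
}
```

With `locked` set to false both callers observe a missing file and the later write drops the earlier body; with the lock, the second caller sees the file the first one created.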

Close the background process-ID race without re-serializing spawns:

  • Reserve the chosen ID synchronously in generateUniqueProcessId (scan both the registered processes map and a new reservedIds set, then reserve before returning), so two concurrent allocations can never select the same ID across the spawn/migration awaits.
  • Release the reservation once the process is registered or the spawn/migration fails (spawn via try/finally, registerMigratedProcess inline, bash migrate-failure via a new releaseReservedId). Spawns stay fully parallel — no global mutex.
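
The reservation scheme above could look roughly like this. It is a sketch under assumptions: the `ProcessIdAllocator` class, the `-n` suffix format, and the timer standing in for spawnProcess + writeMeta are all hypothetical; only the names generateUniqueProcessId and reservedIds come from the description.

```typescript
// Hypothetical allocator illustrating synchronous ID reservation.
class ProcessIdAllocator {
  private readonly processes = new Map<string, { startedAt: number }>();
  private readonly reservedIds = new Set<string>();

  // Synchronous on purpose: there is no await between the collision scan and the reservation.
  generateUniqueProcessId(displayName: string): string {
    let id = displayName;
    for (let n = 1; this.processes.has(id) || this.reservedIds.has(id); n++) {
      id = `${displayName}-${n}`; // assumed suffix format
    }
    this.reservedIds.add(id);
    return id;
  }

  async spawn(displayName: string): Promise<string> {
    const id = this.generateUniqueProcessId(displayName);
    try {
      // Stands in for the awaits that happen before registration.
      await new Promise<void>((resolve) => setTimeout(resolve, 0));
      this.processes.set(id, { startedAt: Date.now() });
      return id;
    } finally {
      // Registered or failed: either way the reservation is no longer needed.
      this.reservedIds.delete(id);
    }
  }

  reservedCount(): number {
    return this.reservedIds.size;
  }
}

// Two concurrent spawns sharing a display name.
async function spawnPair(): Promise<{ ids: string[]; reserved: number }> {
  const allocator = new ProcessIdAllocator();
  const ids = await Promise.all([allocator.spawn("dev"), allocator.spawn("dev")]);
  return { ids, reserved: allocator.reservedCount() };
}
```

Because the reservation happens before the first await, the second allocation sees the first one's ID even though neither process is registered yet, and no global mutex is involved.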

Unify config-file writes under one per-file lock:

  • Add withConfigFileLock(absolutePath, fn) in src/node/utils/concurrency/configFileLocks.ts, a module-level MutexMap<string> keyed by the config file's absolute path. The on-disk file is the shared resource, so the lock lives at the path — not on the Config instance or in the tools layer.
  • Config.editConfig now acquires it (replacing the bespoke instance-scoped editConfigQueue, with identical FIFO semantics), and the config tool's withConfigDocumentLock delegates to it. Both layers already derive byte-identical absolute paths, so a mux_config_write and a Config.editConfig to the same file now serialize on one lock.
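
A path-keyed lock shared by two layers can be sketched like this. The body of `withConfigFileLock` below is an assumption (a promise-chain queue per path), and `editJsonFile`, `disk`, and the example path are hypothetical; the point is only that both writers pass the same absolute path and keep their whole read-modify-write inside the callback.

```typescript
const locks = new Map<string, Promise<void>>();

// Hypothetical stand-in for withConfigFileLock: one FIFO queue per absolute path.
function withConfigFileLock<T>(absolutePath: string, fn: () => Promise<T>): Promise<T> {
  const prev = locks.get(absolutePath) ?? Promise.resolve();
  const run = prev.then(fn);
  // The tail swallows errors so one failed writer does not block later ones.
  const tail: Promise<void> = run.then(() => {}, () => {});
  locks.set(absolutePath, tail);
  return run;
}

type Json = Record<string, unknown>;
const disk = new Map<string, string>(); // stand-in for the on-disk files
const pause = () => new Promise<void>((resolve) => setTimeout(resolve, 0));

// Read, transform, and write all happen inside the lock, so no writer uses a stale baseline.
async function editJsonFile(absolutePath: string, transform: (cfg: Json) => Json): Promise<void> {
  await withConfigFileLock(absolutePath, async () => {
    const current = JSON.parse(disk.get(absolutePath) ?? "{}") as Json;
    await pause(); // simulated I/O
    disk.set(absolutePath, JSON.stringify(transform(current)));
  });
}

// One call plays the Config.editConfig role, the other the config tool role.
async function demoConfigWrites(): Promise<Json> {
  const file = "/home/user/.mux/config.json"; // example path, not taken from the PR
  disk.clear();
  await Promise.all([
    editJsonFile(file, (cfg) => ({ ...cfg, workspace: "w1" })),
    editJsonFile(file, (cfg) => ({ ...cfg, theme: "dark" })),
  ]);
  return JSON.parse(disk.get(file) ?? "{}") as Json;
}
```

If the two writers used separate locks (or keyed the map by anything other than the file path), both could read `{}` and the later write would drop the earlier key.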

Extend the per-file lock to every providers.jsonc and secrets.json writer (Codex P2):

  • The lock only helps if the whole read-modify-write runs inside it, so making Config.saveProvidersConfig/saveSecretsConfig acquire the lock would be false safety — the read must be inside too. Instead add Config.withProvidersConfigLock(fn) and route all five ProviderService providers.jsonc mutations (addCustomOpenAICompatibleProvider, removeCustomProvider, setModels, setConfigValue, setConfig) through it. addCustom/setModels become async; their callers were updated.
  • Wrap Config.updateGlobalSecrets/updateProjectSecrets in withConfigFileLock(secretsFile) (the only secrets.json writers).
  • removeCustomProvider keeps its follow-up editConfig repair outside the providers lock. The remaining nesting (setConfig/setConfigValue calling syncGatewayLifecycle → editConfig while holding the providers lock) is one-directional: nothing acquires the providers lock while holding the config lock, since editConfig's transform is pure, so there is no deadlock cycle. The invariant is documented in configFileLocks.
  • The configFileLocks doc now lists the real acquirers per file and calls out the still-uncovered paths: direct Config.saveConfig callers outside editConfig (some project/workspace service flows) and CLI bootstrap writers, which are single-writer/startup paths rather than agent-parallel tool calls and are left as follow-up.

Each path has a regression test that fails when its guard is removed.

Recompute routePriority inside the config lock (Codex P2):

  • syncGatewayLifecycle read routePriority and computed the inserted/removed nextPriority from that snapshot outside editConfig's per-file lock, then wrote the precomputed value inside the lock. A concurrent routing edit (e.g. another mux_config_write(config)) that landed between the snapshot and the locked callback was silently clobbered.
  • The read + merge now happen inside the editConfig callback, based on the locked config value. A cheap outer snapshot is kept only as a fast-path to skip the write (and its change notification) in the steady state — frequent credential/token refreshes on an already-routed gateway — since editConfig always saves and notifies with no short-circuit.
  • A deterministic regression test injects a concurrent routePriority edit in the exact TOCTOU window (via a one-shot editConfig spy) and asserts both the concurrent route and the gateway insert survive; it fails when the merge reads the stale snapshot.
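
The difference between merging from a stale snapshot and merging inside the locked callback can be sketched as follows. Everything here is illustrative: `editConfig` is a hypothetical queue-backed stand-in, and the route names are made up.

```typescript
interface Config {
  routePriority: string[];
}

let stored: Config = { routePriority: ["direct"] };
let queue: Promise<void> = Promise.resolve();

// Hypothetical editConfig: serializes callbacks and hands each one the latest stored config.
function editConfig(transform: (cfg: Config) => Config): Promise<void> {
  queue = queue.then(() => {
    stored = transform(stored);
  });
  return queue;
}

// Racy shape: nextPriority is computed from a snapshot taken before the lock is held.
async function addGatewayStale(): Promise<void> {
  const nextPriority = [...stored.routePriority, "gateway"];
  await editConfig((cfg) => ({ ...cfg, routePriority: nextPriority }));
}

// Fixed shape: the outer snapshot only decides whether to write at all;
// the merge itself reads the config passed to the locked callback.
async function addGatewayLocked(): Promise<void> {
  if (stored.routePriority.includes("gateway")) return; // fast path, no write
  await editConfig((cfg) =>
    cfg.routePriority.includes("gateway")
      ? cfg
      : { ...cfg, routePriority: [...cfg.routePriority, "gateway"] },
  );
}

// Queues a concurrent routing edit first, so it lands between the snapshot and the callback.
async function race(addGateway: () => Promise<void>): Promise<string[]> {
  stored = { routePriority: ["direct"] };
  queue = Promise.resolve();
  const concurrent = editConfig((cfg) => ({
    ...cfg,
    routePriority: [...cfg.routePriority, "custom"],
  }));
  await Promise.all([concurrent, addGateway()]);
  return stored.routePriority;
}
```

In this sketch `race(addGatewayStale)` loses the concurrent "custom" route, while `race(addGatewayLocked)` keeps both entries.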

Risks

This returns to pre-#2906 scheduling, where sibling execute() handlers may overlap. The concurrency-sensitive resources are now individually guarded (see Background), so the realistic regression surface is narrow. The remaining interaction the deleted lock nominally covered — a shared-checkout task call (local runtime, or worktree/SSH isolation:"none") overlapping a mutating sibling during create()'s brief parent-checkout read — was unguarded before #2906, has no observed failure, and was only weakly covered by the mutex anyway (background sub-agents do their real work in separate streams outside the lock). If a concrete race is ever demonstrated there, the correct fix is a narrow lock around create()'s checkout-read critical section, not a blanket stream-level mutex.


Generated with mux • Model: anthropic:claude-opus-4-8 • Thinking: xhigh • Cost: $29.64

@ethanndickson

Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 79919ddffb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/node/services/tools/withSequentialExecution.ts Outdated
@ethanndickson

Member Author

Addressed in 5db9e90. The exemption is now per-call and runtime-aware instead of blanket.

A new pure helper taskCallSharesParentWorkspace(mode, isolation) decides, for each task call, whether it runs in the parent checkout (shared working tree) or forks into an isolated workspace:

  • local (or unknown runtime): forking is a no-op, so every task shares the parent checkout → serialized.
  • worktree/ssh: shares only when the model passes isolation: "none" → serialized; default/omitted/"fork" forks → exempt.
  • docker/devcontainer: always forks → exempt.

withSequentialExecution now acquires the lock for a task call only when taskCallSharesParentWorkspace is true; all other tools remain fully serialized. The runtime mode is threaded from the AIService send path into the stream-request builder and is undefined-safe (treated as shared → serialize) so any caller that doesn't supply it stays conservative.

Behavioral tests cover the matrix: worktree default/fork task calls overlap; worktree isolation:"none", local, and undefined-runtime task calls serialize; non-task tools always serialize.
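
The decision table above can be written as a small pure function. This is a sketch of the described matrix only: the `RuntimeMode` and `Isolation` type names are assumptions, and the real helper from 5db9e90 may differ in signature and detail.

```typescript
// Assumed type names; the runtime and isolation values are the ones listed in the comment.
type RuntimeMode = "local" | "worktree" | "ssh" | "docker" | "devcontainer";
type Isolation = "none" | "fork";

// Returns true when a task call runs in the parent checkout (shared working tree).
function taskCallSharesParentWorkspace(
  mode: RuntimeMode | undefined,
  isolation?: Isolation,
): boolean {
  switch (mode) {
    case "worktree":
    case "ssh":
      // Shares only when the model explicitly opts out of forking.
      return isolation === "none";
    case "docker":
    case "devcontainer":
      // Always forks.
      return false;
    default:
      // local, or an unknown/undefined runtime: stay conservative and treat as shared.
      return true;
  }
}
```

Treating an undefined mode as shared keeps any caller that does not supply the runtime on the serialized path.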

@ethanndickson

Member Author

@codex review

Please take another look. The blanket task exemption is now per-call and runtime-aware (only provably-forked task calls skip the serialization mutex; shared-checkout task calls and all other tools stay serialized).

@ethanndickson

Member Author

@codex review

Pushed a formatting-only fix (Prettier) on top of the runtime-aware change. No behavioral difference.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c4ade72cc8

Comment thread src/node/services/tools/withSequentialExecution.ts Outdated
@ethanndickson

Member Author

Addressed in 1505b69. You're right that a forked task still touches the parent checkout during create() (agent frontmatter + git branch --show-current in WorktreeManager.forkWorkspace), so it must not overlap a mutating sibling.

Replaced the blunt per-stream mutex with a per-stream read/write lock (AsyncRwLock), modeling the actual shared resource — the parent checkout:

  • Writers (exclusive): all non-task tools, plus shared-checkout task calls (local runtime, or worktree/ssh isolation:"none").
  • Readers (shared): forked task calls (worktree/ssh default/fork, docker/devcontainer).

So within a stream: forked task ‖ forked task overlap (both readers), but forked task ‖ bash / file-write are mutually exclusive (read‖write), in either start order, so there is no more concurrent access to the parent checkout. The classifier (needsSerialization → taskCallSharesParentWorkspace) is unchanged; it now just selects read vs write mode.

The RW lock is FIFO-fair (a queued writer blocks later readers from jumping ahead, so writers can't starve). Tests cover reader overlap, both-direction read/write exclusion, writer serialization, queued-writer fairness, and double-dispose safety, plus the task/bash interleaving matrix in withSequentialExecution.test.ts.

One intentional conservatism (documented inline): a foreground forked task holds the read lock across its whole execute chain incl. waitForAgentReport, so it blocks later writers for its duration — same scope as the original mutex, strictly better for task‖task. Upgrade path is to narrow the lock to create()'s checkout-read window if it ever matters.
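
A FIFO-fair read/write lock of the kind described above could look roughly like this. It is a sketch, not the AsyncRwLock from 1505b69: the `acquire(mode)` API that resolves to a release function, and the tool names in the demo, are assumptions.

```typescript
type Mode = "read" | "write";
interface Waiter {
  mode: Mode;
  grant: () => void;
}

class AsyncRwLock {
  private readers = 0;
  private writer = false;
  private readonly queue: Waiter[] = [];

  // Resolves to a release function once the lock is held in the requested mode.
  async acquire(mode: Mode): Promise<() => void> {
    if (this.queue.length > 0 || !this.canEnter(mode)) {
      // Queue behind existing waiters so a pending writer is never overtaken by later readers.
      await new Promise<void>((grant) => this.queue.push({ mode, grant }));
    } else {
      this.enter(mode);
    }
    let released = false;
    return () => {
      if (released) return; // releasing twice is a no-op
      released = true;
      if (mode === "read") this.readers--;
      else this.writer = false;
      this.drain();
    };
  }

  private canEnter(mode: Mode): boolean {
    return mode === "read" ? !this.writer : !this.writer && this.readers === 0;
  }

  private enter(mode: Mode): void {
    if (mode === "read") this.readers++;
    else this.writer = true;
  }

  // Admit waiters strictly from the head of the queue while they are compatible.
  private drain(): void {
    for (
      let head = this.queue[0];
      head !== undefined && this.canEnter(head.mode);
      head = this.queue[0]
    ) {
      this.queue.shift();
      this.enter(head.mode); // update state before waking the waiter
      head.grant();
    }
  }
}

// Two readers overlap, a writer waits for both, and a later reader waits behind the writer.
async function demoRwLock(): Promise<string[]> {
  const lock = new AsyncRwLock();
  const log: string[] = [];
  const run = async (name: string, mode: Mode) => {
    const release = await lock.acquire(mode);
    log.push(`${name}:start`);
    await new Promise<void>((resolve) => setTimeout(resolve, 5));
    log.push(`${name}:end`);
    release();
  };
  await Promise.all([
    run("task1", "read"),
    run("task2", "read"),
    run("bash", "write"),
    run("task3", "read"),
  ]);
  return log;
}
```

The fairness rule lives in `acquire`: a new reader queues whenever anyone is already waiting, even if the lock is currently held only by readers, which is what keeps a queued writer from starving.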

@ethanndickson

Member Author

@codex review

Replaced the per-stream mutex with a read/write lock so forked task calls overlap each other (readers) but stay mutually exclusive with mutating tools (writers). Please take another look.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1505b695aa

Comment thread src/node/services/tools/withSequentialExecution.ts Outdated
#2906 wrapped every tool's execute() in a per-stream AsyncMutex to guard
against sibling tool-call races. It had no linked issue or repro, and its
own plan notes the real race (bash background output) was already fixed at
the runtime boundary via backgroundProcessManager's outputLock. The blanket
mutex serialized all foreground sibling tool calls — most visibly parallel
'task' calls, where the lock is held across waitForAgentReport — defeating
provider-level parallel tool planning.

The genuinely-shared resources are already guarded at the resource level
(bash outputLock, TaskService.create()'s own mutex, config FIFO queue;
file_read is read-only), so remove the redundant stream-level lock instead
of making it concurrency-aware.
@ethanndickson ethanndickson changed the title from "🤖 fix: exempt task from stream tool mutex" to "🤖 fix: restore parallel sibling tool execution (revert #2906)" on Jun 17, 2026
@ethanndickson

Member Author

Re: Codex P2 on withSequentialExecution.ts (thread PRRT_kwDOPxxmWM6KF5Iu, "Do not exempt arbitrary tools named task") — this is now moot. Rather than special-casing the task tool name (or any per-call/runtime-aware lock), this PR reverts #2906 entirely: withSequentialExecution.ts is deleted and StreamManager passes the tool map straight through. There is no longer any name-based check or stream-level lock that an MCP/extra tool named task could exploit. Resolving the thread.

@ethanndickson

Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7c44e031bf

Comment thread src/node/services/streamManager.ts
Removing the per-stream mutex (prior commit) restores parallel sibling
tool execution but exposes a pre-existing race the mutex was masking:
file_edit_insert's create-on-missing fast path does fileExists -> write
OUTSIDE the per-file MutexMap that guards the normal edit pipeline. Two
sibling inserts to the same not-yet-existing path both observe 'missing',
both create, and the last write silently drops the other insert while both
report success.

Route the existence-check + create through the same per-file lock (new
withFileEditLock helper). The second insert then sees the file exists and
merges through the guarded edit path instead of clobbering the first.

Audited the other two writers that mutate outside the lock
(agent_skill_write, mux_agents_write): both are full-content overwrites
whose disk read is diff-only, so concurrent same-path writes resolve
identically to sequential last-writer-wins (no silent merge-loss). Left
them unchanged.

Regression test asserts two concurrent creates of the same missing file
yield exactly one success with the winning content intact; verified it
fails (2 successes) when the lock is bypassed.
@ethanndickson

Member Author

@codex review

Addressed the create-on-missing race (P2) in 546db28: file_edit_insert now runs its existence-check + create branch under the per-file lock (withFileEditLock) instead of writing outside it, with a regression test that fails when the lock is removed. Also updated the PR body to cover both the revert and this fix.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 546db280a3

Comment thread src/node/services/streamManager.ts
…lock

Concurrent sibling mux_config_write calls to the same config file could each
read the same baseline, apply disjoint mutations, and have the later atomic
write silently drop the earlier one (a lost update). The reverted per-stream
mutex used to mask this; now that parallel sibling tools are restored,
mux_config_write owns its own concurrency safety.

Add withConfigDocumentLock (a per-config-document MutexMap in configReadWrite.ts,
keyed by resolved path) and wrap the read -> applyMutations -> write cycle in it,
mirroring withFileEditLock for file edits. Regression test fires two concurrent
disjoint mutations and asserts both survive.
@ethanndickson

Member Author

@codex review

Addressed the mux_config_write read-modify-write race (P2) in a5f6c39: the read → applyMutations → write cycle now runs under a per-config-document lock (withConfigDocumentLock), mirroring the file_edit fix, with a regression test that fails when the lock is removed. PR body updated to cover all three changes.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a5f6c39811

Comment thread src/node/services/streamManager.ts
Concurrent sibling bash calls (now that parallel tool execution is restored)
could both enter BackgroundProcessManager.spawn, both pass the collision check
in generateUniqueProcessId (which only scanned the processes map), and both
pick the same processId/output directory because registration happens only
after spawnProcess + writeMeta await. The second processes.set then overwrote
the first, orphaning a process and pointing bash_output/termination at the
wrong target.

Reserve the chosen ID synchronously in generateUniqueProcessId (scan both the
registered processes and a new reservedIds set, then add to it before
returning), so two concurrent allocations can never select the same ID across
the awaits. Release the reservation once the process is registered or the
spawn/migration fails (spawn via try/finally, registerMigratedProcess inline,
and the bash migrate-failure path via releaseReservedId). Keeps spawns fully
parallel rather than re-serializing them with a global mutex.

Unit test asserts concurrent allocations stay unique; it fails when the
reservation check is removed.
@ethanndickson

Member Author

@codex review

Addressed the background process-ID allocation race (P2) in 591c134: generateUniqueProcessId now reserves the chosen ID synchronously (scanning a reservedIds set in addition to the registered map) so concurrent spawns can't collide across the spawn/migration awaits, with the reservation released on registration/failure. Keeps spawns parallel — no global mutex. PR body updated to cover all four changes.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 591c134400

Comment thread src/node/services/tools/mux_config_write.ts
mux_config_write's lock only coordinated other mux_config_write calls, so a
parallel mux_config_write(config) could still interleave with Config.editConfig
(used by TaskService.create() to persist workspace entries to config.json) and
clobber it. config.json had two independent serializers — Config.editConfigQueue
(instance-scoped) and the tool's per-path MutexMap — that didn't coordinate.

The on-disk file is the shared resource, so the lock belongs at the file path,
not the Config instance or the tools layer. Add withConfigFileLock (a module-level
MutexMap<string> keyed by absolute config-file path) and have both writers acquire
it: Config.editConfig now uses it (replacing the bespoke editConfigQueue, with
identical FIFO semantics) and withConfigDocumentLock delegates to it. Both layers
already derive byte-identical absolute paths, so they serialize on one lock.

Test asserts the tool side and a withConfigFileLock holder serialize on the same
file (deterministic ordering); it fails if they use separate locks. Existing
editConfig serialization tests still pass, confirming the queue replacement.
@ethanndickson

Member Author

@codex review

Addressed the cross-layer config race (P2) in edd89a3: config.json/providers.jsonc writes now serialize on a single per-file lock (withConfigFileLock, keyed by absolute path) shared by both Config.editConfig (replacing the instance-scoped editConfigQueue) and the config tool's withConfigDocumentLock. So a mux_config_write can no longer interleave with Config.editConfig on the same file. Existing editConfig serialization tests still pass; new test fails if the two layers use separate locks. PR body updated.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: edd89a3aa4

Comment thread src/node/utils/concurrency/configFileLocks.ts Outdated
…ared per-file lock

Codex P2: withConfigFileLock's guarantee over-claimed coverage for providers.jsonc
and secrets.json. Config.saveProvidersConfig (via ProviderService) and the secrets
update methods wrote those files without acquiring the lock, so an agent
mux_config_write(providers) racing a UI provider edit (or two concurrent secret
writes) could read the same baseline and clobber each other.

Make the guarantee true at the read-modify-write callsite level:
- Add Config.withProvidersConfigLock; wrap all five ProviderService providers.jsonc
  mutations (addCustomOpenAICompatibleProvider, removeCustomProvider, setModels,
  setConfigValue, setConfig) so their load->mutate->save runs under the shared lock.
  addCustom/setModels become async; callers updated.
- Wrap Config.updateGlobalSecrets / updateProjectSecrets in withConfigFileLock(secretsFile).
- removeCustomProvider keeps its editConfig repair OUTSIDE the providers lock; the
  remaining nesting (syncGatewayLifecycle/editConfig under the providers lock) is
  one-directional and deadlock-free, documented in configFileLocks.
- Rewrite the configFileLocks doc to list real acquirers per file, the lock-ordering
  invariant, and the still-uncovered direct saveConfig/CLI bootstrap paths.

Regression tests (with neuter proof) for both the cross-layer providers lock and
concurrent secret writes.
@ethanndickson

Member Author

@codex review

Addressed the providers/secrets locking P2: ProviderService providers.jsonc mutations and Config.updateGlobalSecrets/updateProjectSecrets now run their read-modify-write under the shared withConfigFileLock, with regression tests and a documented one-directional lock-ordering invariant.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 61ed07a05e

Comment thread src/node/services/providerService.ts
…ifecycle

syncGatewayLifecycle read routePriority and computed the inserted/removed
nextPriority from that snapshot OUTSIDE editConfig's per-file lock, then wrote
the precomputed value inside the lock. A concurrent routing edit (e.g. another
mux_config_write(config)) that landed between the snapshot and the locked
callback was silently clobbered (Codex P2).

Move the routePriority read + merge into the editConfig callback so it is based
on the locked config value. A cheap outer snapshot is kept only as a fast-path
to skip the write (and its change notification) in the steady state — frequent
credential/token refreshes on an already-routed gateway — while the authoritative
merge happens under the lock.

Adds a deterministic regression test that injects a concurrent routePriority
edit in the exact TOCTOU window and asserts both the concurrent route and the
gateway insert survive.
@ethanndickson

Member Author

@codex review

Addressed the routePriority TOCTOU (P2) in 037f3ae: syncGatewayLifecycle now performs the routePriority read + insert/remove merge inside editConfig's locked callback (based on the locked config), so a concurrent mux_config_write(config) / routing edit can no longer be clobbered by a nextPriority computed from a stale snapshot. A cheap outer snapshot is kept only as a fast-path to skip the redundant write/notification in the steady state. Added a deterministic regression test that injects a concurrent edit in the exact TOCTOU window and asserts both routes survive.

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Another round soon, please!

Reviewed commit: 037f3ae320

@ethanndickson

Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 037f3ae320

Comment on lines +57 to +60
return await withConfigDocumentLock<MuxConfigWriteToolResult>(
  muxHome,
  args.file,
  async () => {

P2: Honor aborts before queued config writes mutate

When two mux_config_write calls target the same config file, this new lock queues the later call inside execute, but the tool explicitly ignores the provided abort signal. If the user interrupts the stream while that later write is waiting for the lock, it will still enter this callback and apply its mutation after cancellation; before removing withSequentialExecution, an aborted queued sibling was rejected before any side effect. Please re-check or pass through the abort signal before running the locked read/modify/write so interrupted turns cannot apply stale config changes.
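
One way to honor the abort is to re-check the signal after the lock is acquired and before any side effect. The sketch below assumes the tool is handed an AbortSignal; `withLock`, `writeConfig`, and the in-memory `config` object are hypothetical and do not reflect the actual mux_config_write code.

```typescript
const queues = new Map<string, Promise<void>>();

// Hypothetical per-key FIFO lock.
function withLock<T>(key: string, fn: () => Promise<T>): Promise<T> {
  const prev = queues.get(key) ?? Promise.resolve();
  const run = prev.then(fn);
  queues.set(key, run.then(() => {}, () => {}));
  return run;
}

type WriteResult = "written" | "aborted";
const config: Record<string, string> = {};

// The signal is checked inside the lock, so a call cancelled while queued never mutates.
function writeConfig(key: string, value: string, signal: AbortSignal): Promise<WriteResult> {
  return withLock<WriteResult>("config.json", async () => {
    if (signal.aborted) return "aborted";
    await new Promise<void>((resolve) => setTimeout(resolve, 5)); // simulated read-modify-write
    config[key] = value;
    return "written";
  });
}

// The second call is queued behind the first and is cancelled while it waits.
async function demoAbort(): Promise<{ results: WriteResult[]; keys: string[] }> {
  const running = new AbortController(); // never aborted
  const queued = new AbortController();
  // Separate signals keep the demo deterministic; a real stream would likely share one.
  const first = writeConfig("a", "1", running.signal);
  const second = writeConfig("b", "2", queued.signal);
  queued.abort();
  const results = await Promise.all([first, second]);
  return { results, keys: Object.keys(config) };
}
```

This sketch only checks the signal on entry to the locked section; whether a write that is already in progress should also be interruptible is a separate design question.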
