🤖 feat: add first-class nested workflows #3565
@codex review Please review the nested workflow implementation and the deep-review follow-up fixes. |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9bbeb36be0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review Please review the latest push as well. I rebased onto the latest main, fixed the sealed-history compaction epoch regression that was failing Unit CI, and reran targeted tests plus static-check locally. |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 581265eeab
Implement first-class nested workflow starts via the built-in action.workflows.start primitive, including deterministic child run IDs, parent/child metadata, run-store idempotency, nested workflow UI rows, tool discovery filtering, docs, and regression coverage.
---
_Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `xhigh` • Cost: `1754016{MUX_COSTS_USD:-unknown}`_
<!-- mux-attribution: model=openai:gpt-5.5 thinking=xhigh costs=84.62 -->
Address deep-review findings for nested workflow runs: re-check current project trust, finalize failed child steps/events, avoid interrupt/create orphans, tolerate child lease contention, recover partial child run directories, expose runner-coordinated workflows.start for runtime-backed workspaces, repair missing terminal events, and document nested discovery semantics.
Validation:
- bun test src/node/services/workflows/WorkflowService.test.ts src/node/services/workflows/WorkflowRunStore.test.ts src/node/services/workflows/WorkflowActionRegistry.test.ts src/browser/features/Tools/WorkflowRunToolCall.test.tsx src/common/utils/workflowRunMessages.test.ts --timeout 20000
- make typecheck
- MUX_ESLINT_CONCURRENCY=2 make static-check
---
_Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `xhigh` • Cost: `2628699{MUX_COSTS_USD:-unknown}`_
<!-- mux-attribution: model=openai:gpt-5.5 thinking=xhigh costs=84.62 -->
Preserve explicit null args for nested workflow starts and keep completed child workflow reports out of post-compaction top-level report indexes.
Validation:
- bun test src/node/services/workflows/WorkflowService.test.ts src/node/services/attachmentService.completedReports.test.ts --timeout 20000
- MUX_ESLINT_CONCURRENCY=2 make static-check
---
_Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `xhigh` • Cost: `2874038{MUX_COSTS_USD:-unknown}`_
<!-- mux-attribution: model=openai:gpt-5.5 thinking=xhigh costs=84.62 -->
Address Codex follow-up feedback by snapshotting child runs with the parent run default action cwd and listing runner-coordinated workflows.start actions in runtime-backed workspaces.
Validation:
- bun test src/node/services/workflows/WorkflowService.test.ts src/node/services/workflows/WorkflowActionRegistry.test.ts --timeout 20000
- MUX_ESLINT_CONCURRENCY=2 make static-check
---
_Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `xhigh` • Cost: `3783076{MUX_COSTS_USD:-unknown}`_
<!-- mux-attribution: model=openai:gpt-5.5 thinking=xhigh costs=84.62 -->
|
@codex review Please take another look after the latest rebase onto main. The previous follow-up fixes are still present, and local targeted validation plus static-check pass on the rebased branch. |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 195272ec8a
Register nested child workflow runners with the active abort-controller map once their lease is acquired so direct interrupts by child run ID abort the child runtime in addition to marking status.
Validation:
- bun test src/node/services/workflows/WorkflowService.test.ts --timeout 20000
- MUX_ESLINT_CONCURRENCY=2 make static-check
---
_Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `xhigh` • Cost: `3989322{MUX_COSTS_USD:-unknown}`_
<!-- mux-attribution: model=openai:gpt-5.5 thinking=xhigh costs=84.62 -->
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cc0a91ade4
Include already-interrupted nested children in parent interrupt cascades so workflow-owned child task adapters are still interrupted after the child run status was marked interrupted first.
Validation:
- bun test src/node/services/workflows/WorkflowService.test.ts --timeout 20000
- MUX_ESLINT_CONCURRENCY=2 make static-check
---
_Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `xhigh` • Cost: `4091238{MUX_COSTS_USD:-unknown}`_
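The cascade described in this commit can be sketched roughly as follows. This is a minimal in-memory illustration, not the `WorkflowService` code: `RunNode`, `cascadeInterrupt`, and the `interruptTaskAdapter` callback are hypothetical names, and only the status values and the "include already-interrupted children" rule come from the PR text.

```typescript
// Status values as listed in the implementation plan.
type RunStatus = "pending" | "running" | "backgrounded" | "interrupted" | "completed" | "failed";

// Hypothetical, flattened view of a workflow run for this sketch.
interface RunNode {
  id: string;
  status: RunStatus;
  parentRunId?: string;
}

// Recursively interrupt every non-terminal descendant of parentId.
// Children that are already marked "interrupted" are still visited so that
// their task adapters are interrupted too.
function cascadeInterrupt(
  runs: Map<string, RunNode>,
  parentId: string,
  interruptTaskAdapter: (runId: string) => void,
  visited: Set<string> = new Set<string>()
): void {
  if (visited.has(parentId)) return; // guard against malformed parent links
  visited.add(parentId);
  for (const run of runs.values()) {
    if (run.parentRunId !== parentId) continue;
    if (run.status === "completed" || run.status === "failed") continue; // terminal
    run.status = "interrupted";
    interruptTaskAdapter(run.id);
    cascadeInterrupt(runs, run.id, interruptTaskAdapter, visited);
  }
}
```

Skipping only `completed` and `failed` children, rather than everything that is not `running`, is what keeps a child whose status was flipped first from being left with live task adapters.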
<!-- mux-attribution: model=openai:gpt-5.5 thinking=xhigh costs=84.62 -->
|
Codex Review: Didn't find any major issues. Bravo. |
#3565 reintroduced a hardcoded startsWith("wfr_") in WorkflowRunner's nested-step drift detection. taskId.ts already exports the canonical WORKFLOW_RUN_TASK_ID_PREFIX and isWorkflowRunTaskId() predicate as the single source of truth for that prefix. Use the helper instead of the duplicated literal; behavior is identical (both are falsy for a missing taskId inside the find() predicate).
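A minimal sketch of the change, assuming the helper has roughly this shape. Only `WORKFLOW_RUN_TASK_ID_PREFIX`, `isWorkflowRunTaskId()`, and the `wfr_` prefix come from the note above; `StepRecord` and `findPriorChildWorkflowAttempt` are hypothetical names used for illustration.

```typescript
// Assumed shape of the shared helper in taskId.ts.
const WORKFLOW_RUN_TASK_ID_PREFIX = "wfr_";

function isWorkflowRunTaskId(taskId: string | undefined | null): taskId is string {
  return typeof taskId === "string" && taskId.startsWith(WORKFLOW_RUN_TASK_ID_PREFIX);
}

// Hypothetical step record, reduced to the fields that matter here.
interface StepRecord {
  stepId: string;
  taskId?: string;
}

// Drift detection can call the predicate inside find() instead of repeating
// the literal prefix. A missing taskId is falsy either way.
function findPriorChildWorkflowAttempt(steps: StepRecord[], stepId: string): StepRecord | undefined {
  return steps.find((step) => step.stepId === stepId && isWorkflowRunTaskId(step.taskId));
}
```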
Summary
Adds first-class nested workflow execution through the built-in `action.workflows.start` primitive. Parent workflow steps now deterministically map to child workflow runs, replay/resume attaches to the same child run, child failure/interruption/backgrounding is reflected on the parent, and nested runs stay out of top-level task discovery while remaining explicitly awaitable by run ID.
Background
Workflow authors needed durable conductor recursion without flattening child workflow internals into the parent journal. This implements nested workflows as normal child workflow runs linked by parent metadata, preserving each child run's own step namespace and event journal.
Implementation
- `workflows.start` to runner-coordinated logic while preserving project/global overrides and allowing the runner-safe built-in in runtime-backed workspaces.
- `WorkflowService`, including current trust re-checks, child lease contention handling, background continuation, child failure finalization, cascade interruption, and replay repair for missing terminal workflow events.
- `deep-review-workflow` and addressed its verified findings before PR creation.
Validation
- `bun test src/node/services/workflows/WorkflowService.test.ts src/node/services/workflows/WorkflowRunStore.test.ts src/node/services/workflows/WorkflowActionRegistry.test.ts src/browser/features/Tools/WorkflowRunToolCall.test.tsx src/common/utils/workflowRunMessages.test.ts --timeout 20000`
- `make typecheck`
- `MUX_ESLINT_CONCURRENCY=2 make static-check`
- `dogfood-output/nested-workflows/` locally, including desktop/mobile screenshots and workflow-card videos.
Dogfooding evidence
Desktop nested workflow card:
Mobile-width nested workflow card:
Risks
This touches the durable workflow runner, run-store persistence, task discovery, and workflow cards, so regression risk is medium-high. The main mitigations are deterministic child IDs, idempotent creation, trust gates, cascade interruption coverage, replay repair, targeted workflow tests, static checks, and manual dogfooding.
Pains
Deep review surfaced several lifecycle edge cases after the initial implementation: stale child leases, trust revocation, child failure finalization, interrupt/create races, and partial child-run directories. Those are addressed in the final patch set.
📋 Implementation Plan
Implementation Plan: First-Class Nested Workflows via `action.workflows.start`
Goal
Add first-class nested workflow execution so a workflow can start another workflow with transparent recursion semantics:
The parent call should behave like any other durable workflow step: replay/resume must reuse the same child run, avoid duplicates, preserve child workflow step IDs locally, and allow nested/nested-nested workflows without child workflow definitions knowing they are nested.
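To make the "reuse the same child run, avoid duplicates" requirement concrete, here is a minimal in-memory sketch of idempotent child creation. It is an illustration under stated assumptions, not the `WorkflowRunStore` implementation: `InMemoryRunStore`, `ChildRunRecord`, and the `stepId` field are hypothetical, while `createRunIfAbsent` and `parentWorkflow.runId` are names that appear later in this plan.

```typescript
// Hypothetical parent linkage; only runId is named in the plan.
interface ParentWorkflowLink {
  runId: string; // parent run ID
  stepId: string; // parent step that started the child (assumed field)
}

interface ChildRunRecord {
  id: string;
  name: string;
  status: "pending" | "running" | "completed" | "failed";
  parentWorkflow: ParentWorkflowLink;
}

class InMemoryRunStore {
  private readonly runs = new Map<string, ChildRunRecord>();

  // Create the child run only if it does not exist yet. On replay the existing
  // record is returned untouched; a mismatch in name or parent linkage is
  // treated as identity drift and rejected instead of overwriting.
  createRunIfAbsent(input: Omit<ChildRunRecord, "status">): ChildRunRecord {
    const existing = this.runs.get(input.id);
    if (existing === undefined) {
      const created: ChildRunRecord = { ...input, status: "pending" };
      this.runs.set(input.id, created);
      return created;
    }
    const sameIdentity =
      existing.name === input.name &&
      existing.parentWorkflow.runId === input.parentWorkflow.runId &&
      existing.parentWorkflow.stepId === input.parentWorkflow.stepId;
    if (!sameIdentity) {
      throw new Error(`Child run ${input.id} already exists with a different identity`);
    }
    return existing;
  }
}
```

A durable store would also need the check-and-write to be atomic on disk; the sketch only shows the contract the parent step relies on.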
Recommended approach and LoC estimate
Recommended approach: first-class runner primitive surfaced through the existing `action.*` syntax.
- `action.workflows.start`, but the built-in `workflows.start` action should route to runner-owned logic when the resolved action is the built-in one.
Alternatives considered
- `workspaceHostActions.ts` host action: lower initial LoC (~600–900) but leaks workflow-runner concerns into host action context and cannot naturally throw/handle foreground backgrounding. Not recommended for transparent recursion.
- `workflow_run`: already possible indirectly if the child agent has tools, but it is not deterministic conductor recursion and should not be treated as the feature.
Evidence from repo investigation
- `WorkflowRunner` registers workflow globals in `src/node/services/workflows/WorkflowRunner.ts` (`__workflowAgent`, `__workflowAction`, `__workflowApplyPatch`, `__workflowParallelAgents`) and compiles `action.*` calls through `__muxCreateWorkflowActionProxy`.
- `WorkflowService` (`src/node/services/workflows/WorkflowService.ts`) owns named run creation, foreground/background run choreography, leases, abort-to-interrupt wiring, and crash recovery resume behavior.
- `WorkflowRunStore` (`src/node/services/workflows/WorkflowRunStore.ts`) persists run records plus append-only `events.jsonl` / `steps.jsonl`; `WorkflowStepRecord.taskId` already stores a durable handle for child work.
- `workflowReplayKey.ts` hashes stable step IDs with canonical JSON input; nested workflow starts should use the same replay identity discipline.
- `WorkflowActionRegistry`, described by `builtInWorkflowActions.ts`, and executed by `WorkflowActionRunner`; project/global actions take precedence over built-ins.
- `src/browser/features/Tools/WorkflowRunToolCall.tsx`, with chat-card projection helpers in `src/common/utils/workflowRunMessages.ts` and discovery in `src/browser/features/ChatInput/index.tsx`.
- `workflow_run` / `workflow_resume` tools and `task_await` already understand workflow run IDs; omitted waits intentionally exclude workflow-owned child agents, so nested child workflow runs should similarly avoid cluttering top-level waits while remaining explicitly addressable by ID.
Non-goals for V1
- `parallelWorkflows` can be designed later if needed.
Phase 1: Add durable parent/child run linkage
Files/symbols
- `src/common/orpc/schemas/workflow.ts`
- `src/common/types/workflow.ts`
- `src/node/services/workflows/WorkflowRunStore.ts`
- `src/node/services/workflows/WorkflowRunStore.test.ts`
- `WorkflowService.ts` / `workflowReplayKey.ts` for child run IDs.
Changes
Extend `WorkflowRunRecordSchema` with optional parent linkage:
This keeps existing run records compatible and gives UI/tool code a reliable way to distinguish top-level runs from child runs.
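A minimal sketch of what that linkage could look like, written as plain TypeScript types rather than the repo's actual schema definitions. Only `parentWorkflow.runId` and the helper name `isNestedWorkflowRun` appear in this plan; the `stepId` field and `listTopLevelRuns` are assumptions made for illustration.

```typescript
// Hypothetical reference from a child run back to its parent.
interface ParentWorkflowRef {
  runId: string; // parent workflow run ID
  stepId: string; // parent step that started this child (assumed field)
}

// Reduced run record: parentWorkflow is optional, so records written before
// this change still type-check and are treated as top-level runs.
interface WorkflowRunRecord {
  id: string;
  name: string;
  parentWorkflow?: ParentWorkflowRef;
}

function isNestedWorkflowRun(run: WorkflowRunRecord): boolean {
  return run.parentWorkflow !== undefined;
}

// Default discovery keeps only top-level runs; child runs stay reachable by ID.
function listTopLevelRuns(runs: WorkflowRunRecord[]): WorkflowRunRecord[] {
  return runs.filter((run) => !isNestedWorkflowRun(run));
}
```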
Add a deterministic child run ID helper:
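One way such a helper could look, as a sketch under stated assumptions: the function name `deriveChildWorkflowRunId`, the 24-character truncation, and the choice of SHA-256 over `(parentRunId, stepId)` are all hypothetical, and the `wfr_` prefix is assumed to match the workflow run ID prefix used elsewhere in this PR.

```typescript
import { createHash } from "node:crypto";

// Assumed to match the workflow run ID prefix.
const CHILD_RUN_ID_PREFIX = "wfr_";

// Derive a stable child run ID from the parent run and the parent step ID.
// Hashing a JSON array avoids ambiguity between ("ab", "c") and ("a", "bc"),
// and the hex digest keeps workflow-authored strings out of the ID.
function deriveChildWorkflowRunId(parentRunId: string, stepId: string): string {
  const digest = createHash("sha256")
    .update(JSON.stringify([parentRunId, stepId]))
    .digest("hex");
  return `${CHILD_RUN_ID_PREFIX}${digest.slice(0, 24)}`;
}
```

Because the ID depends only on durable inputs, a replayed parent step computes the same child ID before touching the store.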
Use a hash, not raw step IDs, to satisfy `WorkflowRunIdSchema` length/character constraints and avoid leaking arbitrary workflow-authored strings.
Add `CreateWorkflowRunInput.parentWorkflow?` and persist it in `WorkflowRunStore.createRun`.
Add an idempotent creation path such as
`WorkflowRunStore.createRunIfAbsent(input)` or `WorkflowService.ensureChildWorkflowRun(...)` that:
- `createRun()` write path;
- `parentWorkflow` linkage match;
Add defensive assertions:
- `WorkflowRunIdSchema`;
- `MAX_NESTED_WORKFLOW_DEPTH = 8`.
Quality gate after Phase 1
- `createRunIfAbsent` does not overwrite an existing child run and rejects identity drift.
- `bun test src/common/orpc/schemas/workflow.test.ts src/node/services/workflows/WorkflowRunStore.test.ts`
Phase 2: Implement the nested workflow runtime primitive
Files/symbols
- `src/node/services/workflows/WorkflowRunner.ts`
- `src/node/services/workflows/WorkflowService.ts`
- `src/node/services/workflows/builtInWorkflowActions.ts`
- `src/node/services/workflows/WorkflowRunner.test.ts`
- `src/node/services/workflows/WorkflowService.test.ts`
- `src/node/services/workflows/builtInWorkflowDefinitions.test.ts` or action registry tests as needed.
API shape
Support the normal action-call forms while requiring stable parent step IDs:
V1 should reject `wait: false` / `runInBackground: true` if provided, with a clear message that fire-and-forget nested workflows are deferred. This preserves deterministic parent result composition.
Runtime design
Add a built-in action descriptor for `workflows.start` so `workflow_action_list` / registry discovery can expose metadata. Keep project/global action precedence intact:
- `WorkflowActionRegistry.resolveAction("workflows.start")` returns project/global scope, execute it as a normal action;
- `workflows.start`, route to the nested workflow primitive instead of `WorkflowActionRunner.execute`.
Add parser/validator for the nested start spec:
Normalize absent `args` to `{}` for replay identity. Assert JSON-only/canonical input using the existing replay-key canonicalization rules.
In `WorkflowRunner.runActionStep`, after resolving the action and before generic action execution, route the built-in `workflows.start` to `runNestedWorkflowStep(...)`.
`runNestedWorkflowStep(...)` should mirror `agent` / `applyPatch` replay discipline:
- compute `inputHash = hashWorkflowStepInput(spec.id, { primitive: "workflows.start", name, args })`;
- if a completed step with the same hash exists, emit/repair a terminal child workflow row if needed and return the cached child result;
- if the same `stepId` has any prior child workflow attempt with a different input hash, fail safely instead of creating another child run under the same durable boundary;
- derive the deterministic child `runId` before any child mutation;
- record parent step `started` with `taskId: childRunId` before child creation, so replay can safely create the child if a crash happens before creation or attach to it if a crash happens after creation;
- append a new parent `WorkflowRunEvent` variant for child workflow lifecycle, rather than overloading task events; these lifecycle events should be append-if-missing / replay-repairable so replay does not duplicate child status rows:
- wait for child terminal result;
- on child completed, record parent step completed with `taskId: childRunId` and `result` equal to the child workflow result plus structured metadata `{ runId, status, name }`;
- on child failed, record parent step failed with child error and fail the parent workflow;
- on parent abort/interruption, stop waiting and allow cascade interruption to handle the child.
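The first three steps above (hash, cached return, drift rejection) can be sketched as a pure decision function. This is an illustration only: `canonicalJson`, `hashStepInput`, `NestedStepRecord`, and `decideNestedStepReplay` are hypothetical stand-ins, and the real `hashWorkflowStepInput` in `workflowReplayKey.ts` may canonicalize and hash differently.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with sorted object keys so equivalent inputs hash identically.
function canonicalJson(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  const entries = Object.entries(value).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return `{${entries.map(([k, v]) => `${JSON.stringify(k)}:${canonicalJson(v)}`).join(",")}}`;
}

// Stand-in for hashWorkflowStepInput: hash of the step ID plus canonical input.
function hashStepInput(stepId: string, input: Json): string {
  return createHash("sha256").update(canonicalJson([stepId, input])).digest("hex");
}

// Hypothetical, reduced step record.
interface NestedStepRecord {
  stepId: string;
  inputHash: string;
  status: "started" | "completed" | "failed";
  result?: unknown;
}

type ReplayDecision =
  | { kind: "return-cached"; result: unknown }
  | { kind: "fail-drift"; reason: string }
  | { kind: "attach-or-create" };

function decideNestedStepReplay(
  steps: NestedStepRecord[],
  stepId: string,
  name: string,
  args: Json
): ReplayDecision {
  const inputHash = hashStepInput(stepId, { primitive: "workflows.start", name, args });
  const prior = steps.filter((step) => step.stepId === stepId);
  const completed = prior.find((step) => step.status === "completed" && step.inputHash === inputHash);
  if (completed) return { kind: "return-cached", result: completed.result };
  if (prior.some((step) => step.inputHash !== inputHash)) {
    return { kind: "fail-drift", reason: `step ${stepId} was previously started with different input` };
  }
  // No prior attempt, or an unfinished attempt with the same input.
  return { kind: "attach-or-create" };
}
```

Keeping the decision pure makes the replay rules easy to cover with unit tests before wiring in child creation and waiting.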
Add a required `WorkflowChildRunAdapter`-style interface to `WorkflowRunner` options instead of importing or constructing `WorkflowService` inside `WorkflowRunner`:
`WorkflowService.createRunner` passes an implementation bound to the same workspace/run store/definition store.
Add `WorkflowService` child-run helpers:
- `ensureChildWorkflowRun(...)`: read the child definition, create child run with deterministic ID and `parentWorkflow` metadata if absent.
- `runChildWorkflowToTerminal(...)`: run/resume/wait the child according to current status.
  - `pending`: run it.
  - `running` / `backgrounded`: wait/poll the existing run; resume crash-orphaned runs only through existing lease-safe mechanisms.
  - `interrupted`: when the parent is explicitly resumed, resume the child.
  - `completed`: return final result.
  - `failed`: return failed status; parent step fails by default.
Preserve backgrounding semantics:
- `backgrounded`.
- `backgroundOnMessageQueued: false` until the child terminal status.
- `backgrounded` status as success.
Add cascade interruption:
- `WorkflowService.interruptRun` to discover started child workflow runs from the parent run’s steps (`taskId` that parses as `WorkflowRunIdSchema` and/or `run.parentWorkflow.runId === parent.id`) and recursively interrupt non-terminal children.
- `WorkflowTaskServiceAdapter.interruptRun`.
Quality gate after Phase 2
- `WorkflowRunner.test.ts` cases:
- `WorkflowService.test.ts` cases:
- `bun test src/node/services/workflows/WorkflowRunner.test.ts src/node/services/workflows/WorkflowService.test.ts`
Phase 3: Tool/API/UI surfacing
Files/symbols
- `src/common/orpc/schemas/workflow.ts`
- `src/node/services/tools/task_list.ts`
- `src/node/services/tools/task_await.ts`
- `src/browser/features/Tools/WorkflowRunToolCall.tsx`
- `src/browser/features/Tools/WorkflowRunToolCall.test.tsx`
- `src/browser/features/Tools/WorkflowRunToolCall.stories.tsx`
- `src/common/utils/workflowRunMessages.ts`
- `src/common/utils/workflowRunMessages.test.ts`
- `src/browser/features/ChatInput/index.tsx` foreground workflow discovery path.
Changes
Add a shared helper such as `isNestedWorkflowRun(run)` / `isWorkflowRunId(value)` to avoid duplicating `wfr_` checks.
Exclude child workflow runs from top-level discovery by default:
- `task_list` should list top-level workflow runs; nested child runs remain explicitly addressable by ID.
- `task_await` with omitted IDs should not wait on child runs separately because the parent owns them; explicit `task_await({ task_ids: [childRunId] })` should still work.
- `getWorkflowRunCardProjection` / `ChatInput` foreground discovery should not emit separate top-level cards for child runs.
Parent workflow run card display:
- `WorkflowRunToolCall.tsx`, classify nested workflow rows from the new `workflow` event.
- `task` events; `WorkflowStepRecord.taskId = childRunId` is only the durable step link.
Add tests for:
- `parentWorkflow` runs;
Quality gate after Phase 3
- `bun test src/browser/features/Tools/WorkflowRunToolCall.test.tsx src/common/utils/workflowRunMessages.test.ts`
Phase 4: Docs and workflow authoring guidance
Files/symbols
- `src/node/builtinSkills/workflow-authoring.md`
Changes
- `action.workflows.start` under workflow authoring as a special built-in workflow action/primitive.
Quality gate after Phase 4
- `make typecheck`
- `make lint`
Phase 5: End-to-end validation and dogfooding
Automated validation
Run, in order, fixing failures before claiming completion:
bun test src/common/orpc/schemas/workflow.test.ts \
  src/node/services/workflows/WorkflowRunStore.test.ts \
  src/node/services/workflows/WorkflowRunner.test.ts \
  src/node/services/workflows/WorkflowService.test.ts \
  src/browser/features/Tools/WorkflowRunToolCall.test.tsx \
  src/common/utils/workflowRunMessages.test.ts
make typecheck
make lint
If the touched surface is broad, finish with:
Manual dogfooding setup
Create scratch workflows under `.mux/workflows/.scratch/` in the dogfood workspace only (do not commit scratch workflows unless intentionally turning them into fixtures):
- `child-simple.js`
- `parent-simple.js`
Run via the `workflow_run` tool or slash workflow command:
Verify:
- `parentWorkflow` metadata;
Resilience dogfooding
- `agent({ id: "slow-child", ... })` so the child can background.
UI dogfooding evidence
Use the repo `dogfood`, global `agent-browser`, and project `dev-server-sandbox` skill guidance:
Start an isolated Mux dev server so dogfooding does not conflict with the developer’s normal root:
Use `DEV_SERVER_SANDBOX_ARGS="--clean-projects"` if the implementation needs a clean project list, and `KEEP_SANDBOX=1` only when preserving the temp `MUX_ROOT` is useful for debugging.
Before browser automation, load the installed CLI’s current browser workflow docs and use the direct binary, never `npx`:
Initialize a dogfood evidence directory and report:
Use the dogfood report template from the skill and append findings immediately as they are found. Every interactive issue needs a repro video plus step-by-step screenshots; static visible issues need at least one annotated screenshot.
Open the sandboxed Mux UI with a named browser session, then orient with an annotated screenshot and interactive snapshot:
Execute the nested workflow smoke and resilience scenarios from the UI. Capture:
annotated screenshot of the parent workflow card with child row running;
annotated screenshot of the parent workflow card with child row completed;
screenshot at ~375px mobile width;
`agent-browser` console/errors output after the workflow card updates;
paced repro video for interrupt → resume → complete:
agent-browser --session nested-workflows record start dogfood-output/nested-workflows/videos/interrupt-resume-complete.webm
# perform the interrupt/resume flow at human pace, with screenshots between steps
agent-browser --session nested-workflows record stop
Close the session only after evidence is captured:
Attach screenshots/video to the final implementation report or PR so reviewers can verify the path taken.
Acceptance criteria
- `action.workflows.start` with a stable `id` and child workflow `{ name, args }`.
- `parallelAgents`, `applyPatch`, workflow scheduling, slash workflows, and CLI workflow starts continue to pass existing tests.
Main risks and mitigations
- `backgroundOnMessageQueued: false`
- `MAX_NESTED_WORKFLOW_DEPTH`
- `parentWorkflow` runs from default discovery while preserving explicit lookup by ID
- `workflows.start` conflict
- `workflows.start`, preserving override semantics
Advisor review status
Approved after two advisor review passes. Final advisor notes incorporated: concrete `workflow` event model, atomic no-overwrite child creation, snapshotted child definition wins on replay, required `WorkflowChildRunAdapter` seam, strict same-workspace/wait-to-terminal V1, and no separate wake-up service.
_Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `xhigh` • Cost: `2665599{MUX_COSTS_USD:-unknown}`_