Summary
When the merge steward creates a fix task (test failure or merge conflict), createFixTask inherits assignedAgent from the parent task's orchestrator metadata as the new fix task's assignee. This is correct in the common case where assignedAgent is the worker who completed the parent task. It is broken when the steward dispatch paths have overwritten assignedAgent with the steward's own ID — the fix task lands on the steward, and stewards have no mechanism to act on OPEN-status fix tasks, so it sits permanently unactioned.
Root cause chain
- Steward dispatch overwrites assignedAgent. Two paths in packages/smithy/src/services/dispatch-daemon.ts write the steward's ID into the parent task's orchestrator metadata:
  - Line 3359 (merge steward spawn, mergeStatus: 'testing')
  - Line 3579 (recovery steward spawn)
- createFixTask reads back the polluted value. packages/smithy/src/services/merge-steward-service.ts:931 does:
    assignee: orchestratorMeta?.assignedAgent,
  It also copies the same value into the new fix task's own metadata at line 938.
- Stewards do not pick up OPEN tasks. Stewards act on REVIEW-status tasks with mergeStatus: pending|testing. An OPEN-status fix task assigned to a steward has no consumer.
- Result: the fix task is created, notified to the steward (lines 952-979), and never progresses. The director does not auto-dispatch a worker because the task already has an assignee.
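The chain above can be modelled in a few lines. This is only an illustrative sketch: the Task and OrchestratorMeta shapes, the roles table, and the functions dispatchSteward, createFixTaskNaive, and hasConsumer are hypothetical simplifications, not the actual smithy types or code.

```typescript
// Hypothetical, simplified shapes; the real types in packages/smithy may differ.
type AgentRole = 'worker' | 'steward';

interface OrchestratorMeta {
  assignedAgent?: string;
  mergeStatus?: 'pending' | 'testing';
}

interface Task {
  id: string;
  status: 'OPEN' | 'REVIEW';
  assignee?: string;
  orchestrator: OrchestratorMeta;
}

// Hypothetical role lookup standing in for the agent registry.
const roles: Record<string, AgentRole> = {
  'worker-1': 'worker',
  'steward-1': 'steward',
};

// Step 1: steward dispatch overwrites assignedAgent on the parent task.
function dispatchSteward(parent: Task, stewardId: string): Task {
  return {
    ...parent,
    orchestrator: {
      ...parent.orchestrator,
      assignedAgent: stewardId,
      mergeStatus: 'testing',
    },
  };
}

// Step 2: fix-task creation inherits whatever assignedAgent currently holds.
function createFixTaskNaive(parent: Task): Task {
  const inherited = parent.orchestrator.assignedAgent;
  return {
    id: `${parent.id}-fix`,
    status: 'OPEN',
    assignee: inherited,
    orchestrator: { assignedAgent: inherited },
  };
}

// Steps 3 and 4: an unassigned task gets routed by normal dispatch, a worker
// acts on its own task, and a steward only consumes REVIEW-status tasks.
function hasConsumer(task: Task): boolean {
  if (task.assignee === undefined) return true;
  if (roles[task.assignee] === 'worker') return true;
  return task.status === 'REVIEW';
}
```

In this model, a fix task created after dispatchSteward has run is OPEN with a steward assignee, so hasConsumer returns false; created from the untouched parent, it inherits the worker and has a consumer.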
Reproduction
- Run the daemon with orphanRecoveryEnabled: true (or any path that triggers a merge steward or recovery steward dispatch on a task).
- Confirm the parent task's metadata.orchestrator.assignedAgent is now the steward's ID (e.g. via sf task show <id> --format json or by inspecting the SQLite cache).
- Force a fix-task creation: simulate a test failure or merge conflict in the steward's worktree, or call the daemon path that invokes MergeStewardService.createFixTask with that parent task's metadata.
- Inspect the new fix task: assignee and metadata.orchestrator.assignedAgent both equal the steward's entity ID.
- The fix task remains in OPEN status indefinitely. The dispatch daemon does not auto-route it to a worker because it already has an assignee.
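To make the last two checks mechanical, something like the following could be run against exported task JSON. This is a sketch under assumptions: the TaskSnapshot shape is a guess at what a task export might contain (only the assignee and metadata.orchestrator.assignedAgent field names come from this report), and isStrandedFixTask is a hypothetical helper, not an existing function.

```typescript
// Hypothetical snapshot shape; the actual JSON emitted by the CLI may differ.
interface TaskSnapshot {
  id: string;
  status: string;
  assignee?: string;
  metadata?: { orchestrator?: { assignedAgent?: string } };
}

// Returns true when a task shows the symptom described above: OPEN status,
// with both assignee and orchestrator.assignedAgent set to the same steward ID.
function isStrandedFixTask(
  task: TaskSnapshot,
  stewardIds: ReadonlySet<string>,
): boolean {
  const assignedAgent = task.metadata?.orchestrator?.assignedAgent;
  return (
    task.status === 'OPEN' &&
    task.assignee !== undefined &&
    stewardIds.has(task.assignee) &&
    assignedAgent === task.assignee
  );
}

// Example input, written by hand for illustration (not real tool output).
const sampleJson = `{
  "id": "task-fix-1",
  "status": "OPEN",
  "assignee": "steward-1",
  "metadata": { "orchestrator": { "assignedAgent": "steward-1" } }
}`;

const sample: TaskSnapshot = JSON.parse(sampleJson);
const stranded = isStrandedFixTask(sample, new Set(['steward-1']));
console.log(`${sample.id} stranded: ${stranded}`);
```

The set of steward IDs has to be supplied by the caller, since this sketch does not assume any particular way of listing agents.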
Two fix options
Option A — Minimal patch in createFixTask
Filter out steward agents when inheriting:
    // merge-steward-service.ts:931
    const inheritedAssignee = orchestratorMeta?.assignedAgent;
    const inheritedAgent = inheritedAssignee
      ? await this.agentRegistry.getAgent(inheritedAssignee)
      : undefined;
    const assignee =
      inheritedAgent && inheritedAgent.agentRole !== 'steward'
        ? inheritedAssignee
        : undefined;
When assignee is undefined, the dispatch daemon will pick up the OPEN fix task and route it to a worker via the normal dispatch flow.
Pros: small change, low risk, immediately unblocks. Cons: patches the symptom; the underlying invariant violation remains.
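For review or unit testing, the same filter can be pulled out into a standalone function. The sketch below is an assumption-laden restatement of the patch, not the service's code: the Agent and AgentRegistry interfaces, makeRegistry, and resolveFixTaskAssignee are hypothetical names, and only getAgent and agentRole are taken from the snippet above.

```typescript
// Hypothetical minimal interfaces; the real registry has a richer API.
interface Agent {
  id: string;
  agentRole: 'worker' | 'steward';
}

interface AgentRegistry {
  getAgent(id: string): Promise<Agent | undefined>;
}

// In-memory stand-in for the agent registry, for illustration only.
function makeRegistry(agents: Agent[]): AgentRegistry {
  const byId = new Map<string, Agent>();
  for (const agent of agents) byId.set(agent.id, agent);
  return { getAgent: async (id: string) => byId.get(id) };
}

// Same rule as the patch: inherit the assignee only when it resolves to a
// known agent that is not a steward; otherwise leave the task unassigned.
async function resolveFixTaskAssignee(
  registry: AgentRegistry,
  inheritedAssignee: string | undefined,
): Promise<string | undefined> {
  if (!inheritedAssignee) return undefined;
  const agent = await registry.getAgent(inheritedAssignee);
  return agent && agent.agentRole !== 'steward' ? inheritedAssignee : undefined;
}

const demoRegistry = makeRegistry([
  { id: 'worker-1', agentRole: 'worker' },
  { id: 'steward-1', agentRole: 'steward' },
]);

resolveFixTaskAssignee(demoRegistry, 'steward-1').then((assignee) => {
  console.log('assignee for steward-polluted parent:', assignee);
});
```

Note that, as in the patch, an ID that the registry cannot resolve also yields undefined, so a fix task inherited from a deleted agent would fall through to normal dispatch as well.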
Option B — Stop polluting assignedAgent with steward IDs
assignedAgent should always mean "the agent responsible for forward progress" — which by the system's own contract is always a worker, never a steward. Steward dispatch should record steward identity in a separate metadata field (e.g. recoveringStewardId, mergeStewardId, or in the existing sessionHistory), leaving assignedAgent as the canonical worker reference.
This means:
- dispatch-daemon.ts:3359 and :3579 stop writing assignedAgent: stewardId
- A new dedicated field captures steward-of-record for the active session
- All readers of assignedAgent can now rely on it meaning "the worker"
Pros: removes a class of bugs; aligns the data model with the documented role contract. Cons: requires audit of all assignedAgent readers; bigger surface area.
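One possible shape for this, sketched under assumptions: mergeStewardId and recoveringStewardId are the example field names suggested above, while the OrchestratorMeta interface and the recordStewardDispatch helper are hypothetical and do not exist in dispatch-daemon.ts today.

```typescript
// Hypothetical metadata shape with dedicated steward-of-record fields.
interface OrchestratorMeta {
  assignedAgent?: string; // always the worker responsible for forward progress
  mergeStatus?: 'pending' | 'testing';
  mergeStewardId?: string; // steward running the merge/test session
  recoveringStewardId?: string; // steward running a recovery session
}

type StewardDispatchKind = 'merge' | 'recovery';

// Records the steward for the active session without touching assignedAgent.
function recordStewardDispatch(
  meta: OrchestratorMeta,
  stewardId: string,
  kind: StewardDispatchKind,
): OrchestratorMeta {
  if (kind === 'merge') {
    return { ...meta, mergeStewardId: stewardId, mergeStatus: 'testing' };
  }
  return { ...meta, recoveringStewardId: stewardId };
}

const before: OrchestratorMeta = {
  assignedAgent: 'worker-1',
  mergeStatus: 'pending',
};
const after = recordStewardDispatch(before, 'steward-1', 'merge');
console.log(after.assignedAgent, after.mergeStewardId);
```

With a shape like this, createFixTask could keep inheriting assignedAgent unchanged, because steward dispatch no longer has a reason to write to that field.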
Recommendation
Ship Option A as the v1 fix to unblock users immediately, and file/track Option B as a follow-up architectural cleanup. Option A is a 5-10 line change and fully addresses the user-visible symptom.
Why this matters
The merge steward's fix-task path is the system's only automatic recovery from test failures and merge conflicts during the review/merge workflow. When this path mis-assigns, work silently stalls. There is no log warning, no UI banner, no escalation. The operator only notices when they manually inspect why a task has not moved.
Related
- … merge-steward-service.ts), different defect; also points at steward-routing fragility
Environment
- stoneforge: master @ 0a7052a
- Reproduced in a real orchestration session