Callback for workflow update support#9614

Open
Quinn-With-Two-Ns wants to merge 8 commits into
temporalio:mainfrom
Quinn-With-Two-Ns:nexus-workflow-update

Conversation

Contributor

@Quinn-With-Two-Ns Quinn-With-Two-Ns commented Mar 21, 2026

What changed?

Added support for Nexus workflow update completion callbacks via CHASM. This allows a Nexus caller to be notified when a workflow update completes by attaching completion callbacks to the update request.

Why?

Nexus operations that target workflow updates need a way to receive completion notifications. Without this, a Nexus caller that sends an update has no async mechanism to learn when the update finishes. Completion callbacks enable the same async notification pattern that already exists for workflow-level Nexus operations.

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

Potential risks

Touches speculative workflow updates, which are always hard to reason about. I tried to compensate with lots of test coverage.

Note: Needs this API PR https://github.com/temporalio/api/pull/742/changes


Note

High Risk
Touches workflow update state machine and mutable state/history event paths to persist, fire, and describe per-update callbacks, including rejection/continue-as-new/retry handling; mistakes could drop callbacks or affect update lifecycle behavior.

Overview
Adds CHASM-backed completion callbacks for workflow updates so Nexus callers can register callbacks on UpdateWorkflowExecution and receive async completion (or rejection) notifications.

This introduces a new WorkflowUpdate CHASM component with persisted UpdateState (including validator rejection failure), per-update callback storage, and Nexus completion lookup via a new GetNexusUpdateCompletion backend path. Callback registration is extended to support update-scoped limits (MaxCallbacksPerUpdateID) and gated by EnableWorkflowUpdateCallbacks, with DescribeWorkflow now reporting both workflow- and update-triggered callbacks.

Update handling is expanded to persist late-attached callbacks via WorkflowExecutionOptionsUpdated events (including per-update options), buffer/flush callbacks while updates are in-flight, fire update callbacks on update completion, and ensure update callbacks are triggered on workflow close/continue-as-new/retry while leaving workflow-level callbacks inheritable.

Reviewed by Cursor Bugbot for commit 656086b. Bugbot is set up for automated code reviews on this repo. Configure here.

Comment thread common/dynamicconfig/constants.go Outdated
Comment thread chasm/tree.go
Member

@bergundy bergundy left a comment

I think we need just one more round here. For when updates are already completed, let's make sure to generate the new link type we discussed server-side.

Comment thread chasm/lib/workflow/library.go Outdated
func (l *Library) Components() []*chasm.RegistrableComponent {
return []*chasm.RegistrableComponent{
chasm.NewRegistrableComponent[*Workflow](chasm.WorkflowComponentName),
chasm.NewRegistrableComponent[*WorkflowUpdate](chasm.WorkflowUpdateComponentName),
Member

Given that workflow update is tightly coupled to workflows, it makes total sense to put them in the same library.

*workflowpb.UpdateState

// MSPointer is a special in-memory field for accessing the underlying mutable state.
chasm.MSPointer
Member

This was only supposed to be embedded in the top level Workflow component but I can see why you'd want to access it here. No strong opinion because either way this would be a workaround. I wonder though if you need to embed this or if it'd be better to make it a named field.

Contributor Author

It was embedded in the Workflow component, so I embedded it here as well.

Contributor

If it's not embedded, it would also need to be an exported field; otherwise CHASM tree deserialization will not work. Embedding is probably fine here, to stay consistent with the existing convention.
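
For readers unfamiliar with the constraint mentioned above, here is a minimal sketch of why an unexported named field is a problem for any reflection-based decoder. It assumes (this thread does not confirm it) that CHASM tree deserialization populates fields via reflection. The `MSPointer` type below is a local stand-in, not `chasm.MSPointer`, and `settableFields` is a hypothetical helper written only for this illustration.

```go
package main

import (
	"fmt"
	"reflect"
)

// MSPointer is a local stand-in for an exported type that a component holds.
// It is NOT the real chasm.MSPointer.
type MSPointer struct{ ID int }

type embedded struct {
	MSPointer // embedded exported type: the resulting field is exported
}

type namedUnexported struct {
	ms MSPointer // unexported named field: reflection cannot assign to it
}

type namedExported struct {
	MS MSPointer // exported named field: reflection can assign to it
}

// settableFields reports, for each field of the struct that ptr points to,
// whether the reflect package is allowed to assign to that field.
func settableFields(ptr any) map[string]bool {
	v := reflect.ValueOf(ptr).Elem()
	out := make(map[string]bool, v.NumField())
	for i := 0; i < v.NumField(); i++ {
		out[v.Type().Field(i).Name] = v.Field(i).CanSet()
	}
	return out
}

func main() {
	fmt.Println(settableFields(&embedded{}))        // map[MSPointer:true]
	fmt.Println(settableFields(&namedUnexported{})) // map[ms:false]
	fmt.Println(settableFields(&namedExported{}))   // map[MS:true]
}
```

So the two workable shapes are an embedded exported type or an exported named field; which one reads better is the convention question discussed above.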

Comment thread chasm/lib/workflow/workflow_update.go Outdated
Comment thread chasm/workflow.go Outdated
)
MaxCallbacksPerUpdateID = NewNamespaceIntSetting(
"system.maxCallbacksPerUpdateID",
32,
Member

I think limiting all of the workflow callbacks, regardless of what component they're attached to makes more sense than a per component limit due to the fact that the entire tree needs to be loaded into memory when mutable state is accessed today.

Contributor Author

I limited all workflow callbacks as well. I added this per-update limit on top of that to keep one update from using up the entire callback limit on a workflow.

Comment thread tests/nexus_workflow_update_test.go Outdated
Comment thread tests/nexus_workflow_update_test.go Outdated
Comment thread tests/nexus_workflow_update_test.go Outdated
Comment thread tests/nexus_workflow_update_test.go Outdated
Comment thread service/history/workflow/update/update.go
Contributor

@stephanos stephanos left a comment

Only made it half-way through so far; but figured I can send my first review comments now.

Comment thread .gitignore Outdated
Comment thread service/history/workflow/update/export_test.go
Comment thread tests/update_workflow_sdk_test.go
Comment thread tests/update_workflow_sdk_test.go Outdated
links []*commonpb.Link,
identity string,
priority *commonpb.Priority,
workflowUpdateOptions map[string]*historypb.WorkflowExecutionOptionsUpdatedEventAttributes_WorkflowUpdateOptionsUpdate,
Contributor

@stephanos stephanos Mar 25, 2026

I know it's not wrong, but ... WorkflowUpdateOptionsUpdate 😬

(non-blocking; just noticing)

Contributor Author

Yeah I agree

Comment thread service/history/interfaces/mutable_state.go Outdated
Comment thread service/history/interfaces/mutable_state.go Outdated
@long-nt-tran long-nt-tran force-pushed the nexus-workflow-update branch 7 times, most recently from a453230 to 09ac27a Compare April 27, 2026 14:58
// - The event will be written atomically with acceptance
// If the Update struct is lost (registry cleared), the abort mechanism fires
// registryClearedErr on the caller's future, prompting an immediate retry.
if u.state == stateAdmitted || u.state == stateSent {
Contributor

Added handling for stateAdmitted. It should behave the same as stateSent, but it returns false, nil since, IIUC, the caller still needs to create the speculative WFT at this stage.

@long-nt-tran long-nt-tran force-pushed the nexus-workflow-update branch from 09ac27a to 9de5339 Compare April 27, 2026 16:05
Contributor

long-nt-tran commented Apr 27, 2026

Made some updates to bring this up to date with the latest main. I squashed the base PR into the first commit and grouped each type of change into a subsequent commit for ease of review.

The only logical changes are in the top commit: handling stateAdmitted and flushing callbacks to the CHASM store before rejecting, plus some more unit tests for the Nexus cases and backlinking.

cc @bergundy @Quinn-With-Two-Ns @stephanos


EDIT, leaving comment up for posterity: ignore this, latest state reverts these changes

@long-nt-tran long-nt-tran marked this pull request as ready for review April 27, 2026 16:34
@long-nt-tran long-nt-tran requested review from a team as code owners April 27, 2026 16:34
Comment thread go.mod Outdated
Comment thread service/history/workflow/mutable_state_impl.go
Comment thread service/history/workflow/update/update.go
@long-nt-tran long-nt-tran force-pushed the nexus-workflow-update branch from 9de5339 to 4b0915d Compare April 27, 2026 17:52
Comment thread service/history/api/updateworkflow/api.go
@long-nt-tran long-nt-tran force-pushed the nexus-workflow-update branch 2 times, most recently from 8551a4f to 3ae1202 Compare April 27, 2026 20:22
Comment thread chasm/lib/workflow/workflow.go Outdated
@long-nt-tran long-nt-tran force-pushed the nexus-workflow-update branch from 3ae1202 to 2ce7339 Compare April 28, 2026 02:13
@long-nt-tran long-nt-tran requested a review from a team as a code owner May 11, 2026 17:33
Comment thread chasm/lib/callback/component.go Outdated
Comment thread service/history/workflow/update/update.go
@long-nt-tran long-nt-tran force-pushed the nexus-workflow-update branch from 72b65be to 4b7757c Compare May 11, 2026 19:45
Comment thread service/history/api/updateworkflow/api.go
Comment thread service/history/api/updateworkflow/api.go
Comment thread chasm/lib/workflow/workflow.go
Comment thread service/history/api/updateworkflow/api.go
Comment thread service/history/workflow/update/update.go
Comment thread service/history/workflow/update/validation.go
@long-nt-tran long-nt-tran requested a review from stephanos May 12, 2026 14:10
awln-temporal pushed a commit that referenced this pull request May 12, 2026
## What changed?

Added a `createExternalNexusServer(...)` helper, which sets up an external
Nexus endpoint with a user-provided handler and listens on a provided address.
This is used in nexus_workflow_test.go and will be used more in
#9614

Opportunistically did a couple more drive-by refactors/consistency
fixes, specifically:
* Force user to provide `ctx` into the endpoint creation functions
instead of making a new `ctx`
* Use `env.Context()` instead of `testcore.NewContext()` in all suites
that I touched here

## Why?

Pulling changes out of #9614
into targeted PRs to reduce load on reviewers.

## How did you test it?
- [ ] built
- [ ] run locally and tested manually
- [x] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)
Comment thread service/frontend/workflow_handler.go Outdated
Comment thread service/history/api/updateworkflow/api.go
Comment thread service/history/api/updateworkflow/api.go
Comment thread tests/nexus_workflow_update_test.go Outdated
Comment thread tests/update_workflow_sdk_test.go Outdated
Comment thread chasm/lib/workflow/workflow_update.go Outdated
Comment thread chasm/lib/workflow/workflow_update.go Outdated
Comment thread chasm/lib/workflow/workflow.go Outdated
Comment thread chasm/lib/workflow/workflow.go
Comment thread service/history/workflow/update/update.go
Comment thread service/history/workflow/mutable_state_impl.go
@long-nt-tran long-nt-tran force-pushed the nexus-workflow-update branch from 579442b to 8541095 Compare May 16, 2026 04:32
Comment thread service/history/workflow/mutable_state_impl.go
Comment thread service/history/workflow/update/update.go Outdated
@long-nt-tran long-nt-tran force-pushed the nexus-workflow-update branch from 8541095 to d65dcaa Compare May 16, 2026 13:59
} else {
outcome = cevent.GetWorkflowExecutionUpdateCompletedEventAttributes().GetOutcome()
closeTime = cevent.GetEventTime().AsTime()
}

Transient errors incorrectly produce permanent failure outcome for callbacks

High Severity

GetNexusUpdateCompletion treats all errors from getUpdateOutcomeEvent the same — including transient I/O errors from the events cache. When the workflow is complete, the fallback path returns AcceptedUpdateCompletedWorkflowFailure as the operation result instead of propagating the transient error. This delivers a permanently incorrect failure to the Nexus caller, even though the update may have succeeded. The fallback logic needs to distinguish "update not found/not completed" errors from transient errors before assuming the update outcome is missing.

Reviewed by Cursor Bugbot for commit d65dcaa. Configure here.

@long-nt-tran long-nt-tran force-pushed the nexus-workflow-update branch 3 times, most recently from 93da77d to a187f71 Compare May 17, 2026 01:57
Quinn-With-Two-Ns and others added 8 commits May 17, 2026 13:12
Squashed these commits, left for posterity:
- Add Nexus Workflow Update
- Update from rebase
- Fix sent state
- Cleanup
- Fix lint
- Fix more CI
- fix
- Review clean up
- Try suggestions from the review skill
- Fix some tests
- Add TODO for rejected event
- Remove .omc from gitignore
- Respond to PR comments
- Add NS Capability for this feature
- Respond to PR comments
- Update API
@long-nt-tran long-nt-tran force-pushed the nexus-workflow-update branch from a187f71 to 656086b Compare May 17, 2026 17:21

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes using default mode and found 1 potential issue.

There are 4 total unresolved issues (including 3 from previous reviews).

Reviewed by Cursor Bugbot for commit 656086b. Configure here.

Comment thread service/history/workflow/update/update.go
}

return callback.ScheduleStandbyCallbacks(ctx, wf.Callbacks)
return wf.ProcessCloseCallbacks(ctx)
Contributor

@long-nt-tran long-nt-tran May 17, 2026

cc @bergundy LMK if this is right: I think we need to fire update callbacks here as well. If updates fail to complete before the workflow finishes, we should probably propagate this error back rather than waiting for the completion callbacks to time out:

    acceptedUpdateCompletedWorkflowFailure = &failurepb.Failure{
        Message: "Workflow Update failed because the Workflow completed before the Update completed.",
        Source:  "Server",
        FailureInfo: &failurepb.Failure_ApplicationFailureInfo{ApplicationFailureInfo: &failurepb.ApplicationFailureInfo{
            Type:         "AcceptedUpdateCompletedWorkflow",
            NonRetryable: true,
        }},
    }

I tightened up the assertion in the test with assertAcceptedUpdateCompletedWorkflowError(...) to assert that we actually do propagate it back.

Without tightening up the assertions, the caller workflow would just time out since the update completion callbacks never fired.

Member

We need to fire all of the standby update callbacks as soon as the run they are attached to completes. This is slightly different from what we do with workflow close callbacks, which can be reattached to a following run if the workflow retries or continues as new. I didn't re-review the PR, so I trust that this is covered by functional tests and we are good.

Contributor

replying for posterity from offline discussion: this change is good, we always wanna schedule the update-level callbacks when we schedule workflow-level callbacks
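
One way to picture the rule agreed on in this thread is the sketch below. It is not the PR's `ProcessCloseCallbacks`; the `callback` type, the `scope` values, and `partitionOnRunClose` are hypothetical. It assumes each callback knows whether it is attached to the workflow or to an update. On run close, update-scoped callbacks always fire, while workflow-scoped callbacks are carried over when a successor run exists (retry or continue-as-new).

```go
package main

import "fmt"

// scope is a hypothetical marker for what a callback is attached to.
type scope int

const (
	scopeWorkflow scope = iota // attached to the workflow; inheritable
	scopeUpdate                // attached to one update on this run
)

// callback is a simplified stand-in for a registered completion callback.
type callback struct {
	URL   string
	Scope scope
}

// partitionOnRunClose splits callbacks into those to fire now and those to
// carry over to the next run. Update-scoped callbacks always fire, because
// the update cannot complete on a later run. Workflow-scoped callbacks are
// carried over only when a successor run exists.
func partitionOnRunClose(cbs []callback, hasSuccessorRun bool) (fire, carry []callback) {
	for _, c := range cbs {
		if c.Scope == scopeWorkflow && hasSuccessorRun {
			carry = append(carry, c)
			continue
		}
		fire = append(fire, c)
	}
	return fire, carry
}

func main() {
	cbs := []callback{
		{URL: "https://caller.example/wf", Scope: scopeWorkflow},
		{URL: "https://caller.example/upd", Scope: scopeUpdate},
	}

	fire, carry := partitionOnRunClose(cbs, true) // e.g. continue-as-new
	fmt.Println(len(fire), len(carry))            // 1 1

	fire, carry = partitionOnRunClose(cbs, false) // terminal close
	fmt.Println(len(fire), len(carry))            // 2 0
}
```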

Member

@yycptt yycptt left a comment

Stamping the chasm NodeBackend change

5 participants