Callback for workflow update support by Quinn-With-Two-Ns · Pull Request #9614 · temporalio/temporal

Quinn-With-Two-Ns · 2026-03-21T21:50:13Z

What changed?

Added support for Nexus workflow update completion callbacks via CHASM. This allows a Nexus caller to be notified when a workflow update completes by attaching completion callbacks to the update request.

Why?

Nexus operations that target workflow updates need a way to receive completion notifications. Without this, a Nexus caller that sends an update has no async mechanism to learn when the update finishes. Completion callbacks enable the same async notification pattern that already exists for workflow-level Nexus operations.

How did you test it?

Potential risks

Touches speculative workflow updates, which are always hard to reason about. I tried to compensate with lots of test coverage.

Note: Needs this API PR https://github.com/temporalio/api/pull/742/changes

Note

High Risk
Touches workflow update state machine and mutable state/history event paths to persist, fire, and describe per-update callbacks, including rejection/continue-as-new/retry handling; mistakes could drop callbacks or affect update lifecycle behavior.

Overview
Adds CHASM-backed completion callbacks for workflow updates so Nexus callers can register callbacks on UpdateWorkflowExecution and receive async completion (or rejection) notifications.

This introduces a new WorkflowUpdate CHASM component with persisted UpdateState (including validator rejection failure), per-update callback storage, and Nexus completion lookup via a new GetNexusUpdateCompletion backend path. Callback registration is extended to support update-scoped limits (MaxCallbacksPerUpdateID) and gated by EnableWorkflowUpdateCallbacks, with DescribeWorkflow now reporting both workflow- and update-triggered callbacks.

Update handling is expanded to persist late-attached callbacks via WorkflowExecutionOptionsUpdated events (including per-update options), buffer/flush callbacks while updates are in-flight, fire update callbacks on update completion, and ensure update callbacks are triggered on workflow close/continue-as-new/retry while leaving workflow-level callbacks inheritable.

Reviewed by Cursor Bugbot for commit 656086b. Bugbot is set up for automated code reviews on this repo.

bergundy

I think we need just one more round here. For when updates are already completed, let's make sure to generate the new link type we discussed server-side.

bergundy · 2026-03-23T23:17:48Z

 func (l *Library) Components() []*chasm.RegistrableComponent {
 	return []*chasm.RegistrableComponent{
 		chasm.NewRegistrableComponent[*Workflow](chasm.WorkflowComponentName),
+		chasm.NewRegistrableComponent[*WorkflowUpdate](chasm.WorkflowUpdateComponentName),


Given that workflow update is tightly coupled to workflows, it makes total sense to put them in the same library.

bergundy · 2026-03-23T23:49:43Z

+	*workflowpb.UpdateState
+
+	// MSPointer is a special in-memory field for accessing the underlying mutable state.
+	chasm.MSPointer


This was only supposed to be embedded in the top level Workflow component but I can see why you'd want to access it here. No strong opinion because either way this would be a workaround. I wonder though if you need to embed this or if it'd be better to make it a named field.

It was embedded in the Workflow component, so I embedded it here too.

If it's not embedded, it would also need to be an exported field; otherwise CHASM tree deserialization will not work. Embedding is probably OK here to keep a similar convention.
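The point about exported fields can be illustrated with plain Go reflection. This is a minimal sketch, assuming the deserializer assigns fields through reflection; how CHASM actually walks the tree is not shown in this thread. `MSPointer` below is a local stand-in for `chasm.MSPointer`, and the three wrapper structs and `settableFields` are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"reflect"
)

// MSPointer is a local stand-in for chasm.MSPointer (assumption: the real
// type is exported from its package, as the diff above suggests).
type MSPointer struct{ ID int }

// withEmbedded embeds the type, so the field name is "MSPointer" (exported).
type withEmbedded struct {
	MSPointer
}

// withUnexported uses a named, unexported field.
type withUnexported struct {
	ms MSPointer
}

// withExported uses a named, exported field.
type withExported struct {
	MS MSPointer
}

// settableFields reports, per field name, whether reflection may assign to
// that field. ptr must be a pointer to a struct.
func settableFields(ptr any) map[string]bool {
	v := reflect.ValueOf(ptr).Elem()
	out := make(map[string]bool, v.NumField())
	for i := 0; i < v.NumField(); i++ {
		out[v.Type().Field(i).Name] = v.Field(i).CanSet()
	}
	return out
}

func main() {
	fmt.Println(settableFields(&withEmbedded{}))   // map[MSPointer:true]
	fmt.Println(settableFields(&withUnexported{})) // map[ms:false]
	fmt.Println(settableFields(&withExported{}))   // map[MS:true]
}
```

Embedding an exported type and declaring an exported named field behave the same way under reflection, so the choice between them is mostly about convention, which matches the conclusion in the thread.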

bergundy · 2026-03-23T23:54:23Z

 	)
+	MaxCallbacksPerUpdateID = NewNamespaceIntSetting(
+		"system.maxCallbacksPerUpdateID",
+		32,


I think limiting all of the workflow callbacks, regardless of what component they're attached to makes more sense than a per component limit due to the fact that the entire tree needs to be loaded into memory when mutable state is accessed today.

I limited all workflow callbacks as well. I added this per-update limit in addition, to keep one update from using up the entire callback limit on a workflow.
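The two limits discussed here can be sketched as a single check. This is an illustration only: `callbackLimits`, `checkAttach`, and both error values are hypothetical names, and the workflow-wide value used in `main` is made up. Only the per-update default of 32 comes from the diff above.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical errors for the two limits.
var (
	errWorkflowLimit = errors.New("workflow callback limit exceeded")
	errUpdateLimit   = errors.New("per-update callback limit exceeded")
)

// callbackLimits is a hypothetical holder for the two settings.
type callbackLimits struct {
	maxPerWorkflow int // cap across every component in the workflow tree
	maxPerUpdate   int // cap for a single update ID
}

// checkAttach decides whether `adding` callbacks may be attached to updateID.
// workflowLevel is the number of callbacks attached to the workflow itself;
// perUpdate maps update ID to the number of callbacks already attached.
func checkAttach(l callbackLimits, workflowLevel int, perUpdate map[string]int, updateID string, adding int) error {
	total := workflowLevel
	for _, n := range perUpdate {
		total += n
	}
	// The workflow-wide cap counts callbacks on all components.
	if total+adding > l.maxPerWorkflow {
		return errWorkflowLimit
	}
	// The per-update cap keeps one update from consuming the whole budget.
	if perUpdate[updateID]+adding > l.maxPerUpdate {
		return errUpdateLimit
	}
	return nil
}

func main() {
	limits := callbackLimits{maxPerWorkflow: 40, maxPerUpdate: 32}
	perUpdate := map[string]int{"u1": 30}
	fmt.Println(checkAttach(limits, 5, perUpdate, "u1", 2)) // fits both limits
	fmt.Println(checkAttach(limits, 5, perUpdate, "u1", 3)) // exceeds per-update limit
	fmt.Println(checkAttach(limits, 5, perUpdate, "u2", 6)) // exceeds workflow limit
}
```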

stephanos

Only made it half-way through so far; but figured I can send my first review comments now.

stephanos · 2026-03-25T00:47:37Z

 	links []*commonpb.Link,
 	identity string,
 	priority *commonpb.Priority,
+	workflowUpdateOptions map[string]*historypb.WorkflowExecutionOptionsUpdatedEventAttributes_WorkflowUpdateOptionsUpdate,


I know it's not wrong, but ... WorkflowUpdateOptionsUpdate 😬

(non-blocking; just noticing)

Yeah I agree

long-nt-tran · 2026-04-27T15:45:10Z

+	// - The event will be written atomically with acceptance
+	// If the Update struct is lost (registry cleared), the abort mechanism fires
+	// registryClearedErr on the caller's future, prompting an immediate retry.
+	if u.state == stateAdmitted || u.state == stateSent {


Added handling for stateAdmitted. It should be the same as stateSent but returns false, nil since, IIUC, the caller still needs to create the speculative WFT at this stage.
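A rough sketch of the buffering behavior described in the diff comment: callbacks attached while an update is admitted or sent stay in memory, and are written together with acceptance. The types and method names below are hypothetical and much simpler than the real update state machine; in particular, the boolean returned by `attach` is an invention of this sketch and is not the `false, nil` return mentioned above.

```go
package main

import "fmt"

// updateState is a simplified stand-in for the update state machine's states.
type updateState int

const (
	stateAdmitted updateState = iota
	stateSent
	stateAccepted
)

// update is a hypothetical, in-memory model of a single workflow update.
type update struct {
	state     updateState
	buffered  []string // callbacks held in memory until acceptance
	persisted []string // stand-in for callbacks written to history
}

// attach records callbacks for the update. It returns true when the callbacks
// were persisted immediately and false when they were only buffered.
func (u *update) attach(cbs ...string) bool {
	if u.state == stateAdmitted || u.state == stateSent {
		// Not accepted yet: keep the callbacks in memory for now.
		u.buffered = append(u.buffered, cbs...)
		return false
	}
	u.persisted = append(u.persisted, cbs...)
	return true
}

// accept moves the update to the accepted state and flushes the buffer in the
// same step, modelling "written atomically with acceptance".
func (u *update) accept() {
	u.state = stateAccepted
	u.persisted = append(u.persisted, u.buffered...)
	u.buffered = nil
}

func main() {
	u := &update{state: stateAdmitted}
	fmt.Println(u.attach("cb-1")) // false: buffered
	u.state = stateSent
	fmt.Println(u.attach("cb-2")) // false: buffered
	u.accept()
	fmt.Println(u.persisted)      // [cb-1 cb-2]
	fmt.Println(u.attach("cb-3")) // true: persisted directly
}
```

If the in-memory struct is lost before acceptance, the buffered callbacks are lost with it, which is why the diff comment relies on the caller retrying after `registryClearedErr`.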

long-nt-tran · 2026-04-27T16:34:40Z

~~Made some updates to bring this to latest main, I squashed the base PR to first commit and group each type of change into a subsequent commit for ease of review.~~

~~Only logical changes are on the top commit -- handling stateAdmitted and flushing callbacks to CHASM store before rejecting, and added some more unit tests to test nexus cases + backlinking.~~

~~cc @bergundy @Quinn-With-Two-Ns @stephanos~~

EDIT, leaving comment up for posterity: ignore this, latest state reverts these changes

## What changed?

Added a `createExternalNexusServer(...)` which sets up an external Nexus endpoint with user-provided handler and listens on a provided address. This is used in nexus_workflow_test.go and will be used more in #9614

Opportunistically did a couple more drive-by refactors/consistency fixes, specifically:

* Force user to provide `ctx` into the endpoint creation functions instead of making a new `ctx`
* Use `env.Context()` instead of `testcore.NewContext()` in all suites that I touched here

## Why?

Pulling changes out of #9614 into targeted PRs to reduce load on reviewers.

## How did you test it?

- [ ] built
- [ ] run locally and tested manually
- [x] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)

cursor · 2026-05-16T14:05:56Z

+	} else {
+		outcome = cevent.GetWorkflowExecutionUpdateCompletedEventAttributes().GetOutcome()
+		closeTime = cevent.GetEventTime().AsTime()
+	}


Transient errors incorrectly produce permanent failure outcome for callbacks

High Severity

GetNexusUpdateCompletion treats all errors from getUpdateOutcomeEvent the same — including transient I/O errors from the events cache. When the workflow is complete, the fallback path returns AcceptedUpdateCompletedWorkflowFailure as the operation result instead of propagating the transient error. This delivers a permanently incorrect failure to the Nexus caller, even though the update may have succeeded. The fallback logic needs to distinguish "update not found/not completed" errors from transient errors before assuming the update outcome is missing.
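The distinction Bugbot asks for can be sketched with `errors.Is`. This is not the server's code: `errUpdateNotFound`, `errTransient`, `resolveOutcome`, and the string outcomes are hypothetical stand-ins, and the lookup is injected as a function so the decision logic can be shown on its own.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors standing in for the real lookup failures.
var (
	errUpdateNotFound = errors.New("update outcome not found")
	errTransient      = errors.New("events cache unavailable")
)

// Simplified outcomes; the real code returns protobuf messages.
const (
	outcomeFromEvent      = "outcome-from-completed-event"
	outcomeWorkflowClosed = "AcceptedUpdateCompletedWorkflow"
)

// Hypothetical lookups used to exercise resolveOutcome.
func lookupOK() (string, error)        { return outcomeFromEvent, nil }
func lookupTransient() (string, error) { return "", errTransient }
func lookupNotFound() (string, error) {
	return "", fmt.Errorf("load outcome event: %w", errUpdateNotFound)
}

// resolveOutcome falls back to the "workflow closed first" failure only when
// the lookup definitively says there is no outcome. Any other error is
// returned to the caller so the delivery can be retried.
func resolveOutcome(lookup func() (string, error), workflowClosed bool) (string, error) {
	out, err := lookup()
	switch {
	case err == nil:
		return out, nil
	case errors.Is(err, errUpdateNotFound) && workflowClosed:
		return outcomeWorkflowClosed, nil
	default:
		return "", err
	}
}

func main() {
	fmt.Println(resolveOutcome(lookupOK, true))
	fmt.Println(resolveOutcome(lookupNotFound, true))
	fmt.Println(resolveOutcome(lookupTransient, true))
}
```

With this shape, a transient cache error on a closed workflow surfaces as an error instead of being converted into a non-retryable failure for the Nexus caller.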

Reviewed by Cursor Bugbot for commit d65dcaa.

Squashed these commits, left for posterity:

- Add Nexus Workflow Update
- Update from rebase
- Fix sent state
- Cleanup
- Fix lint
- Fix more CI
- fix
- Review clean up
- Try suggestions from the review skill
- Fix some tests
- Add TODO for rejected event
- Remove .omc from gitignore
- Respond to PR comments
- Add NS Capability for this feature
- Respond to PR comments
- Update API

cursor

Cursor Bugbot has reviewed your changes using default mode and found 1 potential issue.

There are 4 total unresolved issues (including 3 from previous reviews).

Reviewed by Cursor Bugbot for commit 656086b.

long-nt-tran · 2026-05-17T17:32:58Z

 	}

-	return callback.ScheduleStandbyCallbacks(ctx, wf.Callbacks)
+	return wf.ProcessCloseCallbacks(ctx)


cc @bergundy LMK if this is right; I think we need to fire update callbacks here as well. If updates fail to complete before the workflow finishes, we should probably propagate this error back rather than wait for the completion callbacks to time out:

temporal/service/history/workflow/update/errors_failures.go

Lines 27 to 34 in 593fdba

	acceptedUpdateCompletedWorkflowFailure = &failurepb.Failure{
		Message: "Workflow Update failed because the Workflow completed before the Update completed.",
		Source:  "Server",
		FailureInfo: &failurepb.Failure_ApplicationFailureInfo{ApplicationFailureInfo: &failurepb.ApplicationFailureInfo{
			Type:         "AcceptedUpdateCompletedWorkflow",
			NonRetryable: true,
		}},
	}

I tightened up the assertion in test with assertAcceptedUpdateCompletedWorkflowError(...) to assert that we actually do propagate it back.

Without the tighter assertions, the caller workflow would just time out since the update completion callbacks never fired.

We need to fire all of the standby update callbacks as soon as the run they are attached to completes. This is slightly different from what we do with workflow close callbacks, which can be reattached to a following run if the workflow retries or continues as new. I didn't re-review the PR, so I trust that this is covered by functional tests and we are good.

Replying for posterity from offline discussion: this change is good; we always want to schedule the update-level callbacks when we schedule workflow-level callbacks.
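The rule agreed in this thread can be sketched as follows. It is a simplified model with hypothetical names (`run`, `processClose`, `closeReason`), not the actual `ProcessCloseCallbacks` implementation: update-level callbacks always fire when their run closes, while workflow-level callbacks fire only if no following run inherits them.

```go
package main

import (
	"fmt"
	"sort"
)

// closeReason is a hypothetical classification of how a run ended.
type closeReason int

const (
	closedTerminal   closeReason = iota // no following run
	closedWithNewRun                    // retry or continue-as-new
)

// run is a simplified model of one workflow run and its standby callbacks.
type run struct {
	workflowCallbacks []string
	updateCallbacks   map[string][]string // update ID -> callbacks
}

// processClose returns the callbacks to fire now and the callbacks carried
// over to the following run.
func processClose(r run, reason closeReason) (fire, carry []string) {
	// Update callbacks belong to this run only, so they always fire.
	ids := make([]string, 0, len(r.updateCallbacks))
	for id := range r.updateCallbacks {
		ids = append(ids, id)
	}
	sort.Strings(ids) // deterministic order for the example
	for _, id := range ids {
		fire = append(fire, r.updateCallbacks[id]...)
	}
	if reason == closedWithNewRun {
		// Workflow-level callbacks are inherited by the next run.
		carry = append(carry, r.workflowCallbacks...)
		return fire, carry
	}
	fire = append(fire, r.workflowCallbacks...)
	return fire, carry
}

func main() {
	r := run{
		workflowCallbacks: []string{"wf-cb"},
		updateCallbacks:   map[string][]string{"u2": {"cb-b"}, "u1": {"cb-a"}},
	}
	fmt.Println(processClose(r, closedWithNewRun)) // [cb-a cb-b] [wf-cb]
	fmt.Println(processClose(r, closedTerminal))   // [cb-a cb-b wf-cb] []
}
```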

yycptt

Stamping the chasm NodeBackend change

Quinn-With-Two-Ns commented Mar 21, 2026

Comment thread common/dynamicconfig/constants.go Outdated

Quinn-With-Two-Ns commented Mar 22, 2026

Comment thread chasm/tree.go

Quinn-With-Two-Ns mentioned this pull request Mar 24, 2026

Add Callbacks and Links to Workflow Update temporalio/api#742

Open

bergundy reviewed Mar 25, 2026

stephanos reviewed Mar 25, 2026

Quinn-With-Two-Ns requested review from bergundy and stephanos March 26, 2026 22:22

bergundy approved these changes Mar 30, 2026

long-nt-tran force-pushed the nexus-workflow-update branch 7 times, most recently from a453230 to 09ac27a Compare April 27, 2026 14:58

long-nt-tran reviewed Apr 27, 2026

long-nt-tran force-pushed the nexus-workflow-update branch from 09ac27a to 9de5339 Compare April 27, 2026 16:05

long-nt-tran marked this pull request as ready for review April 27, 2026 16:34

long-nt-tran requested review from a team as code owners April 27, 2026 16:34

cursor Bot reviewed Apr 27, 2026

Comment thread go.mod Outdated

Comment thread service/history/workflow/mutable_state_impl.go

Comment thread service/history/workflow/update/update.go

long-nt-tran force-pushed the nexus-workflow-update branch from 9de5339 to 4b0915d Compare April 27, 2026 17:52

cursor Bot reviewed Apr 27, 2026

Comment thread service/history/api/updateworkflow/api.go

long-nt-tran force-pushed the nexus-workflow-update branch 2 times, most recently from 8551a4f to 3ae1202 Compare April 27, 2026 20:22

cursor Bot reviewed Apr 27, 2026

Comment thread chasm/lib/workflow/workflow.go Outdated

long-nt-tran force-pushed the nexus-workflow-update branch from 3ae1202 to 2ce7339 Compare April 28, 2026 02:13

long-nt-tran requested a review from a team as a code owner May 11, 2026 17:33

long-nt-tran reviewed May 11, 2026

Comment thread chasm/lib/callback/component.go Outdated

cursor Bot reviewed May 11, 2026

Comment thread service/history/workflow/update/update.go

long-nt-tran force-pushed the nexus-workflow-update branch from 72b65be to 4b7757c Compare May 11, 2026 19:45

cursor Bot reviewed May 11, 2026

Comment thread service/history/api/updateworkflow/api.go

cursor Bot reviewed May 11, 2026

Comment thread service/history/api/updateworkflow/api.go

cursor Bot reviewed May 11, 2026

Comment thread chasm/lib/workflow/workflow.go

Comment thread service/history/api/updateworkflow/api.go

Comment thread service/history/workflow/update/update.go

Comment thread service/history/workflow/update/validation.go

long-nt-tran requested a review from stephanos May 12, 2026 14:10

stephanos reviewed May 12, 2026

long-nt-tran reviewed May 15, 2026

Comment thread service/history/workflow/mutable_state_impl.go

long-nt-tran force-pushed the nexus-workflow-update branch from 579442b to 8541095 Compare May 16, 2026 04:32

cursor Bot reviewed May 16, 2026

Comment thread service/history/workflow/mutable_state_impl.go

Comment thread service/history/workflow/update/update.go Outdated

long-nt-tran force-pushed the nexus-workflow-update branch from 8541095 to d65dcaa Compare May 16, 2026 13:59

cursor Bot reviewed May 16, 2026

long-nt-tran force-pushed the nexus-workflow-update branch 3 times, most recently from 93da77d to a187f71 Compare May 17, 2026 01:57

Quinn-With-Two-Ns and others added 8 commits May 17, 2026 13:12

Update function interfaces with timeSkippingConfig

8ebe03a

Update Nexus update tests to parallelsuite

5ac6aae

Update protos

34a6282

No-op code changes

9614ee4

Fire all callbacks when wf closes

ddcc577

Fix linter

59aa5b7

Make test assertions tighter

656086b

long-nt-tran force-pushed the nexus-workflow-update branch from a187f71 to 656086b Compare May 17, 2026 17:21

cursor Bot reviewed May 17, 2026

Comment thread service/history/workflow/update/update.go

long-nt-tran reviewed May 17, 2026

yycptt approved these changes May 18, 2026
