Summary
Make executor state transitions (accept_drift, set_flow_status) operate on copies or via the registry/store API instead of mutating Flow objects that are shared with the registry and potentially other executor instances.
Why this matters
Flow instances retrieved from a FlowRegistry are shared references. Mutating them in place means one executor's drift acceptance or status change silently alters the state seen by every other holder of that object — including long-running servers such as chainweaver.mcp.FlowServer, which enumerates flows from the executor's registry. This is a correctness hazard that gets worse as deployment topologies grow (multiple executors over one store, hot-reload scenarios).
Current evidence
chainweaver/executor.py (~lines 832–837): accept_drift() writes to flow.tool_schema_hashes and flow.status on the registry-held object.
chainweaver/executor.py (~line 847): set_flow_status() mutates flow.status in place.
FlowRegistry is store-backed (storage.py), so persisted state and in-memory mutated state can also diverge depending on whether the store is re-saved after mutation.
Proposed implementation
- Decide the ownership rule and document it: flow state transitions go through the registry, which owns persistence; executors request transitions rather than performing field writes.
- Add registry-level operations, e.g.
FlowRegistry.update_flow_state(name, *, version, status=None, tool_schema_hashes=None) that performs model_copy(update=...), re-registers/persists the new object, and returns it.
- Rewrite
accept_drift() and set_flow_status() to use the new registry operation; never mutate a Flow retrieved from the registry.
- Evaluate making
Flow/DAGFlow Pydantic models frozen (model_config = ConfigDict(frozen=True)) or at least documenting immutability expectations, so accidental in-place mutation fails fast. If freezing is too disruptive now, add it as a follow-up with a deprecation window.
- Audit for other in-place mutations of registry-held objects (e.g., anything writing to
flow.governance or tool fields post-registration).
Acceptance criteria
- No code path mutates a
Flow object obtained from a registry; verified by a test that registers a flow, captures the original object, runs accept_drift/set_flow_status, and asserts the original instance is unchanged while the registry returns the updated state.
- File-backed stores reflect state transitions after
accept_drift/set_flow_status (persistence test with FileStore).
- Two executors sharing one registry observe consistent, intentional state (regression test).
Test plan
- New tests in
tests/test_drift_detection.py and tests/test_flow_status.py covering the shared-reference and persistence scenarios.
- Full validation commands pass.
Migration notes
Behavior change: callers who relied on their local Flow reference being updated in place after accept_drift/set_flow_status must re-fetch from the registry. Document in CHANGELOG.md. If Flow is frozen, constructing-then-mutating patterns in user code will break — gate that step behind a deprecation notice.
Risks and tradeoffs
- Freezing
Flow may ripple into tests/examples that tweak fields post-construction; the copy-on-write registry operation alone removes the main hazard if freezing is deferred.
- Slight overhead from
model_copy on state transitions — negligible (transitions are rare).
Suggested labels
reliability, architecture
Confidence: High — direct code evidence; severity driven by server/multi-executor deployments.
Summary
Make executor state transitions (
accept_drift,set_flow_status) operate on copies or via the registry/store API instead of mutatingFlowobjects that are shared with the registry and potentially other executor instances.Why this matters
Flowinstances retrieved from aFlowRegistryare shared references. Mutating them in place means one executor's drift acceptance or status change silently alters the state seen by every other holder of that object — including long-running servers such aschainweaver.mcp.FlowServer, which enumerates flows from the executor's registry. This is a correctness hazard that gets worse as deployment topologies grow (multiple executors over one store, hot-reload scenarios).Current evidence
chainweaver/executor.py(~lines 832–837):accept_drift()writes toflow.tool_schema_hashesandflow.statuson the registry-held object.chainweaver/executor.py(~line 847):set_flow_status()mutatesflow.statusin place.FlowRegistryis store-backed (storage.py), so persisted state and in-memory mutated state can also diverge depending on whether the store is re-saved after mutation.Proposed implementation
FlowRegistry.update_flow_state(name, *, version, status=None, tool_schema_hashes=None)that performsmodel_copy(update=...), re-registers/persists the new object, and returns it.accept_drift()andset_flow_status()to use the new registry operation; never mutate aFlowretrieved from the registry.Flow/DAGFlowPydantic models frozen (model_config = ConfigDict(frozen=True)) or at least documenting immutability expectations, so accidental in-place mutation fails fast. If freezing is too disruptive now, add it as a follow-up with a deprecation window.flow.governanceor tool fields post-registration).Acceptance criteria
Flowobject obtained from a registry; verified by a test that registers a flow, captures the original object, runsaccept_drift/set_flow_status, and asserts the original instance is unchanged while the registry returns the updated state.accept_drift/set_flow_status(persistence test withFileStore).Test plan
tests/test_drift_detection.pyandtests/test_flow_status.pycovering the shared-reference and persistence scenarios.Migration notes
Behavior change: callers who relied on their local
Flowreference being updated in place afteraccept_drift/set_flow_statusmust re-fetch from the registry. Document in CHANGELOG.md. IfFlowis frozen, constructing-then-mutating patterns in user code will break — gate that step behind a deprecation notice.Risks and tradeoffs
Flowmay ripple into tests/examples that tweak fields post-construction; the copy-on-write registry operation alone removes the main hazard if freezing is deferred.model_copyon state transitions — negligible (transitions are rare).Suggested labels
reliability, architecture
Confidence: High — direct code evidence; severity driven by server/multi-executor deployments.