Skip to content

Stop mutating registry-shared Flow objects during executor state transitions #335

@dgenio

Description

@dgenio

Summary

Make executor state transitions (accept_drift, set_flow_status) operate on copies or via the registry/store API instead of mutating Flow objects that are shared with the registry and potentially other executor instances.

Why this matters

Flow instances retrieved from a FlowRegistry are shared references. Mutating them in place means one executor's drift acceptance or status change silently alters the state seen by every other holder of that object — including long-running servers such as chainweaver.mcp.FlowServer, which enumerates flows from the executor's registry. This is a correctness hazard that gets worse as deployment topologies grow (multiple executors over one store, hot-reload scenarios).

Current evidence

  • chainweaver/executor.py (~lines 832–837): accept_drift() writes to flow.tool_schema_hashes and flow.status on the registry-held object.
  • chainweaver/executor.py (~line 847): set_flow_status() mutates flow.status in place.
  • FlowRegistry is store-backed (storage.py), so persisted state and in-memory mutated state can also diverge depending on whether the store is re-saved after mutation.

Proposed implementation

  1. Decide the ownership rule and document it: flow state transitions go through the registry, which owns persistence; executors request transitions rather than performing field writes.
  2. Add registry-level operations, e.g. FlowRegistry.update_flow_state(name, *, version, status=None, tool_schema_hashes=None) that performs model_copy(update=...), re-registers/persists the new object, and returns it.
  3. Rewrite accept_drift() and set_flow_status() to use the new registry operation; never mutate a Flow retrieved from the registry.
  4. Evaluate making Flow/DAGFlow Pydantic models frozen (model_config = ConfigDict(frozen=True)) or at least documenting immutability expectations, so accidental in-place mutation fails fast. If freezing is too disruptive now, add it as a follow-up with a deprecation window.
  5. Audit for other in-place mutations of registry-held objects (e.g., anything writing to flow.governance or tool fields post-registration).

Acceptance criteria

  • No code path mutates a Flow object obtained from a registry; verified by a test that registers a flow, captures the original object, runs accept_drift/set_flow_status, and asserts the original instance is unchanged while the registry returns the updated state.
  • File-backed stores reflect state transitions after accept_drift/set_flow_status (persistence test with FileStore).
  • Two executors sharing one registry observe consistent, intentional state (regression test).

Test plan

  • New tests in tests/test_drift_detection.py and tests/test_flow_status.py covering the shared-reference and persistence scenarios.
  • Full validation commands pass.

Migration notes

Behavior change: callers who relied on their local Flow reference being updated in place after accept_drift/set_flow_status must re-fetch from the registry. Document in CHANGELOG.md. If Flow is frozen, constructing-then-mutating patterns in user code will break — gate that step behind a deprecation notice.

Risks and tradeoffs

  • Freezing Flow may ripple into tests/examples that tweak fields post-construction; the copy-on-write registry operation alone removes the main hazard if freezing is deferred.
  • Slight overhead from model_copy on state transitions — negligible (transitions are rare).

Suggested labels

reliability, architecture


Confidence: High — direct code evidence; severity driven by server/multi-executor deployments.

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions