Remove the passthrough serializer; serialize all dispatch through cloudpickle #225

Description

@conradbzura

Remove PassthroughSerializer and its supporting machinery (_PassthroughKey, _passthrough_store, _passthrough_pool) from wool.runtime.serializer. Self-dispatch — a worker dispatching a routine that resolves to its own process — currently negotiates PassthroughSerializer, which transfers the dispatch payload by object reference instead of serializing it.

After this change, every dispatch, whether cross-process or self, serializes through CloudpickleSerializer (wool.__serializer__). The use_passthrough branching in WorkerConnection (dispatch/_execute/_handshake), the passthrough negotiation in DispatchSession.__aenter__, and the PassthroughSerializer-specific branches in Task.to_protobuf/from_protobuf are removed, leaving dispatch with a single serialization path.

Motivation

PassthroughSerializer is a self-dispatch optimization that hands the routine the same argument objects the caller passed, rather than copies. This leaks reference semantics into a routine's contract.

A @wool.routine is location-transparent: a cross-process dispatch genuinely copies its arguments through pickle, so a routine reasonably assumes it receives a copy and that mutating an argument in place is local. Under self-dispatch the routine instead shares the caller's objects, so an in-place mutation leaks back to the caller — or does not — depending purely on where the load balancer routed the call. The behavior is observable, non-deterministic, and contradicts @wool.routine's own guidance to avoid shared mutable state across a dispatch.
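The difference can be reproduced without Wool. The sketch below is illustrative only: `append_marker`, `dispatch_by_reference`, and `dispatch_by_copy` are hypothetical stand-ins (not Wool APIs), and the standard-library `pickle` stands in for cloudpickle so the snippet has no third-party dependency.

```python
import pickle


def append_marker(items):
    """Hypothetical routine body that mutates its argument in place."""
    items.append("mutated")
    return len(items)


def dispatch_by_reference(routine, arg):
    # Models passthrough: the routine receives the caller's own object.
    return routine(arg)


def dispatch_by_copy(routine, arg):
    # Models a serialized dispatch: the routine receives a pickle round-trip copy.
    return routine(pickle.loads(pickle.dumps(arg)))


shared = ["a"]
dispatch_by_reference(append_marker, shared)

isolated = ["a"]
dispatch_by_copy(append_marker, isolated)

print(shared)    # the caller's list was mutated by the routine
print(isolated)  # the caller's list is unchanged
```

The same routine body produces two different caller-visible outcomes, and the only variable is how the argument was transferred.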

The most acute symptom is wool.Token: it carries a native contextvars.Token valid only in the contextvars.Context it was minted in. Lent by reference through a same-process dispatch, that native handle is stale on the receiving task and wool.ContextVar.reset raises ValueError. This is the defect behind the xfail'd multi-hop token-reset test on PR #224.
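The underlying stdlib behavior can be shown in isolation. This is a minimal sketch using only `contextvars`; the variable `request_id` and the helper names are hypothetical, and two explicit `Context` objects stand in for the minting task and the receiving task.

```python
import contextvars

# Hypothetical variable, used only for illustration.
request_id = contextvars.ContextVar("request_id", default=None)


def mint_token():
    # Setting the variable returns a Token bound to the current Context.
    return request_id.set("abc")


def try_reset(token):
    # Returns the error message if the reset is rejected, else None.
    try:
        request_id.reset(token)
    except ValueError as exc:
        return str(exc)
    return None


minting_ctx = contextvars.Context()
token = minting_ctx.run(mint_token)

# Resetting from a different Context is rejected with ValueError.
other_ctx = contextvars.Context()
foreign_error = other_ctx.run(try_reset, token)

# Resetting from the Context that minted the token succeeds.
home_error = minting_ctx.run(try_reset, token)

print("foreign context:", foreign_error)
print("minting context:", home_error)
```

A token handed over by reference keeps its binding to the minting context, which is why the receiving side cannot use it.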

The optimization does not earn its cost. Benchmarked on representative payloads, cloudpickle (which uses C-accelerated pickle underneath) round-trips ordinary data 2–7× faster than copy.deepcopy, and serializing a typical self-dispatch payload through cloudpickle costs single-digit microseconds. Passthrough's zero-copy transfer saves only those microseconds, and only for self-dispatch, at the price of a correctness leak, a real bug, and a permanent special-case branch threaded through the connection, session, and task layers. Removing it is a net simplification.
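A comparison of this kind can be approximated with a short script. The sketch below is not the benchmark used for the figures above: the payload shape is an assumption, and stdlib `pickle` stands in for cloudpickle. Timings vary by machine and payload, so no expected numbers are given.

```python
import copy
import pickle
import timeit

# Assumed "ordinary" payload: small nested containers of builtin types.
payload = {
    "ids": list(range(200)),
    "name": "job",
    "opts": {"retries": 3, "tags": ["a", "b"]},
}


def roundtrip_pickle(obj):
    # Serialize and deserialize, as a serialized dispatch would.
    return pickle.loads(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def roundtrip_deepcopy(obj):
    # Pure-Python deep copy, for comparison.
    return copy.deepcopy(obj)


def per_call_us(fn, number=2000):
    # Average wall-clock cost per call, in microseconds.
    return timeit.timeit(lambda: fn(payload), number=number) / number * 1e6


pickle_us = per_call_us(roundtrip_pickle)
deepcopy_us = per_call_us(roundtrip_deepcopy)

print(f"pickle round trip: {pickle_us:.1f} us per call")
print(f"copy.deepcopy:     {deepcopy_us:.1f} us per call")
```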

Expected outcome

  • PassthroughSerializer, _PassthroughKey, _passthrough_store, and _passthrough_pool are removed from wool.runtime.serializer.
  • All dispatch — cross-process and self-dispatch — serializes through CloudpickleSerializer; a routine always receives a copy of its arguments, regardless of where it runs.
  • The use_passthrough branches in WorkerConnection and DispatchSession, and the PassthroughSerializer handling in Task.to_protobuf/from_protobuf, are removed; one serialization path remains.
  • Self-dispatch behaves observationally identically to cross-process dispatch. In particular, an argument that cloudpickle cannot serialize can no longer be passed under self-dispatch; cross-process dispatch never supported such arguments.
  • The multi-hop wool.Token reset scenario is correct once this lands; PR #224 (Reframe Wool's context model around stdlib contextvars) removes its TestMultiHopTokenReset xfail when it takes this change in.
  • The test suite stays green; tests cover the unified self-dispatch path.
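One consequence in the list above is that unserializable arguments stop working under self-dispatch. The sketch below illustrates the kind of value affected. `can_be_dispatched` is a hypothetical helper, not a Wool API, and stdlib `pickle` stands in for cloudpickle. cloudpickle accepts more than `pickle` does (lambdas and locally defined functions, for example), so the stand-in is stricter; a lock is used here because, to my knowledge, neither can serialize it.

```python
import pickle
import threading


def can_be_dispatched(arg):
    """Hypothetical pre-flight check: does the argument survive serialization?"""
    try:
        pickle.dumps(arg)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


plain_ok = can_be_dispatched({"retries": 3, "tags": ["a", "b"]})
lock_ok = can_be_dispatched(threading.Lock())

print("plain dict:", plain_ok)
print("threading.Lock:", lock_ok)
```

Callers that relied on passthrough to hand a lock, socket, or similar handle to a same-process routine would need a different mechanism after this change.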

Labels

refactor: Code restructuring without behavior change