Have the workflow discovery's pickle reduce target an inert stub on the worker side instead of faithfully reconstructing a functional EcsDiscovery/_EcsSubscriber.
wool serializes the dispatched task's WorkerProxy (task.to_protobuf → proxy_dumps → cloudpickle), and WorkerProxy.__wool_reduce__ drags the caller's discovery across the boundary into the worker. cfdb does single-level dispatch only — there is exactly one @wool.routine (src/cfdb/workflows/executor.py:182), and its body runs the preprocessing pipeline (processors shelling out to samtools/bgzip/tabix); it constructs no WorkerPool, holds no WorkerProxy, and calls no other routine. WorkerPool/discovery are constructed only dispatch-side (the API in src/cfdb/api/main.py, the LAN pool host in src/cfdb/workflows/worker_lan.py). So the discovery wool ships to the worker is never used worker-side.
Today's pickle fixes (#54 for EcsDiscovery, #60 for _EcsSubscriber) faithfully rebuild that unused object on unpickle: EcsDiscovery.__setstate__ calls build_ecs_client(...) to mint a fresh boto3 ECS client and fresh asyncio.Locks; _EcsSubscriber.__setstate__ mints a fresh queue. This is dead baggage — and worse, it forces the worker to be able to construct a boto3 ECS client (region resolution, credential chain) purely to reconstruct an object it never touches: a latent worker-side failure surface for zero functional benefit.
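The cost of the rebuild path is easiest to see in miniature. The sketch below is a stand-in rather than cfdb's code: `RebuildingDiscovery`, `build_client`, and the region value are hypothetical names chosen for illustration, and stdlib `pickle` stands in for cloudpickle. Under those assumptions it shows that a strip-and-rebuild `__setstate__` constructs a client on every unpickle, whether or not the receiving side ever uses it.

```python
import asyncio
import pickle

BUILD_CALLS = []  # records every client construction, for illustration only


def build_client(region):
    # Hypothetical stand-in for build_ecs_client(...). A real client factory
    # would resolve a region and a credential chain here, and could fail.
    BUILD_CALLS.append(region)
    return object()


class RebuildingDiscovery:
    """Illustrative discovery that strips transient fields and rebuilds them."""

    def __init__(self, region):
        self.region = region
        self._client = build_client(region)
        self._lock = asyncio.Lock()

    def __getstate__(self):
        # Strip the transient fields before pickling.
        state = self.__dict__.copy()
        del state["_client"], state["_lock"]
        return state

    def __setstate__(self, state):
        # Rebuild the transient fields on the receiving side. This is the
        # step that forces client construction during unpickling.
        self.__dict__.update(state)
        self._client = build_client(self.region)
        self._lock = asyncio.Lock()


original = RebuildingDiscovery("us-east-1")
clone = pickle.loads(pickle.dumps(original))
```

After the round trip, `BUILD_CALLS` holds two entries: one from the constructor and one from `__setstate__`. The second call is the one that would run on the worker.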
Proposed end state: the discovery reduces (via __reduce__) to an inert stub that satisfies wool's structural Discovery protocol but whose activation methods (__aenter__/__aexit__, subscribe, subscriber, publisher, poll_once) raise NotImplementedError("discovery is inert worker-side; nested dispatch is unsupported"). The stub needs no boto3 client, no region, no credentials, and no transient-field stripping. This is the natural terminus of the __reduce__ path and the cfdb-side counterpart of the durable factory-callable-discovery direction.
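A minimal sketch of the reduce-to-stub idea follows. It makes several assumptions: `InertDiscovery` and `EcsDiscoverySketch` are hypothetical names, the activation members are modeled as plain methods (wool's actual `Discovery` protocol may define some as properties or coroutines, and the stub would have to match it), and stdlib `pickle` stands in for cloudpickle.

```python
import pickle

_INERT_MESSAGE = "discovery is inert worker-side; nested dispatch is unsupported"


class InertDiscovery:
    """Hypothetical worker-side stub: cheap to construct, refuses activation."""

    async def __aenter__(self):
        raise NotImplementedError(_INERT_MESSAGE)

    async def __aexit__(self, *exc_info):
        raise NotImplementedError(_INERT_MESSAGE)

    # Signatures below are assumptions; the real ones must follow wool's protocol.
    def subscribe(self, *args, **kwargs):
        raise NotImplementedError(_INERT_MESSAGE)

    def subscriber(self, *args, **kwargs):
        raise NotImplementedError(_INERT_MESSAGE)

    def publisher(self, *args, **kwargs):
        raise NotImplementedError(_INERT_MESSAGE)

    def poll_once(self, *args, **kwargs):
        raise NotImplementedError(_INERT_MESSAGE)


class EcsDiscoverySketch:
    """Stand-in for the dispatch-side discovery; not cfdb's actual class."""

    def __init__(self, cluster):
        self.cluster = cluster
        self._client = object()  # placeholder for a boto3 ECS client

    def __reduce__(self):
        # Pickle as "call InertDiscovery()": no state travels and nothing
        # is rebuilt on the receiving side.
        return (InertDiscovery, ())


restored = pickle.loads(pickle.dumps(EcsDiscoverySketch("hypothetical-cluster")))
```

Because `__reduce__` returns only a callable and an empty argument tuple, the pickled payload carries no cluster name, client, or lock, so there is nothing left to strip by hand.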
Motivation
Remove a latent worker-side failure surface. The worker currently must be able to build a boto3 ECS client (region/credentials) just to reconstruct dead baggage; any future change to client construction can break unpickling on the worker for no functional reason.
Make the "never used worker-side" assumption explicit and fail-fast. Today it's an unenforced invariant; a raising stub converts it into a loud failure if nested dispatch is ever added without revisiting this.
Expected Outcome
The object wool drags to the worker deserializes into an inert stub; the worker constructs no boto3 ECS client and needs no AWS region/credentials to unpickle it.
Activating the stub worker-side (entering it / subscribe / poll_once / publisher) raises NotImplementedError naming nested dispatch as the unsupported case.
A regression test asserts the reduced object is the inert stub (not a rebuilt discovery) and that its activation methods raise.
Verified precondition: wool does not enter/activate the proxy's discovery during normal single-level dispatch (only on nested dispatch). Confirm via the live GET /index/{dcc}/{local_id} E2E completing with the stub in place; if wool does touch it during normal dispatch, narrow which methods the stub must implement as functional no-ops rather than raisers.
The __getstate__/__setstate__ hand-stripping on EcsDiscovery (#54, "Fix ECS workflow dispatch: unpicklable EcsDiscovery in proxy, cold-start TimeoutError, and exhausted subscriber") and _EcsSubscriber (#60, "Make _EcsSubscriber cloudpickle-safe so ECS workflow dispatch survives the wool reduce boundary") is removed in favor of the reduce-to-stub path (or kept only where still load-bearing).
The interim fixes depend on __getstate__/__setstate__ strip-and-rebuild logic; a third such object, or a new transient field, re-opens the same drift hazard. An inert stub eliminates the rebuild path entirely and supersedes both interim fixes.
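The regression test called for under Expected Outcome could take roughly the following shape. This is a sketch under stated assumptions: `InertDiscovery` and `FakeDiscovery` are hypothetical stand-ins defined inline so the example runs on its own, stdlib `pickle` replaces cloudpickle, and the activation members are treated as plain callables. A real test would import cfdb's discovery and round-trip it the way wool does.

```python
import asyncio
import pickle
import unittest


class InertDiscovery:
    """Hypothetical stub; every activation method raises."""

    def _refuse(self, *args, **kwargs):
        raise NotImplementedError(
            "discovery is inert worker-side; nested dispatch is unsupported"
        )

    subscribe = subscriber = publisher = poll_once = _refuse

    async def __aenter__(self):
        self._refuse()

    async def __aexit__(self, *exc_info):
        self._refuse()


class FakeDiscovery:
    """Stand-in for EcsDiscovery; reduces to the stub instead of rebuilding."""

    def __reduce__(self):
        return (InertDiscovery, ())


class ReduceToStubTest(unittest.TestCase):
    def setUp(self):
        self.restored = pickle.loads(pickle.dumps(FakeDiscovery()))

    def test_reduces_to_stub(self):
        # The unpickled object is the stub, not a rebuilt discovery.
        self.assertIs(type(self.restored), InertDiscovery)

    def test_activation_methods_raise(self):
        for name in ("subscribe", "subscriber", "publisher", "poll_once"):
            with self.subTest(method=name):
                with self.assertRaisesRegex(NotImplementedError, "nested dispatch"):
                    getattr(self.restored, name)()

    def test_entering_raises(self):
        async def enter():
            async with self.restored:
                pass

        with self.assertRaisesRegex(NotImplementedError, "nested dispatch"):
            asyncio.run(enter())


def run_suite():
    # Run the case programmatically so importing this module never exits.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ReduceToStubTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Matching on the "nested dispatch" substring keeps the test tied to the failure message the issue specifies without pinning its exact wording.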