
Make _EcsSubscriber cloudpickle-safe so ECS workflow dispatch survives the wool reduce boundary (Closes #60) #62

Merged: conradbzura merged 1 commit into master from 60-ecssubscriber-cloudpickle-safe on Jun 22, 2026

@conradbzura

Collaborator

Summary

ECS-profile workflow dispatch failed before any preprocessing ran: a GET /data|/index for an uncached processable file returned 202, then the job terminated as failed with "cannot pickle '_asyncio.Future' object". wool's WorkerProxy.__wool_reduce__ serializes the dispatched task's discovery, which in the no-spawn WorkerPool branch is cfdb's _EcsSubscriber (the EcsDiscovery rides along as its _owner). While the API's WorkerPool is consuming the subscriber, its asyncio.Queue holds a pending-getter _asyncio.Future that cloudpickle cannot serialize. _EcsSubscriber defined no pickle protocol of its own, so #54's EcsDiscovery fix did not reach it.

Make _EcsSubscriber honor wool's documented picklability contract by adding __getstate__/__setstate__ mirroring EcsDiscovery one level deeper: strip the live queue and the single-use flag on pickle, recreate a fresh inert queue on unpickle. The worker never consumes the restored subscriber, so an empty queue is correct. This is the targeted unblock; a follow-up (#61) replaces the per-object hand-stripping with an inert worker-side discovery stub.

Closes #60

Proposed changes

Add the pickle protocol to _EcsSubscriber (src/cfdb/workflows/discovery.py)

  • __getstate__ copies __dict__ and drops _queue (its internal getter deque holds the unpicklable pending _asyncio.Future mid-consumption, and the queue is bound to the API event loop) plus the single-use _exhausted flag. _owner (already pickle-safe via EcsDiscovery.__getstate__) and _filter carry across.
  • __setstate__ restores the dict, then mints a fresh asyncio.Queue() and resets _exhausted = False, yielding a well-formed, inert subscriber.
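The two hooks described in the bullets above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in this PR: SubscriberSketch is a hypothetical stand-in for _EcsSubscriber, the attribute names (_owner, _filter, _queue, _exhausted) are taken from the description, and the constructor signature is invented for the example.

```python
import asyncio


class SubscriberSketch:
    """Hypothetical stand-in for _EcsSubscriber (not the real class)."""

    def __init__(self, owner, filter_=None):
        self._owner = owner            # assumed to be pickle-safe on its own
        self._filter = filter_
        self._queue = asyncio.Queue()  # live, loop-bound state
        self._exhausted = False        # single-use flag

    def __getstate__(self):
        # Copy so the live object is left untouched, then drop the
        # loop-bound queue and the single-use flag.
        state = self.__dict__.copy()
        state.pop("_queue", None)
        state.pop("_exhausted", None)
        return state

    def __setstate__(self, state):
        # Restore what was carried across, then rebuild the inert parts.
        self.__dict__.update(state)
        self._queue = asyncio.Queue()
        self._exhausted = False
```

Because cloudpickle follows the standard pickle protocol for instances, defining these two methods is enough for it to skip the queue. Popping from a copy of __dict__ rather than from the instance keeps the API-side subscriber usable after it has been serialized.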

Add a regression test for the consumed-subscriber path (tests/test_workflows/test_discovery.py)

  • Add test_subscriber_roundtrip_should_succeed_when_queue_has_pending_getter to TestEcsDiscoveryPickleAgainstMoto. It drives the subscriber into the live being-consumed shape — a getter parked on the queue plus the flipped flag — over a real moto-backed boto3 client, the state the existing fresh-subscriber round-trip never reproduced (it passed while production failed). It asserts cloudpickle.dumps succeeds and the restored subscriber is inert: empty queue, reset flag, retained filter, rebuilt owner client.
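The shape of that regression test can be sketched without moto or boto3. The example below is an assumption-laden simplification: ParkedSubscriber is a hypothetical stand-in for _EcsSubscriber, the owner client is omitted, and stdlib pickle is used where the real test calls cloudpickle.dumps.

```python
import asyncio
import pickle


class ParkedSubscriber:
    """Hypothetical stand-in for _EcsSubscriber with the pickle hooks applied."""

    def __init__(self, filter_=None):
        self._filter = filter_
        self._queue = asyncio.Queue()
        self._exhausted = False

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("_queue", None)
        state.pop("_exhausted", None)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._queue = asyncio.Queue()
        self._exhausted = False


async def roundtrip_with_pending_getter():
    """Pickle a subscriber while a consumer is parked on its queue."""
    sub = ParkedSubscriber(filter_="service-a")
    sub._exhausted = True  # flip the single-use flag, as consumption would
    getter = asyncio.ensure_future(sub._queue.get())
    await asyncio.sleep(0)  # let the getter start and park on the empty queue
    try:
        assert not getter.done()  # the consumer is still waiting
        return pickle.loads(pickle.dumps(sub))
    finally:
        getter.cancel()
        await asyncio.gather(getter, return_exceptions=True)


if __name__ == "__main__":
    restored = asyncio.run(roundtrip_with_pending_getter())
    print(restored._queue.empty(), restored._exhausted, restored._filter)
```

The point of parking a getter before serializing is that a freshly built subscriber never exercises the failing path, so a test without this step cannot catch a regression.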

Test cases

| # | Test Suite | Given | When | Then | Coverage Target |
|---|---|---|---|---|---|
| 1 | TestEcsDiscoveryPickleAgainstMoto | An _EcsSubscriber over a real moto-backed client with a getter parked on its queue and the single-use flag set | The subscriber is cloudpickle dumped and loaded back | It serializes successfully and restores an inert handle with an empty queue, a reset flag, a retained filter, and a rebuilt owner client | _EcsSubscriber.__getstate__ / __setstate__ |

conradbzura self-assigned this on Jun 22, 2026
wool's WorkerProxy.__wool_reduce__ serializes the dispatched task's
discovery, which in the no-spawn WorkerPool branch is cfdb's
_EcsSubscriber (the EcsDiscovery rides along as its owner). While the
API's WorkerPool is consuming the subscriber, its asyncio.Queue holds a
pending-getter Future that cloudpickle cannot serialize, so every ECS
workflow dispatch failed with "cannot pickle '_asyncio.Future' object".

Add __getstate__/__setstate__ to _EcsSubscriber, mirroring the
EcsDiscovery fix one level deeper: strip the live queue and the
single-use exhausted flag on pickle and recreate a fresh, inert queue on
unpickle. The worker never consumes the restored subscriber, so an empty
queue is correct.

The regression test drives the subscriber into the live being-consumed
state (a getter parked on the queue plus the flipped flag) over a real
moto-backed client; the earlier fresh-subscriber test did not reproduce
that state and so passed while production failed.
conradbzura force-pushed the 60-ecssubscriber-cloudpickle-safe branch from 18714f2 to 7c869a3 on June 22, 2026 03:08
conradbzura marked this pull request as ready for review on June 22, 2026 03:10
conradbzura merged commit 8fde211 into master on Jun 22, 2026
3 checks passed