Skip to content

Draft technical whitepaper for Wool — Closes #152#153

Open
conradbzura wants to merge 1 commit into
wool-labs:mainfrom
conradbzura:152-whitepaper
Open

Draft technical whitepaper for Wool — Closes #152#153
conradbzura wants to merge 1 commit into
wool-labs:mainfrom
conradbzura:152-whitepaper

Conversation

@conradbzura

@conradbzura conradbzura commented Apr 4, 2026

Copy link
Copy Markdown
Contributor

Summary

Add a technical whitepaper – WHITEPAPER.md – at the repo root covering Wool's architecture, protocol design, and runtime semantics. The document provides a thorough walkthrough of the system model, gRPC wire protocol, streaming semantics, pluggable discovery, load balancing, admission control, security, and design trade-offs among other things.

Closes #152

@conradbzura conradbzura self-assigned this Apr 4, 2026
@conradbzura conradbzura marked this pull request as ready for review April 6, 2026 01:26
Comment thread WHITEPAPER.md

**Conrad Bzura, April 2025**

This document is a technical reference for the Wool distributed Python runtime. It is intended as a handbook for Wool users with low-level system questions, engineers evaluating Wool as a backend for distributed Python applications, and contributors to the Wool core. It is not an introduction to Wool; for that, refer to the [Getting Started tutorials](01-hello-world-ml.md) and the [Wool GitHub repository](https://github.com/wool-labs/wool).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the last sentence, refer users to the project's README.

Comment thread WHITEPAPER.md

## Abstract

This paper presents Wool, a distributed Python runtime that extends Python's native async execution model to operate transparently across process boundaries. Wool is designed as a composable execution primitive rather than an application framework; it imposes no workflow model, no orchestration engine, and no state management layer. Instead, it provides a minimal set of structural protocols—for task dispatch, worker discovery, and load balancing—that integrate with Python's existing async semantics, including coroutines, async generators, exception propagation, and context variables. The design prioritizes efficiency, composibility, and simplicity, with the goal that distribution is expressed entirely through source-level annotations rather than external configuration, broker infrastructure, or deployment manifests. This paper describes the architecture, protocols, and implementation of Wool, with particular attention to the tradeoffs that follow from its positioning as a focused execution layer.

@conradbzura conradbzura Apr 6, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few items:

  • Rephrase "it imposes no workflow model, no orchestration engine, and no state management layer" and "it does not impose any workflow model, orchestration engine, or state management layer".
  • Drop "—for task dispatch, worker discovery, and load balancing—".
  • Swap order of "exception propagation" and "context variables".

Comment thread WHITEPAPER.md

### 1.1 Motivation

The Python ecosystem offers several mature frameworks for distributed execution, including Celery, Ray, Prefect, Dask, and Dramatiq. These systems address legitimate and well-understood problems, and their widespread adoption is evidence of their value. They share, however, a common architectural pattern: distributed execution is coupled with orchestration, state management, retry logic, and task scheduling. A developer seeking to execute a function on a remote process must generally adopt a programming model—task registries, DAGs, serialization constraints, message broker configuration—that imposes structure well beyond the execution concern itself.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop "—task registries, DAGs, serialization constraints, message broker configuration—".

Comment thread WHITEPAPER.md

### 1.2 Design Goals

The core principles that drive Wool's architecture are API simplicity and composability — where composability means that every extension point (discovery, load balancing, worker implementation, serialization) is a structural protocol that can be replaced independently without coupling to Wool's internal class hierarchy — while the core system goals are low dispatch overhead and horizontal scalability through decentralized coordination. We are willing to sacrifice certain desirable properties—such as at-least-once delivery, distributed state management, and centralized scheduling—in order to preserve these core goals. The rationale is that such properties are application-level concerns whose ideal implementation depends on the specific workload, and embedding them in the execution layer would constrain the kinds of workflows that Wool can support.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rephrase "every extension point (discovery, load balancing, worker implementation, serialization) is a structural protocol" as "core extension points are structural protocols".

Comment thread WHITEPAPER.md

Wool seeks to enable the transparent distribution of Python async functions and async generators across pools of worker processes. It provides worker lifecycle management, pluggable service discovery, client-side load balancing, and bidirectional streaming over gRPC. It does not provide a distributed object store, a centralized scheduler, a workflow engine, retry policies, task persistence, or an actor model. These are explicitly out of scope and are expected to be composed externally by users who need them.

## 2. Related Systems

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each of the paragraphs in this section reiterate that Wool's scope is purposely narrower – e.g., "Wool provides none of these facilities...", "Wool is considerably narrower in scope...", "Wool does not impose an actor model in its core...", "Wool does not manage workflows..." – and it's a bit repetitive. Instead, we should state what the related systems offer, then in a closing paragraph describe how Wool differs and why. This paragraph is where those points can be made, and any overlapping or closely related points can be consolidated. This is also where we can state that some of these features may be formalized in future work, but that they will always be optional add-ons rather than core runtime concerns (thus eliminating the "We expect to formalize support for these patterns in future work, but they would remain optional add-ons rather than core runtime concerns" sentence on line 94).

Also, the Python built-in paragraph should come first after the introductory sentence/paragraph.

Comment thread WHITEPAPER.md

## 3. Architecture Overview

### 3.1 Application Concepts

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This subsection introduces "concepts", but they read more like concrete components of the system. I think we need to reframe this subsection to be truly more conceptual, e.g., describe the concept of a worker pool (it's a declarative grouping of workers that emerges from the discovery protocol and exists only from the caller's perspective - "pools" in this context are not centrally managed and are in fact completely decoupled from worker management, although not exclusively so). The other core concepts are the Wool routine, worker discovery, and load balancing. We should touch on the peer-to-peer nature of Wool workers as well. Again, describe the nature of these components in a more abstract manner rather than their literal implementations here. This section should illustrate for the reader a mental model of how Wool works in an abstract sense.

Comment thread WHITEPAPER.md

Wool's runtime is composed of six components: the routine decorator, the worker pool, the worker proxy, a discovery service, a load balancer, and one or more workers. The first three are fixed parts of the runtime, whereas the discovery service, load balancer, and worker are each defined by a structural protocol (`DiscoveryLike`, `LoadBalancerLike`, and `WorkerLike` respectively) that can be replaced independently to suit the requirements of a given deployment.

The entry point for distribution is the `@wool.routine` decorator, which is applied to async functions and async generators. When a decorated function is called within an active `WorkerPool` context, the decorator constructs a `Task` object (a serializable envelope containing the callable, its arguments, and a reference to the current `WorkerProxy`) and passes it to the proxy for dispatch rather than executing it locally. Outside a pool context, the decorated function executes as an ordinary coroutine or async generator. In other words, the same function is distributable or local depending solely on whether a pool is active at the call site. The dispatch decision is governed internally by a context variable, `_do_dispatch`, which is toggled during worker-side execution to prevent infinite recursion: when a task runs on a worker, direct calls to the immediate callable execute locally, but any nested `@wool.routine` calls within its body are still dispatched outward to the pool (see `src/wool/runtime/routine/wrapper.py`).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't name _do_dispatch explicitly here, the description is sufficient.

Comment thread WHITEPAPER.md

The entry point for distribution is the `@wool.routine` decorator, which is applied to async functions and async generators. When a decorated function is called within an active `WorkerPool` context, the decorator constructs a `Task` object (a serializable envelope containing the callable, its arguments, and a reference to the current `WorkerProxy`) and passes it to the proxy for dispatch rather than executing it locally. Outside a pool context, the decorated function executes as an ordinary coroutine or async generator. In other words, the same function is distributable or local depending solely on whether a pool is active at the call site. The dispatch decision is governed internally by a context variable, `_do_dispatch`, which is toggled during worker-side execution to prevent infinite recursion: when a task runs on a worker, direct calls to the immediate callable execute locally, but any nested `@wool.routine` calls within its body are still dispatched outward to the pool (see `src/wool/runtime/routine/wrapper.py`).

The `WorkerPool` async context manager is responsible for orchestrating the lifecycle of a pool. On entry, it resolves its operational mode from the `spawn` and `discovery` parameters, optionally spawns ephemeral worker subprocesses, publishes their metadata to the discovery service, and creates a `WorkerProxy` configured with the specified discovery subscriber and load balancer. On exit, it reverses the process: ephemeral workers are unpublished and stopped, and the discovery service is closed. We believe this design keeps the pool's responsibilities narrow — it manages lifecycle and wiring, but does not participate in task dispatch, which is handled entirely by the proxy (see `src/wool/runtime/worker/pool.py`).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should mention the lease parameter here and that pools are unbounded by default (they'll consider to any number of workers that match the discovery namespace and predicate for dispatch).

Comment thread WHITEPAPER.md

The `WorkerPool` async context manager is responsible for orchestrating the lifecycle of a pool. On entry, it resolves its operational mode from the `spawn` and `discovery` parameters, optionally spawns ephemeral worker subprocesses, publishes their metadata to the discovery service, and creates a `WorkerProxy` configured with the specified discovery subscriber and load balancer. On exit, it reverses the process: ephemeral workers are unpublished and stopped, and the discovery service is closed. We believe this design keeps the pool's responsibilities narrow — it manages lifecycle and wiring, but does not participate in task dispatch, which is handled entirely by the proxy (see `src/wool/runtime/worker/pool.py`).

The `WorkerProxy` is the component that connects application code to the set of available workers. It subscribes to a discovery event stream and runs a sentinel task, which is a background async loop that consumes `worker-added`, `worker-updated`, and `worker-dropped` events and uses them to maintain a `LoadBalancerContext` — a mutable mapping of live workers to their gRPC connections. When a routine is dispatched, the proxy delegates to the load balancer, which selects a worker from this context. The proxy is lazily initialized by default, meaning it defers discovery subscription and sentinel startup until the first `dispatch()` call, which avoids unnecessary overhead for nested routines that may never execute nested calls. The proxy is also picklable: its `__reduce__` method serializes the discovery configuration, load balancer type, lease limit, and proxy identifier, and this serialization is the mechanism by which nested Wool routine calls on remote workers are able to locate and reconnect to their originating pool (see `src/wool/runtime/worker/proxy.py`).

@conradbzura conradbzura Apr 6, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should mention that worker connections are established lazily on dispatch and only live for the life of the request + a brief TTL.

Also, "a mutable mapping of live workers" will no longer be true once issue #155 is completed.

Comment thread WHITEPAPER.md

The `WorkerProxy` is the component that connects application code to the set of available workers. It subscribes to a discovery event stream and runs a sentinel task, which is a background async loop that consumes `worker-added`, `worker-updated`, and `worker-dropped` events and uses them to maintain a `LoadBalancerContext` — a mutable mapping of live workers to their gRPC connections. When a routine is dispatched, the proxy delegates to the load balancer, which selects a worker from this context. The proxy is lazily initialized by default, meaning it defers discovery subscription and sentinel startup until the first `dispatch()` call, which avoids unnecessary overhead for nested routines that may never execute nested calls. The proxy is also picklable: its `__reduce__` method serializes the discovery configuration, load balancer type, lease limit, and proxy identifier, and this serialization is the mechanism by which nested Wool routine calls on remote workers are able to locate and reconnect to their originating pool (see `src/wool/runtime/worker/proxy.py`).

Discovery is the mechanism by which workers advertise their availability and proxies learn about them. The `DiscoveryLike` structural protocol composes a publisher (a single `publish` method for broadcasting lifecycle events) and a subscriber (an `__aiter__` interface yielding `DiscoveryEvent` objects), with an optional `subscribe(filter=)` method for predicate-based filtered views. Wool ships two implementations: `LocalDiscovery`, which uses shared memory and filesystem notifications for single-machine pools, and `LanDiscovery`, which uses mDNS via Zeroconf for network-wide discovery. Both are scoped by a namespace string, so multiple independent pools can coexist on the same machine or network. Because `DiscoveryLike` is a structural protocol, a deployment that already uses a service registry such as Consul or etcd can implement a compatible backend without inheriting from any Wool base class — conforming to the method signatures is sufficient (see `src/wool/runtime/discovery/`).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should mention that discovery publishers are not strictly necessary for custom implementations of externally managed worker discovery so long as there is an alternative means of discovering worker metadata (e.g., etcd).

Comment thread WHITEPAPER.md

Discovery is the mechanism by which workers advertise their availability and proxies learn about them. The `DiscoveryLike` structural protocol composes a publisher (a single `publish` method for broadcasting lifecycle events) and a subscriber (an `__aiter__` interface yielding `DiscoveryEvent` objects), with an optional `subscribe(filter=)` method for predicate-based filtered views. Wool ships two implementations: `LocalDiscovery`, which uses shared memory and filesystem notifications for single-machine pools, and `LanDiscovery`, which uses mDNS via Zeroconf for network-wide discovery. Both are scoped by a namespace string, so multiple independent pools can coexist on the same machine or network. Because `DiscoveryLike` is a structural protocol, a deployment that already uses a service registry such as Consul or etcd can implement a compatible backend without inheriting from any Wool base class — conforming to the method signatures is sufficient (see `src/wool/runtime/discovery/`).

The load balancer determines which worker receives a given task. The `LoadBalancerLike` structural protocol requires a single async method, `dispatch(task, *, context, timeout)`, which returns an `AsyncGenerator` that streams results back to the caller. The `context` parameter is a `LoadBalancerContext` managed by the proxy's sentinel task, and because this context is passed in rather than owned by the balancer, a single balancer instance can service multiple independent pools without conflating their state. Wool ships a `RoundRobinLoadBalancer` as the default. For deployments where round-robin is insufficient — for example, when tasks have highly variable cost — a custom balancer that tracks in-flight counts or manages worker scaling can be substituted by conforming to the same single-method protocol (see `src/wool/runtime/loadbalancer/`).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The preferred load balancer structural protocol will be changing with issue #155 – update this paragraph to reflect the new contract.

Comment thread WHITEPAPER.md

The load balancer determines which worker receives a given task. The `LoadBalancerLike` structural protocol requires a single async method, `dispatch(task, *, context, timeout)`, which returns an `AsyncGenerator` that streams results back to the caller. The `context` parameter is a `LoadBalancerContext` managed by the proxy's sentinel task, and because this context is passed in rather than owned by the balancer, a single balancer instance can service multiple independent pools without conflating their state. Wool ships a `RoundRobinLoadBalancer` as the default. For deployments where round-robin is insufficient — for example, when tasks have highly variable cost — a custom balancer that tracks in-flight counts or manages worker scaling can be substituted by conforming to the same single-method protocol (see `src/wool/runtime/loadbalancer/`).

Workers are the processes that actually execute dispatched tasks. The `WorkerLike` structural protocol requires `start`, `stop`, `metadata`, `address`, `uid`, `tags`, and `extra` members, and Wool's default implementation, `LocalWorker`, spawns a subprocess containing a gRPC server, a `WorkerService` that deserializes and executes tasks, and a dedicated worker event loop on a background thread that isolates CPU-bound task execution from gRPC I/O. Each worker advertises its capabilities and transport configuration through a `WorkerMetadata` dataclass (including its address, capability tags, and gRPC channel options), which is published to the discovery service and used by the proxy to configure gRPC connections automatically. In other words, workers are self-describing: a client that discovers a worker can connect to it with the correct message size limits, keepalive intervals, and compression settings without any out-of-band configuration (see `src/wool/runtime/worker/local.py`, `src/wool/runtime/worker/process.py`, `src/wool/runtime/worker/service.py`).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should mention that WorkerLike really defines a worker process management interface rather than a custom worker hook directly – users are strongly encouraged to reuse WorkerProcess within their custom worker management wrappers should they choose to build one.

Comment thread WHITEPAPER.md

*Figure 2. Static layer dependencies. Each arrow represents an import dependency between architectural layers. The Worker Runtime is the most connected layer, depending on all others. Discovery and Routine & Task are peers that share the Shared Utilities layer but do not depend on each other. The Wire Protocol has no upward dependencies.*

### 3.3 ResourcePool

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This subsection is very literal. We should instead describe resource pool as a concept along with its motivation (ultimately, to be efficient with system resources like file descriptors, web sockets, HTTP2 connections, etc.). We can touch on how this is achieved with the ResourcePool type (which is set to be renamed to ReferenceCountedCache in issue #151), but the main focus should be resource caching as a means of achieving the "efficiency" design goal outlined in the first section of this document.

Comment thread WHITEPAPER.md
}
```

The choice of a single bidirectional stream for `dispatch` — rather than separate unary and server-streaming RPCs — is deliberate, since a Wool routine may be either an ordinary coroutine that returns one value or an async generator that yields many values, and the caller cannot always know which case applies at invocation time. A unified streaming RPC accommodates both patterns without negotiation. The `stop` RPC is kept unary because graceful shutdown is a one-shot command; it accepts a `StopRequest` carrying a float timeout and returns a `Void` sentinel once the worker has drained or cancelled in-flight work. (Code reference: `proto/wire.proto`.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The claim "and the caller cannot always know which case applies at invocation time" is downright false. The server must stream because the protocol involves an initial handshake that gives the server an opportunity to reject a task due to version or security incompatibility, resource constraints (back-pressure), etc. Server stream would be sufficient for the simple coroutine case, but it's simpler to reason about and maintain a single endpoint, and the async generator case requires bidirectional streaming.

Comment thread WHITEPAPER.md

### 4.2 Message Schema

The `Task` message constitutes the initial payload of every dispatch and carries the full execution context. Its first four fields — `version`, `id`, `caller`, and `tag` — form the lightweight "envelope" that can be parsed independently via the `TaskEnvelope` message. `TaskEnvelope` is a separate protobuf message definition that declares only these four fields using the same field numbers as `Task`. Because Protocol Buffers ignores unknown fields during deserialization, parsing a full `Task` byte stream as a `TaskEnvelope` silently skips the heavy cloudpickle payload fields (callable, args, kwargs) and returns only the metadata. This is the mechanism that makes the `VersionInterceptor` (Section 4.5) efficient: it can extract the client's protocol version from the raw bytes of every incoming dispatch without deserializing any of the executable payload. The remaining fields encode the executable material: `callable`, `args`, and `kwargs` hold the pickled function and its arguments; `proxy` and `proxy_id` identify the worker proxy through which the task was dispatched; `timeout` governs per-task deadlines; and the optional `context` carries a `RuntimeContext` snapshot so that caller-side settings such as `dispatch_timeout` are propagated to the worker. An optional `serializer` field, when present, holds a pickled `Serializer` instance that the receiving side restores via an LRU-cached `_unpickle_serializer` to decode the payload fields.

@conradbzura conradbzura Apr 6, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few items:

  • Rephrase "skips the heavy" and "skips the potentially heavy". Another reason for the envelope is that the rest of the payload may be of a different shape than the worker expects and fail to deserialize – the envelope makes it possible to return a clear version incompatibility error to the caller rather than an opaque deserialization error.
  • Instead of pickled, we should refer to the callable, arg, and kwarg payload fields as serialized since other serializers may be used (for example, the PassthroughSerializer for self-dispatch).

Comment thread WHITEPAPER.md

The `Task` message constitutes the initial payload of every dispatch and carries the full execution context. Its first four fields — `version`, `id`, `caller`, and `tag` — form the lightweight "envelope" that can be parsed independently via the `TaskEnvelope` message. `TaskEnvelope` is a separate protobuf message definition that declares only these four fields using the same field numbers as `Task`. Because Protocol Buffers ignores unknown fields during deserialization, parsing a full `Task` byte stream as a `TaskEnvelope` silently skips the heavy cloudpickle payload fields (callable, args, kwargs) and returns only the metadata. This is the mechanism that makes the `VersionInterceptor` (Section 4.5) efficient: it can extract the client's protocol version from the raw bytes of every incoming dispatch without deserializing any of the executable payload. The remaining fields encode the executable material: `callable`, `args`, and `kwargs` hold the pickled function and its arguments; `proxy` and `proxy_id` identify the worker proxy through which the task was dispatched; `timeout` governs per-task deadlines; and the optional `context` carries a `RuntimeContext` snapshot so that caller-side settings such as `dispatch_timeout` are propagated to the worker. An optional `serializer` field, when present, holds a pickled `Serializer` instance that the receiving side restores via an LRU-cached `_unpickle_serializer` to decode the payload fields.

The `Request` message wraps a `oneof payload` whose four alternatives correspond to the stages of the async iteration protocol: `task` initiates execution, `next` advances a remote generator, `send` pushes a value into the generator, and `throw` injects an exception. The `Response` mirror offers `ack` (task accepted), `nack` (task rejected, with a human-readable `reason`), `result` (a yielded or returned value), and `exception` (a remotely raised error). Both `result` and `exception` carry a `Message` whose single `bytes dump` field contains a cloudpickle-serialized Python object.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The send and throw payloads are also serialized. We should actually verify that these as well as the return values and exceptions are serialized using the serializer specified on the task and not just cloudpickle indiscriminately.

Comment thread WHITEPAPER.md
| Worker proxies | `WorkerProxy` | Proxy instance | `proxy.enter()` | `proxy.exit()` | 60s | `process.py` (via `__proxy_pool__` `ContextVar`) |
| Worker event loops | `(EventLoop, Thread)` | `"worker-loop"` | `new_event_loop()` + `Thread(run_forever)` | `call_soon_threadsafe(stop)` + `join` + `close` | None | `service.py` |

## 4. Wire Protocol

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add a subsection that covers how transport errors and both client- and server-side task cancellations are manifested and managed by the system.

Comment thread WHITEPAPER.md

### 5.1 Task Lifecycle

A task in Wool traverses several well-defined stages from invocation to result delivery: construction, serialization, transport, acknowledgment, deserialization, execution, and result return.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line could be improved. It covers ser/des on the front end, but then ends simply with "result return" foregoing the ser/des details on the back end. I think the introduction can be more conceptual, as the following subsections dive into the specifics. E.g., "When a Wool routine is invoked on a client, the routine is packaged as a task, dispatched to a worker via client-side load balancer, and finally either accepted, unpacked and executed by the worker, or rejected outright. Depending on the routine type, results are packaged and either returned or streamed back to the originating client. If an exception is raised by the Wool routine on the worker, it gets packaged and transmitted back to the client where it is re-raised with its full stack trace intact."

Comment thread WHITEPAPER.md

A task in Wool traverses several well-defined stages from invocation to result delivery: construction, serialization, transport, acknowledgment, deserialization, execution, and result return.

**Construction.** When a caller invokes a `@wool.routine`-decorated function, the wrapper first consults the `_do_dispatch` context variable to determine whether remote dispatch is active. If it is, the wrapper retrieves the current `WorkerProxy` from the process-wide context variable `wool.__proxy__`, constructs a `Task` dataclass, and passes it to `proxy.dispatch()`. The `Task` captures the callable, its positional and keyword arguments, a unique UUID, a human-readable tag (module, qualified name, and source line number), and a reference to the proxy. During `__post_init__`, the `Task` also records a `caller` identifier if a parent task is active (enabling nested-task tracking) and snapshots the current `RuntimeContext`, which at present carries the dispatch timeout setting. (Code references: `src/wool/runtime/routine/wrapper.py`, `src/wool/runtime/routine/task.py`.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call this subsection "Invocation". Rephrase "When a caller invokes..." as "When a caller awaits...". Don't mention _do_dispatch explicitly, just describe the behavior. In addition to the mentioned fields, Task also captures Wool context variables explicitly set by the caller, and there may be additional fields added, so future proof this claim a bit by ending with something like "among other information".

Comment thread WHITEPAPER.md

**Construction.** When a caller invokes a `@wool.routine`-decorated function, the wrapper first consults the `_do_dispatch` context variable to determine whether remote dispatch is active. If it is, the wrapper retrieves the current `WorkerProxy` from the process-wide context variable `wool.__proxy__`, constructs a `Task` dataclass, and passes it to `proxy.dispatch()`. The `Task` captures the callable, its positional and keyword arguments, a unique UUID, a human-readable tag (module, qualified name, and source line number), and a reference to the proxy. During `__post_init__`, the `Task` also records a `caller` identifier if a parent task is active (enabling nested-task tracking) and snapshots the current `RuntimeContext`, which at present carries the dispatch timeout setting. (Code references: `src/wool/runtime/routine/wrapper.py`, `src/wool/runtime/routine/task.py`.)

**Serialization and transport.** The proxy delegates to the load balancer, which selects a target worker from its pool and invokes `WorkerConnection.dispatch()`. The connection acquires a pooled gRPC channel keyed by the tuple `(target, credentials, channel_options)` and obtains a concurrency semaphore slot sized by `max_concurrent_streams`. The `Task` is serialized to protobuf via `to_protobuf()`: the callable, args, and kwargs are each individually pickled with `cloudpickle`, while the proxy reference is always cloudpickled regardless of serializer. The serialized protobuf is wrapped in a `Request` message and written to a bidirectional gRPC stream. (Code reference: `src/wool/runtime/worker/connection.py`.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The load balancer will no longer be responsible for dispatching tasks via WorkerConnection once #156 is implemented. It will be solely responsible for delegating a worker back to the proxy, which will hence be responsible for the actual dispatch.

Comment thread WHITEPAPER.md

**Serialization and transport.** The proxy delegates to the load balancer, which selects a target worker from its pool and invokes `WorkerConnection.dispatch()`. The connection acquires a pooled gRPC channel keyed by the tuple `(target, credentials, channel_options)` and obtains a concurrency semaphore slot sized by `max_concurrent_streams`. The `Task` is serialized to protobuf via `to_protobuf()`: the callable, args, and kwargs are each individually pickled with `cloudpickle`, while the proxy reference is always cloudpickled regardless of serializer. The serialized protobuf is wrapped in a `Request` message and written to a bidirectional gRPC stream. (Code reference: `src/wool/runtime/worker/connection.py`.)

**Acknowledgment.** Upon receiving the first `Request`, the worker-side `WorkerService.dispatch()` deserializes the `Task` from protobuf, optionally evaluates a backpressure hook, and responds with an `Ack` to signal acceptance. If the worker is shutting down, it aborts with `UNAVAILABLE`; if backpressure rejects the task, it aborts with `RESOURCE_EXHAUSTED`. On the client side, the connection validates the `Ack` and, if it receives a `Nack` or an unexpected payload, raises an `RpcError` or `UnexpectedResponse`. Transient gRPC errors (`UNAVAILABLE`, `DEADLINE_EXCEEDED`, `RESOURCE_EXHAUSTED`) are wrapped as `TransientRpcError` so the load balancer can retry on a different worker. (Code references: `src/wool/runtime/worker/service.py`, `src/wool/runtime/worker/connection.py`.)

@conradbzura conradbzura Apr 8, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rephrase "optionally evaluates a backpressure hook, and..." to "determines whether to apply back pressure, and, if not, ...".

Also, with #156, "load balancer can retry on a different worker" will no longer be accurate – the proxy will be retrying on another worker delegated by the load balancer.

Comment thread WHITEPAPER.md

**Acknowledgment.** Upon receiving the first `Request`, the worker-side `WorkerService.dispatch()` deserializes the `Task` from protobuf, optionally evaluates a backpressure hook, and responds with an `Ack` to signal acceptance. If the worker is shutting down, it aborts with `UNAVAILABLE`; if backpressure rejects the task, it aborts with `RESOURCE_EXHAUSTED`. On the client side, the connection validates the `Ack` and, if it receives a `Nack` or an unexpected payload, raises an `RpcError` or `UnexpectedResponse`. Transient gRPC errors (`UNAVAILABLE`, `DEADLINE_EXCEEDED`, `RESOURCE_EXHAUSTED`) are wrapped as `TransientRpcError` so the load balancer can retry on a different worker. (Code references: `src/wool/runtime/worker/service.py`, `src/wool/runtime/worker/connection.py`.)

**Execution and result return.** For coroutine tasks, the worker schedules execution on the worker event loop (see Section 5.3), awaits the result, serializes it with `cloudpickle`, and sends it as a single `Message` response. For async generator tasks, execution follows a streaming protocol described in Section 5.2. In both cases, the `Task` enters its own context manager during execution: it sets itself as the `_current_task` context variable (so nested tasks can record their caller), restores the captured `RuntimeContext`, and sets `do_dispatch(False)` to prevent re-dispatch of the immediate callable while still allowing nested `@wool.routine` calls within its body to dispatch normally. (Code reference: `src/wool/runtime/routine/task.py`.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't mention _current_task explicitly, just describe the behavior, e.g., "it sets itself as the current task via context variable".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Draft technical whitepaper for Wool

1 participant