Refactor load balancer protocol to delegate worker selection instead of dispatching tasks — Closes #155#156

Open
conradbzura wants to merge 4 commits into main from 155-delegate-loadbalancer-protocol

conradbzura commented Apr 6, 2026

Summary

Split the load balancer protocol so that balancers own only worker selection and routing policy, while WorkerProxy owns the dispatch-retry-evict loop. Introduce DelegatingLoadBalancerLike — a protocol whose single delegate method yields (WorkerMetadata, WorkerConnection) candidates and receives dispatch outcomes from the proxy via asend (success) and athrow (failure). The proxy calls WorkerConnection.dispatch directly, classifies errors as transient vs non-transient, evicts workers on non-transient failures, and enforces a contract that asend is terminal — yielding after a success signal raises RuntimeError.

The previous LoadBalancerLike protocol is renamed to DispatchingLoadBalancerLike and deprecated with both a @typing_extensions.deprecated annotation and a runtime DeprecationWarning at WorkerProxy.start(). LoadBalancerLike becomes a transitional union alias that narrows to DelegatingLoadBalancerLike in the next major release. RoundRobinLoadBalancer migrates to the new protocol and reduces to a pure cycling generator with no dispatch, error-classification, or eviction code.
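A minimal sketch of the runtime half of the deprecation, assuming a synchronous `start()` for brevity. `ProxySketch`, `LegacyBalancer`, and `NewBalancer` are hypothetical stand-ins, the warning text is illustrative, and skipping the warning for classes that implement both protocols is an assumption rather than something this PR states.

```python
import warnings
from typing import Protocol, runtime_checkable


@runtime_checkable
class DelegatingLoadBalancerLike(Protocol):
    def delegate(self, *, context): ...


@runtime_checkable
class DispatchingLoadBalancerLike(Protocol):
    def dispatch(self, task, *, context, timeout=None): ...


class ProxySketch:
    """Hypothetical stand-in for WorkerProxy; only start() is modelled."""

    def __init__(self, loadbalancer):
        self._loadbalancer = loadbalancer

    def start(self):
        lb = self._loadbalancer
        # Assumption: only balancers that are legacy-only trigger the warning.
        legacy_only = isinstance(
            lb, DispatchingLoadBalancerLike
        ) and not isinstance(lb, DelegatingLoadBalancerLike)
        if legacy_only:
            warnings.warn(
                "DispatchingLoadBalancerLike is deprecated; implement "
                "DelegatingLoadBalancerLike.delegate instead.",
                DeprecationWarning,
                stacklevel=2,
            )


class LegacyBalancer:
    async def dispatch(self, task, *, context, timeout=None):
        raise NotImplementedError


class NewBalancer:
    async def delegate(self, *, context):
        yield  # never driven in this sketch
```

Starting a `ProxySketch` around `LegacyBalancer()` emits one `DeprecationWarning`; starting one around `NewBalancer()` emits none.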

A new read-only LoadBalancerContextView protocol enforces at the type level that delegating balancers cannot mutate pool membership. The proxy passes LoadBalancerContext (which satisfies both the view and the full mutable protocol) but the balancer's type annotation only sees the read-only surface.

Cancellation semantics are preserved from the pre-refactor behavior: CancelledError propagates through the proxy without eviction or retry, and a stream-ownership handoff via a sentinel variable ensures the gRPC call is released if the proxy unwinds between connection.dispatch returning and the stream being handed to the caller.

Closes #155

Proposed changes

New protocols and read-only context (loadbalancer/base.py)

Add three new public types:

  • LoadBalancerContextView — runtime-checkable protocol exposing only the workers property. Passed to DelegatingLoadBalancerLike.delegate so balancers can observe pool membership without mutating it.
  • DelegatingLoadBalancerLike — runtime-checkable protocol with a single delegate(*, context) async generator method. The generator supports three proxy-driven signals: anext() (next candidate), athrow(exc) (dispatch failed), and asend(metadata) (dispatch succeeded — generator must terminate).
  • DispatchingLoadBalancerLike — the former LoadBalancerLike, renamed and annotated with @deprecated. Retains the old dispatch(task, *, context, timeout) signature for backwards compatibility.

LoadBalancerLike becomes a TypeAlias for DispatchingLoadBalancerLike | DelegatingLoadBalancerLike. LoadBalancerContextLike now inherits from LoadBalancerContextView, preserving backwards compatibility for the deprecated path.
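The shape of the two new protocols can be sketched roughly as follows. `WorkerMetadata`, `WorkerConnection`, `Context`, and `FirstWorkerBalancer` are simplified stand-ins invented for illustration, and typing `workers` as a mapping from metadata to connection is an assumption; only the `delegate` return annotation mirrors the signature shown in this PR's diff.

```python
from dataclasses import dataclass
from typing import (
    AsyncGenerator,
    Mapping,
    Optional,
    Protocol,
    Tuple,
    runtime_checkable,
)


@dataclass(frozen=True)
class WorkerMetadata:
    """Hypothetical stand-in; the real type carries more fields."""

    uid: str


class WorkerConnection:
    """Hypothetical stand-in for the real connection type."""


@runtime_checkable
class LoadBalancerContextView(Protocol):
    # Read-only surface: no add/remove methods are visible to the balancer.
    @property
    def workers(self) -> Mapping[WorkerMetadata, WorkerConnection]: ...


@runtime_checkable
class DelegatingLoadBalancerLike(Protocol):
    def delegate(
        self, *, context: LoadBalancerContextView
    ) -> AsyncGenerator[
        Tuple[WorkerMetadata, WorkerConnection],
        Optional[WorkerMetadata],
    ]: ...


class FirstWorkerBalancer:
    """Toy balancer: offers each worker once, in context order."""

    async def delegate(self, *, context):
        for metadata, connection in list(context.workers.items()):
            try:
                sent = yield metadata, connection
            except Exception:
                continue  # athrow: dispatch failed, offer the next worker
            if sent is not None:
                return  # asend: dispatch succeeded, terminate


class Context:
    """Toy context exposing only the read-only workers property."""

    def __init__(self, workers):
        self._workers = dict(workers)

    @property
    def workers(self):
        return self._workers
```

Because both protocols are runtime-checkable, `isinstance(FirstWorkerBalancer(), DelegatingLoadBalancerLike)` and `isinstance(Context({}), LoadBalancerContextView)` both hold. These checks only test that the attribute exists, not its signature.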

RoundRobinLoadBalancer migration (loadbalancer/roundrobin.py)

Replace dispatch with delegate. The implementation reduces to:

  1. Read the index for the context, wrap at the end
  2. Yield (metadata, connection)
  3. On athrow (failure): advance to next worker
  4. On asend with non-None value (success): return
  5. On full cycle (checkpoint UID match): return → proxy raises NoWorkersAvailable

The _lock now scopes only the index read/advance, not the yield — eliminating the pre-refactor risk of holding the lock across a slow connection.dispatch.
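The five steps above can be sketched as a small cycling generator. This is an illustration under stated assumptions, not the code in roundrobin.py: the cursor is keyed by `id(context)`, it advances at read time, and a set of already-offered UIDs replaces the single checkpoint UID so the loop still terminates if the first candidate has been evicted. `RoundRobinSketch`, `Context`, and `WorkerMetadata` are hypothetical names.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerMetadata:
    uid: str  # hypothetical minimal metadata


class Context:
    def __init__(self, workers):
        self.workers = dict(workers)


class RoundRobinSketch:
    def __init__(self):
        self._index = {}  # per-context cursor, keyed by id(context)
        self._lock = asyncio.Lock()

    async def delegate(self, *, context):
        offered = set()  # UIDs already yielded in this cycle
        while True:
            # The lock covers only the cursor read/advance, never the yield.
            async with self._lock:
                workers = list(context.workers.items())
                if not workers:
                    return
                index = self._index.get(id(context), 0) % len(workers)
                self._index[id(context)] = index + 1
            metadata, connection = workers[index]
            if metadata.uid in offered:
                return  # full cycle: every remaining worker was tried
            offered.add(metadata.uid)
            try:
                sent = yield metadata, connection
            except Exception:
                continue  # athrow: move on to the next worker
            if sent is not None:
                return  # asend: success is terminal
```

Driving the generator with `__anext__()`, then `athrow(exc)` for each failure, then `asend(metadata)` on success walks the workers in order and ends with `StopAsyncIteration`.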

Proxy-owned dispatch loop (worker/proxy.py)

WorkerProxy.dispatch branches on the balancer's protocol at dispatch time:

  • DelegatingLoadBalancerLike: route through _delegate_dispatch, which owns the full retry-evict loop. The loop uses except Exception (not BaseException) for non-transient errors so CancelledError, KeyboardInterrupt, and SystemExit propagate without eviction or retry. An inner try/finally with a stream = None sentinel ensures the gRPC stream is aclosed if the success path unwinds before handoff (cancellation during asend, or RuntimeError from the contract check).
  • DispatchingLoadBalancerLike: fall through to the legacy lb.dispatch(task, context=..., timeout=...) call. A DeprecationWarning is emitted once at start().

isinstance(lb, DelegatingLoadBalancerLike) is checked first so classes implementing both protocols during migration use the new path.
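A rough sketch of the proxy-owned loop described above, written as a free function for readability. The error classes are local stand-ins, `context.remove_worker` is a hypothetical eviction method, and the real `_delegate_dispatch` may differ in structure; the sketch only illustrates the signal order and the stream sentinel.

```python
import asyncio


class TransientRpcError(Exception):
    """Stand-in for the transient error class named in the test table."""


class NoWorkersAvailable(Exception):
    """Stand-in for the exhaustion error raised by the proxy."""


async def delegate_dispatch(balancer, context, task, *, timeout=None):
    """Hypothetical sketch of the retry-evict loop."""
    gen = balancer.delegate(context=context)
    try:
        try:
            metadata, connection = await gen.__anext__()
        except StopAsyncIteration:
            raise NoWorkersAvailable from None
        while True:
            try:
                stream = await connection.dispatch(task, timeout=timeout)
            except TransientRpcError as exc:
                failure = exc  # skip this worker but keep it in the pool
            except Exception as exc:
                # CancelledError is a BaseException, so it is not caught here.
                context.remove_worker(metadata)  # evict before athrow
                failure = exc
            else:
                try:
                    try:
                        await gen.asend(metadata)
                    except StopAsyncIteration:
                        pass  # expected: asend is terminal
                    else:
                        raise RuntimeError("balancer yielded after asend")
                    # Hand ownership to the caller and clear the sentinel.
                    handed_off, stream = stream, None
                    return handed_off
                finally:
                    if stream is not None:
                        await stream.aclose()  # unwound before handoff
            try:
                metadata, connection = await gen.athrow(failure)
            except StopAsyncIteration:
                raise NoWorkersAvailable from None
    finally:
        await gen.aclose()
```

Clearing `stream` only after `asend` completes means any exception raised in between, including cancellation, reaches the `finally` with the stream still owned by the proxy, which then closes it.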

Documentation updates

Update all affected READMEs and docstrings for the new protocol:

  • Rewrite loadbalancer/README.md around the delegate protocol. Document the three-signal contract, the asend-is-terminal rule, and the deprecation path. Replace the LeastLoadedBalancer example with a delegate-based sketch demonstrating in-flight tracking via asend/athrow.
  • Update WorkerProxy and WorkerPool class docstring examples from stale dispatch-override patterns to delegate-based examples.
  • Update worker/README.md error classification table header from "Load balancer behavior" to "Dispatch behavior" reflecting the proxy's ownership of retry/eviction.
  • Fix discovery/README.md reference to the deprecated size parameter (now spawn).

Public exports (wool/__init__.py)

Export DelegatingLoadBalancerLike, DispatchingLoadBalancerLike, and LoadBalancerContextView. LoadBalancerLike continues to be exported as the union alias.

Test cases

| # | Test Suite | Given | When | Then | Coverage Target |
|---|---|---|---|---|---|
| 1 | TestLoadBalancerContextView | A class with only a workers property | Checked against LoadBalancerContextView | Satisfies the protocol | View conformance |
| 2 | TestLoadBalancerContextView | A class missing the workers property | Checked against LoadBalancerContextView | Does not satisfy the protocol | View negative conformance |
| 3 | TestLoadBalancerContextView | A concrete LoadBalancerContext | Checked against LoadBalancerContextView | Satisfies the protocol | View-to-concrete compatibility |
| 4 | TestDispatchingLoadBalancerLike | A class with dispatch | Checked against DispatchingLoadBalancerLike | Satisfies the protocol | Deprecated protocol conformance |
| 5 | TestDispatchingLoadBalancerLike | A class without dispatch | Checked against DispatchingLoadBalancerLike | Does not satisfy the protocol | Deprecated protocol negative |
| 6 | TestDelegatingLoadBalancerLike | A class with delegate as async generator | Checked against DelegatingLoadBalancerLike | Satisfies the protocol | New protocol conformance |
| 7 | TestDelegatingLoadBalancerLike | A class without delegate | Checked against DelegatingLoadBalancerLike | Does not satisfy the protocol | New protocol negative |
| 8 | TestLoadBalancerLike | A delegating implementation | Checked against union alias | Satisfies the union | Union delegating member |
| 9 | TestLoadBalancerLike | A dispatching implementation | Checked against union alias | Satisfies the union | Union dispatching member |
| 10 | TestLoadBalancerLike | A class with neither protocol | Checked against union alias | Does not satisfy the union | Union negative |
| 11 | TestLoadBalancerLike | A class implementing both protocols | Checked against both individually | Satisfies both | Migration edge case |
| 12 | TestRoundRobinLoadBalancer | A RoundRobinLoadBalancer instance | Checked against DelegatingLoadBalancerLike | Satisfies the protocol | Protocol conformance |
| 13 | TestRoundRobinLoadBalancer | A balancer driven through one delegate cycle | Pickled and unpickled, then driven on the same context | Restored instance starts from position zero | Pickle roundtrip via public API |
| 14 | TestRoundRobinLoadBalancer | An empty context | delegate() driven with anext | StopAsyncIteration on first call | Empty-context exhaustion |
| 15 | TestRoundRobinLoadBalancer | A single-worker context | delegate() driven with anext | Yields the single worker | First-yield behavior |
| 16 | TestRoundRobinLoadBalancer | A multi-worker context after one successful dispatch | delegate() driven again | Next candidate is workers[1] | Post-asend index advance |
| 17 | TestRoundRobinLoadBalancer | A multi-worker context, first candidate yielded | Proxy reports failure via athrow | Next candidate is the subsequent worker | Post-athrow advance |
| 18 | TestRoundRobinLoadBalancer | N workers all failing | athrow called repeatedly | Exactly N candidates yielded before StopAsyncIteration | Full-cycle checkpoint exhaustion |
| 19 | TestRoundRobinLoadBalancer | A multi-worker context where the first candidate is evicted | athrow called after context mutation | Next candidate drawn from the mutated context | Context-mutation reactivity |
| 20 | TestRoundRobinLoadBalancer | A multi-worker context, first candidate yielded | asend(metadata) called | StopAsyncIteration raised | asend-terminal contract |
| 21 | TestRoundRobinLoadBalancer | 4 workers | 4 concurrent delegate() drivers each completing with asend | Each lands on a distinct worker | Concurrent fairness |
| 22 | TestRoundRobinLoadBalancer | A multi-worker context after proxy-driven eviction | athrow with non-transient error | Next candidate yielded from remaining workers | Non-transient parity |
| 23 | TestRoundRobinLoadBalancer | N workers (2-8) and F failures (0 to N-1) followed by success | Generator driven through the sequence | Yielded candidates are workers[0..F] in order | Hypothesis round-robin sequence |
| 24 | TestWorkerProxyDelegateDispatch | Two workers, first raises TransientRpcError, second succeeds | dispatch() called | Successful stream returned, neither worker evicted | Transient skip without eviction |
| 25 | TestWorkerProxyDelegateDispatch | Two workers, first raises generic Exception, second succeeds | dispatch() called | First worker evicted before balancer sees athrow, second succeeds | Non-transient evict-before-athrow |
| 26 | TestWorkerProxyDelegateDispatch | A delegate that yields zero candidates | dispatch() called | NoWorkersAvailable raised via initial-anext path | Empty-generator exhaustion |
| 27 | TestWorkerProxyDelegateDispatch | One worker raising non-transient error, no further candidates | dispatch() called | NoWorkersAvailable raised, worker evicted | Non-transient athrow exhaustion |
| 28 | TestWorkerProxyDelegateDispatch | One worker raising TransientRpcError, no further candidates | dispatch() called | NoWorkersAvailable raised | Transient exhaustion |
| 29 | TestWorkerProxyDelegateDispatch | A recording balancer with one worker | dispatch() succeeds | Balancer observed asend(metadata) equal to the winning worker | Success notification via asend |
| 30 | TestWorkerProxyDelegateDispatch | A malformed balancer that yields after asend | dispatch() called | RuntimeError raised, orphaned stream aclosed | Contract enforcement + stream handoff |
| 31 | TestWorkerProxyDelegateDispatch | One worker whose connection.dispatch hangs | Outer task cancelled mid-dispatch | CancelledError propagates, worker not evicted, no retry | Cancellation during dispatch |
| 32 | TestWorkerProxyDelegateDispatch | A balancer hanging during asend bookkeeping | Outer task cancelled mid-asend | Spy stream's aclose called, CancelledError propagates | Cancellation during asend |
| 33 | TestWorkerProxyDelegateDispatch | A legacy DispatchingLoadBalancerLike only | proxy.start() called | DeprecationWarning emitted referencing DelegatingLoadBalancerLike | Legacy deprecation warning |
| 34 | TestWorkerProxyDelegateDispatch | A started proxy with a legacy DispatchingLoadBalancerLike | dispatch() called | Legacy dispatch method invoked and stream returned | Legacy path end-to-end |
| 35 | test_public | The wool.__all__ list | Compared to expected list | DelegatingLoadBalancerLike, DispatchingLoadBalancerLike, LoadBalancerContextView present | Public API surface |

conradbzura force-pushed the 155-delegate-loadbalancer-protocol branch from 7b4937b to 1144ba2 on April 6, 2026 01:03
conradbzura marked this pull request as ready for review on April 6, 2026 01:05
conradbzura force-pushed the 155-delegate-loadbalancer-protocol branch from 1144ba2 to 0439c22 on April 6, 2026 01:21
conradbzura self-assigned this on Apr 6, 2026

Split the LoadBalancerLike protocol into two: the new
DelegatingLoadBalancerLike, which only selects worker candidates, and
the deprecated DispatchingLoadBalancerLike, which retains the old
responsibility of owning task dispatch. LoadBalancerLike becomes a
transitional union alias that narrows to the delegating protocol in
the next major release.

A delegating load balancer implements delegate(*, context), an async
generator that yields (WorkerMetadata, WorkerConnection) candidates
and receives dispatch outcomes from the proxy:

  - anext() requests the next candidate
  - athrow(exc) reports a failure; the proxy evicts the worker from
    the context before throwing non-transient errors so the balancer
    observes the capacity change
  - asend(metadata) reports a success; the generator MUST terminate
    after this signal — yielding another candidate is a protocol
    violation and surfaces as a RuntimeError from WorkerProxy

The context passed to delegate is typed as the new read-only
LoadBalancerContextView protocol, enforcing at the type level that
load balancers cannot mutate pool membership. Eviction lives in the
proxy exclusively.

WorkerProxy.dispatch now branches on the balancer's protocol and, for
delegating balancers, owns the full dispatch-retry-evict loop via
_delegate_dispatch. The loop catches Exception (not BaseException)
for non-transient errors so CancelledError, KeyboardInterrupt, and
SystemExit propagate without eviction or retry — cancellation is
caller intent, not a worker health signal. A stream-ownership
handoff via a sentinel variable ensures the gRPC call is released if
the proxy unwinds between connection.dispatch returning and the
stream being handed back to the caller. A DeprecationWarning is
emitted once at start() when a DispatchingLoadBalancerLike is passed.

RoundRobinLoadBalancer reduces to a pure cycling generator: no
dispatch, no error classification, no eviction. The checkpoint-by-UID
termination logic and pickle-safe __reduce__ are preserved.

The load balancer README is rewritten around the delegate protocol,
documents the asend-is-terminal contract, and calls out the
deprecation.

Protocol conformance tests in test_base.py now cover all three
protocols — LoadBalancerContextView, DispatchingLoadBalancerLike,
DelegatingLoadBalancerLike — plus the LoadBalancerLike union alias,
including an edge-case test for classes that implement both
protocols during migration.

test_roundrobin.py is rewritten around the delegate API. The 12
tests cover empty context exhaustion, first-yield behavior, index
advancement on asend and athrow, full-cycle exhaustion via
checkpoint, context mutation reactivity (proxy-driven eviction
between yields), asend as terminal signal, concurrent delegate
drivers for fairness, Hypothesis-driven round-robin sequence
verification, and non-transient error handling. The obsolete
dispatch_side_effect_factory fixture is removed from
loadbalancer/conftest.py.

test_proxy.py adds TestWorkerProxyDelegateDispatch to cover the
proxy-owned retry-evict loop. Tests exercise transient errors
(skip without eviction), non-transient errors (evict before
notifying the balancer), candidate exhaustion in both the initial
anext and the non-transient athrow paths, empty delegate behavior,
success notification via asend, the asend-is-terminal contract
(verified via a malformed balancer that raises RuntimeError and
closes the orphaned stream), cancellation during connection.dispatch
(no eviction, no retry, CancelledError propagates) and cancellation
during asend (orphaned stream is aclosed before CancelledError
propagates). Two tests cover the legacy path: the deprecation
warning is emitted at start(), and dispatch through a legacy
DispatchingLoadBalancerLike still works.

Existing dispatch tests in test_proxy.py (spy_loadbalancer_with_workers,
FailingLoadBalancer, WaitingLoadBalancer, StubLoadBalancer) are
updated from the legacy dispatch method to the delegate async
generator. Dead worker_*_callback helpers on the spy balancer —
which the production proxy never called — are removed. The
mock_worker_connection fixture's dispatch stub now accepts the
timeout keyword argument the proxy passes.

test_public.py is updated to expect the three new public exports:
DelegatingLoadBalancerLike, DispatchingLoadBalancerLike,
LoadBalancerContextView.

The test guide requires tests to exercise only public APIs. Three
violations introduced in the prior commit are corrected:

The pickle roundtrip test for RoundRobinLoadBalancer was probing
_index and _lock directly. It now drives delegate() via the public
API before and after pickle, asserting that the restored instance
starts cycling from position zero on the same context.
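One way a restored instance can come back at position zero is a `__reduce__` that rebuilds the object from its constructor and drops runtime state. The sketch below assumes that approach. `CursorSketch` is hypothetical, and `threading.Lock` stands in for whatever lock the balancer actually holds.

```python
import threading


class CursorSketch:
    """Hypothetical balancer-like object holding unpicklable runtime state."""

    def __init__(self):
        self._index = 0
        self._lock = threading.Lock()  # thread locks cannot be pickled

    def next_index(self, size):
        with self._lock:
            index = self._index % size
            self._index = index + 1
            return index

    def __reduce__(self):
        # Rebuild from the constructor; cursor and lock are deliberately dropped.
        return (type(self), ())


balancer = CursorSketch()
balancer.next_index(3)                 # cursor moves from 0 to 1
factory, args = balancer.__reduce__()  # what pickling records
restored = factory(*args)              # what unpickling calls
```

`pickle.loads(pickle.dumps(balancer))` follows the same path when the class is importable, so the copy starts cycling from zero while the original keeps its cursor.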

TestWorkerProxyDelegateDispatch._start_proxy_with_workers was seeding
workers via proxy._loadbalancer_context.add_worker, bypassing the
public discovery flow. Replaced with _make_proxy_with_workers, which
constructs the proxy with a ReducibleAsyncIterator discovery stream,
patches WorkerConnection so the sentinel creates the intended mock
connections, and waits via the public proxy.workers property.

The legacy dispatch test was similarly using _loadbalancer_context
directly; it now uses the same helper. The orphaned empty
loadbalancer conftest.py is deleted.

The loadbalancer README example used except BaseException around the
delegate yield, which would swallow GeneratorExit from aclose() and
CancelledError from task cancellation. Corrected to except Exception
to match the RoundRobinLoadBalancer implementation.
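The difference can be reproduced with two toy async generators. They are illustrative only and are not the README example itself; the names `swallowing`, `corrected`, and `close_after_first` are made up for this sketch.

```python
import asyncio


async def swallowing():
    """Catches BaseException around the yield, as the old example did."""
    for candidate in ("w0", "w1"):
        try:
            yield candidate
        except BaseException:
            continue  # also traps GeneratorExit and CancelledError


async def corrected():
    """Catches Exception only, so close and cancel signals pass through."""
    for candidate in ("w0", "w1"):
        try:
            yield candidate
        except Exception:
            continue


async def close_after_first(factory):
    gen = factory()
    await gen.__anext__()
    try:
        await gen.aclose()
    except RuntimeError as exc:
        return str(exc)
    return "closed cleanly"
```

With `corrected`, `aclose()` returns normally. With `swallowing`, the generator yields again after receiving `GeneratorExit`, and Python raises a `RuntimeError` reporting that the async generator ignored it.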

The worker README error classification table header said "Load
balancer behavior" but post-refactor this is the proxy's
responsibility. Updated to "Dispatch behavior" with clearer action
descriptions (skip vs evict).

The discovery README referenced the deprecated size parameter in its
description of pool modes. Updated to spawn.

The dispatch method accepts a :py:class:`Task` and returns an async
iterator that yields task results from the worker.
class DispatchingLoadBalancerLike(Protocol):

We can't rename this as it's not backwards compatible.

) -> AsyncGenerator[
tuple[WorkerMetadata, WorkerConnection],
WorkerMetadata | None,
]: ...

Add a docstring to delegate.