You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
A worker advertised over LanDiscovery with only IPv6 addresses in its DNS-SD record never surfaces as a discovery event — no worker-added, no error, no log. Since #237, advertise_host accepts IPv6 literals, so the dead end is reachable through supported, validated configuration: the publisher registers the record successfully and every subscriber silently ignores it.
Steps to reproduce:
Publish a worker through LanDiscovery("ns", advertise_host="fd00::1").publisher.
Iterate LanDiscovery("ns").subscriber.
No event ever arrives, and a pool dispatching on that namespace waits forever on the quorum gate.
The IPv6 case is one instance of a broader gap. The zeroconf listener handlers (_handle_add_service / _handle_update_service) drop every unusable announcement silently, across three distinct swallow sites: an unresolvable service-info fetch (None), a malformed record (_deserialize_metadata raising ValueError), and the catch-all except Exception: pass that hides the IPv6 IndexError and any defect in the handler itself. Making the IPv6 record deserialize cleanly removes one cause, but the silent-drop machinery around it stays opaque. Both halves touch the same except blocks, so this issue covers the IPv6 fix together with making the drops observable.
Expected behavior
Address resolution:
Prefer IPv4 when the record carries both address families (current behavior for V4-bearing records is unchanged).
Fall back to the first IPv6 address when the record is IPv6-only, formatted bracketed — [fd00::1]:<port> — so the address survives host/port splitting (the bracket-aware _split_host_port introduced in Make LanDiscovery-published worker pools reachable across hosts #237 already parses this form).
Raise ValueError from _deserialize_metadata when a record has no usable address — the zeroconf listeners already catch ValueError and skip deliberately — instead of leaking IndexError into the broad except Exception: pass handler.
Observability of dropped records — the drops are a rate phenomenon (the operator signal is "malformed records/sec is climbing," indicating a version-skewed or buggy peer on the namespace), so the production-facing answer is a metric, not a per-event log and not a warning. Treat the three sites by character:
Unresolvable fetch (service_info is None): expected churn, i.e., a service caught mid-registration, a browse racing an unregister, or TTL expiry. logger.debug at most; not an anomaly.
Malformed record (ValueError, including the address-less case above): a peer published something this version cannot parse. logger.warning with structured fields (the service name, the reason), not a bare formatted message.
Catch-all: narrow except Exception: pass and use logger.exception so an unexpected failure carries a traceback instead of vanishing. This is the site that currently hides both the IPv6 IndexError and any bug in the handler itself.
Emit through stdlib logging with structured extra={} fields, not f-string-interpolated messages, so a future OpenTelemetry log bridge exports real attributes instead of re-parsing strings; and name the counter these sites would feed — wool.discovery.records_dropped, dimensioned by {reason, backend} — as the eventual metric, without taking an OTel dependency now. Do not route these data-plane drops through warnings.warn: warnings are stderr-bound, deduplicated by source location (so they under-report a repeating bad record), and lossy under OTel log capture. Reserve the warning categories for deterministic, developer-facing configuration signals, e.g., LoopbackAdvertisementWarning. The diagnostic vocabulary should be discovery-agnostic — LocalDiscovery's subscriber has the analogous malformed-record drop — so the logging keys and the eventual counter apply to both backends.
Root cause
_deserialize_metadata (wool/src/wool/runtime/discovery/lan.py) resolves the advertised address with a V4-only lookup and indexes the first entry:
For an IPv6-only record the list is empty and the IndexError propagates into _handle_add_service / _handle_update_service, whose except Exception: pass swallows it — the worker is invisible to subscribers with no trace.
Implementation notes:
Verify gRPC dial-target compatibility for the bracketed-IPv6 WorkerMetadata.address form (grpc.aio.insecure_channel("[fd00::1]:50051")) before relying on the fallback for dispatch.
Unit tests: a V6-only record surfaces with a bracketed address; a V4+V6 record prefers V4; an address-less record is skipped without breaking the event stream — via the real-zeroconf ServiceInfo registration pattern in test___aiter___ignores_malformed_zeroconf_service. Assert the differentiated diagnostics with caplog: a malformed record logs at WARNING with the structured reason, an unresolvable fetch stays at DEBUG, and the catch-all path logs an exception with traceback.
Description
A worker advertised over
LanDiscoverywith only IPv6 addresses in its DNS-SD record never surfaces as a discovery event — noworker-added, no error, no log. Since #237,advertise_hostaccepts IPv6 literals, so the dead end is reachable through supported, validated configuration: the publisher registers the record successfully and every subscriber silently ignores it.Steps to reproduce:
LanDiscovery("ns", advertise_host="fd00::1").publisher.LanDiscovery("ns").subscriber.The IPv6 case is one instance of a broader gap. The zeroconf listener handlers (
_handle_add_service/_handle_update_service) drop every unusable announcement silently, across three distinct swallow sites: an unresolvable service-info fetch (None), a malformed record (_deserialize_metadataraisingValueError), and the catch-allexcept Exception: passthat hides the IPv6IndexErrorand any defect in the handler itself. Making the IPv6 record deserialize cleanly removes one cause, but the silent-drop machinery around it stays opaque. Both halves touch the sameexceptblocks, so this issue covers the IPv6 fix together with making the drops observable.Expected behavior
Address resolution:
[fd00::1]:<port>— so the address survives host/port splitting (the bracket-aware_split_host_portintroduced in Make LanDiscovery-published worker pools reachable across hosts #237 already parses this form).ValueErrorfrom_deserialize_metadatawhen a record has no usable address — the zeroconf listeners already catchValueErrorand skip deliberately — instead of leakingIndexErrorinto the broadexcept Exception: passhandler.Observability of dropped records — the drops are a rate phenomenon (the operator signal is "malformed records/sec is climbing," indicating a version-skewed or buggy peer on the namespace), so the production-facing answer is a metric, not a per-event log and not a warning. Treat the three sites by character:
service_info is None): expected churn, i.e., a service caught mid-registration, a browse racing an unregister, or TTL expiry.logger.debugat most; not an anomaly.ValueError, including the address-less case above): a peer published something this version cannot parse.logger.warningwith structured fields (the service name, the reason), not a bare formatted message.except Exception: passand uselogger.exceptionso an unexpected failure carries a traceback instead of vanishing. This is the site that currently hides both the IPv6IndexErrorand any bug in the handler itself.Emit through stdlib
loggingwith structuredextra={}fields, not f-string-interpolated messages, so a future OpenTelemetry log bridge exports real attributes instead of re-parsing strings; and name the counter these sites would feed —wool.discovery.records_dropped, dimensioned by{reason, backend}— as the eventual metric, without taking an OTel dependency now. Do not route these data-plane drops throughwarnings.warn: warnings are stderr-bound, deduplicated by source location (so they under-report a repeating bad record), and lossy under OTel log capture. Reserve the warning categories for deterministic, developer-facing configuration signals, e.g.,LoopbackAdvertisementWarning. The diagnostic vocabulary should be discovery-agnostic —LocalDiscovery's subscriber has the analogous malformed-record drop — so the logging keys and the eventual counter apply to both backends.Root cause
_deserialize_metadata(wool/src/wool/runtime/discovery/lan.py) resolves the advertised address with a V4-only lookup and indexes the first entry:For an IPv6-only record the list is empty and the
IndexErrorpropagates into_handle_add_service/_handle_update_service, whoseexcept Exception: passswallows it — the worker is invisible to subscribers with no trace.Implementation notes:
WorkerMetadata.addressform (grpc.aio.insecure_channel("[fd00::1]:50051")) before relying on the fallback for dispatch.ServiceInforegistration pattern intest___aiter___ignores_malformed_zeroconf_service. Assert the differentiated diagnostics withcaplog: a malformed record logs at WARNING with the structured reason, an unresolvable fetch stays at DEBUG, and the catch-all path logs an exception with traceback.