l4proxy: run active health checks on dynamic upstreams#430
Open
tannevaled wants to merge 4 commits into
Open
Conversation
Add an UpstreamSource mechanism so the backend set can be discovered at
runtime instead of being listed statically, with two DNS sources:
- layer4.proxy.upstreams.srv: resolves SRV records (service/proto/name).
- layer4.proxy.upstreams.a: resolves A/AAAA records for a name, using a
configured port (fits clusters where all members share a port).
Caddyfile: dynamic <source> { ... }. Results are cached per name and
refreshed (refresh / grace_period / dial_network). When dynamic upstreams
are configured the static list may be empty. Discovered peers are drawn
from the shared peer pool, so passive health and connection counts persist
across refreshes.
UpstreamSource.GetUpstreams takes the connection's *caddy.Replacer rather
than the connection itself, keeping discovery decoupled from a live
connection.
Mirrors caddyhttp/reverseproxy's dynamic srv/a sources. Note: active health
checks still run only on statically-configured upstreams (same limitation
as the HTTP reverse_proxy).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Document dynamic_upstreams (dynamic srv / dynamic a) in docs/handlers/proxy.md. - Add a caddyfile_adapt integration test for `dynamic srv`. - Add tests covering the grace-period path, cache bounding, and newDynamicUpstream's invalid-address error. The per-record "skip invalid target" branch is defensive and unreachable for well-formed DNS (SRV/A always yield a numeric port), so it is intentionally left uncovered. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Active health checks previously ran only on statically-configured upstreams, so a set discovered dynamically (DNS SRV/A) could not be health-gated. Extend the checker to also poll the currently-discovered upstreams each interval. This makes a discovered cluster usable with health-based routing with no external coordinator: e.g. discover members via DNS, then route only to the node whose HTTP /primary check passes. Discovery for health checks uses a bare replacer (connection-independent), so dynamic sources used this way must not rely on connection-scoped placeholders. Note: this intentionally goes beyond caddyhttp/reverseproxy, which does not actively health-check dynamic upstreams. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Update docs/handlers/proxy.md to state that active health checks now run on dynamically-discovered upstreams too (re-resolved each interval). - Add a test covering the path where the dynamic upstream source errors during a health-check pass (logged and swallowed). (Rebased on the dynamic-srv-upstreams branch.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
8065f8f to
5f55324
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Extends the active health checker to also run on dynamically-discovered upstreams (DNS SRV/A from #429), not just statically-configured ones. Each interval it polls the current dynamic set and health-checks every discovered peer.
Why
Today active health checks run only on static upstreams (same as the HTTP
reverse_proxy). That means a cluster discovered via DNS can't be health-gated. With this change, a discovered cluster becomes usable with health-based routing and no external coordinator — for example:dynamic srv/dynamic a, then/primary, which returns 200 only on the leader)so the proxy follows the primary without etcd/Consul/a sidecar controller.
Notes
caddy.Replacer(there is no connection), so dynamic sources used with active health checks must not rely on connection-scoped placeholders. Discovery is otherwise shared with the request path (same cache, same pooled peers), so marks are visible to selection.caddyhttp/reverseproxy, which does not actively health-check dynamic upstreams. Happy to discuss whether you'd prefer it gated behind an option.Tests
dynamichealth_test.go: a dynamic source returns a dead address; after a health-check pass the discovered peer is marked unhealthy.go test ./modules/l4proxy/ -racepasses;gofmt/go vet/golangci-lintclean.