l4proxy: add dynamic upstreams via DNS (SRV and A/AAAA) #429
Add an UpstreamSource mechanism so the backend set can be discovered at
runtime instead of being listed statically, with two DNS sources:
- layer4.proxy.upstreams.srv: resolves SRV records (service/proto/name).
- layer4.proxy.upstreams.a: resolves A/AAAA records for a name, using a
configured port (fits clusters where all members share a port).
Caddyfile: dynamic <source> { ... }. Results are cached per name and
refreshed (refresh / grace_period / dial_network). When dynamic upstreams
are configured the static list may be empty. Discovered peers are drawn
from the shared peer pool, so passive health and connection counts persist
across refreshes.
UpstreamSource.GetUpstreams takes the connection's *caddy.Replacer rather
than the connection itself, keeping discovery decoupled from a live
connection.
Mirrors caddyhttp/reverseproxy's dynamic srv/a sources. Note: active health
checks still run only on statically-configured upstreams (same limitation
as the HTTP reverse_proxy).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
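The commit message says discovered peers are drawn from the shared peer pool, so passive health and connection counts persist across refreshes. The following is a minimal sketch of why that works. It assumes the pool is keyed by dial address, so a re-resolved address maps back to the same peer object. The `peer`, `peerPool`, `getPeer`, and `peersFor` names are hypothetical stand-ins, and `sync.Map` is used here in place of whatever pool type l4proxy actually uses.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// peer is a hypothetical stand-in for a proxied backend; the real
// l4proxy type and its fields may differ.
type peer struct {
	address string
	conns   atomic.Int64 // active connections (used by least-conn style LB)
	fails   atomic.Int64 // passive health failures
}

// peerPool is shared by static and dynamic upstreams and keyed by dial
// address (sketch assumption), so a rediscovered address reuses its peer.
var peerPool sync.Map // map[string]*peer

// getPeer returns the pooled peer for addr, creating it on first use.
func getPeer(addr string) *peer {
	p, _ := peerPool.LoadOrStore(addr, &peer{address: addr})
	return p.(*peer)
}

// peersFor turns a freshly resolved address list into pooled peers.
func peersFor(addrs []string) []*peer {
	peers := make([]*peer, 0, len(addrs))
	for _, a := range addrs {
		peers = append(peers, getPeer(a))
	}
	return peers
}

func main() {
	first := peersFor([]string{"10.0.0.1:5432", "10.0.0.2:5432"})
	first[0].conns.Add(1)
	first[0].fails.Add(1)

	// A later DNS refresh returns the same address again.
	second := peersFor([]string{"10.0.0.1:5432"})
	fmt.Println(second[0] == first[0], second[0].conns.Load(), second[0].fails.Load()) // prints: true 1 1
}
```

Creating peers per refresh instead would reset those counters every time the DNS cache expires, which is what sharing the pool avoids.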
00c0d01 to 7c4e237
For both this and some of your other PRs, there seems to be a lot of duplication/copying of existing code from the HTTP proxy. I think there should be some consideration of how we can deduplicate to reduce the maintenance burden.
Thanks, that's a fair point and worth getting right. You're correct that the duplication is concentrated in two places: the dynamic DNS upstreams here (#429), which mirror […] If you're open to it, I think the clean fix is to factor the transport-neutral parts into reusable helpers in core, then make these caddy-l4 PRs thin adapters: […]
I'm happy to open a companion PR against […] For context on the rest: the other PRs in the series (#425 close-on-unhealthy, #427 rise/fall, #428 weighted LB, #430 active checks on dynamic upstreams) are layer4-native rather than copies, and so are the observability/timeout additions I'm about to push. If you see specific places there that you'd like factored out, point me at them and I'll fold them into the same de-duplication pass.
To make the de-duplication concrete rather than hypothetical, I opened a draft RFC on core: caddyserver/caddy#7790. It extracts the SRV resolution + caching into a transport-neutral […] Happy to adjust naming, placement, or scope based on your preference.
Also, thanks for taking the time to look at these, and apologies for the burst of PRs arriving all at once; I realize that's a lot to land on a maintainer's plate. There's genuinely no urgency on any of them. If it's easier for you, I'm happy to consolidate them, sequence them in whatever order suits your priorities, or close any that aren't a good fit; just say the word and I'll adjust. #7790 is intended to be the de-duplication step you asked about, so that's probably the most useful place to start; the rest can wait until the direction there is settled. Thanks again for the project and the feedback.
- Document dynamic_upstreams (dynamic srv / dynamic a) in docs/handlers/proxy.md.
- Add a caddyfile_adapt integration test for `dynamic srv`.
- Add tests covering the grace-period path, cache bounding, and newDynamicUpstream's invalid-address error.
The per-record "skip invalid target" branch is defensive and unreachable for well-formed DNS (SRV/A always yield a numeric port), so it is intentionally left uncovered.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Would love to see this get merged. The ability to add your own dynamic upstream source via a plugin would be awesome. I maintain caddy-nomad-sd, which can be used for service discovery of upstreams in a Nomad cluster. If the right set of APIs is exposed from caddy-l4, it could possibly be extended to support L4 as well. Happy to help test or contribute.
What
Adds dynamic upstreams to the layer4 proxy so the backend set can be discovered at runtime instead of being restated in config, with two DNS sources:
- `layer4.proxy.upstreams.srv`: resolves SRV records (service/proto/name).
- `layer4.proxy.upstreams.a`: resolves A/AAAA records for a name, using a configured `port` (fits clusters where every member shares a port, e.g. a Postgres cluster on 5432 behind one name).
Caddyfile: `dynamic <source> { … }`. Results are cached per name and refreshed (`refresh` / `grace_period` / `dial_network`). When dynamic upstreams are configured, the static `upstream` list may be empty. Discovered peers come from the shared peer pool, so passive health checks and connection counts persist across refreshes. `UpstreamSource.GetUpstreams` takes the connection's `*caddy.Replacer` rather than the connection itself, keeping discovery decoupled from a live connection (and pollable by other callers).
Why
So the L4 config doesn't have to hard-code endpoints that DNS already publishes, which is the common service-discovery case (Consul DNS, Kubernetes headless services, etc.).
Scope / limitations
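- Active health checks still run only on statically configured upstreams (same as `reverse_proxy`'s dynamic upstreams).
- Passive health and connection counting apply to discovered upstreams.
- Mirrors `caddyhttp/reverseproxy`'s `dynamic srv`/`a` design for consistency.
Tests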
- `upstreams_test.go`: SRV and A discovery (record → upstream), caching (one lookup for repeated calls), lookup-error handling, SRV `expandedAddr`, and Caddyfile parsing for both sources (happy path plus missing/unknown source and bad option). DNS is stubbed via injectable lookups, so no network is needed.
- `go test ./modules/l4proxy/` passes; `gofmt`, `go vet`, and `golangci-lint` are clean.