Skip to content

l4proxy: run active health checks on dynamic upstreams#430

Open
tannevaled wants to merge 4 commits into
mholt:masterfrom
tannevaled:feat/active-health-dynamic
Open

l4proxy: run active health checks on dynamic upstreams#430
tannevaled wants to merge 4 commits into
mholt:masterfrom
tannevaled:feat/active-health-dynamic

Conversation

@tannevaled

Copy link
Copy Markdown

What

Extends the active health checker to also run on dynamically-discovered upstreams (DNS SRV/A from #429), not just statically-configured ones. Each interval it polls the current dynamic set and health-checks every discovered peer.

Why

Today active health checks run only on static upstreams (same as the HTTP reverse_proxy). That means a cluster discovered via DNS can't be health-gated. With this change, a discovered cluster becomes usable with health-based routing and no external coordinator — for example:

  • discover the cluster members via dynamic srv/dynamic a, then
  • gate routing on an HTTP health check (e.g. Patroni's /primary, which returns 200 only on the leader)

so the proxy follows the primary without etcd/Consul/a sidecar controller.

Notes

  • Discovery for health-checking uses a bare caddy.Replacer (there is no connection), so dynamic sources used with active health checks must not rely on connection-scoped placeholders. Discovery is otherwise shared with the request path (same cache, same pooled peers), so marks are visible to selection.
  • This intentionally goes beyond caddyhttp/reverseproxy, which does not actively health-check dynamic upstreams. Happy to discuss whether you'd prefer it gated behind an option.
  • Stacked on l4proxy: add dynamic upstreams via DNS (SRV and A/AAAA) #429 (dynamic upstreams); the diff includes it until that merges, after which I'll rebase.

Tests

dynamichealth_test.go: a dynamic source returns a dead address; after a health-check pass the discovered peer is marked unhealthy. go test ./modules/l4proxy/ -race passes; gofmt / go vet / golangci-lint clean.

Add an UpstreamSource mechanism so the backend set can be discovered at
runtime instead of being listed statically, with two DNS sources:

  - layer4.proxy.upstreams.srv: resolves SRV records (service/proto/name).
  - layer4.proxy.upstreams.a:   resolves A/AAAA records for a name, using a
    configured port (fits clusters where all members share a port).

Caddyfile: dynamic <source> { ... }. Results are cached per name and
refreshed (refresh / grace_period / dial_network). When dynamic upstreams
are configured the static list may be empty. Discovered peers are drawn
from the shared peer pool, so passive health and connection counts persist
across refreshes.

UpstreamSource.GetUpstreams takes the connection's *caddy.Replacer rather
than the connection itself, keeping discovery decoupled from a live
connection.

Mirrors caddyhttp/reverseproxy's dynamic srv/a sources. Note: active health
checks still run only on statically-configured upstreams (same limitation
as the HTTP reverse_proxy).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
tannevaled and others added 3 commits June 3, 2026 14:47
- Document dynamic_upstreams (dynamic srv / dynamic a) in docs/handlers/proxy.md.
- Add a caddyfile_adapt integration test for `dynamic srv`.
- Add tests covering the grace-period path, cache bounding, and
  newDynamicUpstream's invalid-address error. The per-record "skip invalid
  target" branch is defensive and unreachable for well-formed DNS (SRV/A
  always yield a numeric port), so it is intentionally left uncovered.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Active health checks previously ran only on statically-configured
upstreams, so a set discovered dynamically (DNS SRV/A) could not be
health-gated. Extend the checker to also poll the currently-discovered
upstreams each interval.

This makes a discovered cluster usable with health-based routing with no
external coordinator: e.g. discover members via DNS, then route only to
the node whose HTTP /primary check passes. Discovery for health checks
uses a bare replacer (connection-independent), so dynamic sources used
this way must not rely on connection-scoped placeholders.

Note: this intentionally goes beyond caddyhttp/reverseproxy, which does
not actively health-check dynamic upstreams.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Update docs/handlers/proxy.md to state that active health checks now run
  on dynamically-discovered upstreams too (re-resolved each interval).
- Add a test covering the path where the dynamic upstream source errors
  during a health-check pass (logged and swallowed).
(Rebased on the dynamic-srv-upstreams branch.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@tannevaled tannevaled force-pushed the feat/active-health-dynamic branch from 8065f8f to 5f55324 Compare June 3, 2026 12:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant