
l4proxy: add rise/fall thresholds to active health checks #427

Open
tannevaled wants to merge 2 commits into mholt:master from tannevaled:feat/health-rise-fall

@tannevaled

What

Adds HAProxy-style rise/fall thresholds to the active health checker:

  • fall (Caddyfile health_fall <int>) — number of consecutive failed checks before a peer is marked unhealthy.
  • rise (Caddyfile health_rise <int>) — number of consecutive successful checks before an unhealthy peer is marked healthy again.

Both default to 1, which is exactly the current behavior (flip on the first result), so this is fully backward compatible.

Why

Today the checker calls setHealthy on every individual check result, so a single transient failure (or success) immediately flips a peer's state. That makes health status flap on any brief blip, which is undesirable, e.g. for database failover, where a momentary timeout shouldn't trigger a switchover. Setting fall 3 / rise 2, for example, smooths that out.
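To make the smoothing concrete, here is a small Go sketch (not the PR's code) that replays a sequence of check results through rise/fall thresholds and prints the peer's health after each check. The `simulate` helper is hypothetical; it assumes a peer starts out healthy and that each streak resets when the opposite result arrives.

```go
package main

import "fmt"

// simulate replays check results (true = passed) through rise/fall
// thresholds and returns the peer's health after each check. The peer is
// assumed to start healthy; both streaks are capped at their thresholds.
func simulate(results []bool, rise, fall int) []bool {
	healthy := true
	successes, failures := 0, 0
	states := make([]bool, 0, len(results))
	for _, passed := range results {
		if passed {
			failures = 0 // a success breaks any failure streak
			if successes < rise {
				successes++
			}
			if successes >= rise {
				healthy = true
			}
		} else {
			successes = 0 // a failure breaks any success streak
			if failures < fall {
				failures++
			}
			if failures >= fall {
				healthy = false
			}
		}
		states = append(states, healthy)
	}
	return states
}

func main() {
	// A single transient failure amid otherwise passing checks.
	blip := []bool{true, true, false, true, true}
	fmt.Println("rise 1 / fall 1:", simulate(blip, 1, 1))
	fmt.Println("rise 2 / fall 3:", simulate(blip, 2, 3))
}
```

With rise 1 / fall 1 (the defaults) the single failure marks the peer unhealthy at the third check; with rise 2 / fall 3 the peer stays healthy throughout.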

Implementation

Per-peer consecutive-success and consecutive-failure streaks are tracked under a mutex (recordActiveCheck) and capped at their thresholds, so the counters stay bounded. A failure streak reaching fall marks the peer unhealthy; a success streak reaching rise marks it healthy again.
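A minimal sketch of that per-peer bookkeeping, assuming a tracker keyed by peer address that tells the caller whether to change the peer's health. The `riseFallTracker` type, its `record` method and the peer name are hypothetical illustrations, not the PR's actual `recordActiveCheck` signature.

```go
package main

import (
	"fmt"
	"sync"
)

// streak counts one peer's consecutive passed and failed checks.
type streak struct {
	successes, failures int
}

// riseFallTracker keeps a streak per peer, guarded by a mutex because
// checks for different peers may run concurrently. (Hypothetical type.)
type riseFallTracker struct {
	mu         sync.Mutex
	rise, fall int
	streaks    map[string]*streak
}

func newRiseFallTracker(rise, fall int) *riseFallTracker {
	// Assumption: unset (zero or negative) thresholds fall back to 1,
	// which reproduces the old flip-on-first-result behavior.
	if rise < 1 {
		rise = 1
	}
	if fall < 1 {
		fall = 1
	}
	return &riseFallTracker{rise: rise, fall: fall, streaks: make(map[string]*streak)}
}

// record registers one check result for peer. When apply is true the
// caller should set the peer's health to healthy; otherwise it leaves the
// current state alone.
func (t *riseFallTracker) record(peer string, passed bool) (healthy, apply bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	s, ok := t.streaks[peer]
	if !ok {
		s = &streak{}
		t.streaks[peer] = s
	}
	if passed {
		s.failures = 0
		if s.successes < t.rise { // cap keeps the counter bounded
			s.successes++
		}
		return true, s.successes >= t.rise
	}
	s.successes = 0
	if s.failures < t.fall {
		s.failures++
	}
	return false, s.failures >= t.fall
}

func main() {
	tr := newRiseFallTracker(2, 3) // rise 2, fall 3
	for i, passed := range []bool{false, false, false, true, true} {
		if healthy, apply := tr.record("db-primary:5432", passed); apply {
			fmt.Printf("check %d: set healthy=%v\n", i+1, healthy)
		} else {
			fmt.Printf("check %d: no change\n", i+1)
		}
	}
}
```

Here the third consecutive failure marks the peer unhealthy and the second consecutive success marks it healthy again. Returning an apply flag instead of calling setHealthy directly keeps the sketch independent of the proxy's peer type; once a threshold is met, further results of the same kind simply re-assert the same state, which is harmless as long as setting health is idempotent.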

Tests

risefall_test.go covers the streak logic (including the reset on an opposite result and the default-to-1 behavior), an end-to-end fall threshold via doActiveHealthCheck, and Caddyfile parsing (the happy path plus duplicate/invalid-value errors). go test ./modules/l4proxy/ passes; gofmt, go vet and golangci-lint are clean.
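For illustration, the validation those parsing tests exercise might look like the standalone sketch below. The real module parses through Caddy's Caddyfile dispenser; here `thresholds` and `apply` are hypothetical, and treating non-integers and values below 1 as invalid is an assumption about what "invalid" means.

```go
package main

import (
	"fmt"
	"strconv"
)

// thresholds holds parsed rise/fall values. Zero means "not set", in which
// case the checker would fall back to the default of 1 (assumption).
type thresholds struct {
	Rise, Fall int
}

// apply handles one `health_rise <int>` or `health_fall <int>` line,
// rejecting duplicates, wrong argument counts and non-positive values.
func (th *thresholds) apply(directive string, args []string) error {
	var dst *int
	switch directive {
	case "health_rise":
		dst = &th.Rise
	case "health_fall":
		dst = &th.Fall
	default:
		return fmt.Errorf("unrecognized directive %q", directive)
	}
	if *dst != 0 {
		return fmt.Errorf("%s: duplicate directive", directive)
	}
	if len(args) != 1 {
		return fmt.Errorf("%s: expected 1 argument, got %d", directive, len(args))
	}
	n, err := strconv.Atoi(args[0])
	if err != nil || n < 1 {
		return fmt.Errorf("%s: invalid value %q (want a positive integer)", directive, args[0])
	}
	*dst = n
	return nil
}

func main() {
	var th thresholds
	fmt.Println(th.apply("health_fall", []string{"3"})) // happy path
	fmt.Println(th.apply("health_rise", []string{"2"})) // happy path
	fmt.Println(th.apply("health_fall", []string{"5"})) // duplicate

	var bad thresholds
	fmt.Println(bad.apply("health_rise", []string{"zero"})) // invalid value
	fmt.Println(th)
}
```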

The active health checker flipped a peer's health on every single check
result, which makes it flap on a transient blip. Add rise/fall thresholds
(HAProxy-style): a peer is marked unhealthy only after `fall` consecutive
failed checks and healthy again only after `rise` consecutive successful
checks.

Caddyfile: health_fall <int>, health_rise <int>. Both default to 1, which
preserves the existing flip-on-first-result behavior. Streaks are tracked
per peer under a mutex and capped at the threshold.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Document the rise/fall active-health-check thresholds in
docs/handlers/proxy.md and add a caddyfile_adapt integration test. The
recordActiveCheck streak logic already has 100% unit-test coverage.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>