
Add per-issuer circuit breaker and bounded retry policy #50

@michaelw

Description


Problem

Every protected request can trigger a synchronous call to the token issuer. During issuer brownouts, these unguarded live calls add load to an already degraded issuer and can turn a partial issuer incident into widespread gateway request failures. The plugin currently has no retry budget, no circuit breaker, and no fail-fast issuer health state.

Acceptance Conditions

  • Add a per-token-endpoint-host circuit breaker for transient issuer failures.
  • Breaker opens after a configurable number of consecutive failures, or when the failure ratio over a short window exceeds a configurable threshold.
  • While open, exchange requests fail fast with the existing sanitized server_error fail-closed response.
  • After a configurable cool-down, the breaker transitions to half-open and admits a bounded number of probe requests.
  • Successful probe closes the breaker; failed probe reopens it.
  • Optional retries, if implemented, are disabled or conservative by default:
    • at most one retry
    • only for transport errors, timeouts that occur before the deadline budget is exhausted, 429, or 503
    • jittered backoff
    • total retry time stays within the incoming ext-authz request context deadline
    • no retry for OAuth client/request errors such as 400, 401, 403, invalid_grant, or invalid_target
  • Add observability for breaker state, opens, half-open probes, fail-fast decisions, retry attempts, and exhausted retry budget.
  • Tests cover open/half-open/closed transitions, no retry on deterministic OAuth errors, deadline-respecting retry behavior, and no secret/token leakage.

Implementation Suggestions

  • Implement breaker behavior as a wrapper around the existing tokenExchanger interface rather than embedding it into form/request construction.
  • Key breaker state by normalized token endpoint host, or by the full endpoint URL only if needed to avoid sharing state between materially different issuers on the same host.
  • Make all thresholds configurable with safe defaults.
  • Keep fail-closed semantics and downstream error sanitization unchanged.

Metadata


Assignees

No one assigned

    Labels

    area/config (Runtime configuration, typed values, env wiring)
    area/observability (Tracing, metrics, admin/debug surfaces)
    enhancement (New feature or request)
    priority/high (Release-shaping or prerequisite work)

    Projects

    Status

    Ready

    Relationships

    None yet

    Development

    No branches or pull requests
