
feat: add configurable liveness and readiness probes to OtelCollector #6680

Open
kyledharrington wants to merge 1 commit into Dynatrace:main from kyledharrington:feat/otel-collector-probes

@kyledharrington

Description

The DynaKube CR's spec.templates.otelCollector exposes fields for replicas, resources, tolerations, topology spread constraints, etc., but does not allow users to configure liveness or readiness probes for the OpenTelemetry Collector StatefulSet. As a result, the collector container has no health checks: Kubernetes cannot detect a hung or unhealthy collector, will not restart it, and will not gate pod readiness on its health.

This PR adds optional livenessProbe and readinessProbe fields (both *corev1.Probe) to OpenTelemetryCollectorSpec, mirroring Kubernetes' own PodSpec convention so users can configure any probe type (HTTP/TCP/exec) with full control. When nil, no probe is applied — existing deployments are unaffected.
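For reviewers who want a picture of the API surface, here is a minimal sketch of the two new fields. It is not the code from this PR: `Probe` is a reduced stand-in for `corev1.Probe` so the snippet compiles without Kubernetes dependencies, the `omitempty` tags are an assumption, and `specJSON` is a hypothetical helper used only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Probe is a simplified stand-in for corev1.Probe (k8s.io/api/core/v1),
// reduced to two fields so the sketch needs no Kubernetes imports.
type Probe struct {
	InitialDelaySeconds int32 `json:"initialDelaySeconds,omitempty"`
	PeriodSeconds       int32 `json:"periodSeconds,omitempty"`
}

// OpenTelemetryCollectorSpec shows only the two new optional fields; the
// real type also carries replicas, resources, tolerations, and so on.
// The omitempty tags are assumed, not copied from the PR.
type OpenTelemetryCollectorSpec struct {
	LivenessProbe  *Probe `json:"livenessProbe,omitempty"`
	ReadinessProbe *Probe `json:"readinessProbe,omitempty"`
}

// specJSON (hypothetical helper) serializes a spec roughly as it would
// appear under spec.templates.otelCollector.
func specJSON(s OpenTelemetryCollectorSpec) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Unset probes serialize to nothing, so existing CRs look unchanged.
	fmt.Println(specJSON(OpenTelemetryCollectorSpec{}))

	// A configured liveness probe appears under its own key.
	fmt.Println(specJSON(OpenTelemetryCollectorSpec{
		LivenessProbe: &Probe{InitialDelaySeconds: 10, PeriodSeconds: 30},
	}))
}
```

Using pointer fields is what lets "unset" be distinguished from "set to zero values", which is why nil can mean "apply no probe".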

Summary of changes

  • Added LivenessProbe and ReadinessProbe to OpenTelemetryCollectorSpec in pkg/api/v1beta5/dynakube/opentelemetry.go and pkg/api/latest/dynakube/opentelemetry.go (v1beta6).
  • Wired the fields through the v1beta5↔latest conversion functions.
  • Threaded the probes into the container builder at pkg/controllers/dynakube/otelc/statefulset/container.go.
  • Regenerated zz_generated.deepcopy.go and the CRD manifest YAMLs via make manifests.
  • Added a unit test (TestProbes) covering both the default (nil) case and the custom-probe case.
  • Extended the existing v1beta5 conversion round-trip tests to cover the new fields.
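The container-builder change and the two cases covered by TestProbes can be sketched roughly as follows. This is an illustration under assumptions, not the code in container.go: `Probe` and `Container` are stand-ins for the `corev1` types, and `buildContainer` and the container name `collector` are hypothetical.

```go
package main

import "fmt"

// Probe and Container are simplified stand-ins for corev1.Probe and
// corev1.Container, kept minimal so the sketch is self-contained.
type Probe struct {
	Path string
	Port int32
}

type Container struct {
	Name           string
	LivenessProbe  *Probe
	ReadinessProbe *Probe
}

// OpenTelemetryCollectorSpec carries only the two new fields here.
type OpenTelemetryCollectorSpec struct {
	LivenessProbe  *Probe
	ReadinessProbe *Probe
}

// buildContainer (hypothetical name) passes the probes through unchanged.
// A nil pointer in the spec stays nil on the container, so no probe is set.
func buildContainer(spec OpenTelemetryCollectorSpec) Container {
	return Container{
		Name:           "collector", // hypothetical container name
		LivenessProbe:  spec.LivenessProbe,
		ReadinessProbe: spec.ReadinessProbe,
	}
}

func main() {
	// Default case: nothing configured, nothing applied.
	def := buildContainer(OpenTelemetryCollectorSpec{})
	fmt.Println(def.LivenessProbe == nil, def.ReadinessProbe == nil)

	// Custom case: the configured probe reaches the container.
	custom := buildContainer(OpenTelemetryCollectorSpec{
		LivenessProbe: &Probe{Path: "/", Port: 13133},
	})
	fmt.Println(custom.LivenessProbe.Port, custom.ReadinessProbe == nil)
}
```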

v1beta4 is intentionally left unchanged; conversion drops the new fields when stepping down.
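To make the step-down behavior concrete, here is a hedged sketch of a round trip through a version without probe fields. All types and both conversion functions are hypothetical stand-ins (the real conversion code and field types differ); the point is only that the probes do not survive while pre-existing fields do.

```go
package main

import "fmt"

// Probe is a simplified stand-in for corev1.Probe.
type Probe struct {
	Port int32
}

// LatestSpec stands in for the latest collector spec, which has the probes.
type LatestSpec struct {
	Replicas       int32
	LivenessProbe  *Probe
	ReadinessProbe *Probe
}

// V1beta4Spec stands in for the older spec, which has no probe fields.
type V1beta4Spec struct {
	Replicas int32
}

// toV1beta4 (hypothetical) copies only the fields the older version knows.
func toV1beta4(src LatestSpec) V1beta4Spec {
	return V1beta4Spec{Replicas: src.Replicas}
}

// fromV1beta4 (hypothetical) converts back up; the probes cannot be restored.
func fromV1beta4(src V1beta4Spec) LatestSpec {
	return LatestSpec{Replicas: src.Replicas}
}

func main() {
	orig := LatestSpec{Replicas: 2, LivenessProbe: &Probe{Port: 13133}}
	back := fromV1beta4(toV1beta4(orig))

	// Replicas survive the round trip; the liveness probe is dropped.
	fmt.Println(back.Replicas, back.LivenessProbe == nil)
}
```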

No GitHub issue or ticket is linked; this addresses a gap encountered when deploying the operator against an environment that requires health-gated rollouts.

How can this be tested?

A cluster with an existing DynaKube that has the OtelCollector active (via spec.extensions.prometheus or spec.telemetryIngest) is sufficient.

  1. Build and deploy this branch's operator image to the cluster (e.g. make deploy).
  2. Update the DynaKube CR with probes under spec.templates.otelCollector, for example:
    ```yaml
    spec:
      templates:
        otelCollector:
          livenessProbe:
            httpGet:
              path: /
              port: 13133
            initialDelaySeconds: 10
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /
              port: 13133
            initialDelaySeconds: 5
            periodSeconds: 10
    ```
  3. Verify the probes are applied to the collector container:
    ```sh
    STS=$(kubectl get sts -n dynatrace -o name | grep -i otel | head -1)
    kubectl get "$STS" -n dynatrace -o jsonpath='{.spec.template.spec.containers[0].livenessProbe}{"\n"}{.spec.template.spec.containers[0].readinessProbe}'
    ```
    Both probes should print as JSON matching the CR. When the fields are unset on the CR, the container should have no probes (preserving previous behavior).

Note: the snippet above uses the otel-collector's default health_check extension port (13133) for illustration. The probe handler is fully user-controlled, so an exec or tcpSocket probe works equally well if you don't run the collector's HTTP health extension.

Expose LivenessProbe and ReadinessProbe fields on
DynaKube.spec.templates.otelCollector so users can configure pod
health checks for the OpenTelemetry collector StatefulSet. Both
fields are optional *corev1.Probe pointers; when nil, no probe is
applied (preserving existing behavior).

The fields are added to the v1beta5 and v1beta6 (latest) API
versions; v1beta4 is left unchanged and conversion drops the new
fields when downgrading.
@dynatrace-cla-bot

dynatrace-cla-bot commented May 15, 2026

CLA assistant check
All committers have signed the CLA.
