[BUG] startupProbe failureThreshold missing from templates (Backend and Workflows/Workers) #277

@ndpete

Description

The Retool Helm Chart (v6.10.3) contains two issues regarding startupProbe implementation that prevent users from configuring a sufficient safety window for long-running startups:

  1. Omitted failureThreshold (Backend): The deployment_backend.yaml template does not include the failureThreshold key. Consequently, Kubernetes defaults to a value of 3, ignoring any user-defined threshold in values.yaml.
  2. Missing startupProbe (Other Components): As referenced in issue #251 ("Request - Make the startupProbe behavior consistent with liveness and readiness"), the startupProbe block is missing entirely from the templates for Workflows, Workers, and Jobs.

Reproduction Steps

1. Verification of missing failureThreshold in Backend

Define a custom startupProbe with a high failureThreshold in values.yaml:

startupProbe:
  enabled: true
  failureThreshold: 60

Generate the template and inspect the output:

helm template retool retool/retool --version 6.10.3 -f values.yaml | grep -A 10 "startupProbe:"

Observed Output:
The rendered YAML contains initialDelaySeconds, timeoutSeconds, etc., but is missing the failureThreshold key.

Template Evidence:
In retool/templates/deployment_backend.yaml, the startupProbe block (approx. line 312) lacks a mapping for .Values.startupProbe.failureThreshold.

2. Verification of missing probes in Workflows

Inspect the Workflows deployment template:

grep "startupProbe" retool/templates/deployment_workflows.yaml
# Result: (empty)

Impact

User-defined startup failure thresholds are ignored. The container is killed and restarted after 3 consecutive startup probe failures (the Kubernetes default), which is often too short for Retool environments performing schema migrations. Users are currently forced to increase livenessProbe.failureThreshold as a workaround, which slows detection of unhealthy pods during normal operation.
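
As a rough illustration of the window involved: the time Kubernetes allows for startup is approximately failureThreshold × periodSeconds, counted after any initialDelaySeconds. The periodSeconds value below is an assumption for illustration (10 is the Kubernetes default when the field is unset; the chart's own default may differ):

```yaml
# values.yaml (illustrative sketch; periodSeconds is an assumed value)
startupProbe:
  enabled: true
  periodSeconds: 10      # Kubernetes default when unset
  failureThreshold: 60   # intended window: 60 x 10 s = up to ~600 s
  # If failureThreshold is not rendered into the manifest, Kubernetes
  # falls back to 3: 3 x 10 s = ~30 s before the container is killed.
```

Under these assumptions, the missing key shrinks the effective startup window from roughly ten minutes to roughly thirty seconds.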


Proposed Fix

1. Update Backend Deployment (templates/deployment_backend.yaml)

Include the missing key in the startupProbe definition:

        startupProbe:
          ...
          periodSeconds: {{ .Values.startupProbe.periodSeconds }}
          failureThreshold: {{ .Values.startupProbe.failureThreshold }}
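
One possible variation, offered as a sketch rather than as the chart's actual code: guard the key so it is rendered only when a value is set, leaving the Kubernetes default in place otherwise. This assumes failureThreshold sits directly under .Values.startupProbe, as in the values.yaml snippet from the reproduction steps:

```yaml
        startupProbe:
          # ... existing probe fields unchanged ...
          periodSeconds: {{ .Values.startupProbe.periodSeconds }}
          {{- with .Values.startupProbe.failureThreshold }}
          # Emitted only when a non-empty value is provided in values.yaml
          failureThreshold: {{ . }}
          {{- end }}
```

The `with` action skips the block for unset (or zero) values, so existing installations that never set the key would render the same manifest as before.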

2. Standardize Other Deployments

Add the complete startupProbe block to the following templates to ensure consistency:

  • templates/deployment_workflows.yaml
  • templates/deployment_workers.yaml
  • templates/deployment_jobs.yaml
  • templates/deployment_code_executor.yaml
