
PipelineRun can get stuck when generated pac-gitauth secret is not created after state annotation patch failure #2751

@chmouel

Description


What happened?

A Pipelines-as-Code PipelineRun can get stuck before the repository clone step starts if PAC creates the PipelineRun successfully, but the follow-up patch that sets the PAC state annotation fails.

The clone task pod then references a generated git auth secret that PAC never creates:

Warning  FailedMount  pod/<pipelinerun>-clone-repository-pod
MountVolume.SetUp failed for volume "<workspace-volume>" : secret "pac-gitauth-<suffix>" not found

The affected PipelineRun has the generated secret annotation and marks the secret as not yet created:

metadata:
  annotations:
    pipelinesascode.tekton.dev/git-auth-secret: pac-gitauth-<suffix>
    pipelinesascode.tekton.dev/secret-created: "false"

But it does not have the PAC state annotation:

metadata:
  annotations:
    pipelinesascode.tekton.dev/state: <missing>

It may still have the state label:

metadata:
  labels:
    pipelinesascode.tekton.dev/state: started

In this state, the PAC controller does not appear to enqueue the PipelineRun for reconciliation, so it never reaches the generated secret creation path. The pod keeps waiting for the missing pac-gitauth-* secret.
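To find runs in this state, the three annotation conditions above can be checked directly. The following is a minimal sketch, not PAC code: the helper name `isStuckOnGeneratedSecret` is hypothetical, and it deliberately ignores labels, because the state label may still be present on an affected run.

```go
package main

import "fmt"

// Annotation keys as they appear on the affected PipelineRun.
const (
	annGitAuthSecret = "pipelinesascode.tekton.dev/git-auth-secret"
	annSecretCreated = "pipelinesascode.tekton.dev/secret-created"
	annState         = "pipelinesascode.tekton.dev/state"
)

// isStuckOnGeneratedSecret (hypothetical helper) reports whether a PipelineRun
// references a generated pac-gitauth-* secret that has not been created yet
// and also lacks the state annotation.
func isStuckOnGeneratedSecret(annotations map[string]string) bool {
	if annotations[annGitAuthSecret] == "" {
		return false // no generated secret referenced
	}
	if annotations[annSecretCreated] != "false" {
		return false // secret already created, or flag not set by PAC
	}
	_, hasState := annotations[annState]
	return !hasState
}

func main() {
	// Annotations taken from the affected PipelineRun in this issue.
	affected := map[string]string{
		annGitAuthSecret: "pac-gitauth-<suffix>",
		annSecretCreated: "false",
	}
	fmt.Println(isStuckOnGeneratedSecret(affected))
}
```

For the annotations shown in this issue, the program prints `true`.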

Expected behavior

If PAC creates a PipelineRun that references an auto-generated git auth secret and sets:

pipelinesascode.tekton.dev/secret-created: "false"

then PAC should eventually reconcile that PipelineRun and create the referenced pac-gitauth-* secret, even if a later status/state/reporting patch fails transiently.

The clone pod should not be left permanently blocked on a generated secret that only PAC can create.

Why this seems to happen

The failure appears to require this sequence:

  1. PAC creates the PipelineRun.
  2. The created object includes pipelinesascode.tekton.dev/secret-created: "false" and pipelinesascode.tekton.dev/git-auth-secret: pac-gitauth-<suffix>.
  3. The later PAC patch that sets state/reporting annotations fails transiently.
  4. The PipelineRun is left without pipelinesascode.tekton.dev/state in annotations.
  5. The controller enqueue guard ignores the PipelineRun because it checks the state annotation for normal PipelineRun objects.
  6. The generated git auth secret is never created.
  7. The clone task pod fails to mount the missing pac-gitauth-* secret.

In the observed case, the post-create patch failed while Tekton admission webhooks were temporarily unhealthy:

failed calling webhook "validation.webhook.pipeline.tekton.dev": context deadline exceeded
failed calling webhook "webhook.pipeline.tekton.dev": context deadline exceeded
no endpoints available for service "tekton-pipelines-webhook"

There were also transient Kubernetes update conflicts around the same time:

Operation cannot be fulfilled on pipelineruns.tekton.dev "<pipelinerun>": the object has been modified; please apply your changes to the latest version and try again

Relevant code path

From current code inspection, the relevant flow appears to be:

  • Initial labels/annotations are prepared in pkg/kubeinteraction/labels.go.
  • The PipelineRun is created before later startup/reporting annotations are patched.
  • The later patch is performed in pkg/pipelineascode/pipelineascode.go.
  • The controller enqueue guard is in pkg/reconciler/controller.go, checkStateAndEnqueue.
  • Generated git auth secret creation happens later in the reconciler path, guarded by pipelinesascode.tekton.dev/secret-created == "false".

The issue is that the initial object can contain secret-created=false and the generated secret name, but not the state annotation needed for enqueueing. If the later patch fails, no future reconcile creates the secret.
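The guarded creation step can be sketched as follows to show why it is the only way the secret comes into existence. This is an illustrative model: `secretStore` stands in for the Kubernetes Secrets API, `reconcileGeneratedSecret` is a hypothetical name, and flipping `secret-created` to `"true"` after creation is an assumption of the sketch.

```go
package main

import "fmt"

const (
	annGitAuthSecret = "pipelinesascode.tekton.dev/git-auth-secret"
	annSecretCreated = "pipelinesascode.tekton.dev/secret-created"
)

// secretStore is a hypothetical stand-in for the Kubernetes Secrets API.
type secretStore map[string]bool

// reconcileGeneratedSecret acts only when secret-created is "false": it creates
// the referenced secret, then (assumption) marks the flag "true" so that later
// reconciles skip it.
func reconcileGeneratedSecret(annotations map[string]string, store secretStore) {
	if annotations[annSecretCreated] != "false" {
		return
	}
	name := annotations[annGitAuthSecret]
	if name == "" {
		return
	}
	store[name] = true
	annotations[annSecretCreated] = "true"
}

func main() {
	store := secretStore{}
	ann := map[string]string{
		annGitAuthSecret: "pac-gitauth-<suffix>",
		annSecretCreated: "false",
	}
	// Until the reconciler runs, no secret exists for the clone pod to mount.
	fmt.Println(len(store))
	reconcileGeneratedSecret(ann, store)
	reconcileGeneratedSecret(ann, store) // second call is a no-op
	fmt.Println(store["pac-gitauth-<suffix>"], ann[annSecretCreated])
}
```

If the enqueue guard never lets the run through, `reconcileGeneratedSecret` is never called and the store stays empty.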

Possible fixes

One possible mitigation is to include the initial state annotation when creating the PipelineRun, not only in the later patch:

diff --git a/pkg/kubeinteraction/labels.go b/pkg/kubeinteraction/labels.go
@@
       keys.Repository:    repo.GetName(),
       keys.GitProvider:   providerConfig.Name,
+      keys.State:         StateStarted,
       keys.SecretCreated: "false",
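The effect of this diff can be modelled as follows, assuming the post-create patch has failed and nothing else is added to the object. `initialAnnotations` and `passesGuard` are hypothetical names; the sketch hard-codes `started` as the diff does and does not check whether that value is right for every run.

```go
package main

import "fmt"

const (
	annGitAuthSecret = "pipelinesascode.tekton.dev/git-auth-secret"
	annSecretCreated = "pipelinesascode.tekton.dev/secret-created"
	annState         = "pipelinesascode.tekton.dev/state"
	stateStarted     = "started"
)

// initialAnnotations models the annotations present at creation time.
// withFix mirrors the proposed diff by adding the state annotation up front.
func initialAnnotations(withFix bool) map[string]string {
	ann := map[string]string{
		annGitAuthSecret: "pac-gitauth-<suffix>",
		annSecretCreated: "false",
	}
	if withFix {
		ann[annState] = stateStarted
	}
	return ann
}

// passesGuard models an enqueue guard that requires the state annotation.
func passesGuard(ann map[string]string) bool {
	_, ok := ann[annState]
	return ok
}

func main() {
	// The post-create patch is assumed to have failed in both cases.
	fmt.Println("without fix:", passesGuard(initialAnnotations(false)))
	fmt.Println("with fix:   ", passesGuard(initialAnnotations(true)))
}
```

With the state annotation written at creation time, the guard no longer depends on the best-effort patch succeeding.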

Another possible mitigation would be for the controller enqueue guard to also consider the state label as a fallback when the annotation is missing, or to enqueue PipelineRun objects that have secret-created=false and a generated git auth secret annotation.
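A fallback guard along these lines could combine both ideas. This is a sketch, not PAC's `checkStateAndEnqueue`: `shouldEnqueue` is a hypothetical name, and it relies only on the label and annotation keys shown earlier in this issue (the state key is used for both a label and an annotation).

```go
package main

import "fmt"

const (
	keyGitAuthSecret = "pipelinesascode.tekton.dev/git-auth-secret"
	keySecretCreated = "pipelinesascode.tekton.dev/secret-created"
	keyState         = "pipelinesascode.tekton.dev/state" // label and annotation key
)

// shouldEnqueue (hypothetical) enqueues a PipelineRun when:
//  1. the state annotation is present, or
//  2. the annotation is missing but the state label is present, or
//  3. a generated git auth secret is referenced and not yet created.
func shouldEnqueue(annotations, labels map[string]string) bool {
	if _, ok := annotations[keyState]; ok {
		return true
	}
	if _, ok := labels[keyState]; ok {
		return true
	}
	return annotations[keyGitAuthSecret] != "" && annotations[keySecretCreated] == "false"
}

func main() {
	// The affected run from this issue: no state annotation.
	affected := map[string]string{
		keyGitAuthSecret: "pac-gitauth-<suffix>",
		keySecretCreated: "false",
	}
	fmt.Println(shouldEnqueue(affected, map[string]string{keyState: "started"})) // label fallback
	fmt.Println(shouldEnqueue(affected, map[string]string{}))                    // secret-created fallback
	fmt.Println(shouldEnqueue(map[string]string{}, map[string]string{}))         // nothing to do
}
```

Condition 3 alone covers this issue even when the label is also missing, which is why it matches the "always reach the secret creation path" goal more closely than the label fallback.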

The safest behavior may be to ensure any PipelineRun that references a PAC-managed generated git auth secret can always reach the secret creation reconcile path, independent of best-effort reporting/state patch success.

Impact

This leaves user PipelineRuns stuck with a clone pod in FailedMount/pending state. Retrying the webhook event or creating a new PipelineRun can work, but the broken run itself does not self-heal because the generated secret is never created.

Additional notes

This is separate from user-provided secrets referenced by Pipeline tasks. In this case, the missing secret is the PAC-generated pac-gitauth-* secret referenced through PAC annotations.

Labels: bug (Something isn't working)
