What happened?
A Pipelines-as-Code PipelineRun can get stuck before the repository clone step starts if PAC creates the PipelineRun successfully, but the follow-up patch that sets the PAC state annotation fails.
The clone task pod then references a generated git auth secret that PAC never creates:
Warning FailedMount pod/<pipelinerun>-clone-repository-pod
MountVolume.SetUp failed for volume "<workspace-volume>" : secret "pac-gitauth-<suffix>" not found
The affected PipelineRun has the generated secret annotation and marks the secret as not yet created:
metadata:
  annotations:
    pipelinesascode.tekton.dev/git-auth-secret: pac-gitauth-<suffix>
    pipelinesascode.tekton.dev/secret-created: "false"
But it does not have the PAC state annotation:
metadata:
  annotations:
    pipelinesascode.tekton.dev/state: <missing>
It may still have the state label:
metadata:
  labels:
    pipelinesascode.tekton.dev/state: started
In this state, the PAC controller does not appear to enqueue the PipelineRun, so reconciliation never reaches the generated secret creation path. The pod keeps waiting for the missing pac-gitauth-* secret.
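The symptom above can be expressed as a small predicate over a PipelineRun's annotations. This is a minimal Go sketch for identifying affected runs, not PAC code: the helper name looksStuckOnGitAuthSecret and the plain-map input are hypothetical, and only the annotation keys are taken from this report.

```go
package main

import "fmt"

// Annotation keys as they appear in this report.
const (
	stateKey         = "pipelinesascode.tekton.dev/state"
	secretCreatedKey = "pipelinesascode.tekton.dev/secret-created"
	gitAuthSecretKey = "pipelinesascode.tekton.dev/git-auth-secret"
)

// looksStuckOnGitAuthSecret is a hypothetical helper (not part of PAC).
// It reports whether the annotations match the broken shape described
// above: a generated secret name is set, the secret is marked as not yet
// created, and the state annotation is absent.
func looksStuckOnGitAuthSecret(annotations map[string]string) bool {
	_, hasState := annotations[stateKey]
	return annotations[gitAuthSecretKey] != "" &&
		annotations[secretCreatedKey] == "false" &&
		!hasState
}

func main() {
	// Placeholder secret names, for illustration only.
	affected := map[string]string{
		gitAuthSecretKey: "pac-gitauth-example",
		secretCreatedKey: "false",
	}
	healthy := map[string]string{
		gitAuthSecretKey: "pac-gitauth-example",
		secretCreatedKey: "false",
		stateKey:         "started",
	}

	fmt.Println(looksStuckOnGitAuthSecret(affected)) // true
	fmt.Println(looksStuckOnGitAuthSecret(healthy))  // false
}
```

The state label is deliberately not consulted here, since the affected run may still carry it.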
Expected behavior
If PAC creates a PipelineRun that references an auto-generated git auth secret and sets:
pipelinesascode.tekton.dev/secret-created: "false"
then PAC should eventually reconcile that PipelineRun and create the referenced pac-gitauth-* secret, even if a later status/state/reporting patch fails transiently.
The clone pod should not be left permanently blocked on a generated secret that only PAC can create.
Why this seems to happen
The failure appears to require this sequence:
- PAC creates the PipelineRun.
- The created object includes pipelinesascode.tekton.dev/secret-created: "false" and pipelinesascode.tekton.dev/git-auth-secret: pac-gitauth-<suffix>.
- The later PAC patch that sets state/reporting annotations fails transiently.
- The PipelineRun is left without pipelinesascode.tekton.dev/state in annotations.
- The controller enqueue guard ignores the PipelineRun because it checks the state annotation for normal PipelineRun objects.
- The generated git auth secret is never created.
- The clone task pod fails to mount the missing pac-gitauth-* secret.
In the observed case, the post-create patch failed while Tekton admission webhooks were temporarily unhealthy:
failed calling webhook "validation.webhook.pipeline.tekton.dev": context deadline exceeded
failed calling webhook "webhook.pipeline.tekton.dev": context deadline exceeded
no endpoints available for service "tekton-pipelines-webhook"
There were also transient Kubernetes update conflicts around the same time:
Operation cannot be fulfilled on pipelineruns.tekton.dev "<pipelinerun>": the object has been modified; please apply your changes to the latest version and try again
Relevant code path
From current code inspection, the relevant flow appears to be:
- Initial labels/annotations are prepared in pkg/kubeinteraction/labels.go.
- The PipelineRun is created before later startup/reporting annotations are patched.
- The later patch is performed in pkg/pipelineascode/pipelineascode.go.
- The controller enqueue guard is in pkg/reconciler/controller.go, checkStateAndEnqueue.
- Generated git auth secret creation happens later in the reconciler path, guarded by pipelinesascode.tekton.dev/secret-created == "false".
The issue is that the initial object can contain secret-created=false and the generated secret name, but not the state annotation needed for enqueueing. If the later patch fails, no future reconcile creates the secret.
Possible fixes
One possible mitigation is to include the initial state annotation when creating the PipelineRun, not only in the later patch:
diff --git a/pkg/kubeinteraction/labels.go b/pkg/kubeinteraction/labels.go
@@
keys.Repository: repo.GetName(),
keys.GitProvider: providerConfig.Name,
+ keys.State: StateStarted,
keys.SecretCreated: "false",
Another possible mitigation would be for the controller enqueue guard to also consider the state label as a fallback when the annotation is missing, or to enqueue PipelineRun objects that have secret-created=false and a generated git auth secret annotation.
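The fallback idea can be sketched as a standalone decision function. This is not the actual checkStateAndEnqueue implementation: shouldEnqueue is a hypothetical name, the inputs are plain maps rather than PipelineRun objects, and the assumption that the current guard only tests for the presence of the state annotation comes from the code reading described in this report.

```go
package main

import "fmt"

// Keys as they appear in this report. The state key is used both as an
// annotation and as a label.
const (
	stateKey         = "pipelinesascode.tekton.dev/state"
	secretCreatedKey = "pipelinesascode.tekton.dev/secret-created"
	gitAuthSecretKey = "pipelinesascode.tekton.dev/git-auth-secret"
)

// shouldEnqueue is a hypothetical stand-in for the decision made by the
// controller enqueue guard, extended with the two fallbacks proposed above.
func shouldEnqueue(annotations, labels map[string]string) bool {
	// Assumed current behavior: the state annotation is present.
	if _, ok := annotations[stateKey]; ok {
		return true
	}
	// Fallback 1: the state label survived even though the annotation
	// patch failed.
	if _, ok := labels[stateKey]; ok {
		return true
	}
	// Fallback 2: the run references a generated git auth secret that has
	// not been created yet, so it still needs the secret creation path.
	return annotations[secretCreatedKey] == "false" &&
		annotations[gitAuthSecretKey] != ""
}

func main() {
	// Shape of the affected run: no state annotation, state label present.
	annotations := map[string]string{
		gitAuthSecretKey: "pac-gitauth-example", // placeholder name
		secretCreatedKey: "false",
	}
	labels := map[string]string{stateKey: "started"}

	fmt.Println(shouldEnqueue(annotations, labels)) // true
	fmt.Println(shouldEnqueue(annotations, nil))    // true
	fmt.Println(shouldEnqueue(nil, nil))            // false
}
```

Either fallback on its own would cover the observed case. The second one does not depend on the state label having been set at creation time.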
The safest behavior may be to ensure any PipelineRun that references a PAC-managed generated git auth secret can always reach the secret creation reconcile path, independent of best-effort reporting/state patch success.
Impact
This leaves user PipelineRuns stuck with a clone pod in FailedMount/pending state. Retrying the webhook event or creating a new PipelineRun can work, but the broken run itself does not self-heal because the generated secret is never created.
Additional notes
This is separate from user-provided secrets referenced by Pipeline tasks. In this case, the missing secret is the PAC-generated pac-gitauth-* secret referenced through PAC annotations.