fix: short-circuit cert-rotation reconciler under USE_OLM_TLS#30
Merged
Conversation
PR #25 (using olm cert) patched two of the three knative.dev/pkg webhook components that touch the cert secret: 1. vendor/knative.dev/pkg/webhook/webhook.go TLS server's GetCertificate — read tls.crt / tls.key when USE_OLM_TLS 2. vendor/knative.dev/pkg/webhook/resourcesemantics/conversion/reconciler.go CRD caBundle sync — early-return when USE_OLM_TLS (OLM syncs caBundle) It missed the third one: vendor/knative.dev/pkg/webhook/certificates/certificates.go, the WebhookCertificates cert-rotation reconciler. In OLM mode that reconciler: 1. Reads the OLM-provisioned secret (type kubernetes.io/tls, keys tls.crt / tls.key / olmCAKey). 2. Sees its expected server-key.pem / server-cert.pem / ca-cert.pem missing, logs `Certificate secret "operator-webhook-service-cert" is missing key "server-key.pem"` at INFO. 3. Self-signs a new cert via certresources.MakeSecret (which only emits the three server-*.pem keys). 4. Replaces secret.Data wholesale — would clobber the OLM-injected tls.crt / tls.key. 5. Sends the Update; apiserver rejects because the kubernetes.io/tls secret type requires tls.crt / tls.key — logs `Reconcile error ... Secret ... is invalid: [data[tls.crt]: Required value, data[tls.key]: Required value]` at ERROR, then exponential-backoff retries indefinitely. Functionally harmless (the apiserver rejection is what stops the reconciler from corrupting OLM's cert), but the log spam is loud and hides real issues. The webhook server itself reads via the patched GetCertificate from #25 and serves correctly. Mirror PR #25's pattern — early-return at the top of reconcileCertificate when USE_OLM_TLS is set so the reconciler doesn't even try to rotate certs OLM is already managing. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Mirrors hack/patches/006-conversion-webhook.patch and 007-conversion-reconciler.patch from PR #25 so the cert-rotation reconciler's USE_OLM_TLS guard also survives `go mod vendor` runs. The reason PR #25's vendor edits disappeared after the vuln-0512 sweep was that nobody ran `make update-deps` (= `git apply hack/patches/*.patch`) after the vendor regeneration. Adding 008 alongside its sibling patches keeps the recovery mechanical: `go mod vendor && make update-deps` reproduces the working tree. Without this file, a future vendor regeneration would silently revert this PR's fix the same way PR #28 reverted PR #25's. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ed285a3 inadvertently staged a revert of vendor/.../certificates.go (a stale `git checkout 34c2b0f -- vendor/...` from a patch round-trip test had left the index pointing at pre-fix content). HEAD's working tree still had the correct guarded version, but the commit captured the reverted index state. This commit puts the vendor edit back. End state on the branch: c62378b fix: short-circuit cert-rotation reconciler under USE_OLM_TLS (vendor edit) ed285a3 hack/patches: add 008 for cert-rotation reconciler (patch file only, vendor edit reverted) HEAD fix: re-include vendor edit dropped by 008 patch commit (vendor edit restored) Net diff PR #30 = +5 vendor lines + new hack/patches/008 file. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Follow-up to #25 (using olm cert) and #29 (restore #25 after vuln-0512). Same pattern (early-return on
USE_OLM_TLS), applied to the thirdknative.dev/pkg/webhook/component that wasn't covered, with a companionhack/patches/file so the nextgo mod vendordoesn't wipe it.What
vendor/knative.dev/pkg/webhook/certificates/certificates.go— adds at the top ofreconcileCertificate:Mirrors PR fix: using olm cert #25's pattern in
vendor/knative.dev/pkg/webhook/resourcesemantics/conversion/reconciler.go.hack/patches/008-certificates-reconciler.patch— companion patch file, sibling of the existing006-conversion-webhook.patch/007-conversion-reconciler.patchfrom PR fix: using olm cert #25. Reapplied bymake update-deps(=git apply hack/patches/*.patch) after anygo mod vendorrun, so this fix can't get silently dropped the way PR fix: using olm cert #25's vendor edits were dropped by PR fix: vuln 0512 #28's vuln-0512 sweep.Verified by round-trip: revert vendor file to
34c2b0fe,git apply hack/patches/008-certificates-reconciler.patch→ guard restored.Why
The
WebhookCertificatesreconciler is the third component inknative.dev/pkg/webhook/that touches the webhook cert secret. PR #25 patched the other two (TLS-read inwebhook.go, CRD caBundle sync inconversion/reconciler.go) but left the cert-rotation reconciler alone.In OLM mode the reconciler:
kubernetes.io/tls, keystls.crt/tls.key/olmCAKey).Certificate secret "operator-webhook-service-cert" is missing key "server-key.pem"at INFO (line 78) — its expected key names don't match what OLM populates.certresources.MakeSecret, which only emitsserver-cert.pem/server-key.pem/ca-cert.pem.secret.Datawholesale — would clobber OLM'stls.crt/tls.key.Reconcile error ... Secret ... is invalid: [data[tls.crt]: Required value, data[tls.key]: Required value].Observed on the test cluster running #29's webhook image
v1.10.1-34c2b0feagainst the OLM-installed bundleknative-operator-bundle:v3.20.9-hotfix.129.6.gddf86032-fix-vuln-0512:Functionally harmless — apiserver's type validation is what prevents the reconciler from corrupting OLM's cert — but the log spam runs continuously (every few seconds to every minute) and hides real issues. The webhook itself reads certs via #25's patched
GetCertificateand serves correctly.Verify after merge
After rebuilding + rolling the bundle, the two log signatures above should both disappear, and
operator-webhook.WebhookCertificatesshould fall silent in OLM-installed clusters. Standalone (non-OLM) installs are unaffected —USE_OLM_TLSis unset, so the reconciler runs as before.Note on commit history
This PR's branch ended up with three commits because of an intermediate test workflow that staged an unintended revert. Net branch diff is +28 lines across two files (the 5-line vendor edit + the 23-line patch file). Recommend squash on merge.
🤖 Generated with Claude Code