Skip to content

Comments

[WiP]SREP-3024: Add E2E tests for ACME DNS-01 challenge verification#424

Open
TheUndeadKing wants to merge 3 commits intoopenshift:masterfrom
TheUndeadKing:SREP-3024
Open

[WiP]SREP-3024: Add E2E tests for ACME DNS-01 challenge verification#424
TheUndeadKing wants to merge 3 commits intoopenshift:masterfrom
TheUndeadKing:SREP-3024

Conversation

@TheUndeadKing
Copy link
Contributor

Summary

Adds comprehensive E2E tests for ACME DNS-01 challenge verification with real AWS Route53 integration.

Changes Made

New E2E Tests

  • TXT Record Creation Test: Verifies that _acme-challenge TXT records are created in Route53 for DNS-01 challenge
  • Certificate Creation Test: Validates primary-cert-bundle-secret creation with proper certificate data
  • TXT Record Cleanup Test: Ensures TXT records are deleted from Route53 after certificate issuance
  • Certificate Renewal Test: Simulates and validates certificate renewal process

Utility Functions Added

  • CreateRoute53Client(): Creates AWS Route53 client for DNS operations
  • GetHostedZoneIDFromDNSZone(): Extracts hosted zone ID from Hive DNSZone resource
  • ListAcmeChallengeTXTRecords(): Lists ACME challenge TXT records from Route53
  • FindDNSZoneForClusterDeployment(): Finds DNSZone resource for a ClusterDeployment
  • VerifyDNSZoneReady(): Verifies DNSZone is ready with valid zone ID
  • ParseCertificateFromSecret(): Parses X.509 certificate from Kubernetes secret
  • VerifyCertificateExpiry(): Validates certificate expiration dates
  • GetCertificateRemainingDays(): Calculates remaining days until certificate expiry

Code Cleanup

  • Removed duplicate import of k8s.io/apimachinery/pkg/api/errors
  • Removed duplicate constant namespace_certman_operator (consolidated to operatorNS)
  • Removed unused function SanitizeInput()
  • Fixed incorrect usage of errors.IsNotFoundapierrors.IsNotFound

Test Coverage

Tests verify the complete certificate lifecycle:

  1. ClusterDeployment creation → CertificateRequest creation
  2. TXT records created in Route53 for DNS-01 challenge
  3. Certificate issued and stored in secret
  4. TXT records cleaned up after issuance
  5. Certificate renewal when secret is deleted

Technical Details

  • Tests use hybrid approach: Mock ACME client + Real AWS Route53
  • All tests are sequential using ginkgo.Ordered to maintain proper lifecycle flow
  • Tests only modify files in test/e2e/ directory
  • All error cases properly handled with gomega assertions

Testing

  • E2E tests run against real OpenShift cluster
  • AWS Route53 integration verified with actual DNS operations
  • Certificate validation includes expiry checks and DNS name verification

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 19, 2026
@openshift-ci-robot
Copy link

openshift-ci-robot commented Feb 19, 2026

@TheUndeadKing: This pull request references SREP-3024 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Summary

Adds comprehensive E2E tests for ACME DNS-01 challenge verification with real AWS Route53 integration.

Changes Made

New E2E Tests

  • TXT Record Creation Test: Verifies that _acme-challenge TXT records are created in Route53 for DNS-01 challenge
  • Certificate Creation Test: Validates primary-cert-bundle-secret creation with proper certificate data
  • TXT Record Cleanup Test: Ensures TXT records are deleted from Route53 after certificate issuance
  • Certificate Renewal Test: Simulates and validates certificate renewal process

Utility Functions Added

  • CreateRoute53Client(): Creates AWS Route53 client for DNS operations
  • GetHostedZoneIDFromDNSZone(): Extracts hosted zone ID from Hive DNSZone resource
  • ListAcmeChallengeTXTRecords(): Lists ACME challenge TXT records from Route53
  • FindDNSZoneForClusterDeployment(): Finds DNSZone resource for a ClusterDeployment
  • VerifyDNSZoneReady(): Verifies DNSZone is ready with valid zone ID
  • ParseCertificateFromSecret(): Parses X.509 certificate from Kubernetes secret
  • VerifyCertificateExpiry(): Validates certificate expiration dates
  • GetCertificateRemainingDays(): Calculates remaining days until certificate expiry

Code Cleanup

  • Removed duplicate import of k8s.io/apimachinery/pkg/api/errors
  • Removed duplicate constant namespace_certman_operator (consolidated to operatorNS)
  • Removed unused function SanitizeInput()
  • Fixed incorrect usage of errors.IsNotFoundapierrors.IsNotFound

Test Coverage

Tests verify the complete certificate lifecycle:

  1. ClusterDeployment creation → CertificateRequest creation
  2. TXT records created in Route53 for DNS-01 challenge
  3. Certificate issued and stored in secret
  4. TXT records cleaned up after issuance
  5. Certificate renewal when secret is deleted

Technical Details

  • Tests use hybrid approach: Mock ACME client + Real AWS Route53
  • All tests are sequential using ginkgo.Ordered to maintain proper lifecycle flow
  • Tests only modify files in test/e2e/ directory
  • All error cases properly handled with gomega assertions

Testing

  • E2E tests run against real OpenShift cluster
  • AWS Route53 integration verified with actual DNS operations
  • Certificate validation includes expiry checks and DNS name verification

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 19, 2026
@coderabbitai
Copy link

coderabbitai bot commented Feb 19, 2026

Important

Review skipped

Auto reviews are limited based on label configuration.

🚫 Excluded labels (none allowed) (1)
  • do-not-merge/work-in-progress

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Walkthrough

Adds Route53 DNS-01 TXT record verification and certificate lifecycle checks to end-to-end tests, plus Route53 and TLS utilities (Route53 client, hosted zone discovery, TXT record listing, DNSZone readiness) and centralized certificate parsing/expiry helpers. Updates namespace usage and error handling in tests.

Changes

Cohort / File(s) Summary
E2E tests (DNS-01 & cert lifecycle)
test/e2e/certman_operator_tests.go
Adds Route53 DNS-01 TXT record verification (hosted zone lookup, polling, TXT record attribute checks), duplicate DNS-01 test blocks, TXT cleanup checks, certificate renewal simulation (delete/recreate secret), replaces inline PEM/x509 parsing with utils.ParseCertificateFromSecret, adds expiry/remaining-days checks, enforces DNSName assertions, and switches namespace refs to operatorNS plus apierrors.IsNotFound usage.
Test utilities (Route53 / TLS)
test/e2e/utils/utils.go
Adds Route53 and DNS helpers and certificate utilities: CreateRoute53Client, GetHostedZoneIDFromDNSZone, ListAcmeChallengeTXTRecords, FindDNSZoneForClusterDeployment, VerifyDNSZoneReady, ParseCertificateFromSecret, VerifyCertificateExpiry, GetCertificateRemainingDays. Removes SanitizeInput.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
test/e2e/utils/utils.go (1)

1688-1709: ListAcmeChallengeTXTRecords does not handle Route53 pagination.

ListResourceRecordSets returns at most 300 records per call. If the hosted zone contains more records, ACME challenge TXT records could be missed. For a controlled E2E test zone this is unlikely to matter, but if the zone grows or is shared, results will silently be incomplete.

Consider using ListResourceRecordSetsPages (or checking IsTruncated / NextRecordName) to paginate, or at minimum starting the listing at the _acme-challenge prefix via StartRecordName + StartRecordType to reduce the result set.

♻️ Suggested fix using pagination or a targeted start
 func ListAcmeChallengeTXTRecords(client *route53.Route53, hostedZoneID string) ([]*route53.ResourceRecordSet, error) {
-	input := &route53.ListResourceRecordSetsInput{
-		HostedZoneId: aws.String(hostedZoneID),
-	}
-
-	result, err := client.ListResourceRecordSets(input)
-	if err != nil {
-		return nil, fmt.Errorf("failed to list Route53 resource record sets: %w", err)
-	}
-
-	// Filter for _acme-challenge TXT records
 	var challengeRecords []*route53.ResourceRecordSet
-	for _, record := range result.ResourceRecordSets {
-		if record.Type != nil && *record.Type == "TXT" &&
-			record.Name != nil && strings.Contains(*record.Name, "_acme-challenge") {
-			challengeRecords = append(challengeRecords, record)
+	input := &route53.ListResourceRecordSetsInput{
+		HostedZoneId:    aws.String(hostedZoneID),
+		StartRecordName: aws.String("_acme-challenge"),
+		StartRecordType: aws.String("TXT"),
+	}
+
+	err := client.ListResourceRecordSetsPages(input,
+		func(page *route53.ListResourceRecordSetsOutput, lastPage bool) bool {
+			for _, record := range page.ResourceRecordSets {
+				if record.Type != nil && *record.Type == "TXT" &&
+					record.Name != nil && strings.Contains(*record.Name, "_acme-challenge") {
+					challengeRecords = append(challengeRecords, record)
+				}
+			}
+			return !lastPage
+		})
+	if err != nil {
+		return nil, fmt.Errorf("failed to list Route53 resource record sets: %w", err)
 	}
-	}
 
 	return challengeRecords, nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/utils/utils.go` around lines 1688 - 1709,
ListAcmeChallengeTXTRecords currently calls route53.ListResourceRecordSets once
and can miss records if the hosted zone has >300 entries; update the function to
handle pagination by using route53.ListResourceRecordSetsPages (or loop using
IsTruncated/NextRecordName/NextRecordType) and accumulate ResourceRecordSets
across pages, while still filtering for TXT records whose Name contains
"_acme-challenge"; alternatively, to reduce scanned records, set StartRecordName
to "_acme-challenge" and StartRecordType to "TXT" when constructing the
ListResourceRecordSetsInput and iterate pages until IsTruncated is false,
returning the combined filtered slice.
test/e2e/certman_operator_tests.go (1)

654-771: This single It block conflates TXT-record cleanup verification with certificate renewal simulation — two independent concerns.

If the TXT cleanup assertion passes but renewal fails, the test name ("should delete TXT records … after certificate issuance") is misleading, and the renewal logic (lines 729-770) largely duplicates the existing "delete secret primary-cert-bundle-secret if exists" test (lines 852-878). Splitting into two It blocks would improve failure diagnosis and remove duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/certman_operator_tests.go` around lines 654 - 771, The single It
block "should delete TXT records from Route53 after certificate issuance" mixes
two concerns — TXT-record cleanup and certificate renewal simulation — causing
misleading failures and duplicated logic; split the renewal simulation into a
separate ginkgo.It (e.g., "should renew certificate when secret is deleted") and
move the code that calls utils.FindCertificateRequestForClusterDeployment,
utils.GetCertificateSecretNameFromCR, clientset.CoreV1().Secrets().Delete/Get,
utils.ParseCertificateFromSecret and utils.VerifyCertificateExpiry into that new
It; keep the original It focused only on DNS TXT cleanup
(ListAcmeChallengeTXTRecords/Route53 checks) and remove the duplicated renewal
block from the original test so each It targets one behavior and the existing
"delete secret primary-cert-bundle-secret if exists" test isn’t duplicated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/certman_operator_tests.go`:
- Around line 514-547: The test currently dereferences record.TTL (and also
*record.ResourceRecords[0].Value in the log) which can panic if TTL or
ResourceRecords/Value is nil; update the validation loop over challengeRecords
to first assert TTL is not nil (e.g.
gomega.Expect(record.TTL).NotTo(gomega.BeNil(), "Record TTL should not be nil")
) before using *record.TTL, and similarly ensure ResourceRecords and
ResourceRecords[0].Value are non-nil before dereferencing for logging; adjust
the ginkgo.GinkgoLogr.Info calls to check pointers and use the safe,
already-asserted values (record.TTL, record.ResourceRecords) rather than
directly dereferencing without guards.

---

Nitpick comments:
In `@test/e2e/certman_operator_tests.go`:
- Around line 654-771: The single It block "should delete TXT records from
Route53 after certificate issuance" mixes two concerns — TXT-record cleanup and
certificate renewal simulation — causing misleading failures and duplicated
logic; split the renewal simulation into a separate ginkgo.It (e.g., "should
renew certificate when secret is deleted") and move the code that calls
utils.FindCertificateRequestForClusterDeployment,
utils.GetCertificateSecretNameFromCR, clientset.CoreV1().Secrets().Delete/Get,
utils.ParseCertificateFromSecret and utils.VerifyCertificateExpiry into that new
It; keep the original It focused only on DNS TXT cleanup
(ListAcmeChallengeTXTRecords/Route53 checks) and remove the duplicated renewal
block from the original test so each It targets one behavior and the existing
"delete secret primary-cert-bundle-secret if exists" test isn’t duplicated.

In `@test/e2e/utils/utils.go`:
- Around line 1688-1709: ListAcmeChallengeTXTRecords currently calls
route53.ListResourceRecordSets once and can miss records if the hosted zone has
>300 entries; update the function to handle pagination by using
route53.ListResourceRecordSetsPages (or loop using
IsTruncated/NextRecordName/NextRecordType) and accumulate ResourceRecordSets
across pages, while still filtering for TXT records whose Name contains
"_acme-challenge"; alternatively, to reduce scanned records, set StartRecordName
to "_acme-challenge" and StartRecordType to "TXT" when constructing the
ListResourceRecordSetsInput and iterate pages until IsTruncated is false,
returning the combined filtered slice.

Comment on lines 514 to 547
// Verify we have records for our domains
for _, record := range records {
ginkgo.GinkgoLogr.Info("Found TXT record",
"name", *record.Name,
"type", *record.Type,
"ttl", *record.TTL)
}

return true
}, 10*time.Minute, 15*time.Second).Should(gomega.BeTrue(),
"TXT records should be created in Route53 for DNS-01 challenge")

// Validate record format
for _, record := range challengeRecords {
// Verify it's a TXT record
gomega.Expect(*record.Type).To(gomega.Equal("TXT"), "Record should be TXT type")

// Verify name starts with _acme-challenge
gomega.Expect(*record.Name).To(gomega.ContainSubstring("_acme-challenge"),
"Record name should contain _acme-challenge")

// Verify TTL is reasonable (usually 60 seconds for ACME challenges)
gomega.Expect(*record.TTL).To(gomega.BeNumerically("<=", 300),
"TTL should be short for challenge records")

// Verify record has a value
gomega.Expect(len(record.ResourceRecords)).To(gomega.BeNumerically(">", 0),
"TXT record should have a value")

ginkgo.GinkgoLogr.Info("✅ TXT record validated",
"name", *record.Name,
"ttl", *record.TTL,
"value", *record.ResourceRecords[0].Value)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Potential nil-pointer dereference on record.TTL will produce a panic instead of a clear test failure.

TTL is a *int64 and will be nil for alias-type record sets. Although the filter selects only TXT records (which should always carry a TTL), a defensive nil-guard keeps the failure message useful if an unexpected record slips through.

🛡️ Suggested nil-safe logging and assertion
 			for _, record := range records {
+				ttl := int64(0)
+				if record.TTL != nil {
+					ttl = *record.TTL
+				}
 				ginkgo.GinkgoLogr.Info("Found TXT record",
 					"name", *record.Name,
 					"type", *record.Type,
-					"ttl", *record.TTL)
+					"ttl", ttl)
 			}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Verify we have records for our domains
for _, record := range records {
ginkgo.GinkgoLogr.Info("Found TXT record",
"name", *record.Name,
"type", *record.Type,
"ttl", *record.TTL)
}
return true
}, 10*time.Minute, 15*time.Second).Should(gomega.BeTrue(),
"TXT records should be created in Route53 for DNS-01 challenge")
// Validate record format
for _, record := range challengeRecords {
// Verify it's a TXT record
gomega.Expect(*record.Type).To(gomega.Equal("TXT"), "Record should be TXT type")
// Verify name starts with _acme-challenge
gomega.Expect(*record.Name).To(gomega.ContainSubstring("_acme-challenge"),
"Record name should contain _acme-challenge")
// Verify TTL is reasonable (usually 60 seconds for ACME challenges)
gomega.Expect(*record.TTL).To(gomega.BeNumerically("<=", 300),
"TTL should be short for challenge records")
// Verify record has a value
gomega.Expect(len(record.ResourceRecords)).To(gomega.BeNumerically(">", 0),
"TXT record should have a value")
ginkgo.GinkgoLogr.Info("✅ TXT record validated",
"name", *record.Name,
"ttl", *record.TTL,
"value", *record.ResourceRecords[0].Value)
}
// Verify we have records for our domains
for _, record := range records {
ttl := int64(0)
if record.TTL != nil {
ttl = *record.TTL
}
ginkgo.GinkgoLogr.Info("Found TXT record",
"name", *record.Name,
"type", *record.Type,
"ttl", ttl)
}
return true
}, 10*time.Minute, 15*time.Second).Should(gomega.BeTrue(),
"TXT records should be created in Route53 for DNS-01 challenge")
// Validate record format
for _, record := range challengeRecords {
// Verify it's a TXT record
gomega.Expect(*record.Type).To(gomega.Equal("TXT"), "Record should be TXT type")
// Verify name starts with _acme-challenge
gomega.Expect(*record.Name).To(gomega.ContainSubstring("_acme-challenge"),
"Record name should contain _acme-challenge")
// Verify TTL is reasonable (usually 60 seconds for ACME challenges)
gomega.Expect(*record.TTL).To(gomega.BeNumerically("<=", 300),
"TTL should be short for challenge records")
// Verify record has a value
gomega.Expect(len(record.ResourceRecords)).To(gomega.BeNumerically(">", 0),
"TXT record should have a value")
ginkgo.GinkgoLogr.Info("✅ TXT record validated",
"name", *record.Name,
"ttl", *record.TTL,
"value", *record.ResourceRecords[0].Value)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/certman_operator_tests.go` around lines 514 - 547, The test
currently dereferences record.TTL (and also *record.ResourceRecords[0].Value in
the log) which can panic if TTL or ResourceRecords/Value is nil; update the
validation loop over challengeRecords to first assert TTL is not nil (e.g.
gomega.Expect(record.TTL).NotTo(gomega.BeNil(), "Record TTL should not be nil")
) before using *record.TTL, and similarly ensure ResourceRecords and
ResourceRecords[0].Value are non-nil before dereferencing for logging; adjust
the ginkgo.GinkgoLogr.Info calls to check pointers and use the safe,
already-asserted values (record.TTL, record.ResourceRecords) rather than
directly dereferencing without guards.

@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 19, 2026
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/certman_operator_tests.go`:
- Around line 550-554: The initial call to FindDNSZoneForClusterDeployment is
not retried and can fail if Hive hasn't created the DNSZone yet; wrap that
lookup in a gomega.Eventually closure that repeatedly calls
FindDNSZoneForClusterDeployment (using the same dynamicClient,
certConfig.TestNamespace and clusterDeploymentName) and asserts no error and
non-nil DNSZone before proceeding; apply the exact same change to the
TXT-cleanup test where the identical pattern appears (the second
FindDNSZoneForClusterDeployment usage).

In `@test/e2e/utils/utils.go`:
- Around line 1688-1709: ListAcmeChallengeTXTRecords currently calls
client.ListResourceRecordSets once and drops records when Route53 pages results;
fix by iterating pages until result.IsTruncated is false and accumulate all
ResourceRecordSets before filtering. Implement pagination by either using
client.ListResourceRecordSetsPages (if available) or by repeatedly calling
client.ListResourceRecordSets with input.StartRecordName and
input.StartRecordType set from result.NextRecordName and result.NextRecordType,
appending each page's ResourceRecordSets to a slice, then apply the existing TXT
+ "_acme-challenge" filter to the combined slice before returning.

---

Duplicate comments:
In `@test/e2e/certman_operator_tests.go`:
- Around line 621-654: The loop validating challengeRecords dereferences
optional fields without nil checks and can panic; before asserting or logging
use explicit nil guards: for each record in challengeRecords first
gomega.Expect(record.TTL).NotTo(gomega.BeNil(), ...) then perform
gomega.Expect(*record.TTL).To(gomega.BeNumerically("<=", 300), ...); similarly
ensure gomega.Expect(len(record.ResourceRecords)).To(gomega.BeNumerically(">",
0), ...) then gomega.Expect(record.ResourceRecords[0]).NotTo(gomega.BeNil(),
...) and gomega.Expect(record.ResourceRecords[0].Value).NotTo(gomega.BeNil(),
...) before dereferencing *record.ResourceRecords[0].Value for assertions or
logging; update the ginkgo.GinkgoLogr.Info calls to only dereference after those
nil checks (or log safely by checking nil).

Comment on lines 550 to 554
ginkgo.It("should create TXT records in Route53 for DNS-01 challenge", func(ctx context.Context) {
// Find the DNSZone for our cluster
dnsZone, err := utils.FindDNSZoneForClusterDeployment(ctx, dynamicClient,
certConfig.TestNamespace, clusterDeploymentName)
gomega.Expect(err).ShouldNot(gomega.HaveOccurred(), "DNSZone should exist for ClusterDeployment")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Initial FindDNSZoneForClusterDeployment call is not retried — test fails immediately if the DNSZone hasn't been created yet.

Hive creates the DNSZone asynchronously after ClusterDeployment creation. The bare gomega.Expect at line 554 asserts the DNSZone exists right now; if Hive hasn't reconciled yet, the test hard-fails before reaching the Eventually readiness poll. The identical pattern is repeated at lines 762–765 in the TXT-cleanup test.

🛡️ Proposed fix — wrap initial lookup in Eventually
-	dnsZone, err := utils.FindDNSZoneForClusterDeployment(ctx, dynamicClient,
-		certConfig.TestNamespace, clusterDeploymentName)
-	gomega.Expect(err).ShouldNot(gomega.HaveOccurred(), "DNSZone should exist for ClusterDeployment")
-
-	// Verify DNSZone is ready before querying Route53
-	gomega.Eventually(func() bool {
-		// Re-fetch DNSZone to get latest status
-		freshDNSZone, err := utils.FindDNSZoneForClusterDeployment(ctx, dynamicClient,
-			certConfig.TestNamespace, clusterDeploymentName)
-		if err != nil {
-			return false
-		}
-		ready, err := utils.VerifyDNSZoneReady(freshDNSZone)
-		if err != nil {
-			ginkgo.GinkgoLogr.Error(err, "Error checking DNSZone readiness")
-			return false
-		}
-		if !ready {
-			ginkgo.GinkgoLogr.Info("DNSZone not ready yet, waiting...")
-		} else {
-			// Update dnsZone with fresh data when ready
-			dnsZone = freshDNSZone
-		}
-		return ready
-	}, 5*time.Minute, 10*time.Second).Should(gomega.BeTrue(), "DNSZone should be ready")
+	// Wait for DNSZone to exist AND be ready (creation is asynchronous)
+	var dnsZone *unstructured.Unstructured
+	gomega.Eventually(func() bool {
+		freshDNSZone, err := utils.FindDNSZoneForClusterDeployment(ctx, dynamicClient,
+			certConfig.TestNamespace, clusterDeploymentName)
+		if err != nil {
+			ginkgo.GinkgoLogr.Info("DNSZone not found yet, waiting...", "error", err)
+			return false
+		}
+		ready, err := utils.VerifyDNSZoneReady(freshDNSZone)
+		if err != nil {
+			ginkgo.GinkgoLogr.Error(err, "Error checking DNSZone readiness")
+			return false
+		}
+		if !ready {
+			ginkgo.GinkgoLogr.Info("DNSZone not ready yet, waiting...")
+			return false
+		}
+		dnsZone = freshDNSZone
+		return true
+	}, 5*time.Minute, 10*time.Second).Should(gomega.BeTrue(), "DNSZone should exist and be ready")

Apply the same fix to lines 762–782.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/certman_operator_tests.go` around lines 550 - 554, The initial call
to FindDNSZoneForClusterDeployment is not retried and can fail if Hive hasn't
created the DNSZone yet; wrap that lookup in a gomega.Eventually closure that
repeatedly calls FindDNSZoneForClusterDeployment (using the same dynamicClient,
certConfig.TestNamespace and clusterDeploymentName) and asserts no error and
non-nil DNSZone before proceeding; apply the exact same change to the
TXT-cleanup test where the identical pattern appears (the second
FindDNSZoneForClusterDeployment usage).

Comment on lines +1688 to +1709
// ListAcmeChallengeTXTRecords queries Route53 for _acme-challenge TXT records in a hosted zone
func ListAcmeChallengeTXTRecords(client *route53.Route53, hostedZoneID string) ([]*route53.ResourceRecordSet, error) {
input := &route53.ListResourceRecordSetsInput{
HostedZoneId: aws.String(hostedZoneID),
}

result, err := client.ListResourceRecordSets(input)
if err != nil {
return nil, fmt.Errorf("failed to list Route53 resource record sets: %w", err)
}

// Filter for _acme-challenge TXT records
var challengeRecords []*route53.ResourceRecordSet
for _, record := range result.ResourceRecordSets {
if record.Type != nil && *record.Type == "TXT" &&
record.Name != nil && strings.Contains(*record.Name, "_acme-challenge") {
challengeRecords = append(challengeRecords, record)
}
}

return challengeRecords, nil
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

ListAcmeChallengeTXTRecords silently drops records when the hosted zone is paginated.

ListResourceRecordSets returns at most 300 records per call and sets result.IsTruncated = true when more exist. The current implementation makes exactly one call, so ACME-challenge TXT records beyond the first page are silently ignored. The Eventually poll loop in the test would then incorrectly conclude that no challenge records exist.

🛡️ Proposed pagination fix
 func ListAcmeChallengeTXTRecords(client *route53.Route53, hostedZoneID string) ([]*route53.ResourceRecordSet, error) {
-	input := &route53.ListResourceRecordSetsInput{
-		HostedZoneId: aws.String(hostedZoneID),
-	}
-
-	result, err := client.ListResourceRecordSets(input)
-	if err != nil {
-		return nil, fmt.Errorf("failed to list Route53 resource record sets: %w", err)
-	}
-
-	// Filter for _acme-challenge TXT records
 	var challengeRecords []*route53.ResourceRecordSet
-	for _, record := range result.ResourceRecordSets {
-		if record.Type != nil && *record.Type == "TXT" &&
-			record.Name != nil && strings.Contains(*record.Name, "_acme-challenge") {
-			challengeRecords = append(challengeRecords, record)
+	input := &route53.ListResourceRecordSetsInput{
+		HostedZoneId: aws.String(hostedZoneID),
+	}
+	for {
+		result, err := client.ListResourceRecordSets(input)
+		if err != nil {
+			return nil, fmt.Errorf("failed to list Route53 resource record sets: %w", err)
+		}
+		for _, record := range result.ResourceRecordSets {
+			if record.Type != nil && *record.Type == "TXT" &&
+				record.Name != nil && strings.Contains(*record.Name, "_acme-challenge") {
+				challengeRecords = append(challengeRecords, record)
+			}
+		}
+		if result.IsTruncated == nil || !*result.IsTruncated {
+			break
 		}
+		input.StartRecordName = result.NextRecordName
+		input.StartRecordType = result.NextRecordType
+		input.StartRecordIdentifier = result.NextRecordIdentifier
 	}

 	return challengeRecords, nil
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// ListAcmeChallengeTXTRecords queries Route53 for _acme-challenge TXT records in a hosted zone
func ListAcmeChallengeTXTRecords(client *route53.Route53, hostedZoneID string) ([]*route53.ResourceRecordSet, error) {
input := &route53.ListResourceRecordSetsInput{
HostedZoneId: aws.String(hostedZoneID),
}
result, err := client.ListResourceRecordSets(input)
if err != nil {
return nil, fmt.Errorf("failed to list Route53 resource record sets: %w", err)
}
// Filter for _acme-challenge TXT records
var challengeRecords []*route53.ResourceRecordSet
for _, record := range result.ResourceRecordSets {
if record.Type != nil && *record.Type == "TXT" &&
record.Name != nil && strings.Contains(*record.Name, "_acme-challenge") {
challengeRecords = append(challengeRecords, record)
}
}
return challengeRecords, nil
}
// ListAcmeChallengeTXTRecords queries Route53 for _acme-challenge TXT records in a hosted zone
func ListAcmeChallengeTXTRecords(client *route53.Route53, hostedZoneID string) ([]*route53.ResourceRecordSet, error) {
var challengeRecords []*route53.ResourceRecordSet
input := &route53.ListResourceRecordSetsInput{
HostedZoneId: aws.String(hostedZoneID),
}
for {
result, err := client.ListResourceRecordSets(input)
if err != nil {
return nil, fmt.Errorf("failed to list Route53 resource record sets: %w", err)
}
for _, record := range result.ResourceRecordSets {
if record.Type != nil && *record.Type == "TXT" &&
record.Name != nil && strings.Contains(*record.Name, "_acme-challenge") {
challengeRecords = append(challengeRecords, record)
}
}
if result.IsTruncated == nil || !*result.IsTruncated {
break
}
input.StartRecordName = result.NextRecordName
input.StartRecordType = result.NextRecordType
input.StartRecordIdentifier = result.NextRecordIdentifier
}
return challengeRecords, nil
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/utils/utils.go` around lines 1688 - 1709,
ListAcmeChallengeTXTRecords currently calls client.ListResourceRecordSets once
and drops records when Route53 pages results; fix by iterating pages until
result.IsTruncated is false and accumulate all ResourceRecordSets before
filtering. Implement pagination by either using
client.ListResourceRecordSetsPages (if available) or by repeatedly calling
client.ListResourceRecordSets with input.StartRecordName and
input.StartRecordType set from result.NextRecordName and result.NextRecordType,
appending each page's ResourceRecordSets to a slice, then apply the existing TXT
+ "_acme-challenge" filter to the combined slice before returning.

@openshift-ci openshift-ci bot requested review from Tafhim and ravitri February 19, 2026 09:23
@codecov
Copy link

codecov bot commented Feb 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 57.15%. Comparing base (4b2fc9d) to head (cb49991).

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #424   +/-   ##
=======================================
  Coverage   57.15%   57.15%           
=======================================
  Files          29       29           
  Lines        2138     2138           
=======================================
  Hits         1222     1222           
  Misses        802      802           
  Partials      114      114           
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 19, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: TheUndeadKing
Once this PR has been reviewed and has the lgtm label, please assign rbhilare for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@TheUndeadKing TheUndeadKing changed the title SREP-3024: Add E2E tests for ACME DNS-01 challenge verification [WiP]SREP-3024: Add E2E tests for ACME DNS-01 challenge verification Feb 19, 2026
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 19, 2026
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 19, 2026

@TheUndeadKing: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/validate cb49991 link true /test validate

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants