Konflux integration #145
Open
AdamSaleh wants to merge 34 commits into
Collaborator
Author
|
/retest |
2e84399 to
acd01fe
Compare
There are currently four test suites being run:
- gitops-operator's e2e ginkgo test suite, sharded into 3 scripts
- the rollouts e2e tests
- gitops-operator's UI test verifying login (more tests to come)
- the argocd tests in a separate pipeline

There is a simple parametrized pipeline, where you can choose:
- the OpenShift version
- the size of the cluster nodes
- the channel to be used in the catalog
- the test script to run

A second, separate pipeline installs standalone argocd and runs the e2e tests.

All the tests are run from a precompiled Docker image; the pipeline checks at the start and builds the images if they were changed. The test and utility scripts always get copied. The logs get uploaded to Quay. At the end of the run, the pipeline sends a message to the gitops-test-notification channel on Slack.

The code is mostly authored by prompting Claude and tested against the v1.20 branch of the catalog repo.

Assisted-by: Claude <usersafety@anthropic.com>
Signed-off-by: Adam Saleh <adam@asaleh.net>
The parse-metadata task reads event type and PR number from pod labels (pac.test.appstudio.openshift.io/*), but these labels may not be propagated to integration test PipelineRun pods. This adds a fallback that reads the PipelineRun labels directly via the Kubernetes API, plus diagnostic logging to help debug label propagation issues. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When build-test-image is skipped (BUILD_TEST_IMAGE=false), Tekton cascade-skips any task that references its results. Adding default values to the results means resolve-test-image receives "" instead of being skipped, allowing it to fall through to the TEST_IMAGE_URL fallback. This was causing all downstream tasks (install-operator, test-operator) to be skipped after provision-cluster. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… ref
Revert result defaults (unsupported by this Tekton version) and instead remove the $(tasks.build-test-image.results.IMAGE_URL) reference from resolve-test-image params. Pass "" so it always falls through to TEST_IMAGE_URL. BUILD_TEST_IMAGE is not actively used; wiring can be restored when needed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the inline resolve-test-image pass-through with a new overlay-test-scripts task that builds a thin scripts layer on top of the pre-built base image. This ensures new/changed scripts (like run-sanity-tests.sh) are always available without full image rebuilds. The task clones the catalog repo, hashes scripts/ and config/ dirs, and skips the build on cache hit (skopeo inspect). On miss it builds a single-layer overlay with buildah and pushes to quay. Both operator and argocd e2e pipelines now use this task. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The grep pattern for extracting PR labels assumed "name":"value" (no space after colon), but GitHub returns "name": "value" with a space, causing label detection to always fail. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
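The fix described above comes down to tolerating optional whitespace after the colon. A minimal sketch of the same idea in Python follows; the pipeline itself uses grep, and the function name and sample payload here are illustrative assumptions, not code from this PR.

```python
import re

# Matches both "name":"value" and "name": "value"; \s* tolerates the
# optional whitespace that GitHub emits around the colon.
LABEL_NAME_RE = re.compile(r'"name"\s*:\s*"([^"]*)"')


def extract_label_names(raw_json: str) -> list:
    """Return every label name found in a raw GitHub labels payload."""
    return LABEL_NAME_RE.findall(raw_json)


if __name__ == "__main__":
    # Hypothetical payload mixing both spacing styles.
    sample = '[{"id": 1, "name": "run-dast"}, {"id": 2, "name":"build-test-image"}]'
    print(extract_label_names(sample))
```

Where a JSON parser is available, decoding the payload with `json.loads` and reading the `name` fields is likely more robust than any pattern match.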
Separate the image resolution and scripts overlay into distinct tasks:
- resolve-test-image: inline task that picks the base image (build output via K8s API when BUILD_TEST_IMAGE=true, or TEST_IMAGE_URL)
- overlay-test-scripts: builds scripts layer on top of resolved base

resolve-test-image now has runAfter: [build-test-image] so it waits for the full build when active, then passes the build output to the overlay task. When build is skipped (common case), resolve runs immediately with the pre-built TEST_IMAGE_URL.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The guestbook app sync failed because the ArgoCD application controller lacked permissions in the target namespace. Label the namespace with argocd.argoproj.io/managed-by so the operator automatically creates the required RoleBindings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The guestbook deployment can take a while to pull its image on EaaS clusters. The sanity test validates that ArgoCD can sync an app, not that the container starts quickly. Accept Synced as the primary success condition — Progressing health is noted but not a failure. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The guestbook app pulls gcr.io/google-samples/gb-frontend:v5 which is slow on EaaS clusters, causing the health check to time out at Progressing. Replace with a ConfigMap-only app from the catalog repo itself — no image pull, instant Synced+Healthy, still validates the full ArgoCD sync path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Point the ArgoCD smoke test at a ConfigMap in the catalog repo itself (.tekton/test-image/config/smoke-app/) instead of the guestbook app. Uses CATALOG_URL and CATALOG_REVISION env vars so ArgoCD syncs from the same branch the pipeline is running from — the smoke-app path exists on that branch. No image pull needed, instant Synced+Healthy. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add IntegrationTestScenario definitions for all test types:
- rc-operator-check: parallel, parallel-fips, sequential-s1, sequential-s2, rollouts, parallel-upgrade, sequential-s1-upgrade (7 scenarios)
- rc-argocd-check: argocd-e2e, argocd-e2e-fips (2 scenarios)
- rc-ui-check: ui-e2e (1 scenario)

All scenarios are optional and gated on PR labels. Only operator and sanity test groups include upgrade testing variants.

Also adds catalogUrl/catalogRevision params to test-operator task for smoke test app source resolution.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…switch
- Add fallback to pre-compiled argocd binary from test image when all extraction methods fail (IDMS mirror + arch mismatch on EaaS clusters)
- Add wait_for_argocd_reconciliation() to ensure ArgoCD workloads are updated with new images before tests run after an operator upgrade
- Switch test suite repo from rh-gitops-release-qa to redhat-developer
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rename Dockerfile.base to Dockerfile.base-v1.21 with Go 1.26.2 installed from go.dev tarball (UBI9 dnf only provides 1.25.9)
- Add BASE_DOCKERFILE param across pipelines, build task, and build script so different operator versions can use different Go toolchains
- Restructure check-gate to produce a build-image result driven by both the BUILD_TEST_IMAGE param and the build-test-image PR label
- Fix false FAIL reporting: derive status from test-results.json (actual test pass/fail) instead of $(tasks.status) pipeline aggregate
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When tests fail to compile (0 total, 0 failed, 0 errors), the status was incorrectly overridden to Succeeded. Now only override when tests actually ran (total > 0) or explicitly failed (failed > 0 or errors > 0). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
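The override rule in this commit can be sketched as a small decision function. This is one reading of the commit message, not the pipeline's actual code: the function name, the parameter names, and the "Failed" status string are assumptions ("Succeeded" is the value named above).

```python
def resolve_status(pipeline_status: str, total: int, failed: int, errors: int) -> str:
    """Derive the reported status from test counts, else keep the pipeline's own status."""
    if failed > 0 or errors > 0:
        # Tests explicitly failed or errored: report a failure.
        return "Failed"
    if total > 0:
        # Tests ran and none failed: override to success.
        return "Succeeded"
    # 0 total, 0 failed, 0 errors (e.g. the tests did not compile):
    # no evidence either way, so do not override.
    return pipeline_status


if __name__ == "__main__":
    print(resolve_status("Failed", total=0, failed=0, errors=0))
    print(resolve_status("Failed", total=42, failed=0, errors=0))
```

The key design point is the last branch: an empty result set is treated as "no information" rather than "no failures".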
Konflux integration service does not propagate PAC labels
(pac.test.appstudio.openshift.io/event-type) to integration test
PipelineRuns. This left EVENT_TYPE empty, causing check-gate to skip
label checks and never trigger build-test-image.
Add a GitHub API fallback that queries /commits/{sha}/pulls to find
the associated PR when PAC labels are unavailable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
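A rough sketch of such a fallback is shown below, using GitHub's "list pull requests associated with a commit" REST endpoint. The helper names, the choice of the first returned PR, and the optional token handling are assumptions made for illustration; the check-gate task in this PR may do this differently.

```python
from __future__ import annotations

import json
import urllib.request

GITHUB_API = "https://api.github.com"


def pulls_for_commit_url(owner: str, repo: str, sha: str) -> str:
    """Build the URL of the commit-to-PR lookup endpoint."""
    return f"{GITHUB_API}/repos/{owner}/{repo}/commits/{sha}/pulls"


def first_pr_number(payload: list[dict]) -> int | None:
    """Pick a PR number from the decoded response, or None if the commit has no PR."""
    return payload[0]["number"] if payload else None


def find_pr_for_commit(owner: str, repo: str, sha: str, token: str | None = None) -> int | None:
    """Query GitHub for the PR associated with a commit (requires network access)."""
    request = urllib.request.Request(
        pulls_for_commit_url(owner, repo, sha),
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request, timeout=30) as response:
        return first_pr_number(json.load(response))
```

A caller would fall back to `find_pr_for_commit(...)` only when the PAC labels are missing, and treat a `None` result as "not a pull-request event".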
Remove label-gated build-test-image triggering (PAC labels are not available in Konflux integration tests). The build task now runs unconditionally when proceed=true, relying on content-hash caching to skip unchanged layers.

Pin all floating versions in Dockerfile.base-v1.21:
- UBI9 base: latest → 9.8
- OC client: stable-4.14 (floating) → 4.14.67

Remove resolve-test-image task; overlay-test-scripts now references build-test-image results directly. Simplify check-gate to only produce the proceed result.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous commit over-simplified check-gate by removing the GitHub API fallback that finds PRs by commit SHA. This broke GATE_LABEL gating — without PR detection, the label check never runs. Restore the commit-to-PR lookup and label check, keeping only the build-image detection removed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The overlay-test-scripts cache key only hashed scripts/config files, ignoring the upstream base image. When the base changed (Go 1.25.9 → 1.26.2), the overlay served a stale cached image built on the old base, causing test compilation failures. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
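One way to avoid this class of stale-cache bug is to mix the base image's digest into the key alongside the file contents. The sketch below assumes that approach; the function name, the 16-character truncation, and the idea of passing the digest in as a string are illustrative choices, not details taken from the task.

```python
import hashlib
from pathlib import Path


def overlay_cache_key(base_image_digest: str, *dirs: Path) -> str:
    """Hash the base image digest plus every file under the given directories."""
    hasher = hashlib.sha256()
    # Including the base digest means a rebuilt base invalidates the overlay cache.
    hasher.update(base_image_digest.encode())
    for root in dirs:
        # Sort for a stable order so the key does not depend on directory listing order.
        for path in sorted(p for p in root.rglob("*") if p.is_file()):
            hasher.update(path.relative_to(root).as_posix().encode())
            hasher.update(b"\0")
            hasher.update(path.read_bytes())
            hasher.update(b"\0")
    return hasher.hexdigest()[:16]
```

With this shape, changing either a script or the base image yields a new key, so the cache lookup misses and the overlay is rebuilt.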
The generate_readme function referenced undefined IMAGE_TAG, causing an unbound variable crash (set -u). Use the same tag as upload_logs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements a DAST (Dynamic Application Security Testing) integration test that installs the GitOps operator on an ephemeral cluster and runs RapidAST/ZAP against the deployed ArgoCD REST API.

New files:
- tasks/test-dast.yaml: provisions cluster info, runs rapidast.py, parses ZAP findings to JUnit XML, uploads task artifact to Quay
- pipelines/catalog-gitops-operator-dast.yaml: same structure as the operator e2e pipeline (parse-metadata → check-gate → build/overlay → provision → install → test-dast → pipeline-wrapup)
- scenarios/gitops-dast.yaml: IntegrationTestScenario gated on run-dast label
- scripts/parse-dast-results.py: converts ZAP JSON to JUnit XML, applying configurable thresholds and false-positive suppression rules
- config/dast-false-positives.json: alert thresholds and suppression rules; baked into the overlay image at /usr/local/config/

The GCP secret (gitops-dast-gcp-key) must contain gcp-key.json for uploading raw findings to the gitops-results GCS bucket.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
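To illustrate the kind of conversion parse-dast-results.py is described as doing, here is a simplified sketch, not the script from this PR. It assumes the ZAP report has a top-level `site` list whose entries carry an `alerts` list with `pluginid`, `alert`, `riskcode`, and `riskdesc` fields, and it reduces the threshold and suppression rules to two hypothetical parameters.

```python
import xml.etree.ElementTree as ET


def zap_to_junit(report: dict, fail_at_risk: int = 2,
                 suppressed: frozenset = frozenset()) -> ET.Element:
    """Turn ZAP alerts into a JUnit <testsuite>; one <testcase> per alert."""
    suite = ET.Element("testsuite", name="dast")
    failures = 0
    for site in report.get("site", []):
        for alert in site.get("alerts", []):
            plugin_id = str(alert.get("pluginid", ""))
            case = ET.SubElement(
                suite, "testcase",
                classname=site.get("@name", "unknown"),
                name=f"{plugin_id} {alert.get('alert', '')}".strip(),
            )
            if plugin_id in suppressed:
                # Known false positive: record it, but do not fail the run.
                ET.SubElement(case, "skipped", message="suppressed as false positive")
            elif int(alert.get("riskcode", 0)) >= fail_at_risk:
                failures += 1
                ET.SubElement(case, "failure", message=alert.get("riskdesc", ""))
    suite.set("tests", str(len(suite)))  # len() counts the <testcase> children
    suite.set("failures", str(failures))
    return suite
```

Calling `ET.tostring(zap_to_junit(report), encoding="unicode")` yields the XML text; alerts below the threshold appear as passing test cases.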
…ection
$(results.X.path) in a comment was interpreted by Tekton's admission webhook as an unresolved variable reference. Write the result directly to /tekton/results/LOG_ARTIFACT_TAG, consistent with other tasks in the repo.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…endency
The git resolver was failing with 403 when resolving the eaas-get-ephemeral-cluster-credentials StepAction from konflux-ci/build-definitions at pipeline admission time. Inline the equivalent logic directly in the get-kubeconfig step using testImageUrl (which already has oc). The hub kubeconfig is injected via secretKeyRef on the eaasSpaceSecretRef secret, and the cluster kubeconfig is written to /credentials/<cluster>-kubeconfig, the same path the subsequent get-cluster-info step already discovers via find.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The first DAST run stalled for 3+ hours with no output: the ZAP active scan had no time cap and ran until the task was killed externally.
- activeScan.maxScanDurationInMins: 30 (ZAP stops active scan at 30m)
- miscOptions.maxRuleDurationInMins: 5 (any single rule capped at 5m)
- pipeline task timeout: 1h30m (hard ceiling on the whole task)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The GCP secret in rh-openshift-gitops-tenant is named 'gcp' with key 'key.json', not 'gitops-dast-gcp-key'/'gcp-key.json'. The name mismatch caused the test-dast pod to hang indefinitely in PodInitializing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
7fcc555 to
15d6e10
Compare
kubeadmin passwords can contain $ characters (e.g. $2 from bcrypt-style strings) which survive literally into the env file when written via an unquoted heredoc, then fail with 'unbound variable' when sourced under set -u. Use printf '%q' to produce properly shell-escaped output.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
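The commit fixes this in shell with printf '%q'. For comparison, the analogous technique in Python is `shlex.quote`, which single-quotes any value containing shell metacharacters. The sketch below uses it; the `env_line` helper and the variable name are hypothetical.

```python
import shlex


def env_line(name: str, value: str) -> str:
    """Render NAME=value so that sourcing the file reads the value back verbatim."""
    # shlex.quote wraps the value in single quotes when it contains characters
    # such as $, so the shell will not try to expand $2 while sourcing.
    return f"{name}={shlex.quote(value)}"


if __name__ == "__main__":
    # Hypothetical password containing a literal "$2".
    print(env_line("KUBEADMIN_PASSWORD", "ab$2cd"))
```

Either approach writes a line that is safe to source under set -u, because the `$` never reaches the shell's parameter expansion.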
This should be ready for review.
The code itself was produced mostly by Claude, but I reviewed and tested it extensively.
Intent of the pipeline:
Outstanding questions: