Konflux integration #145

Open
AdamSaleh wants to merge 34 commits into main from konflux-integration

@AdamSaleh

Collaborator

This should be ready for review.

The code itself was produced mostly by Claude, but I reviewed and tested it extensively.

Intent of the pipeline:

- The pipeline uses a dedicated test image, split into a base that should be rebuilt infrequently and a layer on top of it that contains all of the scripts used in the pipeline.
- The scripts are configured by env vars, making them easy to use from pipelines or when testing locally.
- The pipeline provisions its own konflux cluster, based on arm64 hypershift, version 4.14.
- It installs the appropriate catalog and then the latest operator.
- It runs the parallel test-suite from https://github.com/rh-gitops-release-qa/gitops-operator, which houses the fork of gitops-operator, to allow fast test updates.
- After it finishes, it pushes installation logs, test logs and assorted debug information as a quay artefact.
- It sends a message to the gitops-test-notification channel.

Outstanding questions:

- What openshift version should we test against?
- Should there be a difference when running against master?
- Does any part of the pipeline look wonky?

@AdamSaleh

Collaborator Author

/retest

@AdamSaleh AdamSaleh force-pushed the konflux-integration branch from 2e84399 to acd01fe Compare June 5, 2026 12:56
@AdamSaleh

Collaborator Author

/retest

There are currently four test-suites being run:
- gitops-operator's e2e ginkgo test-suite, sharded into 3 scripts
- the rollouts e2e tests
- gitops operator's ui test verifying login (more tests to come)
- the argocd tests in a separate pipeline

There is a simple parameterized pipeline, where you can choose:
- the openshift version
- size of cluster nodes
- the channel to be used in the catalog
- the test-script to run

A second, separate pipeline installs standalone argocd and runs the e2e tests.

All the tests are run from a precompiled docker image;
the pipeline checks at the start and rebuilds the images if they have
changed. The test and utility scripts always get copied.

The logs get uploaded to quay.
At the end of the pipeline, it will send a message to the
gitops-test-notification channel on slack.

The code is mostly authored by prompting claude and tested
against the v1.20 branch of the catalog repo.

Assisted-by: Claude <usersafety@anthropic.com>
Signed-off-by: Adam Saleh <adam@asaleh.net>
AdamSaleh and others added 24 commits June 22, 2026 17:35
The parse-metadata task reads event type and PR number from pod labels
(pac.test.appstudio.openshift.io/*), but these labels may not be
propagated to integration test PipelineRun pods. This adds a fallback
that reads the PipelineRun labels directly via the Kubernetes API,
plus diagnostic logging to help debug label propagation issues.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When build-test-image is skipped (BUILD_TEST_IMAGE=false), Tekton
cascade-skips any task that references its results. Adding default
values to the results means resolve-test-image receives "" instead
of being skipped, allowing it to fall through to the TEST_IMAGE_URL
fallback. This was causing all downstream tasks (install-operator,
test-operator) to be skipped after provision-cluster.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… ref

Revert result defaults (unsupported by this Tekton version) and instead
remove the $(tasks.build-test-image.results.IMAGE_URL) reference from
resolve-test-image params. Pass "" so it always falls through to
TEST_IMAGE_URL. BUILD_TEST_IMAGE is not actively used; wiring can be
restored when needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the inline resolve-test-image pass-through with a new
overlay-test-scripts task that builds a thin scripts layer on top of
the pre-built base image. This ensures new/changed scripts (like
run-sanity-tests.sh) are always available without full image rebuilds.

The task clones the catalog repo, hashes scripts/ and config/ dirs,
and skips the build on cache hit (skopeo inspect). On miss it builds
a single-layer overlay with buildah and pushes to quay.

Both operator and argocd e2e pipelines now use this task.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The grep pattern for extracting PR labels assumed "name":"value"
(no space after colon), but GitHub returns "name": "value" with
a space, causing label detection to always fail.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
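
The whitespace problem described in the commit above can be illustrated with a short sketch. This is not the pipeline's actual code (which uses grep in a shell script); it is a Python illustration under the assumption that the input is the JSON array of label objects returned for a PR. The function names and the sample label values are hypothetical.

```python
import json
import re

# Tolerates optional whitespace around the colon, so both
# "name":"value" and "name": "value" match.
LABEL_NAME_RE = re.compile(r'"name"\s*:\s*"([^"]*)"')


def label_names_regex(raw: str) -> list:
    """Extract label names with a whitespace-tolerant pattern."""
    return LABEL_NAME_RE.findall(raw)


def label_names_json(raw: str) -> list:
    """Extract label names by parsing the JSON, which ignores formatting."""
    return [label["name"] for label in json.loads(raw)]


# Two renderings of the same (hypothetical) label list.
compact = '[{"name":"run-dast"},{"name":"build-test-image"}]'
spaced = '[{"name": "run-dast"}, {"name": "build-test-image"}]'

names_compact = label_names_regex(compact)
names_spaced = label_names_regex(spaced)
names_parsed = label_names_json(spaced)
```

Parsing the JSON is the sturdier of the two options where a JSON parser is available, since it does not depend on how the server formats its output.
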
Separate the image resolution and scripts overlay into distinct tasks:
- resolve-test-image: inline task that picks the base image (build
  output via K8s API when BUILD_TEST_IMAGE=true, or TEST_IMAGE_URL)
- overlay-test-scripts: builds scripts layer on top of resolved base

resolve-test-image now has runAfter: [build-test-image] so it waits
for the full build when active, then passes the build output to the
overlay task. When build is skipped (common case), resolve runs
immediately with the pre-built TEST_IMAGE_URL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The guestbook app sync failed because the ArgoCD application
controller lacked permissions in the target namespace. Label the
namespace with argocd.argoproj.io/managed-by so the operator
automatically creates the required RoleBindings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The guestbook deployment can take a while to pull its image on EaaS
clusters. The sanity test validates that ArgoCD can sync an app, not
that the container starts quickly. Accept Synced as the primary
success condition — Progressing health is noted but not a failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
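
The success condition described in the commit above might be sketched as follows. This is an illustration only, assuming the sync and health values have already been read from the Application's status; the function name is hypothetical, and the commit does not say how health states other than Progressing are treated, so here they are only recorded in the note.

```python
from typing import Tuple


def sanity_verdict(sync_status: str, health_status: str) -> Tuple[bool, str]:
    """Return (passed, note). Synced is the only condition that decides the result."""
    if sync_status != "Synced":
        return False, f"sync status is {sync_status}, expected Synced"
    if health_status == "Healthy":
        return True, "Synced and Healthy"
    # Assumption: any non-Healthy state is reported but does not fail the test.
    return True, f"Synced; health is {health_status} (noted, not a failure)"


slow_pull = sanity_verdict("Synced", "Progressing")
not_synced = sanity_verdict("OutOfSync", "Healthy")
```
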
The guestbook app pulls gcr.io/google-samples/gb-frontend:v5 which
is slow on EaaS clusters, causing the health check to time out at
Progressing. Replace with a ConfigMap-only app from the catalog repo
itself — no image pull, instant Synced+Healthy, still validates the
full ArgoCD sync path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Point the ArgoCD smoke test at a ConfigMap in the catalog repo itself
(.tekton/test-image/config/smoke-app/) instead of the guestbook app.
Uses CATALOG_URL and CATALOG_REVISION env vars so ArgoCD syncs from
the same branch the pipeline is running from — the smoke-app path
exists on that branch.

No image pull needed, instant Synced+Healthy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add IntegrationTestScenario definitions for all test types:

- rc-operator-check: parallel, parallel-fips, sequential-s1, sequential-s2,
  rollouts, parallel-upgrade, sequential-s1-upgrade (7 scenarios)
- rc-argocd-check: argocd-e2e, argocd-e2e-fips (2 scenarios)
- rc-ui-check: ui-e2e (1 scenario)

All scenarios are optional and gated on PR labels. Only operator
and sanity test groups include upgrade testing variants.

Also adds catalogUrl/catalogRevision params to test-operator task
for smoke test app source resolution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…switch

- Add fallback to pre-compiled argocd binary from test image when all
  extraction methods fail (IDMS mirror + arch mismatch on EaaS clusters)
- Add wait_for_argocd_reconciliation() to ensure ArgoCD workloads are
  updated with new images before tests run after an operator upgrade
- Switch test suite repo from rh-gitops-release-qa to redhat-developer

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rename Dockerfile.base to Dockerfile.base-v1.21 with Go 1.26.2
  installed from go.dev tarball (UBI9 dnf only provides 1.25.9)
- Add BASE_DOCKERFILE param across pipelines, build task, and build
  script so different operator versions can use different Go toolchains
- Restructure check-gate to produce a build-image result driven by
  both the BUILD_TEST_IMAGE param and the build-test-image PR label
- Fix false FAIL reporting: derive status from test-results.json
  (actual test pass/fail) instead of $(tasks.status) pipeline aggregate

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When tests fail to compile (0 total, 0 failed, 0 errors), the status
was incorrectly overridden to Succeeded. Now only override when tests
actually ran (total > 0) or explicitly failed (failed > 0 or errors > 0).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
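
The override rule from the commit above can be written down as a small decision function. This is a sketch, not the repository's code: the key names `total`, `failed` and `errors` follow the wording of the commit message, but the actual layout of test-results.json and the status strings are assumptions.

```python
def derive_status(pipeline_status: str, results: dict) -> str:
    """Prefer the test counts over the pipeline aggregate, but only when they say something."""
    total = int(results.get("total", 0))
    failed = int(results.get("failed", 0))
    errors = int(results.get("errors", 0))

    if failed > 0 or errors > 0:
        return "Failed"          # tests explicitly failed
    if total > 0:
        return "Succeeded"       # tests ran and none failed
    # 0/0/0 means nothing ran (for example a compile failure):
    # keep whatever the pipeline itself reported.
    return pipeline_status


compile_failure = derive_status("Failed", {"total": 0, "failed": 0, "errors": 0})
clean_run = derive_status("Failed", {"total": 12, "failed": 0, "errors": 0})
real_failure = derive_status("Succeeded", {"total": 12, "failed": 1, "errors": 0})
```
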
Konflux integration service does not propagate PAC labels
(pac.test.appstudio.openshift.io/event-type) to integration test
PipelineRuns. This left EVENT_TYPE empty, causing check-gate to skip
label checks and never trigger build-test-image.

Add a GitHub API fallback that queries /commits/{sha}/pulls to find
the associated PR when PAC labels are unavailable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
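
A minimal sketch of the commit-to-PR lookup described above, using the GitHub REST endpoint that lists pull requests associated with a commit. The helper names are hypothetical, and preferring an open PR over a closed one is an assumption, not something the commit states. The network call is kept in its own function so the URL building and response handling can be checked offline.

```python
import json
import urllib.request
from typing import List, Optional

API_ROOT = "https://api.github.com"


def commit_pulls_url(owner: str, repo: str, sha: str) -> str:
    """URL of the 'pull requests associated with a commit' endpoint."""
    return f"{API_ROOT}/repos/{owner}/{repo}/commits/{sha}/pulls"


def pick_pr_number(pulls: List[dict]) -> Optional[int]:
    """Choose a PR number from the response; prefer an open PR (assumption)."""
    for pr in pulls:
        if pr.get("state") == "open":
            return pr["number"]
    return pulls[0]["number"] if pulls else None


def find_pr_for_commit(owner: str, repo: str, sha: str,
                       token: Optional[str] = None) -> Optional[int]:
    """Query GitHub and return the associated PR number, or None if there is none."""
    req = urllib.request.Request(
        commit_pulls_url(owner, repo, sha),
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return pick_pr_number(json.load(resp))


# Offline example with a hand-written response body.
sample = [{"number": 7, "state": "closed"}, {"number": 9, "state": "open"}]
chosen = pick_pr_number(sample)
```
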
Remove label-gated build-test-image triggering (PAC labels are not
available in Konflux integration tests). The build task now runs
unconditionally when proceed=true, relying on content-hash caching
to skip unchanged layers.

Pin all floating versions in Dockerfile.base-v1.21:
- UBI9 base: latest → 9.8
- OC client: stable-4.14 (floating) → 4.14.67

Remove resolve-test-image task — overlay-test-scripts now references
build-test-image results directly. Simplify check-gate to only
produce the proceed result.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous commit over-simplified check-gate by removing the GitHub
API fallback that finds PRs by commit SHA. This broke GATE_LABEL
gating — without PR detection, the label check never runs.

Restore the commit-to-PR lookup and label check, keeping only the
build-image detection removed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The overlay-test-scripts cache key only hashed scripts/config files,
ignoring the upstream base image. When the base changed (Go 1.25.9 →
1.26.2), the overlay served a stale cached image built on the old base,
causing test compilation failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
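
The stale-cache problem above comes down to what goes into the cache key. The following sketch shows one way to fold the base image into a content hash of the script directories. It is an illustration under assumptions: the function name, the 16-character truncation and the placeholder digests are hypothetical, and the real task computes its hash in shell.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterable


def overlay_cache_key(base_image_digest: str, dirs: Iterable[Path]) -> str:
    """Hash the base image digest together with every file under `dirs`."""
    h = hashlib.sha256()
    # Including the base digest means a rebuilt base invalidates the overlay.
    h.update(base_image_digest.encode())
    for d in dirs:
        for path in sorted(p for p in d.rglob("*") if p.is_file()):
            rel = f"{d.name}/{path.relative_to(d).as_posix()}"
            # Hash the path as well as the bytes, so renames change the key.
            h.update(b"\0" + rel.encode() + b"\0")
            h.update(path.read_bytes())
    return h.hexdigest()[:16]


with tempfile.TemporaryDirectory() as tmp:
    scripts = Path(tmp) / "scripts"
    scripts.mkdir()
    (scripts / "run-sanity-tests.sh").write_text("#!/bin/bash\necho ok\n")

    key_old_base = overlay_cache_key("sha256:old", [scripts])
    key_new_base = overlay_cache_key("sha256:new", [scripts])
    key_repeat = overlay_cache_key("sha256:old", [scripts])
```

With identical scripts, `key_old_base` and `key_new_base` differ, so an overlay built on the old base would no longer be served after the base changes.
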
The generate_readme function referenced undefined IMAGE_TAG, causing
an unbound variable crash (set -u). Use the same tag as upload_logs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements a DAST (Dynamic Application Security Testing) integration test
that installs the GitOps operator on an ephemeral cluster and runs
RapidAST/ZAP against the deployed ArgoCD REST API.

New files:
- tasks/test-dast.yaml: provisions cluster info, runs rapidast.py, parses
  ZAP findings to JUnit XML, uploads task artifact to Quay
- pipelines/catalog-gitops-operator-dast.yaml: same structure as the
  operator e2e pipeline (parse-metadata → check-gate → build/overlay →
  provision → install → test-dast → pipeline-wrapup)
- scenarios/gitops-dast.yaml: IntegrationTestScenario gated on run-dast label
- scripts/parse-dast-results.py: converts ZAP JSON to JUnit XML, applying
  configurable thresholds and false-positive suppression rules
- config/dast-false-positives.json: alert thresholds and suppression rules;
  baked into the overlay image at /usr/local/config/

The GCP secret (gitops-dast-gcp-key) must contain gcp-key.json for
uploading raw findings to the gitops-results GCS bucket.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
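
To make the conversion step concrete, here is a rough sketch of turning ZAP findings into JUnit XML with a threshold and a suppression list. It is not the contents of scripts/parse-dast-results.py. The rule keys `min_risk` and `suppress_plugin_ids` are hypothetical, and the report field names (`site`, `alerts`, `pluginid`, `riskcode`, `alert`, `@name`) are assumed from ZAP's traditional JSON report layout.

```python
import xml.etree.ElementTree as ET


def zap_to_junit(report: dict, rules: dict) -> ET.Element:
    """Build a JUnit <testsuite>: one <testcase> per ZAP alert."""
    min_risk = int(rules.get("min_risk", 2))              # hypothetical key
    suppressed = {str(p) for p in rules.get("suppress_plugin_ids", [])}  # hypothetical key

    suite = ET.Element("testsuite", name="dast")
    tests = failures = skipped = 0
    for site in report.get("site", []):
        for alert in site.get("alerts", []):
            plugin_id = str(alert.get("pluginid", ""))
            risk = int(alert.get("riskcode", 0))
            case = ET.SubElement(
                suite, "testcase",
                classname=site.get("@name", "unknown"),
                name=f"{plugin_id} {alert.get('alert', '')}".strip(),
            )
            tests += 1
            if plugin_id in suppressed:
                # Known false positive: reported, but not counted as a failure.
                ET.SubElement(case, "skipped", message="suppressed as false positive")
                skipped += 1
            elif risk >= min_risk:
                ET.SubElement(case, "failure", message=f"risk code {risk} >= {min_risk}")
                failures += 1
    suite.set("tests", str(tests))
    suite.set("failures", str(failures))
    suite.set("skipped", str(skipped))
    return suite


sample_report = {"site": [{"@name": "https://argocd.example", "alerts": [
    {"pluginid": "10001", "alert": "High finding", "riskcode": "3"},
    {"pluginid": "10002", "alert": "Suppressed finding", "riskcode": "3"},
    {"pluginid": "10003", "alert": "Informational", "riskcode": "0"},
]}]}
sample_rules = {"min_risk": 2, "suppress_plugin_ids": ["10002"]}
suite = zap_to_junit(sample_report, sample_rules)
xml_text = ET.tostring(suite, encoding="unicode")
```
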
…ection

$(results.X.path) in a comment was interpreted by Tekton's admission
webhook as an unresolved variable reference. Write the result directly to
/tekton/results/LOG_ARTIFACT_TAG, consistent with other tasks in the repo.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…endency

The git resolver was failing with 403 when resolving the
eaas-get-ephemeral-cluster-credentials StepAction from
konflux-ci/build-definitions at pipeline admission time.

Inline the equivalent logic directly in the get-kubeconfig step using
testImageUrl (which already has oc). The hub kubeconfig is injected via
secretKeyRef on the eaasSpaceSecretRef secret, and the cluster kubeconfig
is written to /credentials/<cluster>-kubeconfig — the same path the
subsequent get-cluster-info step already discovers via find.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The first DAST run stalled for 3+ hours with no output — the ZAP active
scan had no time cap and ran until the task was killed externally.

- activeScan.maxScanDurationInMins: 30  — ZAP stops active scan at 30m
- miscOptions.maxRuleDurationInMins: 5  — any single rule capped at 5m
- pipeline task timeout: 1h30m          — hard ceiling on the whole task

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@AdamSaleh

Collaborator Author

/retest

The GCP secret in rh-openshift-gitops-tenant is named 'gcp' with key
'key.json', not 'gitops-dast-gcp-key'/'gcp-key.json'. The name mismatch
caused the test-dast pod to hang indefinitely in PodInitializing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@AdamSaleh

Collaborator Author

/retest

@AdamSaleh AdamSaleh force-pushed the konflux-integration branch from 7fcc555 to 15d6e10 Compare June 22, 2026 21:12
kubeadmin passwords can contain $ characters (e.g. $2 from bcrypt-style
strings) which survive literally into the env file when written via an
unquoted heredoc, then fail with 'unbound variable' when sourced under
set -u. Use printf '%q' to produce properly shell-escaped output.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
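
The fix above uses the shell's own `printf '%q'`. For comparison, the same idea in Python is `shlex.quote`, shown in the sketch below. This is a different tool from the one the commit uses, and the variable name `KUBEADMIN_PASSWORD` and the sample password are hypothetical.

```python
import shlex


def env_line(name: str, value: str) -> str:
    """Render NAME=value so that a POSIX shell reads the value back verbatim when sourcing."""
    return f"{name}={shlex.quote(value)}"


# A made-up password containing $ sequences that would be
# expanded as variables if written unquoted.
password = "abc$2y$10xyz"
line = env_line("KUBEADMIN_PASSWORD", password)

# Round trip: splitting the quoted value the way a shell would
# yields the original string again.
recovered = shlex.split(line.split("=", 1)[1])[0]
```

Because the value is wrapped in single quotes, sourcing the file under `set -u` no longer tries to expand `$2`.
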