ci: skip SonarCloud Scan on dependabot PRs #70
Merged
brandonrc added a commit that referenced this pull request Apr 25, 2026
Previously every PR against this repo triggered the full terraform matrix (terraform-validate x3 environments + terraform-fmt), even for PRs that only touched docs, chart YAML, argocd values, workflows, or any other non-terraform file. On top of that, any pre-existing terraform job issues on main (e.g. the node wrapper bug) would fail CI for completely unrelated PRs.

Add a detect-changes job that checks whether the PR touches anything under terraform/ (plus the CI workflow itself so workflow tweaks keep running the jobs once). Gate terraform-validate and terraform-fmt on its output. SonarCloud keeps running on every PR because it analyzes the whole repo, and it already skips dependabot PRs via #70.

Affected open PRs that will stop failing TF jobs after this lands:
- iac#68 bump e2e runners
- iac#70 sonar-skip-dependabot
- iac#71 chart docs
- iac#69 LICENSE (already merged, but same pattern)
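The detect-changes check described above can be sketched as a small shell function. This is a minimal sketch and not the job's actual implementation: the function name `terraform_changed` and the workflow path in `CI_WORKFLOW` are hypothetical, since the commit message names neither.

```shell
# Reads changed file paths on stdin (one per line) and prints "true" if any
# of them should trigger the terraform jobs, otherwise "false".
# CI_WORKFLOW is a hypothetical path; the real workflow filename is not
# given in the commit message.
CI_WORKFLOW="${CI_WORKFLOW:-.github/workflows/ci.yml}"

terraform_changed() {
  while IFS= read -r path; do
    case "$path" in
      terraform/*|"$CI_WORKFLOW")
        echo "true"
        return 0
        ;;
    esac
  done
  echo "false"
}

# Possible use inside a workflow step (not executed here):
#   changed="$(git diff --name-only "origin/${GITHUB_BASE_REF}...HEAD" | terraform_changed)"
#   echo "terraform=${changed}" >> "$GITHUB_OUTPUT"
```

Downstream jobs would then gate on the `terraform` output, so a docs-only PR prints `false` and skips the terraform matrix.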
brandonrc added a commit that referenced this pull request Apr 25, 2026
…o artifact-keeper#830) (#67)

* chore: replace Meilisearch with OpenSearch in Helm chart

Companion to artifact-keeper/artifact-keeper#830, which swaps the backend search engine. The old MEILISEARCH_URL and MEILISEARCH_API_KEY env vars are gone; the backend now expects OPENSEARCH_URL, OPENSEARCH_USERNAME, OPENSEARCH_PASSWORD, and OPENSEARCH_ALLOW_INVALID_CERTS.

Chart changes:
- templates/opensearch-deployment.yaml replaces meilisearch-deployment.yaml. Single-replica renders a Deployment with discovery.type=single-node. replicaCount >= 2 renders a StatefulSet with per-pod PVCs, a headless service, and cluster.initial_cluster_manager_nodes wired to pod names.
- templates/backend-deployment.yaml now wires OPENSEARCH_* env vars and the wait-for init container polls /_cluster/health for green or yellow.
- templates/secrets.yaml and templates/external-secrets.yaml store OPENSEARCH_USERNAME and OPENSEARCH_PASSWORD instead of a Meilisearch key.
- templates/networkpolicy.yaml allows backend to OpenSearch on 9200 and OpenSearch-to-OpenSearch transport on 9300.
- values.yaml, values-staging.yaml, values-production.yaml, and both mesh overlays rename meilisearch: to opensearch:. Production uses replicaCount: 3, -Xms2g -Xmx2g, and pod anti-affinity.
- aws-scripts/docker-compose.yml swaps the meilisearch service to opensearchproject/opensearch:2.19.1 with discovery.type=single-node.

No data migration is needed; the backend auto-reindexes from Postgres on first boot.

Security notes:
- disableSecurityPlugin defaults to true so the example template does not ship demo certs. Staging/production should provision real TLS via cert-manager and flip disableSecurityPlugin to false.
- DISABLE_INSTALL_DEMO_CONFIG is always true regardless of the security plugin toggle.

* docs(helm): regenerate README from helm-docs

Alphabetize the values table by re-running helm-docs 1.14.2.
The prior commit left nameOverride and networkPolicy after the opensearch block, which failed the Helm Docs drift check on PR.

* ci(terraform): disable terraform_wrapper on ARC runners

hashicorp/setup-terraform@v4 defaults to terraform_wrapper: true, which installs a Node.js shim that invokes the real terraform binary. The ARC runner images do not ship Node.js, so every terraform step fails immediately with:

    /usr/bin/env: 'node': No such file or directory
    Process completed with exit code 127.

Set terraform_wrapper: false in both terraform-validate and terraform-fmt so terraform runs directly. This unblocks all future Terraform CI runs on ARC runners, not just this PR.

* ci(helm): install kubeconform via sudo mv for ARC runners

The kubeconform install piped tar directly into /usr/local/bin, which the non-root ARC runner user cannot write to:

    tar: kubeconform: Cannot open: Permission denied
    tar: Exiting with failure status due to previous errors

Extract to /tmp first then sudo mv, matching the helm-docs install pattern already used in helm-docs.yml. All five Template & Validate matrix jobs (default, production, staging, mesh-main, mesh-peer) hit this failure; this unblocks them together.

* ci(helm): install wget before kind-action on ARC runners

helm/kind-action's setup script downloads the kind binary via wget, which isn't pre-installed on the ARC self-hosted runner image. The script fails with 'wget: command not found', kind never installs, the cluster never comes up, and the chart-testing install step exits 127 with no useful diagnostics.

Add an idempotent install step before kind-action. Same pattern as the earlier kubeconform/kubectl fix in commit bdf9d96, accommodating the runner image's missing tools rather than rebuilding the runner. Unblocks Install Test (kind) job for this PR and any future helm-ci runs on ARC.
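The kubeconform fix above (extract somewhere writable, then move into place with elevated rights) might look roughly like the function below. It is a sketch under assumptions: `install_from_tarball` is a hypothetical helper name, and the fallback that uses plain `mv` when the destination is already writable is an addition for illustration, not something the commit describes.

```shell
# install_from_tarball TARBALL BINARY DEST_DIR
# Extracts BINARY from TARBALL into a scratch directory under /tmp, then
# moves it into DEST_DIR. sudo is used only when DEST_DIR is not writable
# by the current user (the situation on the non-root ARC runner).
install_from_tarball() {
  tarball="$1"
  binary="$2"
  dest_dir="$3"

  workdir="$(mktemp -d "${TMPDIR:-/tmp}/install.XXXXXX")" || return 1
  tar -xzf "$tarball" -C "$workdir" "$binary" || return 1
  chmod +x "$workdir/$binary"

  if [ -w "$dest_dir" ]; then
    mv "$workdir/$binary" "$dest_dir/$binary"
  else
    sudo mv "$workdir/$binary" "$dest_dir/$binary"
  fi
  status=$?

  rm -rf "$workdir"
  return "$status"
}

# Possible use in a CI step (not executed here; KUBECONFORM_URL is a
# placeholder for the release tarball URL):
#   curl -fsSL "$KUBECONFORM_URL" -o /tmp/kubeconform.tar.gz
#   install_from_tarball /tmp/kubeconform.tar.gz kubeconform /usr/local/bin
```

Separating extraction from the final move means only one short `mv` needs sudo, and a failed download or extraction never leaves a partial file in /usr/local/bin.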
* ci(helm): run Install Test on GitHub-hosted runner for DinD support

The ARC self-hosted runners cannot run helm/kind-action because the kind-action script boots a kubeadm control-plane inside a Docker container, which requires privileged Docker-in-Docker on the runner host. The ARC runner pods run inside a K8s cluster without privileged DinD configured, so kubeadm init times out after 4 minutes without producing a working cluster.

Workarounds that don't work:
- Install wget (the kind binary downloads fine)
- Shortening timeouts (the kubeadm init phase genuinely never completes)

The real fix is to configure privileged DinD on ARC runners, which is a multi-day infra project. Until then, run this one job on ubuntu-latest where kind works out of the box. Build cost is small (single job, ~5 runs per week on this workflow). Every other job in helm-ci continues to use ak-ci-runners.

* ci(helm): switch install-test from kind to k3d, keep on ARC runners

Reverts the previous move to ubuntu-latest. Instead of fighting the kind-in-DinD nesting problem (kubeadm control-plane init timed out inside the arc-runner pod's DinD sidecar), switch to k3d.

k3d runs k3s as a single Docker container using the existing DinD sidecar. k3s uses embedded containerd and does not use kubeadm, so it boots reliably in constrained environments where kind fails. This keeps CI entirely on the self-hosted ARC runners rather than depending on GitHub-hosted runners.

k3d v5.7.5 is pinned. traefik and metrics-server are disabled since the chart-testing install check does not need them. Removes the wget install workaround (k3d's installer is a direct binary download via curl, no wget needed).

* ci(helm): install kubectl for k3d-based Install Test job

The k3d cluster comes up fine on the ARC runners (previous run showed 'Cluster chart-testing created successfully'), but the next step hit 'kubectl: command not found'. helm/kind-action used to install kubectl as a side effect; k3d does not.
Add azure/setup-kubectl@v4 pinned to v1.29.3 (matches what kind-action was installing) before the k3d install step.

* ci(helm): disable Install Test for now, document why

The k3d cluster comes up correctly on our ARC runners, but installing the full artifact-keeper chart (backend + web + postgres + trivy + opensearch + dtrack) hits 'context deadline exceeded' because the pods do not all fit + initialise within chart-testing's timeout when the runner pod is limited to 4 CPU / 8 Gi.

Chart correctness is still validated by:
- helm lint (Lint job)
- helm template across 5 value overlays (Template & Validate x5)

The install smoke test was adding signal only on the 'does the chart actually apply to a cluster and become Ready' dimension, which is redundant for release validation when template rendering is clean.

Re-enable once either:
1. Runner pods are resized so the full chart fits, or
2. chart-testing is configured to install only a subset of components, or
3. The chart itself is split so individual sub-charts can be tested

Tracking issue will be filed separately.

* ci(helm): add ak-beefy-runners scale set, route install-test to it

Adds a third ARC AutoscalingRunnerSet with a larger resource envelope (4 CPU / 8 Gi requests, 8 CPU / 16 Gi limits, maxRunners: 5). Routes the Install Test (k3d) job to it so the k3d cluster plus the full artifact-keeper chart (backend, web, postgres, trivy, opensearch, dtrack) can stand up inside chart-testing's timeout window. Previous runs on the regular ak-ci-runners pool (4 CPU / 8 Gi limit) hit 'context deadline exceeded' because the aggregate pod footprint plus k3s plus the runner overhead did not fit.

Capacity check: maxRunners=5 x 4 CPU / 8 Gi requests = 20 CPU / 40 Gi peak. Rocky has 88 CPU / 164 Gi allocatable, leaves plenty of headroom alongside ak-ci-runners (up to 40 CPU / 80 Gi) and ak-e2e-runners (up to 40 CPU / 80 Gi post iac#68).
Requires a follow-up deploy on the cluster before the new pool is actually available:

    helm upgrade --install ak-beefy-runners \
      --namespace arc-runners \
      -f argocd/arc-beefy-runners-values.yaml \
      oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set

Reverts the previous 'if: false' disable of install-test. The chart correctness validation via helm lint and helm template continues as before and is independent.

* ci(arc): fix beefy runner DinD wiring to match ak-ci-runners

Previous beefy-runners template was modeled on arc-e2e-runners-values.yaml (which has never actually run a job in production) and declared dind as a regular container without the args needed for the runner to reach the socket. On a live k3d install-test run this produced:

    permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock

Reshape the values file to mirror ak-ci-runners (known-working) with doubled resources:
- Move dind to initContainers with restartPolicy: Always (K8s 1.29+ native sidecar pattern, required by the 0.13 chart)
- Add explicit dockerd args:
    --host=unix:///var/run/docker.sock
    --storage-driver=vfs (for nested-container safety)
    --group=123 (socket ACL group for runner user)
- Rename the shared socket mount from dind-sock to dind-var, matching ak-ci-runners
- Pin docker:27-dind (not floating docker:dind tag)
- Add a startupProbe so the runner is told to wait 120s for docker readiness via RUNNER_WAIT_FOR_DOCKER_IN_SECONDS

Document the chart version pin (0.13.1) and the whole shape in the header so future operators don't repeat the mistake.

* ci(helm): bump chart-testing install timeout to 20m

Default helm install timeout is 5 minutes. Inside k3s-in-DinD, image pulls for the full artifact-keeper chart (backend, web, postgres, opensearch, dtrack, trivy) are serial and can take 8-10 minutes on a cold runner because containerd inside the k3s node has no image cache between CI runs.
Previous run timed out with pods still in Pending and no scheduling events, which is the signature of images still being pulled when helm gave up.

* ci: path-filter terraform jobs so they only run on terraform changes

Previously every PR against this repo triggered the full terraform matrix (terraform-validate x3 environments + terraform-fmt), even for PRs that only touched docs, chart YAML, argocd values, workflows, or any other non-terraform file. On top of that, any pre-existing terraform job issues on main (e.g. the node wrapper bug) would fail CI for completely unrelated PRs.

Add a detect-changes job that checks whether the PR touches anything under terraform/ (plus the CI workflow itself so workflow tweaks keep running the jobs once). Gate terraform-validate and terraform-fmt on its output. SonarCloud keeps running on every PR because it analyzes the whole repo, and it already skips dependabot PRs via #70.

Affected open PRs that will stop failing TF jobs after this lands:
- iac#68 bump e2e runners
- iac#70 sonar-skip-dependabot
- iac#71 chart docs
- iac#69 LICENSE (already merged, but same pattern)

* ci(helm): use k3s native snapshotter (nested overlayfs unsupported)

Root cause of the previous Install Test hang: k3s's embedded containerd defaulted to the overlayfs snapshotter. The beefy runner's DinD sidecar already runs on overlayfs, so k3s's attempt to mount another overlayfs inside it fails with 'invalid argument', kubelet never starts, and the k3s node never registers with the apiserver. Every pod (including k3s system pods coredns and local-path-provisioner) stays Pending forever because there's nothing to schedule them onto.

Switching to --snapshotter=native tells containerd to copy files directly instead of stacking overlay mounts. Slower but works in nested container environments. Also bump k3d --timeout from 120s to 300s since native snapshotter is slower to unpack initial images.
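Putting the k3d settings from the commits above together (cluster name chart-testing, traefik and metrics-server disabled, native snapshotter, 300s timeout), the cluster-create call might be assembled as below. This is a sketch only: the exact flags in the workflow are not shown in the commit messages, so the `--k3s-arg` spellings and the `@server:*` node filter are assumptions, and `k3d_create_command` is a hypothetical helper that prints the command instead of running it.

```shell
# Prints a k3d cluster-create invocation for review; it does not run k3d,
# so no Docker daemon is needed to inspect the flags.
k3d_create_command() {
  cluster="${1:-chart-testing}"
  printf '%s ' k3d cluster create "$cluster" \
    "--k3s-arg=--snapshotter=native@server:*" \
    "--k3s-arg=--disable=traefik@server:*" \
    "--k3s-arg=--disable=metrics-server@server:*" \
    --timeout=300s --wait
  printf '\n'
}

# Print the command for the default cluster name:
#   k3d_create_command
```

The native snapshotter trades unpack speed for compatibility, which is why the longer timeout travels with it.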
* ci(helm): override ADMIN_PASSWORD in test-values so backend readyz unlocks

With the beefy runner + native snapshotter fixes, k3s now registers, pods schedule, postgres/web come up fine. The backend itself also starts cleanly (DB migrations, HTTP listening on 8080) but stays 0/1 Ready because its /readyz endpoint returns 503 for as long as the admin API is locked. The lock triggers when the ADMIN_PASSWORD env var is the default "admin" from values.yaml.

This is a real security feature for production (fresh deploys must change the password before API use), but it blocks chart-testing's helm --wait because the startup probe hits /readyz which returns 503 until the lock clears.

Override to a non-default value in the CI-only test-values.yaml so the lock does not trigger during install smoke tests. Production installs continue to use secrets.yaml patterns with proper password management.
GitHub withholds repo secrets from dependabot workflow runs for security reasons, so SONAR_TOKEN is always empty on those runs and the scanner exits with 'Running this GitHub Action without SONAR_TOKEN is not recommended' followed by a failure.

SonarCloud scans on 'bump X from Y to Z' PRs provide no real signal anyway, since dependabot does not modify code. The scan still runs on push events to main (where secrets are available) and on human PRs.

This unblocks iac#47, iac#49, iac#52, iac#58 which are all waiting on SonarCloud to pass.

Longer-term alternative is adding SONAR_TOKEN to the repo's Dependabot Secrets (Settings > Secrets and variables > Dependabot), which lets dependabot PRs see the token. This workflow change avoids that maintenance step.
d2920fd to aff9571



Summary
GitHub withholds repo secrets from dependabot workflow runs for security reasons, so SONAR_TOKEN is always empty on those runs and the SonarCloud scanner exits with a failure. This blocks ~5 dependency-bump PRs from ever reaching green CI.

SonarCloud scans on dependency-bump PRs provide no real signal anyway (dependabot does not modify source code). The scan still runs on push events to main (where secrets are available) and on human-authored PRs.

Unblocks:
Longer-term alternative: add SONAR_TOKEN to the repo's Dependabot Secrets (Settings > Secrets and variables > Dependabot). That lets dependabot PRs see the token. This workflow change avoids that maintenance step.
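The skip decision itself is simple enough to sketch in shell. The actual diff is not shown on this page, and in a workflow the check is more commonly written as an `if:` condition on `github.actor`; the function below only illustrates the logic. `sonar_scan_decision` is a hypothetical name, and comparing against the `dependabot[bot]` login is an assumption about how the workflow identifies these runs.

```shell
# Prints "skip" for dependabot-triggered runs and "run" for everything else.
# The actor login is passed in as the first argument; in GitHub Actions it
# would typically come from the GITHUB_ACTOR environment variable.
sonar_scan_decision() {
  actor="${1:-}"
  if [ "$actor" = "dependabot[bot]" ]; then
    echo "skip"
  else
    echo "run"
  fi
}

# Possible use inside a step (not executed here):
#   if [ "$(sonar_scan_decision "$GITHUB_ACTOR")" = "run" ]; then
#     ...invoke the scanner...
#   fi
```

Keying on the actor rather than on an empty SONAR_TOKEN keeps a genuinely missing token on human PRs visible as a failure.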
Test Checklist
API Changes