ci: skip SonarCloud Scan on dependabot PRs #70

Merged: brandonrc merged 1 commit into main from ci/skip-sonarcloud-for-dependabot on Apr 25, 2026
@brandonrc (Contributor)

Summary

GitHub withholds repo secrets from dependabot workflow runs for security reasons, so SONAR_TOKEN is always empty on those runs and the SonarCloud scanner exits with a failure. This blocks ~5 dependency-bump PRs from ever reaching green CI.

SonarCloud scans on dependency-bump PRs provide no real signal anyway (dependabot does not modify source code). The scan still runs on push events to main (where secrets are available) and on human-authored PRs.

Unblocks:

Longer-term alternative: add SONAR_TOKEN to the repo's Dependabot Secrets (Settings > Secrets and variables > Dependabot). That lets dependabot PRs see the token. This workflow change avoids that maintenance step.
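
A minimal sketch of the kind of conditional described above, assuming the scan lives in a job named `sonarcloud` and that Dependabot runs are identified through `github.actor`. The job name, runner label, and scanner action version are illustrative assumptions, not copied from the actual workflow:

```yaml
jobs:
  sonarcloud:
    name: SonarCloud Scan
    runs-on: ak-ci-runners   # assumed runner label
    # Skip the whole job when Dependabot triggered the run, because
    # SONAR_TOKEN is not exposed to those runs.
    if: github.actor != 'dependabot[bot]'
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # full history, commonly recommended for Sonar analysis
      - uses: SonarSource/sonarcloud-github-action@v3   # assumed action and version
        env:
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
```

Putting the condition on the job rather than on a single step means the scanner never starts on Dependabot runs, while push events to main and human-authored PRs are unaffected.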

Test Checklist

  • Unit tests added/updated (N/A - workflow only)
  • Manually tested locally (YAML syntax + conditional expression verified)
  • No regressions: SonarCloud still runs on main pushes and human PRs

API Changes

  • N/A - no API changes

@brandonrc brandonrc requested a review from a team as a code owner April 25, 2026 01:57
brandonrc added a commit that referenced this pull request Apr 25, 2026
Previously every PR against this repo triggered the full terraform
matrix (terraform-validate x3 environments + terraform-fmt), even for
PRs that only touched docs, chart YAML, argocd values, workflows, or
any other non-terraform file. On top of that, any pre-existing
terraform job issues on main (e.g. the node wrapper bug) would fail
CI for completely unrelated PRs.

Add a detect-changes job that checks whether the PR touches anything
under terraform/ (plus the CI workflow itself so workflow tweaks keep
running the jobs once). Gate terraform-validate and terraform-fmt on
its output.
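
One way to sketch the gating described above uses the `dorny/paths-filter` action. The commit may well use a different mechanism (for example a plain `git diff`), and the workflow file path, runner label, and job bodies here are assumptions:

```yaml
jobs:
  detect-changes:
    runs-on: ak-ci-runners          # assumed runner label
    permissions:
      pull-requests: read           # lets the filter list changed files via the API
    outputs:
      terraform: ${{ steps.filter.outputs.terraform }}
    steps:
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            terraform:
              - 'terraform/**'
              - '.github/workflows/ci.yml'   # hypothetical path of the CI workflow

  terraform-fmt:
    needs: detect-changes
    # Outputs are strings, so compare against 'true'.
    if: needs.detect-changes.outputs.terraform == 'true'
    runs-on: ak-ci-runners
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v4
        with:
          terraform_wrapper: false
      - run: terraform fmt -check -recursive terraform/
```

The same `needs` plus `if` pair would be repeated on the terraform-validate matrix job.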

SonarCloud keeps running on every PR because it analyzes the whole
repo, and it already skips dependabot PRs via #70.

Affected open PRs that will stop failing TF jobs after this lands:
- iac#68 bump e2e runners
- iac#70 sonar-skip-dependabot
- iac#71 chart docs
- iac#69 LICENSE (already merged, but same pattern)
brandonrc added a commit that referenced this pull request Apr 25, 2026
…o artifact-keeper#830) (#67)

* chore: replace Meilisearch with OpenSearch in Helm chart

Companion to artifact-keeper/artifact-keeper#830, which swaps the
backend search engine. The old MEILISEARCH_URL and MEILISEARCH_API_KEY
env vars are gone; the backend now expects OPENSEARCH_URL,
OPENSEARCH_USERNAME, OPENSEARCH_PASSWORD, and OPENSEARCH_ALLOW_INVALID_CERTS.

Chart changes:
- templates/opensearch-deployment.yaml replaces meilisearch-deployment.yaml.
  Single-replica renders a Deployment with discovery.type=single-node.
  replicaCount >= 2 renders a StatefulSet with per-pod PVCs, a headless
  service, and cluster.initial_cluster_manager_nodes wired to pod names.
- templates/backend-deployment.yaml now wires OPENSEARCH_* env vars and
  the wait-for init container polls /_cluster/health for green or yellow.
- templates/secrets.yaml and templates/external-secrets.yaml store
  OPENSEARCH_USERNAME and OPENSEARCH_PASSWORD instead of a Meilisearch key.
- templates/networkpolicy.yaml allows backend to OpenSearch on 9200 and
  OpenSearch-to-OpenSearch transport on 9300.
- values.yaml, values-staging.yaml, values-production.yaml, and both
  mesh overlays rename meilisearch: to opensearch:. Production uses
  replicaCount: 3, -Xms2g -Xmx2g, and pod anti-affinity.
- aws-scripts/docker-compose.yml swaps the meilisearch service to
  opensearchproject/opensearch:2.19.1 with discovery.type=single-node.

No data migration is needed; the backend auto-reindexes from Postgres
on first boot.

Security notes:
- disableSecurityPlugin defaults to true so the example template does
  not ship demo certs. Staging/production should provision real TLS
  via cert-manager and flip disableSecurityPlugin to false.
- DISABLE_INSTALL_DEMO_CONFIG is always true regardless of the
  security plugin toggle.
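
For the docker-compose side of this change, a hedged sketch of a single-node OpenSearch service is shown here. The image tag and `discovery.type` come from the commit message; the heap size and port mapping are assumptions, and the actual aws-scripts/docker-compose.yml may differ:

```yaml
services:
  opensearch:
    image: opensearchproject/opensearch:2.19.1
    environment:
      - discovery.type=single-node
      # Never generate the bundled demo certificates and users.
      - DISABLE_INSTALL_DEMO_CONFIG=true
      # Plain HTTP for local use only; real deployments should enable
      # the security plugin with proper TLS.
      - DISABLE_SECURITY_PLUGIN=true
      - OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m   # assumed heap size
    ports:
      - "9200:9200"
```

With a service like this, the backend's OPENSEARCH_URL would presumably point at `http://opensearch:9200` on the compose network.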

* docs(helm): regenerate README from helm-docs

Alphabetize the values table by re-running helm-docs 1.14.2. The prior
commit left nameOverride and networkPolicy after the opensearch block,
which failed the Helm Docs drift check on PR.

* ci(terraform): disable terraform_wrapper on ARC runners

hashicorp/setup-terraform@v4 defaults to terraform_wrapper: true, which
installs a Node.js shim that invokes the real terraform binary. The ARC
runner images do not ship Node.js, so every terraform step fails
immediately with:

    /usr/bin/env: 'node': No such file or directory
    Process completed with exit code 127.

Set terraform_wrapper: false in both terraform-validate and terraform-fmt
so terraform runs directly. This unblocks all future Terraform CI runs
on ARC runners, not just this PR.
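
The step change described above amounts to a single input on the setup action. A sketch, with the step name chosen for illustration:

```yaml
- name: Setup Terraform
  uses: hashicorp/setup-terraform@v4
  with:
    # Skip the Node.js wrapper script so `terraform` resolves to the
    # real binary on runners that do not ship Node.js.
    terraform_wrapper: false
```

One known trade-off: the wrapper is what exposes `stdout`, `stderr`, and `exitcode` as step outputs, so any later step that reads those outputs would need to capture them another way.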

* ci(helm): install kubeconform via sudo mv for ARC runners

The kubeconform install piped tar directly into /usr/local/bin, which
the non-root ARC runner user cannot write to:

    tar: kubeconform: Cannot open: Permission denied
    tar: Exiting with failure status due to previous errors

Extract to /tmp first then sudo mv, matching the helm-docs install
pattern already used in helm-docs.yml. All five Template & Validate
matrix jobs (default, production, staging, mesh-main, mesh-peer) hit
this failure; this unblocks them together.
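
A sketch of the extract-then-move pattern described above. The download URL assumes the upstream kubeconform release asset name and the latest release; the real workflow may pin a specific version:

```yaml
- name: Install kubeconform
  run: |
    # Unpack into a directory the non-root runner user can write to.
    curl -sSL https://github.com/yannh/kubeconform/releases/latest/download/kubeconform-linux-amd64.tar.gz \
      | tar -xz -C /tmp kubeconform
    # Only the final move needs elevated permissions.
    sudo mv /tmp/kubeconform /usr/local/bin/kubeconform
    kubeconform -v
```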

* ci(helm): install wget before kind-action on ARC runners

helm/kind-action's setup script downloads the kind binary via wget,
which isn't pre-installed on the ARC self-hosted runner image. The
script fails with 'wget: command not found', kind never installs,
the cluster never comes up, and the chart-testing install step
exits 127 with no useful diagnostics.

Add an idempotent install step before kind-action. Same pattern as
the earlier kubeconform/kubectl fix in commit bdf9d96, accommodating
the runner image's missing tools rather than rebuilding the runner.

Unblocks Install Test (kind) job for this PR and any future helm-ci
runs on ARC.

* ci(helm): run Install Test on GitHub-hosted runner for DinD support

The ARC self-hosted runners cannot run helm/kind-action because the
kind-action script boots a kubeadm control-plane inside a Docker
container, which requires privileged Docker-in-Docker on the runner
host. The ARC runner pods run inside a K8s cluster without privileged
DinD configured, so kubeadm init times out after 4 minutes without
producing a working cluster.

Workarounds that don't work:
- Installing wget (the kind binary then downloads fine, but kubeadm init still times out)
- Shortening timeouts (the kubeadm init phase genuinely never completes)

The real fix is to configure privileged DinD on ARC runners, which is
a multi-day infra project. Until then, run this one job on
ubuntu-latest where kind works out of the box. Build cost is small
(single job, ~5 runs per week on this workflow).

Every other job in helm-ci continues to use ak-ci-runners.

* ci(helm): switch install-test from kind to k3d, keep on ARC runners

Reverts the previous move to ubuntu-latest. Instead of fighting the
kind-in-DinD nesting problem (kubeadm control-plane init timed out
inside the arc-runner pod's DinD sidecar), switch to k3d.

k3d runs k3s as a single Docker container using the existing DinD
sidecar. k3s uses embedded containerd and does not use kubeadm, so it
boots reliably in constrained environments where kind fails. This
keeps CI entirely on the self-hosted ARC runners rather than depending
on GitHub-hosted runners.

k3d v5.7.5 is pinned. traefik and metrics-server are disabled since
the chart-testing install check does not need them.

Removes the wget install workaround (k3d's installer is a direct
binary download via curl, no wget needed).
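
A hedged sketch of the k3d steps this commit describes. The cluster name is taken from the log line quoted in a later commit; the step names are illustrative, and the install command follows the k3d project's documented install script:

```yaml
- name: Install k3d
  run: |
    # TAG pins the k3d release the install script downloads.
    curl -s https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | TAG=v5.7.5 bash

- name: Create k3d cluster
  run: |
    # Disable the bundled components the install check does not need.
    k3d cluster create chart-testing \
      --k3s-arg "--disable=traefik@server:*" \
      --k3s-arg "--disable=metrics-server@server:*" \
      --wait
```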

* ci(helm): install kubectl for k3d-based Install Test job

The k3d cluster comes up fine on the ARC runners (previous run showed
'Cluster chart-testing created successfully'), but the next step
hit 'kubectl: command not found'. helm/kind-action used to install
kubectl as a side effect; k3d does not.

Add azure/setup-kubectl@v4 pinned to v1.29.3 (matches what kind-action
was installing) before the k3d install step.
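
The added step would look roughly like this (the step name is illustrative):

```yaml
- name: Install kubectl
  uses: azure/setup-kubectl@v4
  with:
    version: v1.29.3   # same version kind-action was installing
```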

* ci(helm): disable Install Test for now, document why

The k3d cluster comes up correctly on our ARC runners, but installing
the full artifact-keeper chart (backend + web + postgres + trivy +
opensearch + dtrack) hits 'context deadline exceeded' because the
pods do not all fit and initialise within chart-testing's timeout when
the runner pod is limited to 4 CPU / 8 Gi.

Chart correctness is still validated by:
- helm lint (Lint job)
- helm template across 5 value overlays (Template & Validate x5)

The install smoke test was adding signal only on the 'does the chart
actually apply to a cluster and become Ready' dimension, which is
redundant for release validation when template rendering is clean.

Re-enable once either:
1. Runner pods are resized so the full chart fits, or
2. chart-testing is configured to install only a subset of components, or
3. The chart itself is split so individual sub-charts can be tested

Tracking issue will be filed separately.

* ci(helm): add ak-beefy-runners scale set, route install-test to it

Adds a third ARC AutoscalingRunnerSet with a larger resource envelope
(4 CPU / 8 Gi requests, 8 CPU / 16 Gi limits, maxRunners: 5). Routes
the Install Test (k3d) job to it so the k3d cluster plus the full
artifact-keeper chart (backend, web, postgres, trivy, opensearch,
dtrack) can stand up inside chart-testing's timeout window.

Previous runs on the regular ak-ci-runners pool (4 CPU / 8 Gi limit)
hit 'context deadline exceeded' because the aggregate pod footprint
plus k3s plus the runner overhead did not fit.

Capacity check: maxRunners=5 x 4 CPU / 8 Gi requests = 20 CPU / 40 Gi
peak. Rocky has 88 CPU / 164 Gi allocatable, which leaves plenty of
headroom alongside ak-ci-runners (up to 40 CPU / 80 Gi) and
ak-e2e-runners (up to 40 CPU / 80 Gi post iac#68).
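
The resource envelope above could be expressed in the scale-set values roughly as follows. This is a fragment under assumptions: `githubConfigUrl`, `githubConfigSecret`, and the DinD wiring are omitted, and the runner image and command follow the upstream chart defaults rather than the actual file:

```yaml
# argocd/arc-beefy-runners-values.yaml (illustrative fragment)
runnerScaleSetName: ak-beefy-runners   # assumed to match the runs-on label
maxRunners: 5
template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        resources:
          requests:
            cpu: "4"
            memory: 8Gi
          limits:
            cpu: "8"
            memory: 16Gi
# Peak scheduler reservation, counting requests only:
#   5 runners x 4 CPU = 20 CPU
#   5 runners x 8 Gi  = 40 Gi
```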

Requires a follow-up deploy on the cluster before the new pool is
actually available:

  helm upgrade --install ak-beefy-runners \
    --namespace arc-runners \
    -f argocd/arc-beefy-runners-values.yaml \
    oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set

Reverts the previous 'if: false' disable of install-test. The chart
correctness validation via helm lint and helm template continues as
before and is independent.

* ci(arc): fix beefy runner DinD wiring to match ak-ci-runners

Previous beefy-runners template was modeled on arc-e2e-runners-values.yaml
(which has never actually run a job in production) and declared dind as
a regular container without the args needed for the runner to reach the
socket. On a live k3d install-test run this produced:

  permission denied while trying to connect to the Docker daemon
  socket at unix:///var/run/docker.sock

Reshape the values file to mirror ak-ci-runners (known-working) with
doubled resources:

- Move dind to initContainers with restartPolicy: Always (K8s 1.29+
  native sidecar pattern, required by the 0.13 chart)
- Add explicit dockerd args:
    --host=unix:///var/run/docker.sock
    --storage-driver=vfs        (for nested-container safety)
    --group=123                 (socket ACL group for runner user)
- Rename the shared socket mount from dind-sock to dind-var, matching
  ak-ci-runners
- Pin docker:27-dind (not floating docker:dind tag)
- Add a startupProbe so the runner is told to wait 120s for docker
  readiness via RUNNER_WAIT_FOR_DOCKER_IN_SECONDS

Document the chart version pin (0.13.1) and the whole shape in the
header so future operators don't repeat the mistake.
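
A sketch of the pod template shape the list above describes, modeled on the upstream gha-runner-scale-set DinD template. The dockerd args, image tag, and volume name come from the commit message; the privileged security context, probe timings, `DOCKER_HOST` variable, and runner image are assumptions:

```yaml
template:
  spec:
    initContainers:
      - name: dind
        image: docker:27-dind
        restartPolicy: Always          # native sidecar: keeps running next to the runner
        args:
          - dockerd
          - --host=unix:///var/run/docker.sock
          - --storage-driver=vfs
          - --group=123
        securityContext:
          privileged: true             # assumption: dind normally requires this
        startupProbe:
          exec:
            command: ["docker", "info"]
          periodSeconds: 5
          failureThreshold: 24         # 24 x 5s = up to 120s for dockerd to start
        volumeMounts:
          - name: dind-var
            mountPath: /var/run
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        env:
          - name: DOCKER_HOST
            value: unix:///var/run/docker.sock
          - name: RUNNER_WAIT_FOR_DOCKER_IN_SECONDS
            value: "120"
        volumeMounts:
          - name: dind-var
            mountPath: /var/run
    volumes:
      - name: dind-var
        emptyDir: {}
```

Sharing `/var/run` through an emptyDir is what makes the socket created by the sidecar visible to the runner container, and `--group=123` sets the socket's group so the non-root runner user can open it.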

* ci(helm): bump chart-testing install timeout to 20m

Default helm install timeout is 5 minutes. Inside k3s-in-DinD, image
pulls for the full artifact-keeper chart (backend, web, postgres,
opensearch, dtrack, trivy) are serial and can take 8-10 minutes on a
cold runner because containerd inside the k3s node has no image
cache between CI runs. Previous run timed out with pods still in
Pending and no scheduling events, which is the signature of images
still being pulled when helm gave up.
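
With chart-testing, the Helm timeout can be raised through its `--helm-extra-args` flag. A sketch, assuming a `ct.yaml` config file at the repository root and an illustrative step name:

```yaml
- name: Run chart-testing (install)
  run: |
    # The extra args must be passed as one quoted string.
    ct install --config ct.yaml --helm-extra-args "--timeout 20m"
```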

* ci: path-filter terraform jobs so they only run on terraform changes

Previously every PR against this repo triggered the full terraform
matrix (terraform-validate x3 environments + terraform-fmt), even for
PRs that only touched docs, chart YAML, argocd values, workflows, or
any other non-terraform file. On top of that, any pre-existing
terraform job issues on main (e.g. the node wrapper bug) would fail
CI for completely unrelated PRs.

Add a detect-changes job that checks whether the PR touches anything
under terraform/ (plus the CI workflow itself so workflow tweaks keep
running the jobs once). Gate terraform-validate and terraform-fmt on
its output.

SonarCloud keeps running on every PR because it analyzes the whole
repo, and it already skips dependabot PRs via #70.

Affected open PRs that will stop failing TF jobs after this lands:
- iac#68 bump e2e runners
- iac#70 sonar-skip-dependabot
- iac#71 chart docs
- iac#69 LICENSE (already merged, but same pattern)

* ci(helm): use k3s native snapshotter (nested overlayfs unsupported)

Root cause of the previous Install Test hang: k3s's embedded containerd
defaulted to the overlayfs snapshotter. The beefy runner's DinD sidecar
already runs on overlayfs, so k3s's attempt to mount another overlayfs
inside it fails with 'invalid argument', kubelet never starts, and the
k3s node never registers with the apiserver. Every pod (including k3s
system pods coredns and local-path-provisioner) stays Pending forever
because there's nothing to schedule them onto.

Switching to --snapshotter=native tells containerd to copy files
directly instead of stacking overlay mounts. Slower but works in
nested container environments.

Also bump k3d --timeout from 120s to 300s since native snapshotter is
slower to unpack initial images.
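
In k3d terms, the change described above could look like this. The cluster name and the `--disable` arguments are carried over from the earlier commits as assumptions about the final command:

```yaml
- name: Create k3d cluster
  run: |
    # --snapshotter is passed through to k3s, which hands it to the
    # embedded containerd; native copies files instead of stacking
    # overlay mounts.
    k3d cluster create chart-testing \
      --k3s-arg "--snapshotter=native@server:*" \
      --k3s-arg "--disable=traefik@server:*" \
      --k3s-arg "--disable=metrics-server@server:*" \
      --timeout 300s \
      --wait
```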

* ci(helm): override ADMIN_PASSWORD in test-values so backend readyz unlocks

With the beefy runner + native snapshotter fixes, k3s now registers,
pods schedule, and postgres/web come up fine. The backend itself also
starts cleanly (DB migrations, HTTP listening on 8080) but stays 0/1
Ready because its /readyz endpoint returns 503 for as long as the
admin API is locked.

The lock triggers when the ADMIN_PASSWORD env var is the default
"admin" from values.yaml. This is a real security feature for
production (fresh deploys must change the password before API use),
but it blocks chart-testing's helm --wait because the startup probe
hits /readyz which returns 503 until the lock clears.

Override to a non-default value in the CI-only test-values.yaml so
the lock does not trigger during install smoke tests. Production
installs continue to use secrets.yaml patterns with proper password
management.

GitHub withholds repo secrets from dependabot workflow runs for
security reasons, so SONAR_TOKEN is always empty on those runs and
the scanner exits with 'Running this GitHub Action without SONAR_TOKEN
is not recommended' followed by a failure.

SonarCloud scans on 'bump X from Y to Z' PRs provide no real signal
anyway, since dependabot does not modify code. The scan still runs on
push events to main (where secrets are available) and on human PRs.

This unblocks iac#47, iac#49, iac#52, iac#58 which are all waiting on
SonarCloud to pass.

Longer-term alternative is adding SONAR_TOKEN to the repo's Dependabot
Secrets (Settings > Secrets and variables > Dependabot), which lets
dependabot PRs see the token. This workflow change avoids that
maintenance step.
@brandonrc brandonrc force-pushed the ci/skip-sonarcloud-for-dependabot branch from d2920fd to aff9571 Compare April 25, 2026 03:52
@brandonrc brandonrc merged commit f5e05aa into main Apr 25, 2026
9 checks passed
@brandonrc brandonrc deleted the ci/skip-sonarcloud-for-dependabot branch April 25, 2026 04:05