Skip to content

fix(arc): raise sccache cache size to 50G on ak-ci-runners#163

Open
brandonrc wants to merge 2 commits into
mainfrom
fix/sccache-cache-size
Open

fix(arc): raise sccache cache size to 50G on ak-ci-runners#163
brandonrc wants to merge 2 commits into
mainfrom
fix/sccache-cache-size

Conversation

@brandonrc

Copy link
Copy Markdown
Contributor

Problem

SCCACHE_CACHE_SIZE was never set on the ak-ci-runners scale set, so sccache used its 10 GiB default. The shared /home/runner-cache/sccache hostPath was verified at exactly 10G on 2026-06-12 — at cap and continuously evicting.

Baseline (measured 2026-06-12, last 33h of CI / 595 runs over 7 days)

  • ~30 cold rust builds/week, ~12 min penalty each (~6 h/week)
  • GHA cache quota also at 10.49 GB / 10 GB with ~daily full turnover — both cache layers evicting simultaneously
  • Total cache-churn cost ≈ 22 runner-hours/week (~8.5% of node capacity)

Change

  • SCCACHE_CACHE_SIZE: 50G (/home has 1.2T free)
  • SCCACHE_ERROR_LOG: /cache/sccache-error.log for observability

Apply

Per the file header: helm upgrade --install ak-ci-runners --namespace arc-runners --version 0.13.1 -f argocd/arc-ci-runners-values.yaml oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set

Note: this upgrade also deploys the iac#83 init-dockerhub-creds initContainer already in git but not yet on the cluster (rev 24 predates it). dockerhub-pull secret and dind-registry-mirror configmap verified present in arc-runners.

Companion PR (buildkit type=gha → GHCR registry cache) follows in artifact-keeper/artifact-keeper.

🤖 Generated with Claude Code

The shared sccache hostPath was capped at sccache's 10 GiB default
(SCCACHE_CACHE_SIZE unset) and verified at exactly 10G on 2026-06-12 —
continuously evicting and degrading hit rates. Baseline measurement:
~30 cold rust builds/week at ~12m penalty each, ~22 runner-hours/week
lost to cache churn across GHA quota + local sccache eviction.

Also adds SCCACHE_ERROR_LOG so compile-cache failures are observable.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@brandonrc brandonrc requested a review from a team as a code owner June 12, 2026 03:32
@github-actions

Copy link
Copy Markdown

Missing linked issue

This PR does not reference a tracking issue in its body. Every PR must link to an issue in this repository so we can trace work back to a planned change.

How to fix

  1. Edit the PR description and add a line like Closes #123, Fixes #123, or Resolves #123 referring to an open issue in artifact-keeper/artifact-keeper-iac.
  2. Save the description. This check will re-run automatically.

Accepted keywords (case-insensitive, any tense): close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved.

Policy reference: see the PR template.

Maintainer bypass: apply the no-issue-required label to this PR to skip the check (use sparingly, e.g. for trivial typo fixes or release-tag chores).

/cache (hostPath root) is root:755; only the subdirs get chmod 777 from
init-cache. sccache could not create /cache/sccache-error.log, the
server died at startup, and every compile job on ak-ci-runners failed
with 'Timed out waiting for server startup' (~04:00Z 2026-06-12).
Coverage stayed green only because it sets RUSTC_WRAPPER=''.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant