fix(arc): move dind docker root to /home with overlay2 (disk-pressure hazard) #167
brandonrc wants to merge 3 commits into
The shared sccache hostPath was capped at sccache's 10 GiB default (SCCACHE_CACHE_SIZE unset) and verified at exactly 10G on 2026-06-12, so it was continuously evicting and degrading hit rates. Baseline measurement: ~30 cold rust builds/week at ~12m penalty each, ~22 runner-hours/week lost to cache churn across GHA quota + local sccache eviction. Also adds SCCACHE_ERROR_LOG so compile-cache failures are observable.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
/cache (hostPath root) is root:755; only the subdirs get chmod 777 from init-cache. sccache could not create /cache/sccache-error.log, the server died at startup, and every compile job on ak-ci-runners failed with 'Timed out waiting for server startup' (~04:00Z 2026-06-12). Coverage stayed green only because it sets RUSTC_WRAPPER=''.
…disk
The dind sidecars ran dockerd --storage-driver=vfs with /var/lib/docker unmounted, so every image layer was a full directory copy landing in the container writable layer under the crio graph root on the 70G root disk (~25G free). With up to 20 concurrent ak-ci-runners doing docker builds this risks root-disk exhaustion and kubelet disk-pressure eviction of prod pods on the rocky node.
Fix, applied to all three scale sets (ci, e2e, beefy):
- Mount hostPath /home/runner-dind at /var/lib/docker with subPathExpr $(POD_NAME) for per-pod isolation (concurrent dockerds cannot share a docker root). /home is 1.8T xfs with ftype=1.
- Switch dockerd to --storage-driver=overlay2. vfs was only needed because /var/lib/docker previously sat on the crio overlay layer and the RHEL 8 4.18 kernel forbids overlay-on-overlay; on a plain xfs directory overlay2 works (verified with a manual overlay mount on /home on the rocky node).
- Add e2e/runner-dind-cleanup.yaml: a CronJob (every 15 min) that reaps /home/runner-dind subdirs with no matching live pod in arc-runners and older than 10 min, since ephemeral runner pods churn constantly and preStop hooks are not guaranteed to run. Fail-safe: an API error aborts the run before any deletion.
dockerd runs as root in a privileged container, so the root:755 DirectoryOrCreate hostPath is writable (cf. the SCCACHE_ERROR_LOG permissions outage - sccache runs unprivileged, dockerd does not).
Based on fix/sccache-cache-size (PR #163), which is live as helm revision 26; basing on main would revert the sccache fix.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Missing linked issue
This PR does not reference a tracking issue in its body. Every PR must link to an issue in this repository so we can trace work back to a planned change.
How to fix
Accepted keywords (case-insensitive, any tense):
Policy reference: see the PR template.
Maintainer bypass: apply the
Deployed and verified on the rocky node (2026-06-12 ~00:25 EDT)
Helm revisions (all chart 0.13.1): ak-ci-runners 27, ak-e2e-runners 11, ak-beefy-runners 6. Rollback point for ci: revision 26.
Evidence
Unrelated pre-existing issue observed during verification
🤖 Generated with Claude Code
Problem
The dind sidecars in all three runner scale sets run dockerd --storage-driver=vfs with /var/lib/docker unmounted. Every image layer is a full directory copy (vfs) landing in the container's writable layer under the crio graph root on the 70G root disk (~25G free). With maxRunners: 20 on ak-ci-runners doing concurrent docker builds, this can exhaust the root disk and trigger kubelet disk-pressure eviction of prod pods on the rocky node.
Fix
For ak-ci-runners, ak-e2e-runners, and ak-beefy-runners:
- Mount hostPath /home/runner-dind at /var/lib/docker with subPathExpr: $(POD_NAME) for per-pod isolation, since concurrent dockerds cannot share a docker root. /home is 1.8T xfs, ftype=1 (d_type OK for overlay2).
- Switch dockerd to --storage-driver=overlay2. vfs was only required because /var/lib/docker previously sat on the crio overlay layer and the RHEL 8.10 4.18 kernel forbids overlay-on-overlay. On a plain xfs directory overlay2 works; this was verified with a manual overlay mount + copy-up test on /home on the node.
- Add e2e/runner-dind-cleanup.yaml: CronJob (every 15 min, concurrencyPolicy: Forbid) reaping /home/runner-dind subdirs that have no matching live pod in arc-runners AND are older than 10 min (grace window). Ephemeral runner pods churn constantly and preStop hooks are not guaranteed to run. Fail-safe: set -euo pipefail aborts before any deletion if the API call fails. Uses ak-runner-rust (already on the node, ships curl+jq) with a dedicated SA limited to pods get/list.
Permission note (lesson from the SCCACHE_ERROR_LOG outage): the DirectoryOrCreate hostPath is root:755, but dockerd runs as root in a privileged container, so it's writable.
Why not emptyDir + overlay2
emptyDir lives under /var/lib/kubelet/pods, which is still the root disk, and a sizeLimit breach evicts the pod mid-job. The hostPath approach moves all heavy build I/O onto the 1.4T-free /home disk.
Depends on / includes PR #163
This branch is based on fix/sccache-cache-size (#163), which is live in prod as helm revision 26 of ak-ci-runners. It includes those commits (SCCACHE_CACHE_SIZE=50G, SCCACHE_ERROR_LOG into the 777 subdir). Merge #163 first, or merge this PR, which carries both. A values file based on main would revert the sccache fix and re-break every compile job.
Validation
- helm upgrade --dry-run=server with chart 0.13.1 renders cleanly (overlay2 args, POD_NAME downward API env, subPathExpr mount, hostPath volume, sccache env intact).
- Runtime checks (docker info, in-pod docker build, layer location on /home, sccache --show-stats) performed after deploy; evidence in PR comments / ops log.
🤖 Generated with Claude Code
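The CronJob manifest itself is not shown in this thread. As a rough sketch of the reaping rule described under Fix (delete only subdirectories that have no matching live pod and are older than the grace window, and abort on any API error), the logic might look like the following. The function names, the use of directory mtime as the age signal, and the in-cluster API URL and token path are assumptions for illustration, not the contents of e2e/runner-dind-cleanup.yaml.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print live pod names in namespace $1, one per line.
# Assumes the standard in-cluster service-account token and CA paths.
list_live_pods() {
  local ns="$1"
  local sa=/var/run/secrets/kubernetes.io/serviceaccount
  curl -fsS --cacert "$sa/ca.crt" \
    -H "Authorization: Bearer $(cat "$sa/token")" \
    "https://kubernetes.default.svc/api/v1/namespaces/${ns}/pods" \
    | jq -r '.items[].metadata.name'
}

# Print subdirectories of $1 that are NOT named in the live-pod file $2
# AND whose mtime is more than $3 minutes old (assumed age signal).
find_orphans() {
  local root="$1" live_file="$2" grace_min="$3" dir name
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue            # glob matched nothing
    dir="${dir%/}"
    name="${dir##*/}"
    if grep -Fxq -- "$name" "$live_file"; then
      continue                           # a live pod owns this directory
    fi
    if [ -n "$(find "$dir" -maxdepth 0 -mmin "+${grace_min}")" ]; then
      printf '%s\n' "$dir"
    fi
  done
}

# Delete orphans under $1 for namespace $2 with a grace window of $3 minutes.
reap() {
  local root="$1" ns="$2" grace_min="$3" live_file dir
  live_file="$(mktemp)"
  # Fail-safe: stop before any deletion if the pod list cannot be fetched.
  list_live_pods "$ns" > "$live_file" || {
    echo "pod list failed; nothing deleted" >&2
    rm -f "$live_file"
    return 1
  }
  find_orphans "$root" "$live_file" "$grace_min" | while IFS= read -r dir; do
    rm -rf -- "$dir"
  done
  rm -f "$live_file"
}
```

With the values from this PR, the job would call something like reap /home/runner-dind arc-runners 10. Keeping find_orphans separate from the deletion step makes it possible to dry-run the selection against a saved pod list before letting it remove anything.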