ci: move sccache off the GHA backend onto a persisted disk cache (fix cargo-test timeouts) #5324
…-test timeouts) The cargo-test gate was timing out at 120 min on perry-runtime/perry-codegen PRs; even passing runs took 90-103 min. sccache's GHA backend stored one object per compilation unit and GitHub throttled/evicted the thousands of tiny entries (~0% Rust hit rate: 3 hits / 3209 misses, 613 write errors), so every run recompiled cold. Switch all three sccache jobs to a local disk cache persisted via actions/cache (distinct per-job key, shared restore prefix so they share a warm cache), and bump cargo-test timeout 120->180 while the cache warms.
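To make the "~0% Rust hit rate" concrete, the quoted figures can be worked through as a quick calculation. This is only a minimal sketch: the helper name `hit_rate` is illustrative and is not part of sccache.

```python
def hit_rate(hits: int, misses: int) -> float:
    """Return the cache hit rate as a fraction of all cacheable lookups."""
    total = hits + misses
    # Guard against an empty run so the helper never divides by zero.
    return hits / total if total else 0.0


# Rust hit/miss counts quoted in the commit message for the failing run.
rust_hits, rust_misses = 3, 3209

rate = hit_rate(rust_hits, rust_misses)
print(f"Rust hit rate: {rate:.2%}")  # prints "Rust hit rate: 0.09%"
```

At that rate nearly every compilation unit misses the cache, which matches the observation that each run recompiled cold.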
Changes:
- sccache CI local-disk cache migration
- v0.5.1179 version bump and changelog
Problem
The `cargo-test` required gate has been timing out at its 120-min cap on PRs that touch `perry-runtime`/`perry-codegen` (e.g. #5294, killed at package 48 of 76), and even successful runs were taking 90-103 min, not the "~45-50 min" the stale comment claimed.

Root cause, from the sccache stats on a timed-out run (~0% Rust hit rate: 3 hits / 3209 misses, 613 write errors; `SCCACHE_CACHE_SIZE` set to `2G`):

sccache was using the GitHub Actions cache backend (`SCCACHE_GHA_ENABLED=true`), which stores one cache object per compilation unit. GitHub's cache service throttles and LRU-evicts the thousands of tiny entries, so a full build wrote ~3.3k objects (≈35 min) yet the next run got essentially zero Rust hits: every run recompiled the whole dependency graph cold. `SCCACHE_CACHE_SIZE` is a silent no-op under the GHA backend, so the old `2G` never mattered.

Fix
Switch all three sccache jobs (`cargo-test`, `api-docs-drift`, `compiler-output-regression`, which compile overlapping crate graphs) to a local disk cache persisted as a single tarball via `actions/cache@v4`:

- `SCCACHE_GHA_ENABLED=false`, `SCCACHE_DIR=$GITHUB_WORKSPACE/.sccache`, `SCCACHE_CACHE_SIZE=12G` (now actually honoured by the disk backend).
- Cache key `sccache-<os>-perry-<job>-<run_id>` with a shared `sccache-<os>-perry-` restore-keys prefix: every run (PRs included) saves its own entry while restoring the most recent one from any of the three jobs, so the object cache warms continuously and cross-pollinates instead of starting cold.
- `cargo-test` `timeout-minutes` 120 → 180 as headroom while the cache warms (cold runs are ~90-103 min); can be lowered once warm hit rates are confirmed in CI.

A single tarball restores in one step and gives real cross-run hit rates, versus thousands of tiny GHA-API objects that never survived.
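The key scheme described above can be sketched as a small simulation. This is an illustrative model, not the `actions/cache` implementation: it assumes an exact-key miss falls back to the most recently saved entry under the shared prefix, and it ignores branch scoping. The `CacheEntry` type, the `make_key` and `restore` functions, the `Linux` OS label and the run ids are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CacheEntry:
    key: str
    created_at: int  # hypothetical save order; larger means saved later


def make_key(os_name: str, job: str, run_id: int) -> str:
    """Build the per-job, per-run key: sccache-<os>-perry-<job>-<run_id>."""
    return f"sccache-{os_name}-perry-{job}-{run_id}"


def restore(entries, key: str, restore_prefix: str) -> Optional[CacheEntry]:
    """Prefer an exact key match, else the newest entry sharing the prefix."""
    for entry in entries:
        if entry.key == key:
            return entry
    candidates = [e for e in entries if e.key.startswith(restore_prefix)]
    return max(candidates, key=lambda e: e.created_at, default=None)


# Entries saved by two earlier runs of different jobs.
saved = [
    CacheEntry(make_key("Linux", "cargo-test", 101), created_at=1),
    CacheEntry(make_key("Linux", "api-docs-drift", 102), created_at=2),
]

# A new cargo-test run has a fresh run_id, so its exact key never matches.
# It falls back to the newest entry under the shared prefix, which here
# was saved by a different job.
hit = restore(saved, make_key("Linux", "cargo-test", 103), "sccache-Linux-perry-")
print(hit.key if hit else "cold start")
```

Because the run id makes every key unique, each run can save a fresh entry, while the shared prefix lets any of the three jobs start from whichever entry was saved last.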
Validation
No production code changes. This PR's own `cargo-test` run starts with a cold disk cache (~90-103 min, under the new 180 cap) and will populate the cache; subsequent runs should show high sccache Rust hit rates and drop toward ~50-60 min. Worth watching the sccache "Post" stats over the next few main-branch runs to confirm before lowering the timeout.