ci: move sccache off the GHA backend onto a persisted disk cache (fix cargo-test timeouts) #5324

Merged
proggeramlug merged 1 commit into main from ci/fix-sccache-disk-cache
Jun 17, 2026

@proggeramlug proggeramlug commented Jun 17, 2026

Problem

The cargo-test required gate has been timing out at its 120-min cap on PRs that touch perry-runtime/perry-codegen (e.g. #5294, killed at package 48 of 76), and even successful runs were taking 90-103 min — not the "~45-50 min" the stale comment claimed.

Root cause, from the sccache stats on a timed-out run:

| metric | value |
| --- | --- |
| Rust cache hits | 3 |
| Rust cache misses | 3209 (0.09% hit rate) |
| cache write errors | 613 |
| cache write time | ~35 min |
| SCCACHE_CACHE_SIZE | 2G |

sccache was using the GitHub Actions cache backend (SCCACHE_GHA_ENABLED=true), which stores one cache object per compilation unit. GitHub's cache service throttles and LRU-evicts the thousands of tiny entries, so a full build wrote ~3.3k objects (≈35 min) yet the next run got essentially zero Rust hits — every run recompiled the whole dependency graph cold. SCCACHE_CACHE_SIZE is a silent no-op under the GHA backend, so the old 2G never mattered.

Fix

Switch all three sccache jobs (cargo-test, api-docs-drift, compiler-output-regression — they compile overlapping crate graphs) to a local disk cache persisted as a single tarball via actions/cache@v4:

  • SCCACHE_GHA_ENABLED=false, SCCACHE_DIR=$GITHUB_WORKSPACE/.sccache, SCCACHE_CACHE_SIZE=12G (now actually honoured by the disk backend).
  • Cache key sccache-<os>-perry-<job>-<run_id> with a shared sccache-<os>-perry- restore-keys prefix — every run (PRs included) saves its own entry while restoring the most recent one from any of the three jobs, so the object cache warms continuously and cross-pollinates instead of starting cold.
  • Bump cargo-test timeout-minutes 120 → 180 as headroom while the cache warms (cold runs are ~90-103 min); can be lowered once warm hit rates are confirmed in CI.

A single tarball restores in one step and gives real cross-run hit rates, versus thousands of tiny GHA-API objects that never survived.
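
The settings listed above could be wired into a job roughly as follows. This is a minimal sketch, not the actual diff to .github/workflows/test.yml: the runner label, the step names, the `runner.os` expression standing in for `<os>`, the `RUSTC_WRAPPER` line, and the `cargo test --workspace` command are assumptions for illustration, and the step that installs sccache is omitted.

```yaml
jobs:
  cargo-test:
    runs-on: ubuntu-latest            # assumption: runner label is not stated in this PR
    timeout-minutes: 180              # raised from 120 while the cache warms
    env:
      RUSTC_WRAPPER: sccache          # assumption: how rustc is routed through sccache
      SCCACHE_GHA_ENABLED: "false"    # turn off the GitHub Actions cache backend
      SCCACHE_DIR: ${{ github.workspace }}/.sccache
      SCCACHE_CACHE_SIZE: 12G         # honoured by the local disk backend
    steps:
      - uses: actions/checkout@v4
      # assumption: a step that installs sccache belongs here (omitted)
      - name: Restore and save the sccache directory
        uses: actions/cache@v4
        with:
          path: ${{ github.workspace }}/.sccache
          # Unique per run, so every run saves its own entry.
          key: sccache-${{ runner.os }}-perry-cargo-test-${{ github.run_id }}
          # Shared prefix, so the newest entry from any of the three jobs is restored.
          restore-keys: |
            sccache-${{ runner.os }}-perry-
      - name: Run tests
        run: cargo test --workspace   # assumption: the real test command may differ
      - name: Show sccache stats
        run: sccache --show-stats
```

Because the key ends in the run ID, it never matches exactly on restore. actions/cache then falls back to the `restore-keys` prefix and picks the most recently created matching entry, and since there was no exact hit, its post step saves the directory under the new key.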

Validation

No production code changes. This PR's own cargo-test run starts with a cold disk cache (~90-103 min, under the new 180 cap) and will populate the cache; subsequent runs should show high sccache Rust hit rates and drop toward ~50-60 min. Worth watching the sccache "Post" stats over the next few main-branch runs to confirm before lowering the timeout.

Note: an S3/R2 sccache backend would be even more robust but needs repo secrets — this disk-cache approach is secret-free and is the standard sccache-on-GHA-without-S3 pattern.

Summary by CodeRabbit

Release Notes

  • Chores
    • Version bumped to 0.5.1179
    • Enhanced continuous integration caching infrastructure for improved build performance and reduced redundant compilation across test jobs
    • Extended build timeout allowance to optimize cache initialization and ensure reliable test execution

…-test timeouts)

The cargo-test gate was timing out at 120 min on perry-runtime/perry-codegen
PRs; even passing runs took 90-103 min. sccache's GHA backend stored one object
per compilation unit and GitHub throttled/evicted the thousands of tiny entries
(~0% Rust hit rate: 3 hits / 3209 misses, 613 write errors), so every run
recompiled cold. Switch all three sccache jobs to a local disk cache persisted
via actions/cache (distinct per-job key, shared restore prefix so they share a
warm cache), and bump cargo-test timeout 120->180 while the cache warms.

coderabbitai Bot commented Jun 17, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 5366431a-c4c9-455b-8aec-b5f54782a6e6

📥 Commits

Reviewing files that changed from the base of the PR and between 8704cca and 2c1e073.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • .github/workflows/test.yml
  • CHANGELOG.md
  • CLAUDE.md
  • Cargo.toml

📝 Walkthrough

Three CI jobs (api-docs-drift, cargo-test, compiler-output-regression) switch sccache from the GitHub Actions cache backend to a persisted local-disk tarball cache via actions/cache@v4, with SCCACHE_GHA_ENABLED=false and SCCACHE_CACHE_SIZE=12G. The cargo-test timeout is raised from 120 to 180 minutes. The workspace version is bumped to 0.5.1179 with matching changelog and doc updates.

Changes

sccache CI local-disk cache migration

| Layer / File(s) | Summary |
| --- | --- |
| sccache disk cache config across all three CI jobs (.github/workflows/test.yml) | Replaces GHA-backend sccache with local disk cache in api-docs-drift, cargo-test, and compiler-output-regression: sets SCCACHE_GHA_ENABLED: "false", SCCACHE_DIR, SCCACHE_CACHE_SIZE: 12G, and adds an explicit actions/cache@v4 step with a job/run-scoped key and shared restore-keys prefix. cargo-test timeout raised from 120 to 180 minutes. |

v0.5.1179 version bump and changelog

| Layer / File(s) | Summary |
| --- | --- |
| Version bump and changelog entry (Cargo.toml, CLAUDE.md, CHANGELOG.md) | workspace.package.version and CLAUDE.md current version incremented to 0.5.1179. CHANGELOG.md gains a new entry documenting the sccache GHA backend throttling root cause, the local-disk cache fix with its keying strategy, the timeout increase, and the absence of production code changes. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • PerryTS/perry#5221: Modifies the same three CI jobs in .github/workflows/test.yml for sccache configuration and cargo-test timeout-minutes, making it the direct predecessor to this sccache backend change.

Poem

🐇 Hop hop, the cache was a mess,
GHA throttled and left quite a stress.
Now to disk we write,
A tarball so tight,
And CI runs faster — I confess! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description violates repository guidelines by bumping workspace version in Cargo.toml and editing CLAUDE.md and CHANGELOG.md, which maintainers handle at merge time. | Revert changes to Cargo.toml (workspace version), CLAUDE.md (current version), and CHANGELOG.md per CONTRIBUTING.md guidelines; maintainers handle these at merge time. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: moving sccache from GitHub Actions cache backend to a local disk cache to fix cargo-test timeouts. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@proggeramlug proggeramlug merged commit 6d5718e into main Jun 17, 2026
15 checks passed
@proggeramlug proggeramlug deleted the ci/fix-sccache-disk-cache branch June 17, 2026 12:56