
[8.8.0] Support dynamic inputs with the remote repo contents cache #29974

Open
fmeum wants to merge 2 commits into bazelbuild:release-8.8.0 from fmeum:rrcc-8.7.0-25

fmeum (Collaborator) commented Jun 24, 2026

With this change, all reproducible repository rules can now be cached in a disk or remote cache, including those with dependencies recorded dynamically during evaluation.

This is made possible by a new, intermediate type of synthetic action cache (AC) entry. When the cache is looked up under the predeclared inputs hash of a repo rule with dynamic dependencies, the action result may be such an intermediate entry, which lists one or more sets of inputs (e.g. a particular file in another repo or an environment variable name). These inputs are then requested from Skyframe, and their current values are hashed to obtain the key of the next AC entry, which is again either an intermediate entry or a final entry containing the contents of the repository.
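A minimal Java sketch of the lookup walk described above, under stated assumptions: the action cache is modeled as an in-memory map, an intermediate entry is assumed to hold a list of input sets that are tried in order, and the next key is assumed to be a SHA-256 rolling hash over the previous key plus each input's name and current value. The types AcEntry, IntermediateEntry, FinalEntry and RepoCacheWalk and the input naming scheme are hypothetical; they are not the API of RemoteRepoContentsCacheImpl.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of a synthetic AC entry: either intermediate or final.
interface AcEntry {}

// Intermediate entry: alternative sets of recorded input names
// (e.g. "ENV:FOO" or "FILE:@other_repo//:data.txt"; this naming is made up).
record IntermediateEntry(List<List<String>> inputSets) implements AcEntry {}

// Final entry: stands in for the cached repository contents.
record FinalEntry(String repoContents) implements AcEntry {}

final class RepoCacheWalk {
  private RepoCacheWalk() {}

  static String sha256Hex(String s) {
    try {
      byte[] digest =
          MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (byte b : digest) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 is always available", e);
    }
  }

  // Rolling hash (assumed scheme): the next key covers the previous key plus the
  // current value of every input in one set; "\0unset" marks a missing value.
  static String nextKey(String prevKey, List<String> inputSet, Map<String, String> currentValues) {
    StringBuilder sb = new StringBuilder(prevKey);
    for (String input : inputSet) {
      String value = currentValues.get(input);
      sb.append('\0').append(input).append(value == null ? "\0unset" : "=" + value);
    }
    return sha256Hex(sb.toString());
  }

  // Follows intermediate entries until a final entry is found; empty on a miss.
  static Optional<String> lookup(
      Map<String, AcEntry> ac, String predeclaredInputsHash, Map<String, String> currentValues) {
    String key = predeclaredInputsHash;
    while (true) {
      AcEntry entry = ac.get(key);
      if (entry == null) {
        return Optional.empty();
      }
      if (entry instanceof FinalEntry fin) {
        return Optional.of(fin.repoContents());
      }
      String next = null;
      for (List<String> inputSet : ((IntermediateEntry) entry).inputSets()) {
        String candidate = nextKey(key, inputSet, currentValues);
        if (ac.containsKey(candidate)) { // first set whose child entry exists wins
          next = candidate;
          break;
        }
      }
      if (next == null) {
        return Optional.empty();
      }
      key = next;
    }
  }
}
```

In this model, changing a recorded input's current value (for example an environment variable) produces a different next key, so the walk ends in a miss instead of returning stale contents.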

8.x adaptation: the upstream commit changes the merged RepositoryFetchFunction worker, which does not exist on 8.x, so the cache lookup is instead wired through RepositoryDelegatorFunction on the split, restart-based architecture. lookupCache now takes the SkyFunction.Environment and requests recorded-input values from Skyframe; the caller returns null to trigger a restart when env.valuesMissing() is true. Because DigestWriter is a nested class of RepositoryDelegatorFunction on 8.x rather than a standalone target as in 9.x, the marker file is parsed inline in RemoteRepoContentsCacheImpl via RepoRecordedInput.WithValue.parse to avoid a dependency cycle between the remote and repository-rule libraries. Everything else (intermediate-entry DAG traversal, rolling-hash construction, disk-cache write-through for downloads) is ported as-is.
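The restart-based flow can be pictured with a small model. In this sketch, FakeEnv is a hypothetical stand-in for SkyFunction.Environment, not its real API: requesting a value that has not been computed yet returns null and marks it missing. The lookupCache and compute methods only mirror the shape described above (their signatures are assumptions): lookupCache requests every recorded input, and the caller returns null whenever env.valuesMissing() is true. The toy driver then "evaluates" the missing keys and reruns compute, as a Skyframe restart would.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for SkyFunction.Environment (not Bazel's API): a value
// that has not been computed yet is returned as null and recorded as missing.
final class FakeEnv {
  private final Map<String, String> computed;
  final Set<String> missing = new HashSet<>();

  FakeEnv(Map<String, String> computed) {
    this.computed = computed;
  }

  String getValue(String key) {
    String value = computed.get(key);
    if (value == null) {
      missing.add(key);
    }
    return value;
  }

  boolean valuesMissing() {
    return !missing.isEmpty();
  }
}

final class RestartingLookup {
  private RestartingLookup() {}

  // Requests every recorded input in one pass, so all missing ones are batched.
  static Map<String, String> lookupCache(FakeEnv env, List<String> recordedInputs) {
    Map<String, String> values = new HashMap<>();
    for (String input : recordedInputs) {
      values.put(input, env.getValue(input));
    }
    return values;
  }

  // Caller: returns null to request a restart once missing values are computed.
  static Map<String, String> compute(FakeEnv env, List<String> recordedInputs) {
    Map<String, String> values = lookupCache(env, recordedInputs);
    if (env.valuesMissing()) {
      return null;
    }
    return values;
  }

  // Toy driver: computes missing keys from `world`, then reruns compute.
  static int restartsUntilDone(Map<String, String> world, List<String> recordedInputs) {
    Map<String, String> done = new HashMap<>();
    int restarts = 0;
    while (true) {
      FakeEnv env = new FakeEnv(done);
      if (compute(env, recordedInputs) != null) {
        return restarts;
      }
      for (String key : env.missing) {
        done.put(key, world.getOrDefault(key, ""));
      }
      restarts++;
    }
  }
}
```

Because all recorded inputs of one pass are requested before checking valuesMissing(), a single restart covers the whole batch of missing values in this model.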

Restart memoization: a second commit adds a Skyframe memoization layer for action results obtained from the remote cache. This ensures that AC requests are not rerun across Skyframe restarts. The 9.x and master branches avoid these restarts altogether by making the entire RepositoryFetchFunction worker-based.

Closes #27634.

PiperOrigin-RevId: 889750228
Change-Id: I9c7e4fed9d86432a85a96b3318f6eccc9c0558eb
(cherry picked from commit ffebc5b)

fmeum requested a review from a team as a code owner June 24, 2026 12:17
github-actions (bot) added the team-Performance, team-Configurability, team-ExternalDeps, team-Rules-Server, team-Core, team-Remote-Exec, area-Bzlmod, and awaiting-review labels Jun 24, 2026
fmeum marked this pull request as draft June 24, 2026 13:30
fmeum force-pushed the rrcc-8.7.0-25 branch 3 times, most recently from a8b6025 to 9f32f9b June 25, 2026 08:31
fmeum marked this pull request as ready for review June 25, 2026 08:33
fmeum requested a review from Wyverald June 25, 2026 08:34
iancha1992 enabled auto-merge June 25, 2026 20:08
fmeum added 2 commits June 26, 2026 13:10
With this change, all reproducible repository rules can now be cached in a disk or remote cache, including those with dependencies recorded dynamically during evaluation.

This is made possible by a new, intermediate type of synthetic action cache (AC) entry. When the cache is looked up under the predeclared inputs hash of a repo rule with dynamic dependencies, the action result may be such an intermediate entry, which lists one or more sets of inputs (e.g. a particular file in another repo or an environment variable name). These inputs are then requested from Skyframe, and their current values are hashed to obtain the key of the next AC entry, which is again either an intermediate entry or a final entry containing the contents of the repository.

8.x: adapted to the split RepositoryDelegatorFunction/StarlarkRepositoryFunction restart model (there is no merged RepositoryFetchFunction worker on 8.x). The cache lookup is wired through RepositoryDelegatorFunction, which returns null to trigger a Skyframe restart when lookupCache reports missing values via env.valuesMissing(). Since DigestWriter is a nested class of RepositoryDelegatorFunction on 8.x rather than a standalone target, the marker file is parsed inline in RemoteRepoContentsCacheImpl via RepoRecordedInput.WithValue.parse to avoid a dependency cycle between the remote and repository rule libraries.

RELNOTES: The remote repo contents cache now supports all reproducible repo rules.

Closes bazelbuild#27634.

PiperOrigin-RevId: 889750228
Change-Id: I9c7e4fed9d86432a85a96b3318f6eccc9c0558eb
(cherry picked from commit ffebc5b)

On 8.x the remote repo contents cache lookup runs in RepositoryDelegatorFunction's restart-based compute (there is no merged repository fetch worker). Looking up a repository with dynamically recorded inputs walks a DAG of action cache entries, requesting the current values of recorded inputs from Skyframe along the way. Each such request can trigger a Skyframe restart, after which the lookup re-walks the DAG from the root and re-downloads every action cache entry it already fetched, turning an O(depth) lookup into O(depth^2) remote round-trips.
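The quadratic cost can be checked with a small counting model. It assumes a path of `depth` AC entries (counting the final one) and that every intermediate level needs exactly one Skyframe value that is not yet computed, so each level costs one restart. Without memoization, pass k re-downloads k entries, for a total of 1 + 2 + ... + depth = depth(depth + 1)/2 fetches; with memoization each entry is downloaded once. RoundTripModel is a hypothetical illustration, not Bazel code.

```java
import java.util.HashSet;
import java.util.Set;

final class RoundTripModel {
  private RoundTripModel() {}

  // Counts remote AC fetches for a lookup along `depth` entries, assuming each
  // intermediate level triggers exactly one Skyframe restart.
  static int remoteFetches(int depth, boolean memoize) {
    Set<Integer> memo = new HashSet<>(); // levels whose entry survives restarts
    int computedLevels = 0;              // levels whose inputs Skyframe has computed
    int fetches = 0;
    while (true) {                       // one iteration = one compute() pass
      for (int level = 1; level <= depth; level++) {
        if (!(memoize && memo.contains(level))) {
          fetches++;
        }
        if (memoize) {
          memo.add(level);
        }
        if (level == depth) {
          return fetches;                // final entry reached
        }
        if (level > computedLevels) {    // next key needs a missing value: restart
          computedLevels = level;
          break;
        }
      }
    }
  }
}
```

For example, a depth of 10 costs 55 fetches without memoization and 10 with it in this model.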

Memoize the action cache entries fetched during a lookup, keyed by the input hash under which they were looked up, in a SkyKeyComputeState that survives restarts. Action cache entries don't change within a command, so a memoized entry can be reused on the next restart instead of being re-downloaded.
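A minimal sketch of such a memo, assuming string input hashes and string-encoded entries for brevity. FetchedEntryMemo is a hypothetical name; in the PR the memo is held in a SkyKeyComputeState, which this model replaces with an object the caller keeps across compute passes. Not memoizing misses is an extra assumption of this sketch; the commit message only mentions memoizing fetched entries.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical memo of AC entries fetched during one lookup, keyed by the input
// hash they were looked up under. It must outlive a single compute() pass.
final class FetchedEntryMemo {
  private final Map<String, String> entries = new HashMap<>();
  private int remoteCalls = 0;

  // Returns the entry for inputHash (null on a miss), downloading each hit once.
  String fetch(String inputHash, Function<String, String> remoteLookup) {
    String cached = entries.get(inputHash);
    if (cached != null) {
      return cached;
    }
    remoteCalls++;
    String entry = remoteLookup.apply(inputHash);
    if (entry != null) {
      entries.put(inputHash, entry); // misses are not memoized in this sketch
    }
    return entry;
  }

  int remoteCalls() {
    return remoteCalls;
  }
}
```

With a first pass that fetches h1 and then restarts, followed by a second pass that fetches h1 and h2, this memo issues two remote calls instead of three.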

The repository's compute-state slot is already occupied by StarlarkRepositoryFunction's worker state (created by wasJustFetched before the lookup runs), and Skyframe stores a single state instance per key. The lookup therefore cannot allocate its own SkyKeyComputeState via env.getState. Instead, the memo is stored on the existing state and handed to lookupCache through a new RepositoryFunction.getRemoteRepoContentsCacheLookupState hook; the default returns a fresh, non-memoizing instance for handlers without a compute state.
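The hook arrangement can be sketched as follows. All class names and the hook's signature (here taking a hypothetical StateStore and a repository key) are assumptions; the sketch only shows the shape described above: one state object per key, the memo stored as an extra field on the existing worker state, and a base implementation of getRemoteRepoContentsCacheLookupState that returns a fresh, non-memoizing instance.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-lookup state. The base class is the non-memoizing default:
// it discards writes, so every lookup re-fetches.
class LookupState {
  String get(String inputHash) {
    return null;
  }

  void put(String inputHash, String entry) {}
}

// Memoizing variant, meant to live on a state object that survives restarts.
final class MemoizingLookupState extends LookupState {
  private final Map<String, String> entries = new HashMap<>();

  @Override
  String get(String inputHash) {
    return entries.get(inputHash);
  }

  @Override
  void put(String inputHash, String entry) {
    entries.put(inputHash, entry);
  }
}

// Stand-in for the worker state already occupying the compute-state slot.
final class WorkerState {
  final MemoizingLookupState lookupState = new MemoizingLookupState();
}

// Models Skyframe keeping exactly one compute state per key.
final class StateStore {
  private final Map<String, WorkerState> states = new HashMap<>();

  WorkerState getState(String repoKey) {
    return states.computeIfAbsent(repoKey, k -> new WorkerState());
  }
}

// Stand-in for RepositoryFunction with the new hook (signature is assumed).
abstract class RepoHandler {
  LookupState getRemoteRepoContentsCacheLookupState(StateStore store, String repoKey) {
    return new LookupState(); // default: fresh and non-memoizing
  }
}

// Handler with a compute state: reuse the memo stored on the existing state.
final class StarlarkLikeHandler extends RepoHandler {
  @Override
  LookupState getRemoteRepoContentsCacheLookupState(StateStore store, String repoKey) {
    return store.getState(repoKey).lookupState;
  }
}

// Handler without a compute state: inherits the default.
final class NativeLikeHandler extends RepoHandler {}
```

In this model, handlers without a compute state keep working; they simply re-fetch entries after a restart because the default instance remembers nothing.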

Follow-up to the dynamic inputs support (cherry-pick of bazelbuild#27634).
auto-merge was automatically disabled June 26, 2026 11:10

Head branch was pushed to by a user without write access

fmeum (Collaborator, Author) commented Jun 26, 2026

@Wyverald I resolved the conflicts. The previous four PRs were merged as a single squashed commit, which is fine; that is just what caused this PR to conflict.


Labels

area-Bzlmod: Bzlmod-specific PRs, issues, and feature requests
awaiting-review: PR is awaiting review from an assigned reviewer
team-Configurability: platforms, toolchains, cquery, select(), config transitions
team-Core: Skyframe, bazel query, BEP, options parsing, bazelrc
team-ExternalDeps: External dependency handling, remote repositories, WORKSPACE file.
team-Performance: Issues for Performance teams
team-Remote-Exec: Issues and PRs for the Execution (Remote) team
team-Rules-Server: Issues for serverside rules included with Bazel
