[8.8.0] Fix remote repo content caching for non-gRPC backends #30029

Draft
fmeum wants to merge 5 commits into bazelbuild:release-8.8.0 from fmeum:rrcc-8.7.0-28
fmeum (Collaborator) commented Jun 26, 2026

Note: Supersedes the standalone bot cherry-pick #29781, integrating the same fix into the stacked RRCC backport lineage. #29781 can be closed in favor of this PR.

This change extracts setup logic for remote repo content caching from the gRPC path so it can be shared with the HTTP/disk path.

Remote repo content caching is currently only initialized for gRPC cache backends, so the `--experimental_remote_repo_contents_cache` flag is a no-op when using a non-gRPC cache backend such as GCS.

8.x adaptation: initHttpAndDiskCache on 8.x does not take a diskCachePath parameter, so the new initRepoHelpersAndOverlayFs call is added after the existing 8.x invocation. The extracted helper uses the 8.x RemoteOptions field accessors (remoteInstanceName, remoteAcceptCached, remoteUploadLocalResults, remoteCacheTtl) and the 8.x RepositoryRemoteHelpersFactoryImpl constructor, which has no workspace-name parameter. The regression test assigns the public diskCache field instead of calling a setter.

Closes #29744.

PiperOrigin-RevId: 928545624
Change-Id: Ib811c85d0c2462f91a8a8114640d58edf5b4c0a3
(cherry picked from commit d1e894e)

fmeum added 3 commits June 26, 2026 13:10

With this change, all reproducible repository rules can now be cached in a disk or remote cache, including those with dependencies recorded dynamically during evaluation.

This is made possible by introducing a new intermediate type of synthetic AC entries. When looking up the predeclared inputs hash for a repo rule with dynamic dependencies, the action result for such an intermediate entry lists one or more sets of inputs (e.g. a particular file in another repo or an environment variable name). These inputs are then requested from Skyframe and their current values are hashed to obtain the key of the next AC entry, which is again either an intermediate entry or a final entry containing the contents of the repository.
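The chained lookup described above can be sketched roughly as follows. This is a simplified model, not Bazel's implementation: `ChainedRepoCacheLookup`, `Entry`, `nextKey`, and `lookup` are hypothetical names, the action cache is modeled as a plain `Map`, each intermediate entry lists a single set of inputs rather than several, and Skyframe is replaced by a function that returns the current value of an input.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical model of the chained lookup; none of these names are Bazel APIs. */
final class ChainedRepoCacheLookup {

  /** Either an intermediate entry (inputs to resolve) or a final entry (repo contents). */
  static final class Entry {
    final List<String> recordedInputs; // empty for final entries
    final String repoContents; // null for intermediate entries

    private Entry(List<String> recordedInputs, String repoContents) {
      this.recordedInputs = recordedInputs;
      this.repoContents = repoContents;
    }

    static Entry intermediate(List<String> recordedInputs) {
      return new Entry(List.copyOf(recordedInputs), null);
    }

    static Entry finalEntry(String repoContents) {
      return new Entry(List.of(), repoContents);
    }

    boolean isFinal() {
      return repoContents != null;
    }
  }

  static String sha256(String s) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
      return String.format("%064x", new BigInteger(1, digest));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Key of the next entry: hash of the current key and the current value of each input. */
  static String nextKey(String key, List<String> inputs, Function<String, String> currentValue) {
    StringBuilder sb = new StringBuilder(key);
    for (String input : inputs) {
      sb.append('\n').append(input).append('=').append(currentValue.apply(input));
    }
    return sha256(sb.toString());
  }

  /** Follows intermediate entries until a final entry is found or the chain breaks. */
  static Optional<String> lookup(
      String predeclaredInputsHash,
      Map<String, Entry> actionCache,
      Function<String, String> currentValue) {
    String key = predeclaredInputsHash;
    while (true) {
      Entry entry = actionCache.get(key);
      if (entry == null) {
        return Optional.empty(); // cache miss: the repo has to be fetched
      }
      if (entry.isFinal()) {
        return Optional.of(entry.repoContents);
      }
      key = nextKey(key, entry.recordedInputs, currentValue);
    }
  }
}
```

In this model, changing the value of a recorded input (for example an environment variable) changes the key of the next entry, so the lookup misses rather than returning stale contents.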

8.x: adapted to the split RepositoryDelegatorFunction/StarlarkRepositoryFunction restart model (there is no merged RepositoryFetchFunction worker on 8.x). The cache lookup is wired through RepositoryDelegatorFunction, which returns null to trigger a Skyframe restart when lookupCache reports missing values via env.valuesMissing(). Since DigestWriter is a nested class of RepositoryDelegatorFunction on 8.x rather than a standalone target, the marker file is parsed inline in RemoteRepoContentsCacheImpl via RepoRecordedInput.WithValue.parse to avoid a dependency cycle between the remote and repository rule libraries.

RELNOTES: The remote repo contents cache now supports all reproducible repo rules.

Closes bazelbuild#27634.

PiperOrigin-RevId: 889750228
Change-Id: I9c7e4fed9d86432a85a96b3318f6eccc9c0558eb
(cherry picked from commit ffebc5b)

On 8.x the remote repo contents cache lookup runs in RepositoryDelegatorFunction's restart-based compute (there is no merged repository fetch worker). Looking up a repository with dynamically recorded inputs walks a DAG of action cache entries, requesting the current values of recorded inputs from Skyframe along the way. Each such request can trigger a Skyframe restart, after which the lookup re-walks the DAG from the root and re-downloads every action cache entry it already fetched, turning an O(depth) lookup into O(depth^2) remote round-trips.

Memoize the action cache entries fetched during a lookup, keyed by the input hash under which they were looked up, in a SkyKeyComputeState that survives restarts. Action cache entries don't change within a command, so a memoized entry can be reused on the next restart instead of being re-downloaded.

The repository's compute-state slot is already occupied by StarlarkRepositoryFunction's worker state (created by wasJustFetched before the lookup runs), and Skyframe stores a single state instance per key. The lookup therefore cannot allocate its own SkyKeyComputeState via env.getState. Instead, the memo is stored on the existing state and handed to lookupCache through a new RepositoryFunction.getRemoteRepoContentsCacheLookupState hook; the default returns a fresh, non-memoizing instance for handlers without a compute state.
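The effect of the memo on the number of round-trips can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: `RestartMemoSketch` and its members are hypothetical names, the DAG is a linear chain of `depth` intermediate entries, every recorded input is missing the first time it is requested (forcing one restart per level), and the memo is a plain `Map` that outlives the individual attempts the way a compute state outlives restarts.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical simulation; it does not use or mirror any Skyframe API. */
final class RestartMemoSketch {
  private final int depth;
  private final Set<Integer> requestedInputs = new HashSet<>();
  private int remoteFetches = 0;

  RestartMemoSketch(int depth) {
    this.depth = depth;
  }

  /** Stand-in for downloading the action cache entry at the given level. */
  private int fetchRemote(int level) {
    remoteFetches++;
    return level;
  }

  /**
   * One compute attempt that walks the chain from the root. Returns false when an input
   * was requested for the first time, which models a restart.
   */
  private boolean attempt(Map<Integer, Integer> memo) {
    for (int level = 0; level < depth; level++) {
      if (memo != null) {
        memo.computeIfAbsent(level, this::fetchRemote); // reuse entries from earlier attempts
      } else {
        fetchRemote(level); // no memo: every attempt downloads the entry again
      }
      if (requestedInputs.add(level)) {
        return false; // value was missing, restart from the root
      }
    }
    return true;
  }

  /** Runs attempts until the lookup completes and returns the number of remote fetches. */
  static int countFetches(int depth, boolean memoize) {
    RestartMemoSketch sketch = new RestartMemoSketch(depth);
    Map<Integer, Integer> memo = memoize ? new HashMap<>() : null;
    while (!sketch.attempt(memo)) {
      // Restart: all per-attempt state is lost, only the memo survives.
    }
    return sketch.remoteFetches;
  }
}
```

With these assumptions, `countFetches(4, false)` performs 14 fetches (1 + 2 + 3 + 4 across the restarted attempts, plus 4 for the final pass), while `countFetches(4, true)` performs 4.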

Follow-up to the dynamic inputs support (cherry-pick of bazelbuild#27634).

The host OS name and CPU architecture are accessible via `rctx.os.{name,arch}`
and can also influence the result of a repo rule in subtle ways (e.g. behavior
of host tools, line breaks, etc.), so they are now mixed into the predeclared
inputs hash used as the key for the local and remote repo contents cache.

8.x adaptation: `DigestWriter` is a nested class of `RepositoryDelegatorFunction`
rather than a standalone target, and the predeclared inputs hash is computed in
`computePredeclaredInputHash` from the serialized `Rule` instead of a
`RepoDefinition`. The host OS name and CPU architecture are therefore mixed into
that fingerprint right before the environment variable inputs.
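The ordering described above can be sketched as follows. This is an illustrative stand-in, not the actual `computePredeclaredInputHash`: `PredeclaredInputsHashSketch`, `hash`, and `addField` are hypothetical names, SHA-256 with length-prefixed fields is an assumed encoding, and the environment variables are sorted by name so that the result does not depend on map iteration order.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; the field encoding is an assumption, not Bazel's format. */
final class PredeclaredInputsHashSketch {

  /** Adds a length-prefixed field so that adjacent fields cannot be confused. */
  private static void addField(MessageDigest md, byte[] bytes) {
    md.update(ByteBuffer.allocate(Integer.BYTES).putInt(bytes.length).array());
    md.update(bytes);
  }

  private static void addField(MessageDigest md, String value) {
    addField(md, value.getBytes(StandardCharsets.UTF_8));
  }

  static String hash(
      byte[] serializedRule, String osName, String cpuArch, Map<String, String> envInputs) {
    MessageDigest md;
    try {
      md = MessageDigest.getInstance("SHA-256");
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
    addField(md, serializedRule);
    // Host OS name and CPU architecture go in right before the environment variables.
    addField(md, osName);
    addField(md, cpuArch);
    for (Map.Entry<String, String> e : new TreeMap<>(envInputs).entrySet()) {
      addField(md, e.getKey());
      addField(md, e.getValue());
    }
    return String.format("%064x", new BigInteger(1, md.digest()));
  }
}
```

Two hosts that evaluate the same rule with the same environment but differ in architecture then produce different keys, so a repo fetched on one is not reused on the other.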

Closes bazelbuild#29148.

PiperOrigin-RevId: 893352625
Change-Id: I02c0ded7ffe2b5aa9bc2ef489dc5240b5716ebdf
(cherry picked from commit 2ec24b4)

fmeum and others added 2 commits June 26, 2026 15:57

When using the remote repo contents cache, repo materialization now uses the
same code path as local action input prefetching for symlinks. This ensures
that each symlink is only materialized once and that there are no races in case
symlinking is not atomic (on Windows with default settings).
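One general way to get the "materialized only once" property is to route every request for a given path through a single concurrent map, as in the sketch below. This is not how `AbstractActionInputPrefetcher` is implemented; `SymlinkMaterializerSketch` and `materialize` are hypothetical names, paths are plain strings, and the actual symlink creation is injected as a callback so the example stays independent of the file system.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

/** Hypothetical sketch of once-per-path materialization; not Bazel's prefetcher. */
final class SymlinkMaterializerSketch {
  private final ConcurrentHashMap<String, String> materialized = new ConcurrentHashMap<>();
  private final BiConsumer<String, String> createSymlink;

  /** @param createSymlink callback receiving (linkPath, target) that creates the link */
  SymlinkMaterializerSketch(BiConsumer<String, String> createSymlink) {
    this.createSymlink = createSymlink;
  }

  /**
   * Creates the link at most once per path, even with concurrent callers, and returns the
   * target that was linked. Later callers for the same path wait for the first creation to
   * finish and never run the callback again.
   */
  String materialize(String linkPath, String target) {
    return materialized.computeIfAbsent(
        linkPath,
        path -> {
          createSymlink.accept(path, target);
          return target;
        });
  }
}
```

Because `ConcurrentHashMap.computeIfAbsent` runs the mapping function atomically for a given key, two threads asking for the same link cannot both attempt a non-atomic create-and-replace on it.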

Along the way, remove unreachable code from `UploadManifest`.

8.x adaptation: `AbstractActionInputPrefetcher` on 8.x resolves input paths via
the Bazel 8.x-specific `resolveExecPath(execPath)` helper (which avoids
evaluating the WORKSPACE-named exec root when possible) and exposes the exec
root through the `execRoot()` method rather than an `execRoot` field, so the new
symlink branch is expressed in those terms. Because `execRoot()` is not
available during external repository materialization, the branch detects
external repo paths via a new `isUnderExternalRepoRoot` helper (mirroring
`DirectoryTracker.setWritable`) instead of `!inputPath.startsWith(execRoot())`,
and reads the symlink target from `UnresolvedSymlinkArtifactValue#getSymlinkTarget`
(8.x has no `FileArtifactValue#getUnresolvedSymlinkTarget`). Only the
`FileStateType` import is added; the other imports touched upstream don't apply
because 8.x resolves `VirtualActionInput` to `actions.cache.VirtualActionInput`.

The three symlink-materialization tests added upstream are deferred to the
chains-of-symlinks backport (bazelbuild#29767): on 8.x, repo files are materialized lazily
through `RemoteExternalOverlayFileSystem`, and faithfully reproducing (chains
of) symlinks on disk only works with that later change, not with this commit's
single-link prefetching alone.

Fixes bazelbuild#28575.

Closes bazelbuild#28683.

PiperOrigin-RevId: 893359962
Change-Id: If0eb64e41858a4b9298c2609d83160b0ecbfd451
(cherry picked from commit 90eb56e)

This change extracts setup logic for remote repo content caching from the gRPC
path so it can be shared with the HTTP/disk path.

Remote repo content caching is currently only initialized for gRPC cache
backends, so the `--experimental_remote_repo_contents_cache` flag is a no-op
when using a non-gRPC cache backend such as GCS.

8.x adaptation: `initHttpAndDiskCache` on 8.x does not take a `diskCachePath`
parameter, so the new `initRepoHelpersAndOverlayFs` call is added after the
existing 8.x invocation. The extracted helper uses the 8.x `RemoteOptions`
field accessors (`remoteInstanceName`, `remoteAcceptCached`,
`remoteUploadLocalResults`, `remoteCacheTtl`) and the 8.x
`RepositoryRemoteHelpersFactoryImpl` constructor, which has no workspace-name
parameter. The regression test assigns the public `diskCache` field instead of
calling a setter.

Closes bazelbuild#29744.

PiperOrigin-RevId: 928545624
Change-Id: Ib811c85d0c2462f91a8a8114640d58edf5b4c0a3
(cherry picked from commit d1e894e)