Skip to content

[8.8.0] Support dynamic inputs with the remote repo contents cache#29972

Closed
fmeum wants to merge 8 commits into
bazelbuild:release-8.8.0from
fmeum:rrcc-8.7.0-24
Closed

[8.8.0] Support dynamic inputs with the remote repo contents cache#29972
fmeum wants to merge 8 commits into
bazelbuild:release-8.8.0from
fmeum:rrcc-8.7.0-24

Conversation

@fmeum

@fmeum fmeum commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

With this change, all reproducible repository rules can now be cached in a disk or remote cache, including those with dependencies recorded dynamically during evaluation.

This is made possible by introducing a new intermediate type of synthetic AC entries. When looking up the predeclared inputs hash for a repo rule with dynamic dependencies, the action result for such an intermediate entry lists one or more sets of inputs (e.g. a particular file in another repo or an environment variable name). These inputs are then requested from Skyframe and their current values are hashed to obtain the key of the next AC entry, which is again either an intermediate entry or a final entry containing the contents of the repository.

8.x adaptation: the upstream commit changes the merged RepositoryFetchFunction worker; 8.x has no such worker, so the cache lookup is wired through RepositoryDelegatorFunction on the split, restart-based architecture instead. lookupCache now takes the SkyFunction.Environment and requests recorded-input values from Skyframe; the caller returns null to restart when env.valuesMissing(). Because DigestWriter is a nested class of RepositoryDelegatorFunction on 8.x rather than the standalone target it is in v9, the marker file is parsed inline in RemoteRepoContentsCacheImpl via RepoRecordedInput.WithValue.parse to avoid a dependency cycle between the remote and repository-rule libraries. Everything else (intermediate-entry DAG traversal, rolling-hash construction, disk-cache write-through for downloads) is ported as-is.

Depends on the local/remote repo contents cache cycle fix (#29087) and its RepoRecordedInput restructuring, whose commits are included in this branch.

Closes #27634.

PiperOrigin-RevId: 889750228
Change-Id: I9c7e4fed9d86432a85a96b3318f6eccc9c0558eb
(cherry picked from commit ffebc5b)

Wyverald and others added 6 commits June 24, 2026 09:36
* Also upload to the remote cache when the local cache is in use. The fix is simple but subtle: the logic for the two caches in `RepositoryFetchFunction` has to be flipped since the Skyframe restart after adding an entry to the local cache meant that the same code path would not be taken again.
* Fix a crash when using both by ensuring that the local repo contents cache uses the file system backing the output base, not the workspace directory:
```
FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.RuntimeException: Unrecoverable error while evaluating node 'REPOSITORY_DIRECTORY:@@rules_python+' (requested by nodes 'REPO_FILE:@@rules_python+')
	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:552)
	at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:435)
	at java.base/java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(Unknown Source)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
Caused by: java.lang.IllegalArgumentException: Files are on different filesystems: C:/users/runneradmin/_bazel_runneradmin/ebfu7cpi/external/@rules_python+.marker (on com.google.devtools.build.lib.remote.RemoteExternalOverlayFileSystem@79583b9), C:/Users/runneradmin/.cache/bazel-repo/contents/_trash/26a5feef-bf8c-4326-bf3d-500997c7362e (on com.google.devtools.build.lib.windows.WindowsFileSystem@24180f0f)
	at com.google.devtools.build.lib.vfs.Path.checkSameFileSystem(Path.java:964)
	at com.google.devtools.build.lib.vfs.Path.renameTo(Path.java:630)
	at com.google.devtools.build.lib.vfs.FileSystemUtils.moveFile(FileSystemUtils.java:456)
	at com.google.devtools.build.lib.bazel.repository.cache.LocalRepoContentsCache.moveToCache(LocalRepoContentsCache.java:172)
	at com.google.devtools.build.lib.bazel.repository.RepositoryFetchFunction.compute(RepositoryFetchFunction.java:297)
	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:471)
```

Closes bazelbuild#28002.

PiperOrigin-RevId: 855211557
Change-Id: I2f3c40a6aef594682fba989853f7ee982f30c294
(cherry picked from commit b143070)
Since this behavior is quite surprising (it definitely was to the author), this change also improves the test coverage for repo contents cache deletion by asserting that non-BUILD files within it actually exist on disk rather than just exist from the point of Skyframe.

Also fix a crash observed while working on the test improvements.

Closes bazelbuild#28222.

PiperOrigin-RevId: 855225639
Change-Id: Ie4a88e93d14a4f4b7bb5217fc924e998a1779ccd
(cherry picked from commit 4839f46)
…nv handling

Ports the essential API changes from 01407ce needed by later feature
commits:
- Add EnvironmentVariableValue record type
- Add RepoEnvironmentFunction with REPO_ENV + client env fallback
- Register REPOSITORY_ENVIRONMENT_VARIABLE in SkyFunctions and SkyframeExecutor
- Update EnvVar.getSkyKey() to use RepoEnvironmentFunction
- Update EnvVar.isOutdated() to use EnvironmentVariableValue

On 8.7.0, RepoEnvironmentFunction checks --repo_env first, then falls
back to the client environment via ClientEnvironmentFunction, since the
consolidated repo env computation from CommandEnvironment is not present.

(cherry picked from commit 01407ce)
Adds the integration test from bazelbuild#29946
(test_unrelated_env_var_does_not_invalidate_repo): changing an unrelated
environment variable via an interleaved `bazel mod` command must not refetch a
repository that doesn't depend on it.

PR bazelbuild#29946's production fix (inject the repo env per-variable, dropping the
whole-map PrecomputedValue.REPO_ENV) is NOT needed on this branch: the manual
repo-env port here injects only the narrow repoEnvFromOptions
(PATH/PATHEXT + --action_env + --repo_env) into REPO_ENV and falls back to the
already per-variable ClientEnvironmentFunction for everything else. An unrelated
client variable therefore never mutates the whole-map node, so the test passes
as-is. The regression the PR addresses only exists upstream, where the entire
client environment is folded into REPO_ENV via getRepoEnv().
Fixes bazelbuild#27517 by checking Skyframe deps in batches that stop right before any dep that may cause a cycle if checked while previous deps are out-of-date.

This is accompanied by a restructuring of `RepoRecordedInput` that consolidates all Skyframe logic associated with the computation of the corresponding value exclusively within that class. This will also be helpful in adding support for dynamic inputs to the remote repo contents cache in future work.

The upstream commit additionally moved the entirety of `RepositoryFetchFunction` to Skyframe workers so that checking the up-to-dateness of local repo contents cache entries isn't quadratic. That worker refactor is omitted in this 8.x backport, which keeps the existing split `RepositoryDelegatorFunction`/`StarlarkRepositoryFunction` architecture and restart-based evaluation; the cycle fix and the `RepoRecordedInput` restructuring are applied on that model. The batched up-to-dateness check is restart-safe (`isAnyValueOutdated` short-circuits while its batch's values are still missing).

Closes bazelbuild#28206.

Co-authored-by: Xudong Yang <wyverald@gmail.com>
PiperOrigin-RevId: 855252657
Change-Id: Ica18760ae79da5155fc0f3d8cd4f24c52a034c86
(cherry picked from commit 72a25a9)
* The cache was always written to, even if not enabled. (On 8.x this guard was already folded into the remote repo contents cache port, bazelbuild#29075, so only the second fix below is applied here.)
* Google RBE doesn't accept `Command`s without the (deprecated) `Platform` field set. We set it both on `Command` and `Action`, just to be safe.

Fixes bazelbuild#28294 (comment)

Closes bazelbuild#28295.

PiperOrigin-RevId: 856169835
Change-Id: I2479119a173e325a7d39643a36536569f5f831fc
(cherry picked from commit a9946096847e22de98e0e11b1f5dfbb6ec6ecdbb)
fmeum added 2 commits June 24, 2026 13:59
Important outputs and runfiles from external repos that are remote repo contents cache hits got stuck at various levels of the materialization pipeline for being source artifacts. This is fixed by consolidating the skip logic in a `RemoteOutputChecker.mayBeRemote` static helper and letting external-repo source artifacts flow through the toplevel-output download path.

8.x adaptation: v9 routes toplevel outputs through `RemoteImportantOutputHandler`, which does not exist on 8.x (`ImportantOutputHandler` has no production implementor). The equivalent toplevel-output materialization on 8.x lives in `CompletionFunction.ensureToplevelArtifacts`/`downloadArtifact`, which previously bailed out on non-`DerivedArtifact`s and only ran under skymeld. This change lets external-repo source artifacts (which have no generating action and are thus never downloaded by `finalizeAction`) flow through that path in both skymeld and non-skymeld builds, so they honor `--remote_download_outputs` just like build outputs do. `AbstractActionInputPrefetcher.prefetchFiles` already only skips main-repo source artifacts, so it is unchanged beyond annotating the `action` parameter `@Nullable` (source artifacts have no generating action).

Closes bazelbuild#28308.

PiperOrigin-RevId: 881618604
Change-Id: Ifaae8e39b0bcab3803653ca82bcf00d26c487316
(cherry picked from commit 16613f1)
With this change, all reproducible repository rules can now be cached in a disk or remote cache, including those with dependencies recorded dynamically during evaluation.

This is made possible by introducing a new intermediate type of synthetic AC entries. When looking up the predeclared inputs hash for a repo rule with dynamic dependencies, the action result for such an intermediate entry lists one or more sets of inputs (e.g. a particular file in another repo or an environment variable name). These inputs are then requested from Skyframe and their current values are hashed to obtain the key of the next AC entry, which is again either an intermediate entry or a final entry containing the contents of the repository.

8.x: adapted to the split RepositoryDelegatorFunction/StarlarkRepositoryFunction restart model (there is no merged RepositoryFetchFunction worker on 8.x). The cache lookup is wired through RepositoryDelegatorFunction, which returns null to trigger a Skyframe restart when lookupCache reports missing values via env.valuesMissing(). Since DigestWriter is a nested class of RepositoryDelegatorFunction on 8.x rather than a standalone target, the marker file is parsed inline in RemoteRepoContentsCacheImpl via RepoRecordedInput.WithValue.parse to avoid a dependency cycle between the remote and repository rule libraries.

RELNOTES: The remote repo contents cache now supports all reproducible repo rules.

Closes bazelbuild#27634.

PiperOrigin-RevId: 889750228
Change-Id: I9c7e4fed9d86432a85a96b3318f6eccc9c0558eb
(cherry picked from commit ffebc5b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants