[8.8.0] Support dynamic inputs with the remote repo contents cache#29972
Closed
fmeum wants to merge 8 commits into
Closed
[8.8.0] Support dynamic inputs with the remote repo contents cache#29972fmeum wants to merge 8 commits into
fmeum wants to merge 8 commits into
Conversation
* Also upload to the remote cache when the local cache is in use. The fix is simple but subtle: the logic for the two caches in `RepositoryFetchFunction` has to be flipped since the Skyframe restart after adding an entry to the local cache meant that the same code path would not be taken again. * Fix a crash when using both by ensuring that the local repo contents cache uses the file system backing the output base, not the workspace directory: ``` FATAL: bazel crashed due to an internal error. Printing stack trace: java.lang.RuntimeException: Unrecoverable error while evaluating node 'REPOSITORY_DIRECTORY:@@rules_python+' (requested by nodes 'REPO_FILE:@@rules_python+') at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:552) at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:435) at java.base/java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(Unknown Source) at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source) at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source) at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source) at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source) Caused by: java.lang.IllegalArgumentException: Files are on different filesystems: C:/users/runneradmin/_bazel_runneradmin/ebfu7cpi/external/@rules_python+.marker (on com.google.devtools.build.lib.remote.RemoteExternalOverlayFileSystem@79583b9), C:/Users/runneradmin/.cache/bazel-repo/contents/_trash/26a5feef-bf8c-4326-bf3d-500997c7362e (on com.google.devtools.build.lib.windows.WindowsFileSystem@24180f0f) at com.google.devtools.build.lib.vfs.Path.checkSameFileSystem(Path.java:964) at com.google.devtools.build.lib.vfs.Path.renameTo(Path.java:630) at com.google.devtools.build.lib.vfs.FileSystemUtils.moveFile(FileSystemUtils.java:456) at com.google.devtools.build.lib.bazel.repository.cache.LocalRepoContentsCache.moveToCache(LocalRepoContentsCache.java:172) at com.google.devtools.build.lib.bazel.repository.RepositoryFetchFunction.compute(RepositoryFetchFunction.java:297) at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:471) ``` Closes bazelbuild#28002. PiperOrigin-RevId: 855211557 Change-Id: I2f3c40a6aef594682fba989853f7ee982f30c294 (cherry picked from commit b143070)
Since this behavior is quite surprising (it definitely was to the author), this change also improves the test coverage for repo contents cache deletion by asserting that non-BUILD files within it actually exist on disk rather than just exist from the point of Skyframe. Also fix a crash observed while working on the test improvements. Closes bazelbuild#28222. PiperOrigin-RevId: 855225639 Change-Id: Ie4a88e93d14a4f4b7bb5217fc924e998a1779ccd (cherry picked from commit 4839f46)
…nv handling Ports the essential API changes from 01407ce needed by later feature commits: - Add EnvironmentVariableValue record type - Add RepoEnvironmentFunction with REPO_ENV + client env fallback - Register REPOSITORY_ENVIRONMENT_VARIABLE in SkyFunctions and SkyframeExecutor - Update EnvVar.getSkyKey() to use RepoEnvironmentFunction - Update EnvVar.isOutdated() to use EnvironmentVariableValue On 8.7.0, RepoEnvironmentFunction checks --repo_env first, then falls back to the client environment via ClientEnvironmentFunction, since the consolidated repo env computation from CommandEnvironment is not present. (cherry picked from commit 01407ce)
Adds the integration test from bazelbuild#29946 (test_unrelated_env_var_does_not_invalidate_repo): changing an unrelated environment variable via an interleaved `bazel mod` command must not refetch a repository that doesn't depend on it. PR bazelbuild#29946's production fix (inject the repo env per-variable, dropping the whole-map PrecomputedValue.REPO_ENV) is NOT needed on this branch: the manual repo-env port here injects only the narrow repoEnvFromOptions (PATH/PATHEXT + --action_env + --repo_env) into REPO_ENV and falls back to the already per-variable ClientEnvironmentFunction for everything else. An unrelated client variable therefore never mutates the whole-map node, so the test passes as-is. The regression the PR addresses only exists upstream, where the entire client environment is folded into REPO_ENV via getRepoEnv().
Fixes bazelbuild#27517 by checking Skyframe deps in batches that stop right before any dep that may cause a cycle if checked while previous deps are out-of-date. This is accompanied by a restructuring of `RepoRecordedInput` that consolidates all Skyframe logic associated with the computation of the corresponding value exclusively within that class. This will also be helpful in adding support for dynamic inputs to the remote repo contents cache in future work. The upstream commit additionally moved the entirety of `RepositoryFetchFunction` to Skyframe workers so that checking the up-to-dateness of local repo contents cache entries isn't quadratic. That worker refactor is omitted in this 8.x backport, which keeps the existing split `RepositoryDelegatorFunction`/`StarlarkRepositoryFunction` architecture and restart-based evaluation; the cycle fix and the `RepoRecordedInput` restructuring are applied on that model. The batched up-to-dateness check is restart-safe (`isAnyValueOutdated` short-circuits while its batch's values are still missing). Closes bazelbuild#28206. Co-authored-by: Xudong Yang <wyverald@gmail.com> PiperOrigin-RevId: 855252657 Change-Id: Ica18760ae79da5155fc0f3d8cd4f24c52a034c86 (cherry picked from commit 72a25a9)
* The cache was always written to, even if not enabled. (On 8.x this guard was already folded into the remote repo contents cache port, bazelbuild#29075, so only the second fix below is applied here.) * Google RBE doesn't accept `Command`s without the (deprecated) `Platform` field set. We set it both on `Command` and `Action`, just to be safe. Fixes bazelbuild#28294 (comment) Closes bazelbuild#28295. PiperOrigin-RevId: 856169835 Change-Id: I2479119a173e325a7d39643a36536569f5f831fc (cherry picked from commit a9946096847e22de98e0e11b1f5dfbb6ec6ecdbb)
Important outputs and runfiles from external repos that are remote repo contents cache hits got stuck at various levels of the materialization pipeline for being source artifacts. This is fixed by consolidating the skip logic in a `RemoteOutputChecker.mayBeRemote` static helper and letting external-repo source artifacts flow through the toplevel-output download path. 8.x adaptation: v9 routes toplevel outputs through `RemoteImportantOutputHandler`, which does not exist on 8.x (`ImportantOutputHandler` has no production implementor). The equivalent toplevel-output materialization on 8.x lives in `CompletionFunction.ensureToplevelArtifacts`/`downloadArtifact`, which previously bailed out on non-`DerivedArtifact`s and only ran under skymeld. This change lets external-repo source artifacts (which have no generating action and are thus never downloaded by `finalizeAction`) flow through that path in both skymeld and non-skymeld builds, so they honor `--remote_download_outputs` just like build outputs do. `AbstractActionInputPrefetcher.prefetchFiles` already only skips main-repo source artifacts, so it is unchanged beyond annotating the `action` parameter `@Nullable` (source artifacts have no generating action). Closes bazelbuild#28308. PiperOrigin-RevId: 881618604 Change-Id: Ifaae8e39b0bcab3803653ca82bcf00d26c487316 (cherry picked from commit 16613f1)
With this change, all reproducible repository rules can now be cached in a disk or remote cache, including those with dependencies recorded dynamically during evaluation. This is made possible by introducing a new intermediate type of synthetic AC entries. When looking up the predeclared inputs hash for a repo rule with dynamic dependencies, the action result for such an intermediate entry lists one or more sets of inputs (e.g. a particular file in another repo or an environment variable name). These inputs are then requested from Skyframe and their current values are hashed to obtain the key of the next AC entry, which is again either an intermediate entry or a final entry containing the contents of the repository. 8.x: adapted to the split RepositoryDelegatorFunction/StarlarkRepositoryFunction restart model (there is no merged RepositoryFetchFunction worker on 8.x). The cache lookup is wired through RepositoryDelegatorFunction, which returns null to trigger a Skyframe restart when lookupCache reports missing values via env.valuesMissing(). Since DigestWriter is a nested class of RepositoryDelegatorFunction on 8.x rather than a standalone target, the marker file is parsed inline in RemoteRepoContentsCacheImpl via RepoRecordedInput.WithValue.parse to avoid a dependency cycle between the remote and repository rule libraries. RELNOTES: The remote repo contents cache now supports all reproducible repo rules. Closes bazelbuild#27634. PiperOrigin-RevId: 889750228 Change-Id: I9c7e4fed9d86432a85a96b3318f6eccc9c0558eb (cherry picked from commit ffebc5b)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
With this change, all reproducible repository rules can now be cached in a disk or remote cache, including those with dependencies recorded dynamically during evaluation.
This is made possible by introducing a new intermediate type of synthetic AC entries. When looking up the predeclared inputs hash for a repo rule with dynamic dependencies, the action result for such an intermediate entry lists one or more sets of inputs (e.g. a particular file in another repo or an environment variable name). These inputs are then requested from Skyframe and their current values are hashed to obtain the key of the next AC entry, which is again either an intermediate entry or a final entry containing the contents of the repository.
8.x adaptation: the upstream commit changes the merged
RepositoryFetchFunctionworker; 8.x has no such worker, so the cache lookup is wired throughRepositoryDelegatorFunctionon the split, restart-based architecture instead.lookupCachenow takes theSkyFunction.Environmentand requests recorded-input values from Skyframe; the caller returnsnullto restart whenenv.valuesMissing(). BecauseDigestWriteris a nested class ofRepositoryDelegatorFunctionon 8.x rather than the standalone target it is in v9, the marker file is parsed inline inRemoteRepoContentsCacheImplviaRepoRecordedInput.WithValue.parseto avoid a dependency cycle between theremoteand repository-rule libraries. Everything else (intermediate-entry DAG traversal, rolling-hash construction, disk-cache write-through for downloads) is ported as-is.Depends on the local/remote repo contents cache cycle fix (#29087) and its
RepoRecordedInputrestructuring, whose commits are included in this branch.Closes #27634.
PiperOrigin-RevId: 889750228
Change-Id: I9c7e4fed9d86432a85a96b3318f6eccc9c0558eb
(cherry picked from commit ffebc5b)