Ask Claude to Update to 2.21.0 #491
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR ( |
<details><summary>Claude's draft</summary>
Bump the feedstock to TensorFlow 2.21.0:
- recipe.yaml: version 2.21.0, refreshed sha256, build 0, bazel 7.*,
run-dep bumps (protobuf >=6.31.1, tensorboard 2.21, keras 3.12), and
CUDA selectors extended to the 13.x series.
- Rebase the patch series onto the 2.21.0 source tree. TF 2.21.0 moved
third_party/{absl,gpus,eigen3,ducc} under third_party/xla/ and dropped
the duplicated TSL tree, so every patch was refreshed; nine obsolete
patches were dropped and several new ones added to restore the
systemlib (absl / protobuf / gRPC) wiring that 2.21.0 removed.
- Patch files keep their original author attribution (Uwe L. Korn,
H. Vetinari, Isuru Fernando and others), sourced from the conda-forge
feedstock history.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resume this Claude session:
```
claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3
```
</details>
<details><summary>Claude's draft</summary> Add .ci_support/migrations/cuda130.yaml so the feedstock builds a CUDA 13.0 variant alongside CUDA 12.8. The migration is taken from the conda-forge cuda130 migrator (key_add operation, wait_for_migrators: cuda129, c_stdlib 2.28), matching the approach used by pytorch-cpu-feedstock. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Resume this Claude session: ``` claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3 ``` </details>
<details><summary>Claude's draft</summary>
TF 2.21.0 changed its build substantially; the feedstock build scripts
are reworked to cope, in logical steps:
- Compiler: switch the linux build to the conda-forge clang/clangxx 18
toolchain (conda_build_config.yaml). TF 2.21.0 defaults to a hermetic
LLVM CC toolchain (rules_ml_toolchain); build_common.sh selects
--config=clang_local + --crosstool_top so the conda toolchain and system
headers are used instead.
- ABI: pin --cxxopt/--host_cxxopt=-fclang-abi-compat=17. clang 18 changed
the Itanium mangling of non-type template parameters of dependent type;
conda-forge's libabseil/libprotobuf use the older GCC-compatible form, so
TF must match it or absl::Cord etc. fail to link.
- System libraries: restore TF_SYSTEM_LIBS and, because TF 2.21.0's
cc_shared_library does not forward systemlib cc_library linkopts,
force-link the systemized libraries (protobuf, grpc, sqlite3, icu, png,
jpeg, gif, flatbuffers, snappy, curl, and abseil's ~90 shared objects) so
libtensorflow*.so record them as DT_NEEDED.
- Caching: restore .bazelrc to a pristine snapshot on every invocation so
the per-Python passes do not accumulate duplicate flags (which changed
every compile command and defeated all Bazel caching), and add a
persistent --disk_cache.
- Packaging: fix the XLA header-install path (@local_xla renamed to
@xla), chmod build_env writable so rattler-build can clean it up and
package every output, and use cp -f for the wheel copy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resume this Claude session:
```
claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3
```
</details>
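The caching step in the commit above (restoring `.bazelrc` to a pristine snapshot on every invocation) could look roughly like the sketch below. It is not the code in build_common.sh: the function name `restore_bazelrc` and the `.pristine` snapshot suffix are assumptions made for illustration.

```shell
# Hypothetical helper; names are illustrative, not taken from the feedstock.
# First call: save a pristine copy of the bazelrc file.
# Later calls: overwrite the bazelrc with that copy, so flags appended by an
# earlier per-Python pass do not accumulate.
restore_bazelrc() {
  rc="${1:-.bazelrc}"
  snapshot="${rc}.pristine"
  if [ -f "$snapshot" ]; then
    cp -f "$snapshot" "$rc"
  else
    cp "$rc" "$snapshot"
  fi
}
```

Calling it at the top of each build pass, before any flags are appended, keeps the compile command lines identical across passes, which is the property the Bazel cache depends on.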
<details><summary>Claude's draft</summary> Re-rendered with conda-smithy to pick up the CUDA 13.0 migration and the clang toolchain change: regenerated .ci_support variant files (CUDA 13.0 level1/level3, refreshed CUDA-None variants), the conda-build workflow, and pixi.toml. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Resume this Claude session: ``` claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3 ``` </details>
ba1e478 to 4ecda86
<details><summary>Claude's draft</summary> The run dependency was tensorboard >=2.21,<2.22, which is unsatisfiable: no tensorboard 2.21 has been released on PyPI or conda-forge (latest is 2.20.0 on both). This made the tensorflow-base test environment unsolvable. TensorFlow 2.21.0 does not list tensorboard in its wheel REQUIRED_PACKAGES at all; its CI requirement files pin `tensorboard ~= 2.20.0`. Following the meaning of ~=, that is >=2.20.0,<2.21. Pin tensorboard >=2.20,<2.21 to match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Resume this Claude session: ``` claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3 ``` </details>
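The `~=` reasoning in the commit above can be checked mechanically. A rough sketch, assuming a three-component version such as `2.20.0`; `compatible_release_range` is a hypothetical helper, not part of pip or conda tooling.

```shell
# Hypothetical helper: expands a compatible-release pin ("~= X.Y.Z") into the
# equivalent explicit range ">=X.Y.Z,<X.(Y+1)".
# Assumption: the version has exactly three numeric components.
compatible_release_range() {
  ver="$1"
  prefix="${ver%.*}"      # drop the last component: 2.20.0 -> 2.20
  major="${prefix%.*}"    # 2.20 -> 2
  minor="${prefix##*.}"   # 2.20 -> 20
  echo ">=${ver},<${major}.$((minor + 1))"
}

compatible_release_range 2.20.0
```

Two-component pins such as `~= 2.20` follow a different rule (the upper bound is the next major version) and are deliberately not handled here.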
<details><summary>Claude's draft</summary> conda-forge CI fails the linux builds ~12s into the Bazel compile: embed_gpu_specs_gen failed (Exit 127): /bin/bash: xxd: command not found TF 2.21.0's bundled XLA added the genrule @xla//xla/backends/gpu/target_config:embed_gpu_specs_gen, which runs `xxd -i` to embed GPU spec files into generated C++. xxd is not in the conda-forge build image -- it was an undeclared host tool that happened to be present on the maintainer's dev machine (it ships with vim). Add vim to the staging output's build requirements (xxd is not a standalone conda-forge package; vim provides it). Fixes the linux CPU, CUDA and aarch64 jobs, which all fail here identically. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Resume this Claude session: ``` claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3 ``` </details>
<details><summary>Claude's draft</summary> The osx-arm64 CI build fails 12s into the Bazel compile: error: invalid value '17' in '-fclang-abi-compat=17' Error in child process '/usr/bin/xcrun'. 1 -fclang-abi-compat=17 was added unconditionally to .bazelrc, but it is a linux-specific fix: conda-forge's linux libabseil/libprotobuf use the GCC-compatible pre-clang-18 mangling for dependent non-type template parameters, so the clang-built TF must match. macOS builds with Apple clang (via xcrun), which both rejects the bare '17' value and does not need it (the conda macOS libraries are clang-built already). Emit the --cxxopt/--host_cxxopt=-fclang-abi-compat=17 lines only when target_platform is linux-*. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Resume this Claude session: ``` claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3 ``` </details>
<details><summary>Claude's draft</summary> The osx-arm64 build fails compiling generated protobuf code: tpu_embedding_configuration.pb.h: fatal error: 'google/protobuf/runtime_version.h' file not found ... logging_initializer.cc: fatal error: 'absl/base/log_severity.h' not found TF 2.21.0's reworked .bazelrc has `common:macos --config=apple-toolchain`, and `common:apple-toolchain` forces @local_config_apple_cc//:toolchain for --crosstool_top, --apple_crosstool_top and --host_crosstool_top. That overrides the recipe's `--crosstool_top=//bazel_toolchain:toolchain` and, crucially, also sets the host/apple crosstool slots the recipe never touched -- so the conda toolchain (which carries -isystem $PREFIX/include) is bypassed entirely and conda's protobuf/abseil headers are invisible. sed the apple-toolchain config to point all three crosstool slots at the conda //bazel_toolchain:toolchain. This restores the pre-2.21 behaviour where the conda toolchain served the macOS build. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Resume this Claude session: ``` claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3 ``` </details>
<details><summary>Claude's draft</summary> The osx-arm64 build now reaches the link step (~22 min in) and fails: ld: illegal thread local variable reference to regular symbol google::protobuf::internal::ThreadSafeArena::thread_cache_ for arm64 conda-forge's macOS libprotobuf is compiled with PROTOBUF_NO_THREADLOCAL, so it exports ThreadSafeArena::thread_cache_ as a regular (non-TLS) symbol. TensorFlow compiles the same protobuf headers without that macro, so its objects emit a thread-local (TLV) relocation against thread_cache_; the Mach-O linker rejects a TLV reference to a non-TLS definition. (ELF tolerates the mismatch, so linux is unaffected.) Add -DPROTOBUF_NO_THREADLOCAL to --copt and --host_copt for osx so TF's protobuf header compilation matches the ABI of the installed libprotobuf. host_copt is needed too: the failing target is the [for tool] exec build. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Resume this Claude session: ``` claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3 ``` </details>
<details><summary>Claude's draft</summary> The previous commit added -DPROTOBUF_NO_THREADLOCAL for osx, but protobuf's port_def.inc explicitly rejects it: port_def.inc:731: error: PROTOBUF_NO_THREADLOCAL was previously defined protobuf manages that macro itself and never expects it pre-defined. Revert that change. The underlying osx link error -- ld: illegal thread local variable reference to regular symbol ThreadSafeArena::thread_cache_ -- is a protobuf ABI clash (a non-TLS protobuf object is being linked into libtensorflow_framework.dylib while the systemized protobuf 6.33.5 headers make thread_cache_ __thread); it needs the protobuf systemize-vs-vendor wiring fixed for osx, not a compile define. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Resume this Claude session: ``` claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3 ``` </details>
<details><summary>Claude's draft</summary> -fclang-abi-compat=17 was gated on target_platform == linux-*, but the linux CUDA variants build with gcc 14 (the cuda128/cuda130 migrators pin the host compiler to gcc, since nvcc needs a gcc host compiler). gcc has no -fclang-abi-compat flag, so it errors out -- a CUDA-build blocker. Tighten the gate to linux AND c_compiler == clang*, i.e. the CPU variants only. The CUDA variants (gcc) and macOS (Apple clang) both correctly skip the flag. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Resume this Claude session: ``` claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3 ``` </details>
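The tightened gate described in the commit above (linux AND a clang `c_compiler`) might be expressed along these lines. This is a sketch under assumptions: the helper name `abi_compat_bazelrc_lines` is hypothetical, and the `build` prefix on each emitted line is assumed from ordinary `.bazelrc` syntax rather than copied from the recipe.

```shell
# Hypothetical helper: prints the .bazelrc lines for the ABI pin, or nothing.
# $1 = target_platform (e.g. linux-64), $2 = c_compiler (e.g. clang, gcc).
abi_compat_bazelrc_lines() {
  case "$1" in
    linux-*)
      case "$2" in
        clang*)
          echo "build --cxxopt=-fclang-abi-compat=17"
          echo "build --host_cxxopt=-fclang-abi-compat=17"
          ;;
      esac
      ;;
  esac
}

# The gcc-hosted CUDA variants and macOS print nothing:
abi_compat_bazelrc_lines linux-64 gcc
abi_compat_bazelrc_lines osx-arm64 clang
```

Nesting the two `case` statements keeps the function POSIX-compatible; in a bash-only script the same condition could be a single `[[ ... && ... ]]` test.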
<details><summary>Claude's draft</summary>
osx-arm64 fails linking libtensorflow_framework.dylib:
ld: illegal thread local variable reference to regular symbol
google::protobuf::internal::ThreadSafeArena::thread_cache_
TF 2.21.0's cc_shared_library does not propagate the systemlib protobuf
linkopt. On linux build_common.sh force-links -lprotobuf (and the other
systemlibs) via LDFLAGS, but the osx branch only added -undefined
dynamic_lookup -- which hides undefined regular protobuf symbols yet
cannot reconcile the TLS storage class of thread_cache_, so ld rejects
the thread-local reference to it as a (regular, undefined) symbol.
Force-link conda's libprotobuf for osx via --linkopt/--host_linkopt
(host too: the failing target is the [for tool] exec-config link), so
thread_cache_ resolves to its TLS definition in libprotobuf.dylib.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resume this Claude session:
```
claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3
```
</details>
<details><summary>Claude's draft</summary> osx-64 CI fails at Bazel analysis: cc_toolchain_suite '//bazel_toolchain:toolchain' does not contain a toolchain for cpu 'darwin_arm64' osx-64 is cross-compiled on an arm64 runner. gen-bazel-toolchain keys the conda cc_toolchain_suite on darwin_x86_64 (target) and darwin_arm64 (build host), but build_common.sh forced --cpu=darwin -- which matches no suite key. Once TF 2.21.0's apple-toolchain config points crosstool_top at that suite, analysis fails. Set TARGET_CPU=darwin_x86_64 so --cpu matches the emitted suite key. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Resume this Claude session: ``` claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3 ``` </details>
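To illustrate the `--cpu` fix in the commit above: the value passed to Bazel has to be one of the keys the generated cc_toolchain_suite actually contains. A minimal sketch, assuming only the two macOS keys named in the commit message; `bazel_cpu_for_platform` is a hypothetical helper, not a function from build_common.sh.

```shell
# Hypothetical helper mapping conda's target_platform to the suite key that
# the commit message says gen-bazel-toolchain emits for macOS.
# Unknown platforms return non-zero instead of guessing a key.
bazel_cpu_for_platform() {
  case "$1" in
    osx-64)    echo "darwin_x86_64" ;;
    osx-arm64) echo "darwin_arm64" ;;
    *)         return 1 ;;
  esac
}

TARGET_CPU="$(bazel_cpu_for_platform osx-64)"
echo "build --cpu=${TARGET_CPU}"
```

Failing on unknown platforms is a deliberate choice in this sketch: a wrong key only surfaces later as an opaque Bazel analysis error, as it did here.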
<details><summary>Claude's draft</summary> conda-forge is moving off CUDA 12.8 on linux. Mirror pytorch-cpu-feedstock, which builds CUDA 12.9 + 13.0: replace the local cuda128.yaml migrator with cuda129.yaml and cuda130.yaml copied verbatim from conda-forge/pytorch-cpu-feedstock's .ci_support/migrations/. A rerender follows to regenerate the .ci_support variant files (drops linux_64_cuda_compiler_version12.8*, adds 12.9). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Resume this Claude session: ``` claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3 ``` </details>
<details><summary>Claude's draft</summary> conda_build_config.yaml pinned c_compiler=clang / version 18 for all of linux. The CUDA variants must build with gcc (nvcc needs a gcc host compiler, and the cuda migrators pin c_compiler_version=14) -- the unconditional clang 18 pin clashed with the migrator and made the rerender ambiguous. Gate the clang pin on cuda_compiler_version == "None" so only the CPU variant uses clang 18; the CUDA variants fall through to the global pinning + migrator (gcc). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Resume this Claude session: ``` claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3 ``` </details>
<details><summary>Claude's draft</summary> Re-rendered with conda-smithy 3.61.2 / conda-forge-pinning 2026.05.16 to pick up the cuda128 -> cuda129 migrator swap and the clang compiler gating: the linux CUDA variants are now 12.9 and 13.0 (12.8 dropped), and the CPU variant keeps clang while the CUDA variants use gcc. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Resume this Claude session: ``` claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3 ``` </details>
<details><summary>Claude's draft</summary> The CUDA 13.0 build aborts during Bazel's repository fetch: nvidia_nvshmem: Platform cuda13_x86_64-unknown-linux-gnu is not supported [...] @xla//xla/tsl/cuda:nvshmem_stub depends on @nvidia_nvshmem which failed to fetch TF 2.21.0's nvshmem_stub alias resolves to the hermetic @nvidia_nvshmem redistribution when CUDA libraries are force-included (override_include_cuda_libs=true, which the recipe sets). The pinned NVSHMEM 3.2.5 redist ships only cuda11/cuda12 archives -- no cuda13 -- so the build fails before compiling anything. CUDA 12.x silently worked because a cuda12 archive exists. Add patch 0067 making nvshmem_stub always resolve to the bundled dlopen stub (:nvshmem). conda-forge does not package NVSHMEM and TF's NVSHMEM support (optional multi-GPU collectives) is loaded via dlopen anyway, so the stub is the correct choice for both CUDA 12.9 and 13.0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Resume this Claude session: ``` claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3 ``` </details>
<details><summary>Claude's draft</summary>
Two source-side CI fixes (a rerender still needs to land separately):
- conda_build_config.yaml: restore the unconditional # [linux] clang pin.
The `cuda_compiler_version == "None"` selector cannot evaluate at
config-parse time, so clang was dropped from every variant and the CPU
build regressed to gcc -- XLA's -emit-llvm intrinsic codegen then fails
(gcc rejects clang-only flags like -fno-experimental-sanitize-metadata).
- recipe.yaml: add cuda-nvrtc-dev to the CUDA-12/13 host deps. It ships
targets/<arch>-linux/include/nvrtc.h; without it the hermetic cuda_nvrtc
Bazel repo has an empty include/ and the CUDA build aborts with "missing
input file @cuda_nvrtc//:include/nvrtc.h". Both pytorch-cpu-feedstock and
jaxlib-feedstock list cuda-nvrtc-dev.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resume this Claude session:
```
claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3
```
</details>
<details><summary>Claude's draft</summary>
c_compiler_version / cxx_compiler_version / c_stdlib_version belong to
conda-forge's `unix` zip_keys group. Under CF_CUDA_ENABLED the cuda
migrator adds a second entry (the CUDA variant), so a single-entry clang
override desynced the group and made conda-smithy rerender fail
("ambiguous ... we did not find ['18'] ... in c_compiler_version
['14','14']").
conda_build_config.yaml: give each overridden linux key two parallel
entries (CPU + CUDA) plus a matching c_stdlib_version block, mirroring
jaxlib-feedstock (which also builds XLA with clang alongside CUDA).
Includes the conda-smithy re-render. Rerender now succeeds: CPU and
CUDA 12.9 render as clang 18; CUDA 13.0 as clang 14 (cuda130 migrator
pin).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resume this Claude session:
```
claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3
```
</details>
<details><summary>Claude's draft</summary> The cuda130.yaml migrator carried gcc-era c_compiler_version (14/13), so the CUDA 13.0 variant rendered with clang 14 -- too old for TF 2.21 / XLA and rejected by the recipe's -fclang-abi-compat=17. jaxlib-feedstock handles this by editing its own copy of the CUDA migrators (they carry use_local: true) to pin the clang version it builds with, in lockstep with recipe/conda_build_config.yaml. Follow that: pin c_compiler_version / cxx_compiler_version / fortran_compiler_version to 18 in cuda130.yaml, and add the same explicit clang-18 block to cuda129.yaml (which previously only got clang 18 by fall-through). Both CUDA 12.9 and 13.0 variants now render c_compiler: clang / c_compiler_version: 18, matching the CPU variant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Resume this Claude session: ``` claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3 ``` </details>
<details><summary>Claude's draft</summary>
The CUDA variants now build with clang (conda_build_config.yaml + the
cuda migrators pin clang 18). TF's ./configure failed:
UserInputError: Invalid GCC_HOST_COMPILER_PATH provided 10 times
because build_common.sh still set GCC_HOST_COMPILER_PATH=${GCC} -- and
there is no gcc in a clang build env, so ${GCC} is empty.
TF 2.21.0's configure.py only reads GCC_HOST_COMPILER_PATH when
TF_CUDA_CLANG=0; with TF_CUDA_CLANG=1 it reads CLANG_CUDA_COMPILER_PATH
and clang compiles the CUDA device code directly (--config=cuda_clang).
This matches XLA's own cuda_clang_local reference config and
jaxlib-feedstock's clang CUDA build.
CUDA branch: drop GCC_HOST_COMPILER_PATH/_PREFIX; set TF_CUDA_CLANG=1,
TF_NEED_CLANG=1, CLANG_CUDA_COMPILER_PATH / CLANG_COMPILER_PATH to the
conda clang. Also stop the later unconditional TF_CUDA_CLANG=0 /
TF_NEED_CLANG=0 from clobbering it -- gate them to the non-CUDA build.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resume this Claude session:
```
claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3
```
</details>
<details><summary>Claude's draft</summary>
The clang-CUDA build (TF_CUDA_CLANG=1) failed compiling CUDA 13 device
code: clang 18's __clang_cuda_runtime_wrapper.h includes headers CUDA 13
removed (texture_fetch_functions.h), and the device pass chokes on
__float128 in gcc 15's libstdc++. clang only gained CUDA 13 support in
v21 -- clang 18 cannot target CUDA 13 at all.
Switch to nvcc for device code with clang 18 as the host compiler:
- TF_CUDA_CLANG=0, TF_NVCC_CLANG=1; append `build --config=cuda_nvcc`.
- configure.py reads GCC_HOST_COMPILER_PATH on the TF_CUDA_CLANG=0 path
and only checks the path exists, so point it at clang -- which is what
nvcc uses as host under TF_NVCC_CLANG.
- Put nvvm/bin on PATH for nvcc/cicc/ptxas.
- Strip TF's hardcoded -fuse-ld=lld from .bazelrc (conda clang has no
lld; the cuda_clang config carried it, cuda_nvcc does not).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resume this Claude session:
```
claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3
```
</details>
<details><summary>Claude's draft</summary> The CUDA build passed sm_100/sm_120 (Blackwell) to clang 18, which errors "unsupported CUDA gpu architecture: sm_100". clang 18 only knows up to sm_90. Drop sm_100/sm_120/compute_120 from HERMETIC_CUDA_COMPUTE_CAPABILITIES -> sm_90/compute_90 ceiling, for both the 12.x and 13.x lists. Blackwell support can return with a newer clang. (One of two changes for the CUDA build; the crosstool-routing fix that keeps nvcc-only copts off plain clang is separate.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Resume this Claude session: ``` claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3 ``` </details>
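The capability ceiling from the commit above can be pictured as a filter over the comma-separated list. This is only a sketch: `cap_compute_capabilities` is a hypothetical helper, the input list is made up for the example (the recipe's real list is not shown in this thread), and suffixed entries such as `sm_90a` are assumed not to occur.

```shell
# Hypothetical helper: keeps only the entries of a comma-separated capability
# list whose numeric suffix is at most the given ceiling.
# $1 = list such as "sm_80,sm_100", $2 = ceiling such as 90.
cap_compute_capabilities() {
  out=""
  old_ifs="$IFS"
  IFS=","
  for cap in $1; do
    num="${cap##*_}"              # sm_100 -> 100, compute_90 -> 90
    if [ "$num" -le "$2" ]; then
      out="${out:+$out,}$cap"
    fi
  done
  IFS="$old_ifs"
  echo "$out"
}

# Illustrative input only:
HERMETIC_CUDA_COMPUTE_CAPABILITIES="$(cap_compute_capabilities "sm_80,sm_90,compute_90,sm_100,sm_120,compute_120" 90)"
echo "$HERMETIC_CUDA_COMPUTE_CAPABILITIES"
```

Deriving the list from a ceiling rather than hand-editing it would make it a one-number change to bring Blackwell back once a newer clang is available.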
<details><summary>Claude's draft</summary> The CUDA build failed because the recipe appended a blanket `--crosstool_top=//bazel_toolchain:toolchain` (the conda plain-clang toolchain) for all variants. cuda_library .cu.cc targets carry nvcc-only copts (-Xcuda-fatbinary=, -nvcc_options=, -x cuda, --cuda-gpu-arch=); forced onto plain clang 18 those are rejected. --config=cuda_nvcc was set but inert because TF's CUDA crosstool was never selected. cuda_library is a plain cc_library, so Bazel resolution cannot route .cu.cc separately. The mechanism that splits device/host is TF's CUDA crosstool, whose host_compiler is the nvcc wrapper (crosstool_wrapper_driver_is_not_gcc): it sends '-x cuda' actions to nvcc 13 and everything else to conda clang 18 (CLANG_CUDA_COMPILER_PATH). Make --crosstool_top per-variant: CPU keeps //bazel_toolchain:toolchain; CUDA uses @local_config_cuda//crosstool:toolchain (+ host_crosstool_top). That CUDA crosstool's cc_toolchain_suite is keyed k8 not x86_64, so the CUDA linux-64 build also sets --cpu=k8 (CC_CPU). Mirrors TF's own config:rocm pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Resume this Claude session: ``` claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3 ``` </details>
<details><summary>Claude's draft</summary> With the CUDA build routed through TF's CUDA crosstool, host .cc compiles failed: 'sqlite3ext.h' / 'absl/base/log_severity.h' file not found. TF's CUDA crosstool (cuda_configure.bzl) carries none of the conda gen-bazel-toolchain customizations -- its cxx_builtin_include_directories are only clang's own builtins, unfiltered_compile_flags is empty -- so conda's -isystem $PREFIX/include and the LDFLAGS force-link block (baked into //bazel_toolchain for the CPU build) never reach it. In the CUDA branch, append explicit flags: --copt/--host_copt -isystem $PREFIX/include, --linkopt/--host_linkopt -L$PREFIX/lib, and a loop turning the assembled $LDFLAGS (the -lprotobuf/-lgrpc/-labsl_*/... force-link set) into --linkopt/--host_linkopt. Same pattern the osx branch already uses for the Apple crosstool. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Resume this Claude session: ``` claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3 ``` </details>
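The loop mentioned in the commit above, which turns the assembled `$LDFLAGS` into `--linkopt`/`--host_linkopt` entries, might look like the following. It is a sketch, not the recipe's code: `ldflags_to_bazelrc_lines` is a hypothetical name, and it assumes no individual flag contains whitespace.

```shell
# Hypothetical helper: prints one --linkopt and one --host_linkopt .bazelrc
# line for each whitespace-separated word of the flags string in $1.
ldflags_to_bazelrc_lines() {
  for flag in $1; do
    echo "build --linkopt=${flag}"
    echo "build --host_linkopt=${flag}"
  done
}

# Illustrative subset of the force-link set:
ldflags_to_bazelrc_lines "-lprotobuf -lgrpc"
```

In a build script the output would typically be appended to `.bazelrc`, for example with `ldflags_to_bazelrc_lines "$LDFLAGS" >> .bazelrc`.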
<details><summary>Claude's draft</summary> The previous attempt injected conda's header dir with `--copt=-isystem $PREFIX/include`, but Bazel rejects an absolute include path the active toolchain does not declare: hwloc/base64.c: the include path '$PREFIX/include' references a path outside [the execroot] Use CPATH instead. clang reports CPATH directories in `clang -E -v`, so cuda_configure.bzl folds $PREFIX/include into the CUDA crosstool's cxx_builtin_include_directories (declared -> accepted), and clang also searches CPATH at compile time. Export CPATH/CPLUS_INCLUDE_PATH so the cuda_configure repo rule sees them, and pass them via --action_env/--host_action_env so the compile actions do too. Library dirs / force-link libs stay as --linkopt (linkopts are not path-checked). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Resume this Claude session: ``` claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3 ``` </details>
<details><summary>Claude's draft</summary> TF's CUDA crosstool passes --cuda-path=external/cuda_nvcc to every compile action. On plain C files (e.g. vendored brotli) clang reports "argument unused during compilation: '--cuda-path=...'", which TF's -Werror,-Wunused-command-line-argument turns into a hard error. Add -Qunused-arguments (--copt/--host_copt) for the CUDA build so clang silently ignores command-line arguments that do not apply to a given source file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Resume this Claude session: ``` claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3 ``` </details>
I'm curious what you all think of this file.
Taking in some stats from my usage:
- claude-opus-4-7: 123.6k input, 1.4m output, 503.0m cache read, 5.7m cache write ($322.24)
- claude-haiku-4-5: 1.4m input, 34.5k output, 0 cache read, 0 cache write, 44 web search ($1.99)
Basically 2 days on a powerful laptop, though I'm pretty sure my computer crashed halfway through.
I understand the contents of these patches just as much as I understand the contents of the old ones.
…ibprotobuf
<details><summary>Claude's draft</summary>
`import tensorflow` aborted with a protobuf descriptor double-registration
SIGABRT ("File already exists in database"). With conda's *shared*
libprotobuf there is one process-global generated-descriptor database, and
TF embeds the same generated proto .pb.o into many shared objects
(libtensorflow_framework.so, libtensorflow_cc.so, the _pywrap_*.so
extensions, ...); the second .so to load re-registers a proto file the
first already registered, and protobuf's AddDescriptors aborts.
Fix (keeps shared/systemlib protobuf — no static/hermetic switch, no TF
source patch):
- recipe/tf_proto_descriptor_guard.h — featherweight, force-included into
every TU (--copt=-include). No protobuf/absl headers, so it is safe in
vendored pre-C++17 sources. It just defines AddDescriptors ->
AddDescriptors_TfGuarded.
- recipe/tf_proto_descriptor_guard_impl.h — force-included into the .pb.cc
files only (--per_file_copt). A functional clone of protobuf's
AddDescriptors that skips InternalAddGeneratedFile when
internal_generated_database()->FindFileByName shows the proto file is
already registered — idempotent instead of fatal.
- build_common.sh copies both headers into a toolchain -isystem dir, wires
the copt/per_file_copt flags into .bazelrc, and removes them before
packaging.
Also in build_common.sh: re-enable USE_PYWRAP_RULES (the upstream pywrap
build) for the python wheel and build the standalone libtensorflow /
libtensorflow_cc C/C++ libraries in a separate non-pywrap Bazel pass; drop
the --@local_config_cuda//cuda:override_include_cuda_libs flag so CUDA
libraries are dlopen'd lazily (no hard libcuda.so.1 DT_NEEDED), matching
jaxlib.
Verified: full cp312 pywrap wheel builds (20,267 Bazel actions, 0 errors)
and `import tensorflow` succeeds — tf.__version__ 2.21.0, tf ops run.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resume this Claude session:
```
claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3
```
</details>
…ib protobuf flat deps
<details><summary>Claude's draft</summary>
Two fixes that together make the CPU build's `import tensorflow` work
end-to-end.
0072 (OpRegistry duplicate tolerance: the import-test fix). Both
libtensorflow_framework.so.2 (from the pywrap python wheel) and
libtensorflow_cc.so.2.21.0 (from the libtensorflow_cc package) each
statically embed tensorflow/core/ops/function_ops.cc, so each runs the
same _Arg op registration's static initializer at process startup.
Because OpRegistry isn't initialized yet at that point, both
registrations land on the single OpRegistry::Global() deferred queue. The
first load_op_library() call (Lite's audio_microfrontend, reached on
`import tensorflow` via compat.v1.lite.experimental.authoring) calls
LoadDynamicLibrary, whose first action is ProcessRegistrations ->
CallDeferred. The second _Arg in the queue hits try_emplace's existing
entry and aborts as AlreadyExistsError.
Patch OpRegistry::RegisterAlreadyLocked: when the op name is already
registered, compare the new OpDef to the existing one via OpDefEqual; if
equal, silently accept (and skip the watcher callback so the duplicate is
not mis-attributed to the load_op_library's contributed op list).
Genuinely divergent redefinitions still error.
0071 (systemlib _protobuf_deps: the build-analysis fix). With
USE_PYWRAP_RULES on, @tsl//tsl/platform:protobuf evaluates
tsl_protobuf_deps()'s _protobuf_deps branch, which references
@com_google_protobuf//src/google/protobuf/io and the
:delimited_message_util / :differencer / :json_util / :type_resolver
split targets. The systemlib protobuf.BUILD does not declare those
(conda's libprotobuf is one complete library, exposed via flat :protobuf
/ :protobuf_lite). Drop the missing sub-targets from _protobuf_deps so
Bazel analysis succeeds.
Verified the CPU build's `import tensorflow` now reaches well past the
proto descriptor abort and into the tflite load_op_library path; this
commit's 0072 addresses the latter.
Full build-locally CPU end-to-end run pending the in-flight rebuild.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resume this Claude session:
```
claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3
```
</details>
<details><summary>Claude's draft</summary>
TF 2.21.0's setup.py.tpl pins h5py < 3.15.0, but conda-forge ships
h5py 3.16.0+. h5py is API-stable enough that the upper bound is
over-restrictive; the conda-forge pip-check post-install test fails
on the megabuild's `tensorflow` python package output:
tensorflow 2.21.0 has requirement h5py<3.15.0,>=3.11.0,
but you have h5py 3.16.0.
Drop the upper bound. The CPU variant's `import tensorflow` test
(the rest of the megabuild test phase) already passed end-to-end
with this commit's series — descriptor guard (2728fb3) +
OpRegistry duplicate tolerance (077cf82, patch 0072) +
_protobuf_deps systemlib flat targets (077cf82, patch 0071) +
this h5py loosening.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resume this Claude session:
```
claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3
```
</details>
<details><summary>Claude's draft</summary>
Follow jaxlib's approach for hermetic CUDA wheel builds:
* Add `build --config=cuda_wheel` to the per-build `.bazelrc` (CUDA
variants only). This sets `--@local_config_cuda//cuda:include_cuda_libs=
false`, so the CUDA runtime libraries (libcudart, libcublas, libcufft,
libcusparse, libnvjitlink, ...) are loaded lazily at first GPU use rather
than being hard-linked via DT_NEEDED into libtensorflow_framework.so.2.
Without this the wheel ends up needing libcuda.so.1 at `import
tensorflow`, which the conda test envs do not ship.
* XLA's `xla/stream_executor/cuda/cuda_executor.cc` still calls NVML
directly (`nvmlDeviceGetHandleByPciBusId_v2`, ...), so build-time tools
that pull in cuda_executor (e.g. `hlo_to_kernel`) fail with undefined
references unless we hand them libnvidia-ml. Force-link the conda
`cuda-nvml-dev` stub explicitly in LDFLAGS, alongside the existing
`-lcusparse`.
* The proto-text codegen tool (`gen_proto_text_functions`) ends up
DT_NEEDED-ing `libnvidia-ml.so.1` even under cuda_wheel, so symlink the
stub into `${PREFIX}/lib` for build-time runtime resolution, paralleling
the existing libcuda.so.1 stub symlink. Both stubs are removed from
`${PREFIX}/lib` in `recipe/build.sh` before packaging so they never ship.
Validated end-to-end via `build-locally.py` for the CPU variant (✔ python
imports test passed). CUDA 13.0 build-locally is in progress with this
configuration; CUDA 12.9 still to validate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resume this Claude session:
```
claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3
```
</details>
…DYLIB
<details><summary>Claude's draft</summary>
Mirror of the linux force-link list (commit f7663d6) onto the osx-*
branch of `build_common.sh`.
TF 2.21.0's `cc_shared_library` rules do not forward the systemlib
`cc_library` `linkopts`, so libtensorflow_cc.dylib and
_pywrap_tensorflow_internal.so end up with no LC_LOAD_DYLIB entries for
the systemized grpc/sqlite3/icu/png/jpeg/gif/flatbuffers/abseil they
reference. On linux this manifests as `undefined reference` link errors,
which is why we already force-link those there. On osx the existing
`-Xlinker -undefined -Xlinker dynamic_lookup` flag lets the link succeed
silently and defers symbol resolution to runtime. At `import tensorflow`
time nothing has loaded libgrpc++, so dlopen of
_pywrap_tensorflow_internal.so fails with e.g.
symbol not found in flat namespace '__ZN4grpc6Status2OKE'
(== grpc::Status::OK)
Fix: explicitly list the systemized libraries in LDFLAGS so ld64 records
LC_LOAD_DYLIB entries for them. macOS ld64 does not accept
`--no-as-needed` or `--export-dynamic`, but listing `-l<name>` is enough
to add the load command; the existing `-undefined dynamic_lookup` remains
as a safety net for symbols not covered by any conda dylib. abseil ships
~90 dylibs; enumerate them like the linux branch does.
Discovered from osx-arm64 azure CI logs: bazel build + py3.{10,11,12}
wheels all built fine; rattler-build's test phase blew up on `import
tensorflow` for the py3.10 cpu_py310h... output. The rattler-build "links
against" diagnostic in the same log confirms libtensorflow_cc.dylib has
no libgrpc++ load command before this fix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resume this Claude session:
```
claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3
```
</details>
<details><summary>Claude's draft</summary>
The previous commit (dfdda7e) added `-lnvidia-ml` to the CUDA-variant
LDFLAGS so XLA's `cuda_executor.cc` (which calls NVML directly) and
build-time tools like `hlo_to_kernel` could resolve NVML symbols against
the conda `cuda-nvml-dev` stub. But that LDFLAGS line is appended AFTER
the linux branch sets `-Wl,--no-as-needed`, so every binary the linker
produced -- including tiny extension modules that don't touch NVML at
all, like `tensorflow/python/platform/_pywrap_cpu_feature_guard.so` --
ended up with `DT_NEEDED libnvidia-ml.so.1`. The build-time stub symlink
in `${PREFIX}/lib` papered over this during the build, but the conda
test env has no NVML runtime (NVML normally ships with the NVIDIA
driver, not as a conda package), so `import tensorflow` blew up:
    File ".../tensorflow/python/platform/self_check.py", line 63
    from tensorflow.python.platform import _pywrap_cpu_feature_guard
    ImportError: libnvidia-ml.so.1: cannot open shared object file: No such file or directory
Fix: bracket `-lnvidia-ml` with `-Wl,--as-needed ... -Wl,--no-as-needed`
so the spurious DT_NEEDED is dropped from binaries that don't actually
reference NVML symbols. Binaries that DO reference NVML
(`libtensorflow_framework.so.2`'s XLA stream_executor, `hlo_to_kernel`,
gen_proto_text_functions if it indirectly links cuda_executor) keep
their entry and still resolve against the build-time stub symlink.
The runtime DT_NEEDED in `libtensorflow_framework.so.2` itself is OK as
long as it's only reached on actual GPU use; cuda_wheel's lazy dlopen
already covers cudart/cublas/cufft, and import-time code paths like
preload_check no longer drag in the NVML chain.
Surfaced from CUDA 13.0 build-locally test-phase failure.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resume this Claude session:
```
claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3
```
</details>
<details><summary>Claude's draft</summary>
The previous CUDA fixes added `-L${BUILD_PREFIX}/.../stubs -lcusparse
-lnvidia-ml` to LDFLAGS and a `${PREFIX}/lib/libnvidia-ml.so.1`
symlink so build-time codegen tools could resolve NVML symbols. That
worked for the build but baked DT_NEEDED libnvidia-ml.so.1 (and
libcusparse.so.12) into every output -- including
`libtensorflow_framework.so.2.21.0` and tiny extension modules like
`_pywrap_cpu_feature_guard.so`. NVML ships only with the NVIDIA driver
(no conda-forge package provides libnvidia-ml.so.1), so the conda test
env fails at:
from tensorflow.python.platform import _pywrap_cpu_feature_guard
ImportError: libnvidia-ml.so.1: cannot open shared object file
The `--as-needed` wrapper I added in a follow-up didn't help: this
recipe forwards $LDFLAGS to Bazel via the per-token loop further down
build_common.sh; Bazel reorders linkopts by class, which strips the
`--as-needed`/`--no-as-needed` bracket scope, leaving the `-l<name>`
unconditionally NEEDed.
TF/XLA already ship an in-tree solution: `xla/tsl/cuda/nvml_stub.cc`
(with implib_so trampolines in `nvml.tramp.S`/`nvml.symbols`) provides
a lazy-dlopen NVML stub that is auto-aliased in by `:nvml` whenever
`--@local_config_cuda//cuda:include_cuda_libs=false` -- which is what
`--config=cuda_wheel` (already enabled) sets. Same story for cusparse,
cudart, cublas, cufft, cusolver. Jaxlib's `xla_cuda_plugin.so` on
conda-forge is built this way: zero DT_NEEDED for libnvidia-ml,
libcusparse, libcuda, libcudart, etc.
This commit:
1. Removes the `-lcusparse -lnvidia-ml` LDFLAGS additions and the
`${PREFIX}/lib/libnvidia-ml.so.1` symlink (and its build.sh
cleanup). Keeps the `libcuda.so.1` symlink -- that one is for a
different problem (host tools that load libtensorflow_framework
during codegen and hit its driver stub dep).
2. Adds patch `0074-xla-cuda_executor-depend-on-tsl-cuda-nvml-stub`
making `cuda_executor` depend on `//xla/tsl/cuda:nvml` -- a
one-line BUILD patch that mirrors what `cuda_platform` already
does. Without this, a binary that links cuda_executor but not
cuda_platform would still see undefined NVML symbols. Most TF
binaries pull both in, but better safe than another 6-hour rebuild.
Diagnosed by inspecting the staging .bazelrc generated by the recipe
(linkopt reorder), readelf -d on the failing artifacts, and comparing
to a conda-installed jaxlib (clean hermetic layout).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resume this Claude session:
```
claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3
```
</details>
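The `readelf -d` check used in the diagnosis above can be automated along these lines. This is a minimal sketch: the function names and the list of banned prefixes are assumptions for illustration, and the parser only looks at the `(NEEDED)` lines that `readelf -d` prints.

```python
import re

# Matches lines such as:
#  0x0000000000000001 (NEEDED)   Shared library: [libc.so.6]
_NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[([^\]]+)\]")

# Driver-provided libraries that should not appear as hard dependencies
# (assumed list, based on the libraries named in the commit message).
DRIVER_LIBS = ("libnvidia-ml.so", "libcuda.so")


def needed_libs(readelf_dynamic_output):
    """Extract DT_NEEDED names from the text printed by `readelf -d`."""
    return _NEEDED_RE.findall(readelf_dynamic_output)


def leaked_driver_deps(readelf_dynamic_output, banned=DRIVER_LIBS):
    """Return the DT_NEEDED entries whose name starts with a banned prefix."""
    return [lib for lib in needed_libs(readelf_dynamic_output)
            if lib.startswith(banned)]
```

Feeding it the captured output of `readelf -d` for each packaged `.so` and failing when `leaked_driver_deps` is non-empty would catch this class of regression without needing a GPU or a 6-hour rebuild.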
<details><summary>Claude's draft</summary>
The ${PREFIX}/lib/libcuda.so.1 symlink (and its build.sh cleanup) was
added before --config=cuda_wheel to let build-time host tools resolve
libtensorflow_framework.so's libcuda.so.1 DT_NEEDED off-GPU.
Under cuda_wheel (include_cuda_libs=false, a common: flag that also
applies to the exec/host config), XLA routes the CUDA driver through
its always-lazy in-tree stub (//xla/tsl/cuda:cuda -> cuda_stub.cc), so
libtensorflow_framework.so no longer DT_NEEDEDs libcuda.so.1 -- verified
by readelf on the freshly built artifact (zero CUDA DT_NEEDED). The
symlink is therefore unnecessary; remove it and its packaging cleanup.
If a host tool still fails with "libcuda.so.1: cannot open", the fix is
to route that tool through //xla/tsl/cuda:cuda rather than re-adding the
symlink. Being validated by a clean CUDA 13.0 build-locally rebuild.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resume this Claude session:
```
claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3
```
</details>
<details><summary>Claude's draft</summary>
The active maintainers do not have the bandwidth to support
linux-aarch64 or macOS (osx-64 and osx-arm64) at this time. Add them to
the top-level build skip so only linux-64 (x86_64) is built, where the
CPU and CUDA 12.9/13.0 variants are built and tested.
The rattler-build `aarch64` selector covers linux-aarch64 (and
osx-arm64); `osx` covers both osx-64 and osx-arm64. Replaces the
narrower "aarch64 and cuda_compiler_version != None" skip.
Contributions to re-enable aarch64/osx are welcome.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resume this Claude session:
```
claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3
```
</details>
<details><summary>Claude's draft</summary>
After the nvml-DT_NEEDED fix let `import tensorflow` get past
preload_check, the CUDA package aborts during stream_executor init:
F0000 repeat_buffer_kernel_cuda.cc:32] Failed to register kernel:
ALREADY_EXISTS: Object for trait ...RepeatBufferKernel... and
platform CUDA is already registered. -> Aborted (core dumped)
Same multi-.so root cause as the proto-descriptor and OpRegistry
duplicates: the RepeatBufferKernelCuda static registrar is compiled into
both libtensorflow_framework.so.2 and libtensorflow_cc.so.2.21.0, and
each .so runs its module initializer once against the process-global
PlatformObjectRegistry singleton. The registrar macros LOG(FATAL) on any
non-OK status, so the second (identical) registration kills the process.
Add patch 0075 making PlatformObjectRegistry::RegisterObject keep the
first registration and return Ok for an identical-key duplicate, instead
of AlreadyExistsError. Fixing it at the single insert chokepoint covers
GPU kernels and every other STREAM_EXECUTOR_REGISTER_OBJECT_STATICALLY
user, avoiding whack-a-mole. Mirrors patch 0072 (OpRegistry).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resume this Claude session:
```
claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3
```
</details>
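The failure mode and the "keep the first registration" fix can be modelled in a few lines. This is a toy Python model, not the C++ patch: `ToyRegistry`, `AlreadyExistsError` and the `tolerate_duplicates` switch are hypothetical names standing in for `PlatformObjectRegistry::RegisterObject` before and after patch 0075.

```python
class AlreadyExistsError(Exception):
    """Stand-in for the ALREADY_EXISTS status quoted in the abort message."""


class ToyRegistry:
    """Process-global-style registry keyed by (platform, trait)."""

    def __init__(self, tolerate_duplicates):
        self._objects = {}
        self._tolerate = tolerate_duplicates

    def register(self, platform, trait, obj):
        key = (platform, trait)
        if key in self._objects:
            if self._tolerate:
                # Patched behaviour: keep the first registration and
                # report success for the identical-key duplicate.
                return
            # Original behaviour: the caller turns this into LOG(FATAL).
            raise AlreadyExistsError(f"{trait} for {platform} already registered")
        self._objects[key] = obj

    def find(self, platform, trait):
        return self._objects[(platform, trait)]
```

Calling `register` twice with the same key models the two shared libraries each running their static initializer: the strict registry raises on the second call, while the tolerant one keeps the first object and carries on.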
…nstead)
<details><summary>Claude's draft</summary>
After patch 0075 fixed the GPU-kernel registry abort, the next
duplicate-registration in the multi-.so layout surfaces in systemized
absl's process-global FlagRegistry:
    ERROR: Flag 'coordination_agent_recoverable' was defined more than
    once but with differing types. Defined in files
    '.../coordination_service_agent.cc' and
    '.../coordination_service_agent.cc'.
coordination_service_agent.cc is statically embedded in both
libtensorflow_framework.so.2 and libtensorflow_cc.so.2, so its ABSL_FLAG
static registrar runs twice against the one libabsl_flags registry and
aborts `import tensorflow`.
Unlike OpRegistry (0072) and PlatformObjectRegistry (0075), the registry
here lives in systemized absl and cannot be patched to tolerate the
duplicate, and the flag can't be deduplicated across the two .so's (its
only reader is in the same TU, and absl's per-TU FastTypeId<bool>
differs between the .so's -- hence the "differing types" message).
Patch 0076 drops the ABSL_FLAG (an experimental default-false knob whose
own TODO asks to move it off a flag) and reads the override from the
TF_COORDINATION_AGENT_RECOVERABLE env var via tsl::ReadBoolFromEnvVar,
parsed once. The programmatic `recoverable` parameter is unchanged;
operators keep a global override via the env var. BUILD dep swapped
absl/flags:flag -> //xla/tsl/util:env_var on coordination_service_agent.
Diagnosed from a CUDA 13.0 CI test-phase log; applies cleanly to the
2.21.0 source. Will be exercised by the in-flight local CUDA 13.0 build,
whose test phase was expected to reach this same error.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resume this Claude session:
```
claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3
```
</details>
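The "read the override from an environment variable, parsed once" pattern looks roughly like this in Python. It is an illustration only: the accepted spellings below are an assumption and do not reproduce the exact parsing rules of `tsl::ReadBoolFromEnvVar`, and both function names are hypothetical.

```python
import functools
import os

# Assumed spellings; the real C++ helper may accept a different set.
_TRUE = {"1", "true"}
_FALSE = {"0", "false"}


def read_bool_env(name, default):
    """Parse a boolean environment variable; fall back to default if unset."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name}={raw!r} is not a recognised boolean")


@functools.lru_cache(maxsize=None)
def coordination_agent_recoverable_override():
    """Read TF_COORDINATION_AGENT_RECOVERABLE once and cache the result."""
    return read_bool_env("TF_COORDINATION_AGENT_RECOVERABLE", default=False)
```

Because the value is cached after the first call, later changes to the environment do not affect a running process, which matches the "parsed once" wording and avoids any shared registry.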
…sede 0076)
<details><summary>Claude's draft</summary>
Root cause of the duplicate-registration whack-a-mole: TF's
pywrap+systemlib megabuild ships both libtensorflow_framework.so and
libtensorflow_cc.so in the wheel and the _pywrap_*.so extensions link
both. cc_shared_library normally partitions each TU into exactly one
.so, but the systemlib force-linking defeats that, so the same static
registrars (proto, ops, GPU kernels, AND absl flags) get embedded into
both libs. Upstream survives because it uses static absl (per-.so flag
registries); our systemized absl has ONE shared FlagRegistry, so a flag
defined in two loaded .so's aborts `import tensorflow`:
    ERROR: Flag 'coordination_agent_recoverable' was defined more than once
    ERROR: Flag 'leave_barriers_on_recoverable_agent_restart' ...
We fixed the proto/OpRegistry/PlatformObjectRegistry families by
patching TF's own registries to tolerate duplicates, but absl's
FlagRegistry is a prebuilt conda package we cannot patch, and there are
ABSL_FLAGs across several subsystems -- converting them one-by-one
(patch 0076 did one) is untenable.
Systemic fix: add tf_absl_flag_guard.h. It includes absl/flags/flag.h
then overrides the ABSL_FLAG_IMPL_REGISTRAR sub-macro to construct
FlagRegistrar<T, /*do_register=*/false> -- abseil's own
ABSL_FLAGS_STRIP_NAMES build already uses this <T,false> form. Net
effect: ABSL_FLAG still defines FLAGS_<name> (so absl::GetFlag keeps
working) and keeps .OnUpdate() and the name/help, but skips inserting
into the shared FlagRegistry, so the duplicate registration across the
two .so's becomes a silent no-op.
Scoped via --per_file_copt to just the 10 ABSL_FLAG-defining .cc files
enumerated from the 2.21.0 source (every `ABSL_FLAG(` lives in one of
them), so it does not re-key the whole build (keeps the bazel disk cache
warm) and does not pull absl/flags into unrelated TUs. The header
self-guards to non-CUDA C++ so it is a no-op anywhere it might otherwise
reach.
XLA_FLAGS is unaffected (XLA uses its own flag mechanism, not
ABSL_FLAG). The only behavior change is that TF's C++ absl flags are no
longer settable via the command-line registry -- not used by the Python
package.
Reverts patch 0076 (the per-flag env-var hack for
coordination_agent_recoverable), now handled uniformly by the guard.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resume this Claude session:
```
claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3
```
</details>
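The "enumerate every file containing `ABSL_FLAG(` and scope a compiler option to just those files" step could be scripted roughly as below. This is a sketch under stated assumptions: both function names are hypothetical, the scan is a plain substring match, and the compiler option that actually injects `tf_absl_flag_guard.h` is not given in the commit message, so the caller supplies it. The only Bazel detail relied on is the documented `--per_file_copt=regex[,regex...]@option[,option...]` shape.

```python
import re
from pathlib import Path


def files_defining_absl_flags(source_root):
    """Return sorted relative paths of .cc files that contain `ABSL_FLAG(`."""
    root = Path(source_root)
    hits = []
    for path in root.rglob("*.cc"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if "ABSL_FLAG(" in text:
            hits.append(path.relative_to(root).as_posix())
    return sorted(hits)


def per_file_copt(files, copts):
    """Build one --per_file_copt flag: regex[,regex...]@opt[,opt...]."""
    # Escape each path so that "." is matched literally by the regex.
    patterns = ",".join(re.escape(f) for f in files)
    return f"--per_file_copt={patterns}@{','.join(copts)}"
```

Keeping the file list explicit is what limits the flag change to those translation units, so every other compile action keeps its existing command line and its cache entry.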
<details><summary>Claude's draft</summary>
Document the libcuda rabbit hole in CLAUDE_FEEDSTOCK_GUIDE.md (an
anti-pattern bullet + a toolchain-table row): never force-link CUDA libs
in LDFLAGS or symlink their stubs into $PREFIX/lib. --config=cuda_wheel
already routes them through XLA's in-tree lazy-dlopen stubs, so
force-linking leaks a hard DT_NEEDED into every .so and breaks the
driverless conda test env. If a target truly needs a CUDA symbol, add
//xla/tsl/cuda:<lib> to its BUILD deps (patch 0074).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resume this Claude session:
```
claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3
```
</details>
<details><summary>Claude's draft</summary>
Condense the comment blocks added during the 2.21.0 bump to 1-2 sentence
explanations, drop historical ("we used to...") and failed-approach
narration, and stop restating what the diffs do.
- build_common.sh: condense the proto/absl guard, LDFLAGS force-link,
NVML/cusparse, nvcc-host-compiler, cccl-flatten, .bazelrc, crosstool,
CPATH, and cuda_wheel comment blocks; delete the historical
libcuda.so.1-symlink note entirely.
- build.sh, recipe.yaml: tighten the read-only-toolchain comment and the
0072/0074/0075 inline patch notes; drop the "supersedes patch 0076"
historical line.
- patch headers 0052/0063/0065/0066/0067/0070/0071/0072/0074/0075:
condense the prose body to 1-2 sentences. Diff hunks untouched
(verified byte-identical to HEAD); 0064/0068/0073 left as-is. Removed a
stray non-format-patch line 1 from 0069.
- tf_proto_descriptor_guard.h, tf_proto_descriptor_guard_impl.h,
tf_absl_flag_guard.h: condense the 40+ line headers (dropped abort-quote
dumps, disassembly, and "this used to be" framing); code unchanged.
Comment-only: shell scripts pass bash -n, no Bazel flags or patch-list
entries changed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resume this Claude session:
```
claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3
```
</details>
|
Next steps are to:
|
<details><summary>Claude's draft</summary>
Set github_actions.store_build_artifacts: true (+
artifact_retention_days: 14) in conda-forge.yml and rerender with
conda-smithy 3.61.2. The GHA workflow now publishes each job's built
.conda packages as a downloadable workflow artifact
(actions/upload-artifact@v7, 14-day retention), plus the build env on
failure -- so PR conda-forge#491 builds can be downloaded and tested
without waiting for merge/upload to anaconda.org.
Rerender also dropped the now-unused .ci_support configs for the skipped
platforms (linux_aarch64, osx_64) and added
.scripts/create_conda_build_artifacts.sh.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resume this Claude session:
```
claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3
```
</details>
|
Hi! This is the friendly automated conda-forge-linting service. I wanted to let you know that I linted all conda-recipes in your PR ( Here's what I've got... For recipe/recipe.yaml:
This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/26364308038. Examine the logs at this URL for more detail. |
<details><summary>Claude's draft</summary>
Set github_actions.store_build_artifacts: true (+
artifact_retention_days: 14) so the GHA build jobs publish each config's
built .conda packages as a downloadable workflow artifact (14-day
retention) -- lets PR builds be downloaded and tested before
merge/upload.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resume this Claude session:
```
claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3
```
</details>
…6.05.24.12.35.11
Other tools:
- conda-build 26.3.0
- rattler-build 0.65.0
- rattler-build-conda-compat 1.4.14
fd1c802 to a75c53f
<details><summary>Claude's draft</summary>
5th and final duplicate-registration manifestation of the systemlib
two-.so layout, found by running real GPU ops (import + pip-check, which
is all CI runs, pass because conda-forge runners are GPU-less). With a
GPU present TF auto-places ops there; the first host<->device copy
aborts:
    InvalidArgumentError: Multiple OpKernel registrations match NodeDef
    at the same priority '_Send' device_type: "CPU" and '_Send'
    device_type: "CPU"
_Send/_Recv (and other) kernels are embedded in both
libtensorflow_framework.so and libtensorflow_cc.so, so they register
twice in the process-global OpKernel registry and FindKernelRegistration
errors on the ambiguity.
Patch 0077 makes OpKernelRegistrar::InitInternal skip an exact duplicate
(same key, kernel_class_name and serialized KernelDef); genuinely
distinct kernels are still registered. Mirrors 0072 (OpRegistry) / 0075
(PlatformObjectRegistry). Must be GPU-validated locally since CI cannot
exercise it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resume this Claude session:
```
claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3
```
</details>
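The "skip an exact duplicate, still register genuinely distinct kernels" rule differs from a plain first-wins map because several kernels may legitimately share a key. A toy Python model of that rule follows; `ToyKernelRegistry` and its methods are hypothetical, and the lookup ignores priorities and constraints that the real OpKernel registry takes into account.

```python
from collections import defaultdict


class ToyKernelRegistry:
    """Multimap from key to kernel registrations, skipping exact duplicates."""

    def __init__(self):
        self._by_key = defaultdict(list)

    def register(self, key, kernel_class_name, kernel_def_bytes):
        """Add a registration; return False if it was an exact duplicate."""
        entry = (kernel_class_name, kernel_def_bytes)
        if entry in self._by_key[key]:
            # Same key, class name and serialized definition: this is the
            # second shared library registering the same kernel again.
            return False
        self._by_key[key].append(entry)
        return True

    def find(self, key):
        """Return the single match for key, or raise if absent or ambiguous."""
        matches = self._by_key.get(key, [])
        if not matches:
            raise KeyError(key)
        if len(matches) > 1:
            raise LookupError(f"multiple registrations match {key!r}")
        return matches[0]
```

With this rule a repeated `_Send` registration is dropped and lookup stays unambiguous, while two different kernels registered under one key are both kept and still surface as an ambiguity.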
<details><summary>Claude's draft</summary>
Document at the top of recipe.yaml the upstream XLA bug where JIT
(jit_compile=True / keras auto-JIT) crashes emitting the Ampere TF32
tensor-core mma.sync matmul on CUDA 13 ("FloatAttr does not match
expected type of the constant" / "Operand is null" -> Failed to emit
LLVM IR). Records that it is compiler-side XLA codegen (not packaging),
the disable-TF32 workaround, and the corroborating issues jax-ml/jax#20154
and libxsmm/tpp-mlir#870.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resume this Claude session:
```
claude --resume d2eda4d2-e5f8-4dd0-a194-052aea8a0ff3
```
</details>
|
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR ( I do have some suggestions for making it better though... For recipe/recipe.yaml:
This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/26380789360. Examine the logs at this URL for more detail. |
|
I'm not sure I'm going to be able to get this over the finish line. When I test this with CUDA 13, it just fails at using usable models. |
For me to restart with claude:
Checklist
- `0` (if the version changed)
- `conda-smithy` (Use the phrase `@conda-forge-admin, please rerender` in a comment in this PR for automated rerendering)