Upgrade PJRT to XLA commit 9e9a0fb / ZML artifacts v17.0.0 by sebffischer · Pull Request #168 · r-xla/pjrt

sebffischer · 2026-04-05T10:40:21Z

- Update vendored headers and proto files from openxla/xla@9e9a0fb
- Add 2 new proto files: backends.proto, oneapi_compute_capability.proto
- Patch backends.proto edition syntax to proto3 for protobuf@21 compat
- Bump plugin_version() to 17.0.0
- Move patch files from tools/headers/patch/ to tools/patch/ (one per file)
- Add manual-cuda CI mode (workflow_dispatch + PR label) for testing PJRT upgrades before cuda R package is updated
- Add upgrade-pjrt Claude skill
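
The proto3 patch for backends.proto could look roughly like the sketch below. It assumes the upstream file declares its syntax with a line such as `edition = "2023";` (the exact edition string is an assumption), and the helper name `patch_edition_to_proto3` is hypothetical. The actual patch lives under tools/patch/ and may differ, for example if the file uses edition-only features that need further changes.

```python
import re

# Matches a top-level protobuf editions declaration such as: edition = "2023";
# [ \t]* (rather than \s*) keeps the match on a single line.
_EDITION_RE = re.compile(r'^[ \t]*edition[ \t]*=[ \t]*"[^"]*"[ \t]*;', flags=re.MULTILINE)


def patch_edition_to_proto3(proto_source: str) -> str:
    """Replace the first `edition = "...";` declaration with `syntax = "proto3";`.

    Hypothetical helper: it only rewrites the declaration line and does not
    touch any edition-specific options elsewhere in the file.
    """
    patched, count = _EDITION_RE.subn('syntax = "proto3";', proto_source, count=1)
    if count == 0:
        raise ValueError("no edition declaration found")
    return patched
```

Raising on a missing declaration makes the patch step fail loudly if a future upstream version changes the header, instead of silently vendoring an unpatched file.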

TODOs:

When using a newer CUDA version, we need to ensure that all CUDA types and signatures are defined correctly. Because we don't actually include the CUDA SDK, we declare them manually, which is error-prone.


XLA commit 9e9a0fb targets CUDA 12.9.1. Update the default container image, cuda R package reference, and cuda_r_package config to 12.9. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

sebffischer · 2026-04-05T13:36:47Z

@dfalbel I think there is an issue with the cuda runner. I want to upgrade the plugins so it is usable on linux arm machines (at least with CPU for now)

dfalbel · 2026-04-06T12:06:25Z

We don't have cuda 12.9 yet in the cudatoolkit repo

sebffischer · 2026-04-06T13:33:11Z

but the CI path with the manual-cuda tag does now use the cuda12.9 package (I added this in the PR so we can easily test upgrading in the pjrt package without making changes in other packages).
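
The gating described here (run on manual dispatch, or on pull requests carrying the label) can be sketched as a small decision function. This is only an illustration of the logic, not the workflow file from this PR; the function name is hypothetical, and the event names follow GitHub Actions' `workflow_dispatch` and `pull_request` events.

```python
def should_run_manual_cuda(event_name: str, pr_labels: list[str]) -> bool:
    """Decide whether the manual-cuda CI job should run (hypothetical helper).

    - A manual trigger (workflow_dispatch) always runs the job.
    - A pull request runs it only when labelled `manual-cuda`.
    - Every other event skips it.
    """
    if event_name == "workflow_dispatch":
        return True
    if event_name == "pull_request":
        return "manual-cuda" in pr_labels
    return False
```

For this PR, `should_run_manual_cuda("pull_request", ["manual-cuda"])` would be true because the label was added on Apr 5.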

sebffischer · 2026-04-06T13:36:29Z

there is some container runtime issue: https://github.com/r-xla/pjrt/actions/runs/23999906166/job/69994155165?pr=168

dfalbel · 2026-04-06T13:40:27Z

Sorry, what's manual cuda?

dfalbel · 2026-04-06T13:46:35Z

It's likely a connection timeout error... downloading the cudnn docker container is like a 5GB download running on my local network, which is not super fast.

sebffischer · 2026-04-07T10:30:14Z

@dfalbel Claude used the wrong CUDA version. For the new PJRT build from ZML we would instead need CUDA 13.0 (https://github.com/zml/pjrt-artifacts/blob/e1c8db3f6730c040e3ee3a008591c80f6d0f8891/openxla/bazelrc/upstream/.bazelrc#L99-L103). However, I think the hardware of the GPU runner might be too old for that. Did you run into this issue with torch as well?

dfalbel · 2026-04-07T10:33:55Z

The GPU I have locally is a 1080 ti with compute capability 6.1. In principle it supports cuda 13. But we need to figure out if ZML pjrt binaries are built with 6.1 support. Torch has dropped it recently and I had to make a custom build for it :(

sebffischer · 2026-04-07T10:38:27Z

Unfortunately 6.1 is not supported anymore: https://github.com/zml/pjrt-artifacts/blob/d104b855719bf4256bf1a87e4542285a54d0e594/openxla/bazelrc/upstream/.bazelrc#L99 (this is also already the case for the commit from release 17.0.0)

sebffischer · 2026-04-07T10:46:23Z

Maybe the easier way for now to add linux arm support is to just include it here: https://github.com/r-xla/pjrt-builds.

Eventually we have to switch to CUDA 13.0 I guess but maybe it's not necessary yet.

Maybe we could also make a PR in pjrt-artifacts and add 6.1 here: https://github.com/zml/pjrt-artifacts/blob/d104b855719bf4256bf1a87e4542285a54d0e594/openxla/bazelrc/upstream/.bazelrc#L99 but I guess it's not as easy as that ...

sebffischer · 2026-04-07T10:50:04Z

I have created an issue in pjrt-artifacts here: zml/pjrt-artifacts#70. Maybe they can just add it.

sebffischer · 2026-04-07T13:05:09Z

@dfalbel there is nothing we can do on your machine I think. With cuda 13.0 offline compilation support for 6.1 was removed (https://docs.nvidia.com/cuda/archive/13.0.0/cuda-toolkit-release-notes/index.html#deprecated-architectures). I might have access to a machine with compute capability 7.5 that can be used to run CI jobs but I will postpone setting this up as long as possible :D
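
The compatibility question in this thread boils down to comparing a GPU's compute capability against what a build supports. A minimal sketch, assuming a single minimum capability is a good enough model (real builds list explicit target architectures, so this is a simplification); the function names are hypothetical, and the `"7.5"` minimum in the usage note is an assumption taken from the machine mentioned above, not a value read from the ZML build config.

```python
def parse_compute_capability(text: str) -> tuple[int, int]:
    """Parse a compute capability string such as "6.1" into (major, minor)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def is_supported(gpu_cc: str, minimum_cc: str) -> bool:
    """Return True if the GPU's compute capability is at least the minimum.

    Tuples are compared element-wise, so (7, 5) >= (7, 5) and (6, 1) < (7, 5).
    """
    return parse_compute_capability(gpu_cc) >= parse_compute_capability(minimum_cc)
```

Under that assumed minimum, `is_supported("6.1", "7.5")` is false for the 1080 Ti, while `is_supported("7.5", "7.5")` is true.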

dfalbel · 2026-04-07T13:22:21Z

Ahhh, that's unfortunate. Ideally we should have a cloud-hosted GPU, such as the ones available with GitHub Actions.

sebffischer · 2026-04-07T13:35:52Z

Yeah, that would indeed be nice. But if we ran half an hour of CUDA CI per day, that would cost 0.05 * 30 * 30 = 45 euro per month. I think it's more realistic that I use a machine from my university (I am quite certain I will be allowed to, we just need to set it up). But for now CUDA 12.8 does the job :D (Also, I can hope that eventually torch stops running on your machine, so Posit has to buy you a new GPU :P)
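
The cost estimate above can be reproduced with a short calculation. The units are not spelled out in the comment; this sketch assumes 0.05 is a price in euro per minute of GPU runner time, the first 30 is minutes per day, and the second 30 is days per month. The function name is hypothetical.

```python
def monthly_ci_cost(rate_per_minute: float, minutes_per_day: float,
                    days_per_month: int = 30) -> float:
    """Estimated monthly cost: rate per minute * minutes per day * days per month."""
    return rate_per_minute * minutes_per_day * days_per_month


# Assumed inputs: 0.05 euro/minute, 30 minutes of CUDA CI per day.
estimate = monthly_ci_cost(rate_per_minute=0.05, minutes_per_day=30)
print(f"{estimate:.2f} euro per month")
```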

sebffischer added the manual-cuda label Apr 5, 2026

Use CUDA 12.9 as default runtime

02d6108


sebffischer added 11 commits April 7, 2026 07:36

linux arm test

a77215c

support linux arm, fix cuda workflow

cd83ca0

fix?

170f13e

fix?

9265a24

fix?

b6d041c

simplify

4469891

revert unnecessary changes

c667cae

right brancch

d4f4540

debug output

adbdc90

better output

13270d5

debug

ebac138

is runtime enough?

e3d2147

sebffischer marked this pull request as draft April 7, 2026 13:07


2 participants
