LocalCUDACluster + .persist() produces confusing "Missing dependency" errors when downstream code uses scheduler="threads" #1649

@Intron7


Summary

Workflows that combine LocalCUDACluster + Client → .persist() → downstream da.store(..., scheduler="threads") used to work and now fail with ValueError: Missing dependency ....
This pattern is hit by every user of anndata.write_zarr on a persisted array under a LocalCUDACluster, because anndata hardcodes scheduler="threads" in its writers. That combination used to compose without error.

Minimal reproducer

from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import dask.array as da
import numpy as np
import zarr

cluster = LocalCUDACluster()
client = Client(cluster)

x = da.from_array(np.ones((10000, 200)), chunks=(1000, 200))
x = x.map_blocks(lambda b: b + 1).persist()

g = zarr.open("/tmp/test.zarr", mode="w", shape=x.shape, dtype=x.dtype, chunks=(1000, 200))
da.store(x, g, scheduler="threads")
# ValueError: Missing dependency ('lambda-<hash>', i, 0)
#   for dependents {('store-map-<hash>', i, 0)}

x.compute() works (routes through the client). Only the explicit-non-distributed-scheduler path fails.
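
For contrast with the failing call, here is a minimal sketch of the path that does work according to the observation above: computing without a scheduler= override, so the active client (if any) handles the graph. The helper name write_via_default_scheduler is hypothetical and not part of dask, zarr, or anndata. It also materializes the whole array in host memory, so it only illustrates the routing difference and is not a general replacement for da.store.

```python
import numpy as np
import dask.array as da


def write_via_default_scheduler(x, target):
    """Compute `x` with the current default scheduler and copy it into `target`.

    Hypothetical helper for illustration only. With a distributed Client
    active, `x.compute()` is routed through that client; no `scheduler=`
    override is passed. The full result is held in host memory before
    being written, so this assumes the array fits in RAM.
    """
    result = x.compute()  # no scheduler="threads" override here
    target[:] = result    # works for NumPy arrays and zarr arrays alike
    return target


if __name__ == "__main__":
    # Small stand-in for the persisted array in the reproducer.
    x = da.from_array(np.ones((10, 4)), chunks=(5, 4)).map_blocks(lambda b: b + 1)
    out = np.empty(x.shape, dtype=x.dtype)
    write_via_default_scheduler(x, out)
    print(out.shape, out.dtype)
```

In the reproducer, target would be the zarr array g, and x the persisted array.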

Regression

This pattern worked in earlier dask / dask-cuda versions. If you have a known-good combination on hand, please share it; I can then bisect to identify which version introduced the change. Best candidates for the regressing component:

  • dask-cuda 26.4.0 (or earlier in the 26.x line)
  • dask / distributed 2026.1.x (the new expr engine landed in this series)
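
To make it easier to share a known-good (or known-bad) combination, a small sketch like the following could be used to print the installed versions of the relevant packages. The function name collect_versions and the default package list are my own choices for illustration; only the standard-library importlib.metadata API is used.

```python
from importlib.metadata import PackageNotFoundError, version


def collect_versions(
    packages=("dask", "distributed", "dask-cuda", "zarr", "anndata", "numpy"),
):
    """Return {distribution name: installed version, or None if not installed}.

    Hypothetical helper; the default package list is an assumption about
    which components are relevant to this report.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in collect_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Pasting the output from both a working and a failing environment would narrow the bisect range.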
