support numpy/2 and cupy/14 linalg.solve shape requirements #348
Pull request overview
Fixes a crash in redrock.zscan.solve_matrices when using numpy 2.x / cupy 14.x, which tightened linalg.solve RHS shape requirements for batched solves (GPU path).
Changes:
- Normalize solve_algorithm once (upper()), simplifying downstream comparisons.
- Reshape y to (..., m, 1) for PCA solves before calling np/cp.linalg.solve, then reshape back to preserve the existing solve_matrices API.
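The two changes listed above can be sketched roughly as follows. This is a minimal illustration, not the actual Redrock code: the function name solve_batched and its xp argument are hypothetical, and only the PCA-style branch is shown.

```python
import numpy as np


def solve_batched(M, y, solve_algorithm="pca", xp=np):
    """Hypothetical stand-in for the PCA branch of solve_matrices.

    M has shape (..., m, m) and y has shape (..., m).
    xp is the array module (numpy here; cupy is assumed to behave the same).
    """
    # Normalize the algorithm name once so later comparisons are simple.
    algo = solve_algorithm.upper()
    if algo != "PCA":
        raise NotImplementedError(f"only the PCA branch is sketched, got {algo}")

    # Add a trailing axis: (..., m) -> (..., m, 1), so that y is always
    # interpreted as a stack of single-column matrices.
    x = xp.linalg.solve(M, y[..., None])

    # Drop the trailing axis again so callers still get shape (..., m).
    return x[..., 0]
```

Because the extra axis is added and removed inside the function, callers keep passing and receiving arrays of shape (..., m), which is the point of keeping the change internal.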
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@craigwarner-ufastro reviewed and commented by Slack:
Numpy/2.x and cupy/14.x changed the requirements on the allowable input shapes to linalg.solve(M, y), leading to a redrock.zscan.solve_matrices crash when using cupy/14.x. We had not caught this with previous numpy/2.x testing because the CPU path calls solve_matrices differently than the GPU path and thus didn't hit the API change. Unit tests fail when running on GPUs at NERSC with cupy/14.x (DESI "test-main" environment); this PR fixes that by reshaping y to have an extra trailing dimension of length 1 and then reshaping the result back.

Example: the failing call is solve_matrices(M, y) with M.shape=(1446, 10, 10) and y.shape=(1446, 10), which is then passed to cp.linalg.solve(M, y). Previously this was allowable, but it now needs y.shape=(1446, 10, 1). I made the reshaping change internal to solve_matrices so that its API remains unchanged.

This update is required for Matterhorn. I have verified that with the current desi main environment (numpy/1.22.4, cupy/13.1.0) this PR produces bitwise identical output as Redrock main. With the new desi test-main environment (numpy/2.3.5, cupy/14.0.1), Redrock main crashes, while this PR produces output that differs by some rounding, with no substantial differences compared to our current environment.
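The shape issue can be reproduced on the CPU with numpy alone. The sketch below uses random, well-conditioned matrices with the shapes quoted above (the data are made up for illustration, and cupy is assumed to follow the same rule but is not exercised here). It tries the old-style call, which is expected to work on numpy 1.x and raise on numpy 2.x, and then checks that the reshaped call matches a per-matrix loop.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1446, 10

# Build a stack of symmetric positive-definite matrices so every solve is well posed.
A = rng.normal(size=(n, m, m))
M = A @ A.transpose(0, 2, 1) + m * np.eye(m)
y = rng.normal(size=(n, m))

# Old-style call: y has one dimension fewer than M.
# Whether this works depends on the installed numpy version, so just record it.
try:
    np.linalg.solve(M, y)
    old_style_ok = True
except ValueError:
    old_style_ok = False

# Version-independent call: give y an explicit trailing axis of length 1,
# then remove that axis from the result.
x = np.linalg.solve(M, y[:, :, None])[:, :, 0]

# Reference: solve each system separately.
ref = np.stack([np.linalg.solve(M[i], y[i]) for i in range(n)])

print("numpy", np.__version__, "| old-style call accepted:", old_style_ok)
print("reshaped result shape:", x.shape, "| matches loop:", np.allclose(x, ref))
```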
Background:
cupy.linalg.solve compatible with numpy v2 cupy/cupy#8629