Performance Improvements #331
Open
sebffischer
opened on May 15, 2026
- Implement the conversion from the LU pivots vector to a permutation vector as a custom CUDA call. Currently, we implement this via a while loop, which is rather expensive on XLA:GPU (see jax-ml/jax#5880, "LU decomposition runtime is dominated by lu_pivots_to_permutation on GPU"). This requires a mechanism not only to register custom calls but also to build and ship custom CUDA kernels, for which we currently have no infrastructure.
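
For reference, the conversion in question can be sketched in plain Python as below. This is only a minimal illustration of the sequential row-swap semantics, not the implementation in this repository. It assumes 0-based pivot indices (LAPACK itself reports 1-based pivots), and the function name is borrowed from the linked JAX issue title. Each step depends on the previous swap, which is why the loop does not vectorize naturally. A custom kernel could, for instance, run this loop with one thread per matrix in a batch.

```python
def lu_pivots_to_permutation(pivots, permutation_size):
    """Turn a sequence of row swaps into an explicit permutation.

    Assumption: pivots are 0-based, and pivots[i] = p means that row i
    was swapped with row p at step i of the factorization.
    """
    if len(pivots) > permutation_size:
        raise ValueError("more pivots than rows")

    # Start from the identity permutation.
    permutation = list(range(permutation_size))

    # Apply the swaps in order; each one acts on the result of the last.
    for i, p in enumerate(pivots):
        if not 0 <= p < permutation_size:
            raise ValueError(f"pivot {p} out of range at step {i}")
        permutation[i], permutation[p] = permutation[p], permutation[i]

    return permutation
```

With `pivots = [2, 2, 2]` and `permutation_size = 3`, the swaps give `[2, 1, 0]`, then `[2, 0, 1]`, and the last step leaves the result unchanged.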