Add a dispatch for LinearAlgebra.norm2 by sharanry · Pull Request #2302 · JuliaGPU/CUDA.jl

sharanry · 2024-03-22T22:26:25Z

`norm(@view x[..], 2)` previously dispatched to `LinearAlgebra.generic_norm2`, which triggered scalar indexing. This PR catches such norm2 calls on CUDA subarrays earlier.

Inf-norm and p-norm calls on CUDA subarrays still lead to the following dispatches:

LinearAlgebra.generic_normInf(x) = float(mapreduce(norm, max, x))
LinearAlgebra.generic_norm1(x) = mapreduce(float ∘ norm, +, x)

I am not sure if there is a better way to dispatch the above.
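For reference, a minimal CPU-only sketch of what the two quoted fallbacks compute, using a plain `Array` view rather than a CUDA subarray. The data and variable names are made up for illustration, and it makes no claim about how these reductions behave on the GPU.

```julia
using LinearAlgebra

# Illustrative data; the view picks elements 3, 1, 5 (non-contiguous in x).
x = Float32[3, -4, 1, -2, 5, 0]
v = @view x[1:2:end]

# Same expressions as the generic fallbacks quoted above.
norm1_via_mapreduce   = mapreduce(float ∘ norm, +, v)    # sum of absolute values
normInf_via_mapreduce = float(mapreduce(norm, max, v))   # largest absolute value

# Both agree with the public API on this input.
println(norm1_via_mapreduce ≈ norm(v, 1))
println(normInf_via_mapreduce ≈ norm(v, Inf))
```

Both fallbacks are expressed as a single `mapreduce` over the elements, so they never index elements one by one in user-visible code.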

Should resolve #2280.

maleadt · 2024-03-27T12:33:51Z

What about generalizing the LinearAlgebra.norm method above to StridedCuArray? That seems cleaner than overriding an internal method.
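A rough CPU-only sketch of the idea behind this suggestion, using Base's `DenseVector` and `StridedVector` as stand-ins for the CUDA array types. The functions `narrow` and `wide` are hypothetical and only show which method a strided view selects. This is not CUDA.jl code.

```julia
using LinearAlgebra

# Hypothetical "before": the fast method only accepts dense vectors.
narrow(x::DenseVector{Float64}) = :fast
narrow(x::AbstractVector)       = :generic

# Hypothetical "after": the fast method is widened to any strided vector.
wide(x::StridedVector{Float64}) = :fast
wide(x::AbstractVector)         = :generic

x = Float64[3, 0, 4, 0]
v = @view x[1:2:end]   # strided, but not dense

println((narrow(x), narrow(v)))   # the view misses the fast method
println((wide(x), wide(v)))       # the view now hits the fast method
```

Widening the signature of the public method keeps the change in one place and avoids adding methods to a non-exported function such as `generic_norm2`.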

Azercoco · 2025-01-06T15:04:11Z

Hi, what's the status of this PR? This issue is troublesome for some of my code, and I would like to know whether the fix will be merged into CUDA.jl.

maleadt · 2025-01-06T15:22:38Z

The PR fails CI, and there's an outstanding comment of mine, so it needs work I'd say. Feel free to take it up if you want.

maleadt · 2026-05-15T09:18:05Z

Rebased. Depends on JuliaGPU/GPUArrays.jl#720 now.

Generalizes the BLAS-optimized `norm`/`norm2` methods from `DenseCuArray` to `StridedCuVecOrDenseMat`, so 1D strided subarray views also dispatch to `nrm2`. Multi-dim non-contiguous views go through the sum-based fallback in GPUArrays (which now dispatches on `AnyGPUArray`). Resolves JuliaGPU#2280, replaces JuliaGPU#2302. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
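To illustrate the two paths described above, here is a small CPU-only analogy using the `LinearAlgebra.BLAS` wrappers from the standard library. It assumes the GPU `nrm2` is called with an analogous length and stride, which this sketch does not verify, and the arrays are made-up examples.

```julia
using LinearAlgebra

# 1D strided view: nrm2 can walk the parent buffer with a fixed stride.
# Passing the parent works here because the view starts at its first element.
x = Float64[3, 100, 4, 100, 12, 100]
v = @view x[1:2:end]                        # elements 3, 4, 12
n_strided = BLAS.nrm2(length(v), x, stride(v, 1))

# 2D non-contiguous view: no single stride covers all elements,
# so a sum-based reduction is used instead.
A = reshape(Float64.(1:16), 4, 4)
w = @view A[1:2:end, 1:2:end]               # elements 1, 3, 9, 11
n_fallback = sqrt(sum(abs2, w))

println(n_strided ≈ norm(v))
println(n_fallback ≈ norm(w))
```

The strided path needs only a pointer, a length, and one increment, which is why it is limited to vectors and dense matrices. Anything else falls back to the reduction.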

codecov · 2026-05-15T18:24:52Z

Codecov Report

❌ Patch coverage is 66.66667% with 1 line in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@e264549). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
lib/cublas/src/linalg.jl	66.66%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2302   +/-   ##
=======================================
  Coverage        ?   16.39%           
=======================================
  Files           ?      124           
  Lines           ?     9827           
  Branches        ?        0           
=======================================
  Hits            ?     1611           
  Misses          ?     8216           
  Partials        ?        0


github-actions

CUDA.jl Benchmarks

Benchmark suite	Current: `156a51a`	Previous: `2ace55f`	Ratio
`array/accumulate/Float32/1d`	`101544` ns	`100927` ns	`1.01`
`array/accumulate/Float32/dims=1`	`76473` ns	`77244` ns	`0.99`
`array/accumulate/Float32/dims=1L`	`1585314` ns	`1586192` ns	`1.00`
`array/accumulate/Float32/dims=2`	`143576` ns	`144286` ns	`1.00`
`array/accumulate/Float32/dims=2L`	`657193.5` ns	`658394` ns	`1.00`
`array/accumulate/Int64/1d`	`118474` ns	`118643` ns	`1.00`
`array/accumulate/Int64/dims=1`	`79908` ns	`79829` ns	`1.00`
`array/accumulate/Int64/dims=1L`	`1706190.5` ns	`1694383.5` ns	`1.01`
`array/accumulate/Int64/dims=2`	`156617` ns	`156179.5` ns	`1.00`
`array/accumulate/Int64/dims=2L`	`961814` ns	`961974` ns	`1.00`
`array/broadcast`	`20524` ns	`20478` ns	`1.00`
`array/construct`	`1257.05` ns	`1277.6` ns	`0.98`
`array/copy`	`17916` ns	`18044` ns	`0.99`
`array/copyto!/cpu_to_gpu`	`213898` ns	`215225` ns	`0.99`
`array/copyto!/gpu_to_cpu`	`281245` ns	`282660` ns	`0.99`
`array/copyto!/gpu_to_gpu`	`10695` ns	`10909` ns	`0.98`
`array/iteration/findall/bool`	`134649` ns	`134548` ns	`1.00`
`array/iteration/findall/int`	`148368` ns	`149686` ns	`0.99`
`array/iteration/findfirst/bool`	`81151` ns	`81331` ns	`1.00`
`array/iteration/findfirst/int`	`83416.5` ns	`83732` ns	`1.00`
`array/iteration/findmin/1d`	`85341` ns	`86429` ns	`0.99`
`array/iteration/findmin/2d`	`114129.5` ns	`117614.5` ns	`0.97`
`array/iteration/logical`	`200491.5` ns	`199724.5` ns	`1.00`
`array/iteration/scalar`	`67792` ns	`68306` ns	`0.99`
`array/permutedims/2d`	`52319.5` ns	`52646` ns	`0.99`
`array/permutedims/3d`	`52444` ns	`52872` ns	`0.99`
`array/permutedims/4d`	`51338` ns	`51484` ns	`1.00`
`array/random/rand/Float32`	`13335` ns	`12708` ns	`1.05`
`array/random/rand/Int64`	`24379` ns	`25198` ns	`0.97`
`array/random/rand!/Float32`	`9851.666666666666` ns	`8737.333333333334` ns	`1.13`
`array/random/rand!/Int64`	`21090` ns	`22022` ns	`0.96`
`array/random/randn/Float32`	`43186` ns	`42861` ns	`1.01`
`array/random/randn!/Float32`	`30810` ns	`30594` ns	`1.01`
`array/reductions/mapreduce/Float32/1d`	`34712` ns	`34442` ns	`1.01`
`array/reductions/mapreduce/Float32/dims=1`	`40828` ns	`39651` ns	`1.03`
`array/reductions/mapreduce/Float32/dims=1L`	`51311` ns	`51115` ns	`1.00`
`array/reductions/mapreduce/Float32/dims=2`	`58180` ns	`56288.5` ns	`1.03`
`array/reductions/mapreduce/Float32/dims=2L`	`67774` ns	`69317` ns	`0.98`
`array/reductions/mapreduce/Int64/1d`	`42789.5` ns	`42454.5` ns	`1.01`
`array/reductions/mapreduce/Int64/dims=1`	`51811` ns	`50760` ns	`1.02`
`array/reductions/mapreduce/Int64/dims=1L`	`87260` ns	`87043.5` ns	`1.00`
`array/reductions/mapreduce/Int64/dims=2`	`60671.5` ns	`59370.5` ns	`1.02`
`array/reductions/mapreduce/Int64/dims=2L`	`84038` ns	`84559` ns	`0.99`
`array/reductions/reduce/Float32/1d`	`34945` ns	`34787` ns	`1.00`
`array/reductions/reduce/Float32/dims=1`	`39822.5` ns	`39944` ns	`1.00`
`array/reductions/reduce/Float32/dims=1L`	`51236` ns	`51284` ns	`1.00`
`array/reductions/reduce/Float32/dims=2`	`58238` ns	`56294` ns	`1.03`
`array/reductions/reduce/Float32/dims=2L`	`68142` ns	`69595` ns	`0.98`
`array/reductions/reduce/Int64/1d`	`42679` ns	`42459` ns	`1.01`
`array/reductions/reduce/Int64/dims=1`	`43466` ns	`50053` ns	`0.87`
`array/reductions/reduce/Int64/dims=1L`	`87194` ns	`87114` ns	`1.00`
`array/reductions/reduce/Int64/dims=2`	`60654` ns	`59331.5` ns	`1.02`
`array/reductions/reduce/Int64/dims=2L`	`84499.5` ns	`84197` ns	`1.00`
`array/reverse/1d`	`17902` ns	`17694` ns	`1.01`
`array/reverse/1dL`	`68542` ns	`68202` ns	`1.00`
`array/reverse/1dL_inplace`	`65769` ns	`65756` ns	`1.00`
`array/reverse/1d_inplace`	`8541.666666666666` ns	`10156.166666666668` ns	`0.84`
`array/reverse/2d`	`20901` ns	`21001` ns	`1.00`
`array/reverse/2dL`	`73023` ns	`73024` ns	`1.00`
`array/reverse/2dL_inplace`	`65914` ns	`65634` ns	`1.00`
`array/reverse/2d_inplace`	`9973` ns	`11107` ns	`0.90`
`array/sorting/1d`	`2734742` ns	`2736491` ns	`1.00`
`array/sorting/2d`	`1068254` ns	`1070402` ns	`1.00`
`array/sorting/by`	`3303672` ns	`3304900` ns	`1.00`
`cuda/synchronization/context/auto`	`1145.3` ns	`1123.1` ns	`1.02`
`cuda/synchronization/context/blocking`	`921.1923076923077` ns	`902.5957446808511` ns	`1.02`
`cuda/synchronization/context/nonblocking`	`7130.2` ns	`7638.8` ns	`0.93`
`cuda/synchronization/stream/auto`	`992.5625` ns	`980.6666666666666` ns	`1.01`
`cuda/synchronization/stream/blocking`	`833.6666666666666` ns	`806.5` ns	`1.03`
`cuda/synchronization/stream/nonblocking`	`7230.299999999999` ns	`7187.8` ns	`1.01`
`integration/byval/reference`	`143781` ns	`143733` ns	`1.00`
`integration/byval/slices=1`	`145763` ns	`145971` ns	`1.00`
`integration/byval/slices=2`	`284545` ns	`284607` ns	`1.00`
`integration/byval/slices=3`	`423071` ns	`423028` ns	`1.00`
`integration/cudadevrt`	`102317` ns	`102291` ns	`1.00`
`integration/volumerhs`	`23424198.5` ns	`23455094` ns	`1.00`
`kernel/indexing`	`13267` ns	`13164` ns	`1.01`
`kernel/indexing_checked`	`13841` ns	`13822` ns	`1.00`
`kernel/launch`	`2182.3333333333335` ns	`2137` ns	`1.02`
`kernel/occupancy`	`675.5125` ns	`674.2788461538462` ns	`1.00`
`kernel/rand`	`17207` ns	`14157` ns	`1.22`
`latency/import`	`3848898595` ns	`3799013573` ns	`1.01`
`latency/precompile`	`4628935725` ns	`4593655026.5` ns	`1.01`
`latency/ttfp`	`4456320918` ns	`4367019984.5` ns	`1.02`

This comment was automatically generated by workflow using github-action-benchmark.

sharanry force-pushed the sy/strided_norm2 branch from 413c397 to 0e2ef84 Compare March 22, 2024 22:29

maleadt added the needs changes label May 24, 2024

maleadt marked this pull request as draft May 24, 2024 13:34

maleadt added the cuda array label May 24, 2024

maleadt force-pushed the master branch 15 times, most recently from 5d585c4 to c850163 Compare December 20, 2024 08:18

maleadt added the good first issue Good for newcomers label Apr 23, 2025

maleadt force-pushed the master branch from f1e7455 to 5a6f767 Compare March 26, 2026 08:13

maleadt force-pushed the main branch from ce29a43 to d3fe605 Compare May 14, 2026 10:31

maleadt force-pushed the sy/strided_norm2 branch from 0e2ef84 to 8d4a91f Compare May 15, 2026 09:17

maleadt force-pushed the sy/strided_norm2 branch from 8d4a91f to fa536f3 Compare May 15, 2026 17:41

maleadt marked this pull request as ready for review May 15, 2026 19:30

maleadt force-pushed the sy/strided_norm2 branch from fa536f3 to 156a51a Compare May 15, 2026 19:39

github-actions Bot reviewed May 15, 2026

maleadt merged commit 37d99a9 into JuliaGPU:main May 16, 2026
2 checks passed
