Added Base.similar methods for CuSparseMatrixCOO and BSR #3114
Open
rainerrodrigues wants to merge 5 commits into
kshyatt
reviewed
Apr 21, 2026
Member
Also, can some tests be added?
Contributor
CUDA.jl Benchmarks
| Benchmark suite | Current: 54f12f6 | Previous: 0ad0204 | Ratio |
|---|---|---|---|
| array/accumulate/Float32/1d | 101051 ns | 101066 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 76800 ns | 76791 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1585883 ns | 1585160.5 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 144087 ns | 142797.5 ns | 1.01 |
| array/accumulate/Float32/dims=2L | 658505 ns | 657399 ns | 1.00 |
| array/accumulate/Int64/1d | 119114 ns | 118248.5 ns | 1.01 |
| array/accumulate/Int64/dims=1 | 80706 ns | 79631 ns | 1.01 |
| array/accumulate/Int64/dims=1L | 1695925.5 ns | 1694201.5 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 156668 ns | 155693 ns | 1.01 |
| array/accumulate/Int64/dims=2L | 962510 ns | 961345 ns | 1.00 |
| array/broadcast | 20728 ns | 20364 ns | 1.02 |
| array/construct | 1274.7 ns | 1262.1 ns | 1.01 |
| array/copy | 18167 ns | 17799 ns | 1.02 |
| array/copyto!/cpu_to_gpu | 216068 ns | 213023 ns | 1.01 |
| array/copyto!/gpu_to_cpu | 283881 ns | 281078 ns | 1.01 |
| array/copyto!/gpu_to_gpu | 10895 ns | 10549.833333333332 ns | 1.03 |
| array/iteration/findall/bool | 135031 ns | 133910 ns | 1.01 |
| array/iteration/findall/int | 148800 ns | 148076 ns | 1.00 |
| array/iteration/findfirst/bool | 81943 ns | 80909 ns | 1.01 |
| array/iteration/findfirst/int | 83874 ns | 82938 ns | 1.01 |
| array/iteration/findmin/1d | 85714 ns | 82262 ns | 1.04 |
| array/iteration/findmin/2d | 114989 ns | 113580 ns | 1.01 |
| array/iteration/logical | 200307.5 ns | 199223 ns | 1.01 |
| array/iteration/scalar | 66077.5 ns | 67075 ns | 0.99 |
| array/permutedims/2d | 52804.5 ns | 51959 ns | 1.02 |
| array/permutedims/3d | 53100 ns | 52150 ns | 1.02 |
| array/permutedims/4d | 51570 ns | 51448.5 ns | 1.00 |
| array/random/rand/Float32 | 13073 ns | 12962 ns | 1.01 |
| array/random/rand/Int64 | 24669 ns | 24057 ns | 1.03 |
| array/random/rand!/Float32 | 8569.5 ns | 9735.5 ns | 0.88 |
| array/random/rand!/Int64 | 21401 ns | 21218 ns | 1.01 |
| array/random/randn/Float32 | 37534.5 ns | 43055 ns | 0.87 |
| array/random/randn!/Float32 | 30935 ns | 28025 ns | 1.10 |
| array/reductions/mapreduce/Float32/1d | 35251 ns | 33732 ns | 1.05 |
| array/reductions/mapreduce/Float32/dims=1 | 39956 ns | 49005 ns | 0.82 |
| array/reductions/mapreduce/Float32/dims=1L | 51342 ns | 51002 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=2 | 56407 ns | 57573.5 ns | 0.98 |
| array/reductions/mapreduce/Float32/dims=2L | 69581 ns | 66928.5 ns | 1.04 |
| array/reductions/mapreduce/Int64/1d | 43074 ns | 42242 ns | 1.02 |
| array/reductions/mapreduce/Int64/dims=1 | 44013 ns | 48347 ns | 0.91 |
| array/reductions/mapreduce/Int64/dims=1L | 87302 ns | 86835 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=2 | 59675 ns | 60601 ns | 0.98 |
| array/reductions/mapreduce/Int64/dims=2L | 84777 ns | 83761 ns | 1.01 |
| array/reductions/reduce/Float32/1d | 35039 ns | 34140.5 ns | 1.03 |
| array/reductions/reduce/Float32/dims=1 | 40196 ns | 39851 ns | 1.01 |
| array/reductions/reduce/Float32/dims=1L | 51440 ns | 51248 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 56612 ns | 58473 ns | 0.97 |
| array/reductions/reduce/Float32/dims=2L | 70037 ns | 67693 ns | 1.03 |
| array/reductions/reduce/Int64/1d | 42966 ns | 42152 ns | 1.02 |
| array/reductions/reduce/Int64/dims=1 | 42138.5 ns | 43607.5 ns | 0.97 |
| array/reductions/reduce/Int64/dims=1L | 87276 ns | 86832 ns | 1.01 |
| array/reductions/reduce/Int64/dims=2 | 59434 ns | 60330 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2L | 84738 ns | 83394 ns | 1.02 |
| array/reverse/1d | 18008.5 ns | 17656 ns | 1.02 |
| array/reverse/1dL | 68718 ns | 68244 ns | 1.01 |
| array/reverse/1dL_inplace | 65781 ns | 65690 ns | 1.00 |
| array/reverse/1d_inplace | 10294.666666666666 ns | 8405.333333333334 ns | 1.22 |
| array/reverse/2d | 20925 ns | 20682 ns | 1.01 |
| array/reverse/2dL | 73178 ns | 72823 ns | 1.00 |
| array/reverse/2dL_inplace | 65875 ns | 65656 ns | 1.00 |
| array/reverse/2d_inplace | 10734 ns | 9850 ns | 1.09 |
| array/sorting/1d | 2735825 ns | 2735008 ns | 1.00 |
| array/sorting/2d | 1076226 ns | 1068540 ns | 1.01 |
| array/sorting/by | 3327795 ns | 3304170 ns | 1.01 |
| cuda/synchronization/context/auto | 1183.8 ns | 1131.2 ns | 1.05 |
| cuda/synchronization/context/blocking | 957.2058823529412 ns | 882.3673469387755 ns | 1.08 |
| cuda/synchronization/context/nonblocking | 7488.299999999999 ns | 8698.3 ns | 0.86 |
| cuda/synchronization/stream/auto | 1031.3 ns | 994.6 ns | 1.04 |
| cuda/synchronization/stream/blocking | 839.1375 ns | 825.1954022988506 ns | 1.02 |
| cuda/synchronization/stream/nonblocking | 8112.8 ns | 7316.4 ns | 1.11 |
| integration/byval/reference | 143931 ns | 143767 ns | 1.00 |
| integration/byval/slices=1 | 146072.5 ns | 145850 ns | 1.00 |
| integration/byval/slices=2 | 284794 ns | 284678 ns | 1.00 |
| integration/byval/slices=3 | 423418 ns | 423481 ns | 1.00 |
| integration/cudadevrt | 102495 ns | 102411.5 ns | 1.00 |
| integration/volumerhs | 23430349 ns | 23457202 ns | 1.00 |
| kernel/indexing | 13301 ns | 13137 ns | 1.01 |
| kernel/indexing_checked | 13955 ns | 13819 ns | 1.01 |
| kernel/launch | 2313.4444444444443 ns | 2083.777777777778 ns | 1.11 |
| kernel/occupancy | 745.082191780822 ns | 668.00625 ns | 1.12 |
| kernel/rand | 14310 ns | 14198 ns | 1.01 |
| latency/import | 3836824344.5 ns | 3845206986.5 ns | 1.00 |
| latency/precompile | 4634396037 ns | 4633948752 ns | 1.00 |
| latency/ttfp | 4409220668.5 ns | 4453935633 ns | 0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
kshyatt
reviewed
Apr 21, 2026
rainerrodrigues
commented
Apr 22, 2026
Contributor
Author
@kshyatt Hi, can you check if this is suitable and extensive enough for testing?
f08a059 to b48050e
Member
Same as #3119, you seem to have many unrelated changes in here that cause CI failures.
maleadt
reviewed
May 18, 2026
    # Julia's `sparse()` constructor and SciPy/CuPy. For Bool we OR instead of sum,
    # also matching `sparse()`, since Bool + Bool doesn't stay Bool.
    sum_duplicate(a, b) = a + b
    sum_duplicate(a::Bool, b::Bool) = a | b
Member
More unrelated stuff... Please rebase that out.
This PR adds the missing `Base.similar` methods for `CuSparseMatrixCOO` and `CuSparseMatrixBSR`, allowing them to fall back gracefully without converting to dense CPU arrays.
Fixes #3061
Fixes #3055
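
For context, here is a minimal CPU-only sketch of what a structure-preserving `similar` could look like for a COO-style container. It is not the code in this PR and does not use CUDA.jl: `ToyCOO` and its field names are hypothetical stand-ins chosen for illustration, and the sketch assumes `similar` should copy the index vectors and leave the stored values uninitialized, as `SparseArrays` does for `SparseMatrixCSC`.

```julia
# Hypothetical CPU-only stand-in for a COO sparse matrix (not CUDA.jl's type).
struct ToyCOO{Tv} <: AbstractMatrix{Tv}
    rowInd::Vector{Int}      # row index of each stored entry
    colInd::Vector{Int}      # column index of each stored entry
    nzVal::Vector{Tv}        # stored values
    dims::NTuple{2,Int}      # (rows, cols)
end

Base.size(A::ToyCOO) = A.dims

# Scalar lookup so the toy type satisfies the AbstractMatrix interface.
# Entries stored at the same (i, j) are summed.
function Base.getindex(A::ToyCOO{Tv}, i::Int, j::Int) where {Tv}
    @boundscheck checkbounds(A, i, j)
    acc = zero(Tv)
    for k in eachindex(A.nzVal)
        if A.rowInd[k] == i && A.colInd[k] == j
            acc += A.nzVal[k]
        end
    end
    return acc
end

# Same sparsity structure, same element type, uninitialized values.
Base.similar(A::ToyCOO) =
    ToyCOO(copy(A.rowInd), copy(A.colInd), similar(A.nzVal), A.dims)

# Same sparsity structure, new element type, uninitialized values.
Base.similar(A::ToyCOO, ::Type{T}) where {T} =
    ToyCOO(copy(A.rowInd), copy(A.colInd), similar(A.nzVal, T), A.dims)

A = ToyCOO([1, 2, 3], [1, 3, 2], [1.0, 2.0, 3.0], (3, 3))
B = similar(A)            # still a ToyCOO{Float64}; index vectors are copies
C = similar(A, Float32)   # ToyCOO{Float32} with the same index pattern
```

Because both methods are defined on the sparse type itself, dispatch picks them over the generic `AbstractArray` definitions of `similar`, so the result stays in the sparse format. Tests for the real GPU types could check the same properties: result type, element type, size, and that the index arrays are not aliased with the input.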