Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #767 +/- ##
==========================================
- Coverage 83.09% 80.70% -2.39%
==========================================
Files 62 61 -1
Lines 2851 2846 -5
==========================================
- Hits 2369 2297 -72
- Misses 482 549 +67
Metal Benchmarks
| Benchmark suite | Current: d1047e1 | Previous: b94fd4b | Ratio |
|---|---|---|---|
| array/accumulate/Float32/1d | 1119417 ns | 1098958 ns | 1.02 |
| array/accumulate/Float32/dims=1 | 1563917 ns | 1554708 ns | 1.01 |
| array/accumulate/Float32/dims=1L | 9845771 ns | 9848583.5 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 1877792 ns | 1886771 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 7236750 ns | 7256459 ns | 1.00 |
| array/accumulate/Int64/1d | 1245896 ns | 1261958 ns | 0.99 |
| array/accumulate/Int64/dims=1 | 1841479 ns | 1824291.5 ns | 1.01 |
| array/accumulate/Int64/dims=1L | 11601292 ns | 11664208.5 ns | 0.99 |
| array/accumulate/Int64/dims=2 | 2165917 ns | 2170333.5 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 9751208 ns | 10120062.5 ns | 0.96 |
| array/broadcast | 608708 ns | 605916 ns | 1.00 |
| array/construct | 6334 ns | 6292 ns | 1.01 |
| array/permutedims/2d | 1170333 ns | 1168125 ns | 1.00 |
| array/permutedims/3d | 1677687 ns | 1673084 ns | 1.00 |
| array/permutedims/4d | 2388812.5 ns | 2365959 ns | 1.01 |
| array/private/copy | 565958 ns | 545792 ns | 1.04 |
| array/private/copyto!/cpu_to_gpu | 809375 ns | 802916 ns | 1.01 |
| array/private/copyto!/gpu_to_cpu | 811416 ns | 801917 ns | 1.01 |
| array/private/copyto!/gpu_to_gpu | 636333 ns | 634458 ns | 1.00 |
| array/private/iteration/findall/bool | 1413167 ns | 1402750 ns | 1.01 |
| array/private/iteration/findall/int | 1561625 ns | 1564021 ns | 1.00 |
| array/private/iteration/findfirst/bool | 2040000 ns | 2055916 ns | 0.99 |
| array/private/iteration/findfirst/int | 2066708 ns | 2064479.5 ns | 1.00 |
| array/private/iteration/findmin/1d | 2491959 ns | 2499959 ns | 1.00 |
| array/private/iteration/findmin/2d | 1775917 ns | 1790791 ns | 0.99 |
| array/private/iteration/logical | 2656688 ns | 2631896 ns | 1.01 |
| array/private/iteration/scalar | 4529875 ns | 5047625 ns | 0.90 |
| array/random/rand/Float32 | 1164792 ns | 582958 ns | 2.00 |
| array/random/rand/Int64 | 1326750 ns | 775667 ns | 1.71 |
| array/random/rand!/Float32 | 924250 ns | 574750 ns | 1.61 |
| array/random/rand!/Int64 | 874750 ns | 550792 ns | 1.59 |
| array/random/randn/Float32 | 1068458.5 ns | 1006937.5 ns | 1.06 |
| array/random/randn!/Float32 | 820146 ns | 755666 ns | 1.09 |
| array/reductions/mapreduce/Float32/1d | 1033375 ns | 1029500 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1 | 835917 ns | 840875 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1L | 1618958.5 ns | 1324000 ns | 1.22 |
| array/reductions/mapreduce/Float32/dims=2 | 805354 ns | 860875 ns | 0.94 |
| array/reductions/mapreduce/Float32/dims=2L | 1818229.5 ns | 1799541 ns | 1.01 |
| array/reductions/mapreduce/Int64/1d | 1552833 ns | 1374875 ns | 1.13 |
| array/reductions/mapreduce/Int64/dims=1 | 1105583 ns | 1097625 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=1L | 2027500 ns | 2002854 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=2 | 1177312 ns | 1145000 ns | 1.03 |
| array/reductions/mapreduce/Int64/dims=2L | 3619958 ns | 3614000 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 1028125 ns | 1028437.5 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1 | 831125 ns | 832667 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 1316542 ns | 1318416.5 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 853563 ns | 853041.5 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 1814042 ns | 1810250 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 1496937.5 ns | 1516958 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1 | 1116833 ns | 1095375 ns | 1.02 |
| array/reductions/reduce/Int64/dims=1L | 2011937.5 ns | 2023499.5 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2 | 1157459 ns | 1240750 ns | 0.93 |
| array/reductions/reduce/Int64/dims=2L | 4242729 ns | 4233875 ns | 1.00 |
| array/shared/copy | 242125 ns | 252417 ns | 0.96 |
| array/shared/copyto!/cpu_to_gpu | 81750 ns | 80750 ns | 1.01 |
| array/shared/copyto!/gpu_to_cpu | 81709 ns | 80667 ns | 1.01 |
| array/shared/copyto!/gpu_to_gpu | 82584 ns | 83083 ns | 0.99 |
| array/shared/iteration/findall/bool | 1421500 ns | 1427208.5 ns | 1.00 |
| array/shared/iteration/findall/int | 1558458 ns | 1559875 ns | 1.00 |
| array/shared/iteration/findfirst/bool | 1622000 ns | 1649000 ns | 0.98 |
| array/shared/iteration/findfirst/int | 1635500 ns | 1672458 ns | 0.98 |
| array/shared/iteration/findmin/1d | 2093458 ns | 2115583 ns | 0.99 |
| array/shared/iteration/findmin/2d | 1783604.5 ns | 1792625 ns | 0.99 |
| array/shared/iteration/logical | 2427709 ns | 2292167 ns | 1.06 |
| array/shared/iteration/scalar | 201250 ns | 199958 ns | 1.01 |
| integration/byval/reference | 1582500 ns | 1544250 ns | 1.02 |
| integration/byval/slices=1 | 1588625 ns | 1560229.5 ns | 1.02 |
| integration/byval/slices=2 | 2614875 ns | 2598333.5 ns | 1.01 |
| integration/byval/slices=3 | 7820666.5 ns | 8092333 ns | 0.97 |
| integration/metaldevrt | 878645.5 ns | 868125 ns | 1.01 |
| kernel/indexing | 621958 ns | 592667 ns | 1.05 |
| kernel/indexing_checked | 628125 ns | 598292 ns | 1.05 |
| kernel/launch | 11584 ns | 11791.5 ns | 0.98 |
| kernel/rand | 567917 ns | 570709 ns | 1.00 |
| latency/import | 1417687042 ns | 1425597062.5 ns | 0.99 |
| latency/precompile | 25459122750 ns | 25453724708 ns | 1.00 |
| latency/ttfp | 2335347354.5 ns | 2341177208 ns | 1.00 |
| metal/synchronization/context | 19792 ns | 19667 ns | 1.01 |
| metal/synchronization/stream | 18833 ns | 18459 ns | 1.02 |
This comment was automatically generated by workflow using github-action-benchmark.
Adapts to JuliaGPU/GPUArrays.jl#707. The new RNG is generally faster than the MPS one (and much faster than the native one), so switch over for all operations.
Analysis by 🤖 below.
Performance comparison
Apple M-series GPU, Metal.@sync timing, minimum of 30 samples per call. "ratio" is MPS / GPUArrays; values below 1.0 mean MPS is faster.
Uniform rand!
Normal randn!
Summary
Looking at the 16M-element results (where launch overhead is negligible):
small signed integer types (Int8, Int16) where MPS is ~1.3–1.5× faster — and
even there the absolute gap is sub-millisecond.
randn! (1.08–1.50×) on top of fixing the NaN bug.
KernelRNG is consistently 2–4× slower than the GPUArrays RNG.
Routing all in-place / out-of-place calls through GPUArrays simplifies the code (no per-type dispatch table) and makes the Metal randoms path consistent with CUDA.jl. MPS stays available behind Metal.mps_rng() for users who want it.
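The timing methodology described under "Performance comparison" (minimum over 30 samples per call, ratio defined as MPS / GPUArrays) could be sketched roughly as follows. This is a minimal, CPU-only illustration and not the script used for the PR: `min_time` and `speed_ratio` are hypothetical helper names, and the timed closures are stand-ins. On a GPU, each call would presumably need to be wrapped in a synchronizing macro such as Metal.@sync so that the kernel has finished before the timer stops.

```julia
# Hypothetical helper (not part of Metal.jl or GPUArrays.jl):
# run `f` `samples` times and keep the fastest wall-clock time in seconds.
function min_time(f; samples::Int = 30)
    best = Inf
    for _ in 1:samples
        t = @elapsed f()      # on a GPU, `f` should synchronize before returning
        best = min(best, t)
    end
    return best
end

# "ratio" as defined in the description: MPS time divided by GPUArrays time.
# Values below 1.0 mean the first (MPS) timing is faster.
speed_ratio(t_mps, t_gpuarrays) = t_mps / t_gpuarrays

# CPU stand-ins so the sketch runs without a GPU; they only exercise the helpers.
t_a = min_time(() -> sum(rand(Float32, 1024)))
t_b = min_time(() -> sum(rand(Float32, 1024)))
println("ratio = ", round(speed_ratio(t_a, t_b); digits = 2))
```

Taking the minimum rather than the mean is a common choice for micro-benchmarks because it is less sensitive to one-off scheduling noise; whether the original analysis did anything beyond that is not stated in the description.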