Open
Commits
527 commits
d5c59a4
removed some debug messages
03szust Jun 10, 2025
24a7bb0
more testing
03szust Jun 10, 2025
1131500
more debug
03szust Jun 10, 2025
fda8456
swapped A and B in test
03szust Jun 10, 2025
e1b5775
swapped back
03szust Jun 10, 2025
df626e3
transpose L
03szust Jun 10, 2025
0d60090
typo
03szust Jun 10, 2025
142acdc
changed conj.T to the trans param
03szust Jun 10, 2025
bfbdcf1
actually implement frist trsm
03szust Jun 10, 2025
e80022c
changed all trsm's
03szust Jun 10, 2025
2a82910
after the previous version faailed, this is the second attempt
03szust Jun 10, 2025
df87182
removed side from arrow because of dim mismatch, added it to other ar…
03szust Jun 10, 2025
db39806
implemented trsm side right for all non arrow solves
03szust Jun 10, 2025
91b6d9b
imported cupy gemm to local
03szust Jun 10, 2025
d8c1d7a
added error to test
03szust Jun 10, 2025
8730201
fixed error
03szust Jun 10, 2025
e894146
implemented one provsionary gemm in pobtaf
03szust Jun 10, 2025
d2c3cd4
removed error for testing
03szust Jun 10, 2025
d251adb
fixed validating array if no array was present to begin with
03szust Jun 10, 2025
58429da
fixed c1 one not existing if c was none
03szust Jun 10, 2025
af86da8
changed gemm to accomodate in place operations
03szust Jun 10, 2025
5759621
changed first gemm to trans_b = c
03szust Jun 10, 2025
f3fb2b5
fixed different trans name
03szust Jun 10, 2025
a48f568
used alpha param on first gemm
03szust Jun 10, 2025
69da4ae
removed alpha and beta hardcoding
03szust Jun 10, 2025
4e6a81a
changed to minus
03szust Jun 10, 2025
f148ab3
inserted some debug messages
03szust Jun 10, 2025
c3de746
reverted minus
03szust Jun 10, 2025
dac0489
exposed alpha, beta and c for host gemm
03szust Jun 10, 2025
cada2cb
convert alpha to complex for cgemm and zgemm host
03szust Jun 10, 2025
13e2b79
inser dytpe debug
03szust Jun 10, 2025
9d85d22
changed type debug
03szust Jun 10, 2025
ce4f0bc
changed type debug again
03szust Jun 10, 2025
fc12dc1
swapped order in function call
03szust Jun 10, 2025
7887bb0
removed debug
03szust Jun 10, 2025
1ae08d4
fully use gemm at first location
03szust Jun 10, 2025
cfd7bd7
changed check for existing c
03szust Jun 10, 2025
993c17f
fixed c not being able to be true
03szust Jun 10, 2025
c3aa505
fixed c again
03szust Jun 10, 2025
a38c39d
further c fix
03szust Jun 10, 2025
a873567
second gemm
03szust Jun 10, 2025
73eeabe
removed square matrix check in gemm that was leftover from trsm
03szust Jun 10, 2025
899d7c9
changed input validation
03szust Jun 10, 2025
e02b484
third gemm
03szust Jun 10, 2025
b3b8c22
full normal pobtaf gemm implemented
03szust Jun 10, 2025
0bf8f58
gemm in permuted pobtaf
03szust Jun 10, 2025
50312bb
rollback to just one gemm
03szust Jun 10, 2025
2cf395a
removed leftover conj t
03szust Jun 10, 2025
3b3b244
rollback to 1 gemm in permuted
03szust Jun 10, 2025
88b3a4c
next gemm in permuted
03szust Jun 10, 2025
83f7247
another gemm
03szust Jun 10, 2025
7cae093
next gemm
03szust Jun 10, 2025
3f889e1
smaller gemm
03szust Jun 10, 2025
3c85f7b
last permuted gemm
03szust Jun 10, 2025
ad936cb
first gemm in streaming
03szust Jun 10, 2025
9c4f496
second gemm streaming
03szust Jun 10, 2025
e0e5bdd
third gemm streaming
03szust Jun 10, 2025
4bb8e6e
two permuted streaming gemms
03szust Jun 10, 2025
480d982
implemented gemms for permuted streaming
03szust Jun 10, 2025
bfadd08
implemented a form of syrk/herk and added a error for testing
03szust Jun 11, 2025
4e6bbde
added another print for debug
03szust Jun 11, 2025
f8ccb36
added another print
03szust Jun 11, 2025
256e3d0
implemented syherk. sadly it's not yet useful
03szust Jun 11, 2025
9d8ea28
removed debug prints
03szust Jun 11, 2025
5885338
added test error
03szust Jun 11, 2025
1436b7f
fixed typo
03szust Jun 11, 2025
b8a5a25
moved test
03szust Jun 11, 2025
2bf27a4
attempt for using syherk
03szust Jun 11, 2025
e62d9ff
missing parenthesis
03szust Jun 11, 2025
bc52ddc
changed input for _syherk
03szust Jun 11, 2025
cf3f722
removed iteration error
03szust Jun 11, 2025
8fcdf5f
changed implementation of syherk to not use cherk and zherk if not av…
03szust Jun 12, 2025
4aedc59
fixed gemm call in syherk
03szust Jun 12, 2025
fd73f46
added debug print
03szust Jun 12, 2025
317a7cf
more debug print
03szust Jun 12, 2025
4308160
changed call to not include c1
03szust Jun 12, 2025
d9e1798
added print
03szust Jun 12, 2025
d8e2557
more print
03szust Jun 12, 2025
2e03d5f
added different prints
03szust Jun 12, 2025
469eb67
more debug prints
03szust Jun 12, 2025
0ed6382
new debug
03szust Jun 12, 2025
11c257e
further print
03szust Jun 12, 2025
83b54e1
removed some prints
03szust Jun 12, 2025
af61f32
more print
03szust Jun 12, 2025
2a80293
changed print
03szust Jun 12, 2025
e0a54b4
changed prints again
03szust Jun 12, 2025
154d8c7
reverted some prints
03szust Jun 12, 2025
2759663
forcing out=none for noncomplex device syrk
03szust Jun 12, 2025
5e27f76
changed syher behavior in pobtas for testing
03szust Jun 12, 2025
f05a714
further change
03szust Jun 12, 2025
2238026
reverted pobtaf changes
03szust Jun 12, 2025
efcfea0
changes to test syherk
03szust Jun 12, 2025
01730d5
more debug changes
03szust Jun 12, 2025
7a5abe2
removed forced out
03szust Jun 12, 2025
c3d9355
reverted to gemm for testing
03szust Jun 12, 2025
3c1715a
removed raised error
03szust Jun 12, 2025
49ec746
changed back to syherk
03szust Jun 12, 2025
c9fdcd0
more prints
03szust Jun 12, 2025
7958743
reverted to gemm
03szust Jun 12, 2025
6d4d817
trying gemm again
03szust Jun 12, 2025
705de55
gemm works, returning to syrk
03szust Jun 12, 2025
9803af4
Merge branch 'integrate_missing_streaming' of https://github.com/vinc…
03szust Jun 12, 2025
69811e2
added print
03szust Jun 12, 2025
c483ccc
changed print
03szust Jun 12, 2025
a32f0b1
fixed print
03szust Jun 12, 2025
1ae99f9
sanity check
03szust Jun 12, 2025
b1b5838
more sanity
03szust Jun 12, 2025
15deadf
sanity 1
03szust Jun 12, 2025
8ff51d0
sanity 2
03szust Jun 12, 2025
ab106df
sanity 3
03szust Jun 12, 2025
5435a22
fixed parenthesis
03szust Jun 12, 2025
870c1f9
changed lower to upper
03szust Jun 12, 2025
f79c21a
swap lower on device to match cholesky. hopefully temporary
03szust Jun 12, 2025
55548d0
random commit
03szust Jun 12, 2025
030124f
updated 2. syherk
03szust Jun 12, 2025
7e66166
third syherk
03szust Jun 12, 2025
8cbda4b
removed messages
03szust Jun 12, 2025
cd5dd35
exposed lower param on cholesky for cupy
03szust Jun 12, 2025
9395fc8
switched lower
03szust Jun 12, 2025
9d64ee9
removed lower param in pobtaf from cholesky
03szust Jun 12, 2025
5fc8601
added lower param again
03szust Jun 12, 2025
d73e221
transposing arrow tip block
03szust Jun 12, 2025
862cd3b
trans T
03szust Jun 12, 2025
41899f7
added one new cu_chol
03szust Jun 12, 2025
c592343
added cu_chol to all syherk
03szust Jun 12, 2025
9fa975d
added lower to second cholesky
03szust Jun 12, 2025
87f23ee
further cholesky lower
03szust Jun 12, 2025
1f90c2e
first permuted syherk
03szust Jun 12, 2025
1e9eefe
2 syherk permuted
03szust Jun 12, 2025
9219872
syherk 3 oermuted
03szust Jun 12, 2025
9e802a5
syherk streaming 1
03szust Jun 12, 2025
9e09f8b
syherk streaming 2
03szust Jun 12, 2025
04bb683
syherk streaming 3
03szust Jun 12, 2025
3b78056
2 permuted streaming syherk
03szust Jun 12, 2025
4a9c3ff
all syherk done
03szust Jun 12, 2025
30cfd09
test if L can be ommited from factorize last block
03szust Jun 16, 2025
bec3cba
reverted the L's for now
03szust Jun 16, 2025
1da2396
inserted print test to check A and L after cholesky
03szust Jun 16, 2025
cfb2cac
moved the print test to see ifsomething changed
03szust Jun 16, 2025
413aec5
switcht break condition
03szust Jun 16, 2025
7df1dc1
switched L diag block to A diag block to see if it works
03szust Jun 16, 2025
97e057a
trying to switch a few more L's
03szust Jun 16, 2025
01d0caa
removed all the L's
03szust Jun 16, 2025
9120fff
actually removed all the L's
03szust Jun 16, 2025
2e0ca8b
reverted the L's back in
03szust Jun 16, 2025
cb4b4d5
trying to put chol in place
03szust Jun 16, 2025
940407a
reverted cholesky_lowerfill because it changed nothing
03szust Jun 16, 2025
5458591
switched override b to true in trsm
03szust Jun 16, 2025
144cf8a
added nvtx for testing
03szust Jun 16, 2025
0315503
more nvtx
03szust Jun 16, 2025
5743a68
removed nvtx
03szust Jun 16, 2025
a2d821c
added streaming tests to pobtaf
03szust Jun 17, 2025
9d300b9
check if streaming is happening
03szust Jun 17, 2025
d722ad8
added more testing code
03szust Jun 17, 2025
4d7043d
permuted pobtasi streaming
03szust Jun 17, 2025
feaa90e
removed improvements from pobtaf perm
03szust Jun 17, 2025
6da2b8a
check first imp in perm pobtaf
03szust Jun 17, 2025
b7a541c
second imp
03szust Jun 17, 2025
f1f99db
third imp test
03szust Jun 17, 2025
e229ec1
4 imp test
03szust Jun 17, 2025
963eb9f
5 imp test
03szust Jun 17, 2025
66547e1
6 imp test
03szust Jun 17, 2025
f6ac683
7 imp test
03szust Jun 17, 2025
1dbc0d9
8 imp test
03szust Jun 17, 2025
537488b
9 imp test
03szust Jun 17, 2025
e1e5583
check full
03szust Jun 17, 2025
8efe2da
removed all improvemnts from pobtaf permuted for sanity, check now so…
03szust Jun 17, 2025
dea6e70
2 imp add
03szust Jun 17, 2025
00f2e31
3 imp add
03szust Jun 17, 2025
4bce6bc
4 imp add
03szust Jun 17, 2025
d414da3
fix attempt first error
03szust Jun 17, 2025
3a1ad6e
print added to check for error
03szust Jun 17, 2025
e0d7bf7
added test for error
03szust Jun 17, 2025
8371352
switched lower
03szust Jun 17, 2025
e54697d
added print
03szust Jun 17, 2025
3738bf7
reverted to nonimproved to see actual sol
03szust Jun 17, 2025
b3a7e08
added break
03szust Jun 17, 2025
7817338
added lower to chol
03szust Jun 17, 2025
95e34aa
trying syherk again
03szust Jun 17, 2025
709335e
print chol sol as well
03szust Jun 17, 2025
a568fa2
removed syherk
03szust Jun 17, 2025
daccf3f
moved error
03szust Jun 17, 2025
befa65b
added syherk to test for test behaviour
03szust Jun 17, 2025
252abef
added prints in test
03szust Jun 17, 2025
fc37075
replaced syherk with gemm in permuted streaming because of a propagat…
03szust Jun 17, 2025
e16d803
added all syher withc cu_chol false
03szust Jun 17, 2025
f901ee2
check if diagonal blocks are similar
03szust Jun 17, 2025
f0eacf0
removed intermittend assert
03szust Jun 17, 2025
f8a09c1
changed one syherk to gemm
03szust Jun 17, 2025
c5ae3ef
added comment explaining weird behaviour
03szust Jun 17, 2025
f23c53a
Merge remote-tracking branch 'origin/integrate_missing_streaming' int…
03szust Jun 17, 2025
0ca95d6
fix for syherk issue in normal streaming
03szust Jun 17, 2025
83d12f9
added on pobtas trsm and trying to fix pobtaf streaming
03szust Jun 18, 2025
b88fb75
fixing the fix
03szust Jun 18, 2025
2ecb420
further syherk fix
03szust Jun 18, 2025
c5a1993
more fixing
03szust Jun 18, 2025
7982cf8
fixed keywords
03szust Jun 18, 2025
e349e21
next syherk fix
03szust Jun 18, 2025
c175b8e
third syherk fix
03szust Jun 18, 2025
33cbb9a
replaced with working streaming code
03szust Jun 18, 2025
118745e
reset streaming code to original
03szust Jun 18, 2025
591aca6
added streaming improvements back in
03szust Jun 18, 2025
d1b3570
reverted to start of day to see if issue is local
03szust Jun 18, 2025
a76e378
check if issue with permuted streaming was local
03szust Jun 18, 2025
39b15c9
further test
03szust Jun 18, 2025
eac2822
issue wasn't local, reverted to gemm
03szust Jun 18, 2025
1c88f06
first two improvemnts in pobtas
03szust Jun 18, 2025
adef207
fixed missing comma
03szust Jun 18, 2025
39c898f
removed gemm improvemnt
03szust Jun 18, 2025
39ca52b
added missing minus
03szust Jun 18, 2025
6f5ed81
added more trsm
03szust Jun 18, 2025
e7df35a
changed all solve trinagular to trsm
03szust Jun 18, 2025
6b072bd
added first gemm
03szust Jun 18, 2025
e5da93f
fixed keyword
03szust Jun 18, 2025
d9539b1
switched a and b
03szust Jun 18, 2025
c90226b
reverted switch
03szust Jun 18, 2025
de91147
fixed gemm validation
03szust Jun 18, 2025
5db201e
error message to show which shape check is wrong
03szust Jun 18, 2025
17490b2
swapped shapes
03szust Jun 18, 2025
f210506
fixing shape check
03szust Jun 18, 2025
19a2bfb
next gemm
03szust Jun 18, 2025
f2ca36c
removed minus
03szust Jun 18, 2025
390022e
another gemm
03szust Jun 18, 2025
f12632f
gemm in trsm
03szust Jun 18, 2025
138b599
attempt at disambiguating last solve in normal pobtas
03szust Jun 18, 2025
a6b11e2
cleaning up mess
03szust Jun 18, 2025
7f1418e
added some gemms
03szust Jun 18, 2025
bdc9f63
disambiguate big solve block
03szust Jun 18, 2025
f6c29c4
reformattign and adding first gemm to streaming
03szust Jun 18, 2025
1e7dae8
reformatting and adding more gemm
03szust Jun 18, 2025
67d93bd
added even more gemm and reformatting
03szust Jun 18, 2025
7ce90bf
diambiguate and add gemm
03szust Jun 18, 2025
b331e0b
added trsm to pobtasi
03szust Jun 18, 2025
5360379
fixed pobtasi trsm
03szust Jun 18, 2025
7429907
added print and error to see how multiplication works
03szust Jun 18, 2025
a927d9d
removed tests
03szust Jun 18, 2025
e881574
check if gemm can be applied in pobtasi
03szust Jun 18, 2025
5d7800e
removed gemm from pobtasi
03szust Jun 18, 2025
89b0e07
removed print at end of pobtas test
03szust Jun 18, 2025
885a8d6
removed all leftover prints
03szust Jun 18, 2025
293922b
pobtf perf improvements
03szust Jun 18, 2025
a27dca9
improvedd pobtf
03szust Jun 18, 2025
4a2c175
pobts normal improved
03szust Jun 18, 2025
80ccc4c
pobts permuted improved
03szust Jun 18, 2025
a005d1c
improved pobts
03szust Jun 18, 2025
9936df6
improved pobtsi
03szust Jun 18, 2025
9b5e955
added missing import
03szust Jun 18, 2025
9a17bf1
added copyright
03szust Jun 18, 2025
2406d59
removed unneeded matmul
03szust Jun 18, 2025
7d50b35
removed another matmul
03szust Jun 18, 2025
8 changes: 4 additions & 4 deletions src/serinv/__init__.py
@@ -25,7 +25,7 @@

 # In the case of CuPy, we want to use the lowerfill version
 # tweaked in serinv. (More performances)
-from serinv.cupyfix.cholesky_lowerfill import cholesky_lowerfill as cu_cholesky
+from serinv.cupyfix.cholesky_lowerfill import cholesky as cu_cholesky

 # Check if cupy is actually working. This could still raise
 # a cudaErrorInsufficientDriver error or something.
@@ -163,7 +163,7 @@ def _use_nccl(comm):
     return False


-def _get_nccl_parameters(arr, comm, op: str):
+def _get_nccl_parameters(arr, comm, rank, op: str):
     """Get the NCCL parameters for the given operation."""
     if np.iscomplexobj(arr):
         factor = 2
@@ -172,8 +172,8 @@ def _get_nccl_parameters(arr, comm, op: str):

     if backend_flags["nccl_avail"]:
         if op == "allgather":
-            count = (arr.size // comm.size) * factor
-            displacement = count * comm.rank * arr.dtype.itemsize
+            count = (arr.size // comm.size()) * factor
+            displacement = count * rank * (arr.dtype.itemsize // factor)
         elif op == "allreduce":
             count = arr.size * factor
             displacement = 0
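
The displacement change in the last hunk can be checked with a small standalone calculation. This is only a sketch: `allgather_layout` is a hypothetical helper, not part of serinv, and it assumes (as the diff suggests) that `count` is measured in real scalars, so a complex value counts as two, and that `displacement` is a byte offset into the buffer.

```python
def allgather_layout(size, itemsize, is_complex, world_size, rank):
    """Return (count, displacement_bytes) for one rank's slice.

    Hypothetical helper mirroring the arithmetic in the diff; the
    arguments stand in for arr.size, arr.dtype.itemsize,
    np.iscomplexobj(arr), comm.size() and the explicit rank.
    """
    # Complex values are transferred as pairs of real scalars.
    factor = 2 if is_complex else 1
    # Number of real scalars owned by each rank.
    count = (size // world_size) * factor
    # Bytes per real scalar (half the itemsize for complex dtypes).
    scalar_bytes = itemsize // factor
    # Byte offset of this rank's slice.
    displacement = count * rank * scalar_bytes
    return count, displacement


# 8 complex128 values (16 bytes each) over 4 ranks: each rank owns
# 2 complex values = 4 float64 scalars = 32 bytes.
print(allgather_layout(8, 16, True, world_size=4, rank=1))  # (4, 32)
```

Under the same assumptions, the previous expression `count * rank * itemsize` would give 64 bytes for rank 1 here, twice the actual offset, because `count` already includes the factor of 2 for complex data.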