Draft
110 commits
976bec2
Modify get_halo to work with non-balanced DNDarray
ClaudiaComito Feb 19, 2021
1d5c3b6
Merge branch 'master' into features/unique-sort-distributed
ClaudiaComito Mar 1, 2021
30b3ec3
Create lshape_map without communication if DNDarray is balanced.
ClaudiaComito Mar 5, 2021
db4ebed
in-place resplit to work in imbalanced DNDarrays as well
ClaudiaComito Mar 5, 2021
060b48a
Implement distributed unique, return inverse indices
ClaudiaComito Mar 5, 2021
1a27f45
Merge branch 'master' into features/unique-sort-distributed
ClaudiaComito Mar 10, 2021
1d9f71f
Debugging unique
ClaudiaComito Mar 10, 2021
77fb0d4
Fix error in counts, displs for unbalanced resplit_(None)
ClaudiaComito Mar 10, 2021
63c96c4
Fix error in counts, displs for unbalanced resplit_(None)
ClaudiaComito Mar 10, 2021
a6620da
Fix imbalanced resplit()
ClaudiaComito Mar 10, 2021
1fe8428
Debugging unique
ClaudiaComito Mar 10, 2021
8dd0d82
Fix incoming_offset error in sparse unique
ClaudiaComito Mar 10, 2021
c35c72a
Updated documentation, fixed some split errors.
ClaudiaComito Mar 10, 2021
ae56b86
Skip non-populated ranks in imbalanced gethalo
ClaudiaComito Mar 12, 2021
fa597a8
Merge changes to reduce_op
ClaudiaComito Mar 17, 2021
39050f0
Modify tests for imbalanced gethalo()
ClaudiaComito Mar 20, 2021
e57235a
Generalize sort() implementation into helper function _pivot_sorting …
ClaudiaComito Mar 20, 2021
20a9930
Update test_unique based on new distributed implementation
ClaudiaComito Mar 20, 2021
9290c40
Fix write-out bug in MPI ring
ClaudiaComito Mar 22, 2021
79e6219
Expand "dense unique" tests
ClaudiaComito Mar 22, 2021
8123b6a
Expand test_unique
ClaudiaComito Mar 23, 2021
19442fc
Merge branch 'master' into features/unique-sort-distributed
ClaudiaComito Mar 23, 2021
290f11d
minimize boiler-plate code in test_unique
ClaudiaComito Mar 24, 2021
f5af549
Merge branch 'master' into features/unique-sort-distributed
ClaudiaComito Mar 24, 2021
4bed18a
remove excess ###
ClaudiaComito Mar 24, 2021
2c2aa6e
Debugging
ClaudiaComito Mar 24, 2021
5a2e592
Debugging
ClaudiaComito Mar 24, 2021
3c068e7
Debugging
ClaudiaComito Mar 24, 2021
3d6a02b
Debugging
ClaudiaComito Mar 24, 2021
b4b4763
Fix empty Allgather problem
ClaudiaComito Mar 24, 2021
d8f73ea
Debugging
ClaudiaComito Mar 25, 2021
f4b7f03
Skip second local torch.unique if local tensor is empty
ClaudiaComito Mar 25, 2021
fc8cc5f
Merge branch 'master' into features/unique-sort-distributed
ClaudiaComito Mar 25, 2021
eb57b5e
Expand tests, fix split inconsistencies
ClaudiaComito Mar 28, 2021
32b8857
Fix inverse indices dtype in non-distributed case
ClaudiaComito Mar 29, 2021
a575223
Test NotImplementedError exception in distributed case only
ClaudiaComito Mar 29, 2021
2cba6cc
Fix lshape_map of local sorted uniques when nodes are empty
ClaudiaComito Mar 29, 2021
68eb57c
Set dndarray.__balanced to `balanced`, not None
ClaudiaComito Apr 7, 2021
519c020
Remove `sorted` option from ht.unique()
ClaudiaComito Apr 8, 2021
edc12ef
Merge branch 'bug/sort-balance' into features/unique-sort-distributed
ClaudiaComito Apr 11, 2021
c5de73f
Fix race condition in test_qr
ClaudiaComito Apr 11, 2021
1964cc8
Fix prev_rank/next_rank indices for imbalanced gethalo, expand tests
ClaudiaComito Apr 11, 2021
dd9b797
Update changelog
ClaudiaComito Apr 11, 2021
50876a5
Documentation update
ClaudiaComito Apr 12, 2021
906c804
Update changelog
ClaudiaComito May 3, 2021
b40193f
Merge branch 'release/1.0.x' into features/unique-sort-distributed
ClaudiaComito May 25, 2021
4163882
Merge branch 'master' into features/unique-sort-distributed
ClaudiaComito Jun 8, 2021
9f85dac
Merge branch 'master' into features/unique-sort-distributed
ClaudiaComito Jun 21, 2021
96211ed
Merge branch 'master' into features/unique-sort-distributed
ClaudiaComito Jul 6, 2021
519b875
Merge branch 'master' into features/unique-sort-distributed
ClaudiaComito Aug 20, 2021
7a6944f
Merge branch 'master' into features/unique-sort-distributed
ClaudiaComito Aug 20, 2021
eb266df
Merge branch 'master' into features/unique-sort-distributed
ClaudiaComito Aug 23, 2021
4aaf893
Merge branch 'master' into features/unique-sort-distributed
ClaudiaComito Aug 23, 2021
1ed4708
Replace explicit `counts, displs` calculation with `dndarray.counts_d…
ClaudiaComito Aug 23, 2021
f2611b6
Merge branch 'master' into features/unique-sort-distributed
ClaudiaComito Sep 10, 2021
723b37a
Address review, part I
ClaudiaComito Sep 10, 2021
f5e360c
Address review Part II of II
ClaudiaComito Sep 10, 2021
2753bfe
Reshape empty `local_sorted` to match global shape
ClaudiaComito Sep 10, 2021
b3de8c6
Merge branch 'master' into features/unique-sort-distributed
ClaudiaComito Sep 14, 2021
bfa58ba
GPU Debugging
ClaudiaComito Sep 14, 2021
68d437d
Debug devices
ClaudiaComito Sep 14, 2021
1cfd565
Debug tensor devices
ClaudiaComito Sep 14, 2021
1f6cec2
Debug tensor devices
ClaudiaComito Sep 14, 2021
f923834
Specify device for torch `_like` factories
ClaudiaComito Sep 14, 2021
34667c8
Specify device for all torch _like factories
ClaudiaComito Sep 14, 2021
e1c28c5
Devices
ClaudiaComito Sep 14, 2021
1ca6ceb
Add more missing devices
ClaudiaComito Sep 14, 2021
209d5d3
Replace np.cumsum calls with torch.cumsum
ClaudiaComito Sep 15, 2021
01281cc
Debug test_sort on GPU
ClaudiaComito Sep 15, 2021
badd041
Remove print (debugging) statements
ClaudiaComito Sep 15, 2021
369203e
Set up memory profiling
ClaudiaComito Sep 29, 2021
3bf5e03
Merge branch 'master' into features/unique-sort-distributed
ClaudiaComito Oct 5, 2021
db77a91
Merge branch 'master' into features/unique-sort-distributed
coquelin77 Oct 8, 2021
28bccf6
Merge branch 'features/unique-sort-distributed' of github.com:helmhol…
ClaudiaComito Nov 17, 2021
1e12ca3
Merge branch 'master' into features/unique-sort-distributed
ClaudiaComito Nov 17, 2021
b46924d
Improve efficiency, adopt `dndarray.lshape_map` and `dndarray.counts_…
ClaudiaComito Nov 17, 2021
e5a713c
Comment out memory_profiler import
ClaudiaComito Nov 17, 2021
690f9e0
Remove redundant split sanitation from self.comm.chunk
ClaudiaComito Nov 18, 2021
dc2672f
Do not clone obj[slices] if copy is False. Remove unnecessary split s…
ClaudiaComito Nov 18, 2021
49efd0a
Improve memory usage for sanitize_memory_layout
ClaudiaComito Nov 18, 2021
1bf8d66
Reorganize sanitation logic
ClaudiaComito Nov 18, 2021
3a91bca
Remove "sparse unique" implementation
ClaudiaComito Nov 18, 2021
10354b7
Specify factories.array(copy=False)
ClaudiaComito Nov 18, 2021
0ea2c3a
Always copy obj if specified dtype is different from original dtype
ClaudiaComito Nov 18, 2021
f809c95
Copy obj when specified dtype different from original
ClaudiaComito Nov 18, 2021
d124819
Copy output of torch.diagonal (partial view)
ClaudiaComito Nov 18, 2021
a4dd4d4
remove dead code
ClaudiaComito Nov 18, 2021
16318ef
Debugging GPU error
ClaudiaComito Nov 19, 2021
07b6fb1
Debugging test_sort on GPU
ClaudiaComito Nov 19, 2021
5cd0b8c
Make test_sort more stable incl. for GPUs
ClaudiaComito Nov 20, 2021
90b0d71
Debug GPU test_sort
ClaudiaComito Nov 22, 2021
7aa51ae
Debug test_sort on GPU
ClaudiaComito Nov 22, 2021
dbce096
Debug test_sort on GPU
ClaudiaComito Nov 22, 2021
f615ade
Debug test_sort on GPU
ClaudiaComito Nov 22, 2021
ea0aab4
Debug
ClaudiaComito Nov 22, 2021
152fc55
Debug
ClaudiaComito Nov 22, 2021
bfe38ed
Merge branch 'master' into features/unique-sort-distributed
ClaudiaComito Nov 22, 2021
3939564
Debugging
ClaudiaComito Nov 24, 2021
a27f7f8
Debugging
ClaudiaComito Nov 24, 2021
f0f7926
Do not test sorting indices on GPU if sorting non-unique values
ClaudiaComito Nov 24, 2021
73c914d
Expand test_sort to 3d
ClaudiaComito Nov 24, 2021
b972a3a
Expand test_sort for empty-node case
ClaudiaComito Nov 24, 2021
6daa7c2
Remove size-1 test in test_sort
ClaudiaComito Nov 24, 2021
96f5d9d
Update changelog
ClaudiaComito Nov 24, 2021
6a41a9c
Remove dead code
ClaudiaComito Nov 24, 2021
6007d31
Reinstate "sparse" unique
ClaudiaComito Nov 25, 2021
ec0684c
Remove unnecessary `balance_` before distributed `unique`
ClaudiaComito Nov 25, 2021
924fcd3
Merge branch 'master' into features/unique-sort-distributed
ClaudiaComito Jan 20, 2022
6d385e8
Bring back `factories.array` to original state, changes forked to ded…
ClaudiaComito Jan 20, 2022
159d1c5
Merge branch 'master' into features/unique-sort-distributed
coquelin77 Jan 31, 2022
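The commits above build a distributed, sorted `ht.unique` that returns inverse indices and tolerates ranks with empty local data. As a rough illustration only, the following single-process Python sketch simulates one simple gather-based scheme: each simulated rank reduces its chunk to sorted local uniques, the local results are concatenated (standing in for an MPI gather) and reduced again, and each rank's local inverse indices are remapped to global positions. Plain lists replace torch tensors and MPI, the function names are hypothetical, and this is not Heat's actual algorithm (the commit log points to a pivot-based sorting helper instead).

```python
from typing import List, Tuple


def local_unique(values: List[int]) -> Tuple[List[int], List[int]]:
    """Sorted unique values plus inverse indices for one chunk
    (hypothetical helper, mimicking a sorted unique with return_inverse)."""
    uniques = sorted(set(values))
    position = {v: i for i, v in enumerate(uniques)}
    return uniques, [position[v] for v in values]


def simulated_distributed_unique(
    chunks: List[List[int]],
) -> Tuple[List[int], List[List[int]]]:
    """Simulate a two-pass distributed unique over per-rank chunks.

    Pass 1: every rank reduces its own chunk to sorted local uniques;
    empty ranks skip this step and contribute nothing.
    Pass 2: the local uniques are concatenated (stand-in for a gather)
    and reduced again to the global sorted uniques.
    Finally, each rank maps its local inverse indices to global ones.
    """
    local_results = [local_unique(c) if c else ([], []) for c in chunks]
    gathered = [v for uniques, _ in local_results for v in uniques]
    global_uniques, _ = local_unique(gathered)
    global_position = {v: i for i, v in enumerate(global_uniques)}

    inverses = []
    for uniques, local_inverse in local_results:
        # local unique index -> global unique index
        remap = [global_position[v] for v in uniques]
        inverses.append([remap[i] for i in local_inverse])
    return global_uniques, inverses


chunks = [[3, 1, 3], [], [2, 1, 5, 2]]  # rank 1 holds no data
uniques, inverses = simulated_distributed_unique(chunks)
print(uniques)   # [1, 2, 3, 5]
print(inverses)  # [[2, 0, 2], [], [1, 0, 3, 1]]
```

Because the second pass only sees local uniques, the gathered volume in this sketch depends on how many distinct values each rank holds rather than on the total element count, and an empty rank simply contributes nothing.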
14 changes: 12 additions & 2 deletions CHANGELOG.md
@@ -10,10 +10,11 @@
- [#846](https://github.com/helmholtz-analytics/heat/pull/846) Fixed an issue in `_reduce_op` when axis and keepdim were set.
- [#846](https://github.com/helmholtz-analytics/heat/pull/846) Fixed an issue in `min`, `max` where DNDarrays with empty processes can't be computed.
- [#868](https://github.com/helmholtz-analytics/heat/pull/868) Fixed an issue in `__binary_op` where data was falsely distributed if a DNDarray has single element.
- [#876](https://github.com/helmholtz-analytics/heat/pull/876) Make examples work (Lasso and kNN)

## Feature Additions
### Linear Algebra
- [#842](https://github.com/helmholtz-analytics/heat/pull/842) New feature: `vdot`
- [#867](https://github.com/helmholtz-analytics/heat/pull/867) Support torch 1.9.0
- [#884](https://github.com/helmholtz-analytics/heat/pull/884) Support PyTorch 1.10.0, this is now the recommended version to use.

## Feature additions
### Communication
@@ -22,16 +23,20 @@
- [#856](https://github.com/helmholtz-analytics/heat/pull/856) New `DNDarray` method `__torch_proxy__`
- [#885](https://github.com/helmholtz-analytics/heat/pull/885) New `DNDarray` method `conj`

### Factories
- [#749](https://github.com/helmholtz-analytics/heat/pull/749) `ht.array(copy=False)` behaviour now more in line with `np.array(copy=False)`, reduced memory footprint
# Feature additions
### Linear Algebra
- [#840](https://github.com/helmholtz-analytics/heat/pull/840) New feature: `vecdot()`
- [#842](https://github.com/helmholtz-analytics/heat/pull/842) New feature: `vdot`
- [#846](https://github.com/helmholtz-analytics/heat/pull/846) New features `norm`, `vector_norm`, `matrix_norm`
- [#850](https://github.com/helmholtz-analytics/heat/pull/850) New Feature `cross`
- [#877](https://github.com/helmholtz-analytics/heat/pull/877) New feature `det`

### Logical
- [#862](https://github.com/helmholtz-analytics/heat/pull/862) New feature `signbit`
### Manipulations
- [#749](https://github.com/helmholtz-analytics/heat/pull/749) Distributed sorted `ht.unique`
- [#829](https://github.com/helmholtz-analytics/heat/pull/829) New feature: `roll`
- [#853](https://github.com/helmholtz-analytics/heat/pull/853) New Feature: `swapaxes`
- [#854](https://github.com/helmholtz-analytics/heat/pull/854) New Feature: `moveaxis`
@@ -43,6 +48,7 @@
### Rounding
- [#827](https://github.com/helmholtz-analytics/heat/pull/827) New feature: `sign`, `sgn`


# v1.1.1
- [#864](https://github.com/helmholtz-analytics/heat/pull/864) Dependencies: constrain `torchvision` version range to match supported `pytorch` version range.

@@ -104,6 +110,9 @@ Example on 2 processes:
### Linear Algebra
- [#718](https://github.com/helmholtz-analytics/heat/pull/718) New feature: `trace()`
- [#768](https://github.com/helmholtz-analytics/heat/pull/768) New feature: unary positive and negative operations

### Manipulations
- [#820](https://github.com/helmholtz-analytics/heat/pull/820) `dot` can handle matrix vector operation now
- [#820](https://github.com/helmholtz-analytics/heat/pull/820) `dot` can handle matrix-vector operation now

### Manipulations
@@ -199,6 +208,7 @@ Example on 2 processes:
### Manipulations
- [#690](https://github.com/helmholtz-analytics/heat/pull/690) Enhancement: reshape accepts shape arguments with one unknown dimension.
- [#706](https://github.com/helmholtz-analytics/heat/pull/706) Bug fix: prevent `__setitem__`, `__getitem__` from modifying key in place
- [#744](https://github.com/helmholtz-analytics/heat/pull/744) Fix split semantics for reduction operations
### Unit testing / CI
- [#717](https://github.com/helmholtz-analytics/heat/pull/717) Switch CPU CI over to Jenkins and pre-commit to GitHub action.
- [#720](https://github.com/helmholtz-analytics/heat/pull/720) Ignore test files in codecov report and allow drops in code coverage.
16 changes: 8 additions & 8 deletions heat/core/communication.py
@@ -170,21 +170,21 @@ def chunk(
Parameters
----------
shape : Tuple[int,...]
The global shape of the data to be split
The global shape of the data to be split.
split : int
The axis along which to chunk the data
The axis along which to chunk the data. Must be within the range of ``shape``.
rank : int, optional
Process for which the chunking is calculated for, defaults to ``self.rank``.
Intended for creating chunk maps without communication
Intended for creating chunk maps without communication.
w_size : int, optional
The MPI world size, defaults to ``self.size``.
Intended for creating chunk maps without communication

Intended for creating chunk maps without communication.
"""
# ensure the split axis is valid, we actually do not need it
split = sanitize_axis(shape, split)
if split is None:
return 0, shape, tuple(slice(0, end) for end in shape)
if split < 0:
split = len(shape) + split

rank = self.rank if rank is None else rank
w_size = self.size if w_size is None else w_size
if not isinstance(rank, int) or not isinstance(w_size, int):
@@ -212,7 +212,7 @@ def counts_displs_shape(
self, shape: Tuple[int], axis: int
) -> Tuple[Tuple[int], Tuple[int], Tuple[int]]:
"""
Calculates the item counts, displacements and output shape for a variable sized all-to-all MPI-call (e.g.
Calculates the item counts, displacements and output shape for a variable-sized all-to-all MPI-call (e.g.
``MPI_Alltoallv``). The passed shape is regularly chunk along the given axis and for all nodes.

Parameters
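The `chunk` docstring above notes that `rank` and `w_size` can be passed to build chunk maps without communication, and `counts_displs_shape` derives per-rank counts and displacements for variable-sized all-to-all calls such as `MPI_Alltoallv`. A minimal pure-Python sketch of the arithmetic behind both follows. It assumes that, in a regular split, the remainder of an uneven division goes to the lowest ranks; the helper names are hypothetical and only illustrate the idea, not Heat's code.

```python
from typing import List, Tuple


def chunk_1d(length: int, rank: int, w_size: int) -> Tuple[int, int]:
    """Offset and local length of `rank` when `length` items are split
    regularly over `w_size` processes (assumption: the remainder goes to
    the lowest ranks)."""
    base, remainder = divmod(length, w_size)
    local = base + 1 if rank < remainder else base
    offset = rank * base + min(rank, remainder)
    return offset, local


def regular_lshape_map(
    shape: Tuple[int, ...], split: int, w_size: int
) -> List[Tuple[int, ...]]:
    """Local shapes of all ranks, computed locally without any communication."""
    if split < 0:  # negative axis, same normalization as in chunk()
        split = len(shape) + split
    result = []
    for rank in range(w_size):
        _, local = chunk_1d(shape[split], rank, w_size)
        local_shape = list(shape)
        local_shape[split] = local
        result.append(tuple(local_shape))
    return result


def counts_displs(sizes: List[int]) -> Tuple[List[int], List[int]]:
    """Per-rank counts and displacements (exclusive prefix sum) for a
    variable-sized MPI call such as Alltoallv or Allgatherv."""
    displs, total = [], 0
    for size in sizes:
        displs.append(total)
        total += size
    return list(sizes), displs


sizes = [s[0] for s in regular_lshape_map((10, 4), split=0, w_size=4)]
print(sizes)                        # [3, 3, 2, 2]
print(counts_displs(sizes))         # ([3, 3, 2, 2], [0, 3, 6, 8])
print(counts_displs([0, 5, 0, 2]))  # ([0, 5, 0, 2], [0, 0, 5, 5])
```

The same exclusive prefix sum applies unchanged to an imbalanced distribution in which some ranks are empty, which is the situation the imbalanced `resplit_` commits address.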
27 changes: 18 additions & 9 deletions heat/core/dndarray.py
@@ -368,10 +368,6 @@ def get_halo(self, halo_size) -> torch.Tensor:
halo_size : int
Size of the halo.
"""
if not self.is_balanced():
raise RuntimeError(
"halo cannot be created for unbalanced tensors, running the .balance_() function is recommended"
)
if not isinstance(halo_size, int):
raise TypeError(
"halo_size needs to be of Python type integer, {} given".format(type(halo_size))
@@ -381,30 +377,43 @@ def get_halo(self, halo_size) -> torch.Tensor:
"halo_size needs to be a positive Python integer, {} given".format(type(halo_size))
)

if self.comm.is_distributed() and self.split is not None:
if self.is_distributed():
# gather lshapes
lshape_map = self.create_lshape_map()
rank = self.comm.rank
size = self.comm.size

first_rank = 0
next_rank = rank + 1
prev_rank = rank - 1
last_rank = size - 1

# if local shape is zero and it's the last process
if not self.balanced:
populated_ranks = torch.nonzero(lshape_map[:, 0]).squeeze().tolist()
if rank in populated_ranks:
first_rank = populated_ranks[0]
last_rank = populated_ranks[-1]
next_rank = rank + 1
prev_rank = rank - 1
if rank != last_rank:
next_rank = populated_ranks[populated_ranks.index(rank) + 1]
if rank != first_rank:
prev_rank = populated_ranks[populated_ranks.index(rank) - 1]

# if local shape is zero
if self.lshape[self.split] == 0:
return # if process has no data we ignore it

if halo_size > self.lshape[self.split]:
# if on at least one process the halo_size is larger than the local size throw ValueError
raise ValueError(
"halo_size {} needs to be smaller than chunck-size {} )".format(
"halo_size {} needs to be smaller than chunk-size {} )".format(
halo_size, self.lshape[self.split]
)
)

a_prev = self.__prephalo(0, halo_size)
a_next = self.__prephalo(-halo_size, None)

res_prev = None
res_next = None

@@ -418,7 +427,7 @@ def get_halo(self, halo_size) -> torch.Tensor:
)
req_list.append(self.comm.Irecv(res_prev, source=next_rank))

if rank != 0:
if rank != first_rank:
self.comm.Isend(a_prev, prev_rank)
res_next = torch.zeros(
a_next.size(), dtype=a_next.dtype, device=self.device.torch_device
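With the `is_balanced()` check removed, `get_halo` has to skip ranks whose local chunk is empty and exchange halos with the nearest populated neighbors instead of plain `rank - 1` and `rank + 1`. The following single-process sketch simulates that neighbor selection and the resulting halos on plain Python lists. It is an illustration under those assumptions: the function names are hypothetical, no MPI communication takes place, and the sizes passed in are the local lengths along the split axis.

```python
from typing import Dict, List, Optional, Tuple


def populated_neighbors(
    sizes: List[int],
) -> Dict[int, Tuple[Optional[int], Optional[int]]]:
    """Map each rank that holds data to its (previous, next) populated rank.
    Ranks with an empty local chunk are skipped entirely."""
    populated = [rank for rank, size in enumerate(sizes) if size > 0]
    neighbors = {}
    for i, rank in enumerate(populated):
        prev_rank = populated[i - 1] if i > 0 else None
        next_rank = populated[i + 1] if i < len(populated) - 1 else None
        neighbors[rank] = (prev_rank, next_rank)
    return neighbors


def simulate_halos(
    chunks: List[List[int]], halo_size: int
) -> Dict[int, Tuple[List[int], List[int]]]:
    """Each populated rank receives the last `halo_size` items of its
    previous populated neighbor and the first `halo_size` items of its
    next populated neighbor. Empty ranks receive nothing."""
    if not isinstance(halo_size, int) or halo_size < 0:
        raise ValueError("halo_size must be a non-negative int")
    neighbors = populated_neighbors([len(c) for c in chunks])
    for rank in neighbors:
        if halo_size > len(chunks[rank]):
            raise ValueError(
                f"halo_size {halo_size} exceeds chunk size {len(chunks[rank])} on rank {rank}"
            )
    halos = {}
    for rank, (prev_rank, next_rank) in neighbors.items():
        # slice with len - halo_size so that halo_size == 0 yields []
        prev = chunks[prev_rank] if prev_rank is not None else []
        halo_prev = prev[len(prev) - halo_size:] if prev else []
        halo_next = chunks[next_rank][:halo_size] if next_rank is not None else []
        halos[rank] = (halo_prev, halo_next)
    return halos


chunks = [[1, 2, 3], [], [4, 5], [6, 7, 8]]  # rank 1 is empty
print(populated_neighbors([len(c) for c in chunks]))
# {0: (None, 2), 2: (0, 3), 3: (2, None)}
print(simulate_halos(chunks, 1))
# {0: ([], [4]), 2: ([3], [6]), 3: ([5], [])}
```

Selecting neighbors by position in the list of populated ranks, rather than by rank arithmetic, also covers the case of a single populated rank, which then receives no halos.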
1 change: 0 additions & 1 deletion heat/core/linalg/tests/test_qr.py
@@ -80,7 +80,6 @@ def test_qr(self):
self.assertTrue(
ht.allclose(ht.eye(m, dtype=ht.double), qr2.Q @ qr2.Q.T, rtol=1e-5, atol=1e-5)
)

# test if calc R alone works
a2_0 = ht.array(st2, split=0)
a2_1 = ht.array(st2, split=1)