Open
Commits
282 commits
4da69fd
merge branch release/1.2.x
ClaudiaComito Dec 12, 2022
27ea911
Update ubuntu
ClaudiaComito Dec 12, 2022
d0fb6c8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 12, 2022
0e704d4
switch back to ubuntu 20.04
ClaudiaComito Dec 12, 2022
f5d7850
pull
ClaudiaComito Dec 12, 2022
acfe9bd
Upgrade CI to ubuntu 22.04 and cuda 11.7.1
ClaudiaComito Dec 12, 2022
0fd3d87
avoid unnecessary gathering of test DNDarrays
ClaudiaComito Dec 20, 2022
3c4c07c
early out for resplit of non-distributed DNDarrays
ClaudiaComito Dec 20, 2022
989e0f4
match split of comparison array to expected output
ClaudiaComito Dec 20, 2022
6d66fad
avoid MPI calls in non-distributed cases
ClaudiaComito Dec 20, 2022
a37b4d3
avoid MPI calls in non-distributed resplit
ClaudiaComito Dec 20, 2022
8eebe10
set default to None
ClaudiaComito Dec 20, 2022
22c5c68
remove print statement
ClaudiaComito Dec 20, 2022
c692bff
upgrade torch version
ClaudiaComito Dec 20, 2022
df6a4e5
copy to cpu before comparing
ClaudiaComito Dec 20, 2022
af0e721
use ht.allclose instead of np.allclose
ClaudiaComito Dec 23, 2022
bac6d4e
cast different dtype operands to promoted dtype within torch call
ClaudiaComito Dec 23, 2022
c0c6362
compare local tensors to corresponding slice of expected_array only
ClaudiaComito Dec 23, 2022
587bc05
expand tests
ClaudiaComito Dec 23, 2022
24239a1
remove redundant code
ClaudiaComito Dec 23, 2022
cd65b37
Implement slicing with negative step
ClaudiaComito Dec 26, 2022
86e8801
test slicing with negative step
ClaudiaComito Dec 26, 2022
6779010
merge branch bugs/#1057-Allgatherv-contiguity-mismatch
ClaudiaComito Dec 26, 2022
3b1f46d
Fix single-element indexing within mixed-type key
ClaudiaComito Dec 27, 2022
1a4bf97
Non-ordered indexing, split != 0
ClaudiaComito Dec 27, 2022
9e42156
generalize negative step slicing to all splits, loss of dims
ClaudiaComito Dec 28, 2022
1a310a9
loop over active ranks only when key in descending order
ClaudiaComito Dec 28, 2022
c2ba0d9
replace list-on-list mapping with argsort mapping for non-ordered key
ClaudiaComito Dec 29, 2022
f6bb5c3
replace list-on-list mapping with argsort mapping for boolean indexing
ClaudiaComito Dec 30, 2022
cad9975
fix advanced indexing via list, remove last key-mapping bottleneck fo…
ClaudiaComito Dec 31, 2022
83e6950
fix local slices, expand tests
ClaudiaComito Jan 2, 2023
28ab925
fix and test dimensional indexing
ClaudiaComito Jan 2, 2023
bc226fc
Fix same-dim advanced indexing, expand tests
ClaudiaComito Jan 5, 2023
c48c66e
[skip ci] implement single-element indexing along split axis w/ Itera…
ClaudiaComito Jan 25, 2023
18329a1
[skip ci] generalize advanced indexing incl. distributed DNDarray key
ClaudiaComito Jan 26, 2023
f024ebb
[skip ci] Expand tests combined advanced / basic indexing
ClaudiaComito Jan 29, 2023
6ae2788
[skip ci] fix advanced dimensional indexing on non-distributed array
ClaudiaComito Feb 5, 2023
178d7f8
Merge branch 'main' into 914_adv-indexing-outshape-outsplit
ClaudiaComito Jul 26, 2023
09e586c
fix distr advanced indexing with broadcasted shape
ClaudiaComito Jul 27, 2023
c56ebf4
transpose without copying
ClaudiaComito Jul 29, 2023
86f704a
[skip ci] document __process_key(), clean up code
ClaudiaComito Aug 1, 2023
68ead71
[skip ci] docs edits
ClaudiaComito Aug 1, 2023
252995c
fix Ellipsis dimensions
ClaudiaComito Aug 4, 2023
c2a7e20
fix shape and split bookkeeping within advanced indexing
ClaudiaComito Aug 4, 2023
235a7b8
test adv indexing on non consecutive dims
ClaudiaComito Aug 4, 2023
4e936e8
abstract scalar key checks for both getitem and setitem
ClaudiaComito Aug 7, 2023
8a74cd9
setitem scalar key
ClaudiaComito Aug 8, 2023
8cf3ff1
DRAFT - abstraction common utilities for getitem and setitem
ClaudiaComito Aug 9, 2023
b45578a
handle all single-element indexing along split axis in same block
ClaudiaComito Aug 9, 2023
cec4bb9
resolve send/recv dimensions mismatch in a few edge cases
ClaudiaComito Aug 10, 2023
cc49a49
transpose self back to original shape after indexing
ClaudiaComito Aug 12, 2023
fe26ae8
add setitem tests
ClaudiaComito Aug 30, 2023
611b46d
Merge branch 'main' into 914_adv-indexing-outshape-outsplit
ClaudiaComito Aug 30, 2023
affdb60
Merge branch 'main' into 914_adv-indexing-outshape-outsplit
ClaudiaComito Nov 27, 2023
7e5be66
Merge branch 'main' into 914_adv-indexing-outshape-outsplit
ClaudiaComito Dec 6, 2023
6d2e369
do not index input unnecessarily for sanitation
ClaudiaComito Dec 7, 2023
f528356
test named split dimension for torch_proxy
ClaudiaComito Dec 7, 2023
01a1140
value broadcasting abstraction
ClaudiaComito Dec 8, 2023
f8264a9
introduce distr sanitation for value when key is ordered
ClaudiaComito Dec 13, 2023
b1cd02f
keep track of original key
ClaudiaComito Dec 13, 2023
31bdb34
fix value broadcasting for advanced setitem
ClaudiaComito Dec 14, 2023
c4d6749
match broadcasting to numpy
ClaudiaComito Dec 16, 2023
5782d6e
finalize broadcast_value and fix test
ClaudiaComito Dec 20, 2023
2174e84
assignment to negative slice along split axis
ClaudiaComito Dec 20, 2023
782bde2
getitem: index underlying tensor with processed key in non-distr case
ClaudiaComito Jan 8, 2024
084371d
setitem: test neg step slice along non-zero split axis
ClaudiaComito Jan 8, 2024
b1aa7aa
allow for nominal value/self split mismatch
ClaudiaComito Jan 8, 2024
1c2b71e
expand test negative step along split axis
ClaudiaComito Jan 8, 2024
7201a89
allow value.ndim > indexed_dims if extra dims are singletons
ClaudiaComito Jan 12, 2024
dfc7266
BROKEN: expand negative step tests
ClaudiaComito Jan 12, 2024
8bbe242
squeeze out singleton dimensions when broadcasting value
ClaudiaComito Jan 15, 2024
00a17e6
fix negative step slicing on 1 process
ClaudiaComito Jan 15, 2024
bdd2dd8
setitem w. dimensional indexing, add tests
ClaudiaComito Jan 16, 2024
1fbd4d6
setitem w. advanced indexing on first dim
ClaudiaComito Jan 17, 2024
95d3c92
setitem: test boolean indexing, local and split=0
ClaudiaComito Jan 17, 2024
f335aa8
fix output shape for boolean indexing w. split>0
ClaudiaComito Jan 18, 2024
d520ddf
setitem with non-ordered, mask-like key and non-distr value
ClaudiaComito Jan 18, 2024
d754a9c
allow for partial boolean indexing on first key.ndim dims of array
ClaudiaComito Jan 19, 2024
5e69fe6
remove unnecessary check
ClaudiaComito Jan 19, 2024
8d9849e
add tests for partial boolean indexing
ClaudiaComito Jan 19, 2024
66ae371
set w. single-tensor key and non-distr value
ClaudiaComito Jan 22, 2024
980e8f0
Merge branch 'main' into 914_adv-indexing-outshape-outsplit
ClaudiaComito Feb 5, 2024
ae4d423
non-ordered, non-mask-like key and local value
ClaudiaComito Feb 5, 2024
b695e5a
broken: set up comm map for full distributed setitem
ClaudiaComito Feb 7, 2024
e6c1e10
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 7, 2024
d42f1cb
implement setitem w. distributed non-ordered key
ClaudiaComito Feb 8, 2024
7868fa0
[skip ci] broken: add tests for distr value non-ordered key
ClaudiaComito Feb 8, 2024
f8055ff
Merge branch '914_adv-indexing-outshape-outsplit' of github.com:helmh…
ClaudiaComito Feb 8, 2024
2944903
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 8, 2024
1bffd26
__process_key(): refactor adv indexing tensor extraction
ClaudiaComito Feb 12, 2024
83842ec
working: setitem w. mask-like adv indexing, non-ordered split key
ClaudiaComito Feb 12, 2024
366aaf9
adapt tests
ClaudiaComito Feb 12, 2024
bbe0a7b
refactor __process_key(): address boolean ind within adv ind
ClaudiaComito Feb 12, 2024
24c6cd1
Merge branch '914_adv-indexing-outshape-outsplit' of github.com:helmh…
ClaudiaComito Feb 12, 2024
1c47b42
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 12, 2024
4ee9b96
getitem: address mask-like key
ClaudiaComito Feb 15, 2024
3c88f8b
Merge branch '914_adv-indexing-outshape-outsplit' of github.com:helmh…
ClaudiaComito Feb 16, 2024
54db23d
Merge branch 'main' into 914_adv-indexing-outshape-outsplit
ClaudiaComito Feb 16, 2024
15cce44
define nonzero_size in non-distr case
ClaudiaComito Feb 18, 2024
09fb199
handle split_bookkeeping when key is mask-like
ClaudiaComito Feb 18, 2024
9c8d051
fix key type mismatch in advanced indexing
ClaudiaComito Feb 18, 2024
41fba0a
getitem: address n-D key along split axis, free memory
ClaudiaComito Feb 18, 2024
e4a90de
balance indexed array before eq()
ClaudiaComito Feb 18, 2024
c8967e7
remove print statements
ClaudiaComito Feb 18, 2024
2d443d8
Merge branch 'main' into 914_adv-indexing-outshape-outsplit
ClaudiaComito Feb 18, 2024
95eaaeb
test adv ind on non-consecutive dims
ClaudiaComito Feb 18, 2024
835a13f
remove print statement
ClaudiaComito Feb 20, 2024
216a1a0
setitem: mixed indexing w. shape broadcasting
ClaudiaComito Feb 20, 2024
b62bad2
expand tests for mixed indexing w. broadcasting
ClaudiaComito Feb 20, 2024
7ea2abe
Merge branch 'main' into 914_adv-indexing-outshape-outsplit
ClaudiaComito Feb 20, 2024
435ff0c
reinstate tests for specific bugs
ClaudiaComito Feb 20, 2024
30efe59
Merge branch 'main' into 914_adv-indexing-outshape-outsplit
ClaudiaComito Mar 8, 2024
e6679a0
Merge branch 'main' into 914_adv-indexing-outshape-outsplit
ClaudiaComito Apr 10, 2024
ad96822
prep send_buffer - expand value dimension if necessary
ClaudiaComito Apr 10, 2024
c9d44ae
fix send_indices dims when key is not mask-like
ClaudiaComito Apr 10, 2024
cc70400
test split mismatch on comm.size > 1
ClaudiaComito Apr 11, 2024
b78de30
broadcasting assignment along split axis
ClaudiaComito Apr 11, 2024
a8f2d57
expand tests
ClaudiaComito Apr 11, 2024
62b2142
Merge branch 'main' into 914_adv-indexing-outshape-outsplit
ClaudiaComito May 2, 2024
ae10038
Created a test file mytest.py
Hakdag97 Nov 6, 2024
3b62d4f
Implementation of parallel initialization
Dec 16, 2024
0889330
Refined comments for better readability
Dec 17, 2024
f826b7e
Merge branch 'main' of github.com:helmholtz-analytics/heat into featu…
Feb 3, 2025
1a89328
Created skeleton for lof.
Hakdag97 Feb 4, 2025
672d32e
Added file for quick-testing parts of the implementation.
Hakdag97 Feb 4, 2025
6043489
Created a first draft of the distance matrix with reduced memory cons…
Hakdag97 Feb 5, 2025
a45bf7c
Added index tracking to cdist_small function. Validation still
Hakdag97 Feb 7, 2025
6bbbdba
Validated results of reduced distance matrix (cdist_small)
Hakdag97 Feb 11, 2025
d895adb
Implemented fit routine for lof
Hakdag97 Feb 20, 2025
7334f36
Test skeleton for reachability distance
Hakdag97 Feb 24, 2025
7bc14d3
Building communication for reachability distance v.0
Hakdag97 Feb 27, 2025
1d901b9
Built communication for reachability distance
Hakdag97 Feb 28, 2025
6d76b0c
Merge branch 'main' into 914_adv-indexing-outshape-outsplit
ClaudiaComito Mar 6, 2025
1686b0a
Validated results
Hakdag97 Mar 7, 2025
8dfe79f
Added unit tests.
Hakdag97 Mar 14, 2025
ef6385f
Refined exceptions
Hakdag97 Mar 14, 2025
57d9dbb
Merge branch 'main' into 914_adv-indexing-outshape-outsplit
ClaudiaComito Mar 19, 2025
86e89ee
Merge branch 'main' into features/1758-Implementation_of_local_outlie…
mrfh92 Mar 21, 2025
58823f6
Merge branch 'main' into 914_adv-indexing-outshape-outsplit
ClaudiaComito Mar 25, 2025
4a5ef0f
edits
ClaudiaComito Mar 25, 2025
7388eb7
Merge branch 'main' into features/1758-Implementation_of_local_outlie…
mrfh92 Mar 27, 2025
243084e
Merge branch 'main' into 914_adv-indexing-outshape-outsplit
ClaudiaComito Mar 28, 2025
4525d54
Started implementation of fully distributed version
Hakdag97 Mar 28, 2025
51150af
Merge branch 'features/1758-Implementation_of_local_outlier_factor' o…
Hakdag97 Mar 28, 2025
d2c0fc4
Merge branch '914_adv-indexing-outshape-outsplit' into features/1758-…
Hakdag97 Mar 28, 2025
941c28e
get rid of torch.tensor warning
ClaudiaComito Mar 28, 2025
fbb3fe5
fix dimension loss
ClaudiaComito Apr 2, 2025
0a120d7
add edge case for boolean mask
ClaudiaComito Apr 2, 2025
994c997
Merge branch 'main' into features/1758-Implementation_of_local_outlie…
mrfh92 Apr 7, 2025
9657746
Merge branch 'main' into 914_adv-indexing-outshape-outsplit
ClaudiaComito Apr 8, 2025
c4e9421
.
Hakdag97 Apr 11, 2025
0b7ca83
Merge branch 'features/1758-Implementation_of_local_outlier_factor' o…
Hakdag97 Apr 11, 2025
98dfc18
Merge branch '914_adv-indexing-outshape-outsplit' into features/1758-…
Hakdag97 Apr 11, 2025
b0bfa08
do not index scalar value
ClaudiaComito Apr 12, 2025
dfb0667
debugging
ClaudiaComito Apr 12, 2025
2e8001a
Merge branch 'main' into 914_adv-indexing-outshape-outsplit
ClaudiaComito Apr 12, 2025
6b588df
Merge branch '914_adv-indexing-outshape-outsplit' into features/1758…
Hakdag97 Apr 14, 2025
1d95084
.
Hakdag97 Apr 14, 2025
606b837
Adjustments according to most recent changes in available advanced in…
Hakdag97 Apr 16, 2025
ae2a5e8
Corrected Deadlock problem with large data sets
Hakdag97 Apr 25, 2025
39ab011
Merge branch 'main' into 914_adv-indexing-outshape-outsplit
ClaudiaComito May 6, 2025
5b64ff0
Added test cases for cdist_small
Hakdag97 May 13, 2025
cd6838e
Added option for chunk-wise computation to reduce memory consumption
Hakdag97 May 13, 2025
b64b63b
Bug fixes
Hakdag97 May 13, 2025
93894fc
adapted communication pattern in cdist_small
Hakdag97 May 20, 2025
ecb6feb
Added non-blocking sending and receiving in cdist_small
Hakdag97 May 20, 2025
5815e98
Bug fix in _chunk_wise_topk
Hakdag97 May 21, 2025
6f1ec62
Merge branch 'main' into features/1758-Implementation_of_local_outlie…
mrfh92 Jun 16, 2025
03a7981
Merge branch 'main' into features/1758-Implementation_of_local_outlie…
mrfh92 Jun 17, 2025
b8de0c6
Added parameter to speed-up computation using pytorch's advanced inde…
Hakdag97 Jun 26, 2025
8301a81
Merge branch 'features/1758-Implementation_of_local_outlier_factor' o…
Hakdag97 Jun 26, 2025
dc70e29
.
Hakdag97 Jul 1, 2025
332d49d
.
Hakdag97 Jul 15, 2025
cbb6e97
Added test case
Hakdag97 Jul 15, 2025
3c29366
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 15, 2025
c9f0514
Made list of nearest neighbors accesible as a class attribute
Hakdag97 Jul 16, 2025
6ff8a04
Merge branch 'features/1758-Implementation_of_local_outlier_factor' o…
Hakdag97 Jul 16, 2025
cb03cb7
Merge branch 'main' into features/1758-Implementation_of_local_outlie…
Hakdag97 Jul 29, 2025
6d848d6
Merge branch 'main' into 914_adv-indexing-outshape-outsplit
ClaudiaComito Oct 29, 2025
cae4670
Merge branch '914_adv-indexing-outshape-outsplit' of github.com:helmh…
ClaudiaComito Oct 31, 2025
9d74da2
I already hate talisman after 1 day
ClaudiaComito Nov 3, 2025
b204589
Merge branch 'main' into 914_adv-indexing-outshape-outsplit
Hakdag97 Nov 10, 2025
5a5ae6e
Fixed __setitem__ bug for unordered split_key
Hakdag97 Nov 28, 2025
0aa3ee0
Fixed bugs causing errors in test_getitem_boolean_fewer_dims
Hakdag97 Nov 28, 2025
36855d7
Bug fixes for test_setitem_edge_cases
Hakdag97 Dec 1, 2025
151d2b2
Further bug fixes
Hakdag97 Dec 1, 2025
960c5dd
All tests are running
Hakdag97 Dec 1, 2025
d67c5a9
Merge branch 'main' into 914_adv-indexing-outshape-outsplit
Hakdag97 Dec 1, 2025
3e0e6a6
Edge case handling for test_indexing intermediate results
Hakdag97 Dec 4, 2025
25e1b34
Fixed test_indexing.py
Hakdag97 Dec 4, 2025
95c72a2
Merge branch 'main' into 914_adv-indexing-outshape-outsplit
Hakdag97 Dec 4, 2025
0f09e7e
Bug fixes in function where()
Hakdag97 Dec 5, 2025
9956639
Merge branch '914_adv-indexing-outshape-outsplit' of github.com:helmh…
Hakdag97 Dec 5, 2025
aadcf35
Edge case handling for slice type keys in __getitem__
Hakdag97 Dec 5, 2025
b0147ce
Debugging tests for clustering - intermediate results
Hakdag97 Dec 8, 2025
f2e168c
Fixed edge case in indexing causing deadlock in kmedoids clustering
Hakdag97 Dec 8, 2025
638d1f8
Delete bug prints
Hakdag97 Dec 8, 2025
466c1f0
Edge case handling for keys like [:, -1], in order to fix test_basics
Hakdag97 Dec 10, 2025
047488c
Bug fixes for test_factories.py
Hakdag97 Dec 10, 2025
376cbb1
Fixed bug in test_cov (wrong balance)
Hakdag97 Dec 12, 2025
50f0ad1
Fixed bug in test_manipulations.py (function tile)
Hakdag97 Dec 12, 2025
c2ce57e
Drop tensor names in function tile
Hakdag97 Dec 12, 2025
595f84a
Handle edge case for test_svd and test_eigh
Hakdag97 Dec 12, 2025
28e46a1
Fix test_knn.py
Hakdag97 Dec 12, 2025
3eac84d
Merge branch 'main' into features/1758-Implementation_of_local_outlie…
Hakdag97 Dec 12, 2025
2c34b36
Added edge case neccessary for local outlier factor
Hakdag97 Dec 12, 2025
d790865
.
Hakdag97 Dec 12, 2025
b9f132e
Merge branch 'main' into 914_adv-indexing-outshape-outsplit
Hakdag97 Dec 12, 2025
ba2a87a
Merge branch '914_adv-indexing-outshape-outsplit' into features/1758-…
Hakdag97 Dec 12, 2025
8a373c9
Fixed device mismatch in process_key
Hakdag97 Dec 15, 2025
3c9ea98
Merge branch '914_adv-indexing-outshape-outsplit' of github.com:helmh…
Hakdag97 Dec 15, 2025
bc6616b
Refine test_dndarray
Hakdag97 Dec 15, 2025
43f73c3
Handling of duplicate advanced indices
Hakdag97 Dec 15, 2025
1fc4f1e
Merge branch 'main' into 914_adv-indexing-outshape-outsplit
Hakdag97 Dec 15, 2025
2240117
.
Hakdag97 Dec 15, 2025
e6256d5
Merge branch '914_adv-indexing-outshape-outsplit' of github.com:helmh…
Hakdag97 Dec 15, 2025
c6332a4
Merge branch '914_adv-indexing-outshape-outsplit' into features/1758-…
Hakdag97 Dec 15, 2025
550f792
Avoid float64 tests in test_basic for mps
Hakdag97 Dec 15, 2025
6da8259
Improved code coverage in tests
Hakdag97 Dec 15, 2025
bde7e69
Merge branch '914_adv-indexing-outshape-outsplit' into features/1758-…
Hakdag97 Dec 15, 2025
3c9b989
Merge branch 'main' into 914_adv-indexing-outshape-outsplit
mrfh92 Dec 17, 2025
1673b00
Merge branch '914_adv-indexing-outshape-outsplit' into features/1758-…
Hakdag97 Dec 17, 2025
0636e03
Robustified tests for lof
Hakdag97 Dec 17, 2025
e9e220c
Raised tolerances for LOF tests
Hakdag97 Dec 18, 2025
343fcee
Debug Ci fails with reduced tolerance in tests for lof and cdist_small
Hakdag97 Dec 18, 2025
6790b6a
Debug prints for CI
Hakdag97 Dec 18, 2025
96092b7
Remove bug prints
Hakdag97 Dec 18, 2025
a68a80a
.
Hakdag97 Dec 18, 2025
dd3762b
Measure against missing CUDA awareness of MPI
Hakdag97 Dec 18, 2025
d0a1324
Fixed typo
Hakdag97 Dec 18, 2025
ca5bb04
Bug fix
Hakdag97 Dec 18, 2025
9961791
Merge branch 'main' into features/1758-Implementation_of_local_outlie…
Hakdag97 Dec 18, 2025
162e85a
Added comment
Hakdag97 Dec 19, 2025
33b2c32
Merge branch 'features/1758-Implementation_of_local_outlier_factor' o…
Hakdag97 Dec 19, 2025
879aa2a
test
Hakdag97 Dec 19, 2025
850e5a9
.
Hakdag97 Dec 19, 2025
cba9f58
Test more memory efficent implementation of cdist_small
Hakdag97 Dec 19, 2025
39c20d1
Refined comments
Hakdag97 Dec 19, 2025
6cfa7db
Adjusted Documentation and test according to review
Hakdag97 Dec 19, 2025
2bfdbc5
Test debugging advanced indexing for dmd
Hakdag97 Dec 20, 2025
0149260
Merge branch 'main' into 914_adv-indexing-outshape-outsplit
Hakdag97 Dec 20, 2025
17446a2
Fixed bug in process_key leading to failing dmd test
Hakdag97 Dec 20, 2025
9aa581e
Robustified edge cases in __process_key
Hakdag97 Dec 20, 2025
0ad0418
Merge branch '914_adv-indexing-outshape-outsplit' into features/1758-…
Hakdag97 Dec 20, 2025
fb07f9c
Consistent tie-break behaviour for arbitrary arbitrary number of MPI …
Hakdag97 Dec 20, 2025
e08dfaf
Extended tests in test_lof.py and test_distance.py
Hakdag97 Dec 21, 2025
2efe6df
Increase test coverage of test_lof.py
Hakdag97 Dec 22, 2025
f6a9447
Merge branch 'main' into features/1758-Implementation_of_local_outlie…
Hakdag97 Jan 5, 2026
bdcc09a
Refined test
Hakdag97 Jan 5, 2026
457b175
Refined test.lof
Hakdag97 Jan 5, 2026
4 changes: 4 additions & 0 deletions .talismanrc
@@ -0,0 +1,4 @@
fileignoreconfig:
- filename: heat/core/dndarray.py
  checksum: 6f686fc92dc83c619144cfcde577b8f195213d3c02e9ba63b26760dd799e144d
version: "1.0"
1 change: 1 addition & 0 deletions heat/classification/__init__.py
@@ -1,3 +1,4 @@
"""Provides classification algorithms."""

from .kneighborsclassifier import *
from .localoutlierfactor import *
6 changes: 3 additions & 3 deletions heat/classification/kneighborsclassifier.py
@@ -122,11 +122,11 @@ def predict(self, x: DNDarray) -> DNDarray:
"""
distances = self.effective_metric_(x, self.x)
_, indices = ht.topk(distances, self.n_neighbors, largest=False)
predictions = self.y[indices.flatten()]

predictions = self.y[indices]
predictions.balance_()
predictions = ht.reshape(predictions, (indices.gshape + (self.y.gshape[1],)))
predictions = ht.reshape(predictions, indices.gshape + (self.y.gshape[1],))
predictions = ht.sum(predictions, axis=1)

self.classes_ = ht.argmax(predictions, axis=1)

return self.classes_
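The hunk above indexes `self.y` with the 2-D `indices` array and then sums over the neighbor axis before taking the `argmax`. A minimal plain-Python sketch of that voting step follows, assuming `self.y` holds one-hot encoded labels (suggested by the `self.y.gshape[1]` term, but not confirmed here). The names `knn_vote`, `one_hot_labels`, and `neighbor_indices` are hypothetical and not part of Heat.

```python
def knn_vote(one_hot_labels, neighbor_indices):
    """Predict one class per query by summing the one-hot labels of its neighbors.

    one_hot_labels: list of rows, one per training sample (hypothetical input).
    neighbor_indices: list of rows, one per query, holding training-sample indices.
    """
    n_classes = len(one_hot_labels[0])
    predictions = []
    for row in neighbor_indices:
        # Accumulate the votes of all neighbors of this query.
        votes = [0] * n_classes
        for j in row:
            for c, v in enumerate(one_hot_labels[j]):
                votes[c] += v
        # argmax over the class axis; ties resolve to the lowest class index.
        predictions.append(max(range(n_classes), key=votes.__getitem__))
    return predictions


labels = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
print(knn_vote(labels, [[0, 1, 2], [2, 3, 4]]))
```

The sketch only illustrates the shape bookkeeping: one row of votes per query, one column per class.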
304 changes: 304 additions & 0 deletions heat/classification/localoutlierfactor.py
@@ -0,0 +1,304 @@
"""Implementation of the Local Outlier Factor (LOF) algorithm"""

import heat as ht
import torch
import warnings
from heat.core import types
from mpi4py import MPI
from heat.core.dndarray import DNDarray
from heat.spatial.distance import cdist, cdist_small, _euclidian, _manhattan, _gaussian

__all__ = ["LocalOutlierFactor"]


class LocalOutlierFactor:
    """
    Class for the Local Outlier Factor (LOF) algorithm. The LOF algorithm is a density-based outlier detection method.

    Parameters
    ----------
    n_neighbors : int, optional (default=20)
        Number of neighbors used to calculate the density of points in the LOF algorithm. Denoted as MinPts in [1].
    metric : str, optional (default="euclidian")
        The distance metric to use. Can be "euclidian", "manhattan", or "gaussian".
    binary_decision : str, optional
        Defines which classification method should be used:
        - "threshold": every data point whose score is greater than or equal to the specified threshold is considered an outlier.
        - "top_n": the data points with the ``top_n`` largest outlier scores are considered outliers.
        Default is "threshold".
    threshold : float, optional
        The threshold value for the "threshold" method. Default is 1.5.
    chunks : int, optional
        Number of chunks in which the distance matrix is computed. Default is 1.
    top_n : int, optional
        The number of top outliers for the "top_n" method. Default is None; it has to be specified if ``binary_decision`` is "top_n".
    fully_distributed : bool, optional
        Whether to distribute auxiliary vectors among all MPI processes. Default is False.

    Attributes
    ----------
    n_neighbors : int
        Number of neighbors used to calculate the density of points in the LOF algorithm. Denoted as MinPts in [1].
    binary_decision : str
        Method that converts the LOF score into a binary decision of outlier and non-outlier. Can be "threshold" or "top_n".
    metric : str
        The measure of the distance. Can be "euclidian", "manhattan", or "gaussian".
    threshold : float
        The threshold value for the "threshold" method used for binary classification.
    top_n : int
        The number of top outliers for the "top_n" method used for binary classification.
    lof_scores : DNDarray
        The local outlier factor for each sample in the data set.
    anomaly : DNDarray
        Array with binary outlier classification (1 -> outlier, -1 -> inlier).
    chunks : int
        Compute the distance matrix iteratively in chunks to reduce memory consumption (at the cost of a longer runtime).
        For ``chunks=2``: first compute one half of the distance matrix and then the second half.
    fully_distributed : bool
        Decides whether to distribute auxiliary vectors during the computation among all MPI processes.
        Only set to True for a very large number of data points that may already cause memory issues on their own.
        True is more memory efficient, but much slower than False due to large communication overhead.
    idx_n_neighbors : DNDarray
        Indices of nearest neighbors for each sample in the data set.

    References
    ----------
    [1] Breunig, M. M., Kriegel, H. P., Ng, R. T., & Sander, J. (2000). LOF: identifying density-based local outliers.
    """

    def __init__(
        self,
        n_neighbors=20,
        metric="euclidian",
        binary_decision="threshold",
        threshold=1.5,
        chunks=1,
        top_n=None,
        fully_distributed=False,
    ):
        self.n_neighbors = n_neighbors
        self.binary_decision = binary_decision
        self.threshold = threshold
        self.top_n = top_n
        self.lof_scores = None
        self.anomaly = None
        self.metric = metric
        self.chunks = chunks
        self.fully_distributed = fully_distributed
        self.idx_n_neighbors = None

        self._input_sanitation()

    def fit(self, X: DNDarray):
        """
        Fit the LOF model to the data.

        Parameters
        ----------
        X : DNDarray
            Data points.
        """
        # Compute the LOF for each sample in X
        self._local_outlier_factor(X)
        # Classify the data points as outliers or inliers
        self._binary_classifier()

    def _local_outlier_factor(self, X: DNDarray):
        """
        Compute the LOF for each sample in X.

        Parameters
        ----------
        X : DNDarray
            Data points.
        """
        # number of data points
        length = X.shape[0]

        # input sanitation
        # If n_neighbors is larger than or equal to the number of samples, use all other samples as neighbors when evaluating the LOF
        if self.n_neighbors >= length:
            self.n_neighbors = length - 1  # length of data is n_neighbors + the point itself
        # [1] suggests a minimum of 10 neighbors
        if length <= 10:
            raise ValueError(
                f"The data set is too small for a reasonable LOF evaluation. The number of samples should be larger than 10, but was {X.shape[0]}."
            )

        # Compute the distance matrix for the n_neighbors nearest neighbors of each point and the corresponding indices
        # (only these are needed for the LOF computation).
        size = X.comm.Get_size()

        # If the number of chosen neighbors is larger than the number of samples per process, use the classic cdist function
        if self.n_neighbors + 1 > length // size:
            dist, idx = ht.topk(
                cdist(X), k=self.n_neighbors + 1, sorted=True, largest=False
            )  # cdist also stores the distance of each point to itself, therefore use n_neighbors+1
        else:
            # Note that cdist_small sorts from the lowest to the highest distance
            dist, idx = cdist_small(
                X, X, metric=self.metric, n_smallest=self.n_neighbors + 1, chunks=self.chunks
            )  # cdist_small also stores the distance of each point to itself, therefore use n_neighbors+1

        # Extract the k-distance and the indices of the k-nearest neighbors
        k_dist = dist[:, -1]
        idx_neighbors = idx[:, 1 : self.n_neighbors + 1]
        # Make the indices of the n-nearest neighbors available for use outside this function
        self.idx_n_neighbors = idx_neighbors

        k_dist_neighbors = self._advanced_indexing(k_dist, idx_neighbors)

        # Compute the reachability distance for each point
        reachability_dist = ht.maximum(k_dist_neighbors, dist[:, 1 : self.n_neighbors + 1])

        # Compute the local reachability density (lrd) for each point
        lrd = 1 / (
            ht.mean(reachability_dist, axis=1) + 1e-10
        )  # add 1e-10 to avoid division by zero (important for many duplicates in data)

        # Look up the local reachability density of each point's neighbors
        lrd_neighbors = self._advanced_indexing(lrd, idx[:, 1 : self.n_neighbors + 1])

        lof = ht.mean(lrd_neighbors, axis=1) / lrd

        self.lof_scores = lof

    def _binary_classifier(self):
        """
        Binary classification of the data points as outliers (1) or inliers (-1) based on their non-binary LOF. Depending on the method,
        the data points are classified as outliers if their LOF is greater than or equal to a specified threshold or if they have one
        of the top_n largest LOF scores.
        """
        if self.binary_decision == "top_n":
            # Determine the threshold based on the top_n largest LOF scores
            self.threshold = ht.topk(self.lof_scores, k=self.top_n, sorted=True, largest=True)[0][
                -1
            ]
        # Classify anomalies based on the threshold value
        self.anomaly = ht.where(self.lof_scores >= self.threshold, 1, -1)

    def _advanced_indexing(self, A: DNDarray, idx: DNDarray) -> DNDarray:
        """
        Perform advanced indexing on a distributed DNDarray, allowing for optional runtime optimization.

        This function handles advanced indexing for distributed DNDarrays. It supports two modes:
        1. Fully distributed mode (`fully_distributed=True`): handles indexing in a completely distributed manner.
           This mode is memory safe but rather slow.
        2. Local mode (`fully_distributed=False`): uses local arrays (torch tensors) to perform indexing
           efficiently, assuming that local arrays of dimension (A.shape[0], `n_neighbors`) fit into memory.

        Parameters
        ----------
        A : DNDarray
            The input DNDarray to be indexed.
        idx : DNDarray
            The indices used for advanced indexing.

        Returns

Collaborator

Usually, we don't use Returns as a section
(also in the other functions)

Collaborator Author

Adapted this in all functions

        -------
        indexed_A : DNDarray
            The result of advanced indexing on the input array.
        """
        # Use heat's advanced indexing for large data sets
        if self.fully_distributed is True:
            indexed_A = A[idx]
        # Use local arrays, i.e., torch.tensors, to reduce runtime while indexing
        # (only possible if all local arrays defined below fit into memory)
        else:
            split = A.split
            dtype = A.dtype  # named ``dtype`` to avoid shadowing the built-in ``type``
            # Use non-split arrays to reduce communication overhead
            A_ = A.resplit_(None).larray.contiguous()
            idx_ = idx.resplit_(None).larray.contiguous()
            # Apply standard advanced indexing
            indexed_A_ = A_[idx_]
            # Convert the result back to a distributed DNDarray
            indexed_A = ht.array(indexed_A_, split=split, dtype=dtype)
        return indexed_A

    def _map_idx_to_proc(self, idx, comm):
        """
        Auxiliary function to map indices to the corresponding MPI process ranks.

        This function takes an array of indices and determines which MPI process
        each index belongs to, based on the distribution of data across processes.
        It returns an array where each index is replaced by the rank of the process
        that contains the corresponding data.

        Parameters
        ----------
        idx : DNDarray
            The array of indices to be mapped to MPI process ranks. The array should
            be distributed along the first axis (split=0).
        comm : MPI.COMM_WORLD
            The MPI communicator.

        Returns
        -------
        mapped_idx : DNDarray
            An array of the same shape as `idx`, where each index is replaced by the
            rank of the MPI process that contains the corresponding data.
        """
        size = comm.Get_size()
        _, displ, _ = comm.counts_displs_shape(idx.shape, idx.split)
        mapped_idx = ht.zeros_like(idx)
        for rank in range(size):
            lower_bound = displ[rank]
            if rank == size - 1:  # size-1 is the last rank
                upper_bound = idx.shape[0]
            else:
                upper_bound = displ[rank + 1]
            mask = (idx >= lower_bound) & (idx < upper_bound)
            mapped_idx[mask] = rank
        return mapped_idx

    def _input_sanitation(self):
        """
        Check if the input parameters are valid and raise warnings or exceptions.
        """
        # check number of neighbors, [1] suggests n_neighbors >= 10
        if self.n_neighbors < 1:
            raise ValueError(f"n_neighbors must be at least one, but was {self.n_neighbors}.")
        # warn if n_neighbors lies outside the recommended range (``or``: both conditions can never hold at once)
        if self.n_neighbors < 10 or self.n_neighbors > 100:
            warnings.warn(
                f"For reasonable results n_neighbors is expected between 10 and 100, but was {self.n_neighbors}.",
                UserWarning,
            )

        # check for a valid binary decision method
        if self.binary_decision not in ["threshold", "top_n"]:
            raise ValueError(
                f"Unknown method for binary decision: {self.binary_decision}. Use 'threshold' or 'top_n'."
            )

        # check if the top_n parameter is specified when using the top_n method
        if self.binary_decision == "top_n":
            if self.threshold != 1.5:
                warnings.warn(
                    "You are specifying the parameter threshold, although binary_decision is set to 'top_n'. The threshold will be ignored.",
                    UserWarning,
                )
            if self.top_n is None:
                raise ValueError(
                    "For binary decision='top_n', the parameter 'top_n' has to be specified."
                )
            if self.top_n < 1:
                raise ValueError("The number of top outliers should be at least one.")

        if self.binary_decision == "threshold":
            # check for None first, since comparing None with a number raises a TypeError
            if self.threshold is None or self.threshold <= 1:
                raise ValueError("The threshold should be greater than one.")
            if self.top_n is not None:
                warnings.warn(
                    "You are specifying the parameter top_n, although binary_decision is set to 'threshold'. The value of top_n will be ignored.",
                    UserWarning,
                )

        # check for valid metric
        valid_metrics = ["euclidian", "gaussian", "manhattan"]
        if self.metric not in valid_metrics:
            raise ValueError(f"Invalid metric '{self.metric}'. Must be one of {valid_metrics}.")

        # replace the name of the metric with the corresponding function
        if self.metric == "gaussian":
            self.metric = _gaussian
        elif self.metric == "manhattan":
            self.metric = _manhattan
        elif self.metric == "euclidian":
            self.metric = _euclidian
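
For cross-checking the distributed implementation on small inputs, the LOF steps used in `_local_outlier_factor` (k-distance, reachability distance, local reachability density, LOF ratio) can be written as a single-process reference. The following is a minimal sketch in plain Python, not part of this PR and not a Heat API: `lof_reference`, `points`, and `k` are hypothetical names, only the Euclidean metric is covered, and each point is excluded from its own neighbor list by index rather than by dropping the first column of a sorted distance matrix.

```python
import math


def lof_reference(points, k):
    """Single-process LOF for small inputs (O(n^2) time and memory)."""
    n = len(points)
    # For every point: its k nearest neighbors as (distance, index), closest first.
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append(dists[:k])
    # k-distance: distance to the k-th nearest neighbor.
    k_dist = [nb[-1][0] for nb in neighbors]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for nb in neighbors:
        reach = [max(k_dist[j], d) for d, j in nb]
        lrd.append(1.0 / (sum(reach) / k + 1e-10))  # same guard against division by zero
    # LOF: mean lrd of the neighbors divided by the point's own lrd.
    return [sum(lrd[j] for _, j in neighbors[i]) / k / lrd[i] for i in range(n)]


# A 3x3 grid of inliers plus one isolated point.
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
scores = lof_reference(grid + [(10.0, 10.0)], k=3)
print(max(range(len(scores)), key=scores.__getitem__))
```

With this data the isolated point receives the largest score, well above the default threshold of 1.5, while the grid points stay below it.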