Expand distributed indexing, match numpy indexing scheme #938
ClaudiaComito wants to merge 271 commits into
for more information, see https://pre-commit.ci
brownbaerchen left a comment
I had a quick look through everything that is not actually the advanced indexing. I think it would be good to clean up this PR by:
- Removing any changes we don't want to keep at all
- Moving the refactoring of the basic tests into a separate PR
- Moving the changes to non_zero into a separate PR
- Moving the added keyword arguments for DNDarray instantiation into a separate PR
These separate PRs can be merged very quickly, and then this PR does only what it promises and is easier to review.
- def nonzero(x: DNDarray) -> DNDarray:
+ def nonzero(x: DNDarray, as_tuple: bool = True) -> tuple[DNDarray, ...] | DNDarray:
I think the changes to this function belong in its own PR since it seems unrelated to advanced indexing and could be merged quickly.
> I think the changes to this function belong in its own PR since it seems unrelated to advanced indexing and could be merged quickly.
In principle you're right and I agree, but in practice the changes and tests are not so easy to disentangle from the new indexing capabilities. If you want to go for it, I've started PR #2332 but I won't spend time on it.
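For readers following the signature change, here is a minimal pure-Python sketch of the two return shapes an `as_tuple` flag usually selects. It assumes the flag behaves like the parameter of the same name in `torch.nonzero`, which this thread does not confirm. `nonzero_sketch` is a hypothetical helper on nested lists, not the Heat function.

```python
def nonzero_sketch(x, as_tuple=True):
    """Indices of the non-zero entries of a 2-D nested list (illustration only)."""
    coords = [
        (i, j)
        for i, row in enumerate(x)
        for j, value in enumerate(row)
        if value != 0
    ]
    if as_tuple:
        # One index sequence per dimension, the shape np.nonzero uses.
        rows = tuple(i for i, _ in coords)
        cols = tuple(j for _, j in coords)
        return rows, cols
    # One (row, col) coordinate per non-zero entry.
    return coords


data = [[0, 3, 0], [4, 0, 5]]
print(nonzero_sketch(data))                  # ((0, 1, 1), (1, 0, 2))
print(nonzero_sketch(data, as_tuple=False))  # [(0, 1), (1, 0), (1, 2)]
```

The tuple form can be passed straight back as an index key, which is presumably why it is tied up with the indexing work here.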
# 1D boolean mask resolution
first = key[0] if isinstance(key, tuple) and len(key) >= 1 else key
if isinstance(first, (DNDarray, torch.Tensor, np.ndarray)) and arr.ndim >= 1:
I think it would be nice to cast numpy arrays and torch tensors to DNDarray at the beginning of this function. Then we always know we have a DNDarray and don't have to worry about things like numel or size.
I think it would be nice if we did:
- Early out for the special cases that need to be fast
- Cast array keys to DNDarray, so that the key is a tuple of ellipses, slices, integers, or DNDarrays
- Any further processing of keys
What do you think, @ClaudiaComito? Would that make sense?
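The three steps proposed above could look roughly like the sketch below. It is pure Python and does not touch Heat: `ArrayKey` is a hypothetical stand-in for DNDarray, `normalize_key` is an invented name, and lists and ranges stand in for numpy arrays and torch tensors. It handles only the key kinds named in the comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayKey:
    """Hypothetical stand-in for a DNDarray-typed index key."""
    values: tuple


def normalize_key(key):
    # Step 1: early out for a plain integer, which should stay cheap.
    if isinstance(key, int) and not isinstance(key, bool):
        return (key,)
    # Step 2: wrap the key in a tuple and cast every array-like entry
    # to a single type, so later code only ever sees ArrayKey.
    if not isinstance(key, tuple):
        key = (key,)
    normalized = []
    for k in key:
        if isinstance(k, (list, range)):
            k = ArrayKey(tuple(k))
        elif not (k is Ellipsis or isinstance(k, (slice, int, ArrayKey))):
            raise TypeError(f"unsupported key component: {k!r}")
        normalized.append(k)
    # Step 3: any further processing would start from this tuple.
    return tuple(normalized)


print(normalize_key(([0, 2], slice(None))))
```

After step 2 every later branch can rely on one array type, which is the point of the suggestion: no separate handling of numel versus size.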
Co-authored-by: Thomas Saupe <39156931+brownbaerchen@users.noreply.github.com>
* First small cleanup
* Another small simplification
Description
This pull request introduces a significant overhaul of distributed indexing within dndarray.py, specifically targeting the __getitem__ and __setitem__ methods. The primary objective is to achieve full NumPy indexing compliance in a distributed environment while minimizing MPI overhead and memory footprint.
The logic has been refactored to identify zero-communication paths ("early out") and to route heavy unordered advanced indexing through optimized communication.
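To make the difference between a zero-communication path and an unordered one concrete, here is a small pure-Python sketch. It assumes a 1-D array split into contiguous chunks across ranks. It treats a key as communication-free when the ranks owning its indices never decrease, since each rank can then serve its part locally and the result stays in key order. The function names are hypothetical and the PR's actual criterion may differ.

```python
from bisect import bisect_right
from itertools import accumulate


def owner_ranks(indices, chunk_sizes):
    """Map global indices along the split axis to the rank holding them."""
    # ends[r] is one past the last global index stored on rank r.
    ends = list(accumulate(chunk_sizes))
    for i in indices:
        if not 0 <= i < ends[-1]:
            raise IndexError(f"index {i} out of bounds for size {ends[-1]}")
    return [bisect_right(ends, i) for i in indices]


def is_rank_ordered(indices, chunk_sizes):
    """True if the owning ranks never decrease along the key."""
    ranks = owner_ranks(indices, chunk_sizes)
    return all(a <= b for a, b in zip(ranks, ranks[1:]))


chunks = [4, 3, 3]  # 10 elements spread over 3 ranks
print(owner_ranks([0, 3, 4, 9], chunks))      # [0, 0, 1, 2]
print(is_rank_ordered([0, 2, 5, 9], chunks))  # True
print(is_rank_ordered([9, 0, 5], chunks))     # False
```

In this model the key `[9, 0, 5]` needs data to move between ranks, which is the case the description routes through optimized communication.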
The following table shows the distribution semantics of DNDarray indexing operations.
UPDATED 26.5.2026
- array[key]
- array[key]: [...] split axis and balanced status directly from the distributed key.
- array[key]: Yes for slices/masks. Unordered local advanced indices are automatically distributed across the split axis under the hood.
- array[key]: [...] distr_mask fast-path or triggers __getitem_unordered for cross-node MPI collective fetching.
- array[key] = val
- array[key] = val
- array[key] = val: [...] value's split axis doesn't match the target's split axis, a RuntimeError is raised. If they do match, value is dynamically load-balanced (redistribute_) to match the target's chunk sizes before assignment.
- array[key] = val
- array[key] = val
- array[key] = val: [...] Alltoallv shuffle to assign elements to their global unordered indices.
Routing logic
UPDATED 26.5.2026
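As a reading aid for the flowchart that follows, its final "Evaluate Key State" node can be sketched as a plain function. The branch labels and handler names are copied from the diagram. The order in which the branches are tested is an assumption, since the diagram draws them all from one node. The function name `route` and its keyword defaults are hypothetical.

```python
def route(root=None, split_key_is_ordered=1,
          key_is_mask_like=False, distr_mask_fast_path=False):
    """Return the op_type for an already normalized key (sketch only)."""
    if root is not None:
        return "scalar"
    if split_key_is_ordered == 0:
        return "distributed"        # unordered MPI communication
    if split_key_is_ordered == -1:
        return "descending_slice"
    if key_is_mask_like:
        return "distr_mask" if distr_mask_fast_path else "local_mask"
    return "advanced"               # default / ordered: local fast path


# op_type -> __getitem__ handler, as listed in the diagram's subgraph.
GETITEM_HANDLERS = {
    "scalar": "__getitem_scalar",
    "distributed": "__getitem_advanced_distributed",
    "descending_slice": "__getitem_descending_slice_distributed",
    "distr_mask": "__getitem_mask",
    "local_mask": "__getitem_advanced_local",
    "advanced": "__getitem_advanced_local",
}

print(GETITEM_HANDLERS[route(split_key_is_ordered=0)])
```

The two early exits at the top of the diagram (pure scalar key, distr_mask fast path for non-tuple keys) happen before normalization and are left out of this sketch.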
graph TD
    Start((Receive Key)) --> CheckScalar{Is key a pure scalar<br/>and not boolean?}
    CheckScalar -- Yes --> EvalRoot{Compute root}
    EvalRoot --> OpScalar[op_type = 'scalar']
    CheckScalar -- No --> CheckFastPath{Matches distr_mask<br/>fast path?}
    CheckFastPath -- Yes & not tuple --> OpDistrMask1[op_type = 'distr_mask']
    CheckFastPath -- No / Tuple --> Normalize[Normalize keys, extract bounds,<br/>check dimensionality & broadcast]
    Normalize --> FinalRouting{Evaluate Key State}
    FinalRouting -->|root is not None| OpScalar2[op_type = 'scalar']
    FinalRouting -->|split_key_is_ordered == 0| OpDist[op_type = 'distributed'<br/>Unordered MPI Communication]
    FinalRouting -->|split_key_is_ordered == -1| OpDesc[op_type = 'descending_slice']
    FinalRouting -->|key_is_mask_like == True| MaskTypeCheck{distr_mask_fast_path?}
    MaskTypeCheck -- Yes --> OpDistrMask2[op_type = 'distr_mask']
    MaskTypeCheck -- No --> OpLocalMask[op_type = 'local_mask']
    FinalRouting -->|Default / Ordered| OpAdv[op_type = 'advanced'<br/>Local Fast Path]
    %% Map to actual handlers
    subgraph Handlers [Target Routing Methods]
        OpScalar & OpScalar2 --> H_Scalar[__getitem_scalar<br/>__setitem_scalar]
        OpDist --> H_Dist[__getitem_advanced_distributed<br/>__setitem_advanced_distributed]
        OpDesc --> H_Desc[__getitem_descending_slice_distributed<br/>__setitem_descending_slice_distributed]
        OpDistrMask1 & OpDistrMask2 --> H_DistMask[__getitem_mask<br/>__setitem_mask]
        OpLocalMask --> H_LocalMask[__getitem_advanced_local<br/>__setitem_advanced_local]
        OpAdv --> H_Adv[__getitem_advanced_local<br/>__setitem_advanced_local]
    end
    %% Styling
    classDef target fill:#d4edda,stroke:#28a745,stroke-width:2px;
    class H_Scalar,H_Dist,H_Desc,H_DistMask,H_LocalMask,H_Adv target;
Main changes
To Be Continued...
Memory footprint
Scaling behaviour
Issue/s resolved: #703 #914 #918 #1012 #1019 #2135 #1816 #824
Changes proposed:
Type of change
Memory requirements
Performance
Due Diligence
Does this change modify the behaviour of other functions? If so, which?
yes / no
skip ci