Skip to content

nonzero, where fixes from #938#2332

Open
ClaudiaComito wants to merge 4 commits into
mainfrom
features/nonzero-updates
Open

nonzero, where fixes from #938#2332
ClaudiaComito wants to merge 4 commits into
mainfrom
features/nonzero-updates

Conversation

@ClaudiaComito

Copy link
Copy Markdown
Member

Due Diligence

  • General:
  • Implementation:
    • unit tests: all split configurations tested
    • unit tests: multiple dtypes tested
    • NEW unit tests: MPS tested (1 MPI process, 1 GPU)
    • benchmarks: created for new functionality
    • benchmarks: performance improved or maintained
    • documentation updated where needed

Description

Issue/s resolved: #

Changes proposed:

Type of change

Memory requirements

Performance

Does this change modify the behaviour of other functions? If so, which?

yes / no

Comment thread heat/core/indexing.py Outdated
Comment thread heat/core/indexing.py
Comment thread heat/core/indexing.py Outdated
Comment thread heat/core/indexing.py Outdated
Comment thread heat/core/indexing.py Outdated
Comment thread heat/core/indexing.py
)
# vectorized sorting of nz indices along axis 0
global_nonzero.balance_()
global_nonzero = manipulations.unique(global_nonzero, axis=0)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed? Seems like duplicate entries would be a bug at this point. Or are there some side effects of unique here?

Comment thread tests/core/test_indexing.py Outdated
self.assertEqual(len(wh), 2)
self.assertEqual(wh[0].gshape[0], 6)
self.assertEqual(wh[0].dtype, ht.int64)
self.assertEqual(wh[0].split, None)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this could be compared to numpy and torch too?

@github-project-automation github-project-automation Bot moved this from Todo to In Progress in Roadmap Jun 2, 2026
@brownbaerchen

Copy link
Copy Markdown
Collaborator

The issues arise because the API of nonzero is totally changed in this PR. See the following example:

import heat as ht
import torch
import numpy as np

a = ht.arange(7)

print(torch.nonzero(a.larray > 3))
# tensor([[4],
#         [5],
#         [6]])

print(np.nonzero(a.numpy() > 3))
# (array([4, 5, 6]),)

print(ht.nonzero(a > 3))
# on main: DNDarray([4, 5, 6], dtype=ht.int64, device=cpu:0, split=None)
# on features/nonzero-updates: (DNDarray(MPI-rank: 0, Shape: (3,), Split: None, Local Shape: (3,), Device: cpu:0, Dtype: int64, Data:
#                                        [4, 5, 6]),)

# new on features/nonzero-updates:
print(ht.nonzero(a > 3, as_tuple=False))
# DNDarray([[4],
#           [5],
#           [6]], dtype=ht.int64, device=cpu:0, split=None)

That is to say the previous API was neither numpy, nor torch and this PR changes it to default to numpy but with the option to get the torch behavior.

This is a good thing in principle, but I don't like to silently change the API. So, yes it's annoying that the current main relies on the previous API in a bunch of places and doesn't work with only these changes, but also we should clearly indicate this as a breaking, API-changing change.

We have two options:

Either way, we need to mention the API changes in the release notes. What do you think, @JuanPedroGHM, @mtar, @ClaudiaComito?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working indexing

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

2 participants