Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Do not raise deprecation warning for all nans in unique #19301

Merged
merged 4 commits into from Jun 29, 2021

Conversation

thomasjpfan
Copy link
Contributor

@thomasjpfan thomasjpfan commented Jun 22, 2021

Fixes gh-19300

This PR adjusts np.unique for the edge cases where all values are nan.

@seberg seberg added this to the 1.21.1 release milestone Jun 23, 2021
@charris charris changed the title BUG: Do not raise depreation warning for all nans in unique BUG: Do not raise deprecation warning for all nans in unique Jun 23, 2021
@charris charris added the 09 - Backport-Candidate PRs tagged should be backported label Jun 25, 2021
@seberg
Copy link
Member

seberg commented Jun 29, 2021

Thanks @thomasjpfan, LGTM.

@seberg seberg merged commit 6ff787b into numpy:main Jun 29, 2021
charris pushed a commit to charris/numpy that referenced this pull request Jul 1, 2021
…9301)

This PR adjusts np.unique for the edge cases where all values are nan.

Fixes numpygh-19300
@charris charris removed this from the 1.21.1 release milestone Jul 1, 2021
@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Jul 1, 2021
charris added a commit that referenced this pull request Jul 1, 2021
BUG: Do not raise deprecation warning for all nans in unique (#19301)
@mattip
Copy link
Member

mattip commented Aug 2, 2021

This changed the result of np.unique(np.full(5, np.nan, dtype=np.float32)). It used to be array([nan, nan, nan, nan, nan], dtype=float32) but after this PR it is array([nan], dtype=float32). I think the previous value was correct: nan is always unique.

@seberg
Copy link
Member

seberg commented Aug 2, 2021

Oh, I missed that :(. @thomasjpfan do you want to have a look? If there is an argument to be made for returning only one NaN, that would indeed be a separate thing.

@seberg
Copy link
Member

seberg commented Aug 2, 2021

Sorry, this should have been on #18070 which changed the behaviour. Maybe that made this fix necessary though.

@thomasjpfan
Copy link
Contributor Author

If there is an argument to be made for returning only one NaN, that would indeed be a separate thing.

In scikit-learn, we have a custom _unique to return only one NaN. From a practical machine learning point of view the nans should be treated all the same.

@mattip
Copy link
Member

mattip commented Aug 2, 2021

I came across this from a comparison of the Array API spec to NumPy. The spec returns multiple nans for unique, as does PyTorch:

>>> torch.unique(torch.tensor([math.nan]*5))
tensor([nan, nan, nan, nan, nan])

but that may be because NumPy used to also return multiple nans. Maybe chicken-and-egg problem?

@seberg
Copy link
Member

seberg commented Aug 2, 2021

Yeah, we changed in 1.21, it has a release note, but I don't think the list was pinged on it, which would have been good. I have send a mail to the list now. I lean towards just rolling with it, but lets see what everyone thinks, maybe I am missing something anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

np.unique raises a deprecation warning when all entries are nan.
4 participants