Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add new function nancov #14784

Closed
wants to merge 2 commits into from
Closed

Conversation

Harry-Kwon
Copy link

Added a new user facing function nancov, which calculates the covariance
of variables while ignoring nan values. Partially addresses features
requested in issue #14414 and improves upon PR #14688.

Added a new user facing function nancov, which calculates the covariance
of variables while ignoring nan values. Partially addresses features
requested in issue numpy#14414 and improves upon PR numpy#14688.
Base automatically changed from master to main March 4, 2021 02:04
@InessaPawson InessaPawson added triage review Issue/PR to be discussed at the next triage meeting and removed 61 - Stale labels Jun 16, 2022
@InessaPawson InessaPawson changed the title ENH: added new function nancov ENH: Add new function nancov Nov 2, 2022
@rossbar
Copy link
Contributor

rossbar commented Nov 16, 2022

We discussed this at the triage meeting - let's ping the mailing list to see if there's any discussion. There is apparently similar functionality in astropy, which may be an argument for upstreaming.

@rossbar rossbar removed the triage review Issue/PR to be discussed at the next triage meeting label Nov 16, 2022
@rgommers
Copy link
Member

#13198 (comment) still applies I think. This PR predates that discussion, and it hasn't moved in 3 years. My preference would be to not do this.

@rkern
Copy link
Member

rkern commented Dec 14, 2022

The main problem with the pairwise covariance calculation is that the guarantee of a positive definite matrix is no longer satisfied. It would need to be fixed up (e.g. finding the positive-definite matrix that is closest, in a Frobenius-norm sense, via a convex optimization problem) in order to be used by many downstream operations. The relationship between the covariance and correlation matrix will also be messed up because the scaling of the off-diagonal elements will be based on different data than the diagonal elements. You would need a corresponding nancorrcoeff() to do a similar pairwise estimation of the correlation coefficients.

Overall, given the subtleties of the statistical issues surrounding this topic, and the implication of several more functions to handle them, I'd probably suggest that scipy.stats would be a better place.

The code would need another pass. There are leftover print() calls and unidiomatic indexing.

@Harry-Kwon Harry-Kwon closed this Dec 14, 2022
@Harry-Kwon Harry-Kwon deleted the add-nancov branch December 14, 2022 15:30
@shoyer
Copy link
Member

shoyer commented Dec 15, 2022

We have an equivalent of nancov in Xarray (https://docs.xarray.dev/en/stable/generated/xarray.cov.html), although the interface is sufficiently different that we would not be able to use numpy.nancov.

A key question would be precisely defining what is means to "skip NaN" in a covariance calculation. If there is a mixture of NaN and non-NaN observations for different variables, how do those get handled?

@rkern
Copy link
Member

rkern commented Dec 16, 2022

This PR introduced two such options. Both are sensible and used in different situations (and available in R), but there are more options that suggests to me to push it out to scipy.stats and these other libraries.

Since the author has withdrawn the PR, unless if they want to champion it again, I think we can leave it settled.

@bashtage
Copy link
Contributor

Does SciPy still have nan functions? It looks like these were removed. While I think this is useful, I don't think it is very hard to implement in pure python when needed, especially if using the drop rows option.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

None yet

9 participants