ENH: Add new function nancov #14784
Added a new user-facing function nancov, which calculates the covariance of variables while ignoring NaN values. Partially addresses features requested in issue numpy#14414 and improves upon PR numpy#14688.
We discussed this at the triage meeting. Let's ping the mailing list to see if there's any discussion. There is apparently similar functionality in
#13198 (comment) still applies, I think. This PR predates that discussion, and it hasn't moved in 3 years. My preference would be to not do this.
The main problem with the pairwise covariance calculation is that the guarantee of a positive definite matrix is no longer satisfied. It would need to be fixed up (e.g. finding the positive-definite matrix that is closest, in a Frobenius-norm sense, via a convex optimization problem) in order to be used by many downstream operations. The relationship between the covariance and correlation matrix will also be messed up because the scaling of the off-diagonal elements will be based on different data than the diagonal elements. You would need a corresponding Overall, given the subtleties of the statistical issues surrounding this topic, and the implication of several more functions to handle them, I'd probably suggest that The code would need another pass. There are leftover |
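To make the concern above concrete, here is a minimal sketch of a pairwise-complete covariance that yields a matrix with a negative eigenvalue. The helper `pairwise_cov` and the toy data are hypothetical illustrations, not the code from this PR; the data is deliberately constructed so that each pair of variables is observed on a different set of rows.

```python
import numpy as np


def pairwise_cov(data, ddof=1):
    """Hypothetical helper: covariance where each entry (i, j) uses only
    the rows in which both column i and column j are non-NaN.
    Rows are observations, columns are variables."""
    n_vars = data.shape[1]
    out = np.empty((n_vars, n_vars))
    for i in range(n_vars):
        for j in range(i, n_vars):
            both = ~np.isnan(data[:, i]) & ~np.isnan(data[:, j])
            xi = data[both, i]
            xj = data[both, j]
            value = np.sum((xi - xi.mean()) * (xj - xj.mean())) / (both.sum() - ddof)
            out[i, j] = out[j, i] = value
    return out


nan = np.nan
data = np.array([
    # x and y move together, z missing
    [1.0, 1.0, nan],
    [2.0, 2.0, nan],
    [3.0, 3.0, nan],
    # y and z move together, x missing
    [nan, 1.0, 1.0],
    [nan, 2.0, 2.0],
    [nan, 3.0, 3.0],
    # x and z move in opposite directions, y missing
    [1.0, nan, 3.0],
    [2.0, nan, 2.0],
    [3.0, nan, 1.0],
])

cov = pairwise_cov(data)
eigenvalues = np.linalg.eigvalsh(cov)  # ascending order
print(cov)
print(eigenvalues)  # the smallest eigenvalue is negative
```

Each off-diagonal entry is internally consistent, but the three pairwise relationships cannot all hold for a single joint distribution, so the assembled matrix is not a valid covariance matrix. Note also that the diagonal entries come from six observations while each off-diagonal entry comes from three, which is the scaling mismatch mentioned above.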
We have an equivalent of A key question would be precisely defining what it means to "skip NaN" in a covariance calculation. If there is a mixture of NaN and non-NaN observations for different variables, how do those get handled?
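As a small illustration of why the definition matters, the sketch below contrasts two common readings of "skip NaN" on hypothetical toy data: dropping every row that contains any NaN, versus using every available observation for each variable. These two readings are an assumption chosen for illustration and are not necessarily the exact options implemented in this PR.

```python
import numpy as np

# Rows are observations, columns are the variables x and y.
data = np.array([
    [1.0, 2.0],
    [2.0, np.nan],  # y is missing in this observation
    [3.0, 6.0],
    [4.0, 8.0],
])

# Reading 1: drop any row that contains a NaN, then compute as usual.
complete_rows = ~np.isnan(data).any(axis=1)
listwise = np.cov(data[complete_rows], rowvar=False)
var_x_listwise = listwise[0, 0]

# Reading 2: use every non-NaN value of x for its variance,
# even where y is missing in the same row.
var_x_available = np.nanvar(data[:, 0], ddof=1)

print(var_x_listwise)   # variance of [1, 3, 4]
print(var_x_available)  # variance of [1, 2, 3, 4]
```

The two readings give different variances for the same column, so a `nancov` function would have to state which one it implements.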
This PR introduced two such options. Both are sensible and used in different situations (and available in Since the author has withdrawn the PR, unless they want to champion it again, I think we can leave it settled.
Does SciPy still have