ENH: Add optional parameter to ignore nan values in np.cov and np.corrcoef #14688
Perhaps this feature should be called nancov
I'll work on adding a np.nancov function with an option either to ignore entire observations that contain a nan value or to use, for each pair of variables, only the observations where both values are non-nan (pairwise).
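To make the two options concrete, here is a minimal sketch using only existing NumPy calls. The helper names `cov_complete_obs` and `cov_pairwise` are hypothetical and are not the code on the branch; variables are assumed to be rows and observations columns, as with the default `rowvar=True`.

```python
import numpy as np

def cov_complete_obs(x):
    """Hypothetical helper: drop every observation (column) containing a nan."""
    x = np.asarray(x, dtype=float)
    keep = ~np.isnan(x).any(axis=0)      # True for columns with no nan at all
    return np.cov(x[:, keep])

def cov_pairwise(x):
    """Hypothetical helper: for each pair of variables, use the observations
    where both of them are non-nan."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    out = np.full((n, n), np.nan)
    for i in range(n):
        for j in range(i, n):
            ok = ~(np.isnan(x[i]) | np.isnan(x[j]))
            if ok.sum() > 1:             # need at least two shared observations
                out[i, j] = out[j, i] = np.cov(x[i, ok], x[j, ok])[0, 1]
    return out

x = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, np.nan, 8.0],
              [1.0, 0.0, 1.0, np.nan]])
listwise = cov_complete_obs(x)   # uses only the first two observations
pairwise = cov_pairwise(x)       # each entry uses its own subset of observations
```

Note that the pairwise result is not guaranteed to be positive semi-definite, since each entry may be computed from a different set of observations.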
The hard part of designing multivariate nan-aware APIs is deciding how nans should be treated when they do not occur in the same observations for every variable. If I have three variables
and I call
which is the pairwise nan-dropped correlation? This is what pandas returns, for example. Similarly, should
@bashtage I was thinking of making the default behavior use only rows with all non-missing values, and add an optional So given
Calling
Whereas
non-nan values
If all rows in x have at least one nan value, np.cov(x, ignore_nan=True) will return an n by n array of nans, where n is the number of rows (variables) in x.
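The all-nan edge case can be sketched as follows. This is an illustration only: `cov_ignore_nan` is a hypothetical stand-in for the proposed `ignore_nan=True` behavior, built from existing NumPy calls, and it assumes that what matters is whether any observation (column) is left once those containing a nan are dropped. Returning nans when fewer than two observations survive is an assumption of this sketch.

```python
import numpy as np

def cov_ignore_nan(x):
    """Hypothetical stand-in for np.cov(x, ignore_nan=True)."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    n = x.shape[0]                        # number of variables (rows)
    keep = ~np.isnan(x).any(axis=0)       # observations with no nan
    if keep.sum() < 2:
        # Nothing usable is left (or no degrees of freedom): n by n nans.
        return np.full((n, n), np.nan)
    return np.cov(x[:, keep])

# Every observation contains at least one nan, so nothing survives.
x_all_nan = np.array([[1.0, np.nan, 3.0],
                      [np.nan, 5.0, 6.0],
                      [7.0, 8.0, np.nan]])
result = cov_ignore_nan(x_all_nan)
```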
I've pushed my version of nancov to this branch. I would greatly appreciate some input on the documentation and default behavior from someone who understands the subject and use cases better than I do. (Feel free to fork the branch and make a new PR if needed.)
Added a new user-facing function nancov, which calculates the covariance of variables while ignoring nan values. Partially addresses features requested in issue numpy#14414 and improves upon PR numpy#14688.
Addresses one of the features suggested in issue #14414.
Adds an optional parameter ignore_nan=False to np.cov and np.corrcoef that allows any observation in which any variable is np.nan to be ignored.
Example case:
Previous output (without ignore_nan):
Added behavior (with ignore_nan=True):
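As an illustration of the intended effect on np.corrcoef (this is not the example from the description above), the same result can be approximated today by filtering observations before the call. The sketch uses only existing NumPy functions; the data is made up for the illustration.

```python
import numpy as np

# Two variables (rows), four observations (columns); one value is missing.
x = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, np.nan, 8.0]])

# Without filtering, the nan propagates into every entry that involves
# the second variable.
plain = np.corrcoef(x)

# Emulating the proposed behavior: drop any observation with a nan first.
keep = ~np.isnan(x).any(axis=0)
filtered = np.corrcoef(x[:, keep])
```

Here the off-diagonal entries of `plain` are nan, while `filtered` is computed from the three complete observations only.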