[Feature Request]: Nan Values for correlation and cross correlation #14414

Jeanselme · 2019-09-03T13:56:06Z

Would it be possible to automatically ignore NaN values when computing np.corrcoef or np.correlate? We could create a function like np.nan_correlate.

In the case of corrcoef this is straightforward: drop every position where either array is NaN. For correlate, however, the sum pairs elements at different lags, so removing NaN entries would shift the two series relative to each other and produce unwanted results.
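For the corrcoef case, a minimal sketch of the pairwise-deletion idea could look like the following. The helper name `corrcoef_ignoring_nan` is hypothetical (it is not a NumPy function), and the sketch assumes two 1-D arrays of equal length.

```python
import numpy as np

def corrcoef_ignoring_nan(x, y):
    """Pearson correlation of two equal-length 1-D arrays,
    using only positions where neither value is NaN.
    Hypothetical helper, not part of NumPy."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Keep a position only if both series have a value there,
    # so the remaining samples stay aligned pair by pair.
    keep = ~(np.isnan(x) | np.isnan(y))
    return np.corrcoef(x[keep], y[keep])[0, 1]

a = np.array([0, 1, 2, 3, 4])
b = np.array([0, 1, np.nan, 3, 4])
r = corrcoef_ignoring_nan(a, b)
print(r)
```

Because the same positions are dropped from both arrays, the pairing of samples is preserved. The same trick does not carry over to correlate, where the pairing changes with every lag.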

Current Behavior

import numpy as np

a = np.array([0, 1, 2, 3, 4])
b = np.array([0, 1, np.nan, 3, 4])

np.correlate(a, b, 'full')

Returns array([ 0., 4., nan, nan, nan, nan, nan, 4., 0.])

In some cases it would be useful to return instead:
array([ 0., 4., 11., 18., 26., 14., 3., 4., 0.])

This simply skips any NaN term in each summation (equivalently, treats it as zero).
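One way to get this behavior today, sketched under the assumption that skipping a NaN term is the same as giving it zero weight in the sum, is to zero-fill the NaNs before calling np.correlate. The wrapper name `correlate_skip_nan` is hypothetical.

```python
import numpy as np

def correlate_skip_nan(a, v, mode="valid"):
    """np.correlate with NaN entries contributing nothing to any sum.
    Hypothetical wrapper, not part of NumPy."""
    a = np.asarray(a, dtype=float)
    v = np.asarray(v, dtype=float)
    # Replace NaN with 0 so the affected products vanish; the
    # positions (and therefore the lags) of all other samples
    # are left untouched.
    a_filled = np.where(np.isnan(a), 0.0, a)
    v_filled = np.where(np.isnan(v), 0.0, v)
    return np.correlate(a_filled, v_filled, mode)

a = np.array([0, 1, 2, 3, 4])
b = np.array([0, 1, np.nan, 3, 4])
result = correlate_skip_nan(a, b, "full")
print(result)
```

Zero-filling keeps the alignment intact, unlike deleting the NaN entries, but note that each lag then sums over a different number of valid pairs.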

Harry-Kwon · 2019-10-12T08:45:10Z

I'd like to work on adding an optional argument to those functions to ignore NaN values.

Added a new user-facing function nancov, which calculates the covariance of variables while ignoring NaN values. Partially addresses the features requested in issue numpy#14414 and improves upon PR numpy#14688.

aleksejs-fomins · 2020-02-10T11:14:45Z

First of all, I would appreciate this functionality as a user. Thank you for bringing this up.

Secondly, please be careful when implementing this. For many applications, the interesting quantity is not the correlation itself but the mean correlation, corr(x, y) / len(x). So far, users have had to remove NaNs manually before processing, which is tedious but correct. If a user has NaNs in the data and the correlate function drops them implicitly, the user might unsuspectingly go on to divide the correlation by len(x), whereas they should divide only by the number of non-NaN terms in the sum. So, firstly, I suggest that correlate should emit a warning if there are NaNs in the arguments. Secondly, perhaps it makes sense to implement a mean correlation function (e.g. correlate_mean(x, y, ...)), which would divide the overlap by its non-NaN length.
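A rough sketch of what such a mean correlation could look like is below. It is only one possible reading of the suggestion: `correlate_mean` does not exist in NumPy, and the choices to return the per-lag counts and to report NaN where no valid pair overlaps are assumptions made for illustration.

```python
import numpy as np

def correlate_mean(x, y, mode="full"):
    """Cross-correlation divided, lag by lag, by the number of
    pairs in which both values are non-NaN.
    Hypothetical function, not part of NumPy."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_ok = ~np.isnan(x)
    y_ok = ~np.isnan(y)
    # Sum of products with NaN terms contributing zero.
    sums = np.correlate(np.where(x_ok, x, 0.0),
                        np.where(y_ok, y, 0.0), mode)
    # Correlating the validity masks counts the valid pairs per lag.
    counts = np.correlate(x_ok.astype(float), y_ok.astype(float), mode)
    # Leave NaN where a lag has no valid pair at all.
    means = np.full(sums.shape, np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)
    return means, counts

a = np.array([0, 1, 2, 3, 4])
b = np.array([0, 1, np.nan, 3, 4])
means, counts = correlate_mean(a, b)
print(counts)
print(means)
```

Returning the counts alongside the means lets the caller see how many samples back each lag, which addresses the concern about dividing by len(x) by mistake.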

rossbar · 2020-07-23T01:25:16Z

The addition of more nan* functions has been discussed and the current consensus is against doing so, so I will close this for now. If you are interested in pursuing the feature request, consider bringing it up on the mailing list (you can link to the issues for context).

Harry-Kwon mentioned this issue Oct 13, 2019

ENH: Add optional parameter to ignore nan values in np.cov and np.corrcoef #14688

Closed

Harry-Kwon mentioned this issue Oct 26, 2019

ENH: Add new function nancov #14784

Closed

rossbar changed the title ~~Nan Values for correlation and cross correlation~~ [Feature Request]: Nan Values for correlation and cross correlation Jul 23, 2020

rossbar closed this as completed Jul 23, 2020
