scipy.stats.pearsonr conflict with matplotlib #13035

FarnoodF · 2020-11-03T18:24:54Z

Plotting the arrays with matplotlib and adding ylim, then calculating two pearsonrs, forces the first pearsonr to nan.

Reproducing code example:

import numpy as np
from scipy.stats import pearsonr
import matplotlib.pyplot as plt

a = np.random.randn(1000)
b = np.random.randn(1000)
c = np.random.randn(1000)

plt.figure()
plt.plot(a, b, '.')
plt.ylim(-1, 1)
plt.show()

p1 = pearsonr(a, b)[0]
p2 = pearsonr(a, c)[0]
print('p1: %f - p2: %f' % (p1, p2))

Scipy/Numpy/Python version information:

1.5.3 1.18.4 sys.version_info(major=3, minor=8, micro=3, releaselevel='candidate', serial=1)
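
For reference, version details like these can be collected with a short script. This is only a minimal sketch using the standard library (`importlib.metadata`, available from Python 3.8); the helper name `report_versions` is made up for illustration, and matplotlib is included as an extra because its version is relevant to this report.

```python
import sys
from importlib import metadata


def report_versions(packages=("scipy", "numpy", "matplotlib")):
    """Return a dict mapping each package name to its installed version.

    Hypothetical helper: packages that are not installed map to None.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version}")
    print(sys.version_info)
```

Reading the versions from package metadata avoids importing the packages themselves, which may be useful when an import is the thing under suspicion.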

WarrenWeckesser · 2020-11-03T19:07:53Z

@FarnoodF, could you include an example that shows exactly how you get a nan? When I add the appropriate imports to your code, I don't see any problems (but I didn't check with exactly the same versions of the packages that you reported).

FarnoodF · 2020-11-03T19:17:03Z

@WarrenWeckesser Thanks for the reply.
I updated the code in my original comment.
When I run this (with the versions mentioned), the first pearsonr is nan. If I run it again without plt.ylim(...), it gives the correct value.

WarrenWeckesser · 2020-11-03T19:39:55Z

I switched my numpy to 1.18.4, and tried again. I'm on Mac OSX, with Python 3.8.3 installed with Miniconda, and with numpy 1.18.4, scipy 1.5.3, and matplotlib 3.3.2 installed with pip (not conda). With or without the ylim call, I don't get nan from pearsonr.

Can anyone else reproduce the problem reported here?

mdhaber · 2020-11-03T19:55:40Z

Windows, Conda, Python 3.8.3, SciPy 1.5.0, NumPy 1.18.5, Matplotlib 3.2.2: no nan in output.

MarcinKonowalczyk · 2020-11-05T22:16:09Z

No NaNs in Python 3.7.3, 3.8.3 and 3.8.6 on macOS (Catalina 10.15.7).

Scipy/Numpy set to 1.5.3/1.18.4 respectively.

3.7.3 - sys.version_info(major=3, minor=7, micro=3, releaselevel='final', serial=0)
3.8.3 - sys.version_info(major=3, minor=8, micro=3, releaselevel='final', serial=0) (I could not get the releaselevel and serial set with pyenv.)
3.8.6 - sys.version_info(major=3, minor=8, micro=6, releaselevel='final', serial=0)

@FarnoodF what is your Matplotlib version? Let's check with that too. Actually, could you possibly paste the output of pip-chill?

pip install -Uq pip && pip install -Uq pip-chill && pip-chill -v | grep -v pip-chill

Explanation: quietly upgrade pip; if successful, quietly install pip-chill; if successful, run pip-chill verbosely and drop the lines that contain pip-chill.

FarnoodF · 2020-11-09T20:18:26Z

Thank you guys,
I tried it with my MacOS and it works fine, too. I get this issue only on my windows machine.

My matplotlib version is 3.3.2.

And here is the pip-chill output:

-atplotlib==3.3.2
autoreject==0.2.1
brainflow==3.6.0
jupyter==1.0.0
jupyterthemes==0.20.0
meegkit==0.1
mne==0.21.1
oct2py==5.2.0
pip-chill==1.0.0
progress==1.5
psychopy==2020.2.5
pyarrow==2.0.0
pydrive==1.3.1
pygame==2.0.0
pygetwindow==0.0.9
pymanopt==0.2.5
pyopenssl==19.1.0
pyqtgraph==0.11.0
pyriemann==0.2.6
pyxdf==1.16.3
seaborn==0.11.0
statsmodels==0.12.1
tabulate==0.8.7
tensorflow==2.3.1
xlsxwriter==1.3.7

MarcinKonowalczyk · 2020-11-11T16:48:33Z

I can confirm that it works fine (aka no NaNs) on MacOS (with the same python version, and all the same packages installed).

mdhaber · 2020-11-21T08:22:33Z

Hi @FarnoodF, are you sure this is a SciPy issue? Can you please post the output of:

import numpy as np
from scipy.stats import pearsonr
import matplotlib.pyplot as plt

np.random.seed(0)
a = np.random.randn(1000)
b = np.random.randn(1000)
c = np.random.randn(1000)
d = a.copy()
e = b.copy()
f = c.copy()

plt.figure()
plt.plot(a, b, '.')
plt.ylim(-1, 1)
plt.show()

p1 = pearsonr(a, b)[0]
p2 = pearsonr(a, c)[0]
print('p1: %f - p2: %f' % (p1, p2))

print((a==d).all())
print((b==e).all())
print((c==f).all())
p3 = pearsonr(d, e)[0]
p4 = pearsonr(d, f)[0]
print(f'p3: {p3} - p4: {p4}')

Here I'm copying the arrays to see if they're being changed in a way that we can detect.

FarnoodF · 2020-11-23T17:22:53Z

Hi @mdhaber ,
Here is the output:

p1: nan - p2: 0.005712
True
True
False
p3: -0.0016395159240095588 - p4: -0.023933813840252754

mdhaber · 2020-11-23T21:43:52Z

I'm so sorry @FarnoodF but I made typos in important places: I've corrected the code above and would appreciate it if you'd give it another shot.

However, we already see something bizarre. p3 = pearsonr(d, e)[0] parallels p1 = pearsonr(a, b)[0]. We expect the same result because print((a==d).all()) prints True and print((b==e).all()) prints True, yet p3 is a reasonable number while p1 is nan.

If you are motivated to find the problem, you could add breakpoints to the pearsonr function in stats.py and step through line by line to see where the NaN first appears. After that, I'd dig deeper into the statements in which it appears. At some point you'll hit pure python or compiled code, so you won't be able to go any deeper, and maybe from that we can guess how plotting is affecting the results of the calculation. I'd be curious to know what you find, but I'm not sure how to debug this further from outside pearsonr.
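
As a rough stand-in for stepping through stats.py, the calculation can also be redone by hand outside pearsonr, checking each intermediate value for NaN. The sketch below follows the textbook formula for Pearson's r and is not a copy of the SciPy source; the helper name `pearsonr_steps` and the step names are made up for illustration.

```python
import numpy as np


def pearsonr_steps(x, y):
    """Recompute Pearson's r step by step.

    Illustrative helper (textbook formula, not the SciPy implementation).
    Returns (r, name_of_first_non_finite_step_or_None).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    steps = {}  # insertion order is the order of computation
    steps["xmean"] = x.mean()
    steps["ymean"] = y.mean()
    steps["xm"] = x - steps["xmean"]  # centered x
    steps["ym"] = y - steps["ymean"]  # centered y
    steps["normxm"] = np.linalg.norm(steps["xm"])
    steps["normym"] = np.linalg.norm(steps["ym"])
    steps["r"] = np.dot(steps["xm"] / steps["normxm"],
                        steps["ym"] / steps["normym"])

    # Find the earliest step whose value contains NaN or inf.
    first_bad = None
    for name, value in steps.items():
        if not np.all(np.isfinite(value)):
            first_bad = name
            break
    return float(steps["r"]), first_bad


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    a = rng.randn(1000)
    b = rng.randn(1000)
    print(pearsonr_steps(a, b))
```

Running this right after the plotting calls, on the same arrays passed to pearsonr, might show which operation (mean, norm, or dot product) first produces the NaN.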

FarnoodF · 2020-11-24T14:39:42Z

Thank you @mdhaber, I will definitely dig more into it very soon.

As you pointed out, something bizarre is happening. It is only a matter of where I place the matplotlib functions: they force the first calculated pearsonr to nan.
If I put one plt call before every pearsonr, they all print out nan.

WarrenWeckesser · 2020-11-28T18:52:46Z

"I get this issue only on my windows machine."

Which version and build of Windows are you using? There have been problems reported using Windows 10 build 2004 (see, for example, numpy/numpy#16744).

FarnoodF · 2020-12-03T15:30:41Z

@WarrenWeckesser
I am using Windows 10 version 2004 (OS Build 19041.630).
I checked out the issue you mentioned and I get exactly the same error as well.
After calling matplotlib plotting, I get an SVD did not converge error.
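
A quick way to see whether basic linear algebra is affected at a given point in a script could look like the sketch below. It is an assumption-laden illustration, not the check used in the NumPy issue: the function name `linalg_sanity_check` and the chosen sizes and tolerances are made up.

```python
import numpy as np


def linalg_sanity_check(n=50, seed=0):
    """Return True if a few basic linear-algebra results look sane.

    Hypothetical helper: compares a NumPy dot product with a plain Python
    sum, then checks that SVD converges and reconstructs the matrix.
    """
    rng = np.random.RandomState(seed)
    x = rng.randn(n)
    m = rng.randn(n, n)

    # Dot product via NumPy versus a plain Python sum of squares.
    dot_np = float(np.dot(x, x))
    dot_py = sum(float(v) * float(v) for v in x)
    if not np.isfinite(dot_np) or abs(dot_np - dot_py) > 1e-8 * abs(dot_py):
        return False

    # SVD should converge and reproduce the original matrix.
    try:
        u, s, vt = np.linalg.svd(m)
    except np.linalg.LinAlgError:
        return False
    return bool(np.allclose((u * s) @ vt, m))


if __name__ == "__main__":
    print("linear algebra looks sane:", linalg_sanity_check())
```

Calling it once before and once after plt.show() would indicate whether the plotting call changes the outcome.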

mdhaber · 2020-12-19T20:18:45Z

So do we think this is a NumPy issue, @WarrenWeckesser?

WarrenWeckesser · 2020-12-19T20:43:32Z

Actually, it is apparently a Windows issue that shows up in NumPy. According to the latest comment in the NumPy issue, the problem should be fixed when an update to Windows is released.

I'm closing the issue, but if anyone prefers to keep this SciPy issue open until we're sure the Windows update fixes the problem, feel free to reopen it.

AtsushiSakai added the scipy.stats label Nov 4, 2020

mdhaber mentioned this issue Nov 21, 2020

A Solid Foundation for Statistics in Python with SciPy mdhaber/scipy#26

Closed

WarrenWeckesser closed this as completed Dec 19, 2020

WarrenWeckesser added the upstream bug Items related to bugs in upstream projects label Dec 19, 2020
