scipy.stats.pearsonr conflict with matplotlib #13035

Closed
FarnoodF opened this issue Nov 3, 2020 · 15 comments
Labels
scipy.stats; upstream bug (Items related to bugs in upstream projects)

Comments

@FarnoodF

FarnoodF commented Nov 3, 2020

Calculating two pearsonrs and then plotting the arrays using matplotlib and adding ylim forces the first pearsonr to nan.

Reproducing code example:

import numpy as np
from scipy.stats import pearsonr
import matplotlib.pyplot as plt

a = np.random.randn(1000)
b = np.random.randn(1000)
c = np.random.randn(1000)

plt.figure()
plt.plot(a, b, '.')
plt.ylim(-1, 1)
plt.show()

p1 = pearsonr(a, b)[0]
p2 = pearsonr(a, c)[0]
print('p1: %f - p2: %f' % (p1, p2))

Scipy/Numpy/Python version information:

1.5.3 1.18.4 sys.version_info(major=3, minor=8, micro=3, releaselevel='candidate', serial=1)
@WarrenWeckesser
Member

@FarnoodF, could you include an example that shows exactly how you get a nan? When I add the appropriate imports to your code, I don't see any problems (but I didn't check with exactly the same versions of the packages that you reported).

@FarnoodF
Author

FarnoodF commented Nov 3, 2020

@WarrenWeckesser Thanks for the reply.
I updated the code in my original comment.
When I run this (with the versions mentioned), the first pearsonr is nan. If I run it again without plt.ylim(...), I get the correct value.

@WarrenWeckesser
Member

I switched my numpy to 1.18.4, and tried again. I'm on Mac OSX, with Python 3.8.3 installed with Miniconda, and with numpy 1.18.4, scipy 1.5.3, and matplotlib 3.3.2 installed with pip (not conda). With or without the ylim call, I don't get nan from pearsonr.

Can anyone else reproduce the problem reported here?

@mdhaber
Contributor

mdhaber commented Nov 3, 2020

Windows
Conda
Python 3.8.3
SciPy 1.5.0
Numpy 1.18.5
Matplotlib 3.2.2
No nan in output

@MarcinKonowalczyk
Contributor

No NaNs in Python 3.7.3, 3.8.3 and 3.8.6 on macOS (Catalina 10.15.7)

Scipy/Numpy set to 1.5.3/1.18.4 respectively.

3.7.3 - sys.version_info(major=3, minor=7, micro=3, releaselevel='final', serial=0)
3.8.3 - sys.version_info(major=3, minor=8, micro=3, releaselevel='final', serial=0) (I could not get the releaselevel and serial to match with pyenv.)
3.8.6 - sys.version_info(major=3, minor=8, micro=6, releaselevel='final', serial=0)

@FarnoodF what is your Matplotlib version? Let's check with that too. Actually, could you possibly paste the output of pip-chill?

pip install -Uq pip && pip install -Uq pip-chill && pip-chill -v | grep -v pip-chill

(Explanation: quietly upgrade pip; if that succeeds, quietly install pip-chill; if that succeeds, run pip-chill verbosely and drop lines that contain pip-chill.)
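
For readers who would rather collect the same information from inside Python, here is a minimal sketch using the standard library's `importlib.metadata` (available since Python 3.8). The helper name `installed_versions` and the package list are assumptions made for illustration; they are not part of pip-chill or of this thread.

```python
# Sketch: report installed versions of a few packages from within Python.
# The helper name and the package list are illustrative assumptions.
import sys
from importlib import metadata  # standard library since Python 3.8


def installed_versions(names):
    """Return {name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    print(sys.version_info)
    packages = ["numpy", "scipy", "matplotlib"]
    for name, version in installed_versions(packages).items():
        print(f"{name}=={version}")
```

Unlike pip-chill, this only reports the packages you name, so it is a complement to the command above rather than a replacement.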

@FarnoodF
Author

FarnoodF commented Nov 9, 2020

Thank you guys,
I tried it on my macOS machine and it works fine there, too. I get this issue only on my Windows machine.

My matplotlib version is 3.3.2.

And here is the pip-chill output:

-atplotlib==3.3.2
autoreject==0.2.1
brainflow==3.6.0
jupyter==1.0.0
jupyterthemes==0.20.0
meegkit==0.1
mne==0.21.1
oct2py==5.2.0
pip-chill==1.0.0
progress==1.5
psychopy==2020.2.5
pyarrow==2.0.0
pydrive==1.3.1
pygame==2.0.0
pygetwindow==0.0.9
pymanopt==0.2.5
pyopenssl==19.1.0
pyqtgraph==0.11.0
pyriemann==0.2.6
pyxdf==1.16.3
seaborn==0.11.0
statsmodels==0.12.1
tabulate==0.8.7
tensorflow==2.3.1
xlsxwriter==1.3.7

@MarcinKonowalczyk
Contributor

I can confirm that it works fine (i.e., no NaNs) on macOS (with the same Python version and all the same packages installed).

@mdhaber
Contributor

mdhaber commented Nov 21, 2020

Hi @FarnoodF, are you sure this is a SciPy issue? Can you please post the output of:

import numpy as np
from scipy.stats import pearsonr
import matplotlib.pyplot as plt

np.random.seed(0)
a = np.random.randn(1000)
b = np.random.randn(1000)
c = np.random.randn(1000)
d = a.copy()
e = b.copy()
f = c.copy()

plt.figure()
plt.plot(a, b, '.')
plt.ylim(-1, 1)
plt.show()

p1 = pearsonr(a, b)[0]
p2 = pearsonr(a, c)[0]
print('p1: %f - p2: %f' % (p1, p2))

print((a==d).all())
print((b==e).all())
print((c==f).all())
p3 = pearsonr(d, e)[0]
p4 = pearsonr(d, f)[0]
print(f'p3: {p3} - p4: {p4}')

Here I'm copying the arrays to see if they're being changed in a way that we can detect.

@FarnoodF
Author

Hi @mdhaber ,
Here is the output:

p1: nan - p2: 0.005712
True
True
False
p3: -0.0016395159240095588 - p4: -0.023933813840252754

@mdhaber
Contributor

mdhaber commented Nov 23, 2020

I'm so sorry @FarnoodF but I made typos in important places: I've corrected the code above and would appreciate it if you'd give it another shot.

However, we already see something bizarre. p3 = pearsonr(d, e)[0] parallels p1 = pearsonr(a, b)[0]. We expect the same result because print((a==d).all()) prints True and print((b==e).all()) prints True, yet p3 is a reasonable number while p1 is nan.

If you are motivated to find the problem, you could add breakpoints to the pearsonr function in stats.py and step through line by line to see where the NaN first appears. After that, I'd dig deeper into the statements in which it appears. At some point you'll hit pure python or compiled code, so you won't be able to go any deeper, and maybe from that we can guess how plotting is affecting the results of the calculation. I'd be curious to know what you find, but I'm not sure how to debug this further from outside pearsonr.
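
As an alternative to setting breakpoints inside SciPy, the same idea can be sketched outside of it: compute Pearson's r in explicit steps with plain NumPy and report the first intermediate that is not finite. This is a hedged illustration, not SciPy's actual source; the function names `pearson_r_steps` and `first_non_finite` are hypothetical, and the steps (center, normalize, dot product) are only assumed to resemble what pearsonr does internally.

```python
import numpy as np


def pearson_r_steps(x, y):
    """Compute Pearson's r step by step, keeping every intermediate value."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    steps = {}  # insertion order matches the order of computation
    steps["xmean"] = x.mean()
    steps["ymean"] = y.mean()
    steps["xm"] = x - steps["xmean"]  # centered x
    steps["ym"] = y - steps["ymean"]  # centered y
    steps["normxm"] = np.linalg.norm(steps["xm"])
    steps["normym"] = np.linalg.norm(steps["ym"])
    steps["r"] = np.dot(steps["xm"] / steps["normxm"],
                        steps["ym"] / steps["normym"])
    return steps


def first_non_finite(steps):
    """Return the name of the first step containing nan or inf, else None."""
    for name, value in steps.items():
        if not np.all(np.isfinite(value)):
            return name
    return None


rng = np.random.default_rng(0)
a = rng.standard_normal(1000)
b = rng.standard_normal(1000)
steps = pearson_r_steps(a, b)
print("first non-finite step:", first_non_finite(steps))
print("r:", steps["r"])
```

Running this once before and once after the plotting calls on an affected machine would show whether the NaN first appears in the mean, the norm, or the final dot product. If every step here stays finite while pearsonr still returns nan, the difference lies in a routine this sketch does not call.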

@FarnoodF
Author

Thank you @mdhaber, I will definitely dig more into it very soon.

As you pointed out, something bizarre is happening. It is only a matter of where I place the matplotlib calls: they force the first pearsonr calculated afterwards to nan.
If I put a plt call before every pearsonr, they all print nan.

@WarrenWeckesser
Member

WarrenWeckesser commented Nov 28, 2020

I get this issue only on my windows machine.

Which version and build of Windows are you using? There have been problems reported using Windows 10 build 2004 (see, for example, numpy/numpy#16744).
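
A small sketch for gathering the operating system details asked for here, using only the standard library `platform` module. The helper name `os_report` is hypothetical, and the exact strings returned differ between operating systems, so no expected output is shown.

```python
import platform


def os_report():
    """Collect basic operating system details as a dict of strings."""
    return {
        "system": platform.system(),    # OS name
        "release": platform.release(),  # OS release
        "version": platform.version(),  # OS version / build string
        "machine": platform.machine(),  # hardware architecture
    }


if __name__ == "__main__":
    for key, value in os_report().items():
        print(f"{key}: {value}")
```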

@FarnoodF
Author

FarnoodF commented Dec 3, 2020

@WarrenWeckesser
I am using Windows 10 version 2004 (OS Build 19041.630).
I checked out the issue you mentioned, and I get exactly the same error.
After calling matplotlib plotting, I get an SVD did not converge error.
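
The symptom described above can be probed with a small check like the following sketch, which runs an SVD through `np.linalg.svd` and catches `np.linalg.LinAlgError` (the exception NumPy raises when the SVD does not converge). The helper name `svd_health_check`, the matrix size, and the seed are assumptions for illustration. Whether this particular matrix triggers the failure on an affected machine is not confirmed in this thread.

```python
import numpy as np


def svd_health_check(n=50, seed=0):
    """Run a small SVD and report (ok, message)."""
    rng = np.random.default_rng(seed)
    m = rng.standard_normal((n, n))
    try:
        # Only singular values are needed for this check.
        s = np.linalg.svd(m, compute_uv=False)
    except np.linalg.LinAlgError as exc:
        return False, str(exc)
    if not np.all(np.isfinite(s)):
        return False, "SVD returned non-finite singular values"
    return True, "ok"


if __name__ == "__main__":
    print(svd_health_check())
```

Calling `svd_health_check()` once before and once after the matplotlib plotting calls would show whether the linear algebra routines change behavior after plotting.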

@mdhaber
Contributor

mdhaber commented Dec 19, 2020

So do we think this is a NumPy issue, @WarrenWeckesser?

@WarrenWeckesser
Member

Actually, it is apparently a Windows issue that shows up in NumPy. According to the latest comment in the NumPy issue, the problem should be fixed when an update to Windows is released.

I'm closing the issue, but if anyone prefers to keep this SciPy issue open until we're sure the Windows update fixes the problem, feel free to reopen it.

WarrenWeckesser added the upstream bug label (Items related to bugs in upstream projects) Dec 19, 2020