variance and covariance inconsistency #17086

Closed
tanwang2020 opened this issue Aug 14, 2020 · 7 comments
Labels
33 - Question (question about NumPy usage or development), 50 - Duplicate

Comments

@tanwang2020

This may be a trivial problem, but there seems to be an inconsistency between variance and covariance:

import numpy as np
x = [1,2,3,4,5,6,7]
np.var(x)

produces 4.0, but then

np.cov(x,x)

produces

array([[4.66666667, 4.66666667],
       [4.66666667, 4.66666667]])

Of course, mathematically var(x) = cov(x,x). The difference arises because np.var divides the sum of squared deviations by n, while np.cov divides it by n - 1. For a user who does not pay attention to this detail, the inconsistency may cause problems.
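As a minimal sketch of how the two results can be reconciled, both functions accept a `ddof` argument (np.cov also accepts `bias`), so either divisor can be requested explicitly. The variable names below are illustrative only and are not part of the original report.

```python
import numpy as np

x = [1, 2, 3, 4, 5, 6, 7]

# np.var defaults to ddof=0: the sum of squared deviations is divided by n.
pop_var = np.var(x)
# ddof=1 divides by n - 1 instead, which is np.cov's default normalization.
sample_var = np.var(x, ddof=1)

# np.cov divides by n - 1 by default; ddof=0 switches it to dividing by n.
cov_sample = np.cov(x, x)
cov_pop = np.cov(x, x, ddof=0)

# With matching divisors the two functions agree up to rounding.
print(np.isclose(pop_var, cov_pop[0, 1]))
print(np.isclose(sample_var, cov_sample[0, 1]))
```

Passing `ddof` explicitly in both calls avoids relying on either default.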

@mattip
Member

mattip commented Aug 15, 2020

Duplicate of #5835, and documented in the note to var (about 2/3 down the page)

mattip added the 33 - Question and 50 - Duplicate labels on Aug 15, 2020
@tanwang2020
Author

tanwang2020 commented Aug 15, 2020 via email

@tanwang2020
Author

tanwang2020 commented Aug 15, 2020 via email

@tanwang2020
Author

tanwang2020 commented Aug 15, 2020 via email

@mattip
Member

mattip commented Aug 15, 2020

Floating point numbers are tricky. See this discussion.

>>> (3 * 0.2) * 1000.0
600.0000000000001
>>> 3 * (0.2 * 1000.0)
600.0
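The session above shows that floating-point multiplication is not associative. A small sketch of how such results are usually compared, using a tolerance instead of exact equality (the names `a` and `b` are illustrative):

```python
import math

a = (3 * 0.2) * 1000.0
b = 3 * (0.2 * 1000.0)

# The two groupings round differently, so exact equality fails.
exactly_equal = a == b
# math.isclose compares within a relative tolerance (default 1e-09).
close_enough = math.isclose(a, b)

print(exactly_equal, close_enough)
```

For arrays, `np.isclose` and `np.allclose` play the same role.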

@tanwang2020
Author

tanwang2020 commented Aug 16, 2020 via email

@rgommers
Member

closing as a duplicate


3 participants