Pairwise Summation/Reduction for var #103

Open
ParadaCarleton opened this issue Jan 30, 2022 · 1 comment

Comments

@ParadaCarleton
Contributor

At the moment, var does a naive sum of the squared deviations from the mean. However, when var is called on a collection, we can speed it up and also significantly reduce the floating-point error by using pairwise summation with a recursive algorithm -- roughly (exact for two equal-sized halves and the uncorrected variance):

mean([var(first_half; corrected=false), var(second_half; corrected=false)]) + var([mean(first_half), mean(second_half)]; corrected=false)

(Note that this would require implementing fused statistics like mean_and_var from StatsBase, or else we would have to do more than one pass -- one for mean and one for var.)
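A minimal sketch of what such a recursive, fused mean-and-variance pass could look like. The names pairwise_moments and pairwise_var and the blocksize keyword are hypothetical and not part of Statistics.jl or StatsBase. Each call returns the count, the mean, and the sum of squared deviations for its slice, so halves of unequal length can be merged and the corrected or uncorrected variance is chosen only at the end.

```julia
# Hypothetical helper (not part of Statistics.jl): returns (n, mean, M2) for x[lo:hi],
# where M2 is the sum of squared deviations from the mean of that slice.
function pairwise_moments(x::AbstractVector{<:Real}, lo::Int, hi::Int; blocksize::Int = 128)
    n = hi - lo + 1
    if n <= blocksize
        # Base case: plain two-pass computation on a small block.
        m = sum(@view x[lo:hi]) / n
        m2 = zero(m)
        for i in lo:hi
            m2 += (x[i] - m)^2
        end
        return n, m, m2
    end
    mid = lo + (hi - lo) ÷ 2
    na, ma, m2a = pairwise_moments(x, lo, mid; blocksize = blocksize)
    nb, mb, m2b = pairwise_moments(x, mid + 1, hi; blocksize = blocksize)
    # Merge the two halves using only scalars; no array of means is built.
    delta = mb - ma
    m = ma + delta * nb / n
    m2 = m2a + m2b + delta^2 * na * nb / n
    return n, m, m2
end

# Hypothetical front end mirroring the `corrected` keyword of `var`.
function pairwise_var(x::AbstractVector{<:Real}; corrected::Bool = true, blocksize::Int = 128)
    isempty(x) && throw(ArgumentError("collection must be non-empty"))
    blocksize >= 1 || throw(ArgumentError("blocksize must be at least 1"))
    n, _, m2 = pairwise_moments(x, firstindex(x), lastindex(x); blocksize = blocksize)
    return m2 / (n - corrected)
end
```

For example, pairwise_var([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]; corrected = false) should agree with var on the same data. The block size of 128 is an arbitrary assumption, and whether this is actually faster than the current implementation would need benchmarking.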

@nalimilan
Member

Interesting. Do you have references for this? One tricky part would be computing the variance of the means without storing them in an intermediate array; otherwise the performance benefit would probably be lost.
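On the intermediate-array concern, a short worked form of the merge step may help. It is the pairwise update usually attributed to Chan, Golub and LeVeque. The symbols are introduced here for illustration: each half carries its count $n$, mean $\bar{x}$, and sum of squared deviations $M_2$, with $n = n_a + n_b$ and $\delta = \bar{x}_b - \bar{x}_a$.

$$
\bar{x} = \bar{x}_a + \delta\,\frac{n_b}{n},
\qquad
M_2 = M_{2,a} + M_{2,b} + \delta^2\,\frac{n_a n_b}{n}
$$

Dividing the last term by $n$ gives $\delta^2 n_a n_b / n^2$, which for $n_a = n_b$ is $(\delta/2)^2$, the uncorrected variance of the two means. Under these assumptions the "variance of means" reduces to a scalar expression in $\delta$, so no array of means has to be stored.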
