Sample performance #250

Yomguithereal · 2017-10-02T18:07:47Z

Currently, the sample function runs in O(k) time (k being the sample size) but in O(n) (n being the size of the population). This can be awful for algorithms that frequently sample large lists (k-means, NN-descent, etc.).

I developed some more efficient methods (each with its drawbacks, however) in the pandemonium library (reservoir sampling is still lacking but will soon be added). Would any of the library's methods be of interest to simple-statistics?
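To make the cost trade-off concrete, here is a minimal sketch of one way to draw k items without touching the whole population: a partial Fisher-Yates shuffle that records only the displaced positions in a `Map`. The name `sampleSparse` and its `rng` parameter are hypothetical; this is not code from simple-statistics or pandemonium, just an illustration under the assumption that the population is an indexable array.

```javascript
// Hypothetical helper, not part of simple-statistics or pandemonium.
// Draws k distinct items using O(k) time and O(k) extra memory by
// simulating the first k steps of a Fisher-Yates shuffle on a "virtual"
// index array, storing only the positions that have been displaced.
function sampleSparse(population, k, rng = Math.random) {
  const n = population.length;
  const size = Math.min(k, n);
  const displaced = new Map(); // position -> index virtually stored there
  const result = new Array(size);

  for (let i = 0; i < size; i++) {
    // Pick a position uniformly in [i, n).
    const j = i + Math.floor(rng() * (n - i));
    const atJ = displaced.has(j) ? displaced.get(j) : j;
    const atI = displaced.has(i) ? displaced.get(i) : i;

    result[i] = population[atJ];
    // Virtual swap: position j now holds what position i held.
    // Position i is never read again, so it need not be stored.
    displaced.set(j, atI);
  }

  return result;
}

const items = ['a', 'b', 'c', 'd', 'e', 'f'];
console.log(sampleSparse(items, 3));
```

The input array is left untouched, at the price of `Map` lookups on every draw; for k close to n, copying and shuffling the array is likely simpler and faster.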

tmcw · 2017-10-02T18:19:53Z

Absolutely! Simple-statistics doesn't have much flexibility in sampling: I started with the Fisher-Yates approach because I could be comfortable with it being suitably random. Would love to add other methods, especially ones that are similarly random-enough and ones that work with streaming data.
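For the streaming case, one well-known option is reservoir sampling (Algorithm R), which keeps a uniform sample of k items from a sequence of unknown length in a single pass. The sketch below is only an illustration: `reservoirSample` and its `rng` parameter are hypothetical names, not an existing simple-statistics or pandemonium API.

```javascript
// Hypothetical helper, not part of simple-statistics or pandemonium.
// Reservoir sampling (Algorithm R): one pass over any iterable,
// O(k) memory, each item ends up in the sample with probability k / n.
function reservoirSample(iterable, k, rng = Math.random) {
  const reservoir = [];
  let seen = 0;

  for (const item of iterable) {
    seen++;
    if (reservoir.length < k) {
      // Fill the reservoir with the first k items.
      reservoir.push(item);
    } else {
      // Replace a random slot with probability k / seen.
      const j = Math.floor(rng() * seen);
      if (j < k) reservoir[j] = item;
    }
  }

  return reservoir;
}

// Works with generators, so the data never has to be held in memory.
function* numbers(limit) {
  for (let i = 0; i < limit; i++) yield i;
}

console.log(reservoirSample(numbers(1000), 5));
```

Unlike a shuffle-based sample, this runs in O(n) time because every item must be seen once, so it suits streams rather than repeated sampling of a large in-memory list.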

dsaxton mentioned this issue Mar 4, 2021

Shuffle less during sample #565

Closed
