FASTSTATS - fast algorithm to do statistics

This package is my current exploration on how to make fast statistics on big data. Functions are typically several orders of magnitude faster, or so they claim.

I recently discovered how slow certain algorithms in numpy/scipy could be very robust but very slow because they have to handle many tests and dimensions and multipurpose usage and so on. They are most legit implementation decisions. However when you deal with tons of data, say 10^7 points in a 10 or 20 dimensions, the slightest overhead could end up overloading your computer and potentially crash your system.

Note: algorithms in this package as usage targeted. This is how we can speed up the algorithms.

API documentation: here

Quick example

from scipy.stats import gaussian_kde

def npkde(x, xe):
   kde = gaussian_kde(x)
   r = kde.evaluate(xe)
   return r

x = np.random.normal(0, 1, 1e6)
xe = np.linspace(0., 1., 256)

%timeit fastkde1D(x)
10 loops, best of 3: 31.9 ms per loop

%timeit npkde(x, xe)
1 loops, best of 3: 11.8 s per loop

The result is a **~ 10 ^ 4 speed up** !!! Results are identical Note that gaussian_kde is not optimized for this specific application.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

FASTSTATS - fast algorithm to do statistics

Quick example

Files

README.md

Latest commit

History

README.md

File metadata and controls

FASTSTATS - fast algorithm to do statistics

Quick example