Update statistics.py Faster implementation for normal distribution #118958

ThibaultDECO · 2024-05-11T22:48:34Z

Faster implementation for normal distribution

Faster

cpython-cla-bot · 2024-05-11T22:48:37Z

All commit authors signed the Contributor License Agreement.

bedevere-app · 2024-05-11T22:48:38Z

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

eendebakpt · 2024-05-12T13:23:15Z

@ThibaultDECO Your PR needs an issue describing the bugfix or improvements. See https://devguide.python.org/getting-started/pull-request-lifecycle/#quick-guide

About the PR: could you provide benchmarks showing how much this improves performance? I suspect that at least some of the changes have no impact.

rhettinger · 2024-05-14T17:46:21Z

Nice effort, but this doesn't make sense. The expression 1/2 is already peephole-optimized to the constant 0.5 at compile time.
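A quick way to check the constant-folding claim is to look at the constants stored on a compiled function. The sketch below assumes CPython; `scaled` is a hypothetical example function and is not taken from statistics.py.

```python
# Sketch: check whether CPython has folded the literal expression 1/2
# into the single constant 0.5 by the time the function is compiled.

def scaled(t):
    # Hypothetical example function, used only for illustration.
    return 1/2 * t

# Folded constants are stored in the code object's constant table.
folded = 0.5 in scaled.__code__.co_consts
print(folded)
```

If the constant is present, writing 0.5 by hand in place of 1/2 saves nothing at call time.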

Also, the variables in the case statement ARE the precomputation. They are closure variables so that the actual function call uses those precomputed values.

Disassembly of the kernel function:

>>> from statistics import kde
>>> from dis import dis
>>> f_hat = kde([0], h=1, kernel='normal')
>>> dis(f_hat.__closure__[0].cell_contents)
  --           COPY_FREE_VARS           1

 923           RESUME                   0
               LOAD_GLOBAL              1 (exp + NULL)
               LOAD_CONST               1 (-0.5)
               LOAD_FAST                0 (t)
               BINARY_OP                5 (*)
               LOAD_FAST                0 (t)
               BINARY_OP                5 (*)
               CALL                     1
               LOAD_DEREF               1 (sqrt2pi)
               BINARY_OP               11 (/)
               RETURN_VALUE
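The closure pattern visible in the disassembly (LOAD_DEREF for sqrt2pi) can be reproduced with a small sketch. The factory name `make_normal_kernel` is hypothetical and the code is an illustration of the pattern, not the statistics.py source.

```python
from math import exp, pi, sqrt

def make_normal_kernel():
    # Hypothetical factory: the constant is computed once, when the factory runs.
    sqrt2pi = sqrt(2 * pi)

    def K(t):
        # sqrt2pi is read from the closure cell on each call, not recomputed.
        return exp(-1/2 * t * t) / sqrt2pi

    return K

K = make_normal_kernel()

# The inner function records sqrt2pi as a free variable, and the
# precomputed value sits in the matching closure cell.
freevars = K.__code__.co_freevars
cell_value = K.__closure__[0].cell_contents
print(freevars, cell_value)
```

Each call to K pays only for one cell lookup, which is why hoisting the value further would not be expected to help.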

rhettinger · 2024-05-14T18:19:53Z

If you're interested in working on a significant speed-up, I could use some help with the kernel inv_cdf approximation functions in kde_random. If the approximation functions are made more accurate near the endpoints, the Newton-Raphson code will need significantly fewer iterations.

Both of these could substantially benefit from an afternoon of piecewise curve-fitting or some rational approximation:

def _quartic_invcdf_estimate(p):
    sign, p = (1.0, p) if p <= 1/2 else (-1.0, 1.0 - p)
    x = (2.0 * p) ** 0.4258865685331 - 1.0
    if 0.004 <= p < 0.499:
        x += 0.026818732 * sin(7.101753784 * p + 2.73230839482953)
    return x * sign

def _triweight_invcdf_estimate(p):
    sign, p = (1.0, p) if p <= 1/2 else (-1.0, 1.0 - p)
    x = (2.0 * p) ** 0.3400218741872791 - 1.0
    return x * sign
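To make the link between estimate quality and iteration count concrete, here is a minimal sketch for the triweight kernel. It assumes the standard triweight density 35/32 * (1 - t^2)^3 on [-1, 1]; `invert_cdf` is a hypothetical helper with an iteration counter and a safety cap, not the function used in statistics.py.

```python
def triweight_pdf(t):
    # Triweight kernel density on [-1, 1].
    return 35/32 * (1 - t * t) ** 3

def triweight_cdf(t):
    # Antiderivative of the pdf, shifted so cdf(-1) == 0 and cdf(1) == 1.
    return 35/32 * (-1/7 * t**7 + 3/5 * t**5 - t**3 + t) + 1/2

def triweight_invcdf_estimate(p):
    # Power-law starting estimate, as quoted in the comment above.
    sign, p = (1.0, p) if p <= 1/2 else (-1.0, 1.0 - p)
    x = (2.0 * p) ** 0.3400218741872791 - 1.0
    return x * sign

def invert_cdf(p, estimate, tolerance=1e-12, max_iter=100):
    # Hypothetical helper: Newton-Raphson refinement of a starting estimate.
    # Returns the root and the number of Newton steps that were taken.
    x = estimate(p)
    for iterations in range(max_iter):
        diff = triweight_cdf(x) - p
        if abs(diff) <= tolerance:
            return x, iterations
        x -= diff / triweight_pdf(x)
    raise RuntimeError("Newton-Raphson did not converge")

# Compare the quoted estimate with a crude start at the median (x = 0).
x_est, n_est = invert_cdf(0.3, triweight_invcdf_estimate)
x_mid, n_mid = invert_cdf(0.3, lambda p: 0.0)
print(x_est, n_est, n_mid)
```

Running the same comparison for p close to 0 or 1 is one way to see where a candidate estimate still costs extra Newton steps.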

Update statistics.py Faster

7f11632

Faster

bedevere-app bot added the awaiting review label May 11, 2024

rhettinger self-assigned this May 14, 2024

rhettinger closed this May 14, 2024
