Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update statistics.py Faster implementation for normal distribution #118958

Closed
wants to merge 1 commit into from

Conversation

ThibaultDECO
Copy link

Faster implementation for normal distribution

Copy link

cpython-cla-bot bot commented May 11, 2024

All commit authors signed the Contributor License Agreement.
CLA signed

@bedevere-app
Copy link

bedevere-app bot commented May 11, 2024

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@eendebakpt
Copy link
Contributor

@ThibaultDECO Your PR needs an issue describing the bigfix or inprovements. See https://devguide.python.org/getting-started/pull-request-lifecycle/#quick-guide

About the PR: could you provide benchmarks showing how much this improves performance? I suspect that at least some of the changes have no impact

@rhettinger rhettinger self-assigned this May 14, 2024
@rhettinger
Copy link
Contributor

rhettinger commented May 14, 2024

Nice effort but this doesn't make sense. A 1/2 is already peephole optimized to 0.5.

Also, the variables in the case statement ARE the precomputation. They are closure variables so that the actual function call uses those precomputed values.

Disassembly of the kernel function:

>>> from statistics import kde
>>> from dis import dis
>>> f_hat = kde([0], h=1, kernel='normal')
>>> dis(f_hat.__closure__[0].cell_contents)
  --           COPY_FREE_VARS           1

 923           RESUME                   0
               LOAD_GLOBAL              1 (exp + NULL)
               LOAD_CONST               1 (-0.5)
               LOAD_FAST                0 (t)
               BINARY_OP                5 (*)
               LOAD_FAST                0 (t)
               BINARY_OP                5 (*)
               CALL                     1
               LOAD_DEREF               1 (sqrt2pi)
               BINARY_OP               11 (/)
               RETURN_VALUE

@rhettinger rhettinger closed this May 14, 2024
@rhettinger
Copy link
Contributor

rhettinger commented May 14, 2024

If you're interested in working on a significant speed-up, I could use some help with the kernel inv_cdf approximation functions in kde_random. If the approximation functions are made more accurate near the end points, there will be significantly fewer iterations in the newton-raphson code.

Both of these could substantially benefit from an afternoon of piecewise curve-fitting or some rational approximation:

def _quartic_invcdf_estimate(p):
    sign, p = (1.0, p) if p <= 1/2 else (-1.0, 1.0 - p)
    x = (2.0 * p) ** 0.4258865685331 - 1.0
    if p >= 0.004 < 0.499:
        x += 0.026818732 * sin(7.101753784 * p + 2.73230839482953)
    return x * sign

def _triweight_invcdf_estimate(p):
    sign, p = (1.0, p) if p <= 1/2 else (-1.0, 1.0 - p)
    x = (2.0 * p) ** 0.3400218741872791 - 1.0
    return x * sign

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants