
Implement sincos() (Trac #2034) #2626

Closed
numpy-gitbot opened this issue Oct 19, 2012 · 11 comments

Comments

@numpy-gitbot

Original ticket http://projects.scipy.org/numpy/ticket/2034 on 2012-01-30 by trac user nschloe, assigned to unknown.

Hi,

NumPy has capabilities for all kinds of trigonometric functions, but it is still missing one important bit.
In nearly every case I know of, both the sine and the cosine of an angle need to be computed (e.g., for the representation of a circle).
Numerically, sine and cosine can be evaluated at the same time, saving a great deal of computation. This is implemented in http://linux.die.net/man/3/sincos, for example.

Cheers,
Nico
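In the meantime, one NumPy-level way to obtain both values from a single evaluation is Euler's formula, exp(iθ) = cos θ + i·sin θ. This sketch only illustrates the idea that both results come from one operation; it is not claimed to be faster than separate np.sin/np.cos calls:

```python
import numpy as np

theta = np.linspace(0.0, 2.0 * np.pi, 8)

# One complex exponential yields both values:
# exp(1j * theta) = cos(theta) + 1j * sin(theta)
z = np.exp(1j * theta)
cos_t, sin_t = z.real, z.imag

# Sanity check against the separate ufuncs
assert np.allclose(cos_t, np.cos(theta))
assert np.allclose(sin_t, np.sin(theta))
```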

@brandon-rhodes

I had the impression that the operations beneath NumPy run sin() and cos() as single machine instructions. In that case the question would be: does a modern math coprocessor implement a combined sincos() operation or not?

@juliantaylor
Contributor

i686 CPUs do implement the fsincos machine instruction; its latency is only a few percent higher than that of fsin or fcos alone.
There is no equivalent in amd64, though.
In glibc it is implemented via SSE instead of falling back to the legacy instruction set.

@pv
Member

pv commented Aug 22, 2013

Is it faster than the memory bandwidth?
sincos is a GNU extension.
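On glibc systems, that GNU extension can be exercised from Python via ctypes. A sketch, assuming a Linux libm that exports sincos (it is not portable to platforms without the extension):

```python
import ctypes
import ctypes.util
import math

# Load the C math library; on glibc this exposes the GNU sincos extension.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# void sincos(double x, double *sin_out, double *cos_out);
libm.sincos.argtypes = [ctypes.c_double,
                        ctypes.POINTER(ctypes.c_double),
                        ctypes.POINTER(ctypes.c_double)]
libm.sincos.restype = None

s, c = ctypes.c_double(), ctypes.c_double()
libm.sincos(1.0, ctypes.byref(s), ctypes.byref(c))

# Both results come back from the single call
assert math.isclose(s.value, math.sin(1.0))
assert math.isclose(c.value, math.cos(1.0))
```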

@charris
Member

charris commented Feb 21, 2014

We still don't have one as of 1.9-devel.

@djsutherland

djsutherland commented Jul 28, 2016

I fairly often find myself needing both the sin and cos of large arrays, and thinking that I could use this.

On my computer, for fairly large inputs, it seems like MKL's sincos is somewhat faster than calling MKL's sin and cos separately, and both are much faster than calling np.sin / np.cos.

Setup code:

import numpy as np
import ctypes

# Load the MKL runtime (path specific to this Anaconda install)
mkl = ctypes.cdll.LoadLibrary('/usr/local/anaconda/lib/libmkl_rt.dylib')

# Argument types for float64 input and writeable output arrays
in_array = np.ctypeslib.ndpointer(dtype=np.float64)
out_array = np.ctypeslib.ndpointer(dtype=np.float64, flags='WRITEABLE')

# void vdSinCos(MKL_INT n, const double *a, double *sin_out, double *cos_out)
sincos_d = mkl.vdSinCos
sincos_d.argtypes = [ctypes.c_int64, in_array, out_array, out_array]
sincos_d.restype = None

sin_d = mkl.vdSin
sin_d.argtypes = [ctypes.c_int64, in_array, out_array]
sin_d.restype = None

cos_d = mkl.vdCos
cos_d.argtypes = [ctypes.c_int64, in_array, out_array]
cos_d.restype = None

With MKL sincos:

%%timeit  a = np.random.normal(size=100000); out_sin = np.empty_like(a); out_cos = np.empty_like(a)
sincos_d(a.size, a, out_sin, out_cos)

10000 loops, best of 3: 188 µs per loop

Calling MKL sin and cos:

%%timeit  a = np.random.normal(size=100000); out_sin = np.empty_like(a); out_cos = np.empty_like(a)
sin_d(a.size, a, out_sin)
cos_d(a.size, a, out_cos)

10000 loops, best of 3: 224 µs per loop

(That difference is pretty stable, though not huge.)

Using the numpy functions is about 10x slower:

%%timeit  a = np.random.normal(size=100000); out_sin = np.empty_like(a); out_cos = np.empty_like(a)
np.sin(a, out=out_sin)
np.cos(a, out=out_cos)

100 loops, best of 3: 2.6 ms per loop

Numpy is linked to MKL, but it seems it doesn't use MKL for this operation (#671, which got no interest).

Poking around at various array sizes, it seems MKL sincos offers maybe a 20% speedup over separate MKL sin and cos calls, while numpy sin/cos is about 10x slower for largeish inputs on my machine.

My takeaway is that I should be calling MKL directly for things like this; the additional gain from sincos itself is less of a big deal.

@juliantaylor
Contributor

You might be interested in this branch: #7865.
Since you have MKL, can you check whether it provides the OpenMP vector ABI:
nm -D library | grep _ZGV

@djsutherland

@juliantaylor Cool! My libmkl_rt.so does not include any _ZGV names, though.

@Phillip-M-Feldman

I'd be grateful if someone could explain why NumPy is more than 10x slower than MKL.

@rkern
Member

rkern commented May 30, 2018

Part of it is likely just C function call overhead. MKL implements these functions as operating natively on arrays: you enter the sin_d() function once to compute all of the values. The inner loop of numpy.sin(), on the other hand, is the generic ufunc inner loop, which wraps the standard libm sin() and calls it once per element. That inner loop could certainly be specialized for each CPU to call intrinsic functions that inline the appropriate instruction, at the expense of complicating the configuration and build, but I don't believe anyone has contributed such code. The performance of these functions seems to have been adequate for most people; those who do require faster versions usually have specific cases that can be written to use MKL or similar implementations directly.
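The per-call overhead point can be made visible even from Python: a loop calling math.sin once per element versus one vectorized np.sin call over the same data. The timings are illustrative only; the C-level per-element overhead described above is much smaller than Python's, but the effect is analogous:

```python
import math
import timeit

import numpy as np

a = np.random.normal(size=100_000)

# One function call per element: the call overhead is paid 100,000 times
t_loop = timeit.timeit(lambda: [math.sin(x) for x in a], number=5)

# One call into an inner loop that runs entirely in C
t_vec = timeit.timeit(lambda: np.sin(a), number=5)

assert t_vec < t_loop  # the vectorized call amortizes the per-call overhead
```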

@Phillip-M-Feldman

Phillip-M-Feldman commented May 30, 2018 via email

@seberg
Member

seberg commented Feb 24, 2021

I have opened gh-18483 to gather this type of requests. I think there would be value in this, but maybe it doesn't have to start in NumPy.

@seberg seberg closed this as completed Feb 24, 2021
10 participants