
Implement sincos() (Trac #2034) #2626

Closed
numpy-gitbot opened this issue Oct 19, 2012 · 11 comments

Comments

@numpy-gitbot

Original ticket http://projects.scipy.org/numpy/ticket/2034 on 2012-01-30 by trac user nschloe, assigned to unknown.

Hi,

NumPy has capabilities for all kinds of trigonometric functions, but it is still missing one important bit.
In nearly every case I know of, both the sine and the cosine of an angle need to be computed (e.g., for the representation of a circle).
Numerically, sine and cosine can be evaluated at the same time, saving a great deal of computation. This is implemented in http://linux.die.net/man/3/sincos, for example.

Cheers,
Nico
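In the meantime, one NumPy-level way to obtain both values from a single evaluation is Euler's formula, exp(iθ) = cos θ + i·sin θ. This sketch only illustrates the idea that both results come from one operation; it is not claimed to be faster than separate np.sin/np.cos calls:

```python
import numpy as np

theta = np.linspace(0.0, 2.0 * np.pi, 8)

# One complex exponential yields both values:
# exp(1j * theta) = cos(theta) + 1j * sin(theta)
z = np.exp(1j * theta)
cos_t, sin_t = z.real, z.imag

# Sanity check against the separate ufuncs
assert np.allclose(cos_t, np.cos(theta))
assert np.allclose(sin_t, np.sin(theta))
```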

@brandon-rhodes

I had the impression that the operations beneath NumPy run sin() and cos() as single machine instructions. In that case the question would be: does a modern math coprocessor implement a combined sincos() operation or not?

@juliantaylor
Contributor

i686 CPUs do implement the fsincos machine instruction; its latency is only a few percent higher than that of fsin or fcos alone.
There is no equivalent in amd64, though.
In glibc it is implemented via SSE instead of falling back to the legacy instruction set.

@pv
Member

pv commented Aug 22, 2013

Is it faster than the memory bandwidth?
sincos is a GNU extension.
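On glibc systems, that GNU extension can be exercised from Python via ctypes. A sketch, assuming a Linux libm that exports sincos (it is not portable to platforms without the extension):

```python
import ctypes
import ctypes.util
import math

# Load the C math library; on glibc this exposes the GNU sincos extension.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# void sincos(double x, double *sin_out, double *cos_out);
libm.sincos.argtypes = [ctypes.c_double,
                        ctypes.POINTER(ctypes.c_double),
                        ctypes.POINTER(ctypes.c_double)]
libm.sincos.restype = None

s, c = ctypes.c_double(), ctypes.c_double()
libm.sincos(1.0, ctypes.byref(s), ctypes.byref(c))

# Both results come back from the single call
assert math.isclose(s.value, math.sin(1.0))
assert math.isclose(c.value, math.cos(1.0))
```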

@charris
Member

charris commented Feb 21, 2014

We still don't have one as of 1.9-devel.

@djsutherland

djsutherland commented Jul 28, 2016

I fairly often find myself needing both the sin and cos of large arrays, and thinking that I could use this.

On my computer, for fairly large inputs, it seems like MKL's sincos is somewhat faster than calling MKL's sin and cos separately, and both are much faster than calling np.sin / np.cos.

Setup code:

import numpy as np
import ctypes

# Load the MKL runtime (path specific to this Anaconda install)
mkl = ctypes.cdll.LoadLibrary('/usr/local/anaconda/lib/libmkl_rt.dylib')

# Argument types for float64 input and writeable output arrays
in_array = np.ctypeslib.ndpointer(dtype=np.float64)
out_array = np.ctypeslib.ndpointer(dtype=np.float64, flags='WRITEABLE')

# void vdSinCos(MKL_INT n, const double *a, double *sin_out, double *cos_out)
sincos_d = mkl.vdSinCos
sincos_d.argtypes = [ctypes.c_int64, in_array, out_array, out_array]
sincos_d.restype = None

sin_d = mkl.vdSin
sin_d.argtypes = [ctypes.c_int64, in_array, out_array]
sin_d.restype = None

cos_d = mkl.vdCos
cos_d.argtypes = [ctypes.c_int64, in_array, out_array]
cos_d.restype = None

With MKL sincos:

%%timeit  a = np.random.normal(size=100000); out_sin = np.empty_like(a); out_cos = np.empty_like(a)
sincos_d(a.size, a, out_sin, out_cos)

10000 loops, best of 3: 188 µs per loop

Calling MKL sin and cos:

%%timeit  a = np.random.normal(size=100000); out_sin = np.empty_like(a); out_cos = np.empty_like(a)
sin_d(a.size, a, out_sin)
cos_d(a.size, a, out_cos)

10000 loops, best of 3: 224 µs per loop

(That difference is pretty stable, though not huge.)

Using the numpy functions is about 10x slower:

%%timeit  a = np.random.normal(size=100000); out_sin = np.empty_like(a); out_cos = np.empty_like(a)
np.sin(a, out=out_sin)
np.cos(a, out=out_cos)

100 loops, best of 3: 2.6 ms per loop

Numpy is linked to MKL, but it seems it doesn't use MKL for this operation (#671, which got no interest).

Poking around at various array sizes, it seems MKL sincos offers maybe a 20% speedup over separate MKL sin and cos calls, while numpy sin/cos is about 10x slower for largeish inputs on my machine.

My takeaway is that I should be calling MKL directly for things like this; the additional gain from sincos itself is less of a big deal.

@juliantaylor
Contributor

You might be interested in this branch: #7865.
Since you have MKL, can you check whether it provides the OpenMP vector ABI:
nm -D library | grep _ZGV

@djsutherland

@juliantaylor Cool! My libmkl_rt.so does not include any _ZGV names, though.

@Phillip-M-Feldman

I'd be grateful if someone could explain why NumPy is more than 10x slower than MKL.

@rkern
Member

rkern commented May 30, 2018

Part of it is likely just C function call overhead. MKL implements these functions as operating natively on arrays: you enter the sin_d() function once to compute all of the values. The inner loop of numpy.sin(), on the other hand, is the generic ufunc inner loop, which wraps the standard libm sin() and calls it once per element. That inner loop could certainly be specialized for each CPU to call intrinsic functions that inline the appropriate instruction, at the expense of complicating the configuration and build, but I don't believe anyone has contributed such code. The performance of these functions seems to have been adequate for most people; those who do require faster versions usually have specific cases that can be written to use MKL or similar implementations directly.
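The per-call overhead point can be made visible even from Python: a loop calling math.sin once per element versus one vectorized np.sin call over the same data. The timings are illustrative only; the C-level per-element overhead described above is much smaller than Python's, but the effect is analogous:

```python
import math
import timeit

import numpy as np

a = np.random.normal(size=100_000)

# One function call per element: the call overhead is paid 100,000 times
t_loop = timeit.timeit(lambda: [math.sin(x) for x in a], number=5)

# One call into an inner loop that runs entirely in C
t_vec = timeit.timeit(lambda: np.sin(a), number=5)

assert t_vec < t_loop  # the vectorized call amortizes the per-call overhead
```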

@Phillip-M-Feldman

Phillip-M-Feldman commented May 30, 2018 via email

@seberg
Member

seberg commented Feb 24, 2021

I have opened gh-18483 to gather this type of requests. I think there would be value in this, but maybe it doesn't have to start in NumPy.

@seberg seberg closed this as completed Feb 24, 2021
10 participants