Implement sincos() (Trac #2034) #2626
Comments
I had the impression that the operations beneath NumPy would be running sin() and cos() as single-machine-instruction operations. In which case the questions would be: does a modern math coprocessor implement the combined sincos() operation or not? |
i686 CPUs do implement the fsincos machine instruction. The latency is only a few percent higher than that of fsin or fcos alone. |
Is it faster than the memory bandwidth? |
Still don't have one as of 1.9-devel |
I find myself needing both the sin and cos of large arrays fairly often, and thinking that I need this. On my computer, for fairly large inputs, it seems like MKL's
Setup code:
import numpy as np
import ctypes
mkl = ctypes.cdll.LoadLibrary('/usr/local/anaconda/lib/libmkl_rt.dylib')
in_array = np.ctypeslib.ndpointer(dtype=np.float64)
out_array = np.ctypeslib.ndpointer(dtype=np.float64, flags='WRITEABLE')
sincos_d = mkl.vdSinCos
sincos_d.argtypes = [ctypes.c_int64, in_array, out_array, out_array]
sincos_d.restype = None
sin_d = mkl.vdSin
sin_d.argtypes = [ctypes.c_int64, in_array, out_array]
sin_d.restype = None
cos_d = mkl.vdCos
cos_d.argtypes = [ctypes.c_int64, in_array, out_array]
cos_d.restype = None
With MKL sincos:
%%timeit a = np.random.normal(size=100000); out_sin = np.empty_like(a); out_cos = np.empty_like(a)
sincos_d(a.size, a, out_sin, out_cos)
10000 loops, best of 3: 188 µs per loop
Calling MKL sin and cos:
(That difference is pretty stable, though not huge.)
Using the numpy functions is about 10x slower:
%%timeit a = np.random.normal(size=100000); out_sin = np.empty_like(a); out_cos = np.empty_like(a)
np.sin(a, out=out_sin)
np.cos(a, out=out_cos)
100 loops, best of 3: 2.6 ms per loop
NumPy is linked to MKL, but it seems it doesn't use MKL for this operation (#671, which got no interest). Poking around at various array sizes it seems like MKL
My takeaway is that I should be calling MKL directly for things like this, while the difference with sincos is less of a big deal. |
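The `%%timeit` cells above only run inside IPython. As a rough sketch, the NumPy half of the comparison could be reproduced in plain Python with the standard `timeit` module. The helper name `time_numpy_sin_cos` and its default parameters are made up for illustration, and the numbers it reports will depend on the machine and the NumPy build.

```python
import timeit

import numpy as np


def time_numpy_sin_cos(n=100_000, repeat=3, number=100):
    """Return the best per-call time (seconds) for np.sin + np.cos on n values.

    Hypothetical helper: mirrors the %%timeit cell above, with the input and
    output arrays allocated once, outside the timed region.
    """
    rng = np.random.default_rng(0)
    a = rng.normal(size=n)
    out_sin = np.empty_like(a)
    out_cos = np.empty_like(a)

    def run():
        # Two separate passes over the input, one per function.
        np.sin(a, out=out_sin)
        np.cos(a, out=out_cos)

    # timeit.repeat returns total seconds for `number` calls, once per repeat.
    totals = timeit.repeat(run, repeat=repeat, number=number)
    return min(totals) / number


if __name__ == "__main__":
    best = time_numpy_sin_cos()
    print(f"best of 3: {best * 1e6:.1f} µs per loop")
```

The same harness could time the ctypes-wrapped MKL calls by swapping the body of `run`, assuming the MKL library loads on the machine in question.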
you might be interested in this branch: #7865 |
@juliantaylor Cool! My libmkl_rt.so does not include any _ZGV names, though. |
I'd be grateful if someone can explain why NumPy is > 10X slower than MKL. |
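Part of it is likely just C function call overhead. The MKL implements these functions as natively working on arrays. You enter the sin_d() function just once to compute all of the values. The inner loop function of numpy.sin(), on the other hand, is just the generic ufunc inner loop that just wraps the standard libm sin() function, calling it once for each element. That inner loop function could certainly be specialized for each CPU to call the CPU's specific intrinsic functions that inline the appropriate instruction, at the expense of complicating the configuration and build. I don't believe that anyone has contributed such code. But it seems like the performance of these functions has been adequate for most people and not required acceleration. Those who do require faster versions likely need it for specific cases that can be written to directly use the MKL or similar implementations if they need to. |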
I suspect that "for each CPU" boils down to two choices--Intel and AMD.
But, I agree that this kind of specialization would be unclean.
A pointer to any documentation that explains how to directly use the MKL
would be greatly appreciated.
On Tue, May 29, 2018 at 7:52 PM, Robert Kern wrote:
Part of it is likely just C function call overhead. The MKL implements
these functions as natively working on arrays. You enter the sin_d()
function just once to compute all of the values. The inner loop function of
numpy.sin(), on the other hand, is just the generic ufunc inner loop that
just wraps the standard libm sin() function, calling it once for each
element. That inner loop function could certainly be specialized for each
CPU to call the CPU's specific intrinsic functions that inline the
appropriate instruction, at the expense of complicating the configuration
and build. I don't believe that anyone has contributed such code. But it
seems like the performance of these functions has been adequate for most
people and not required acceleration. Those who do require faster versions
likely need it for specific cases that can be written to directly use the
MKL or similar implementations if they need to.
|
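The distinction drawn in the quoted comment, one call per element versus a single call over the whole array, can be illustrated loosely at the Python level. This is only an analogy: the function names are invented for the example, and a Python-level loop pays far more per call than the C-level libm calls inside NumPy's ufunc loop, so it exaggerates the effect rather than measuring it.

```python
import math

import numpy as np


def sin_per_element(values):
    """Call a scalar sin() once per element (illustrative analogy only)."""
    # Each iteration enters math.sin separately, the way a generic inner
    # loop would call a scalar routine for every array element.
    return np.array([math.sin(v) for v in values], dtype=np.float64)


def sin_whole_array(values):
    """Hand the whole array to one vectorized call."""
    # A single entry into np.sin; the looping happens inside the library.
    return np.sin(values)


if __name__ == "__main__":
    a = np.linspace(-3.0, 3.0, 1001)
    # Both approaches agree numerically; they differ only in call pattern.
    print(np.allclose(sin_per_element(a), sin_whole_array(a)))
```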
I have opened gh-18483 to gather this type of request. I think there would be value in this, but maybe it doesn't have to start in NumPy. |
Original ticket http://projects.scipy.org/numpy/ticket/2034 on 2012-01-30 by trac user nschloe, assigned to unknown.
Hi,
NumPy has capabilities for all kinds of trigonometric functions, but is still missing one important bit.
In most every case I know of, both the sine and cosine of an angle need to be computed (e.g. for the representation of a circle).
Numerically, sine and cosine can be evaluated at the same time, thus saving a great deal of complexity. This numerical method is implemented in http://linux.die.net/man/3/sincos, for example.
Cheers,
Nico
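As a small sketch of getting both values from one call with today's NumPy, Euler's formula exp(ix) = cos(x) + i sin(x) can be used: a single complex exponential yields the cosine as the real part and the sine as the imaginary part. The name `sincos_via_exp` is hypothetical, and this is not claimed to be faster than calling `np.sin` and `np.cos` separately; it only shows the two results coming out of one evaluation.

```python
import numpy as np


def sincos_via_exp(x):
    """Return (sin(x), cos(x)) from one complex exponential.

    Hypothetical helper based on Euler's formula; not a NumPy API.
    """
    x = np.asarray(x, dtype=np.float64)
    z = np.exp(1j * x)  # z = cos(x) + 1j * sin(x)
    return z.imag, z.real


if __name__ == "__main__":
    angles = np.linspace(0.0, 2.0 * np.pi, 5)
    s, c = sincos_via_exp(angles)
    # Points on the unit circle: c**2 + s**2 should be close to 1.
    print(np.allclose(c**2 + s**2, 1.0))
```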