xarray.dot() dask problems #2074
Agreed. See xarray/core/computation.py, lines 1039 to 1043 at 99b457c:
dask='parallelized' -> dask='allowed'
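As a hedged sketch of the difference between those two `apply_ufunc` modes (the toy reduction and array here are made up, not from the issue): `dask='parallelized'` wraps a NumPy-only function chunk-by-chunk and refuses chunked core dims, while `dask='allowed'` hands the dask arrays straight to a function that already understands them.

```python
import numpy as np
import xarray as xr

# Toy array with a chunked core dim 't'.
a = xr.DataArray(np.arange(12.0).reshape(3, 4), dims=('s', 't')).chunk({'t': 2})

out = xr.apply_ufunc(
    lambda x: x.sum(axis=-1),   # works on dask arrays via duck typing
    a,
    input_core_dims=[['t']],    # 't' is a (chunked) core dim
    dask='allowed',             # dask='parallelized' would reject the chunked core dim
)
print(out.values)               # [ 6. 22. 38.]
```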
Might be worth revisiting how. cc @mrocklin
See also dask/dask#2225
@jakirkham from what I understand
Ok, this is funny. I ran a few more benchmarks, and apparently
Output:
This particular bit is shocking and I can't wrap my head around it.
Basically, the question is whether the performance keeps up with that formulation. Currently it sounds like chunking causes some problems, IIUC. However, things like
What are the arrays used as input for this case?
Having a little trouble following this.
See blob in the opening post
+1 for using dask.array.einsum in xarray.dot.
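For reference, a minimal sketch of what that would look like (dask.array.einsum is the function added in dask/dask#3412; the shapes and chunks here are arbitrary):

```python
import dask.array as da

# Toy illustration: da.einsum contracts over 't' even when 't' is
# chunked, which is the case the tensordot-based path struggled with.
x = da.ones((100, 40), chunks=(50, 10))   # dims (s, t), t chunked
y = da.ones((40,), chunks=(10,))          # dim  (t,)

z = da.einsum('st,t->s', x, y)            # contract over the chunked dim
print(z.compute()[:3])                    # [40. 40. 40.]
```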
@crusaderky, thanks for the detailed benchmarking.
Further note:
- xr.dot uses tensordot if possible, since when I implemented it, dask did not have einsum. In the other cases, we use dask.atop with np.einsum.
In your example, bench(100, False, ['t'], '...i,...i') uses dask.tensordot, bench(100, True, ['t'], '...i,...i') uses np.einsum, and bench(100, True, [], '...i,...i->...i') also uses np.einsum.
But I have no idea yet why dot(a, b, dims=[]) is faster than a * b.

When doing benchmarks with things that might call BLAS operations in multiple threads, I recommend setting the OMP_NUM_THREADS environment variable to 1. This will avoid oversubscription.
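A quick way to apply that advice when re-running the benchmarks (the trailing python invocation is just a placeholder for the benchmark script):

```shell
# Pin BLAS/OpenMP to a single thread so dask's own worker threads
# don't oversubscribe the CPU during benchmarking.
export OMP_NUM_THREADS=1
python -c 'import os; print(os.environ["OMP_NUM_THREADS"])'   # prints 1
```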
Done the work - but we'll need to wait for dask 0.17.3 to integrate it.
xarray.dot() has comparable performance with numpy.einsum.
However, when it uses a dask backend, it's much slower than the new dask.array.einsum function (dask/dask#3412).
The performance gap widens when the dimension upon which you are reducing is chunked.
Also, for some reason dot(a&lt;s, t&gt;, b&lt;t&gt;, dims=[t]) and dot(a&lt;s, t&gt;, a&lt;s, t&gt;, dims=[s, t]) do work (very slowly) when s and t are chunked, while dot(a&lt;s, t&gt;, a&lt;s, t&gt;, dims=[t]) crashes, complaining that it can't operate on a chunked core dim (related discussion: #1995).

The proposed solution is to simply wait for dask/dask#3412 to reach the next release and then reimplement xarray.dot to use dask.array.einsum. This means that dask users will lose the ability to use xarray.dot if they upgrade their xarray version but not their dask version, but I believe this shouldn't be a big problem for most.
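As a plain-numpy sketch of the contractions being compared (no dask involved; the arrays are made up for illustration):

```python
import numpy as np

# dot(a<s,t>, b<t>, dims=[t]) is the einsum contraction 'st,t->s';
# dot(a, a, dims=[]) is the elementwise product 'st,st->st'.
a = np.arange(6.0).reshape(2, 3)   # dims (s, t)
b = np.array([1.0, 2.0, 3.0])      # dim  (t,)

reduce_t = np.einsum('st,t->s', a, b)     # contract over t
no_reduce = np.einsum('st,st->st', a, a)  # dims=[] case, same as a * a

print(reduce_t)                           # [ 8. 26.]
assert np.allclose(no_reduce, a * a)
```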
Output: