Array creation and __array_function__ #4883
Comments
I'm not sure I fully understand, what are situations where this goes poorly? For example, if I can |
That's correct, and if I'm not mistaken, this is the primary reason we have to use |
And beyond that, we have cases where we create an array from a list, for example, currently there's no mechanism to ensure that this array will be of same type of some other array which will be operated on. One example is |
If possible I would like to focus first on the
Some relevant code is here: Lines 2316 to 2334 in 165f71e
There is some clear special-casing of NumPy here. I wonder if there is some check we can do instead of an explicit type check here that would be valid. cc @shoyer in case he has suggestions.
For example I wonder what would happen if we added the check
    asarray = not hasattr(x, '__array_function__')
This would change behavior on matrix objects (although maybe that's ok now with
Also, I noticed that there is an
Lines 3163 to 3201 in 165f71e
|
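As a rough illustration of the check suggested above, here is a minimal sketch of how such a test might behave. The helper name needs_coercion and the toy DuckArray class are hypothetical and are not part of Dask or NumPy; the sketch only assumes a NumPy version in which ndarray defines __array_function__ (1.16 or later).

```python
import numpy as np


def needs_coercion(x):
    """Hypothetical helper: True when x does not implement __array_function__."""
    return not hasattr(x, "__array_function__")


class DuckArray:
    """Toy NumPy-like object, for illustration only."""

    def __array_function__(self, func, types, args, kwargs):
        # Decline every function; only the presence of the method matters here.
        return NotImplemented


print(needs_coercion([1, 2, 3]))     # plain list: no protocol support
print(needs_coercion(np.arange(3)))  # ndarray implements the protocol
print(needs_coercion(DuckArray()))   # duck array implements the protocol
```

Under a check like this, plain lists would still be coerced, while any object implementing the protocol, NumPy arrays included, would be passed through unchanged.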
For the |
I don't have a particular order in which I would like to keep, I'm just raising this question in case we already have something ongoing, and to get ideas from other people. The idea I had was similar to your Maybe I have incorrect expectations and this is not intended with NEP-18 for a reason, but for now I think it's missing that capability, which is essential in some situations, such as I mentioned before with |
And by the way, something that maybe wasn't clear before, the issue is not with Dask alone, but we can't use
>>> import numpy as np
>>> import cupy
>>> a = cupy.empty((2, 2))
>>> b = np.asarray(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nfs/pentschev/.local/lib/python3.7/site-packages/numpy/core/numeric.py", line 538, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: object __array__ method not producing an array
|
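For fixing da.asarray, maybe we can go through a deprecation cycle and then default to asarray=False. That said, IIUC the da.asarray is a finer point. The larger issue is, how do we handle array creation in algorithmic code? Currently we have hard-coded NumPy array creation in some places. How can we relax this to create an array matching the input type when needed? |
I think the main problem isn't da.asarray, but np.asarray itself, since we can't dispatch that with a CuPy array, for example. Maybe if that was possible, we wouldn't need to use asarray=False in Dask array creation at all.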
Correct, and I'm raising this here to see what ideas we have on the Dask side, but this may be a question that we need to raise in NumPy itself. |
CuPy intentionally doesn't want np.asarray(my_cupy_array) to work. Their
reasoning here is that this often happens and reduces performance. They
would rather that users intentionally call a to_numpy method, or something
similar.
…On Wed, Jun 5, 2019 at 10:35 AM Peter Andreas Entschev < ***@***.***> wrote:
For fixing da.asarray, maybe we can go through a deprecation cycle and
then default to asarray=False.
I think the main problem isn't da.asarray, but np.asarray itself, since
we can't dispatch that with a CuPy array, for example. Maybe if that was
possible, we wouldn't need to use asarray=False in Dask array creation at
all.
That said, IIUC the da.asarray is a finer point. The larger issue is, how
do we handle array creation in algorithmic code? Currently we have
hard-coded NumPy array creation in some places. How can we relax this to
create an array matching the input type when needed?
Correct, and I'm raising this here to see what ideas we have on the Dask
side, but this may be a question that we need to raise in NumPy itself.
|
IIUC Peter is suggesting that Anyways I don't think we should get too bogged down in |
My guess is that that's not possible today. Too much of the ecosystem probably depends on |
You're correct in your entire statement. :) |
That's right, it isn't possible today, and that's exactly what I want to discuss: how do we make it possible? It is too often the case that we need to create new arrays, and for |
My view is that we need a new coercion function/protocol inside NumPy, something like Nathaniel Smith and I discussed this in NEP-22, but didn't come to a concrete proposal, mostly because we struggled to come up with a good name. |
Is there an issue on the NumPy repo that we should use to discuss this proposal, @shoyer? |
TBH, I've been trying to wrap my head around the NEP-22 proposal, but I still don't understand how we would use that in practice. I'm very sure I don't understand the proposal well, but my little understanding is that duck arrays would automatically convert themselves, for which we will face some rejection, and here I'm thinking particularly of CuPy, where no implicit conversion is allowed. Happy to continue the discussion either here or some more appropriate place. |
It wouldn't be a bad idea to discuss this in the NumPy tracker somewhere. My thinking is something like the following implementation for the protocol:

    import numpy as np

    # hypothetical np.duckarray() function
    def duckarray(array_like):
        if hasattr(array_like, '__duckarray__'):
            # return an object that can be substituted for np.ndarray
            return array_like.__duckarray__()
        return np.asarray(array_like)

Example usage:

    class SparseArray:
        def __duckarray__(self):
            return self

        def __array__(self):
            raise TypeError

    duckarray(SparseArray())  # returns a SparseArray object
    np.array(SparseArray())   # raises TypeError
|
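To make the intent of the proposal a bit more concrete, here is a self-contained sketch of how algorithmic code might call such a coercion function. Everything here is hypothetical: duckarray and __duckarray__ are the names proposed in this thread and do not exist in NumPy, and NoImplicitArray is a toy class standing in for a library such as CuPy that refuses implicit conversion.

```python
import numpy as np


def duckarray(array_like):
    """Hypothetical coercion function from this thread; not a NumPy API."""
    if hasattr(array_like, "__duckarray__"):
        # The object declares itself usable in place of np.ndarray.
        return array_like.__duckarray__()
    return np.asarray(array_like)


class NoImplicitArray:
    """Toy duck array that refuses implicit conversion to NumPy."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __duckarray__(self):
        return self

    def __array__(self, dtype=None, copy=None):
        raise TypeError("implicit conversion to a NumPy array is not allowed")


def prepare_input(x):
    """Example of library code that accepts lists and duck arrays alike."""
    return duckarray(x)


from_list = prepare_input([1, 2, 3])
from_duck = prepare_input(NoImplicitArray([1, 2, 3]))
print(type(from_list).__name__)
print(type(from_duck).__name__)
```

The point of the sketch is that a list still ends up as a NumPy array, while the duck array passes through untouched and is never converted behind the user's back.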
What you are proposing sounds reasonable, Stephan. Is there a particular NumPy issue we should move to? |
@shoyer, I went ahead and opened an issue (numpy/numpy#13831) (copying everyone here) to get the ball rolling on this. Hope that is ok. 🙂 |
We now have a more appropriate place to discuss this in numpy/numpy#13831, so I'm closing this. |
One of the issues that arise when introducing __array_function__ in a NumPy-like library, such as Dask, is array creation. Many functions require some sort of array to be created, either for temporary usage or to hold results. Specifically within Dask, most resulting arrays are simply NumPy arrays, most of the time wrapped in a Dask array. However, when we deal with another NumPy-like library, be it CuPy, xarray, Sparse, etc., we need to ensure that the arrays created within Dask match those types.

A very common case is the use of empty (and counterparts full, ones, zeros), which we can now overcome with the introduction of the shape argument in empty_like (and counterparts) in numpy/numpy#13046. This is a great starting point, but doesn't solve all array creation issues. Most notably, we have array and asarray, which are not included in the __array_function__ scope, so we have no simple way to deal with certain situations involving those functions.

With Dask, we could of course introduce mechanisms such as looking up the type of the array and dispatching a call to that library, but this would limit the scope of use to a few libraries that we support and would be an effort, to my understanding, counter-productive towards __array_function__.

So my question here is: are there ideas out there already or discussions already initiated on how to solve such problems? Maybe @shoyer, @mrocklin or @jakirkham know something about this already?
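For readers unfamiliar with the shape argument mentioned above, the following is a minimal sketch of how zeros_like with shape can create an array matching the input type. It assumes NumPy 1.17 or later, where both the shape argument and __array_function__ dispatch are available by default. TaggedArray and make_workspace are hypothetical names used only for illustration and do not come from Dask, CuPy, or NumPy.

```python
import numpy as np


class TaggedArray:
    """Toy NumPy-like container, for illustration only."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func is np.zeros_like:
            # Honour the shape override, falling back to our own shape.
            shape = kwargs.get("shape")
            if shape is None:
                shape = self.data.shape
            return TaggedArray(np.zeros(shape, dtype=self.data.dtype))
        return NotImplemented


def make_workspace(x, n):
    """Create a length-n scratch array of the same array type as x."""
    return np.zeros_like(x, shape=(n,))


numpy_ws = make_workspace(np.arange(4.0), 2)
tagged_ws = make_workspace(TaggedArray([1.0, 2.0, 3.0]), 5)
print(type(numpy_ws).__name__, numpy_ws.shape)
print(type(tagged_ws).__name__, tagged_ws.data.shape)
```

This covers creation from an existing array, but it does not help with array and asarray, where the input may be a list and there is no reference array for NumPy to dispatch on.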