Supporting duck array coercion #13831
The proposed implementation would look something like the following:

    import numpy as np

    # hypothetical np.duckarray() function
    def duckarray(array_like):
        if hasattr(array_like, '__duckarray__'):
            # return an object that can be substituted for np.ndarray
            return array_like.__duckarray__()
        return np.asarray(array_like)

Example usage:

    class SparseArray:
        def __duckarray__(self):
            return self

        def __array__(self):
            raise TypeError

    np.duckarray(SparseArray())  # returns a SparseArray object
    np.array(SparseArray())  # raises TypeError

Here I've used
Some other name ideas: |
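For readers who want to try the idea, here is a minimal runnable sketch of the proposal above. It is only an illustration: `duckarray` is written as a standalone helper because `np.duckarray` and the `__duckarray__` protocol are proposals in this thread, not part of NumPy's API, and the `SparseArray` class is a toy stand-in.

```python
import numpy as np


def duckarray(array_like):
    """Hypothetical helper sketching the proposed np.duckarray() behaviour."""
    if hasattr(array_like, "__duckarray__"):
        # The object declares that it can stand in for an np.ndarray.
        return array_like.__duckarray__()
    # Everything else (lists, scalars, ndarrays) is coerced as usual.
    return np.asarray(array_like)


class SparseArray:
    """Toy duck array that refuses implicit conversion to a dense ndarray."""

    def __duckarray__(self):
        return self

    def __array__(self, dtype=None, copy=None):
        raise TypeError("refusing implicit conversion to a dense ndarray")


sparse = SparseArray()
kept = duckarray(sparse)               # passes through unchanged
dense = duckarray([[1, 2], [3, 4]])    # plain nested list becomes an ndarray
```

The point of the protocol is that library code can call one function and get either the duck array itself or a regular `ndarray`, without the duck array having to subclass `ndarray`.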
Since after a long time we haven't come up with a better name, perhaps we should just bless the "duck-array" name...... |
I like the compatible word, maybe we can think of variations along that line as well |
Maybe |
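To extend a bit on the topic, there's one other case that isn't covered with np.duckarray, which is the creation of new arrays with a type based on an existing type, similar to what functions such as np.empty_like do. Currently we can do things like this:

    >>> import numpy as np, cupy as cp
    >>> a = cp.array([1, 2])
    >>> b = np.ones_like(a)
    >>> type(b)
    <class 'cupy.core.core.ndarray'>

On the other hand, if we have an array_like that we would like to create a CuPy array from via NumPy's API, that's not possible. I think it would be helpful to have something like:

    import numpy as np, cupy as cp
    a = cp.array([1, 2])
    b = [1, 2]
    c = np.asarray(b, like=a)

Any ideas/suggestions on this? |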
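The np.ones_like behaviour shown in the comment above can be reproduced without CuPy. The sketch below assumes NumPy 1.17 or later (where `__array_function__` dispatch is enabled by default) and uses a made-up `MyDuck` class; it is meant only to show why the first case works today while the second one does not.

```python
import numpy as np


class MyDuck:
    """Toy duck array (hypothetical) wrapping an ndarray."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Only np.ones_like is overridden in this sketch.
        if func is np.ones_like:
            return MyDuck(np.ones_like(self.data))
        return NotImplemented


a = MyDuck([1, 2])
b = np.ones_like(a)        # dispatches to MyDuck.__array_function__
c = np.ones_like([1, 2])   # a plain list carries no type to dispatch on
```

Here `b` is a `MyDuck` while `c` is a regular `ndarray`: dispatch only happens when one of the arguments is itself a duck array, which is the gap the `like=a` idea is trying to close.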
Maybe np.copy_like? We would want to define carefully which properties
(e.g., including dtype or not) are copied from the other array.
|
Sorry for the beginner's question, but should something like |
We don't really have strict rules about this, but I would lean towards putting One note about |
Great, thanks for the info, and I agree with your dispatching proposal. I will work on an NEP for the implementation of both np.duckarray and np.copy_like and submit a draft PR this week for that. |
Awesome, thank you Peter!
|
My pleasure, and thanks a lot for the ideas and support with this work! |
The |
It's true, these would only really be useful if you want to support duck typing. But certainly |
All array implementations have a
+1 |
That's actually the case which I would like to have addressed with something like Just to be clear, are you referring also to a function |
It should, it's decorated to support
Yes. And yes agree it can be confusing.
Yes that name implies a data copy.
I thought that that was |
I think Peter's example above might help clarify this. Copied below and subbed in

    import numpy as np, cupy as cp
    a = cp.array([1, 2])
    b = [1, 2]
    c = np.copy_like(b, like=a)
|
Actually, Thanks @jakirkham for the updated example. |
So that will dispatch to CuPy via
Introducing new functions in the main namespace that don't make sense for NumPy itself to support working around a limitation of |
I wouldn't say it has to fail necessarily. We could default to NumPy and raise a warning (or don't raise it at all), for example.
Certainly it would be nice to have a full-featured dispatching mechanism, but I imagine this wasn't done before due to its complexity and backwards compatibility issues? I wasn't around when discussions happened, so just guessing.
I certainly see your point, but I also think that if we move too many things away from main namespace, it could scare users off. Maybe I'm wrong and this is just an impression. Either way, I'm not at all proposing to implement functions that won't work with NumPy, but perhaps only not absolutely necessary when using NumPy by itself. |
Actually, in this sense, also |
I think that one is more defensible (analogous to
I think there's multiple reasons. |
No real objections from me, I think it's better to have functionality somewhere rather than nowhere. :)
But if we want to leverage |
I think you indeed need a set of utility features like that, to go from covering some fraction of use cases to >80% of use cases. I don't think there's a way around that. I just don't like cluttering up the main namespace, so propose to find a better place for those.
I mean, we're just plugging a few obvious holes here right? We're never going to cover all of the "more complex cases". Say you want to override As for alternatives, uarray is not yet there and I'm not convinced yet that the overhead will be pushed down low enough to be used by default in NumPy, but it's getting close and we're about to try it to create the |
The dispatch approach with uarray is certainly interesting. Though I'm still concerned about how we handle meta-arrays (like Dask, xarray, etc.). Please see this comment for details. It's unclear this has been addressed (though please correct me if I've missed something). I'd be interested in working with others at SciPy to try and hash out how we solve this problem. |
I think the changes of the last week resolve that, but not sure - let's leave that for another thread.
I'll be there, would be great to meet you in person. |
Maybe |
@pentschev This was the case until recently, when we added the ability to “register” a backend, but we recommend only NumPy (or a reference implementation) does this. Then users using Dask would need just a single set_backend. |
Got it, I guess this is what @rgommers mentioned in #13831 (comment), pointing to the backends in https://github.com/Quansight-Labs/uarray/tree/master/unumpy. Sorry for so many questions, but what if some hypothetical application relies on various backends, for example, both NumPy and Sparse, where depending on the user input, maybe everything will be NumPy-only, Sparse-only, or a mix of both. @peterbell10 mentioned multiple backends are supported #13831 (comment), but can the selection of backend be made automatic or would there be a need to handle the three cases separately? |
So, for this case, you would ideally register NumPy, use a context manager for Sparse, and return |
At SciPy @rgommers, @danielballan, and myself talked about this issue. We concluded it would be valuable to proceed with adding |
This all sounds great to me, but it would be good to start with a short NEP spelling out the exact proposal. See #13831 (comment) |
Sure that makes sense. 🙂 |
As for the copying point that has been brought up previously, I'm curious if this isn't solved through existing mechanisms. In particular what about these lines?

    a2 = np.empty_like(a1)
    a2[...] = a1[...]

Admittedly it would be nice to get this down to one line. Just curious whether this already works for that use case or if we are missing things. |
I have already started to write that, haven't been able to complete it yet though (sorry for my bad planning #13831 (comment)). |
You can do that, but it may require special copying logic (such as in CuPy cupy/cupy#2079). That said, a copy function may be best, to avoid this sort of additional code from being necessary. On the other hand, this would be sort of a replacement for
If there's a chance we would like to revisit that, maybe would be better to start a new thread. Any ideas, suggestions, objections? |
Just to be clear on my comment above, I myself don't know if a new protocol is a great idea (probably many cumbersome details that I don't foresee are involved), really just wondering if that's an idea we should revisit and discuss. |
The consensus from the dev meeting and sprint at SciPy'19 was: let's get 1.17.0 out the door and get some real-world experience with it before taking any next steps.
probably yes, but in a few months. |
Ok, thanks for the reply! |
My main issue with this is that it wouldn't work for duck arrays that are immutable, which is not terribly uncommon. Also, for NumPy the additional cost of allocating an array and then filling it may be nearly zero, but I'm not sure that's true for all duck arrays. |
I don't think it's a good idea to change the behavior of That said, we could consider adding a |
That's fair. Actually we can already simplify things. For instance this works with CuPy and Sparse today.

    a2 = np.copy(a1)
|
Yes, but we also want "copy this duck-array into the type of this other duck-array" |
I'm also unsure about this, and I was reluctant even to raise this question, this is why I hadn't until today.
I don't know if there would be any complications with that, we probably need some careful thought, but I tend to like this idea. That would seem redundant on various levels, but maybe to follow the existing pattern, instead of adding a |
What about basing this around |
Feel free to correct me if I'm wrong, but I'm assuming you mean something like:

    np.copyto(cupy_array, numpy_array)

That could work, assuming NumPy is willing to change the current behavior, e.g., |

    def copyto(dst, src):
        dst[...] = src

We want the equivalent of:

    def copylike(src, like):
        dst = np.empty_like(like)
        dst[...] = src
        return dst
|
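A runnable variant of the copylike idea, using only NumPy arrays so it can be tried without CuPy. Note that `copylike` is a hypothetical name from this discussion, not a NumPy function, and that this sketch makes one choice the thread leaves open: the shape is taken from `src` (via the `shape=` argument of np.empty_like, available since NumPy 1.17) while the dtype is taken from `like`.

```python
import numpy as np


def copylike(src, like):
    """Hypothetical helper: copy `src` into a new array of `like`'s type."""
    # Allocate with like's array type and dtype, but src's shape, so that
    # the assignment below copies rather than broadcasts.
    dst = np.empty_like(like, shape=np.shape(src))
    dst[...] = src
    return dst


like = np.zeros(5, dtype=np.float32)
out = copylike([1, 2, 3], like)
```

As pointed out earlier in the thread, an allocate-then-fill approach like this assumes the target duck array is mutable, so it would not cover immutable array types.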
Correct, this is what we want. |
Well

    np.copyto(cp.ndarray, np.random.random((3,)))

This could translate into something like allocate and copy over the data as we have discussed. If we dispatch around |
Just to surface another thought that occurred to me recently, it's worth thinking about what these APIs will mean downstream between other libraries (for instance how Dask and Xarray interact). |
This NEP proposes the introduction of the __duckarray__ protocol, as described at a high level by NEP-22 and further discussed in #13831.

We have another idea by @shoyer on how to handle duck array typing through __array_function__, as mentioned in #13831 (comment):

we could consider adding a like argument to duckarray. That would require changing the protocol from the simplified proposal above -- maybe to use __array_function__ instead of a dedicated protocol like __duckarray__? I haven't really thought this through.

The idea above seems viable, and perhaps more complete as well. That said, I want to either extend this NEP to cover that, or maybe write a separate NEP so we can discuss and judge which one is a better solution. In the meantime, let's start discussing the text here.
Opening this issue after some discussion with @shoyer, @pentschev, and @mrocklin in issue (dask/dask#4883). AIUI this was discussed in NEP 22 (so I'm mainly parroting other people's ideas here to renew discussion and correct my own misunderstanding ;).

It would be useful for various downstream array libraries to have a function to ensure we have some duck array (like ndarray). This would be somewhat similar to np.asanyarray, but without the requirement of subclassing. It would allow libraries to return their own (duck) array type. If no suitable conversion was supported by the object, we could fall back to handle ndarray subclasses, ndarrays, and coercion of other things (nested lists) to ndarrays.

cc @njsmith (who coauthored NEP 22)