Other *_like creation functions #14441

Closed
jakirkham opened this issue Sep 6, 2019 · 26 comments

Comments

@jakirkham
Contributor

Much as there are functions like zeros_like, ones_like, etc., it would be useful to have a few more *_like functions for operations such as arange, linspace, etc. This would help in cases where one wishes to create an array of a specific type that has some particular initialization.

cc @pentschev

@rgommers
Member

rgommers commented Sep 6, 2019

So you're looking for some shorthand for, e.g.,

np.linspace(0, 1, num=x.size, dtype=x.dtype).reshape(x.shape)

?

@jakirkham
Contributor Author

Not exactly, no. We are looking for something like this.

np.linspace_like(x, 0, 1, 0.1)

Where x is some array type we want to match. The exact syntax still needs to be hashed out, but the idea is to use x to determine the resultant array type.

@rgommers
Member

rgommers commented Sep 6, 2019

I'm not sure that that makes sense. If x is not an ndarray but, say, a Dask array, then NumPy won't know how to construct it.

@seberg
Member

seberg commented Sep 6, 2019

Seems to me that the "default" numpy implementation just has to raise an error if type(x) is not np.ndarray (or array subclass). After that, you can use the existing __array_function__ dispatching rules on x (whatever you call that argument).
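
A minimal sketch of the behaviour described here, under stated assumptions: `linspace_like` is a hypothetical name that does not exist in NumPy, `Wrapped` is a toy duck array invented for illustration, and the hand-written call to `__array_function__` only imitates what NumPy's real dispatch machinery would do.

```python
import numpy as np


def linspace_like(x, start, stop, num=50):
    """Hypothetical helper (not a NumPy function)."""
    if isinstance(x, np.ndarray):
        # Default path: x only supplies the dtype.
        # Note that an integer dtype would truncate the samples.
        return np.linspace(start, stop, num=num, dtype=x.dtype)
    handler = getattr(type(x), "__array_function__", None)
    if handler is None:
        raise TypeError(
            f"{type(x).__name__} does not implement __array_function__"
        )
    # Imitate NumPy's dispatch by hand: let x's own type build the result.
    result = handler(x, linspace_like, (type(x),), (x, start, stop), {"num": num})
    if result is NotImplemented:
        raise TypeError(f"{type(x).__name__} does not support linspace_like")
    return result


class Wrapped:
    """Toy duck array used only for this illustration."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func is linspace_like:
            like, start, stop = args
            values = np.linspace(start, stop, dtype=like.data.dtype, **kwargs)
            return Wrapped(values)
        return NotImplemented


ndarray_result = linspace_like(np.zeros(3, dtype=np.float32), 0, 1, num=5)
wrapped_result = linspace_like(Wrapped([1.0, 2.0]), 0, 1, num=3)
print(type(ndarray_result).__name__, ndarray_result.dtype)
print(type(wrapped_result).__name__, wrapped_result.data)
```

In the ndarray branch the first argument contributes nothing beyond its dtype, which is the "dummy argument" concern raised in the reply that follows.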

@rgommers
Member

rgommers commented Sep 6, 2019

@seberg then this becomes a very weird thing - we add a _like function with a first argument that's basically a dummy. It can only be ndarray for NumPy itself to make sense of it, and at that point it does nothing anymore.

I think this is a variation of "__array_function__ doesn't cover array creation", which is covered by NEP 31.

@seberg
Member

seberg commented Sep 6, 2019

I am not saying I like the idea :), although not ruling out that hiding them away somewhere for library users may actually be a practical solution. Of course that is what NEP 31 is about, but that does not mean alternatives may not exist. It may be worth spelling out how this is expected to work for libraries. Since it could be nice if NEP 31 can help libraries to write such an implicit choice/return:

backend = unp.guess_backend_based_on(array_input_to_library_function)

# ensure that backend is truly supported, otherwise fall back to numpy, or error...
return arraytype_associated_to_backend

@rgommers
Member

rgommers commented Sep 6, 2019

the library would simply write:

unumpy.linspace(0, 1, 0.1)   # or np.linspace in case we'd make it default

and the user would write:

with unumpy.set_backend(dask_backend):   # EDIT: or set the dask backend globally
    call_whatever_dask_function_the_above_is_in()

> Of course that is what NEP 31 is about, but that does not mean alternatives may not exist.

Of course. But then we need a real proposal. Asking for "a few more", and then "another few", and still ending up with incomplete coverage is problematic. We have a lot of array creation functions: https://numpy.org/devdocs/reference/routines.array-creation.html
Would we need eye_like, fromstring_like, geomspace_like, etc.?

@seberg
Member

seberg commented Sep 6, 2019

Yes, NEP 31 is all about explicit opt-in (which is good). I am wondering if we can and should allow library authors to not force that explicit opt-in on the end user. This is what __array_function__ provides (to some extent at least). My own library can simply assume the input is an array-like, and what you get out is what you put in.
NEP 31 does not seem to help me write such a function (I have tried to ask this before, I think), i.e. a function that does not require explicit opt-in by users. For some things, like an FFT backend, explicit opt-in makes a lot of sense to me. But for Dask arrays, xarrays, etc. in many cases implicit "array-like" typing does seem useful?

Anyway, sorry. I really just wanted to clarify how I think I understood the request, this is getting way too much discussion about NEP 31 and explicit opt-in vs. "array-like" type behaviour.

@rgommers
Member

rgommers commented Sep 7, 2019

Actually it would be nice to complete this proposal, so we can compare with the alternative proposal. I think the only thing that makes sense is to add a _like variant of all array creation functions that don't have an array-like input. Then the question is where to put them: all next to their non-_like counterparts, or all in a separate namespace.

> Yes, NEP 31 is all about explicit opt-in (which is good)
> ...
> NEP 31 does not seem to help me to write such a function

It seems like you're asking to have both opt-in and default at the same time. In general, I think it's one or the other.

> But for Dask arrays, xarrays, etc. in many cases implicit "array-like" typing does seem useful?

__array_function__ already does this, and it's not going anywhere right?

@jakirkham
Contributor Author

> I'm not sure that that makes sense. If x is not an ndarray but, say, a Dask array, then NumPy won't know how to construct it.

I don't think that's true. For example, I wrote this code for arange recently.

np.cumsum(np.ones_like(X, dtype="i8", shape=(n_samples,))) - 1

That said, writing this multiple times is not great. Also library authors generally don't like the lack of clarity in this code. An np.arange_like operation would keep other library code clearer.
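
To make the workaround concrete, here is a small sketch that wraps it in a helper. The name `arange_like` is hypothetical (NumPy has no such function), and the `shape=` argument of `ones_like` assumes NumPy 1.17 or later.

```python
import numpy as np


def arange_like(x, n):
    """Hypothetical helper wrapping the workaround (not a NumPy function)."""
    # ones_like dispatches on x, so for a duck array the ones, and the
    # cumulative sum computed from them, can take on x's array type.
    ones = np.ones_like(x, dtype="i8", shape=(n,))
    return np.cumsum(ones) - 1


x = np.empty((2, 3), dtype="f4")
result = arange_like(x, 5)
print(result)
```

With a plain ndarray as `x` this matches `np.arange(5)`, but it allocates and scans an intermediate array of ones, which is part of why a dedicated function reads more clearly.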

@seberg
Member

seberg commented Sep 9, 2019

I do think Ralf is right though, having an issue about it is not all that helpful. We need an NEP (or a somewhat more comprehensive write-up and proposal) that discusses different options to solve this:

  1. Add new _like functions somewhere (possibly not the main namespace).
  2. Add a np.dispatched(array_like, numpy_function)(...) to allow choosing which array-like's __array_function__ to use specifically.
  3. unumpy proposal (although that does not cover this use case currently).
  4. ...?

And I guess, what about arguments that one array like needs and another does not? Is it OK to ignore those?

The original duck array NEP also lists a few other interesting points, such as checking whether a duck array/array-like can act as an out= kwarg or not.

@rgommers
Member

rgommers commented Sep 9, 2019

I've also been wondering about another alternative. (note, not well thought out)

with np.some_new_context_manager(x):
    # context manager keeps track of `x.__array_function__`
    # np.arange checks context manager, and uses it exactly like the first
    # argument of `np.arange_like` would have done
    np.arange(...)

This issue basically proposes a workaround for array creation functions not having an array-like input. So perhaps that can be solved with such a hack, which is limited in scope, rather than doubling the number of array creation functions we offer (because that feels really wrong ...).

@seberg
Member

seberg commented Sep 9, 2019

Right, another option. My first gut feeling is to not do that, since it probably allows for scary scoping, which is a risk that is probably unnecessary in this use case. As usual, I would love library authors to step in to at least co-champion proposals such as this, because they have better experience with their needs.

@pentschev
Contributor

This issue goes hand in hand with the ones I've been personally experiencing. Part of that is addressed with NEP-30 for duckarray, but there's still the open question of something like array_like to address __array_function__-based array creation, which would boil down to another function for which nobody is really sure about naming and scope, given that AFAIK it's today limited to my one use case, as discussed in #13831.

I agree that doubling all array creation functions is undesired, but what about providing a new optional argument like= to the existing array creation functions? Admittedly, I don't know what would be the complexity of allowing __array_function__ to dispatch based on an array passed as an argument, nor if it would even be possible to do so. Completely disregarding the implementation issues for a moment, I think this would be the cleanest and simplest solution that would cover the use cases I've seen so far. Maybe @shoyer can tell that this isn't possible and we can safely ignore this idea altogether. :)
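
For illustration only: NumPy 1.20 and later accept a `like=` keyword on several creation functions, including `np.arange` and `np.zeros`. The sketch below shows how such a call dispatches. `Logged` is a toy duck array invented here, and none of this had been settled at this point in the thread.

```python
import numpy as np


class Logged:
    """Toy duck array used only for this illustration."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # NumPy strips like= before dispatching, so func can be called as-is
        # to get the ordinary NumPy result, which is then wrapped.
        return Logged(func(*args, **kwargs))


x = Logged([1.0, 2.0])
a = np.arange(4, like=x)            # dispatched to Logged.__array_function__
z = np.zeros((2, 2), like=x)        # same mechanism
b = np.arange(4, like=np.empty(1))  # ndarray reference: ordinary NumPy result
print(type(a).__name__, type(z).__name__, type(b).__name__)
```

The reference array only selects which `__array_function__` handles the call; shape and dtype still come from the usual arguments.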

@rgommers
Member

Thanks @pentschev, I think a like= keyword is feasible and preferable over adding many new functions.

> given that AFAIK it's today limited to my one use case,

I don't think that's the case. I tried using dispatch with SciPy functions (ref http://scipy.github.io/devdocs/roadmap.html#support-for-distributed-arrays-and-gpu-arrays) to see how far I could get with the current state of NumPy, and the answer was: not very far. Pretty much everywhere there's an asarray or an array or an arange or an errstate that is a blocker.

@jakirkham
Contributor Author

If adding like= keyword arguments is feasible, that would be helpful for my use case as well.

@shoyer
Member

shoyer commented Sep 10, 2019 via email

@pentschev
Contributor

That's awesome! I guess this covers most (if not all, and apart from duckarrays) use cases related to array creation with __array_function__. I can work on getting a PR started for this. Is this something we would need a NEP for? I'm guessing no, but asking just in case. :)

@rgommers
Member

@pentschev not sure, but doing a PR first seems like a good idea either way. Perhaps it's so simple that it's just obvious that this is the best idea. Perhaps there are bigger design questions once we see a PR, and then a NEP can be produced afterwards.

@seberg
Member

seberg commented Sep 10, 2019

I think I would be slightly in favor of a very short NEP, if only to list (rejected) alternatives and clarify a bit which parts it solves and which parts it does not (i.e. random creation functions as Hameer noted, although I think that is something best left alone in any case?).

@pentschev
Contributor

I will try to write something short and start with a PR for one of the functions so we can get a feel of how it would work (or not). Thanks @rgommers @shoyer @seberg for feedback.

@pentschev
Contributor

Just to keep this thread updated, I've opened #14715 with the proposal.

@seberg
Member

seberg commented Mar 25, 2020

Thanks, I am going to close this in favor of the NEP draft at gh-14715 which I think replaces this proposal. That may want to be pushed ahead.

@seberg seberg closed this as completed Mar 25, 2020
@pentschev
Contributor

Thanks @seberg, indeed #14715 should replace this. I need to make a push on that, but I haven't really had the time the past few months. I hope things will be calmer starting sometime around mid-April and will try to follow up on that.

@seberg
Member

seberg commented Mar 25, 2020

Yeah, there are a lot of things to hash out in that general direction. @pentschev if you are up to it some time mid-April, we could even think about a larger video conference event around these protocols/ideas. I will put something in my calendar for then, let's see where things go.

@pentschev
Contributor

Sorry for the delayed response, @seberg. Indeed there are many things to be sorted out; I think a conference would be very nice for that!
