DLPack support for NumPy #19013

Closed · hameerabbasi opened this issue May 14, 2021 · 32 comments · Fixed by #19083

Comments

@hameerabbasi
Contributor

hameerabbasi commented May 14, 2021

Feature

Motivation

Currently, any library that needs to exchange data with NumPy needs to have it as a dependency. This issue proposes moving away from that approach to a more Pythonic, standards-based approach leveraging the DLPack library, as described in the Array API standard. This is also mentioned in, and is a prerequisite for, NEP 47, although it can be discussed and integrated independently of it; the only caveat is that if the NEP is accepted, adopting DLPack is a given.

DLPack is a small C header-only library with a stable ABI.

Changes needed to NumPy

The numpy.ndarray type will need to gain two new methods:

  • __dlpack__: exports the array as a PyCapsule wrapping a DLManagedTensor (the producer side of the protocol).
  • __dlpack_device__: returns the device type and device ID on which the array lives, so a consumer can decide whether it can import the data.

And the NumPy namespace will gain one extra function:

  • from_dlpack: This will consume the PyCapsule containing a DLPack struct (obtained from the producer's __dlpack__ method) and create a numpy.ndarray based on that. It will raise a RuntimeError on all unsupported configurations of the object (see the sketch below).
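A minimal sketch of the intended exchange (not the final API, just the shape of the protocol; it assumes a NumPy version that already implements it, and uses another NumPy array as a stand-in producer):

import numpy as np

t = np.arange(10.0)                              # any DLPack-capable producer object
device_type, device_id = t.__dlpack_device__()   # consumer can inspect the device first
x = np.from_dlpack(t)                            # calls t.__dlpack__() and wraps the capsule
assert np.shares_memory(x, t)                    # zero-copy: x views t's memory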

Relevant issues/discussion:

Edit: Previously this issue said from_dlpack was a method, that has been corrected.

cc @rgommers @mattip

@mattip
Member

mattip commented May 14, 2021

For other commenters: the discussion at data-apis/consortium-feedback#1 is quite lengthy but does include answers to many questions like object lifetime management. It is helpful to understand the context. Many of my concerns were addressed there.

A side issue is the struct itself: the header mentions "[The data] pointer is always aligned to 256 bytes as in CUDA". If this is a hard requirement, NEP 49 may help achieve this in a zero-copy manner.

I think this can be done separately from #18585.

@seberg
Member

seberg commented May 14, 2021

I have tried to look through the discussion. I still don't understand why the "consumed exactly once" is important (the PyCapsule is reference counted and can clean up on delete, so the memory management seems fine? – although maybe that is tricky without reference counts). But maybe there is something about streams or so that I just don't see right now.

Would NumPy be able to just use the offset to "make it fit" even if that means data points to some random (earlier) place?

For the API, I think, I am still a bit confused about view vs. copy. If __dlpack__ can return either a view or a copy, it would be nice if it could indicate so? Either as an input copy={None, True, False} or as an additional return flag (or even both).
Most of the current protocols don't have this problem, the one that does is __array__ and it would be nice to have it there!

Especially if the alignment of the memory allocation in NumPy can lead to a copy being returned instead of a view, that seems like it must at least be easy to check (I guess np.may_share_memory might work), since there is no good way to "predict" what is going to happen.

But, in general, I think it just needs a good PR. In the end I don't really want to worry about it too much if other projects have already implemented basically the same thing (and it seems like they did). So long as NumPy isn't the odd one out, due to being the only one that has to copy for reasons like alignment or byte-swapping.

@rgommers
Member

rgommers commented May 15, 2021

I have tried to look through the discussion. I still don't understand why the "consumed exactly once" is important (the PyCapsule is reference counted and can clean up on delete, so the memory management seems fine?

Exactly, it's not that important. It came from before we improved the Python API; the old way of doing it in other libraries was:

# `x` is some array/tensor object
capsule = to_dlpack(x)
x2 = from_dlpack(capsule)

At that point you had a (non-useful) PyCapsule object floating around in the user's code, and then doing x3 = from_dlpack(capsule) somewhere else in the code would of course lead to problems. Using __dlpack__ avoids this by design.
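For contrast, a sketch of the __dlpack__-based flow (the names follow the Array API standard; the capsule never surfaces in user code):

# `x` is some array/tensor object
x2 = from_dlpack(x)  # the consumer calls x.__dlpack__() itself and consumes the capsule immediately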

@hameerabbasi
Contributor Author

Either as an input copy={None, True, False} or as an additional return flag (or even both).

I think we can make this a keyword-only argument without breaking compat with the existing protocol, and default it to None. Perhaps it can be in a future version, but that may need to be discussed separately.

Would NumPy be able to just use the offset to "make it fit" even if that means data points to some random (earlier) place?

I believe not, in that case the allocation still isn't "aligned" unfortunately.

Especially if the alignment of the memory allocation in NumPy can lead to a copy being returned instead of a view,

For other commenters, this is because DLPack expresses strides in array elements, while NumPy expresses them in bytes.

@hameerabbasi
Contributor Author

hameerabbasi commented May 15, 2021

A side issue is the struct itself: the header mentions "[The data] pointer is always aligned to 256 bytes as in CUDA". If this is a hard requirement, NEP 49 may help achieve this in a zero-copy manner.

This is, indeed, a sticking point. I do not know how to resolve this. Perhaps this restriction could be clarified or removed if the rationale didn't apply to CPU data pointers.

cc @tqchen would you happen to know the answer?

@rgommers
Member

If __dlpack__ can return either a view or a copy, it would be nice if it could indicate so? Either as an input copy={None, True, False} or as an additional return flag (or even both).

Why would we want this? Semantics of a program using DLPack should not rely on view vs. copy (avoid in-place ops). The protocol tries to return a view for performance reasons, but that may not always be possible for all libraries - and then it will make a copy. Having an explicit copy= keyword seems like a mistake to me.

@hameerabbasi
Contributor Author

hameerabbasi commented May 15, 2021

Why would we want this?

Usually, performance. One may want the flexibility of a different path when a view isn't possible, e.g., performing the operation in the existing library rather than moving the data to a new one via DLPack.

Or, conversely, one may want to specify a copy because in-place ops are needed for an algorithm/function (I guess one can always do x = x.copy() or similar, but that's an extra LOC).

Of course, the default would be None, i.e., we don't guarantee either and then everything you said applies.

@rgommers
Member

rgommers commented May 15, 2021

Usually, performance.

Performance is already optimal, it defaults to zero copy.

(I guess one can always do x = x.copy() or similar, but that's an extra LOC).

Exactly. That's about zero characters extra, so if you want to do something numpy-specific, use that. There are libraries that don't even have the concept of a view, and the buffer protocol and __array_interface__ don't have a copy keyword either. Adding a copy keyword doesn't make sense to me.

@seberg
Member

seberg commented May 15, 2021

Performance is already optimal, it defaults to zero copy.

But that is not a stated requirement anywhere that I have found yet? And apparently NumPy cannot always export zero copy (which is fine, but then we need an additional np.export_dlpack(...) or np.ensure_dlpack_exportable function to do the copy if necessary)?

If the user wants to copy the data, they may have to do so twice, because they don't know whether the original array-like was copied already. Either because the exporter does so always, or because it does so "if necessary" like NumPy would apparently.
Further, the buffer protocol also supports signaling read only export. JAX could use that to allow a no-copy export when given?
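(For illustration only, this is what that read-only signalling already looks like on the Python side of the buffer protocol:)

import numpy as np

a = np.arange(3)
a.flags.writeable = False
memoryview(a).readonly  # True: the buffer protocol carries a read-only bit to the consumer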

Semantics of a program using DLPack should not rely on view vs. copy (avoid in-place ops).

Could you explain why this is such a strong preference that it isn't worth thinking about how you could allow both kinds of export transparently? What about copy-on-write semantics, or just plain old NumPy-style usage?

[The consume only once behavior...] Exactly, it's not that important. It came from before we improved the Python API; the old way of doing it in other libraries was:

Fair enough, it might help with memory management. It also might help a bit to ensure that only a single consumer ends up writing to the data. (I guess this doesn't really matter if we assume all exports are zero-copy, though.)

@rgommers
Member

But that is not a stated requirement anywhere that I have found yet?

Hmm, I'll have a look at where the most obvious place is to mention this. DLPack works just like the buffer protocol, except:

  • it has a deleter mechanism rather than refcounting, because a design goal was to make it work with pure C libraries as well
  • it has device support

And apparently NumPy cannot always export zero copy

I don't think that is true. And I reviewed the implementations in other libraries like CuPy, MXNet, PyTorch and JAX a while back, and can't remember anything regarding the 256 byte alignment thing.

There may be more exotic libraries that perhaps could be forced to make a copy, e.g. Vaex has a concept of "virtual columns" where data is basically a string expression rather than it living in memory. In that case, zero-copy access isn't possible.

Another thing that can happen is that NumPy has, say, an array with odd strides. And then JAX and TensorFlow can't represent that, they only have contiguous arrays internally - hence they make a copy on the consumer side.

But for our purposes, I think there isn't too much magic - just think of it as an enhanced buffer protocol.

@rgommers
Member

rgommers commented May 15, 2021

I hadn't noticed the comment (https://github.com/dmlc/dlpack/blob/main/include/dlpack/dlpack.h#L130) before. It talks about CUDA and OpenCL, and the only mention in the issue tracker is dmlc/dlpack#1 (comment), which talks about vector types like float4 for accelerators. It's not clear that the comment was meant to apply to CPU.

PEP 3118 also briefly mentions alignment: " The default endian is @ which means native data-types and alignment. If un-aligned, native data-types are requested, then the endian specification is '^'.". Requiring alignment maybe makes sense, but why would one need 256-byte alignment on CPU?

@rgommers
Member

Oh wait, this explains it (from issue description of data-apis/consortium-feedback#1):

Data field mandatory aligns to 256 bytes(for aligned load), allow byte_offset to offset the array if necessary

  /*! \brief The offset in bytes to the beginning pointer to data */
  uint64_t byte_offset;

That should solve the issue?
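(A sketch of how an exporter could satisfy that alignment wording via byte_offset, if it were ever enforced - plain pointer arithmetic, not actual NumPy code:)

raw_ptr = 0x7f3a12345678            # address of the first array element (made-up value)
data = raw_ptr - (raw_ptr % 256)    # dl_tensor.data: rounded down to a 256-byte boundary
byte_offset = raw_ptr - data        # dl_tensor.byte_offset: distance back to the real start
assert data + byte_offset == raw_ptr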

@rgommers
Member

Further, the buffer protocol also supports signaling read only export. JAX could use that to allow a no-copy export when given?

I think that's more of a problem than a feature in the buffer protocol actually, and it reflects that it was written mostly from a "NumPy needs/design" point of view. Most libraries simply do not have a read-only array/tensor data structure, so if you create an array in such a library with from_dlpack, what are you supposed to do with a readonly = 1 flag?

@rgommers
Member

So long as NumPy isn't the odd one out, due to being the only one that has to copy for reasons like alignment or byte-swapping.

The last point is a good one - I think __dlpack__ should just raise an exception for non-native endianness. That byte-swapping is still a prominent chapter in the NumPy fundamentals section of the user guide makes very little sense; we should just hide that in some obscure corner.
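(A minimal sketch of such a check; the helper name and the choice of exception are assumptions, not a settled API:)

import sys
import numpy as np

def _require_native_byteorder(arr):
    # '=' is native, '|' means byte order does not apply (e.g. single-byte dtypes)
    native = '<' if sys.byteorder == 'little' else '>'
    if arr.dtype.byteorder not in ('=', '|', native):
        raise TypeError("cannot export byte-swapped arrays via __dlpack__")

_require_native_byteorder(np.arange(3, dtype='>i4'))  # raises on a little-endian machine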

@rgommers
Member

Could you explain why this is such a strong preference that it isn't worth thinking about how you could allow both kinds of export transparently?

DLPack is, by design, zero-copy. It's highly unusual for it to have to make a copy on the producer side, the Vaex virtual column is the only example I could think of. And if we add copy=False for such corner cases, then that seems like design for the <1% case which makes the whole thing more complex for little benefit. Next will be the do_a_device_transfer=False, readonly=False, etc. Other protocols also don't do this, and where we do do it (np.array) we end up making the super-thin wrapper (np.asarray) that removes the keyword again.

What about copy-on-write semantics, or just plain old NumPy-style usage?

Copy-on-write is nice, it's like the one thing I give Matlab a bit of credit for :) That's safe though, and stays within a library that would implement it - no keyword needed.

Plain old NumPy-style usage: could be done if you know you are working with 2 libraries that have the same semantics (we don't have any that are 100% the same, but PyTorch and MXNet are fairly close), and you have full control over the code you write. Then you already have all the tools you need, though - again, no keyword needed. You can go right ahead and mutate memory that's shared between two libraries.

What I had in mind was library authors: if you don't know whether you're talking to JAX, PyTorch or library X, you simply cannot reliably mix views and in-place operations.

@seberg
Member

seberg commented May 15, 2021

[NumPy cannot always export zero copy?] I don't think that is true.

[byte_offset] should solve the issue?

Well, those were my main initial questions! What is the definite answer?

it has a deleter mechanism rather than refcounting, because a design goal was to make it work with pure C libraries as well

This seems just wrong. The buffer protocol has a deleter mechanism, and doesn't even use reference counting itself. Python puts the result into memoryview and a lower-level helper object to add custom reference counting on top of that.

It is likely possible to extend the buffer protocol with a new "device" field (although maybe a bit clunky). It might even be possible to "backport" it to older Pythons.
At that point the buffer protocol may be a bit of feature creep, but it is probably a strict superset of the __dlpack__ capabilities.

Aside from device support, the feature creep in the buffer protocol, and some standardized flags, I think the only real difference from a protocol point of view is that the buffer protocol allows the requester to pass simple flags to the exporter.

[About no readonly flag] I think that's more of a problem than a feature in the buffer protocol actually

Honestly, now I am just confused and a bit annoyed. How can such a simple flag be a problem? The consumer can easily check it, after all, and the exporter can always set readonly trivially. My points do not feel addressed at all unless you make the two requirements:

  1. Any export must be no copy. (You seem to be fine to make this a soft requirement, but that probably limits some use-cases slightly, or burdens the user with knowing about it. Which I am fine with, as long as it is very predictable.)
  2. Any import must assume that chunk of memory is (consumer) read-only (but the exporter is still allowed to modify it!).

Those are some clear limitations. If that is the intention: OK, I don't care! But I do find it strange to claim that the buffer protocol is bad because it doesn't have this limitation.

Why are view semantics, or in-place algorithms accepting a DLPack-managed object, so out of the question that you just dismiss them? What about a visualization library that would like to visualize an array that is being updated in-place? We had a long discussion about that a while ago with respect to __array__ and napari!

And if we add copy=False for such corner cases [...] makes the whole thing more complex

Why? We do not have to support copy=True specifically (and even if we did, the exporter could just raise to say it doesn't support it). I was never suggesting a better API. I am only asking whether the use-cases that would be covered by copy=... are really not worthwhile supporting. (And the uncertainty about NumPy being able to export no-copy just increases that issue.)

As I also said, it would be completely sufficient to have two new flags on the exported DLManagedTensor (or additional ones, if you shy away from modifying DLPack):

  • shared: the buffer shares the original exporter's data.
  • readonly: the buffer is read-only and may not be modified by the importer.

It might be nice for the napari use-case to be able to signal shared as well, since it avoids Vaex potentially copying everything just to throw it away again.
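(Purely illustrative: how an importer might react to such flags if they existed. Neither shared nor readonly is part of DLPack today, and the helper below is made up:)

import numpy as np

def wrap_imported(arr, shared, readonly):
    # `arr` stands in for the ndarray built from an imported DLPack capsule
    if readonly:
        arr.flags.writeable = False  # protect memory the exporter declared read-only
    if not shared:
        pass  # the exporter already copied, so in-place algorithms need no defensive copy
    return arr

b = wrap_imported(np.arange(4.0), shared=True, readonly=True)  # b.flags.writeable is now False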

I don't argue for copy= itself; I am not trying to create a new API, but there are use-cases that you are dismissing and not allowing. And I think those should be stated clearly, and to be honest, I am not yet convinced DLPack gave those use-cases proper thought.

What I had in mind was library authors: if you don't know whether you're talking to JAX, PyTorch or library X, you simply cannot reliably mix views and in-place operations. {... and the paragraph above it ...}

Yes, but you are not addressing the questions/use case! If I use NumPy but import a JAX array (without knowing where it came from), I should only have to know about NumPy. And that means that NumPy has to know about JAX's expectations! That is the whole point of an exchange protocol: to smooth out such differences!

If Matlab is copy-on-write, Matlab has to know that the provider (NumPy) may not be, and maybe just copy right away because it sees the exported buffer is "shared" and "writeable". The other way around, NumPy might want to know that it must not write to the Matlab data.

It seems to me you are dismissing use-cases because you have an ideal, single-package, homogeneous world in mind, when we have the chance right here to make the different worlds not collide but rather work together harmoniously. And all we seem to have to do is add a few simple flags to the protocol! (Maybe only for the exporter, maybe also for the request, I don't care/know.)


Look, I do not care enough to fight over this: __dlpack__ is just another lightweight attribute, we already have like 4 of those. The only churn is the from_dlpack classmethod and that probably can deal just fine if a __dlpack_v2__ happens...

@seberg
Member

seberg commented May 15, 2021

Another argument that is completely fine for me is if we say: well, DLPack may grow those features/use-cases in the future, and we will be able to add even a "requests" API later without problems.
(It might be nice to map out how that could grow in the future, but I expect it isn't terribly hard, especially if we don't mind a try:/except:.)

At that point, the only thing needed would be to clarify expectations. E.g. a copy should not happen, but may. And whether or not it's OK to write to a DLPack-imported array, or whether we should e.g. default to ndarray.from_dlpack(object, writeable=None) and force the user to pass writeable=True if they want to write into the view. (And we live with potential additional copies, probably.)

@rgommers
Member

Well, those were my main initial questions! What is the definite answer?

That was 5 comments up, as one line, followed by lots of discussion about adding a copy= keyword. I missed that one line. Answer is yes, it solves it.

This seems just wrong. The buffer protocol has a deleter mechanism

I dunno, this is what PEP 3118 says as a "rejected idea": Having a "releaser" object whose release-buffer was called. This was deemed unacceptable because it caused the protocol to be asymmetric (you called release on something different than you "got" the buffer from). It also complicated the protocol without providing a real benefit.. If it does have a deleter mechanism now, maybe that's wrong - or I misunderstood, I haven't looked at the code for ages.

It is likely possible to extend the buffer protocol with a new "device" field (although maybe a bit clunky). It might even be possible to "backport" it to older Pythons.

This was extensively discussed for about a decade I believe. No one has done it, it's a lot of work to write a new PEP, and I don't see how you would backport it. Putting things in Python itself just means you have to wait years after making any change before you can use it.

It's also kind of a "proof is in the pudding": despite the buffer protocol being supported in the language itself, adoption is quite low. Unlike for DLPack, which is much newer.

I am only asking whether the use-cases that would be covered by copy=... are really not worthwhile supporting. (And the uncertainty about NumPy being able to export no-copy just increases that issue.)

As Hameer already pointed out, x = x.copy() is equivalent. Hence I don't think that there are unsupported use cases. I do not see any case where NumPy must copy.

Honestly, now I am just confused and a bit annoyed. ....
My points do not feel addressed at all unless you make the two requirements: ...
so out of the question that you just dismiss them?
Yes, but you are not addressing the questions/use case!
there are use-cases that you are dismissing and not allowing.
you have an ideal, single-package, homogeneous world in mind.

Sebastian, seriously, you are slinging a ton of accusations as well as more API proposals. We have been working on this for months in the DLPack and array-api repos (it's a very much nontrivial topic), with review/contributions by quite a few people with extensive experience using all of DLPack, buffer protocol, and __array_interface__/__cuda_array_interface__. It seems you're upset about something. I suggest to take a break from adding more, and use a higher bandwidth call to discuss afterwards if needed.

I'll just repeat for now: I don't think there are unsupported use cases, and hence I'm also not "dismissing" them. I just tried to answer questions as best I could.

@hameerabbasi
Contributor Author

hameerabbasi commented May 15, 2021

I believe there are some (admittedly rare) cases where NumPy must copy: the strides inside DLPack are in units of array elements, while those inside NumPy are in units of bytes, which is more flexible. One can find cases where zero-copy isn't possible; these are the arrays where not every element of arr.strides is a multiple of arr.dtype.itemsize. Concrete examples I can find are:

  1. Slicing weirdly and then viewing (I was unable to make this work, even if it was theoretically possible within NumPy's framework):
    >>> np.ones((4, 5), dtype=np.int8)[:, 1:].view(np.int32).strides  # Would otherwise be (20, 5)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: To change to a dtype of a different size, the array must be C-contiguous
  2. Selecting a field from a structured array (which is already rare):
    >>> dt = np.dtype([('int', np.int32), ('byte', np.int8)])
    >>> np.zeros((4,), dtype=dt)['int'].strides
    (5,)

So, yes, it's possible, rare IMO, and can be easily predicted by checking for all(s % arr.itemsize == 0 for s in arr.strides).
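(A small runnable version of that check; the helper name is made up:)

import numpy as np

def zero_copy_strides(arr):
    # DLPack strides are in elements, so every byte stride must divide evenly
    return all(s % arr.dtype.itemsize == 0 for s in arr.strides)

dt = np.dtype([('int', np.int32), ('byte', np.int8)])
zero_copy_strides(np.zeros(4, dtype=dt)['int'])    # False: byte stride 5, itemsize 4
zero_copy_strides(np.ones((4, 5), dtype=np.int8))  # True: contiguous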

But what I see as the more pressing matter I should clarify is the alignment: Should one just round down the pointer to the next 256 bytes and also add the "round down" amount to the offset?

@leofang
Contributor

leofang commented May 16, 2021

On my cell, so please forgive me for leaving only a drive-by comment: please ignore the alignment stuff. If we are talking about host-device copies, it's an implementation detail (of CUDA, HIP, or any sane memory pool implementation sitting on top). CUDA/HIP's copy APIs do not care about alignments. DLPack does not. CUDA Array Interface does not. No one does.

Another thing is I seem to recall we agreed in an Array API meeting that if a copy is needed, it's best for the user to handle it explicitly. I can try to look up a note on this later.

@seberg
Member

seberg commented May 16, 2021

please ignore the alignment stuff

Thanks for clearing that up Ralf and @leofang, sorry for missing the link to the comment where it was more clearly defined. I am not concerned about NumPy providing __dlpack__ then. Forcing the user to copy on weird strides or dtypes seems totally fine.

Another thing is I seem to recall we agreed in an Array API meeting that if a copy is needed, it's best for the user to handle it explicitly. I can try to look up a note on this later.

@leofang: If a copy is never done, that is good. My concern is mainly that it should be predictable, although never might even be nicer. (The Array API standard says it can return a view or a copy, but it doesn't specify who would make that copy: the exporter, the importer, or either/both. I guess the copy is currently expected to be made by the exporter only.)


@rgommers, sorry if you feel I am just blocking your productivity by making up moot issues and proposing changes to something that you have settled for yourself. Maybe I am late to the party, but can I maybe have a bit of time to think out loud? I am happy to be pointed to a different place/issue if you think this is a terrible place...

To be clear, I am still just exploring limitations, and how those limitations could be removed. The copy= argument is a thing we had discussed for __array__() because of napari, so it didn't come out of nowhere, it always addressed at least the napari use-case below.

And yes, I may have had the wrong impression that this API could still be modified more easily if new use-cases/problems come up, easier than the buffer protocol at least. And just because I am discussing possible new API doesn't mean I am trying to shut down adding the current one as-is.


The limitations, phrased as use-cases instead of "missing API", are:

  • Napari wants to show a NumPy or CuPy array that will be updated in-place by the user. It would be nice if it could inform the user when a function is not compatible with the input (i.e. the plot will never update if a copy was exported, confusing the user). EDIT: I misremembered the napari story. PyTorch refused to do the copy; napari wanted to force a copy (signal that it is OK if a copy is made). So "napari" is not right, but we still decide that in-place modification (or viewing of mutable data) via DLPack import is discouraged (the user has to deal with the fact that it will not always work; a library that might want to do it can only document that fact).
  • Vaex must copy the data; some algorithm would normally make another copy and work on that in-place. But if it knows that the data was already copied by Vaex during the export, it can skip the additional copy (for performance and to not blow up memory).
  • JAX has immutable tensors, but if I do np.ndarray.from_dlpack(jax_tensor), what should NumPy do? If it came from Vaex, or another library with writeable-view semantics, we can create a normal array view. But if it is a JAX one, NumPy should either copy the data or set the array to read-only. Otherwise, the user might accidentally mutate the input array.
  • The opposite: if JAX consumes a tensor with writeable-view semantics from NumPy, must it copy to ensure its content is actually immutable? Or does NumPy always have to export a copy (if the array is writeable)?
  • Library B wraps a huge read-only memory-mapped area and exports it with DLPack to NumPy. Again, NumPy has to set the read-only flag or allow users to crash the process by accidentally writing to the array.

Now you did dismiss a few of these as user problems, or as just not relevant enough (i.e. don't use DLPack if you write in-place algorithms or like writeable-view semantics). And the unnecessary copy is mitigated by exporters like Vaex, which must copy, being uncommon, I guess?

But at least some of these still seem like (small?) user traps to me. And it seems we have to at least always set the new array to read-only in NumPy, or risk interpreter crashes if a truly read-only buffer is passed (EDIT: but yes, the array standard does mention that not writing is preferred). So the "no writeable views" preference actually becomes a hard requirement, rather than a soft preference.

Is adding one or two bits of information to DLPack on the table or are all of these small issues that we just want to live with?


And with that, sorry for a way too long post...

@seberg
Member

seberg commented May 16, 2021

About the buffer protocol: you are right, I guess. The buffer protocol has only the "release function". And that could be enough to allow deleting the original object, but it is not specified to be valid (by passing a new, different "owner" object out during buffer creation; CPython's memoryviews would happily work with that, but NumPy doesn't).

So yes, unlike DLPack, the buffer protocol doesn't quite allow deleting the original exporting object. So maybe the "release" function is not a proper "delete" function in that sense.

@rgommers
Member

@rgommers, sorry if you feel I am just blocking your productivity by making up moot issues and proposing changes to something that you have settled for yourself. Maybe I am late to the party, but can I maybe have a bit of time to think out loud?

This is the thing - if you propose changes on this issue tracker, you're talking to others who then have to spend time responding to your proposals. It's different from "thinking out loud" (which I interpret as "to clarify things for yourself"). For me the appropriate places to take notes or clarify things for myself are a piece of paper, a HackMD doc, my own fork, etc. - and then ask clarifying questions first rather than make new proposals immediately. The bandwidth differential we have (you're full-time on NumPy, I'm struggling to find enough time) makes the impact of that worse.

I think this is an important communication issue that comes up more often than I'd like. It'd be great if we could chat about it a little, maybe a call in the next few days - I'll ping you on Slack.

And yes, I may have had the wrong impression that this API still could be modified more easily

The DLPack maintainers are quite responsive, and significant changes have already been made to DLPack itself (complex dtypes support, stream support, Python API, renaming struct fields for clarity, etc.). More changes, provided they fit in the DLPack design and have good rationales, are very welcome I'm sure. That said, it'd be great to get some experience with at least a prototype of the current design first - proposing changes not based on any experience with the protocol seems a little odd.

@rkern
Member

rkern commented May 16, 2021

As annoying as it is to be on the receiving end of that, I have found that it has always been the case that the fault was mine, as the proposer. It's not your audience's fault that they weren't following all of your deliberations in real-time that would have shown how you worked out their problem cases. If it's not clearly laid out in the final proposal, it's a fair question. A ton of the PRNG design work went on in the default BitGenerator mega-thread, but I'd never refer people back to it except to prove that we did talk about something, not what the answer is. If their question isn't explained on numpy.org, then it's a fair one to ask and be answered.

Every new audience is a new discussion. You can short-circuit that discussion by drafting the proposal expansively and clearly. For whatever benefits the DLPack maintainers have gotten by keeping their RFCs strictly in Github issues, it does mean that there's no such document for you to use to forestall questions.

@rgommers
Member

Thanks @rkern, that's all very fair. NEP 47, the array API docs and the docs in the DLPack repo can all certainly be improved.

If it's not clearly laid out in the final proposal, it's a fair question.

Definitely, please ask those questions. Ask to see a PR first so you can play with it. And point out which are the important doc gaps to fill. It's simply kind of hard if that's mostly skipped over and we jump straight to an API extension proposal.

@rkern
Member

rkern commented May 16, 2021

It's simply kind of hard if that's mostly skipped over and we jump straight to an API extension proposal.

Agreed. There can be multiple reasons for the disconnect (the proposal isn't what you think it is, there's a disagreement about what the consequences of the proposal actually are, or there's a disagreement about whether the consequences are desirable). A counter-proposal is only appropriate once the disconnect is revealed, if not resolved.

AFAICT, there's nothing that prevents a prototype from being made outside of numpy first. There might just be a little wrapper object in the way until support lands in a release of numpy.

x = other_array.from_dlpack(numpy_dlpack.to_dlpack(some_ndarray))
y = numpy_dlpack.from_dlpack(some_other_array)

Such a prototype package would be useful in production in order to support older versions of numpy, regardless. It's not going to solve all use cases (you won't be able to write "NEP 47-generic" functions with it), but a number of use cases involving explicit conversions can be handled. A small third-party prototype will make the discussions concrete and even premature counter-proposals more tolerable to discuss.

@leofang
Contributor

leofang commented May 19, 2021

First, replying to myself:

Another thing is I seem to recall we agreed in an Array API meeting that if a copy is needed, it's best for the user to handle it explicitly. I can try to look up a note on this later.

  • Here we said "Zero-copy semantics where possible, making a copy only if needed (e.g. when data is not contiguous in memory). Rationale: performance."
  • Here we said "If an array that is accessed via the interchange protocol lives on a device that the requesting library does not support, it is recommended to raise a TypeError"
  • Here we said the host-device copy is disfavored.

I think it's clear that host-device transfer should raise in NumPy's case. Now, let me move to Robert's above reply:

AFAICT, there's nothing that prevents a prototype from being made outside of numpy first. There might just be a little wrapper object in the way until support lands in a release of numpy.

x = other_array.from_dlpack(numpy_dlpack.to_dlpack(some_ndarray))
y = numpy_dlpack.from_dlpack(some_other_array)

Can we please not do this outside NumPy? Two reasons:

  1. It is a very small pair of functions (to_dlpack/from_dlpack) that has been implemented in virtually all other major libraries as far as NEP 47 / the Array API is concerned, including CuPy, PyTorch, TensorFlow and JAX. Apart from the deleter behavior, which no one got right on the first shot AFAIK, it actually works OK with few user complaints. I think PyTorch supports CPU tensors, so I would recommend peeking at their implementation to see how they handle the CPU-GPU case. Otherwise, it is fairly straightforward to support.
  2. @rkern you missed that NEP 47 / the Array API actually does not suggest implementing to_dlpack(), only __dlpack__ + __dlpack_device__ + from_dlpack(). The dunder methods must live in NumPy as a result...

@rkern
Member

rkern commented May 19, 2021

No, I saw that. But it's not essential for a prototype that can help work out some of the other details. I'm not saying you should maintain it forever (though it could be useful for working with current versions of numpy). It's noticeably easier to build and evaluate a small prototype package than it is to build and evaluate a branch of numpy, especially if what I want to test is how it relates to other packages, which have their own dependencies on versions of numpy.

@eric-wieser
Member

eric-wieser commented May 25, 2021

I've only skimmed the conversation here, so apologies in advance if I'm repeating things

#19013 (comment) says

AFAICT, there's nothing that prevents a prototype from being made outside of numpy first. There might just be a little wrapper object in the way until support lands in a release of numpy.

x = other_array.from_dlpack(numpy_dlpack.to_dlpack(some_ndarray))
y = numpy_dlpack.from_dlpack(some_other_array)

Are there technical reasons that we can't go through a PEP3118 memoryview as an intermediary?

some_buffer = memoryview(some_ndarray)
x = other_array.from_dlpack(pep3118_dlpack.to_dlpack(some_buffer))
y_buffer = pep3118_dlpack.from_dlpack(some_other_array)
y = np.array(y_buffer)

That is, could a pep3118_dlpack library be written that provides all object supporting PEP3118 with dlpack support, not just np.ndarray? Or is it impossible to write pep3118_dlpack.to_dlpack and pep3118_dlpack.from_dlpack correctly?

@hameerabbasi
Contributor Author

@eric-wieser I believe the answer is in #19013 (comment),

you missed that NEP 47 / the Array API actually does not suggest implementing to_dlpack(), only __dlpack__ + __dlpack_device__ + from_dlpack(). The dunder methods must live in NumPy as a result...

@eric-wieser
Member

My question is more along the lines of "is it possible to 'smuggle' simple memoryviews through the DLPack interface and vice versa", and less about exactly what the conventions around the Python API might look like - so I don't think that answers my question.

@seberg
Member

seberg commented May 25, 2021

Or is it impossible to write pep3118_dlpack.to_dlpack and pep3118_dlpack.from_dlpack correctly?

The feature sets are a bit disjoint: the buffer protocol supports much more, but it doesn't support devices. But I don't think that is what you mean?

If there was a dlpack.DLPackManagedTensor library, then yeah... NumPy could effectively import that and use DLPackManagedTensor.from_buffer() and memoryview(DLPackManagedTensor) (I am assuming DLPackManagedTensor would implement the buffer protocol).

I am not sure the indirection is useful enough? Hopefully DLPack is just so simple that it doesn't matter too much? Effectively, NumPy would/could be exactly that library (although not the most elegant implementation)! As far as I can tell NumPy supports the full set of features that is supported by both DLPack and Memoryview.
