Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NEP: add np.allow_object to the proposal #15083

Closed
wants to merge 3 commits into from
Closed

Conversation

mattip
Copy link
Member

@mattip mattip commented Dec 10, 2019

xref gh-15047

I settled on np.allow_ragged even though, as @rgommers points out "ragged" is not a concept in the NumPy lexicon. I think it captures the intent. We will need to document the sentinel carefully, and maybe add a section on "automatic conversion" to the user documents and reference documents

@seberg
Copy link
Member

seberg commented Dec 10, 2019

Another plausible name "component" is sequence. Such as ragged_sequences_as_object.

@rgommers
Copy link
Member

Why invent the new concept here though? Could it simply be allow_object?

@rgommers
Copy link
Member

That would also make it easier to extend the deprecation later to other things that create object arrays.

@seberg
Copy link
Member

seberg commented Dec 10, 2019

Ah, because there is something missing, we might want to have the option to opt-in to the new behaviour for objects. In which case allow_object would be closer to allow_object_but_not_ragged_seuqences.

In fact,I think having an OptIn solution (disable the warning and go directly to error), is probably more important for downstream: In most cases this stuff is weird for them as well, and they may be OK with adding slow cludges within the except branch of a try/except.

@mattip
Copy link
Member Author

mattip commented Dec 11, 2019

Why invent the new concept here though? Could it simply be allow_object?

I will add a sentence to the NEP proposal why this is not specific enough, along the lines of the comment above

@mattip
Copy link
Member Author

mattip commented Dec 11, 2019

Another plausible name "component" is sequence. Such as ragged_sequences_as_object.

The NEP refers to "ragged nested sequences" but I think the shorter allow_ragged, if documented properly, is sufficient.

@mattip
Copy link
Member Author

mattip commented Dec 11, 2019

What is the proper procedure to modify a NEP? If there are no editorial comments, I guess the next step is ask for comments about this change on the mailing list. Would it be better to

  • leave this PR open and ask for comments directly, or
  • change the NEP status to "Provisional" and merge it?, and then write to the mailing list?

@eric-wieser eric-wieser self-requested a review December 11, 2019 08:06

The name is not specific to ragged nested sequences. In discussion we
considered a more specific ``np.allow_ragged``. However it was felt that there
is really no difference between an object array with ragged sequences and an
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion there is a difference, it's that the former is ambiguous while the latter is not - today there is no reason you would pass allow_object unless you care about ragged sequences, so it seems like the name should clearly indicate that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure that's not true, there were lots of examples given in the call about other things that could be used inside object arrays (e.g. arrays of Decimal or Fraction was @charris's example).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, there are lots of examples of object arrays, as far as I understood, the reasoning is:

  1. Try to have as few new names in the API as possible
  2. We may deprecate automatic conversion to objects in general (or at least in more generally
  3. It is likely OK, if (once we start deprecating more) allow_object will cover ragged-sequences as well (meaning we likely do not need an "allow objects but not ragged-sequences")

With those considerations (at least for me) I can see that allow_object as a single catch all "legacy object" behaviour may be desirable. I will admit I am still a bit unsure if a single object is sufficient granularity, and that an "OptIn" for getting an error right now is not useful. But if pandas will not have a use-case soon, my guess is it does not matter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there were lots of examples given in the call about other things that could be used inside object arrays.

Right, but this NEP doesn't give any justification for why we would want to deprecate np.array([Decimal(1), Decimal(2)]) in future, yet chooses a spelling that assumes we would want to do so. If we're going to pick that spelling, we should at least speculate some other situations that would require it to be passed within the NEP. Currently, I can't think of a convincing use case other than nested sequences.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one clear justification: I think we may want to deprecate it, because people should be using np.array([Decimal(2), Decimal(2)], dtype=np.PyObject[Decimal]), i.e. making type specific object dtypes should be easier and should be preferred in many cases. (of course the result of operations on these, unless defined by the user could spill to a generic object quickly, so it may not be super straight forward always).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e. making type specific object dtypes should be easier and should be preferred in many cases.

Given that np.array([1, 1]) infers to np.int_, wouldn't inferring np.array([Decimal(1), Decimal(1)]) to decimal to np.PyObject[Decimal] also be reasonable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eric-wieser yes, but I think someone may be expected to define that dtype first probably. Since you could also imagine someone creating a non-object version of Decimal which is much faster. (There is always the issue if two people define dtypes for the same python type, breaking the 1:1 mapping of python type to dtype, we could error/warn when that happens, I am not sure it is a huge deal. Or we cannot allow the above pattern...).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may want to deprecate it, because people should be using ...

I don't think it makes sense to me to throw the people who can't yet be bothered to define their own extension type into the same bucket as people who are deliberately using ragged arrays.

By picking a generic name here that could apply to multiple deprecations (of which we only have remotely concrete plans for the first), either:

  • We never create a second deprecation that makes use of this sentinel, and for its entire lifetime allow_object is a less precise way to spell allow_ragged
  • We create a second and maybe third deprecation. We can't finish the first one until we finish all the later ones, as we didn't introduce a new spelling for each.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was drifting of a bit too far. That deprecation is not concrete yet. Even if it will happen eventually, it would be at a time where this one is at least close to being done probably.

Well, there is at least one concrete one, we could do soon, the promotion of python integers to object (or unsigned/larger integer). I will be frank, I am not quite convinced myself that saving one sentinel here is worth the trouble. For example, I could rather imagine hiding the sentinel somewhere, where it is less spammy (I don't know, np.dtype.allow_ragged for all I care).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And to be very clear: I am also not convinced that it is so bad to call it allow_object, that I am willing to argue seriously against it :).

@mattip mattip changed the title NEP: add np.allow_ragged to the proposal NEP: add np.allow_object to the proposal Dec 16, 2019
@@ -30,6 +30,10 @@ recursive sequence of sequences, where not all the sequences on the same
level have the same length) will force users who actually wish to create
``object`` arrays to specify that explicitly. Note that ``lists``, ``tuples``,
and ``nd.ndarrays`` are all sequences [0]_. See for instance `issue 5303`_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
and ``nd.ndarrays`` are all sequences [0]_. See for instance `issue 5303`_.
and ``np.ndarrays`` are all sequences [0]_. See for instance `issue 5303`_.

@eric-wieser
Copy link
Member

For example, I could rather imagine hiding the sentinel somewhere, where it is less spammy

np.compat.allow_ragged?

@mattip
Copy link
Member Author

mattip commented Jan 13, 2020

Closing. Downstream libraries have fixed their code and no longer require this work-around

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants