Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: inline yt.funcs.is_sequence #3499

Closed
wants to merge 4 commits into from

Conversation

neutrinoceros
Copy link
Member

@neutrinoceros neutrinoceros commented Sep 5, 2021

PR Summary

This is in support to YTEP 0038, and more broadly in support to better typing in yt.
It is a follow up to #2893, in which I renamed has_len into is_sequence based on a misconception
I was having about the word "sequence" in Python.

I encourage reviewers to read this short page https://docs.python.org/3/library/collections.abc.html

So, the main reason this function needs to be discouraged is that its name is misleading. It is not equivalent to the following implementation

from collections.abc import Sequence
def is_sequence(obj):
    # check that `obj` has __getitem__, __len__, __reversed__ ...
    return isinstance(obj, Sequence)

On the other hand, it is equivalent to

from collections.abc import Sized
def is_sequence(obj):
    # check that obj has __len__
    return isinstance(obj, Sized)

and because this implementation is try-free as well as simpler, I replaced the current implem with it.

Now, since it is apparent that it can now be inlined at the same cost (one import), it seems reasonable to drop its usage internally, but we can still keep it for now, to avoid breaking downstream applications.

In many places where it is used, checking that a given object knows the __len__ protocol isn't exactly the appropriate thing to do. Sometimes we really want to check that objets are instances of collections.abc.Sequence or collections.abc.Collection. However, to avoid breaking anything by changing a lot of code at once, I stopped at the inlining step, except in one or two obvious cases, which I will highlight here with comments.

It is worth noting that np.ndarray objects are not considered as collections.abc.Sequence, because they don't implement the __reversed__ protocol. See https://docs.python.org/dev/glossary.html#term-sequence for more detail

PR Checklist

  • [N/A] New features are documented, with docstrings and narrative docs
  • [N/A] Adds a test for any bugs fixed. Adds tests for new features.

@@ -594,7 +595,7 @@ def to_frb(self, width, resolution, center=None, height=None, periodic=False):
)

validate_width_tuple(width)
if is_sequence(resolution):
if isinstance(resolution, Iterable):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the few cases that I corrected the logic instead of producing exactly equivalent code

@@ -253,7 +254,11 @@ def __init__(self, data_source, figure_size, fontsize):
self.data_source = data_source
self.ds = data_source.ds
self.ts = self._initialize_dataset(self.ds)
if is_sequence(figure_size):
if isinstance(figure_size, Sized):
if len(figure_size) != 2:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a sensible check here

@neutrinoceros neutrinoceros force-pushed the is_seq_inline branch 2 times, most recently from 119598e to fa11298 Compare September 5, 2021 19:21
@matthewturk
Copy link
Member

Can you remind me if this has or had any impact on our usage of ensure_list to make sure that field lists are in fact lists of fields?

@neutrinoceros
Copy link
Member Author

neutrinoceros commented Sep 5, 2021

@matthewturk ensure_list was removed in #2893
The present PR is intended to have no impact on anything, besides the two (extremely minor) corrections highlighted above.
It is only about avoiding to entertain an incorrect notion of what "sequences" mean in Python, and make room to later use isinstance(..., Sequence) where relevant.

@neutrinoceros
Copy link
Member Author

The GH workflow failures seem to be due to a change upstream on MacOs + home-brew + proj + gets + Python 3.9.7... So fixing this is out of scope here. However, the failures on Jenkins are real, so I'll make this a draft for now.

@neutrinoceros neutrinoceros marked this pull request as draft September 6, 2021 08:00
@neutrinoceros
Copy link
Member Author

This allowed me to discover a bug in unyt yt-project/unyt#201
though this problem is blocking for the current PR so I'll keep it as a draft until it is resolved upstream.

@neutrinoceros
Copy link
Member Author

Turns out the bug isn't in unyt, but in numpy. I think this PR is doomed because numpy arrays have weird behaviours and don't seem to play well with collections.abc. Down the line these problems completely break what I was trying to do here

See numpy/numpy#19833, and a related, long standing issue numpy/numpy#2776

I'll open another PR to try and reflect this knowledge better in our code base.

@neutrinoceros neutrinoceros deleted the is_seq_inline branch September 6, 2021 09:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Making something better
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants