Add parse_index function for parsing NDIndex from string #91

telamonian · 2020-10-12T02:40:56Z

Fixes #86, see #86 for discussion.

Adds a literal_eval_index function, built off of a modified version of the ast.literal_eval function from the cpy std lib.

I've tested the literal_eval_index function in both cpy37 and cpy38, seems to be working pretty well. Still needs some work to integrate this into the ndindex package (including adding tests).

telamonian · 2020-10-12T17:06:32Z

I've now also tested this code in cpy39 (after some teething issues, I managed to use pyenv plus some json hackery to get isolated jupyter kernels for each of cpy 37, 38, and 39), and it seems to be working well. Basically, the ast authors removed a bunch of classes from ast in cpy38 and cpy39 (eg ast.Num), but they left in class stubs so my calls to eg isinstance(x, ast.Num) don't cause errors in any version.

telamonian · 2020-10-13T05:32:54Z

I've now added unittests for literal_eval_index. Locally they're passing with 100% coverage.

There's still a couple of minor flaws in how I'm using hypothesis to generate string representations of ellipses (always'Ellipsis', never '...') and slices (always 'slice(...)', never 'x:y'), but all of those cases are being covered by my @examples. Shouldn't be too hard to improve that.

Hypothesis is tricky! And interesting. I've never used it before

scopatz · 2020-10-13T14:33:50Z

ndindex/literal_eval_index.py

@@ -0,0 +1,67 @@
+import ast


Please add a module level docstring

scopatz · 2020-10-13T14:35:15Z

The code looks reasonable to me. @asmeurer - will you take a look at this please, whenever you get a chance? Thanks!

asmeurer · 2020-10-13T20:12:05Z

ndindex/tests/test_literal_eval_index.py

+@example('slice(-12, -72, 14)')
+@example('3:15, -5, slice(12, -14), (1,2,3)')
+@example('..., -5, slice(12, -14), (1,2,3)')
+@example('3:15, -5, slice(12, -14), (1,2,3), Ellipsis')


These are not valid indices, at least as far as ndindex is concerned. Nested tuples are not allowed.

I did see

https://github.com/Quansight/ndindex/blob/9e5399fb20f5800474aa2bf081e8323cfeaeabf4/ndindex/tuple.py#L65

Why did you decide to be more restrictive about this syntax than numpy?

The reason is that I wanted to avoid confusion from both Tuple(1, 2) and Tuple((1, 2)) being valid and meaning different things. See #17.

I also had thought that using a tuple to represent an array was deprecated, but now I can't find that. Certainly there is ambiguity even at the NumPy level. If you expect a tuple to act like a fancy index, then you will be surprised that a[array([1, 2])] will give a different thing than a[(1, 2)].

Particularly this comment #17 (comment). Perhaps this should be added to the docs. I do have https://quansight.github.io/ndindex/type-confusion.html#tuple, but that is more about best practices and doesn't really discuss why things are the way they are.

Okay, so it will be a little difficult to implement disallowing nested tuples indexes, since there are a bunch of places where we do need to parse ast.Tuple regardless. I think the way to do this will be to include some kind of guard in the ast.Tuple handling clause of _convert that will raise an error if it gets called twice. So far I've got:

class _Guard: def __init__(self): self.val = False def __call__(self): if self.val: return True else: self.val = True return False

and then

_nested_tuple_guard = _Guard() def _convert(node): if isinstance(node, ast.Tuple): if _nested_tuple_guard(): raise ValueError(f'tuples inside of tuple indices are not supported: {node!r}') return tuple(map(_convert, node.elts)) ...

There's probably a cleaner way to implement the guard

I think the simpler way is to not worry about nonsense combinations here, and let ndindex() handle it at the end.

asmeurer · 2020-10-13T20:18:20Z

Thanks for doing the work on this. This is looking good. Some comments:

eval_literal_index should pass the result to ndindex() so that it returns an ndindex type. The main benefit of this is that you will get type checking on the index that you wouldn't get without doing so. For example, currently nonsense like literal_eval_index('slice(slice(1, 2))') is allowed. literal_eval_index('1.0') is also allowed. In this way, we don't have to do type checking at the ast level, and can only handle the most basic things.
eval_literal_index isn't the greatest name. I would prefer something like parse_index.
This code can go in ndindex.py.
The hypothesis tests are fine. You are basically checking that literal_eval_index(str(index)) works for any ndindex type. I think it might be simpler to do

@given(ndindices)
def test_literal_eval_index(idx):
    assert literal_eval_index(str(idx)) == idx

This doesn't handle the cases of non-conforming indices, though. For that, we would need some strategies that generate ast. I think hypothesis doesn't have first-class support for this. We could try something like https://github.com/Zac-HD/hypothesmith, but I think it might not be easy. So for this, I think we can just have a bunch of normal tests that syntax features outside of the allowed list are not valid. And also test that things that look valid from the ast but aren't actually valid fail properly.

asmeurer · 2020-10-13T20:27:53Z

The hypothesis tests are fine. You are basically checking that literal_eval_index(str(index)) works for any ndindex type. I think it might be simpler to do

Actually I think it needs to look like this

@given(ndindices)
def test_literal_eval_index(idx):
    index = ndindex(idx)
    str_index = str(index)
    repr_index = repr(index)
    try:
        assert literal_eval_index(str_index) == idx
    except SyntaxError:
        # str() gives invalid syntax for array indices
        assume(False)
    assert literal_eval_index(repr_index) == idx

This will test both the str and repr forms, and properly skip the test for the str form of an array (which isn't valid Python).

This still won't handle a lot of combinations, including slice literals, but we'd have to create custom strategies to do that. Although I've also considered that perhaps Slice.__str__ should return slice literal syntax.

asmeurer · 2020-10-13T20:41:35Z

Regarding Python 3.9, it is out now, so we need to support it. See #92

- see Quansight-Labs#91 (comment)

telamonian · 2020-10-19T21:45:11Z

Actually I think it needs to look like this

@given(ndindices)
def test_literal_eval_index(idx):
    index = ndindex(idx)
    str_index = str(index)
    repr_index = repr(index)
    try:
        assert literal_eval_index(str_index) == idx
    except SyntaxError:
        # str() gives invalid syntax for array indices
        assume(False)
    assert literal_eval_index(repr_index) == idx

This will test both the str and repr forms, and properly skip the test for the str form of an array (which isn't valid Python).

@asmeurer I tried the approach to writing test_parse_index that you suggest above, but it fundamentally won't work as-is. If I understand your intention correctly, you want (at least roughly) the repr/str ops and the parse_index op to be reciprocal w.r.t. equality, right? This will require some reworking of how str and repr work in NDIndex and/or some of its subclasses.

@asmeurer @scopatz Can you please give this PR another review and let me know if anything still needs to change? Since the version of parse_index in this PR works and is passing all tests/coverage checks as of right now, my vote would be to pull this PR in (more-or-less) as-is. We could then ensure that repr/str and parse_index are reciprocal in a future PR which also makes the necessary rationalizations to the behavior of repr/str themselves.

telamonian · 2020-10-19T23:14:31Z

@asmeurer I played around with repr/str and parse_index a bunch, but everything I tried ran into some kind of trouble. Here's a summary of some of the obstacles:

Most returns from repr(ndindex(x)) or str(ndindex(x)) start with the name of an ndindex class (eg Tuple, IntegerArray, etc) which parse_index does not (and probably should not, ideally it will stick to literals) understand
There's special handling for repr(ellipsis()) which turns it into an empty plain tuple (ie ())
For array indices, str(IntegerArray([0])) is still valid syntax for parse_index if the shape of the array index is 0 or 1

Basically, whenever I fixed one issue I ran into another, so all of the complete fixes I came up so far involve radical changes to .__repr__, .__str__, and/or .raw for some/all NDIndex subclasses

telamonian · 2020-10-23T19:28:14Z

I'll close/reopen this PR to trigger the new py39 test

telamonian · 2020-10-24T00:33:34Z

The PR is now passing for py39 as well

asmeurer · 2020-10-25T06:43:42Z

ndindex/ndindex.py

@@ -65,6 +66,91 @@ def __init__(self, f):
    def __get__(self, obj, owner):
        return self.f(owner)

+class _Guard:


I don't think this guard stuff is needed. The Tuple constructor should catch this after calling ndindex() at the end (and will give a better error message than what this currently gives).

asmeurer · 2020-10-25T06:47:01Z

ndindex/ndindex.py

+        node_or_string = node_or_string.slice
+
+    def _raise_malformed_node(node):
+        raise ValueError(f'malformed node or string: {node!r}, {ast.dump(node)!r}')


The node part here isn't particularly helpful alongside the dump.

Also I don't understand why the literal_eval error message says "node or string". How can the string be malformed? If there is a syntax error, it raises a different exception. I like this better:

Suggested change

raise ValueError(f'malformed node or string: {node!r}, {ast.dump(node)!r}')

raise ValueError('unsupported node when parsing index: {ast.dump(node)}')

I think your language is clearer in any case. I'll pull this suggestion in

asmeurer · 2020-10-25T06:49:02Z

ndindex/tests/helpers.py


 def slices(start=one_of(none(), ints()), stop=one_of(none(), ints()),
-           step=one_of(none(), ints())):
+           step=one_of(none(), ints_nonzero())):


This includes zero step because it is used to test the slice constructor. Although it would probably be better to use this for the tests that don't actually test that (see the discussion at #46). It may be better to work on this in a separate PR, though.

I added this since I kept getting ValueErrors due to the related check in the Slice constructor.

So what I don't get is how this wasn't causing errors previously for tests using the @given(ndindices) strategy. Wouldn't that just mean that (due to one of the many weird quirks of hypothesis) @given(ndindices) just happened to never attempt to generate ndindex(slice(x, y, 0))?

ndindices uses filter(_doesnt_raise) which make it only generate valid indices. Actually I suspect this means several tests

asmeurer · 2020-10-25T06:50:21Z

ndindex/tests/test_ndindex.py

+    ints(),
+    slices(),
+    Tuples,
+).map(lambda x: f'{x}')


Put strategies in helpers.py

You can include arrays here too, although you'll have to use the repr form in that case.

asmeurer · 2020-10-25T06:51:24Z

ndindex/tests/test_ndindex.py

+@example('..., 3:15, -5, slice(-12, -72, 14)')
+@given(ndindexStrs)
+def test_parse_index_hypothesis(ixStr):
+    assert eval(f'_dummy[{ixStr}]') == parse_index(ixStr)


You can check here if eval gives an error than so should parse_index.

asmeurer · 2020-10-25T06:56:10Z

One idea to get better test coverage would be to __str__ for ndindex types to give the raw string form, like

>>> str(Slice(1, 2))
'1:2'
>>> str(Tuple(1, 2))
'(1, 2)'

and then test both str and repr of ndindex(idx) (and ndindex(idx).raw) (str for arrays will still have to be skipped because it uses non-valid syntax).

It may also be a good idea to support the ndindex classes as valid names, so parse_index('Slice(1, 2)') works.

asmeurer · 2020-10-25T06:57:44Z

ndindex/ndindex.py

+
+def parse_index(node_or_string):
+    """
+    "Safely" (needs validation) evaluate an expression node or a string containing


What does "needs validation" here mean?

I guess it depends on what we mean by "safely". parse_index('('*1000 + ')'*1000) raises MemoryError (but so does ast.literal_eval('('*1000 + ')'*1000)). It shouldn't be possible to execute arbitrary code, though, as far as I can tell.

doesn't really mean anything, mostly just a reflection of my paranoia about building a) any parser, and b) something meant to take raw user input. My goal has been to make something that's at least as safe as the existing literal_eval. At this point I've examined the code and the surrounding issues enough that I'm reasonably confident this has actually been archived, so I'll edit out the wishy-washy language

asmeurer · 2020-10-25T06:59:55Z

ndindex/ndindex.py

+def parse_index(node_or_string):
+    """
+    "Safely" (needs validation) evaluate an expression node or a string containing
+    a (limited subset) of valid numpy index or slice expressions.


The docstring should probably clarify what is and isn't allowed. I guess we aren't allowing any kind of expressions that create numbers.

Actually, we may at one point want to support just anything that parses, and make the current behavior under a safe=True flag, which I guess should be the default. But we can worry about that later.

Also this docstring needs examples. and should be added to Sphinx.

telamonian added 2 commits October 11, 2020 22:33

added literal_eval_index.py with working literal_eval_index function

a3520b1

removed unnecessary cpy minor version guards from literal_eval_index

35a6b8a

telamonian mentioned this pull request Oct 12, 2020

WIP: n-dimensional support jupyterlab/jupyterlab-hdf5#20

Closed

scopatz assigned asmeurer Oct 12, 2020

telamonian added 2 commits October 13, 2020 00:54

added unittests with full coverage for literal_eval_index

996cbf5

added support for .. and Ellipsis to literal_eval_index

0f30a09

scopatz reviewed Oct 13, 2020

View reviewed changes

ndindex/literal_eval_index.py Outdated

@@ -0,0 +1,67 @@

import ast

Copy link

Contributor

scopatz Oct 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a module level docstring

asmeurer reviewed Oct 13, 2020

View reviewed changes

first attempt at disallowing nested tuple indices in literal_eval_index

447cda0

- see Quansight-Labs#91 (comment)

ericdatakelly added this to the October 2020 milestone Oct 13, 2020

telamonian added 2 commits October 13, 2020 19:48

renamed eval_literal_index -> parse_index, moved it to ndindex.py

0b6e7c8

parse_index now returns an NDIndex

3733175

telamonian changed the title ~~[WIP] Add literal_eval_index function for parsing ndindex from string~~ Add parse_index function for parsing NDIndex from string Oct 19, 2020

telamonian requested review from asmeurer and scopatz October 19, 2020 21:45

telamonian closed this Oct 23, 2020

telamonian reopened this Oct 23, 2020

telamonian added 2 commits October 23, 2020 19:12

add coverage pragmas in parse_index for code paths unused in cpy38/39

8345caa

export to top-level __init__.py

6174a67

telamonian force-pushed the add-literal-eval-index branch from c08db5a to 6174a67 Compare October 23, 2020 23:12

asmeurer reviewed Oct 25, 2020

View reviewed changes

asmeurer added the Nice-to-have label Nov 2, 2020

ericdatakelly modified the milestones: October 2020, November 2020 Nov 16, 2020

tgeorgeux mentioned this pull request Nov 18, 2020

18 November 2020 Meeting Minutes jupyterlab/frontends-team-compass#109

Closed

ericdatakelly modified the milestones: November 2020, December 2020 Dec 17, 2020

ericdatakelly modified the milestones: December 2020, January 2021 Jan 6, 2021

ericdatakelly modified the milestones: January 2021, February 2021 Feb 4, 2021

ericdatakelly modified the milestones: February 2021, March 2021 Mar 8, 2021

ericdatakelly modified the milestones: March 2021, April 2021 Apr 5, 2021

ericdatakelly modified the milestones: April 2021, May 2021 May 10, 2021

ericdatakelly removed this from the May 2021 milestone Oct 6, 2021

asmeurer removed the Skip report label Nov 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add parse_index function for parsing NDIndex from string #91

Add parse_index function for parsing NDIndex from string #91

telamonian commented Oct 12, 2020 •

edited

telamonian commented Oct 12, 2020

telamonian commented Oct 13, 2020

scopatz Oct 13, 2020

scopatz commented Oct 13, 2020

asmeurer Oct 13, 2020

telamonian Oct 13, 2020

asmeurer Oct 13, 2020

asmeurer Oct 13, 2020

telamonian Oct 13, 2020

asmeurer Oct 14, 2020

asmeurer commented Oct 13, 2020 •

edited

asmeurer commented Oct 13, 2020 •

edited

asmeurer commented Oct 13, 2020

telamonian commented Oct 19, 2020

telamonian commented Oct 19, 2020

telamonian commented Oct 23, 2020

telamonian commented Oct 24, 2020

asmeurer Oct 25, 2020

asmeurer Oct 25, 2020

telamonian Oct 25, 2020

asmeurer Oct 25, 2020

telamonian Oct 25, 2020

asmeurer Nov 2, 2020

asmeurer Oct 25, 2020

asmeurer Oct 25, 2020

asmeurer Oct 25, 2020

asmeurer commented Oct 25, 2020

asmeurer Oct 25, 2020

asmeurer Oct 25, 2020

telamonian Oct 25, 2020 •

edited

asmeurer Oct 25, 2020

	raise ValueError(f'malformed node or string: {node!r}, {ast.dump(node)!r}')
	raise ValueError('unsupported node when parsing index: {ast.dump(node)}')

Add parse_index function for parsing NDIndex from string #91

Are you sure you want to change the base?

Add parse_index function for parsing NDIndex from string #91

Conversation

telamonian commented Oct 12, 2020 • edited

telamonian commented Oct 12, 2020

telamonian commented Oct 13, 2020

Choose a reason for hiding this comment

scopatz commented Oct 13, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

asmeurer commented Oct 13, 2020 • edited

asmeurer commented Oct 13, 2020 • edited

asmeurer commented Oct 13, 2020

telamonian commented Oct 19, 2020

telamonian commented Oct 19, 2020

telamonian commented Oct 23, 2020

telamonian commented Oct 24, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

asmeurer commented Oct 25, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

telamonian Oct 25, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

telamonian commented Oct 12, 2020 •

edited

asmeurer commented Oct 13, 2020 •

edited

asmeurer commented Oct 13, 2020 •

edited

telamonian Oct 25, 2020 •

edited