Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add containment for trees and graphs #118

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

BramVanroy
Copy link
Contributor

@BramVanroy BramVanroy commented Aug 7, 2023

Added simple containment __contains__ for graphs and trees. Tests are added, too.

For graphs:

  • str: whether a given string is part of the graph as a terminal instance
  • tuple (or Triple): whether it is part of self.triples
  • NotImplemented in other cases - so testing for subgraphs is not yet implement. Open to suggestions how to do that though. I was thinking of just doing triplet matching, but perhaps we want to be less strict and allow a match with different varnames as well?

For trees:

  • str: whether a given string is part of the tree as a terminal instance

I get the terminal nodes in the tree like this:

terminals = [target for _, (role, target) in self.walk() if role == "/" and is_atomic(target)]

which feels elaborate so perhaps there is a simpler way? Happy to change it if there is!

closes #10 (partially, at least, since subgraph testing is not included in this PR)

@BramVanroy
Copy link
Contributor Author

Failing lint: which command should I run? I see that flake8 is in the setup.py dependencies but I am not sure which arguments to run it with for your preferences.

Failing tests: seems unrelated to this PR (an error about py36 not available).

Copy link
Owner

@goodmami goodmami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this fell off my radar. I've reviewed the PR. Generally it seems good, and I now think that subgraph checks should be done via a special method rather than via the in operator. The linting errors are due to raising NotImplemented instead of NotImplementedError, as I've pointed out below, as well as some lines being longer than 79 characters (which is the default and current line length limit).

I also wonder if we should simplify it so the in check only looks for triples in graphs and use the Graph._filter_triples() method, so you could use None wildcards to do things like:

(None, ":ARG9", None) in g  # any triple with role :ARG9
(None, ":instance", "foo") in g  # any instance with concept "foo"

It's a bit more verbose for checking for concepts, but it's more consistent. We could also change the Graph.instances() method to accept a concept parameter to make things easier:

g.instances(concept="foo")  # -> [("f", ":instance", "foo")]

It would help to know what people's use cases are for the functionality. What do you think?

penman/graph.py Outdated Show resolved Hide resolved
penman/tree.py Outdated
Comment on lines 47 to 51
if isinstance(item, str):
terminals = [target for _, (role, target) in self.walk() if role == "/" and is_atomic(target)]
return item in terminals
else:
raise NotImplemented
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like you could look for branches as you did for triples above. Otherwise, same comments as before:

  • make a set
  • Try not to raise a specific error, or if it's an unsupported type use TypeError


assert "alpha" in g
assert "beta" in g
assert "b" not in g
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is confusing because b is in g, it's just not defined as such. Not sure what the best path forward is. But you could have something obviously false, like assert "gamma" not in g

BramVanroy and others added 2 commits October 23, 2023 15:20
Co-authored-by: Michael Wayne Goodman <goodman.m.w@gmail.com>
@BramVanroy
Copy link
Contributor Author

BramVanroy commented Oct 23, 2023

@goodmami I have considered your comments and I like the suggestion of just allowing for triple matching in the graphs only, and then also allowing users to specify "None" in a triple to ignore that field. Two questions/concerns:

  • Is it ensured that None is never a valid value in a triple? Because if it is, we cannot use this approach (because users might want to explicitly match on None)
  • In terms of typing, this would require a tiny additional change. Currently Triple only allows str as the Variable and Role and None is not valid. So for valid typing, this should change - or I change the signature of contains so that item can only be a Tuple[Variable|None, Constant, Role|None].

What do you think?

PS: linter fails on files that I have not touched, so not sure how I can help that.

@goodmami
Copy link
Owner

  • Is it ensured that None is never a valid value in a triple? Because if it is, we cannot use this approach (because users might want to explicitly match on None)

None is a valid value, where it indicates something is missing (and Penman sometimes logs a warning in these cases):

>>> import penman
>>> penman.decode('()').triples
[(None, ':instance', None)]
>>> penman.decode('(a)').triples
[('a', ':instance', None)]
>>> penman.decode('(a :ARG0)').triples
Missing target: (a :ARG0)
[('a', ':instance', None), ('a', ':ARG0', None)]

I don't think roles are ever None, though, now that they are canonically written with the ::

>>> penman.decode('(a :)').triples
Missing target: (a :)
[('a', ':instance', None), ('a', ':', None)]

So, yes, it's true that you cannot use the graph methods like Graph.edges() to look for those that have a literal None value because it is overloaded in these methods to be a wildcard. The current workaround is to look through Graph.triples manually. We could use a different sentinel value for the wildcard, such as Ellipsis, but it would be a backward incompatible change to make None function as a literal value, so there would need to be a deprecation period. I think the workaround suffices for now.

  • In terms of typing, this would require a tiny additional change. Currently Triple only allows str as the Variable and Role and None is not valid. So for valid typing, this should change - or I change the signature of contains so that item can only be a Tuple[Variable|None, Constant, Role|None].

I think you're right. Let's hold off on changing Role to allow for None since I don't think that's possible (correct me if I'm wrong). But for the triple source, penman.types.BasicTriple also needs to be similarly changed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Simple membership test: X in graph
2 participants