Skip to content
This repository has been archived by the owner on Nov 21, 2022. It is now read-only.

Visitor example #137

Open
gvanrossum opened this issue Jul 17, 2020 · 10 comments
Open

Visitor example #137

gvanrossum opened this issue Jul 17, 2020 · 10 comments

Comments

@gvanrossum
Copy link
Owner

gvanrossum commented Jul 17, 2020

I cooked up a match/case version of a visitor pattern from the PEG parser. This is in a function that computes and prints FIRST sets for the grammar. (It is not used as part of the PEG generator, though it could be used to optimize certain situations.)

brandtbucher/cpython@7db606e

I observe that the new code is indented two (!) extra clicks, but has less boilerplate. It's slightly shorter because I managed to combine a few trivial cases.

However it does highlight an issue: the original code's visitor pattern's running time is independent from the number of classes (assuming dict lookup is O(1), which is is for our purpose), while the match/case version is O(N). In this case it doesn't matter, but if this was mypy (which uses the same way to implement a visitor pattern) we'd probably be looking at a significantly slower running time. (Note added in proof: For this little utility I see no significant difference in running time.)

All in all an interesting exercise.

@gvanrossum
Copy link
Owner Author

I also notice that I miss tooling that can handle our syntax. E.g. there are some type annotations in the code, but I can't run mypy, and there are some unused imports, but I can't run flake8 or pylint. Obviously this is normal in this stage of new syntax. But I already miss it. :-)

@brandtbucher
Copy link
Collaborator

brandtbucher commented Jul 17, 2020

I imagine that your work on pegen was at least part of the inspiration for pattern-matching. Cool to see some dogfooding in action!

I wonder if optimizing compilers like Cython/PyPy/mypyc/numba/etc. could generate more efficient control-flow structures when they have sufficient knowledge of program state (especially given the freedom to legally perform more aggressive optimizations here).

@brandtbucher
Copy link
Collaborator

And yes, I've also been dearly missing black, mypy, and pylint when writing the test suite. At least my editor auto-indents correctly!

@gvanrossum
Copy link
Owner Author

I still am in need of more help arguing that we have a good alternative to the OO visitor pattern.

Pattern matching and OO

Pattern matching is complimentary to the object-oriented paradigm.
Using OO and inheritance we can easily define a method on a base class
that defines default behavior for a specific operation on that class,
and we can override this default behavior in subclasses.

But this is not appropriate for all situations. For example, a code
generator may consume an AST, and have many operations where the
generated code needs to vary based on the class. Yet the AST class
hierarchy should probably not be adorned with methods corresponding to
the needs of the code generator.

Example: Visitor pattern

There is a classic OO pattern (the visitor pattern) to address this,
but it requires a lot of infrastructure. For example, the Python
version in the Wikipedia page on the visitor pattern requires every
CarElement subclass to define an accept() method, and it
requires an abstract CarElementVisitor class that defines a method
corresponding to each class. And every concrete visitor class needs
to define a concrete methods corresponding to each class.

In this example, to add a new CarElement subclass Door, we
must add a new abstract visitDoor method to CarElementVisitor,
and we must add a new visitDoor class to each visitor class.

If we were to implement this pattern using match, we could replace
the accept() method with a visit() method with a default
implementation, and replace the visitor classes with functions. For
example::

https://gist.github.com/gvanrossum/b2ab40483805d409890ef754f67db4cb

In this version, adding a new class still requires adding an extra
case to each of the visit functions.

@brandtbucher
Copy link
Collaborator

One additional advantage over the common getattr(self, f"visit_{node.__class__.__name__}", self.generic_visit) visitor pattern is that we are much more OO-friendly. We don't use names as a proxy for types, and inheritance works very naturally.

I think ours is the pattern where it is easiest to group related functionality outside of the inheritance tree, while also respecting inheritance.

@viridia
Copy link
Collaborator

viridia commented Sep 15, 2020

Part of my problem in commenting is that I don't want to risk repeating myself, but I can't easily track down what I have already said.

So the high-level tenets here are:

  • Python is intended to be a language that supports multiple programming paradigms
  • This means that the language includes syntactic support for paradigms which are popular.
  • Historically, the OOP paradigm has been supported very strongly, but not exclusively (i.e. comprehensions are clearly inspired by functional styles)

Now, here we run into a bit of trouble because people who only know OOP and not other styles tend to take an expansive (and often dogmatic) view of what OOP is, but let's skip over that for the moment.

According to Wirth, "Algorithms + Data Structures = Programs". Or the way I like to think about it, programming paradigms are a way of organizing nouns and verbs. OOP tends to focus on nouns being the primary organizing factor, with "classes" that represent nouns arranged in a hierarchy, and containing "methods" which are the verbs.

This style tends to make it easy to introduce new nouns. It is also easy to add a new verb to a specific noun, but adding a cross-cutting verb - one that applies to many nouns - requires considerably more exertion, because now you have to update a bunch of classes in various source files.

With the pattern-matching visitor pattern, the verbs become the primary organizing principle, and the nouns are cases within each verb. This makes it easy to add new verbs, without having to go and revisit a bunch of classes, but makes it more difficult to add a cross-cutting now, i.e. a noun that is the subject of many verbs.

However, this limitation only applies to match statements that are used like a typed switch: each case clause represents a different type, so adding new types requires adding more case clauses.

A more powerful form of pattern matching is one where we match on attributes or shapes: much like duck-typing, we select objects based on characteristics that we are interested in, which may or may not be related to the specific type of that object. Is it selectable? Is it renderable? Is it loggable? Is it commutative? And so on - much like a database query, we can perform complex operations on large swathes of objects with relatively small sections of code which precisely express our intent. However, designing an object taxonomy that works this way requires substantial re-thinking of your data structures up front - simply retro-fitting pattern matching into an existing codebase may not yield as much improvement as you might want.

@gvanrossum
Copy link
Owner Author

gvanrossum commented Sep 15, 2020

One additional advantage over the common getattr(self, f"visit_{node.__class__.__name__}", self.generic_visit) visitor pattern is that we are much more OO-friendly.

Well, if you look at the Visitor example page on Wikipedia you'll see that it doesn't use this. Instead, each class has this:

class Foo(CarElement):
    def accept(self, visitor):
        visitor.visitFoo(self)

@Tobias-Kohn
Copy link
Collaborator

But I think Brandt's argument still holds. The issue with the visitor pattern here is that the class needs to specify which 'visit'-method is called. This can go two possible ways:

  • each object Foo calls the visitFoo() method;
  • an object may inherit the accept from its superclass Bar, say, so that it calls visitBar() rather than visitFoo().

In either way, it is the class hierarchy that specifies the 'similarity' of objects and classes. I would argue that pattern matching is more flexible because the differentiation between objects and classes can be adapted to the specific use case, without having to change anything in the classes themselves. In other words: pattern matching is more loosely coupled than the visitor pattern.


However, there is an even clearer advantage of pattern matching. The visitor pattern needs to anticipate any possible distinction characteristics when defining the accept-methods (however this is done). In most cases, this is only done on the basis of classes, leaving out structural factors such as the length of a sequence, or the type/structure of sub-elements, say. For instance, (2, 3, 5, 7, 11) and (2, (3, (5, (7, (11, None))))) are both tuples, but with wildly different structures. The visitor pattern cannot match this richness in a meaningful way—depending on the use case, it has to either over- or under-approximate it.

This is slightly different for statically typed OOP languages like Java. The static types of classes make many of the factors in question much more predictable than in Python. For instance, the two examples above would have quite different types in Scala, namely something like Tuple5[Int, Int, Int, Int, Int] vs. Tuple2[Int, None | Tuple2[_]]. Even with type erasure, we would still have Tuple5 vs. Tuple2 as the runtime classes. The visitor pattern is therefore slightly more powerful in such statically typed languages.

@Tobias-Kohn
Copy link
Collaborator

I really like this "Algorithms + Data Structures = Programs" / "Nouns and Verbs" approach. Thanks, @viridia for sharing this; I will have to read up on that.

May I add that the idea of redesigning your class hierarchy or data structures builds on the premise that you have full control over it. In reality, however, you have to deal with a fixed hierarchy specified by someone else's library and have to somehow deal with it as a given, immutable thing. Even if you are the original author, others might depend on the current layout already, making it almost impossible to change the overall structure.

@gvanrossum
Copy link
Owner Author

I would really like to see a cohesive section that we can add to the Motivation section of PEP 635 which shows how match is more flexible than the visitor pattern, using a worked-out example of a similar size as the "Car" example from Wikipedia, since this is one of the key points we're using to argue for match (IIRC Mark Shannon said something that amounted to "the Visitor pattern works fine").

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants