This repository has been archived by the owner on Nov 21, 2022. It is now read-only.

Sigils for stores -- should we mark lvalues? #143

Open
viridia opened this issue Aug 25, 2020 · 7 comments
Labels
fully pepped Issues that have been fully documented in the PEP rejected A rejected idea sc-feedback Issues raised in the steering committee feedback

Comments

@viridia
Collaborator

viridia commented Aug 25, 2020

This was one of the key points brought up in the SC feedback, and I suspect it is the most difficult one.

IIRC, the strongest arguments for using sigils for loads (instead of stores) are:

  • Stores are much more prevalent than loads in patterns, so marking loads instead of stores reduces overall syntactical clutter.
  • A key design tenet for pattern matching is that patterns should resemble the expressions used to construct the object that is being destructured. The motivation is to provide a mnemonic device for programmers learning to use patterns: if unsure of the syntax for a pattern, use the same syntax that you would use to construct the object or expression.
  • Similarly, there is a desire to be consistent with existing destructuring assignment syntax. Lvalue-references are not explicitly marked in statements of this type.

However, all of these arguments have weak points.

First, "clutter" is not a value-neutral term. It generally refers to excessive punctuation that harms readability. But we have not yet rigorously established that store sigils would harm comprehension overall. At least one SC commenter opined that marking stores would help readability and comprehension. While extra punctuation might be jarring to look at, that may only be an artifact of its newness and unfamiliarity.

Second, the design tenet for patterns resembling construction, while well-intentioned, comes into conflict with other Python design tenets when applied too rigorously. It relies heavily on another key idea: that patterns are their own syntactical context that has different rules than regular Python code. A bare reference to an identifier within a pattern means something different than it does within a Python statement or expression. The mental leap necessary for grasping this change of context is an easy one to make for compiler-geeks like the PEP authors, but (judging from the mailing list traffic) is not as easy for the average Python programmer.

Third, the existing destructuring syntax does not have to deal with a mix of both l-values and r-values. Since there are only l-values in destructuring patterns, the syntactical choices are much simpler.

There's also a compelling argument in favor of marking stores: a number of the special cases in the PEP go away, and the overall complexity of the PEP is reduced. We no longer have to distinguish between simple names and compound names. We no longer have to warn users away from using pattern matching as a 'switch' statement. (Well, there may be other reasons not to use it that way, but at least it will function as the user expects).

What sigil do I propose? At some point I think we have discussed every punctuation character in the 7-bit ASCII set, and then some. Some characters obviously can't work: any character that is already used as a unary operator, or that is a paired delimiter (like parens), is off the table. Other characters, like period, have strongly established meanings that don't harmonize with the intended use here.

As a strawman, I would propose caret (^). This suggestion actually surfaced briefly in previous discussions, but was abandoned when we went down the "sigils for loads" route. I chose this character because (a) it isn't obviously disqualified by the criteria in the previous paragraph, and (b) it doesn't have a lot of "ink". By that I mean that it has few dark pixels compared to white pixels, which helps mitigate the "clutter" critique mentioned earlier. A symbol like @ or $ has more ink and is more visually disruptive IMHO.

Note that this proposal does not entirely address some of the arguments raised previously - patterns are still syntactically special, they are just less special than before.

Under this scheme, a typical match statement might look like this:

    match expr:
        case BinaryOp(^op, ^left, ^right):
            result = \
                f"{format_expr(left, expr.precedence)} {op} {format_expr(right, expr.precedence+1)}"
            # Surround the result in parentheses if needed
            if precedence > expr.precedence:
                return f"({result})"
            else:
                return result
        case UnaryOp(^op, ^arg):
            return f"{op}{format_expr(arg, 0)}"
        case VarExpr(^name):
            return name
        case float() | int():
            return str(expr)
        else:
            raise ValueError(f"Invalid expression value: {repr(expr)}")

I honestly don't think that looks too terrible.

However, what we don't know at this point is whether making a change like this will affect the SC vote. We know at least one SC member was opposed to using sigils for loads, but we don't know if any SC members were in favor of it.

See previous issues on this topic:
#1
#90

@viridia viridia added the sc-feedback Issues raised in the steering committee feedback label Aug 25, 2020
@stereobutter

This comment has been minimized.

@gvanrossum
Owner

@SaschaSchlemmer Please stay out of our discussion. This is now between the SC and the PEP authors, and outside interference (however well meant) is very distracting. Please just sit on your hands and watch. If you keep adding comments I may have to figure out how to ban you, or revert the repo to Private. I don't want to do either of those things, but I cannot handle too many cooks in the kitchen right now.

@gvanrossum
Owner

A potential problem with marking lvalues is that it opens the door for allowing arbitrary expressions in patterns. If you have to do something new and special to bind a value, you could easily allow things like this (none of which bind any variables):

    case (x+1, d[k], a[i+1], ",".join(a)): ...
    case {f(x): x, f(y): y}: ...
    case (p, q, *rest): ...

But now we'd have a problem fitting in class patterns. Is this a class pattern or just creating an object?

    case int(s): ...

Is this a function call or a class pattern?

    case func(a=1, b=2): ...

If we don't allow near-arbitrary expressions, basically keeping the existing proposal except requiring a ^ before a binding name, we'll definitely get pressure in the future to allow expressions -- and in the meantime the restrictions and special cases will still have to be explained to everyone learning about patterns:

  • you can use names, dotted names, literals,
  • but not function calls or subscripts or operators,
  • except | (or or), which has a special meaning,
  • and what looks like a function call really is a class pattern,
  • and list, tuple and dict literals are allowed,
  • but not set literals.

I understand that we have exactly those restrictions now, but they are currently motivated by the strong desire to have an unadorned, unqualified name be a capture pattern, and the other constructs are available to build up more complex patterns.

@Tobias-Kohn
Collaborator

Thanks, @viridia, for summarising many of the issues with the load/store semantics (or lvalue/rvalue, respectively, according to some people). There are a few things I would like to reply to.

It relies heavily on another key idea: that patterns are their own syntactical context that has different rules than regular Python code. A bare reference to an identifier within a pattern means something different than it does within a Python statement or expression.

These two sentences show very clearly that we did not succeed in explaining the very basic idea of patterns in the first place. The rules for patterns are not that different from the rest of Python, if only we could finally stop comparing them to expressions! I think this is closely connected to Guido's remark:

A potential problem with marking lvalues is that it opens the door for allowing arbitrary expressions in patterns.

Let us perhaps briefly recapitulate where patterns come from. The base form of a pattern is a name as a binding target, not a literal value like 0. Python long ago introduced an extension whereby the target can also be 'tuple-like', binding several names 'concurrently'. This can be used to deconstruct a sequence, of course. With pattern matching, we basically build on this, adding the question: "what if we don't know the length of a sequence and our assignment could fail?"

It is then natural to ask whether we could apply the idea of deconstruction to data structures other than sequences. It is also convenient to integrate some basic comparisons into the picture, e.g. allowing patterns to contain literal values. It seems to me that too many readers see this as the central aspect of it all, rather than as syntactic sugar to make life easier. Anyway, perhaps the trickiest part is how to express those 'other data structures' without people mistaking them for expressions, which feeds into another of @viridia's comments:

[...] the design tenet for patterns resembling construction [...]

At least for sequences, this has actually been true of Python for a long time. You can write, e.g., (a, *b) = (a, *b), and sure enough the stars on the left- and right-hand sides complement each other, making this semantically essentially a no-op. Isn't this symmetry exactly what makes it so easy to use and remember?

Our goal is, in a way, to find some syntax that brings in arbitrary classes so that, in principle, the above would generalise to C(a, b) = C(a, b), say, where C could be any class or type that defines a structure for the data. Our problem seems to be that so many people oppose this, because now the left-hand side looks like an expression (actually a function call). But if we look closely again, Python is absolutely symmetric even here: def C(a, b): (just try to squint the def away ;-) ). You can even say def C(a, *b):, where the star (once again) nicely complements its use in other contexts.

Anyway, I feel that the entire load/store or lvalue/rvalue discussion is really quite symptomatic of a more fundamental issue, i.e. the nature of patterns in the first place.

P.S. Having all that said: if introducing a sigil like ^ really makes everyone happy and magically solves all our problems, I could live with it, although most certainly without enthusiasm.

@dmoisset
Collaborator

The load/store problem is:

  • hard
  • there was a lot of effort invested already and nothing new has come up
  • only mentioned as a big concern by a single SC member, the one most negative about the PEP (i.e. effort put here has less chance of affecting the outcome).

If it was up to me, I would deprioritise our focus on this issue given other things to address....

@gvanrossum gvanrossum changed the title Sigils for stores Sigils for stores -- should we mark lvalues? Aug 26, 2020
@gvanrossum gvanrossum added fully pepped Issues that have been fully documented in the PEP rejected A rejected idea labels Sep 16, 2020
@gvanrossum
Owner

In the SC-VC we ended up deciding to keep the existing approach. However, Thomas plans to write a PEP to allow ? as a throwaway target anywhere, which would (if accepted in time for 3.10) invalidate the need for treating _ as a wildcard in patterns.

@gvanrossum gvanrossum added needs more pep An issue which needs to be documented in the PEP and removed rejected A rejected idea labels Sep 16, 2020
@gvanrossum gvanrossum added rejected A rejected idea and removed needs more pep An issue which needs to be documented in the PEP labels Oct 20, 2020
@gvanrossum
Owner

Labeled as rejected (we're not marking lvalues) and fully pepped (PEP 635 addresses this).


5 participants