This repository has been archived by the owner on Nov 21, 2022. It is now read-only.

Literal comparison for True, False, None #139

Open
stereobutter opened this issue Aug 17, 2020 · 36 comments
Labels
accepted Discussion leading to a final decision to include in the PEP fully pepped Issues that have been fully documented in the PEP

Comments

@stereobutter

stereobutter commented Aug 17, 2020

As much as I disagree with @markshannon about the general usefulness of pattern matching, I agree with him about cases like the following (which I hadn't thought about before):

match key:
    ...
    case True:
        key = 'true'
    case False:
        key = 'false'
    case int():
        key = _intstr(key)
    ...

where for match 1: the case True is selected. The reverse happens when one puts case int() before case True: then for match True: the case int() is selected.

I also noticed the discussion in #16 which mentions the handling of True and 1 and 1.0 (the latter matching case True is even more surprising imho.)

This is probably not what people expect (I didn't), and it is also not that helpful when match is going to be used a lot precisely in cases like the above, where a function's return value depends on the type of its argument (and on some literal values). I'd expect people to think of True and False more as constants that they want to literally match a value against and less as some_value == True style equality checking. If they wanted equality checking, a guard such as case x if x == True is much more explicit about what it does than the current behavior of using equality checking behind the scenes.
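The surprise described above follows from ordinary Python equality, since bool is a subclass of int. Here is a minimal sketch, assuming the ==-based literal comparison discussed in this thread. `first_equal_case` is a hypothetical helper that only emulates trying cases in order; it is not the actual matching implementation.

```python
# bool is a subclass of int, and True compares equal to 1 and 1.0.
assert issubclass(bool, int)
assert True == 1 == 1.0


def first_equal_case(subject, literals):
    """Return the first literal that compares equal to subject (hypothetical helper)."""
    for literal in literals:
        if subject == literal:
            return literal
    return None


# With equality semantics, the subject 1 selects the True case when it is listed first.
selected = first_equal_case(1, [True, False])
print(repr(selected))  # True
```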

@stereobutter
Author

stereobutter commented Aug 17, 2020

match x:
    case True: ...
    case 1: ...
    case 1.0: ...

would work intuitively if the check for literals were isinstance(x, type(literal)) and x == literal.

def check(match, case):
    return isinstance(match, type(case)) and match == case
[(f'match {match} case {case}', check(match, case)) for match in (True, 1, 1.0) for case in (True, 1, 1.0)]
>>>
[('match True case True', True),
 ('match True case 1', True),
 ('match True case 1.0', False),
 ('match 1 case True', False),
 ('match 1 case 1', True),
 ('match 1 case 1.0', False),
 ('match 1.0 case True', False),
 ('match 1.0 case 1', False),
 ('match 1.0 case 1.0', True)]

Notice, though, that case True must come before case 1: for x = True, isinstance(True, type(1)) and True == 1 is still true, so True matches case 1. To resolve the ambiguity between True and 1 (and between False and 0), one would need to special-case bool, and the check for literals would be something similar to:

def check(match, case):
    if isinstance(match, bool):
        return match is case
    else:
        return isinstance(match, type(case)) and match == case
examples = [False, True, 0, 0.0,  1, 1.0]
[(f'match {match} case {case}', check(match, case)) for match in examples for case in examples]
>>>
[('match False case False', True),
 ('match False case True', False),
 ('match False case 0', False),
 ('match False case 0.0', False),
 ('match False case 1', False),
 ('match False case 1.0', False),
 ('match True case False', False),
 ('match True case True', True),
 ('match True case 0', False),
 ('match True case 0.0', False),
 ('match True case 1', False),
 ('match True case 1.0', False),
 ('match 0 case False', False),
 ('match 0 case True', False),
 ('match 0 case 0', True),
 ('match 0 case 0.0', False),
 ('match 0 case 1', False),
 ('match 0 case 1.0', False),
 ('match 0.0 case False', False),
 ('match 0.0 case True', False),
 ('match 0.0 case 0', False),
 ('match 0.0 case 0.0', True),
 ('match 0.0 case 1', False),
 ('match 0.0 case 1.0', False),
 ('match 1 case False', False),
 ('match 1 case True', False),
 ('match 1 case 0', False),
 ('match 1 case 0.0', False),
 ('match 1 case 1', True),
 ('match 1 case 1.0', False),
 ('match 1.0 case False', False),
 ('match 1.0 case True', False),
 ('match 1.0 case 0', False),
 ('match 1.0 case 0.0', False),
 ('match 1.0 case 1', False),
 ('match 1.0 case 1.0', True)]

@stereobutter stereobutter changed the title Literal comparison for True and False Literal comparison for True, False, 1, 0, 1.0, and 0.0 Aug 17, 2020
@JelleZijlstra

I agree that this behavior is confusing, but your proposed change could also lead to confusing behavior: if I pass say a numpy.float64(1.0), I'd probably expect it to match 1.0, but it's a different type.

Also, this same behavior would happen if you did a series of if x == checks (although to be fair, there you'd have the option of writing x is True for the bool). In practice this may not be a big deal because people aren't terribly likely to mix bools and ints or floats in the same variable.
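For comparison, the if-chain alternative mentioned above could look like the sketch below. `describe` is a hypothetical function name; the point is only that `is` checks placed first keep 1 and 0 from being treated as bools.

```python
def describe(x):
    # Identity checks come first so that 1 and 0 are not mistaken for bools.
    if x is True:
        return 'true'
    if x is False:
        return 'false'
    # Equality still crosses numeric types: 1 and 1.0 both land here.
    if x == 1:
        return 'one'
    return 'other'


print(describe(True), describe(1), describe(1.0))
```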

@gvanrossum
Owner

Yeah, I had something like this (an isinstance check based on the type of the literal) in an earlier version, but there were too many surprising corner cases. To start, ‘case 3.0’ should match int as well as float, according to the numeric tower. I played with isinstance and the types from the numbers module, but those are too slow.

The only part here that does make sense is to use ‘is’ instead of ‘==’ for True, False and None.
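The corner case with ‘case 3.0’ can be reproduced with plain Python. This is a sketch only: `literal_matches` is a hypothetical stand-in for the isinstance-plus-equality check proposed earlier in the thread, and no claim is made here about the speed of the numbers ABCs.

```python
import numbers

# Equality crosses numeric types...
assert 3 == 3.0
# ...but an isinstance check against the literal's type does not.
assert not isinstance(3, float)
# The ABCs in the numbers module do accept ints as reals.
assert isinstance(3, numbers.Real)
assert isinstance(3.0, numbers.Real)


def literal_matches(subject, literal):
    """Hypothetical isinstance-plus-equality check for a literal pattern."""
    return isinstance(subject, type(literal)) and subject == literal


# An int subject fails to match a float literal under this check.
print(literal_matches(3, 3.0))  # False
```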

@stereobutter
Author

stereobutter commented Aug 17, 2020

@gvanrossum Proposing that is should be used instead of == for True and False is actually why I first opened the thread, and I agree that this solves most of the issues, because the average Python developer is probably aware of the numeric tower (minus the True and False footgun). I just made the alterations above and showed check to point out that not using the numeric tower may also be an option, since match is a totally new sublanguage anyway. Distinguishing between booleans, ints, floats, etc. (how are bytes and strings currently handled, btw?) is something where match could really shine, without the developer having to be aware of all the intricacies involved.

@gvanrossum
Owner

Let’s add this to the next revision of the PEP, once the SC has sent us their feedback (expected this week).

@gvanrossum gvanrossum added needs more pep An issue which needs to be documented in the PEP open Still under discussion labels Aug 17, 2020
@stereobutter
Author

stereobutter commented Aug 17, 2020

if I pass say a numpy.float64(1.0), I'd probably expect it to match 1.0, but it's a different type.

@JelleZijlstra Actually numpy.float64(1.0) would match case 1.0 since issubclass(numpy.float64, float) is true.

@JelleZijlstra

@JelleZijlstra Actually numpy.float64(1.0) would match case 1.0 since issubclass(numpy.float64, float) is true.

Oops, you're right. However, issubclass(numpy.int64, int) is False, so let's pretend I said numpy.int64(1) instead (or numpy.float32(1.0), which is also not a subclass of float). I suppose it's because float64 has the same range as builtins.float, while int64 has a smaller range because builtins.int is arbitrary precision.

I agree with using is for builtin singletons in pattern matching.

@stereobutter
Author

stereobutter commented Aug 17, 2020

@JelleZijlstra yeah, I just noticed that numpy.float16(1.0) would not match case 1.0, which I think might actually be a feature: someone who goes to the length of using numpy-specific types might actually care about distinguishing numpy types from regular types, e.g. to use match to implement a numpy-optimized version of some function if a numpy type is passed, or to use some specific precision logic for numpy types.

@stereobutter
Author

@gvanrossum
Another argument against using the numeric tower would be that match is conceptually much more about destructuring objects and dispatching on types than about comparing values i.e. switch-case semantics. In that light using isinstance checks instead of == for matching literal values does make a ton of sense (to me at least ;) )

@gvanrossum
Owner

I'm beginning to think that Rust was right after all to disallow floats in patterns altogether. This made it into the PEP's Rejected ideas section, and I don't think it would be quite right for Python. E.g. what would we do if a value pattern had a float value? Come to think of it, value patterns currently strictly use ==. If we were to change the semantics of case True, would that mean we'd also change the semantics of case x.y if x.y happens to be the constant True?

One more concern with simple isinstance checks vs. the numeric tower: int is not a subclass of float, but Python does a lot of work to make integers acceptable wherever floats are. (E.g. the change to division semantics was motivated by this.) PEP 484 also considers int a subtype of float. So I think it would be uncool if this didn't work:

match 42:
  case 42.0: print("The answer")
  case _: assert False

@stereobutter
Author

stereobutter commented Aug 17, 2020

I totally forgot about PEP 484 considering int a subtype of float and you are of course right that the distinction between 1 and 1.0 does not matter much in typical python code. For True and False however I think special casing is warranted since treating True and False as integers is rather less common (I'd personally say this is borderline discouraged). Maybe for pedantic people like me Literal[42.0] could be (ab-)used to signal that "yes I literally mean 42.0, not 42 thank you very much".

@stereobutter
Author

stereobutter commented Aug 17, 2020

curious question: would the parser be able to distinguish float(42.0) and 42.0? Then if someone wanted to be really pedantic about their floats and ints one might be able to write:

match value:
    case float(42.0): ...
    case int(42): ...

@brandtbucher
Collaborator

curious question: would the parser be able to distinguish float(42.0) and 42.0? Then if someone wanted to be really pedantic about their floats and ints one might be able to write:

match value:
    case float(42.0): ...
    case int(42): ...

This already works the way you expect it to.

@stereobutter
Author

stereobutter commented Aug 17, 2020

@brandtbucher I didn't know that; thanks a lot. Using that syntax it's possible to write

match x:
    case bool(True):...
    case int(1): ...
    case float(1.0): ...

however the issue with x=True matching case int(1) remains so one has to be careful about the order of the case statements (someone should probably tell @markshannon ;) )

... also now that we know how to distinguish (bool), int and float... how does one spell just any old number; case Number(x): ... with Number from numbers doesn't seem to work (of course there is always case int(x) | float (x): ... but I would have really liked if Number just worked)

@brandtbucher
Collaborator

brandtbucher commented Aug 17, 2020

... also now that we know how to distinguish (bool), int and float... how does one spell just any old number; case Number(x): ... with Number from numbers doesn't seem to work (of course there is always case int(x) | float (x): ... but I would have really liked if Number just worked)

Not exactly sure what you want without more helpful examples, but perhaps case x := Number() or case Number() if value == x fits your need?

Certain built-ins like bool, int, and float are special-cased for the above behavior, so it won't just work with any class.

@stereobutter
Author

stereobutter commented Aug 17, 2020

I thought about something like JSON serialization, e.g.

def encode(value):
    match value:
        case None: return 'null'
        case bool(x): return str(x).lower()
        case str(x): return x
        case int(x) | float(x): return x
        case {**kwargs}: return {encode(k): encode(v) for k, v in kwargs.items()}
        case [*args]: return [encode(e) for e in args]
        case obj: raise ValueError(f'{obj} is not serializable')

side note: maybe this example, trivial as it is, should be part of the PEP because everyone knows json and it is a good example of the type of application where match really shines.

@stereobutter
Author

stereobutter commented Aug 17, 2020

maybe this example, trivial as it is, [...]

It actually isn't all that trivial, since there are also float('-inf'), float('inf') and float('nan'). The first two don't currently appear to compare correctly using the syntax case float('inf'), even though == works for them, while for float('nan') comparison with == doesn't work at all.

It would really be a shame if json serialization (with all its weird edge cases) could not be written as something like:

def encode(value):
    match value:
        case None: return 'null'
        case str(x): return x
        case bool(x): return str(x).lower()
        case undef:= float('inf') | float('-inf') | float('nan'): return str(undef)
        case int(x) | float(x): return x
        case {**mapping}: return {encode(k): encode(v) for k, v in mapping.items()}
        case [*iterable]: return [encode(e) for e in iterable]
        case obj: raise ValueError(f'{obj} is not serializable')

@JelleZijlstra

You could do it with a guard, something like case float(x) if math.isnan(x) or math.isinf(x).

@gvanrossum
Owner

For True and False however I think special casing is warranted

Yes. Also for None. These are all final types BTW. And I'd say only for these three we should use is instead of ==.

For the other stuff, I believe we should not change anything. And given the dark edge cases of the JSON example I don't think we should add it to the PEP (it's already too long).

@brandtbucher
Collaborator

Agreed. I guess the only remaining question here is:

If we were to change the semantics of case True, would that mean we'd also change the semantics of case x.y if x.y happens to be the constant True?

I'm neutral.

@gvanrossum
Owner

If we were to change the semantics of case True, would that mean we'd also change the semantics of case x.y if x.y happens to be the constant True?

I'm neutral.

Yeah, that's a tough one. For case True and so on the code generator can easily generate different code. But I imagine that for case x.y and other value patterns the code generator currently just loads the value and then compares it to the target using == -- it would seem awkward to have to generate special cases for True, False and None there. So I'm inclined not to change the semantics of value patterns, and always use == for those. After all people shouldn't be creating aliases for True or False (let alone None, oh horror :-) to be used as "constant" values; they should be creating enums.

@brandtbucher
Collaborator

brandtbucher commented Aug 17, 2020

Well, in terms of implementation it would probably just be a new opcode that does the correct comparison at runtime, rather than separate compile-time code paths. But your rationale for not wanting to do this is sound, and I think it's easier to explain.

@Tobias-Kohn
Collaborator

Agreed. I would keep the semantics bound to syntax as close as possible, i.e. not use sometimes is and sometimes == when comparing to constants.

The idea that case True: really should mean True and nothing else makes a lot of sense to me. However, when using constants, it also feels more natural that case consts.my_true: matches anything that is ==-equal to whatever my_true is. There is still the option to use something like case bool(consts.my_true): or case x if x is consts.my_true: if the type or is semantics are really important—with the plus that the more verbose syntax stresses the tighter constraints wanted there.

@gvanrossum gvanrossum changed the title Literal comparison for True, False, 1, 0, 1.0, and 0.0 Literal comparison for True, False, None Aug 17, 2020
@viridia
Collaborator

viridia commented Aug 17, 2020

One of the early arguments for the (deferred) custom matcher feature was to be able to have matchers of the form equals(x) or isExactly(x), which I think reads more intuitively than bool(True).

@stereobutter
Author

@viridia I think it would be very unfortunate if we had to (import and) use a helper function for dealing with one of the primitive data structures in the language. Not to mention that we'd need to wait for the custom matcher protocol (that I am very much looking forward to)

@Tobias-Kohn
Collaborator

@viridia This probably addresses the numerous ideas brought forward for wanting to have something like case ==x and case is x. However, as much as I like the idea of the custom matcher in general, I think the issue with the NFT-trio here is a slightly different one: as I understood it, the main idea is to make case True etc. behave more as you would expect, not to avoid syntax like case bool(True): as such. So, I wouldn't want to go down that route with a general is vs == debate right now.

@gvanrossum
Owner

gvanrossum commented Sep 14, 2020

I am putting this in the revised PEP (PEP 634, python/peps#1598), so I am labeling this as "accepted" and "fully pepped". None, True and False will be compared using is instead of ==. @brandtbucher Would this be easy to implement in your branch?

@gvanrossum gvanrossum added accepted Discussion leading to a final decision to include in the PEP fully pepped Issues that have been fully documented in the PEP and removed needs more pep An issue which needs to be documented in the PEP open Still under discussion labels Sep 14, 2020
@brandtbucher
Collaborator

brandtbucher commented Sep 14, 2020

Yep, just two lines in the compiler. I assume this is only for literals, not constant value patterns that evaluate to True/False/None. Otherwise we'll need a new opcode, but it's still not too bad.

@gvanrossum
Owner

Indeed, nothing changes for value patterns.

@brandtbucher
Collaborator

This has been implemented.

@dmoisset
Collaborator

I was just re-reading the spec in PEP-634 and found something which I think is misspecified around this. It's essentially about literal patterns appearing as mapping keys, i.e. a pattern like {True: var}. If I use {1: "foo"} as a subject, does it match?

I'm guessing the reasonable and implementable answer is "yes", although our spec says "A mapping pattern succeeds if every key given in the mapping pattern matches the corresponding item of the subject mapping.", which is False for this example.

@gvanrossum
Owner

Oh, good eye for detail! Indeed, that totally didn't get rephrased when we changed how True/False/None are matched. Given how dict key lookup works,

match {1: "foo"}:
    case {True: var}: print(var)

should totally print "foo". Can you add a separate PR to address this in PEP 634?
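The dict key lookup behavior referred to here can be checked without match at all. This is a small sketch using only built-in dict operations.

```python
subject = {1: "foo"}

# True and 1 hash alike and compare equal, so they address the same key.
assert hash(True) == hash(1) and True == 1
assert subject[True] == "foo"
assert subject.get(True) == "foo"

# Assigning through True does not add a second entry; it updates the value stored under 1.
subject[True] = "bar"
assert subject == {1: "bar"}
print(list(subject))  # [1]
```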

@dmoisset
Collaborator

Added to python/peps#1675

@ghost

ghost commented Mar 17, 2021

I've looked again at the PEP and also in the issues here on GitHub, but I didn't find any mention of Ellipsis (except for some proposals to use ... instead of _ as wildcard). If None, True and False are special-cased, Ellipsis should probably be too? I have on occasion used ... as a placeholder where I couldn't use None because I had to distinguish between no result and the result being None. See also https://stackoverflow.com/a/120055.

@gvanrossum
Owner

gvanrossum commented Mar 17, 2021 via email

@brandtbucher
Collaborator

@sdesch see also this comment.
