Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests for Edge equality and hashing #249

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

peternowee
Copy link
Member

These tests test Edge equality comparison and hashing for regressions and point out some current issues.

For equality comparison (Edge.__eq__()):

For hashing (Edge.__hash__()):

Some tests are based on problems seen with the bug and patches reported in issue pydot/pydot#92. Some other tests are based on examples from articles by Aaron Meurer and Roy Williams and adapted to our Edge class.

The tests that are currently failing have expectedFailure decorators that will be removed as their underlying issues are addressed in future commits.

The PR for this commit is pydot/pydot#249. It should get links to spin-off issues and PRs that discuss the various issues brought up by these tests in more detail.

Additional remark: This PR is based on the fix of commit e4742e91 of PR #248, because if it was based on current master without that fix, the results for test_edge_equality_basics_3_same_points_not_not_equal would differ between Python 2 and Python 3 and it would be difficult to mark it as an expected failure only for Python 2. Please post comments on commit e4742e91 in #248, not here.

Currently, a non-equality comparison (`!=`) of two separate, but equal
`Edge` instances returns `False` as expected on Python 3, but `True` on
Python 2. There is no such difference in equality comparison (`==`,
`True` in this case) between Python 2 and 3.

The difference for non-equality is caused by the lack of an
`Edge.__ne__()` method. In Python 3, [`__ne__()` by default delegates
to `__eq__()`][1], but in Python 2 there are [no implied relationships
among the comparison operators][2], resulting in a fallback to
[identity-based comparison][3].

This commit fixes that by defining an `Edge.__ne__()` method only for
Python 2.

There is some [discussion about how to best implement an `__ne__()`
method][4], centering on the following two alternatives:

    def __ne__(self, other):
        return not self == other

or variations of:

    def __ne__(self, other):
        equal = self.__eq__(other)
        return equal if equal is NotImplemented else not equal

The discussion goes from optimal delegation order to symmetry to
performance and so forth. In short, I got the impression they are never
going to convince each other, because the first solution focuses more
on the `self` and `!=` and `==` being complements, whereas the second
solution focuses more on the symmetry of `!=` when swapping `self` and
`other`. It seems there are cases where you simply cannot have both. In
the end, I just tested both solutions against a variety of made-up
classes and chose the second solution, because it exactly mimicked the
Python 3 behavior in all cases, whereas the first clearly gave
different results sometimes, e.g. against a class with comparison
methods that always return `False`. Although such cases may be called
pathological, the main goal of this commit is to get the behavior on
Python 2 in line with the default behavior of Python 3 that we are used
to, not to come up with a better (or worse) custom `Edge.__ne__()` for
Python 3 as well.

Note that on comparison with non-`Edge` objects, `Edge.__eq__()`
currently raises an `Error` instead of returning `NotImplemented`, so
this will also happen for `Edge`-to-non-`Edge` non-equality comparisons
on Python 2 now. This is consistent with existing behavior on Python 3.
Changing this to `NotImplemented` will be proposed later.

API impact: I expect the number of affected users to be minimal,
because the old behavior on Python 2 was quite surprising, yet we never
had any bug reports about it. I assume `Edge` objects are seldom
checked for non-equality, or for non-identity by checking non-equality.
Also, any breakage may already have been discovered when running code
on Python 3 sometimes. I have not seen any explicit promises made for
the old behavior on Python 2.

[1]: https://docs.python.org/3.9/reference/datamodel.html#object.__eq__
[2]: https://docs.python.org/2.7/reference/datamodel.html#object.__eq__
[3]: https://docs.python.org/2.7/reference/expressions.html#value-comparisons
[4]: https://stackoverflow.com/questions/4352244/should-i-implement-ne-in-terms-of-eq-in-python/
These tests test `Edge` equality comparison and hashing for regressions
and point out some current issues.

For equality comparison (`Edge.__eq__()`):

- Our current definition of edge equality: Same endpoints and, if the
  parent graph is directed, the same order of endpoints. (Changes to
  that definition may require changing some of these tests as well.)
- [Identical objects should compare equal][1] (reflexivity).
- [Comparisons should be symmetric][1].
- Rich comparison methods should [return `NotImplemented` if they do
  not implement the operation for the operands provided][2] (see also
  [here][3]).
- Availability of equality comparison before the edge has a parent
  graph.

For hashing (`Edge.__hash__()`):

- [Hashable objects which compare equal must have the same hash
  value][4] (see also [here][5]).
- [The hash value of an hashable object should never change during its
  lifetime][4] (see also [here][5]).
- Try to prevent hash collisions, because they may degrade performance.
- Availability of the hash method before the edge has a parent graph.

Some tests are based on problems seen with the bug and patches reported
in issue [pydot#92][6]. Some other tests are based on examples
from articles by [Aaron Meurer][7] and [Roy Williams][8] and adapted to
our `Edge` class.

The tests that are currently failing have [`expectedFailure`][9]
decorators that will be removed as their underlying issues are
addressed in future commits.

The PR for this commit is [pydot#249][10]. It should get links to
spin-off issues and PRs that discuss the various issues brought up by
these tests in more detail.

[1]: https://docs.python.org/3.9/reference/expressions.html#value-comparisons
[2]: https://docs.python.org/3.9/reference/datamodel.html#the-standard-type-hierarchy
[3]: https://docs.python.org/3.9/reference/datamodel.html#object.__eq__
[4]: https://docs.python.org/3.9/glossary.html#term-hashable
[5]: https://docs.python.org/3.9/reference/datamodel.html#object.__hash__
[6]: pydot#92
[7]: https://www.asmeurer.com/blog/posts/what-happens-when-you-mess-with-hashing-in-python/
[8]: https://eng.lyft.com/hashing-and-equality-in-python-2ea8c738fb9d
[9]: https://docs.python.org/3.9/library/unittest.html#unittest.expectedFailure
[10]: pydot#249
Copy link

This pull request has conflicts, please resolve those so that the changes can be evaluated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Simplify throws NameError, and, as implemented, doesn't work
1 participant