Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHP 7 - Wrong Vector/Map/Set equality #67

Open
tyx opened this issue Nov 15, 2016 · 66 comments
Open

PHP 7 - Wrong Vector/Map/Set equality #67

tyx opened this issue Nov 15, 2016 · 66 comments

Comments

@tyx
Copy link

tyx commented Nov 15, 2016

I think creating a different issue from #65 is a better idea.

Current problem

var_dump(new \Ds\Set([1, 3]) == new \Ds\Set([1, 2, 3])); // true

it makes me very sad as I definitively can't rely on Ds in my tests. And I use Ds everywhere as it does a really nice job (thanks for that anyway)

Solution

Is there any reason why this behavior ? And is it a bug or a known behavior ? Looks too big that anyone complain about it ^^

Whatever I'm definitively open to contribute event if my C skill are very old because it cannot stay like that.

I already had a look on the (clean) code.
I guess we need to implements the compare_objects handlers on the internal htable ? Looks pretty obvious, I know. May be we can even use the toArray and rely on array::compare_objects ?
Is there any stuff that prevent us to do that ?

Anyway, let me know what I can do to help on this very annoying issue.

Thanks

@rtheunissen
Copy link
Member

rtheunissen commented Nov 16, 2016

@tyx we would need to implement compare_objects on the zend objects, ie. php_ds_set etc. The internal htable would do the actual work though, eg.:

  • compare_objects handler
  • Check if the object is an instance of Set
  • Access the internal set
  • ds_set_equals(ds_set_t *a, ds_set_t* b)
  • ds_htable_equals(ds_htable_t *a, ds_htable_t *b)

I'm guessing we'd iterate through both the table's keys and values and check if they're equal? (Or in the case of a Set, only the keys as all the values would be undefined).

toArray would be a performance hit, might also run into non-scalar keys.

Sorry about this annoying issue - keen to get it implemented and tested though.

There's also the obvious win if the object being compared to is the same instance. ⚡️

@tyx
Copy link
Author

tyx commented Nov 16, 2016

Looks nice ! Thanks you for your fast answer.

toArray would be a performance hit, might also run into non-scalar keys.

Indeed !

Sorry about this annoying issue - keen to get it implemented and tested though.

Compared to the benefits we get to use your extension, it's not a big deal in fact ;)

There's also the obvious win if the object being compared to is the same instance.

Yes ! This one works actually

@rtheunissen
Copy link
Member

@nikic would this actually be implicit if we implement the get_properties handler?

@nikic
Copy link
Contributor

nikic commented Nov 16, 2016

@rtheunissen Yes-ish. I'd assume you wouldn't want to have the same comparison semantics as a property comparison would give you. In particular, those would be weak and I doubt you'd want to have Set(["foo"]) == Set([true]), right?

@rtheunissen
Copy link
Member

The only issue here is that we can't implement this in the polyfill. 🙅‍♂️

@rtheunissen
Copy link
Member

Only option would be to implement Hashable and use equals for equality checks. Would never need > and < anyway.

@shouze
Copy link

shouze commented Nov 28, 2016

@rtheunissen ok so the main issue is equality so it's not a big issue about < and >. And yes why not with Hashable, we're ok that Set, Vector, ... don't implement it ATM? This is what you've planned to solve the issue.

@rtheunissen
Copy link
Member

@shouze happy to do it with equals. Added to 2.0, see #65

@rtheunissen
Copy link
Member

This is actually more difficult than I expected. equals as part of the Hashable interface sort of scopes that equality check to a context in which the object is used as a key in a hash table. I'm not sure if that also implies general equality, ie. a case where you'd check $a === $b.

For example, if we implement equals on Collection, and we're checking each item of the collection for strict equality, do we call equals on values that implement Hashable?

Or should we create an Equality interface and have Hashable just extend that?

@shouze
Copy link

shouze commented Dec 15, 2016

First of all, to me identity comparison (===) is something else and not part of the scope of this issue.

For example, if we implement equals on Collection, and we're checking each item of the collection for strict equality, do we call equals on values that implement Hashable?

Yes why not.

Or should we create an Equality interface and have Hashable just extend that?

And yes why not, I'm not sure that it's needed but could be a nice to have.

In fact I wonder about the way you want to proceed:

  1. do you want to lookup each entries of eg a Collection when we want to check that a collection is equal to another?
  2. Or do you plan to maintain an always up2date internal hash, updated each time we mutate the collection?

@rtheunissen
Copy link
Member

Do you want to lookup each entries of eg a Collection when we want to check that a collection is equal to another?

Yes I think so. Something similar to what lodash does with _.equals. We can fail fast with things like type and size comparisons. An interesting case would be a Map, which may not have the same ordering, but could be considered "equal". {a: 1, b: 2} == {b: 2, a: 1} for example. We would have to define this for each collection.

Or do you plan to maintain an always up2date internal hash, updated each time we mutate the collection?

No I don't think this is practical, better to just iterate through to compare.

@rtheunissen
Copy link
Member

rtheunissen commented Feb 10, 2017

The issue remains that we can't implement a == check in the polyfill.
Whenever you would use $set == $another, you would have to use $set->equals($another).

@shouze
Copy link

shouze commented Feb 10, 2017

@rtheunissen yes ok for $set->equals($another) if it can help to maintain the polyfill at the same level of features alongside.

@rtheunissen
Copy link
Member

I'd like to keep them consistent, otherwise it invalidates the polyfill to some extent. Maybe if and when the extension becomes a default, we can drop the polyfill.

@shouze
Copy link

shouze commented Feb 10, 2017

Yes, can be a scheduled deprecation but you're right it's probably not the time to do that as it would break adoption.

@rtheunissen
Copy link
Member

This change is actually quite complex. Whenever values are compared (filter, sort, find, etc), we now have to check if the value implements Hashable, and use its Hashable::equals if it does. Of course it's easy to write a helper like valuesAreEqual that checks this for us, but the effects are clearly outside of the Hashable domain. I'm therefore convinced that we should move the equals method of Hashable to its own interface, maybe Equality, with only a single method equals.

It's a shame that PHP still doesn't have a Comparable interface or operator overloading. I'm tempted to wait for the Comparable RFC to pass and be released, which would make the Equality interface redundant.

You can use something like $a->diff($b)->isEmpty() but that wouldn't currently call equals on any objects in the set that implement Hashable. If you know you only have scalar values in your set, you can wrap it and implement your own equals for now.

@marcospassos
Copy link

marcospassos commented Nov 9, 2017

We're are considering to replace our custom collection structures by DS, but we use collection comparisons heavily.

Java defines specific strategies for computing hash codes for each type of collection. For maps, for instance, the hash code is order insensitive (hash += hash(key) ^ hash(value)), so that the equality relation is valid across all map implementations.

Our collection library has an Equality interface, from which Hashable extends from. It makes sense for us since the concept of hash only makes sense tied to an equality relation. Defining a concept of equality relation between collections also requires defining how the hash codes should be computed consistently as per collection type (it should be part of the interface contract). In Java, for instance, two maps are equal if both have the same EntrySet, while two sets are equal if both have the same elements. The rule for determining if two objects are equals is as follows:

  • Two identical objects are always equal
  • Two Hashable objects $a and $b, are equal if $a->equals($b) and $b->equals($a)
  • Everything else is unequal

All methods should follow these rules, including diff, intersect, remove, etc.

Despite the issues related to the implementation of operation comparison overloading, it would be really helpful to have an equals method that allows us testing collections for equality. It is not a BC break and does not require complex changes as implementing operation overloading does.

About the Comparable RFC, it's an old discussion and I don't believe it will be added to the core anytime soon.

@rtheunissen
Copy link
Member

It's is not a BC break and does not require complex changes as implementing operation overloading does.

That's true, but changing the behaviour of diff, intersect, remove, etc could be considered significant enough to add as part of 2.0 (end 2017).

I agree that the collections should implement Hashable, and I think following the direction you've laid out makes sense. No need for the overloaded op just yet, that can come later.

@marcospassos
Copy link

marcospassos commented Nov 10, 2017

Constraining the roadmap according to a polyfill is a big mistake IMHO. The polyfill makes no sense for production use. For any other purpose, I believe that an IDE plugin or just stubs are the way to go.

@marcospassos
Copy link

marcospassos commented Nov 10, 2017

We have another problem. If we use the equality operator, then type juggling will break our plans. Furthermore, the operator == is order insensitive in regards to arrays. So, is there any way to reuse the strict comparison mechanism to compare arrays considering Hashable? It would make the operator itself untouched, but it could be used by the equalsmethod.

@rtheunissen
Copy link
Member

rtheunissen commented Nov 10, 2017

Furthermore, the operator == is order insensitive in regards to arrays

Not true! https://3v4l.org/bbTb0 I was wrong here, that test was misleading because [0, 1] and [1, 0] might be the same values but their keys are 0 and 1, and therefore not equal pairs. If you were to try [0 => 0, 1 => 1] and [1 => 1, 0 => 0] you'll see that order doesn't matter.

@rtheunissen
Copy link
Member

type juggling will break our plans

How so?

@marcospassos
Copy link

marcospassos commented Nov 10, 2017

From the docs:

Example Name Result
$a + $b Union Union of $a and $b.
$a == $b Equality TRUE if $a and $b have the same key/value pairs.
$a === $b Identity TRUE if $a and $b have the same key/value pairs in the same order and of the same types.
$a != $b Inequality TRUE if $a is not equal to $b.
$a <> $b Inequality TRUE if $a is not equal to $b.
$a !== $b Non-identity TRUE if $a is not identical to $b.

http://php.net/manual/en/language.operators.array.php

type juggling will break our plans
How so?

[1, true, false] == ["1", 1, 0]

@rtheunissen
Copy link
Member

Ahh... well don't use arrays then. ;)

@marcospassos
Copy link

marcospassos commented Nov 10, 2017

It's not a choice. There is no way to constrain it in an application.

@rtheunissen
Copy link
Member

rtheunissen commented Nov 10, 2017

I stand corrected, "the operator == is order insensitive in regards to arrays" https://3v4l.org/jdIBv

@marcospassos
Copy link

Yeah, maybe the behavior changed in PHP 7 but the doc is outdated. Anyway, the major issue here is the type juggling.

So, is there any way to reuse the strict comparison mechanism to compare arrays considering Hashable? It would make the operator itself untouched, but it could be used by the equalsmethod.

?

@rtheunissen
Copy link
Member

I think we should try to stay as close to arrays as possible. 😞

  • Ordering doesn't matter
  • == comparison on everything?

@marcospassos
Copy link

marcospassos commented Nov 10, 2017

I personally don't use the operator ==. I try to avoid it as much as possible.

I think we should try to stay as close to arrays as possible. 😞

Stay close to array implies that order matters. About the equality relation, just close the eyes for arrays is a big inconsistency IMHO. Resolving this issue pushes this library to the next level 🚀

@rtheunissen
Copy link
Member

Stay close to array implies that order matters

I thought we just agreed that arrays don't consider order important?

[0 => 'a', 1 => 'b'] == [1 => 'b', 0 => 'a']

@marcospassos
Copy link

In the context of arrays, the order is relevant. For loose comparison, it is not. However, an equals method can never rely on type juggling, it makes no sense. The only way I see it working is performing a strict comparison, exactly as === does, with an exception only for Hashable objects. This logic must be applied to array elements as well. IMHO, it's the cleanest and safest approach.

@rtheunissen
Copy link
Member

Damn it, PHP. Not making this easy for us. 😠 But there's a solution here, just have to find it.

@marcospassos
Copy link

How about storing arrays in the Map as Maps, transparently to the developer? When an array is put into a Map, it became immutable as you cannot pass it as a reference. Storing it as a Map allows us labeling it with a hash code which eliminates most part of the inequalities. For the other cases, we can use the comparison logic recursively, calling the equals method on the wrapped array.

@rtheunissen
Copy link
Member

Not a crazy idea, but I don't like the idea of putting an array in and getting a collection out.

@rtheunissen
Copy link
Member

Let's just say that arrays use === comparison, end of story. If you need proper equality comparison, use a collection. or we can recursively descend into arrays and use the same equality checks. But that feels inconsistent and unexpected.

@marcospassos
Copy link

Not a crazy idea, but I don't like the idea of putting an array in and getting a collection out.

This is why I said it should be handled transparently to the developer, so an array should be returned.

Let's just say that arrays use === comparison, end of story. If you need proper equality comparison, use a collection. or we can recursively descend into arrays and use the same equality checks. But that feels inconsistent and unexpected.
Ignoring hashable inside arrays feels inconsistent as well :)

@rtheunissen
Copy link
Member

$ php -a
Interactive shell

php > var_dump(['a' => 1, 'b' => 2] === ['b' => 2, 'a' => 1]);
bool(false)
php > var_dump(['a' => 1, 'b' => 2] === ['a' => 1, 'b' => 2]);
bool(true)
php > var_dump([false, true] === [0, 1]);
bool(true)

Here's the policy:

  • Objects are compared with equals or === if they don't implement Equality.
  • Everything else is compared with ===.
  • Objects within arrays that implement Equality will honour the interface.

The third point here is the important one. We don't need to do anything clever or hacky to make this work, all we need to do is define the internal comparison behaviour.

The only cost of doing this is that we can't support it in the polyfill, which supports the argument that the polyfill is not of practical value and should be abandoned.

@marcospassos
Copy link

marcospassos commented Nov 10, 2017

@rtheunissen 100% agree with you. Any chance of getting this working anytime soon?

@rtheunissen
Copy link
Member

@marcospassos I don't want to release it under 1.x, but I could implement this and only this on a new branch? And merge that into 2.0 later on. But I have a lot on my plate at the moment so can't guarantee anything soon.

Do you need equals more than you need them to be hashable also? If that's the case, could you create your own collection that uses Map as the base and implement equals yourself in the meantime?

@rtheunissen
Copy link
Member

@php-ds php-ds deleted a comment from marcospassos Nov 10, 2017
@Fleshgrinder
Copy link

Fleshgrinder commented Nov 28, 2017

Interesting to see the typical PHP issues that internals claims are no issues being discussed here.

There are two kinds of relations between types, those that have a equivalence relation and those those with partial equivalence relations. This also matches what proper programming languages actually provide (e.g. Rust).

The thing regarding order is actually very simple from a math point of view. Two collections (set, map, …) are equal if they contain the same elements, regardless of order. This is why every proper language provides different types for e.g. set and sorted set as well as map and sorted map.

Another thing that is very important to understand is that equality is not covered by comparability. The same ordering of something does not directly translate to equality. I explained this once in internals but the best and easiest example is infinity.

Infinity is not smaller nor greater than infinity, however, infinity does not equal infinity.

INF.equals(INF) // false
INF.compareTo(INF) // 0

There are more examples with date and time but I think this is sufficient as an explanation.

Also, just for your consideration, providing methods to call is faster than overloading the operators (a lot if you have many comparisons). I benchmarked this once with a C implementation where I allowed overloading for all operators with marker interface as the ones I showed here.

PS: Sticking to what PHP already does might seem like a good idea from a familarity point of view. Doing so also means that it is, well, mostly wrong.

PPS: Note that there are also totally ordered sets and partially ordered sets, however, this discussion is about equivalence relations only so I did not include it.

@rtheunissen
Copy link
Member

rtheunissen commented Dec 22, 2017

Coming back to the plan of action for 2.0 here:

  • Override == for all collections to determine equality. instanceof internally?
  • Have === compare object reference.
  • Implement Hashable for all collections, where equals is equivalent to a ==.

Consequences:

  • Separate equals from Hashable
  • Abandon the polyfill 🔪

@rtheunissen
Copy link
Member

Hi everyone. 👋

My plan for comparison behavior in 2.0:

  • Abandon the polyfill 🔪
  • Hash-based structures only support object-keys that implement Hashable.
  • interface Hashable extends Equatable
  • Overload == for all classes that implement Equatable.
  • Use the equivalent of === for everything else (scalars, arrays, etc).
  • Two objects can only be equal if they are also instances of the same class.
  • The only Hashable collection will be Tuple, which is immutable.

@shouze
Copy link

shouze commented Mar 22, 2019

@rtheunissen this is the right decision to me. I've abandoned this extension because of this kind of issues and honestly if all of that is because of the polyfill kill it!

@hopeseekr
Copy link

It's such a shame to abandon the polyfill.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants