Distinct on a Value Type doesn't seem to work in all cases #389

Open
CyberBotX opened this issue Jul 2, 2023 · 0 comments
I am using the FatValueType from NetFabric's own LINQ benchmarks project (https://github.com/NetFabric/LinqBenchmarks), except I've made it a readonly record struct instead of just a struct, so it basically becomes the following:

public readonly record struct FatValueType
{
	public readonly int Value0 { get; }
	public readonly long Value1 { get; }
	public readonly long Value2 { get; }
	public readonly long Value3 { get; }
	public readonly long Value4 { get; }
	public readonly long Value5 { get; }
	public readonly long Value6 { get; }
	public readonly long Value7 { get; }

	public FatValueType(int value)
	{
		this.Value0 = value;
		this.Value1 = value;
		this.Value2 = value;
		this.Value3 = value;
		this.Value4 = value;
		this.Value5 = value;
		this.Value6 = value;
		this.Value7 = value;
	}

	public readonly bool IsEven() => (this.Value0 & 0x01) == 0;

	public static FatValueType operator +(in FatValueType left, in FatValueType right) => new(left.Value0 + right.Value0);

	public static FatValueType operator *(in FatValueType left, int right) => new(left.Value0 * right);
}

I'm using this in my own set of LINQ benchmarks. Even though EqualityComparer<FatValueType>.Default.GetHashCode() returns the same value for two identical instances of this value type, Distinct gives the wrong result on my test array, which contains 4 copies of each of 100 distinct values: instead of returning 100 values, Hyperlinq returns 162. The first 100 values are the first 100 from the original source, but the following 62 are the ones where Value0 is between 1 and 63, except for 32.

Tracing the code in a debugger suggests that Hyperlinq's Set<T> implementation might be at fault. I am not sure why it fails in this case, but after the first 100 items have been added, the 101st item (where Value0 is 0) is correctly found in the set, whereas the 102nd item (where Value0 is 1) is not.

The simplest way I have found to reproduce the problem (I don't know how to fix it) is the following:

Enumerable.Range(0, 20).Select(i => new FatValueType(i)).Concat(Enumerable.Range(1, 10).Select(i => new FatValueType(i))).AsValueEnumerable().Distinct()

This should return a set of 20 values (0 through 19), but it instead returns all 30 values of the original enumerable.
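For comparison, here is a minimal baseline sketch that uses only System.Linq, not Hyperlinq. It assumes a trimmed-down stand-in type, SmallValueType (hypothetical, two properties instead of eight), and a hypothetical helper class DistinctBaseline. It builds the same 30-element sequence, checks that the default comparer treats equal instances as equal with matching hash codes, and counts the distinct values the way the expression above is expected to.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down stand-in for FatValueType (two properties instead of eight).
public readonly record struct SmallValueType
{
    public int Value0 { get; }
    public long Value1 { get; }

    public SmallValueType(int value)
    {
        Value0 = value;
        Value1 = value;
    }
}

// Hypothetical helper; uses System.Linq only, so it shows the expected behavior.
public static class DistinctBaseline
{
    // Same shape as the reproduction: 0..19 followed by a second copy of 1..10.
    public static IEnumerable<SmallValueType> Source() =>
        Enumerable.Range(0, 20).Select(i => new SmallValueType(i))
            .Concat(Enumerable.Range(1, 10).Select(i => new SmallValueType(i)));

    // Number of elements before de-duplication.
    public static int SourceCount() => Source().Count();

    // Number of elements after System.Linq's Distinct with the default comparer.
    public static int DistinctCount() => Source().Distinct().Count();

    // True when two separately constructed equal values compare equal
    // and produce the same hash code under the default comparer.
    public static bool DefaultComparerAgrees()
    {
        var comparer = EqualityComparer<SmallValueType>.Default;
        var a = new SmallValueType(1);
        var b = new SmallValueType(1);
        return comparer.Equals(a, b) && comparer.GetHashCode(a) == comparer.GetHashCode(b);
    }
}
```

With System.Linq, SourceCount() is 30 and DistinctCount() is 20. Swapping in the real FatValueType and the AsValueEnumerable().Distinct() call should make the discrepancy described here easy to see side by side.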

This problem does not seem to affect primitive types such as int: if the Select calls are removed from the expression above, it returns only 20 values. It does seem to affect reference types as well, such as the FatReferenceType that is also in NetFabric's LINQ benchmarks project. I made an IEqualityComparer<T> class for FatReferenceType that compares its field1 value, and it returns true for the 1st and 21st values in the above (with FatValueType replaced by FatReferenceType and an instance of the comparer passed to Distinct). Even so, the expression still returns 30 values instead of 20.
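The reference-type variant can be sketched roughly as follows. The class SmallReferenceType, its Value0 property, the comparer Value0Comparer, and the helper ComparerBaseline are all hypothetical stand-ins, not the actual FatReferenceType or the comparer used in my benchmarks, and the sketch again uses System.Linq only to show the expected count.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for FatReferenceType; only the compared member is modeled.
public sealed class SmallReferenceType
{
    public int Value0 { get; }

    public SmallReferenceType(int value) => Value0 = value;
}

// Hypothetical comparer: two instances are equal when their Value0 matches.
public sealed class Value0Comparer : IEqualityComparer<SmallReferenceType>
{
    public bool Equals(SmallReferenceType? x, SmallReferenceType? y) =>
        ReferenceEquals(x, y) || (x is not null && y is not null && x.Value0 == y.Value0);

    // Must be consistent with Equals: equal Value0 gives an equal hash code.
    public int GetHashCode(SmallReferenceType obj) => obj.Value0.GetHashCode();
}

public static class ComparerBaseline
{
    // 0..19 followed by a second copy of 1..10, de-duplicated with the custom comparer.
    public static int DistinctCount() =>
        Enumerable.Range(0, 20).Select(i => new SmallReferenceType(i))
            .Concat(Enumerable.Range(1, 10).Select(i => new SmallReferenceType(i)))
            .Distinct(new Value0Comparer())
            .Count();
}
```

With System.Linq, ComparerBaseline.DistinctCount() is 20, which is the result I would expect from Hyperlinq too.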

If I had to venture a guess as to why it is failing, it could be because of how Set<T> in Hyperlinq handles resizing itself, but I can't say for sure.
