Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IsHomogeneous: test all elements of a sequence for equality #698

Open
Orace opened this issue Nov 14, 2019 · 19 comments
Open

IsHomogeneous: test all elements of a sequence for equality #698

Orace opened this issue Nov 14, 2019 · 19 comments

Comments

@Orace
Copy link
Contributor

Orace commented Nov 14, 2019

I propose for:

(bool? isHomogeneous, T value) IsHomogeneous<T>(this IEnumerable<T> source, IEqualitityComparer<T> comparer = null);

DefaultEqualitityComparer<T> is used if comparer is null.
The return tuple contains:

  • isHomogeneous:
    • null if the source is empty
    • true if the source has one element or if all the elements in the source are equals to the first element relatively to comparer
    • false otherwise.
  • value: the first value of the source if isHomogeneous is true, default(T) otherwise.

I'm open to give this method any other relevant name.

@Orace Orace changed the title [Proposal] AreEquals/IsHomogeneous to test all elements of a sequence [Proposal] AreEquals/IsHomogeneous to test all elements of a sequence for equality Nov 14, 2019
@Orace
Copy link
Contributor Author

Orace commented Nov 14, 2019

The overloads can be:

(bool? isHomogeneous, TSource value) IsHomogeneous<TSource>(this IEnumerable<TSource> source);
(bool? isHomogeneous, TSource value) IsHomogeneous<TSource>(this IEnumerable<TSource> source, IEqualitityComparer<TSource> comparer);
(bool? isHomogeneous, TResult value) IsHomogeneous<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector);
(bool? isHomogeneous, TResult value) IsHomogeneous<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IEqualitityComparer<TResult> comparer);

@Orace
Copy link
Contributor Author

Orace commented Nov 14, 2019

An implementation of IsHomogeneous is available in #700:

private static (bool? isHomogeneous, T value) IsHomogeneous<T>(this IEnumerable<T> source)
{
var comparer = EqualityComparer<T>.Default;
using var e = source.GetEnumerator();
if (!e.MoveNext())
return (null, default);
var first = e.Current;
while (e.MoveNext())
{
if (!comparer.Equals(first, e.Current))
return (false, default);
}
return (true, first);
}

@Orace Orace changed the title [Proposal] AreEquals/IsHomogeneous to test all elements of a sequence for equality [Proposal] IsHomogeneous to test all elements of a sequence for equality Nov 14, 2019
@atifaziz
Copy link
Member

Sounds like Distinct + TrySingle (#540).

@atifaziz atifaziz changed the title [Proposal] IsHomogeneous to test all elements of a sequence for equality IsHomogeneous: test all elements of a sequence for equality Nov 14, 2019
@Orace
Copy link
Contributor Author

Orace commented Nov 14, 2019

@atifaziz, it is but with O(1) memory usage.

@atifaziz
Copy link
Member

Fair point but then thinking HomogeneousOrEmpty would be better? That is, return a sequence of one if all are the same or none. It's more idiomatic.

@atifaziz
Copy link
Member

@atifaziz, it is but with O(1) memory usage.

Second thoughts?

@Orace
Copy link
Contributor Author

Orace commented Nov 14, 2019

@atifaziz, it is but with O(1) memory usage.

Second thoughts?

Yep, distinc + TrySingle doesn't cost that much.

I'm ok with any name proposition.

@Orace
Copy link
Contributor Author

Orace commented Nov 14, 2019

HomogeneousOrEmpty will not distinct empty from non-homogeneous.

Here an example of use (an overload with selector predicate may be more adapted here):

for (;;)
{
var (isHomogeneous, hasNext) = enumerators.Select(e => e.MoveNext()).IsHomogeneous();
if (isHomogeneous == false)
throw new InvalidOperationException("Input sequences are of different length.");
if (!hasNext)
break;
foreach (var enumerator in enumerators)
yield return enumerator.Current;
}
}

@atifaziz
Copy link
Member

Yep, distinc + TrySingle doesn't cost that much.

It doesn't cost if it's homogeneous but could otherwise.

I'm ok with any name proposition.

First the signature then the name. Instead of asking a question or returning a Boolean, return a sequence of one if source is homogenous otherwise abort early and return an empty sequence (yield break). Joining with another sequence (e.g. Zip) would then have the right effect of returning a sequence or none.

HomogeneousOrEmpty will not distinct empty from non-homogeneous.

Sorry, I don't understand.

@Orace
Copy link
Contributor Author

Orace commented Nov 14, 2019

HomogeneousOrEmpty will not distinct empty from non-homogeneous.

Sorry, I don't understand.

If the sequence is empty my proposal return null
If it's not homogeneous it's return false
HomogeneousOrEmpty return empty in both case, so you can't made a distinction between empty and not-homogeneous

@atifaziz
Copy link
Member

atifaziz commented Nov 14, 2019

Fair point if you are after reduction without an error for an undefined state but then Is… is misleading because there is no Boolean result. You are looking for 3 states that define the homogeneity of a source:

public static TResult Homogeneity<T, TResult>(
    this IEnumerable<T> source,
    TResult ambiguous, TResult heterogeneous, Func<T, TResult> homogeneous)

@Orace
Copy link
Contributor Author

Orace commented Nov 14, 2019

Indeed it's a 3 state. And this looks like the discussion about TrySingle.

The 3 new parameters really add complexity, in my use case example it's obvious:

var (isHomogeneous, hasNext) = enumerators.Select(e => e.MoveNext()).IsHomogeneous(((bool?)null, default), (false, default), v  => (true, v)); 

And it's source of bug since we can mess up with the two first parameters.
And if we want overloads with a selector and a comparer it became impossible to use.

That's why I prefer a new class or struct as return type with a 3 state enum and the homogeneous value if any.

class HomogeneityResult<T>
{
  public Homogeneity Homogeneity { get; } // empty, homogeneous, heterogeneous
  public T Value { get; }
}

Or with bool properties:

class HomogeneityResult<T>
{
  public bool HasValue { get; }
  public bool IsHomogeneous { get; }
  public T Value { get; }
}

@Orace
Copy link
Contributor Author

Orace commented Nov 14, 2019

An other option is (optional via overloads) out parameters:

bool IsHomogeneous<T>(this IEnumerable<T> source, out bool hasValue, out T value);

But with the overloads to add a selector and a comparer it became a mess.

@atifaziz
Copy link
Member

Indeed it's a 3 state. And this looks like the discussion about TrySingle.

Yeah and now I am not happy about TrySingle. There's a sneaking problem underneath all this and I can't put my finger on it right now. I am tempted to remove TrySingle until I have had time to think about this.

The 3 new parameters really add complexity, in my use case example it's obvious:

var (isHomogeneous, hasNext) = enumerators.Select(e => e.MoveNext()).IsHomogeneous(((bool?)null, default), (false, default), v  => (true, v)); 

The complexity comes from the choice of types.

And it's source of bug since we can mess up with the two first parameters.

This is not always avoidable.

That's why I prefer a new class or struct as return type with a 3 state enum and the homogeneous value if any.

This shouldn't be MoreLINQ's problem.

var (isHomogeneous, hasNext) = enumerators.Select(e => e.MoveNext()).IsHomogeneous(); 

This use case is not worth the trouble because it abuses LINQ for side-effects besides causing too many allocations.

@atifaziz
Copy link
Member

The 3 new parameters really add complexity, in my use case example it's obvious:

var (isHomogeneous, hasNext) = enumerators.Select(e => e.MoveNext()).IsHomogeneous(((bool?)null, default), (false, default), v  => (true, v)); 

The complexity comes from the choice of types.

Let me expand on this. Suppose you introduce this in your project:

public enum TriBoolean { TriDefault, TriFalse, TriTrue }

then you can make it fairly readable:

// assume:
// using static TriBoolean;

var moves = from e in enumerators select e.MoveNext();
for (var stop = false; !stop;)
{
    switch (moves.Homogeneity(default, (TriFalse, default), moved => (TriTrue, moved)))
    {
        case (TriFalse, _):
            throw new InvalidOperationException("Input sequences are of different length.");
        case (TriTrue, true):
            foreach (var enumerator in enumerators) 
                yield return enumerator.Current;
            break;
        default:
            stop = true;
            break;
    }
}

Note that projection of moves can go outside the loop to save allocations.

That's why I prefer a new class or struct as return type with a 3 state enum and the homogeneous value if any.

This shouldn't be MoreLINQ's problem.

To expand on this too, if you provide a type, it will use generic names and won't make things necessarily more readable in the context of the actual operation. If you leave the choice up to the user then only the user is to blame or cheer for readability.

@Orace
Copy link
Contributor Author

Orace commented Nov 15, 2019

The 3 new parameters really add complexity, in my use case example it's obvious:

var (isHomogeneous, hasNext) = enumerators.Select(e => e.MoveNext()).IsHomogeneous(((bool?)null, default), (false, default), v  => (true, v)); 

The complexity comes from the choice of types.

I'm not agree with you, I need a type that can store any case (IsEmpty, IsHeteregoneous, IsHomogeneous with value T). I can't see something smaller than a tuple with a 3 state (here bool?) and a value holder.

var (isHomogeneous, hasNext) = enumerators.Select(e => e.MoveNext()).IsHomogeneous(); 

This use case is not worth the trouble because it abuses LINQ for side-effects

Again not agree, this use case is pretty straightforward, I have an IEnumerable<bool> (it is enumerators.Select(e => e.MoveNext())) and I want to know it it's Empty, Heteregoneous or Homogeneous (with the value).

I abuse linq since the enumeration stop early if an Heteregoneous case appear.
This kind of abuse are common (especially in linq):

A() && B(); // do not evaluate B is A is false
data.Select(selector).Where(validator).First(); // stop evaluation when a good case appear.

[...] besides causing too many allocations.

Should I go down to a for loop ?

Yeah and now I am not happy about TrySingle. There's a sneaking problem underneath all this and I can't put my finger on it right now. I am tempted to remove TrySingle until I have had time to think about this.

I am not happy about TrySingle neither.

My analysis: there is a separation in the Linq method.

1 The monad / fluent ones: Select, Where, Skip who take an IEnumerable as input and give an IEnumerable at output.
2 The consuming ones: Count, First, Aggregate, ...

TrySingle and the currently named IsHomogeneous are of the second type.
We expect from them to give an information about the source sequence (and to stop the enumeration a soon as the information is total).

For Count, First, Aggregate the type of the information is really simple : int, T, TResult, ...
For TrySingle and IsHomogeneous since they give more detailed information, if we want to put it in a returned object, the returned object became more complex. And the minimal form is a (TriState, ValueHolder).

There is already complex object returned in Linq like IGrouping in GroupBy

Match (from Regexp) may became a IEnumerable<char> extension method, and we expect it to return a Match object.

For me your proposed solution for TrySingle is functional programming oriented:
we have 3 case, please provide expected behavior for each case

An imperative form is:
here your result, do what ever you want with it

@Orace
Copy link
Contributor Author

Orace commented Nov 15, 2019

With a custom de-constructible HomogeneityResult<T> you can just write:

// assume:
// using static HomogeneityType;

var moves = from e in enumerators select e.MoveNext();
for (var stop = false; !stop;)
{
    switch (moves.Homogeneity())
    {
        case (Heterogeneous, _):
            throw new InvalidOperationException("Input sequences are of different length.");
        case (Homogeneous, true):
            foreach (var enumerator in enumerators) 
                yield return enumerator.Current;
            break;
        default:
            stop = true;
            break;
    }
}

And add a selector or a comparer is painless.

it will use generic names and won't make things necessarily more readable in the context of the actual operation

Example below show that the generic name rarely appear in the code and are pretty readable.

Note that projection of moves can go outside the loop to save allocations.

Good catch, I never do that since Resharper will grumble about multiple enumeration.

@leandromoh
Copy link
Collaborator

leandromoh commented Apr 25, 2020

Does this PR is similar to this other one?

@Orace
Copy link
Contributor Author

Orace commented Apr 25, 2020

Does this PR is similar to this other one?

For the 'all items equal' part yes, but this proposed method also return the actual value (in the case of an homogeneous enumerable).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants