Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add subtract to IterableExtensions #306

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

ebraminio
Copy link
Contributor

@ebraminio ebraminio commented Aug 15, 2023

Considering views this stackoverflow link has had, https://stackoverflow.com/q/3428536 (637k times atm) I think we can agree list subtract is a popular need for developers. It can have bad implementations like list(set(x) - set(y)) which results to removal of duplicated items or O(m * n) implementations like this https://stackoverflow.com/a/20259489 so as it's easy to write an incorrect implementation I believe we can implement a correct and definite implementation here. I went through mistakes of having incorrect implementations and ruining some app releases with the mistakes and now I like to share my implementation here so others don't go through issues I had.

@ebraminio ebraminio changed the title Add subtract to ListExtensions Add difference to ListExtensions Aug 15, 2023
@ebraminio ebraminio changed the title Add difference to ListExtensions Add subtract to ListExtensions Aug 15, 2023
@@ -286,6 +287,29 @@ extension ListExtensions<E> on List<E> {
yield slice(i, min(i + length, this.length));
}
}

/// Returns a new list with the elements of [other] are removed from `this`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] This returns an Iterable, not a list. Also the phrasing of "are removed" phrasing is awkward.

Maybe

/// Returns an iterable of the elements in this list that are not present in [other].

Copy link
Contributor Author

@ebraminio ebraminio Aug 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied. I wanted to avoid 'present' as that somehow gives me the feeling that an occurrence in [other] results in removal of all same items in the result which deliberately isn't the case here (as we are aware about count and filter thing the amount they present in [other]) but I don't know to express that.

Also is it better to rename [other] to [minuend]? I renamed the argument to [other] to make it more compatible with Set's difference? I even once renamed the function to difference but felt that's can be a mistake given difference is defined in the context of sets.

///
/// It's aware about occurrence count and removes things only the amount
/// they are present in [other].
Iterable<E> subtract(List<E> other) sync* {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should document the assumptions we are making about E. I think mainly that it is is well behaved as a key in a HashMap (has a consistent hashCode and equals).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done I think

@natebosch natebosch requested a review from lrhn August 16, 2023 00:47
@ebraminio
Copy link
Contributor Author

ebraminio commented Aug 16, 2023

As this output an iterable and can work as an iterable extension, I move moved it to IterableExtension, or maybe I should revert that, or I will apply any other combination you would suggest.

@ebraminio ebraminio changed the title Add subtract to ListExtensions Add subtract to IterableExtensions Aug 16, 2023
As it solely acts on iterable and doesn't need lists.
@ebraminio ebraminio force-pushed the list-subtract branch 2 times, most recently from 2813cf0 to fd958d5 Compare August 17, 2023 08:31
@lrhn
Copy link
Member

lrhn commented Aug 17, 2023

Considering views this stackoverflow link has had, https://stackoverflow.com/q/3428536 (637k times atm) I think we can agree list subtract is a popular need for developers.

I don't always agree that developers know what they need. What they want isn't always what they need.

This operation is inherently two operations:

  • Count occurrences in the second operand
  • filter-with-count on the first operand.

I'd prefer to provide both operations, separately

extension <E> on Iterable<E> {
  /// Counts the elements of this iterable, and updates a map with the result.
  ///
  /// Creates a new map, or starts with [accumulator] if provided.
  /// For each element of this iterable, the current value of the element in the map, with a default of zero if
  /// the element doesn't have an entyr in the map, is increased by one.
  /// Returns the updated map.
  ///
  /// If no map is provided, the default map is a default Dart hash-based map using [Object.==] for equality.
  /// That means that distinct-but-equal objects are only represented by one of their elements in the resulting
  /// map.
  /// Provide a custom map if a different equality is needed.
  Map<E, int> countOccurrences([Map<E, int>? accumulator]) {
     accumulator ??= {};
     for (var element in this) accumulator.update(e, _inc, ifAbsent: _one);
     return accumulator;
  }

  static int _inc(int v) => v + 1;
  static int _one() => 1;

  /// Elements of this iterable, skipping elements while they have positive [removeCount].
  ///
  /// Iterating the returned iterable will iterate this iterable and emit the same elements, except that 
  /// for elements that have a positive value in the [removeCount] map, the element is skipped and the
  /// value decremented in the map. Non positive values are removed from the map.
  ///
  /// After iteration, the updated map may still contain counts for elements where there weren't
  /// as many elements in this iterable as could have been skipped.
  ///
  /// The elements of this iterable are looked up in the provided map using normal map operations,
  /// and therefore uses the same equality as the map.
  Iterable<E> removeCounted(Map<E, int> removeCount) sync* {
    for (var element in this) {
      var count = removeCount[element];
      if (count != null) {
        count -= 1;
        if (count > 0) {
          removeCount[element] = count;
        } else {
          removeCount.remove(element);
        }
        if (count >= 0) continue;
      }
      yield element;
    }
  }
}

Then we can also add an

  Iterable<E> removeCountedOccurrences(Iterable<E> remove, [Map<E, int>? accumulator]) =>
     removeCounted(remove.countOccurrences(accumulator));

which does both things for you, but it's understandable as the combination of those two separate and indpendently useful operations, it doesn't hide that it creates a map and uses it.

It feels wrong to me to have an operation that treats its Iterable input as a multiset, and always converts it to one, and doesn't just ask for a multiset as argument. An iterable is-not a multiset, it has horrible performance if you treat it as one directly, which is also why the first step here is to convert it to one.

Being able to see the remove-count being decremented, and see how much was not used, might actually be useful in some cases. Or controlling the equality used may also be useful, which is only possible if you admit that there is a Map involved.

elementsCount.remove(element);
} else {
elementsCount[element] = count - 1;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A switch with patterns can be shorter:

  switch (elementsCount[element]) {
    case null: yield element;
    case > 1 && var count: elementsCount[element] = count - 1;
    case == 1: elementsCount.remove(element);
  }

@ebraminio
Copy link
Contributor Author

ebraminio commented Aug 17, 2023

This operation is inherently two operations:

  • Count occurrences in the second operand
  • filter-with-count on the first operand.

[..] it doesn't hide that it creates a map and uses it.

I agree with you, I just am wondering maybe count occurrence part can be shaped around ideas of python's Counter interface (a dedicated class instead of Map<T, int>) if that is not something we want here I think I can continue with your already provided code if you don't want to push it yourself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants