Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add groupByNelMap method to List syntax #4215

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

danicheg
Copy link
Member

I find this method extremely helpful in my day job. And since there is List syntax, it seems natural for me to have this method added.

@armanbilge armanbilge added this to the 2.8.0 milestone May 30, 2022
@satorg
Copy link
Contributor

satorg commented May 30, 2022

I agree that such a method could be a really nice addition. A couple of thoughts though:

  1. Name groupByNelMap or groupByMapNel? IMO the latter seems to be a more usual naming convention across the board.
  2. Personally I was found it quite confusing when I was able to use one method from the List syntax but then switching to Vector I realized that there's no such similar method available at all. So I would suggest to consider implementing similar methods for other syntaxes as well.

@danicheg
Copy link
Member Author

I've chosen groupByNelMap because the underlying method groupByMap name is constructed like groupBy + map. In that sense, groupByNelMap is groupByNel (that exists in syntax) + map.
I agree with adding that method to other syntaxes, but I feel we can do it iteratively?

@satorg
Copy link
Contributor

satorg commented May 31, 2022

I've chosen groupByNelMap because the underlying method groupByMap name is constructed like groupBy + map. In that sense, groupByNelMap is groupByNel (that exists in syntax) + map.

Actually, there are a bunch of methods in Cats already that use just groupMap name for similar methods (borrowed from and consistent with the Scala lib, I guess). E.g. there's one in NonEmptyList (which is used in your implementation). Also there's a similar method in Chain:

final def groupMap[K, B](key: A => K)(f: A => B)(implicit K: Order[K]): SortedMap[K, NonEmptyChain[B]]

So to me it seems that the method you're proposing should be named as just groupMap and get two separate parameter groups for key and f.

UPD. Just realized that it will conflict with groupMap from the Scala library then. So may be groupMapNel? Because it is still groupMap in the first place...

Also IMO it would be better to have the "original" implementation created for List (rather than for NonEmptyList) and then the one from NonEmptyList could simply call the former one from List. In that case we could avoid creating an extra Option object via a call to toNel. So seems the implementation would be a bit simpler.

@satorg
Copy link
Contributor

satorg commented May 31, 2022

I agree with adding that method to other syntaxes, but I feel we can do it iteratively?

Yes sure, totally up to you. I mean, it would be really nice to reach some consistency across all the syntaxes eventually.

@danicheg
Copy link
Member Author

@satorg Thanks for your thoughts on this, but groupMapNel is kinda counterintuitive for me. Naming is controversial all the time, so it would be nice to get more opinions. If there is a quorum, I will change the naming doubtless.

@armanbilge
Copy link
Member

armanbilge commented May 31, 2022

Also IMO it would be better to have the "original" implementation created for List (rather than for NonEmptyList) and then the one from NonEmptyList could simply call the former one from List. In that case we could avoid creating an extra Option object via a call to toNel. So seems the implementation would be a bit simpler.

Note that because NonEmptyList is a newtype for List, they actually have identical representations in the runtime. So actually we could always delegate to the List implementation and just do a cast to NonEmptyList.

Edit: nope, ignore me. I must be confusing with some other collections, sorry.

@satorg
Copy link
Contributor

satorg commented May 31, 2022

@satorg Thanks for your thoughts on this, but groupMapNel is kinda counterintuitive for me. Naming is controversial all the time, so it would be nice to get more opinions. If there is a quorum, I will change the naming doubtless.

Sure, np. Just to emphasize my concern on the naming: I'm only striving to retain some degree of consistency across method names. So to me it seems that groupMap* is used every time where there're two separate functions passed: one for the grouping key and the other one for the mapping itself. While groupBy* is used when there's a func for grouping key only. And such a scheme seems to be consistent with the Scala lib as well.

So yeah, not really sure whether such a name would be intuitive or not, just seems to me more consistent across the board.

@armanbilge
Copy link
Member

This one seems good for 2.9.0. If only we can agree on a name 😅

@danicheg
Copy link
Member Author

danicheg commented Oct 9, 2022

@armanbilge yeah, as it's well discovered, there are only two hard things in programming: cache invalidation and naming things.

@satorg
Copy link
Contributor

satorg commented Oct 9, 2022

Let me enumerate all group-like methods we have both in Scala library and Cats (just for everyone's convenience).
I didn't include LazyList/NonEmptyLazyList and Stream/NonEmptyStream.

Scala source target target group groupBy groupMap groupMapReduce
Cats
Seq n/a n/a n/a
List NeList Map groupByNel n/a n/a
groupByNelA n/a n/a
NeChain Map groupByNec n/a n/a
Vector NeVector Map groupByNev n/a n/a
groupByNevA n/a n/a
SortedSet NeSet Map groupByNes n/a n/a
Chain NeChain Map groupBy groupMap groupMapReduce
groupMapReduceWith
NeSeq NeSeq Map groupBy n/a n/a
NeSeq NeMap groupByNem n/a n/a
NeList NeList Map groupBy groupMap groupMapReduce
groupMapReduceWith
NeList NeMap groupByNem groupMapNem groupMapReduceNem
groupMapReduceWithNem
NeVector NeVector Map groupBy n/a n/a
NeVector NeMap groupByNem n/a n/a
NeSet NeSet Map groupBy n/a n/a
NeChain NeChain Map groupBy groupMap groupMapReduce
groupMapReduceWith
NeChain NeMap groupByNem groupMapNem groupMapReduceNem
groupMapReduceWithNem

@satorg
Copy link
Contributor

satorg commented Oct 9, 2022

I would say, there are at least two takeaways from the table:

  1. Cats has got huge inconsistency amidst its support for collections: some collections are way more beloved than the others.
  2. Those grouping methods that somehow change default result type (i.e. either change the collection type itself or the type of the grouping collection) receive an abbreviated name of the changed type at the very end of the method name.

p.2 is what I am voting for in this PR's discussion – looks like there's some convention established and I think it would be better to follow the convention.

@satorg
Copy link
Contributor

satorg commented Oct 9, 2022

And one more thing I didn't reflect in the table, but which does concern me as well: all existing groupMap-like methods (groupMap and groupMapNem) actually use two parameter groups similar to Scala's groupMap:

  def groupMapNem[K, B](key: A => K)(f: A => B)(implicit K: Order[K]): NonEmptyMap[K, NonEmptyList[B]] =
    NonEmptyMap.fromMapUnsafe(groupMap(key)(f))

Whereas the suggested method groupByNelMap uses a single two-parameters group instead.

@TonioGela
Copy link
Member

As the author of a method with a super controversial name that survived four months of bikeshedding discussion in his PR, here's my 2 cents.
I vote for consistency, that wins over everything IMHO, so I'll prefer two groups of parameters and groupMap as the trailing part of the method's name, getting:

def groupMapNel[K, B](key: A => K)(f: A => B)(implicit K: Order[K]): SortedMap[K, NonEmptyList[B]]

If you don't consider the sorting aspect of the method (i.e. requiring an Order and having a Sorted map) the real difference with the standard library groupMap is just the type of the collection that wraps the Map values, hence the Nel suffix.

P.S. @satorg 's table seems really like an invitation to implement all the remaining n/a methods :D

@satorg
Copy link
Contributor

satorg commented Oct 10, 2022

@TonioGela

The table seems really like an invitation to implement all the remaining n/a methods :D

I would say, it would be really nice to have all the method sets unified across the board eventually. But before you start digging into it, could take a look at #3998 please? In particular, it already includes some of the missing methods from the table, but the discussion is the most interesting part of it, I think.

The PR is market as "Draft" now although at the time it was created I assumed it is ready-to-go.

Nevertheless, I switched it to Draft because I appreciated comments from @johnynek.
In particular he raised a concern that perhaps instead of adding some ad-hoc methods to each collection,
we should try to generalize these methods first.

Just because Cats is more about typeclasses and generalization rather than some set of handy collections.

@danicheg @armanbilge you guys are also welcome 😉

@armanbilge
Copy link
Member

armanbilge commented Oct 10, 2022

Just because Cats is more about typeclasses and generalization rather than some set of handy collections.

This is very interesting. Generalizing over collections has become an interest of mine as well, see #4245 and #4246 for example.

Along those lines, it seems to me that we could implement groupByMap-like operation on Traverse itself. I feel like we are missing something like Parallel which represents the relationship between a collection and its non-empty counterpart.

Edit: ah, I see this idea is exactly proposed in #3998 (comment). And perhaps we don't need a parallel-like thing after all.

Perhaps all of this deserves a discussion.

@satorg
Copy link
Contributor

satorg commented Oct 10, 2022

Along those lines, it seems to me that we could implement groupByMap-like operation on Traverse itself.

But why Traverse? To me it seems like Foldable/Reducible should be just enough for that, no?

@armanbilge
Copy link
Member

Fair point, I forgot about Foldable/Reducible. In a recent PR/discussion I think the question came up: are there any datatypes that implement Foldable but do not implement Traverse ?

@satorg
Copy link
Contributor

satorg commented Oct 10, 2022

It regards to the generalization... Let me expand a little bit more on that with an example:
There are two similar mehtods in NonEmptyList: groupByNel (groups into NE-List) and groupByNec (groups into NE-Chain).

Frankly speaking, having two separate methods in Cats that have similar functionality but only different in their return types is not cool at all.

Because apparently, there are questions following: should we add groupByNev (to NE-Vector), groupByNeSeq, groupByNell (NE-LazyList) to List's syntax? Should we add all these variations to all other collection syntaxes the Cats has? What about groupMapNel, groupMapNev, ... , groupMapReduceNel, groupMapReduceNev, ..., and so on and so forth.

Crearly, it would be a huge lack of generalization in Cats. Some Java libraries do appreciate such an approach, but I don't think that Cats should :)

@satorg
Copy link
Contributor

satorg commented Oct 10, 2022

are there any datatypes that implement Foldable but do not implement Traverse ?

Good question, and honestly, I don't know :)
But if Reducible/Foldable were introduced for a reason, then I think we should try to utilize them once they meet minimal requirements for some functionality.

@johnynek
Copy link
Contributor

so, it seems like what we really want is something like:

trait HasNonEmpty[F[_]] {
  type NonEmpty[_]

  def neSemigroupK: SemigroupK[NonEmpty]
  def monoidK: MonoidK[F]

  def neTraverse: NonEmptyTraverse[NonEmpty]
  def traverse: Traverse[F]
  /**
   * obvious laws here: toNonEmpty(monoidK.empty) == None
   * toNonEmpty(f).map(fromNonEmpty) == Some(f) unless f is monoidK.isEmpty
   */
  def toNonEmpty[A](f: F[A]): Option[NonEmpty[A]]
  def fromNonEmpty[A](ne: NonEmpty[A]): F[A]
}

object GroupBy {
  def groupByNeOrd[F[_], A, B: Order, C](f: F[A])(key: A => B)(value: A => C)(implicit hne: HasNonEmpty[F]): SortedMap[B, hne.NonEmpty[C])
}

@armanbilge
Copy link
Member

armanbilge commented Oct 10, 2022

That's what I was imagining at first as well, but I wonder if we can be more general and collect into any G[_]: NonEmptyAlternative.

def groupByNeOrd[F[_], G[_]: NonEmptyAlternative, A, B: Order, C](f: F[A])(key: A => B)(value: A => C): SortedMap[B, G[C]]

@johnynek
Copy link
Contributor

johnynek commented Oct 10, 2022

yes, I think so, but I worry about being forced through the append/prepend interface which means you have to construct immutably and pick a direction (append or prepend), which I think will hurt performance.

Alternatively, we could possibly add this method to NonEmptyAlternative which then knows the structure of the thing it is building and can reimplement just using F[_]: Foldable as an input (since we can do toIterable/toList/etc and iterate through and control the order of iteration (either forward or reverse depending on which is more efficient for the resulting data structure.

@satorg
Copy link
Contributor

satorg commented Oct 10, 2022

Alternatively, we could possibly add this method to NonEmptyAlternative which then knows the structure of the thing it is building

That is an interesting idea... We can definitely give it a try. Then, having group* methods implemented in NonEmptyAlternative we can add them as a syntax to Foldable or Reducible I guess.

Let me try to sketch it out:

trait NonEmptyAlternative[F[_]] {
  // With Foldable we can only group into a regular Map/SortedMap
  // Suffix "To" is used to differentiate from regular groupBy which groups into the same collection type as the source is.
  def groupByTo[G[_]: Foldable, K: Order, A](ga: G[A])(kf: A => K): SortedMap[K, F[A]]
  
  // With Reducible we can group into `NonEmptyMap` 
  // Suffix "To" is used to differentiate from regular groupBy which groups into the same collection type as the source is.
  def groupByToNem[G[_]: Reducible, K: Order, A](ga: G[A])(kf: A => K): NonEmptyMap[K, F[A]]
}

// And now the syntax
object Foldable {
  trait Ops[F[_], A] {
    def self: F[A]

    def groupByTo[G[_]: NonEmptyAlternative, K: Order](kf: A => K): SortedMap[K, G[A]]
  }
}

object Reducible {
  trait Ops[F[_], A] {
    def self: F[A]

    def groupByToNem[G[_]: NonEmptyAlternative, K: Order](kf: A => K): NonEmptyMap[K, G[A]]
  }
}

Something like that?

@satorg
Copy link
Contributor

satorg commented Oct 10, 2022

The only issue I see here is that when using such an approach, a user will be forced to specify not only a desired output collection type (which is required here), but also all the other generic parameters (K and also B for groupMap-like methods) along with the target collection:

  val v: Vector[Foo] = ???

  // has to specify Int
  v.groupByTo[NonEmptyVector, Int](_.bar)

  // has to specify Int and String
  v.groupMapTo[NonEmptyVector, Int, String](_.bar)(_.baz)

which could be inconvenient to use.

An apparent solution for that could be an intermediate apply-like class:

object Foldable {
  trait Ops[F[_], A] {
    def self: F[A]

    def groupByTo[G[_]: NonEmptyAlternative]: SortedMap[K, G[A]] = new GroupByToApply[F, G, A](self)
  }

  // Unfortunately, we cannot leverage `AnyVal` for this class.
  final class GroupByToApply[F[_], G[_], A](self: F[A])(implicit F: Foldable[F], G: NonEmptyAlternative[G]) {
    // Takes the rest of type parameters here
    def apply[K: Order](kf: A => K): SortedMap[K, G[A]] = G.groupByTo[G, F, A, K](self)(kf)
  }
}

..or something like that... Looks tricky, but might be working... Wdyt?

@armanbilge armanbilge removed this from the 2.9.0 milestone Oct 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants