
Improve traverse performance through increased specialization #4408

Open
djspiewak opened this issue Mar 6, 2023 · 5 comments

Comments

@djspiewak (Member) commented Mar 6, 2023

This is a follow-up to #3790, broken out as a separate issue since we've realized we can do better than what was originally suggested and discussed. It leverages the findings in #4403 and also relates to #3993.

Altogether, traverse has three distinct performance scenarios, depending on the nature of the applicative G and the structure F being traversed:

  • If the applicative is secretly a StackSafeMonad, the fastest strategy (by about 3x) is the direct left-to-right approach with flatMap. There are no stack-safety issues, and any underlying short-circuiting is handled by the monad itself.
  • If the applicative has a Defer, we still need the tree-branching strategy to avoid possible stack issues (just because an applicative forms a Defer does not mean its ap is stack safe!), but we no longer need the extra layer of Eval wrapping to guarantee short-circuiting.
  • Otherwise, we need both the tree branching and the Eval wrapping.

The first scenario corresponds to doing a traverse with IO or Eval, which are both StackSafeMonads. In this case, the direct flatMap is the fastest thing you can do (verified by checking a hand-coded version). The second scenario corresponds to doing a parTraverse with IO, where the G applicative forms a Defer but does not form a Monad. In this case, we obviously can't use flatMap, but the tree branching strategy is optimal anyway because we want to balance the parallel scheduling. We can rely on the G itself to handle short-circuiting (to the extent that we get it within par) because of Defer. The third scenario corresponds to doing a traverse with something like Option, where we aren't stack safe and we can't form a Defer, so we just need to do the pessimistic tree branching (for stack safety) and Eval wrapping (for short circuiting).

Right now, all Seq-based traversals delegate to traverseViaChain, which strictly implements the third (most pessimistic) scenario. What we want here are two additional implementations corresponding to the first and second scenarios (the second is very close to the third, just without all the Eval.laters), with the choice between them made by runtime type-casing on the Applicative[G] instance.
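To make the shape of the proposal concrete, here is a minimal, self-contained sketch of the runtime dispatch, using simplified stand-ins for cats' Applicative and StackSafeMonad (the helper name traverseList and the foldRight fallback are hypothetical; the real second and third scenarios would use tree-branching and Eval):

```scala
trait Applicative[G[_]] {
  def pure[A](a: A): G[A]
  def map2[A, B, C](ga: G[A], gb: G[B])(f: (A, B) => C): G[C]
}

trait StackSafeMonad[G[_]] extends Applicative[G] {
  def flatMap[A, B](ga: G[A])(f: A => G[B]): G[B]
}

// Dispatch on the runtime type of the Applicative instance.
def traverseList[G[_], A, B](as: List[A])(f: A => G[B])(implicit G: Applicative[G]): G[List[B]] =
  G match {
    case ssm: StackSafeMonad[G] =>
      // scenario 1: direct left-to-right flatMap; no Eval, no tree-branching
      val acc = as.foldLeft(ssm.pure(List.empty[B])) { (gbs, a) =>
        ssm.flatMap(gbs)(bs => ssm.flatMap(f(a))(b => ssm.pure(b :: bs)))
      }
      ssm.flatMap(acc)(bs => ssm.pure(bs.reverse))
    case _ =>
      // scenarios 2 and 3 would fall back here (the real implementation
      // uses tree-branching, plus Eval wrapping when Defer is unavailable)
      as.foldRight(G.pure(List.empty[B]))((a, gbs) => G.map2(f(a), gbs)(_ :: _))
  }

// Example instance: Option's monad is trivially "stack safe" for this sketch.
implicit val optionInstance: StackSafeMonad[Option] = new StackSafeMonad[Option] {
  def pure[A](a: A): Option[A] = Some(a)
  def map2[A, B, C](ga: Option[A], gb: Option[B])(f: (A, B) => C): Option[C] =
    ga.flatMap(a => gb.map(b => f(a, b)))
  def flatMap[A, B](ga: Option[A])(f: A => Option[B]): Option[B] = ga.flatMap(f)
}
```

With the StackSafeMonad[Option] instance in scope, traverseList takes the fast flatMap path, and short-circuiting on a None falls out of the monad itself.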

@armanbilge (Member)

What about the case where the applicative is secretly a Monad, albeit not a stack-safe one? Wouldn't delegating to tailRecM allow us to avoid both the tree-branching and the Eval?
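A rough, self-contained sketch of what this tailRecM strategy might look like (a minimal stand-in Monad rather than cats' actual typeclass; traverseViaTailRecM is a hypothetical name):

```scala
trait Monad[G[_]] {
  def pure[A](a: A): G[A]
  def map[A, B](ga: G[A])(f: A => B): G[B]
  def tailRecM[A, B](a: A)(f: A => G[Either[A, B]]): G[B]
}

// Single eagerly-evaluated loop over the remaining input: no tree-branching,
// no Eval, at the cost of boxing each step in a tuple and an Either.
def traverseViaTailRecM[G[_], A, B](as: List[A])(f: A => G[B])(implicit G: Monad[G]): G[List[B]] =
  G.tailRecM[(List[A], List[B]), List[B]]((as, List.empty[B])) {
    case (Nil, acc)     => G.pure(Right(acc.reverse))
    case (a :: tl, acc) => G.map(f(a))(b => Left((tl, b :: acc)))
  }

// Example instance: a Monad for Option whose tailRecM is a flat loop,
// even though its flatMap is not stack safe in general.
implicit val optionMonad: Monad[Option] = new Monad[Option] {
  def pure[A](a: A): Option[A] = Some(a)
  def map[A, B](ga: Option[A])(f: A => B): Option[B] = ga.map(f)
  def tailRecM[A, B](a: A)(f: A => Option[Either[A, B]]): Option[B] = {
    @annotation.tailrec def loop(cur: A): Option[B] =
      f(cur) match {
        case None           => None
        case Some(Left(a2)) => loop(a2)
        case Some(Right(b)) => Some(b)
      }
    loop(a)
  }
}
```

Stack safety here comes from the lawful tailRecM, not from G's flatMap, and short-circuiting (e.g. a None) terminates the loop immediately.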

@djspiewak (Member, Author)

@armanbilge I haven't benchmarked that option, but I would imagine it's not that much faster than the tree + Eval implementation. tailRecM forces quite a bit of boxing, not quite as much as Eval, but close. It would be a fourth distinct branch of this set of redundant implementations, so it would really need to carry its own weight if we went this route.

@armanbilge (Member)

I guess we'll have to measure it :) It avoids deconstructing the collection into a tree, which is already a win; on top of that, it is eagerly evaluated instead of leaning on Eval for lazy evaluation. I suspect this is more amenable to optimization.

@satorg (Contributor) commented Mar 8, 2023

Sorry for asking if it has been discussed somewhere else already, but I'm just wondering:

If the applicative is secretly a StackSafeMonad ...

Does it have to be secretly a StackSafeMonad? Can it be expressed as a compile-time constraint? I think that's what the whole idea of type classes is for, after all.

@djspiewak (Member, Author) commented Mar 8, 2023

The problem is that this would require additional overloads of the traverse method, both on the typeclass and in syntax. Additionally, the stronger constraint often won't survive to the call site, because most generic code is written against the least restrictive constraint, not the most restrictive. Finally, doing it this way would mean that changes to constraints would have vast but invisible implications for performance.
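The overload-proliferation and "constraint doesn't survive" points can be illustrated with a hedged sketch (all names hypothetical, minimal stand-in typeclasses, not cats' actual API):

```scala
trait Applicative[G[_]] {
  def pure[A](a: A): G[A]
  def map2[A, B, C](ga: G[A], gb: G[B])(f: (A, B) => C): G[C]
}
trait StackSafeMonad[G[_]] extends Applicative[G] {
  def flatMap[A, B](ga: G[A])(f: A => G[B]): G[B]
}

trait Traverse[F[_]] {
  // the existing, least-restrictive entry point
  def traverse[G[_], A, B](fa: F[A])(f: A => G[B])(implicit G: Applicative[G]): G[F[B]]
  // hypothetical extra overload demanded by compile-time specialization
  def traverseStackSafe[G[_], A, B](fa: F[A])(f: A => G[B])(implicit G: StackSafeMonad[G]): G[F[B]]
}

object ListTraverse extends Traverse[List] {
  def traverse[G[_], A, B](fa: List[A])(f: A => G[B])(implicit G: Applicative[G]): G[List[B]] =
    fa.foldRight(G.pure(List.empty[B]))((a, gbs) => G.map2(f(a), gbs)(_ :: _))
  def traverseStackSafe[G[_], A, B](fa: List[A])(f: A => G[B])(implicit G: StackSafeMonad[G]): G[List[B]] = {
    val acc = fa.foldLeft(G.pure(List.empty[B])) { (gbs, a) =>
      G.flatMap(gbs)(bs => G.flatMap(f(a))(b => G.pure(b :: bs)))
    }
    G.flatMap(acc)(bs => G.pure(bs.reverse))
  }
}

implicit val optionInstance: StackSafeMonad[Option] = new StackSafeMonad[Option] {
  def pure[A](a: A): Option[A] = Some(a)
  def map2[A, B, C](ga: Option[A], gb: Option[B])(f: (A, B) => C): Option[C] =
    ga.flatMap(a => gb.map(b => f(a, b)))
  def flatMap[A, B](ga: Option[A])(f: A => Option[B]): Option[B] = ga.flatMap(f)
}

// Generic code written against the weakest constraint (the common case)
// can never reach the specialized overload, even when G is stack safe:
def process[G[_]](xs: List[Int])(f: Int => G[Int])(implicit G: Applicative[G]): G[List[Int]] =
  ListTraverse.traverse(xs)(f) // traverseStackSafe is not callable here
```

Runtime type-casing inside the single traverse entry point avoids both the duplicated API surface and the lost fast path in process.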

It's essentially the same thought process we went through with the async Queue optimization in CE, and we concluded that runtime typecase respecialization was far and away the least evil approach and produced the best experience for users.

The only alternative I can think of in this case would be Oscar's Apply.Strategy approach. This is kind of the same as typecasing, but more explicit for this scenario. I'm definitely open to it if we think it's preferable.
