Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't recompute IR children #9915

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

hubertp
Copy link
Contributor

@hubertp hubertp commented May 10, 2024

Pull Request Description

Profiling benchmarks revealed that we spend non-neglible amount of time building IR preorder which in turns recomputes IR nodes' children. IR is immutable so there is not need to recompute that data for every pass.
Note that the change uses lazy val because benchmarks weren't showing much improvements when allocating children on creation (with val). We could probably improve the situation by using a poor version of lazy val that does not employ synchronization.

Important Notes

Pending benchmark results.

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    TypeScript,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • Unit tests have been written where possible.

Profiling benchmarks revealed that we spend non-neglible amount of time
building IR preorder which in turns recomputes IR nodes' children.
IR is immutable so there is not need to recompute that data for every
pass.
Note that the change uses `lazy val` because benchmarks weren't showing
much improvements when allocating children on creation (with `val`). We
could probably improve the situation by using a poor version of `lazy
val` that does not employ synchronization.
@hubertp hubertp added the CI: No changelog needed Do not require a changelog entry for this PR. label May 10, 2024
@@ -138,7 +138,7 @@ object CallArgument {
|""".toSingleLine

/** @inheritdoc */
override def children: List[IR] = name.toList :+ value
override lazy val children: List[IR] = name.toList :+ value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to the change, but it should've been Vector, instead of a List

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once I heard @4e6 saying we shouldn't use List as it does an eager copy, but a lazy iterator. With List the preorder() implementation:

  • calls children on each one
  • the children implementation constructs a List
  • the preoder then concatenates the list to its own List representing the whole IR tree

Where can the slowdown be? Only in the construction of of children List, right? This PR tries to trade that CPU cost for increased memory consumption.

Wouldn't it be better to change the children to follow more efficient pattern and not allocate its own List?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List is a classic linked list https://github.com/scala/scala/blob/e858de50d51e13d225ac5b35015aac54cd71e23e/src/library/scala/collection/immutable/List.scala#L655-L660
It is suitable for implementing control flows (where you can efficiently pattern-match on head and tail), and maybe in prepending elements https://august.nagro.us/list-vs-vector.html. But it is not how we're using it.

Vector is a radix-balanced finger tree backed by arrays and should be more efficient in mixed use-cases (like ours append+traverse) both CPU and memory wise scala/scala#8534

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elements https://august.nagro.us/list-vs-vector.html

Our usecase is "Construction with Append + Traversals" where Vector is faster. Possibly even better with children(addTo : Vector.Builder) : Unit.

Up to you guys, I don't want to be expert on Scala data structures.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to be expert on Scala data structures.

I continue to believe the best might be to have a lazy iterator like in this case of iterating all children of a directory - e.g. a queue where each nextElement() adds its children to the end of the queue.

@JaroslavTulach
Copy link
Member

Profiling benchmarks revealed that we spend non-neglible amount of time building IR preorder which in turns recomputes IR nodes' children.

I believe I have also seen preorder to occupy a lot of time when profiling the static compilation.

IR is immutable so there is not need to recompute that data for every pass.

Can't we just cache the result of preorder()? E.g. make it lazy val?

Note that the change uses lazy val because benchmarks weren't showing much improvements when allocating children on creation (with val).

Can that be explained by any other hypothesis than: We do create some IR instances that we don't iterate thru with preorder?

We could probably improve the situation by using a poor version of lazy val that does not employ synchronization.

I don't understand this sentence.

@JaroslavTulach
Copy link
Member

What's the effect on benchmarks?

@hubertp
Copy link
Contributor Author

hubertp commented May 13, 2024

What's the effect on benchmarks?

I ran the benchmarks and locally I was seeing minor improvements but on CI not so much. So I'm wondering if it is worth integrating. Unless I'm doing it wrong.

@hubertp
Copy link
Contributor Author

hubertp commented May 13, 2024

IR is immutable so there is not need to recompute that data for every pass.

Can't we just cache the result of preorder()? E.g. make it lazy val?

Note that children is used in ChangesetBuilder. We could do both.

Note that the change uses lazy val because benchmarks weren't showing much improvements when allocating children on creation (with val).

Can that be explained by any other hypothesis than: We do create some IR instances that we don't iterate thru with preorder?

Maybe.

We could probably improve the situation by using a poor version of lazy val that does not employ synchronization.

I don't understand this sentence.

I meant that if this is a hotspot and lazy val behind the scenes does some synchronization, it is not really necessary. On the other hand it might be a premature optimization.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: No changelog needed Do not require a changelog entry for this PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants