
Trie: batch() Performance Optimizations #3293

Open
holgerd77 opened this issue Feb 22, 2024 · 5 comments
Comments

@holgerd77
Member

holgerd77 commented Feb 22, 2024

This issue is an extract from #3290, which analyzes the different performance-relevant parts of block execution on Holesky (the findings are not Holesky-specific though).

This #3290 (comment) points out that tx trie validation, together with the need to generate the tx trie, takes a significant amount of time (100-300 ms for ~1000 entries (txs)).

This drew my attention to the trie.batch() functionality. Currently tx trie generation uses single put() calls. This could easily (possibly with some small modifications) be switched to use batch() though.

That's the backstory.

Trie batch() currently has no optimizations at all. It simply iterates over the given operations and calls into single put() / del() operations.
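
For illustration, the current behavior is roughly equivalent to the following sketch (not the actual source; `BatchOp` is a local stand-in for the real batch operation type, and the Uint8Array-based Trie API is assumed):

```ts
import { Trie } from '@ethereumjs/trie'

// Local stand-in types for illustration (they mirror the shape of batch op objects)
type PutOp = { type: 'put'; key: Uint8Array; value: Uint8Array }
type DelOp = { type: 'del'; key: Uint8Array }
type BatchOp = PutOp | DelOp

// Rough sketch of an unoptimized batch(): every operation is applied with an
// individual put()/del() call, with no key sorting and no path reuse in between.
async function naiveBatch(trie: Trie, ops: BatchOp[]): Promise<void> {
  for (const op of ops) {
    if (op.type === 'put') {
      await trie.put(op.key, op.value)
    } else {
      await trie.del(op.key)
    }
  }
}
```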

I am pretty convinced that there are significant gains to be made here, and that they can somewhat likely be achieved relatively easily.

The following is an extract of how long trie put operations take (taken as a snapshot during one of these tx trie validation executions), split up into the two main performance consumers in trie.put(): the first-round path finding and then the node update. (Side note: _updateNode() actually seems to be a really misleading name, so this should be renamed to - minimally - _updateNodes() to be less confusing, right? 🤔)

[Screenshot: timing snapshot of trie put() operations, split into path finding and node update]

No conclusions here yet, this is just to get a feeling for the numbers.

Two things I can think of that are worth exploring (there might be more though):

batch() Key Sorting Option

It might give a significant insertion speed-up if the keys (so: the applied ones of course) get pre-sorted in a way that eases insertion, so that as few nodes as possible need to be shifted around in between.

I think key sorting should likely (?) just be an option for batch() and not the default behavior (for tx trie generation e.g. this should have no side effects, right?).
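
A minimal sketch of what such an opt-in pre-sort could look like (`sortBatchOps` and `compareBytes` are hypothetical helpers; `BatchOp` is the stand-in type from the sketch above):

```ts
// Hypothetical pre-sort step for an opt-in batch() sorting option: ordering the
// operations lexicographically by key groups neighbouring paths together, so
// consecutive puts walk overlapping parts of the trie.
function compareBytes(a: Uint8Array, b: Uint8Array): number {
  const len = Math.min(a.length, b.length)
  for (let i = 0; i < len; i++) {
    if (a[i] !== b[i]) return a[i] - b[i]
  }
  return a.length - b.length
}

function sortBatchOps(ops: BatchOp[]): BatchOp[] {
  // copy first so the caller's array stays untouched
  return [...ops].sort((a, b) => compareBytes(a.key, b.key))
}
```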

Analyze batch() Caching Opportunities

Especially in the context of tx trie generation (where the keys are just the unhashed, sorted numbers 1, 2, 3, 4, ... of the single txs) it seems that on put() operations the same paths need to be retrieved again and again and again; maybe something similar applies to the insertion part as well.

Here is another screenshot showing the keys (already in the high range), just going up one by one:

[Screenshot: consecutive keys in the higher range, incrementing one by one]

I do not have a fully clear picture yet, but it should be possible to leverage this with some caching functionality, e.g. by caching (parts of?) the path to be found, at least within the scope of a batch() operation (it might turn out though that caching opportunities are found which "hold" beyond the scope of a batch() operation).


General side note: both things should be easy to test with a small script adding a couple of ten thousand key/value pairs to the trie itself! No need for a complex test setup including the client!
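
A minimal sketch of such a standalone benchmark, assuming the Uint8Array-based Trie and RLP APIs (exact package names and signatures may differ):

```ts
import { Trie } from '@ethereumjs/trie'
import { RLP } from '@ethereumjs/rlp'

// Insert N key/value pairs into an in-memory trie, once via single put() calls
// and once via batch(), and compare wall-clock times. No client setup needed.
async function bench(n = 50_000) {
  const kvs = Array.from({ length: n }, (_, i) => ({
    key: RLP.encode(i), // tx-trie-style keys: RLP-encoded indices
    value: new Uint8Array(100).fill(i % 256),
  }))

  let trie = new Trie()
  let start = performance.now()
  for (const { key, value } of kvs) {
    await trie.put(key, value)
  }
  console.log(`single put(): ${(performance.now() - start).toFixed(1)} ms`)

  trie = new Trie()
  start = performance.now()
  await trie.batch(kvs.map(({ key, value }) => ({ type: 'put' as const, key, value })))
  console.log(`batch():      ${(performance.now() - start).toFixed(1)} ms`)
}

void bench()
```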

@holgerd77
Member Author

Additional side note: I find the numbers from above generally somewhat high for "memory only" operations, since this tx trie generation operates with a MapDB. 🤔 That is another direction, but it might generally be worth a look whether something is off here.

@holgerd77
Member Author

holgerd77 commented Feb 22, 2024

One (rough) concrete idea for a cache that I have had for some time now (this could be useful and bring improvements in various other situations as well):

The resulting path, respectively node stack (in the sense of what findPath() delivers, namely a stack with instantiated node (extension, leaf, branch) objects), from the await this._updateNode(appliedKey, value, remaining, stack) call within the put() operation gets saved in a cache (batch()-scoped or even a general-scope cache).

Then, on the next findPath() call (likely also within put()), parts of this path finding can be skipped by using these kinds of cached results (and maybe passing them over to findPath()).

So, practical example:

Something is added for key 0x8202fc (from the screenshot above) and then for 0x8202fd.

Let's say a findPath() call delivers as a stack:

8 -> Branch Node
202 -> Extension Node
f -> Branch Node

And c as the remaining nibble.

The updateNode call would then add a leaf node for c.

Maybe one could then save this stack in a cache in such a way that in the next round (the put() call for 0x8202fd) one could check the cache for an entry which at least partly matches the path.

In this case a stack for 8202f would be returned (the one from above).

And these objects could then already be passed to the next findPath() call for 0x8202fd so that they do not have to be retrieved again.

This is not completely easy to achieve, respectively not trivial to integrate, but it could have an enormous impact.
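
A very rough sketch of how such a batch()-scoped cache could look (all names hypothetical; node objects are reduced to a placeholder type, and nibble paths are used as cache keys):

```ts
// `TrieNode` stands in for the instantiated branch/extension/leaf node objects,
// `Nibbles` for the nibble array representation of a key path.
type TrieNode = unknown
type Nibbles = number[]

// After each update, the node stack delivered by findPath() is stored under its
// nibble prefix; before the next lookup, the longest cached prefix of the new
// key is fetched so the walk could start further down instead of at the root.
class PathCache {
  private cache = new Map<string, TrieNode[]>()

  set(prefix: Nibbles, stack: TrieNode[]): void {
    this.cache.set(prefix.join(','), stack)
  }

  // Return the longest cached prefix of `key` together with its node stack
  longestPrefix(key: Nibbles): { prefix: Nibbles; stack: TrieNode[] } | undefined {
    for (let len = key.length; len > 0; len--) {
      const prefix = key.slice(0, len)
      const stack = this.cache.get(prefix.join(','))
      if (stack !== undefined) return { prefix, stack }
    }
    return undefined
  }
}
```

The tricky part, as noted above, is keeping the cached node objects consistent after each write, since the nodes along the path get modified; updating or invalidating the affected prefixes after every put() within the batch seems like the safest starting point.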

@ScottyPoi
Contributor

If I understand correctly, this would be an added optional parameter for findPath() in cases where a partial path is already known? That would indeed be helpful.

I think your solution is smart and could likely be applied in many places, both in our client and in ultralight. I'll give this some thought and tinker with it today.

@ScottyPoi
Contributor

I started on this issue today by refactoring findPath to accept an optional partialPath parameter. This allows it to jump ahead in the walk, while returning the same path as a full walk.
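
For illustration, the extended signature could look roughly like this (a hypothetical sketch of the shape, not the actual refactor; placeholder types as in the cache sketch above):

```ts
// Placeholder types, as in the cache sketch above
type TrieNode = unknown
type Nibbles = number[]

// Result shape of a path lookup: the matched node (if any), the nibbles that
// remain unmatched, and the stack of nodes visited along the way.
type Path = { node: TrieNode | null; remaining: Nibbles; stack: TrieNode[] }

interface TrieWithPartialPathLookup {
  findPath(
    key: Uint8Array,
    throwIfMissing?: boolean,
    // New optional parameter: an already-known node stack; the walk resumes
    // from its deepest node and still returns the same Path as a full walk.
    partialPath?: { stack: TrieNode[] },
  ): Promise<Path>
}
```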

@holgerd77
Member Author

👍 Cool that you are experimenting with this a bit! 🙂
