
Trie: add partialPath parameter to trie.findPath() #3305

Merged — 8 commits merged into master on Mar 11, 2024

Conversation

ScottyPoi
Contributor

addresses optimizations described in #3293

In many cases, trie.findPath may be called on a key when part of the path is already known. A function with multiple lookups will traverse the entire trie for each individual lookup. In many cases, this can be optimized through a caching of paths already traversed, which could be fed to the next findPath call. We could skip several lookups on each call this way, streamlining processes with multiple lookups, and making some of the more complex state trie operations much more efficient.

This solution adds an optional parameter called partialPath to trie.findPath()

  async findPath(
    key: Uint8Array,
    throwIfMissing = false,
    partialPath: {
      stack: TrieNode[]
    } = {
      stack: [],
    }
  ): Promise<Path>

partialPath.stack is a list of TrieNodes, identical to a slice of the target node's path.

If a partialPath is provided, findPath will consume the correct nibbles from the nodes in the partialPath, and build the return stack starting with the provided nodes:

    const stack: TrieNode[] = Array.from({ length: keyLen })
    let progress = 0
    for (let i = 0; i < partialPath.stack.length - 1; i++) {
      stack[i] = partialPath.stack[i]
      progress += stack[i] instanceof BranchNode ? 1 : (<ExtensionNode>stack[i]).keyLength()
    }
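The nibble accounting above can be illustrated with a small self-contained sketch. The `BranchNode` and `ExtensionNode` classes here are simplified stand-ins for the library's real node types, not the actual implementations: the point is only that a branch node consumes one nibble of the key while an extension node consumes `keyLength()` nibbles.

```typescript
// Minimal stand-ins (not the library's actual classes) for the two
// node kinds that consume key nibbles during a trie walk.
class BranchNode {}
class ExtensionNode {
  constructor(private _key: number[]) {}
  keyLength(): number {
    return this._key.length
  }
}
type TrieNode = BranchNode | ExtensionNode

function consumedNibbles(partialStack: TrieNode[]): number {
  let progress = 0
  // Mirror the PR snippet: every node except the last is counted as
  // already walked; a branch adds 1 nibble, an extension adds keyLength().
  for (let i = 0; i < partialStack.length - 1; i++) {
    const node = partialStack[i]
    progress += node instanceof BranchNode ? 1 : (node as ExtensionNode).keyLength()
  }
  return progress
}

// A branch (1 nibble) followed by an extension holding 3 nibbles, then
// the node the walk restarts from: 1 + 3 = 4 nibbles already consumed.
const demoStack: TrieNode[] = [new BranchNode(), new ExtensionNode([0x8, 0x2, 0x0]), new BranchNode()]
console.log(consumedNibbles(demoStack)) // → 4
```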

findPath will then begin the trie walk from the last node in the partial path, instead of the trie root which is the default starting place:

    const startingNode = partialPath.stack[partialPath.stack.length - 1]
    const start = startingNode !== undefined ? this.hash(startingNode.serialize()) : this.root()
    await this.walkTrie(start, onFound)

From there, findPath proceeds normally, and returns an identical result.

--

To prove that the process worked, I temporarily added logging to findPath which counted the number of lookups occurring during each call. This example shows how the number of lookups decreases when findPath is provided a partialPath of different lengths.

FIND PATH ORIGINAL:

trie:FIND_PATH:LOOKUP 0
trie:FIND_PATH:LOOKUP 1
trie:FIND_PATH:LOOKUP 2
trie:FIND_PATH:LOOKUP 3
trie:FIND_PATH:LOOKUP 4

FIND PATH PARTIAL: stack_len = 2

trie:FIND_PATH:LOOKUP 0
trie:FIND_PATH:LOOKUP 1
trie:FIND_PATH:LOOKUP 2
trie:FIND_PATH:LOOKUP 3

FIND PATH PARTIAL: stack_len = 3

trie:FIND_PATH:LOOKUP 0
trie:FIND_PATH:LOOKUP 1
trie:FIND_PATH:LOOKUP 2


codecov bot commented Mar 7, 2024

Codecov Report

Attention: Patch coverage is 95.65217% with 1 line in your changes missing coverage. Please review.

Project coverage is 86.91%. Comparing base (435606e) to head (e87b28e).

Additional details and impacted files

Impacted file tree graph

Flag Coverage Δ
block 88.43% <ø> (ø)
blockchain 91.61% <ø> (ø)
client 84.89% <ø> (ø)
common 98.43% <ø> (ø)
devp2p 82.12% <ø> (ø)
ethash ?
evm 74.05% <ø> (ø)
genesis 99.98% <ø> (ø)
rlp ?
statemanager 77.00% <ø> (ø)
trie 88.97% <95.65%> (-0.32%) ⬇️
tx 95.55% <ø> (ø)
util 89.19% <ø> (ø)
vm 79.85% <ø> (ø)
wallet 88.35% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

@holgerd77
Member

holgerd77 commented Mar 8, 2024

Hey there,
wow, that's a great and super-clean start, super suitable for additions and experiments! 🤩

Some ideas and thoughts, definitely not something to absolutely follow if you have got some different picture in mind:

1. Applied scenario

Maybe it really does make sense to take this "Build up tx trie from scratch through a batch() operation" as a concrete first application scenario with a cache. At least this would have the two benefits that a) the cache would not grow that big that one would need to worry about cache management, so like deleting cache entries again from a certain point (at least I would guess so, so with roughly up to 1000 txs (aka, keys to add to the trie) for a whole block, that should not let memory completely explode). Second benefit b) would be that we would already have this benchmark I had done in the issue as a solid comparison tool to see if this works from a performance PoV (so benchmark would be to re-run a certain block and measure tx trie generation time with and without cache and see how things go 🙂).

This would I guess mean to:

  1. Move the tx trie generation in the respective Block method to be a batch operation.
  2. Add an optional parameter useCache (or so?) to the trie.batch() method
  3. And then implement some (unbounded) caching mechanism

So going this route, also applying the idea from 2. with the new parameter and only doing this in batch() for now has the advantage that we would have a very controlled environment and do not need to worry about things like re-setting the root (and then the cache would be devalued, right?), things like that. If the cache is activated in batch() by the parameter, a user will likely know what he/she is doing.

Again, nevertheless just an option to start here, only citing some pros and cons.

2. Cache Structure

One big question I had was about the cache structure. How (and how deep) would we save path keys so that things (paths) are optimally retrievable but we are also not producing a lot of redundancy and cache bloat. So going back to the example I had on the issue, how would one assure for 0x8202fc and then (somewhat later) 0x8202fd that we have the path up to the differing nibbles "at hand" where it is not known at the "save moment" yet what part of the path can be eventually reused.

I think I now have at least one solution which I would say has some elegance, though also a drawback.

So in a first round we could set just a maximum how deep paths should be stored (let's say 4 nibbles here, so 8202), for the proposed solution this is not absolutely necessary though.

When storing the path for 0x8202fc we could store in the following scheme (let's assume a scenario without a maximum at 4):

8 -> Last node in the stored path (so: for 8)
82 -> Last node in the stored path (so: for 2)
820 -> Last node in the stored path (so: for 0)
8202 -> Last node in the stored path (so: for the second 2)
8202f -> Last node in the stored path (so: for f)
8202fc -> Last node in the stored path (so: for c)

When then a matching path in the cache for 0x8202fd should be searched, one could look in the cache from the full length "downwards", so:

"8202fd" -> No match
"8202f" -> match!

If there is a match, the cache structure would then ensure by design that one could read the full path by reading the respective last nodes from "8202f", "8202", "820",... and then putting things together.

Does this make sense? Then we would have a very flexible structure where the full possible path is leveraged (eventually capped by a maximum).
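The scheme described above can be sketched as follows. This is a hypothetical illustration of the proposal, not code from the PR: every prefix of a stored path maps to the last node walked for that prefix, lookup probes from the full key "downwards" to the longest cached prefix, and the full node stack is then reassembled by reading each shorter prefix in turn.

```typescript
// Stand-in for a TrieNode reference; strings keep the sketch simple.
type CachedNode = string

const pathCache = new Map<string, CachedNode>()

function storePath(nibbles: string, nodes: CachedNode[]): void {
  // One entry per prefix: '8' -> node for 8, '82' -> node for 2, ...
  for (let i = 1; i <= nibbles.length; i++) {
    pathCache.set(nibbles.slice(0, i), nodes[i - 1])
  }
}

function longestCachedPrefix(nibbles: string): string | undefined {
  // Probe from the full length "downwards" until a prefix hits.
  for (let i = nibbles.length; i > 0; i--) {
    const prefix = nibbles.slice(0, i)
    if (pathCache.has(prefix)) return prefix
  }
  return undefined
}

function readFullPath(prefix: string): CachedNode[] {
  // By construction, every shorter prefix is also cached, so the whole
  // path can be read back by walking the prefixes in order.
  const stack: CachedNode[] = []
  for (let i = 1; i <= prefix.length; i++) {
    stack.push(pathCache.get(prefix.slice(0, i))!)
  }
  return stack
}

storePath('8202fc', ['n8', 'n2', 'n0', 'n2b', 'nf', 'nc'])
console.log(longestCachedPrefix('8202fd')) // → '8202f'
console.log(readFullPath('8202f'))         // → ['n8', 'n2', 'n0', 'n2b', 'nf']
```

As noted, selective deletion of a single prefix entry would break the read-down invariant, which is why this structure fits best when the whole cache is discarded after a batch() operation.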

The downside of this structure is (I guess) that selective cache deletes are not possible (so: to delete 8202 from the cache) since one would then "destroy" the initial assumption that one can read down along a path. Or, to be more precise: deletion might be possible, but only with a large amount of effort (and likely: performance loss) on trying to find and identify and delete the longer paths also in the cache (which would likely not be worth it).

But I guess at least for this unbounded scenario (tx trie, batch()) this might nevertheless be a well-fitting structure, since - after the batch() operation - the cache can (or: should, I guess) - be deleted anyhow.

Since the whole thing from above is structurally so super simple, it might even make sense to take this variant now but to expand on this later on with a different cache structure for a different use case.

My (very long 😂) two cents. 🙂

@holgerd77
Member

(have done additional edits to the previous post, please read on the website)

@ScottyPoi
Contributor Author

Yes, what you are saying makes sense.
Do you think this findPath option is OK to merge as is? It's a non-breaking optional parameter, and will be generally useful for standalone Trie package use.

@holgerd77
Member

Updated this via UI

@holgerd77
Member

Yes, what you are saying makes sense. Do you think this findPath option is OK to merge as is? It's a non-breaking optional parameter, and will be generally useful for standalone Trie package use.

Yes, I think we can do that. 🙂

Will merge! 👍

Member

@holgerd77 holgerd77 left a comment


LGTM

@holgerd77 holgerd77 merged commit 891ee51 into master Mar 11, 2024
36 checks passed
@holgerd77 holgerd77 deleted the findPath-from-partial branch March 11, 2024 13:27