Code Refactoring #78

Bhargavasomu · 2018-11-29T13:21:52Z

What was wrong?

The whole code base is a bit hard to read and could be structured better.

How can it be fixed?

The code base can be refactored with the following improvements

Make seperate classes for Leaf Nodes, Branch Nodes, Extension Nodes which in turn inherit a common class BaseNode. The class BaseNode should contain the following
- Type of the Node (Leaf or Branch or Extension)
- List of all the keys contained in the Node - (List of size 1 for Extension and Leaf nodes. List of size 16 for Branch nodes).
- value contained in the node (If has nothing, then Null)
- get_all_children method
- Node Encode and Decode methods
- parse_node method
Common BaseTrie class should be implemented from which the BinaryTrie and the HexaryTrie should be implemented.
Changing the test cases correspondingly to the above changes
Adding Benchmarks to compare the performance (benchmark architecture similar to py-evm)

The text was updated successfully, but these errors were encountered:

Bhargavasomu · 2018-11-29T13:22:24Z

@pipermerriam would you like to add something more on top of this, in terms of specs?

pipermerriam · 2018-12-04T20:29:48Z

This generally looks good and you're 👍 to start on this.

carver · 2018-12-11T18:16:30Z

Adding Benchmarks to compare the performance (benchmark architecture similar to py-evm)

We have been talking about a benchmarking refactor in py-evm, see ethereum/py-evm#1306

This could be a good time to try that out. Using the new proposed structure should be an improvement, but if it becomes a sticking point or a drag, just go back to the current way. No need to tie the two issues together.

Bhargavasomu · 2018-12-12T19:06:30Z

@carver thankyou for the reference, will take a look.

Bhargavasomu · 2018-12-14T17:17:21Z

@pipermerriam I have the basic prototype ready (cleanly designed), which just takes the key as a tuple of Nibbles and the value is bytes. I still understand that I would have to encode and hash a lot of stuff. Could you please guide me on this, in regards to tell me where all we need to encode and hash. Also I need guidance on when to push to the DB. It's not clear from the code. Thankyou.

pipermerriam · 2018-12-14T23:22:13Z

I have the basic prototype ready (cleanly designed)

It'd be helpful to see some code if you have it.

Bhargavasomu · 2018-12-15T11:36:45Z

@pipermerriam @carver After going through the yellow paper and a few examples, I have came up with the following neater version of the algorithm/design which is in par with the current live version too (Need to verify this).
Firstly, I have 3 classes LeafNode, ExtensionNode and the BranchNode. Each of these nodes has 3 main functionalities which are insert_child(key, value), get_value(key), delete_value(key). (The names maybe weird right now, but will make them better later). Each of the functionalities are explained as follows.

insert_child(key, value)

The main idea here is that, when we want to insert a key and it’s corresponding value in the trie, we call the insert_child function on the rootNode. The rootNode in turn calls the insert_child function on its childNode (for BranchNode and ExtensionNode mainly). This recursive calls continue till the LeafNode is reached, where the base case of the recursion is executed and the recursion ends. Note that at each step of the node calling its child node, the key gets consumed and the remainder key is sent to the child nodes to be inserted. Further specific details for each node are mentioned below.

LeafNode

If the key is empty, then just set the value of the node to be the new given value.
Find the common prefix between the current node's (which is a LeafNode) key and the current key.
Create a new BranchNode. Now use the functionality of the insert_child of BranchNode class to insert the current new (key, value) pair and the existing LeafNode's (partial_key, value) pair.
If a common prefix exists, then create a new ExtensionNode whose key is the common prefix and the value is the the BranchNode created in the above step.
Finally if a common prefix exists, then return the ExtensionNode created, else return the BranchNode.

BranchNode

If the key is empty, then just set the value of the node to be the new given value.
If no child_node exists with key[0] as the starting key nibble, then create a
node = LeafNode(key[1:], value) and set it to the BranchNode in the key[0] index.
I.e, BranchNode.child_nodes[key[0]] = LeafNode(key[1:], value).
Else if child_node already exists with key[0] as the starting key nibble, then recursively call insert_child on that child. i.e., BranchNode.child_nodes[key[0]].insert_child(key[1:], value).

ExtensionNode

Note : trie_key refers to the sequence of nibbles (part of key), which is stored in the Node as part of the trie. This convention is followed throughout the whole code.
If the new key doesn’t start with the trie_key, then the following steps are carried out.
- We create a BranchNode and insert the new (key, value) pair into the BranchNode using the BranchNode.insert_child functionality.
- Then we insert the current ExtensionNode as the child of the BranchNode, corresponding to the first nibble of the trie_key in the ExtensionNode.
  - But here one thing to note is that, if the trie_key[1:] is empty, then there is no point in storing the extension node, and hence we directly store it’s child. Now we delete the current ExtensionNode
    i.e, BranchNode.child_nodes[trie_key[0]] = self.child_node

get_value(key)

LeafNode

Return self.value if self.trie_key == key
Else raise exception key not found

BranchNode

If key is empty (full consumed), then return self.value
Else, assert that a child node exists with starting nibble as key[0]
If not found, raise exception key not found
Else recursively call BranchNode.child_nodes[key[1:]].get_value(key[1:])

ExtensionNode

If key doesn’t start with self.trie_key, then raise a exception key not found.
Else recursively call ExtensionNode.child_node.get_value(remainder_key)

delete_value(key)

LeafNode

Return None if self.trie_key == key
Else raise exception key not found

BranchNode

If the key is empty (fully consumed by the parent nodes), set the value to be None
Else if the key is not empty do the following
- Assert that the first nibble of the key exists in the child_nodes of the BranchNode.
- Failure of above assertion means that the key doesn’t exist to be deleted.
- Now recursively call the delete_value function on the child node.
  i.e., BranchNode.child_nodes[key[0]] = BranchNode.child_nodes[key[0]].delete_value(key[1:])
Now if the value contained in the BranchNode is None and if this BranchNode doesn’t point to any other child_nodes, then return None (As this node can be deleted)
Else if the BranchNode has no child_nodes but it has some value associated with it, then convert this into a LeafNode and return this newly created LeafNode.
Else return this updated BranchNode itself.

ExtensionNode

If key doesn’t start with self.trie_key, then raise a exception key not found.
Else recursively call ExtensionNode.child_node.delete_value(remainder_key) and set this node returned by that operation as the value of ExtensionNode.child_node.
- But one thing to note is that, if the ExtensionNode.child_node is None, after the above update, then there is no point in having this ExtensionNode and hence returns None.
- Else returns updated ExtensionNode

Bhargavasomu · 2018-12-15T11:37:41Z

@pipermerriam @carver I have tried my level best to be clear with the implementation followed. Please let me know if any part of it is unclear (or) if there is something that I'm missing out.

Bhargavasomu · 2018-12-15T14:23:25Z

I believe that the above approach eliminates the need of using a nibble terminator as we have a seperate class for LeafNode. I understand that the machine needs bytes and hence to make the number of nibbles even, so as to form bytes; we are using HP encoding. But I don't understand why we need to explicitly do that, as python should be taking care of that. I don't see how an Extension Node or a Leaf Node storing the nibbles (1, 2, 3) (odd number of nibbles as an example) is problematic. Could someone explain me about this in detail. Thankyou!

pipermerriam · 2018-12-17T17:07:09Z

@Bhargavasomu my memory of the algorithm is sufficiently foggy that I don't recall if the nibbles terminator is a protocol thing or an implementation thing. If it's just an implementation bit, I have no problem removing it. If it is protocol, I don't believe we can remove it.

In general, your plan/structure sounds fine, but I'll only really know once there is code to review. 👍 to move forward with this.

pipermerriam · 2018-12-17T17:11:41Z

I just saw this: #79

👀

Bhargavasomu mentioned this issue Dec 15, 2018

[WIP] Refactoring the codebase #79

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Code Refactoring #78

Code Refactoring #78

Bhargavasomu commented Nov 29, 2018

Bhargavasomu commented Nov 29, 2018

pipermerriam commented Dec 4, 2018

carver commented Dec 11, 2018

Bhargavasomu commented Dec 12, 2018

Bhargavasomu commented Dec 14, 2018

pipermerriam commented Dec 14, 2018

Bhargavasomu commented Dec 15, 2018

Bhargavasomu commented Dec 15, 2018

Bhargavasomu commented Dec 15, 2018 •

edited

pipermerriam commented Dec 17, 2018

pipermerriam commented Dec 17, 2018

Code Refactoring #78

Code Refactoring #78

Comments

Bhargavasomu commented Nov 29, 2018

What was wrong?

How can it be fixed?

Bhargavasomu commented Nov 29, 2018

pipermerriam commented Dec 4, 2018

carver commented Dec 11, 2018

Bhargavasomu commented Dec 12, 2018

Bhargavasomu commented Dec 14, 2018

pipermerriam commented Dec 14, 2018

Bhargavasomu commented Dec 15, 2018

insert_child(key, value)

LeafNode

BranchNode

ExtensionNode

get_value(key)

LeafNode

BranchNode

ExtensionNode

delete_value(key)

LeafNode

BranchNode

ExtensionNode

Bhargavasomu commented Dec 15, 2018

Bhargavasomu commented Dec 15, 2018 • edited

pipermerriam commented Dec 17, 2018

pipermerriam commented Dec 17, 2018

Bhargavasomu commented Dec 15, 2018 •

edited