Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code Refactoring #78

Open
Bhargavasomu opened this issue Nov 29, 2018 · 11 comments
Open

Code Refactoring #78

Bhargavasomu opened this issue Nov 29, 2018 · 11 comments

Comments

@Bhargavasomu
Copy link
Contributor

What was wrong?

The whole code base is a bit hard to read and could be structured better.

How can it be fixed?

The code base can be refactored with the following improvements

  • Make seperate classes for Leaf Nodes, Branch Nodes, Extension Nodes which in turn inherit a common class BaseNode. The class BaseNode should contain the following
    • Type of the Node (Leaf or Branch or Extension)
    • List of all the keys contained in the Node - (List of size 1 for Extension and Leaf nodes. List of size 16 for Branch nodes).
    • value contained in the node (If has nothing, then Null)
    • get_all_children method
    • Node Encode and Decode methods
    • parse_node method
  • Common BaseTrie class should be implemented from which the BinaryTrie and the HexaryTrie should be implemented.
  • Changing the test cases correspondingly to the above changes
  • Adding Benchmarks to compare the performance (benchmark architecture similar to py-evm)
@Bhargavasomu
Copy link
Contributor Author

@pipermerriam would you like to add something more on top of this, in terms of specs?

@pipermerriam
Copy link
Member

This generally looks good and you're 👍 to start on this.

@carver
Copy link
Contributor

carver commented Dec 11, 2018

Adding Benchmarks to compare the performance (benchmark architecture similar to py-evm)

We have been talking about a benchmarking refactor in py-evm, see ethereum/py-evm#1306

This could be a good time to try that out. Using the new proposed structure should be an improvement, but if it becomes a sticking point or a drag, just go back to the current way. No need to tie the two issues together.

@Bhargavasomu
Copy link
Contributor Author

@carver thankyou for the reference, will take a look.

@Bhargavasomu
Copy link
Contributor Author

@pipermerriam I have the basic prototype ready (cleanly designed), which just takes the key as a tuple of Nibbles and the value is bytes. I still understand that I would have to encode and hash a lot of stuff. Could you please guide me on this, in regards to tell me where all we need to encode and hash. Also I need guidance on when to push to the DB. It's not clear from the code. Thankyou.

@pipermerriam
Copy link
Member

I have the basic prototype ready (cleanly designed)

It'd be helpful to see some code if you have it.

@Bhargavasomu
Copy link
Contributor Author

@pipermerriam @carver After going through the yellow paper and a few examples, I have came up with the following neater version of the algorithm/design which is in par with the current live version too (Need to verify this).
Firstly, I have 3 classes LeafNode, ExtensionNode and the BranchNode. Each of these nodes has 3 main functionalities which are insert_child(key, value), get_value(key), delete_value(key). (The names maybe weird right now, but will make them better later). Each of the functionalities are explained as follows.

insert_child(key, value)

The main idea here is that, when we want to insert a key and it’s corresponding value in the trie, we call the insert_child function on the rootNode. The rootNode in turn calls the insert_child function on its childNode (for BranchNode and ExtensionNode mainly). This recursive calls continue till the LeafNode is reached, where the base case of the recursion is executed and the recursion ends. Note that at each step of the node calling its child node, the key gets consumed and the remainder key is sent to the child nodes to be inserted. Further specific details for each node are mentioned below.

LeafNode

  • If the key is empty, then just set the value of the node to be the new given value.
  • Find the common prefix between the current node's (which is a LeafNode) key and the current key.
  • Create a new BranchNode. Now use the functionality of the insert_child of BranchNode class to insert the current new (key, value) pair and the existing LeafNode's (partial_key, value) pair.
  • If a common prefix exists, then create a new ExtensionNode whose key is the common prefix and the value is the the BranchNode created in the above step.
  • Finally if a common prefix exists, then return the ExtensionNode created, else return the BranchNode.

BranchNode

  • If the key is empty, then just set the value of the node to be the new given value.
  • If no child_node exists with key[0] as the starting key nibble, then create a
    node = LeafNode(key[1:], value) and set it to the BranchNode in the key[0] index.
    I.e, BranchNode.child_nodes[key[0]] = LeafNode(key[1:], value).
  • Else if child_node already exists with key[0] as the starting key nibble, then recursively call insert_child on that child. i.e., BranchNode.child_nodes[key[0]].insert_child(key[1:], value).

ExtensionNode

  • Note : trie_key refers to the sequence of nibbles (part of key), which is stored in the Node as part of the trie. This convention is followed throughout the whole code.
  • If the new key doesn’t start with the trie_key, then the following steps are carried out.
    • We create a BranchNode and insert the new (key, value) pair into the BranchNode using the BranchNode.insert_child functionality.
    • Then we insert the current ExtensionNode as the child of the BranchNode, corresponding to the first nibble of the trie_key in the ExtensionNode.
      • But here one thing to note is that, if the trie_key[1:] is empty, then there is no point in storing the extension node, and hence we directly store it’s child. Now we delete the current ExtensionNode
        i.e, BranchNode.child_nodes[trie_key[0]] = self.child_node

get_value(key)

LeafNode

  • Return self.value if self.trie_key == key
  • Else raise exception key not found

BranchNode

  • If key is empty (full consumed), then return self.value
  • Else, assert that a child node exists with starting nibble as key[0]
  • If not found, raise exception key not found
  • Else recursively call BranchNode.child_nodes[key[1:]].get_value(key[1:])

ExtensionNode

  • If key doesn’t start with self.trie_key, then raise a exception key not found.
  • Else recursively call ExtensionNode.child_node.get_value(remainder_key)

delete_value(key)

LeafNode

  • Return None if self.trie_key == key
  • Else raise exception key not found

BranchNode

  • If the key is empty (fully consumed by the parent nodes), set the value to be None
  • Else if the key is not empty do the following
    • Assert that the first nibble of the key exists in the child_nodes of the BranchNode.
    • Failure of above assertion means that the key doesn’t exist to be deleted.
    • Now recursively call the delete_value function on the child node.
      i.e., BranchNode.child_nodes[key[0]] = BranchNode.child_nodes[key[0]].delete_value(key[1:])
  • Now if the value contained in the BranchNode is None and if this BranchNode doesn’t point to any other child_nodes, then return None (As this node can be deleted)
  • Else if the BranchNode has no child_nodes but it has some value associated with it, then convert this into a LeafNode and return this newly created LeafNode.
  • Else return this updated BranchNode itself.

ExtensionNode

  • If key doesn’t start with self.trie_key, then raise a exception key not found.
  • Else recursively call ExtensionNode.child_node.delete_value(remainder_key) and set this node returned by that operation as the value of ExtensionNode.child_node.
    • But one thing to note is that, if the ExtensionNode.child_node is None, after the above update, then there is no point in having this ExtensionNode and hence returns None.
    • Else returns updated ExtensionNode

@Bhargavasomu
Copy link
Contributor Author

@pipermerriam @carver I have tried my level best to be clear with the implementation followed. Please let me know if any part of it is unclear (or) if there is something that I'm missing out.

@Bhargavasomu
Copy link
Contributor Author

Bhargavasomu commented Dec 15, 2018

I believe that the above approach eliminates the need of using a nibble terminator as we have a seperate class for LeafNode. I understand that the machine needs bytes and hence to make the number of nibbles even, so as to form bytes; we are using HP encoding. But I don't understand why we need to explicitly do that, as python should be taking care of that. I don't see how an Extension Node or a Leaf Node storing the nibbles (1, 2, 3) (odd number of nibbles as an example) is problematic. Could someone explain me about this in detail. Thankyou!

@pipermerriam
Copy link
Member

@Bhargavasomu my memory of the algorithm is sufficiently foggy that I don't recall if the nibbles terminator is a protocol thing or an implementation thing. If it's just an implementation bit, I have no problem removing it. If it is protocol, I don't believe we can remove it.

In general, your plan/structure sounds fine, but I'll only really know once there is code to review. 👍 to move forward with this.

@pipermerriam
Copy link
Member

I just saw this: #79

👀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants