Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

External interface of UTXO set #29

Closed
MoaningMyrtle opened this issue Jan 14, 2017 · 11 comments
Closed

External interface of UTXO set #29

MoaningMyrtle opened this issue Jan 14, 2017 · 11 comments
Milestone

Comments

@MoaningMyrtle
Copy link
Contributor

While working on the pool (#21) one fairly important piece I'm missing is how the blockchain's UTXO snapshot will be exposed to other components of the system. It looks like this is #10, or at least a very closely related component of it (cc: @merope07).

It seems premature to set anything quite in stone, but I'm wondering if it makes sense to start discussion around what the expected external-facing capabilities are of the UTXO data structure so that we can start working against a common understanding?

@MoaningMyrtle
Copy link
Contributor Author

MoaningMyrtle commented Jan 14, 2017

Wanted to keep the issue itself short, so I'm adding my initial thoughts in a comment.

It seems like, from a very high level, we will need the capability to examine the UTXO state at a given chain state, designated by something like the hash of a block. This will be fundamental to the blockchain component being able to accept forks, but will also come into play in other components like the pool when switching heads.

It also seems like the most fundamental aspect of the UTXO set, as far as I can tell, is purely a point-in-time snapshot of the outputs at a specific chainstate.

My first inclination would be something like this (please pardon the loose pseudo-rust):

trait UtxoSet {
   fn root(&self) -> Hash;
   fn apply(&self, b: Block) -> UtxoSet;
   fn rewind(&self, b: Block) -> UtxoSet;
   fn get_output(&self, h: Hash) -> Option<&Output>;
}

The thought process here is that the UTXO set, representing a point-in-time snapshot, will be able to play forward blocks or rewind blocks and arrive at a new state. The blocks fed to the set would be managed by a higher-order service like the blockchain component itself, or perhaps a dedicated UTXO component.

Another alternative would be to apply UTXO set diffs rather than blocks. From the point-of-view of the UTXO set, they seem identical, although the block would open the door to checking the UTXO set merkle root after application/rewind. The benefit of UTXO set diffs being a separate structure is the ability for them to be used for more advanced purposes- the transaction pool would benefit from being able to receive a summary of UTXO set changes when the blockchain switches heads, for example.

I'm not all that familiar with the intended internals of the MMR or any related structures, or any work you've done @merope07; any feedback with your understanding would be appreciated!

@merope07
Copy link
Contributor

Thanks, this is exactly the kind of feedback I need. I have some flexibility in the external interface but I was not certain about what is required.

One nit is that I think get_output should take some more specific type, a UtxoReference or something.

Also, root will return a hash and a sum, as per https://github.com/ignopeverell/grin/blob/master/doc/merkle.md
Is there a need to get the root only (or the sum only)? I don't think I can compute either individually faster than computing them both.

@ignopeverell
Copy link
Contributor

Another nit is it may be easier to just pass inputs and ouputs sets instead of full blocks. There may be some consumers of this interface that won't have full blocks (be it only tests).

@MoaningMyrtle
Copy link
Contributor Author

Great, thanks to both of you for your comments!

One nit is that I think get_output should take some more specific type, a UtxoReference or something.

This makes perfect sense to me. Do you envision this including more than just the hash, or just a more specific type definition?

Also, root will return a hash and a sum, as per https://github.com/ignopeverell/grin/blob/master/doc/merkle.md
Is there a need to get the root only (or the sum only)? I don't think I can compute either individually faster than computing them both.

Ah no, I had forgotten about the need for the sum :) Makes perfect sense to me to bundle them, any thoughts @ignopeverell?

Another nit is it may be easier to just pass inputs and ouputs sets instead of full blocks. There may be some consumers of this interface that won't have full blocks (be it only tests).

Makes sense, which sounds similar to the UTXO diff I had mentioned in my comment above. I'm a fan of that as well, it opens the door to abstracting the switching of chain heads to utxo diffs for components outside the core blockchain, for example. I also realized that the user of this interface can always check the merkle root and sum manually (by calling root), so there's little benefit to pushing down the block header here.

Is there general agreement about the layers of abstraction here, namely

The blocks fed to the set would be managed by a higher-order service like the blockchain component itself, or perhaps a dedicated UTXO component.

?

@merope07
Copy link
Contributor

merope07 commented Jan 19, 2017

Because the TXO set is insertion-ordered, it makes sense for the set to only allow rewinding the most recently-added block. I will add an accessor for what the TXOset thinks the most recent blockhash is.

Edit: Never mind this, I had thought I could get away without any ability to random-access outputs, but this is obviously untrue.

@MoaningMyrtle
Copy link
Contributor Author

While being able to random-access outputs is necessary, I'm struggling to think of a situation where you'd want to rewind anything but the last block anyway. All I can think of are vaguely esoteric use cases that involve slightly abusing the data structure for other purposes; if simplifying the rewind case makes any of the implementation easier, I'd be in favor of doing so.

Taking our comments thus far, it seems like the base trait has been changed:

trait UtxoSet {
   fn root(&self) -> Hash, Sum;
   fn apply(&self, b: UtxoDiff) -> UtxoSet;
   fn rewind(&self, b: UtxoDiff) -> UtxoSet;
   fn get_output(&self, ref: UtxoRef) -> Option<&Output>;
}

Where:

  • UtxoDiff represents the minimal set of data encompassing a set of changes to the UtxoSet, including a set of inputs (spent outputs), new outputs, excesses, witnesses (?)
  • UtxoRef represents the data necessary to uniquely reference an output, possibly a hash of its commitment

@merope07
Copy link
Contributor

This looks right.

There isn't any simplification to allowing only the last block to be rewound. It looks like we need to store the insertion-ordered tree as well as an index to allow random accesses, since a UtxoDiff will have transaction inputs which are simply referenced.

One goal of using the insertion-ordered structure is that it is possible in principle for nodes to pass proofs around to each other and nobody stores more data than is necessary. But I think this is not a goal for grin, so we will need this redundancy. Insertion-ordered list plus index.

I'm thinking that the index can just go in a rocksdb file, but the tree will need a dedicated file format.

@JohnnSebastian
Copy link

JohnnSebastian commented Jan 25, 2017 via email

@MoaningMyrtle
Copy link
Contributor Author

A semi-related thought I've been noodling on:

Currently, the trait as written above doesn't quite work. Rust lacks abstract return types, so you can't statically return a UtxoSet from a method. You could use a trait object (Box<UtxoSet> or &UtxoSet) but this has the unfortunate side-effect of forcing dynamic dispatch and heap allocation for all downstream uses (based on my, admittedly limited, understanding of the type system in rust). There's a performance penalty to doing so (described as "significant" by rust maintainer aturon here), as well as a bit of an anti-pattern.

Given the early stage of the system it does not seem worthwhile to take on these penalties without a better understanding of the needs and desires of other components of the system. Hiding the implementation behind a trait has some nice abstraction elements but we don't need them quite yet.

Based on the above I think it makes sense to make UtxoSet a concrete type instead, with the functions described above as the public members of its method set, and keep tabs on the currently accepted abstract return type proposal and its corresponding RFC for if/when it makes sense to hide the implementation behind a trait.

Any thoughts?

@ignopeverell
Copy link
Contributor

Sounds reasonable to me. Adding a layer of abstraction can solve most problems in software engineering, except the one of having too many layers of abstraction :)

@merope07
Copy link
Contributor

In this case we can use Self in place of UtxoSet

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants