This repository has been archived by the owner on Feb 1, 2024. It is now read-only.

PoET 2.0 Consensus - Updated #20

Open
wants to merge 5 commits into
base: main

Conversation


@kulkarniamol kulkarniamol commented Jul 10, 2018

Proposes a new PoET consensus mechanism designed to provide the
PoET functionality without requiring SGX Platform Services.

Deprecates PR#12.

Signed-off-by: kulkarniamol <amol.kulkarni@intel.com>

infrastructure.

This document details a new mechanism for the PoET algorithm that overcomes
some of the challenges with the original algorithm.


Suggest a wording more like "extends the original algorithm to new platforms"


Sawtooth includes an implementation that simulates the secure instructions.
This should make it easier for the community to work with the software, but
it also forgoes Byzantine fault tolerance.


Strictly speaking... it does give you BFT. The problem is that it is trivially easy to "compromise" nodes (so the 3f+1/2f+1 guarantees are easy to violate).


PoET 2.0 essentially works as follows:


Suggest that you link to the full description of PoET v1.


+ The `WaitCertificate` contains a `Duration` as well as a related `WaitTime`.
The `Duration` is a 256-bit random number generated using the secure
RNG available within the SGX. The `WaitTime` is derived from the


There is no reason to compute the WaitTime in the enclave, since the wait time is essentially meaningless to the enclave. All the other validators can do the conversion from the random number/duration into a time. That would allow you to call Duration what it really is: a random number.

+ On the originating validator, the `WaitTime` is used to throttle broadcast of
claim blocks. Upon creating the `WaitCertificate`, the validator waits
until `WaitTime` seconds have elapsed before broadcasting the block over
the gossip network.


I would suggest that you describe how the wait time is computed from the random number. The computation is more or less the same as the computation from PoET v1.
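For concreteness, here is a sketch of how that conversion might look, following the PoET v1 approach of inverse-CDF sampling from an exponential distribution. The function name and the 2^256 normalization are illustrative assumptions, not part of this spec:

```python
import math

def wait_time(duration_bytes, local_mean):
    """Map the enclave's 256-bit random Duration to a WaitTime by sampling
    an exponential distribution with mean `local_mean` (PoET v1 style).

    duration_bytes: the 32-byte random number from the enclave RNG.
    local_mean: the computed local mean, in seconds.
    """
    # Normalize the 256-bit integer into a uniform value in (0, 1].
    u = (int.from_bytes(duration_bytes, "big") + 1) / float(2 ** 256)
    # Inverse-CDF sampling of an exponential distribution with mean local_mean.
    return -local_mean * math.log(u)
```

Since this uses only the random number and the local mean, any validator can perform it outside the enclave, which is the reviewer's point.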

double WaitTime # The number of seconds to wait, as a function of the
# Duration and the LocalMean
double LocalMean # The computed local mean
byte[32] BlockID # The BlockID passed in to the Enclave


see comments above. this information is redundant.

>The implication of this change is that the signup data is lost each time the
>enclave is unloaded or the platform is restarted. The enclave has to register
>afresh with a new set of keys each time it gets loaded.


I'm confused. PoET 1 doesn't store signup data; it stores the identity of the monotonic counter. We regenerate data on reboot because the protocol REQUIRES that the enclave re-register, not because there is no sealed storage. Sealed storage without a monotonic counter cannot prevent replay attacks (just copy an old version of sealed storage into place if you want to have multiple signups for the processor).

current `WaitCertId_{n}` to upper layers for registration in
EndPoint registry.
* Goto 1


This appears to be missing the two delays that are necessary. The first delay is the time between registration & use of the registration. The second is the delay between subsequent registrations for the processor.

>Note 1: In practice, the WC may be calculated by recording the system time at
>the moment of the arrival of the Sync Block and subsequently subtracting this
>timestamp from the current time.


It's probably worth receiving several blocks and computing an average over them, to ensure that you don't favor a low-latency neighbor or a cheating neighbor.
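A sketch of that averaging, under the Note 1 convention that WC = CurrentTime - BaseTime. The data layout (pairs of arrival time and chain-implied elapsed time) is an assumption for illustration:

```python
def estimate_base_time(observations):
    """Average independent base-time estimates taken from several received
    blocks, so no single low-latency (or cheating) neighbor dominates.

    observations: list of (arrival_time, chain_elapsed) pairs, one per block,
    where chain_elapsed is the cumulative WaitTime the chain implies for it.
    """
    estimates = [arrival - elapsed for arrival, elapsed in observations]
    return sum(estimates) / len(estimates)

def wall_clock(current_time, base_time):
    # Note 1: WC = CurrentTime - BaseTime.
    return current_time - base_time
```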


An early arriving block (where `WC < CC'`) is considered 'Ineligible'. The block
is cached for `CC' - WC` seconds until it becomes 'Eligible'. It is then
broadcast to neighboring peers over the Gossip Network.


It will be broadcast assuming that another, better claim is not received before the timer expires.
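The caching rule above (including the reviewer's caveat) might be sketched as follows; the heap of pending blocks and the function names are assumptions, not from the Sawtooth codebase:

```python
import heapq

def handle_claim_block(block, wc, cc_prime, pending):
    """Gossip a block only once it is eligible (WC >= CC'); otherwise cache it.

    pending: min-heap keyed by the chain clock at which each block becomes
    eligible. A cached block should still be dropped later if a better claim
    arrives before its timer expires.
    """
    if wc < cc_prime:
        # Ineligible: hold for (CC' - WC) seconds until eligible.
        heapq.heappush(pending, (cc_prime, block))
        return "cached"
    return "broadcast"

def pop_eligible(wc, pending):
    """Release every cached block whose eligibility time has been reached."""
    released = []
    while pending and pending[0][0] <= wc:
        released.append(heapq.heappop(pending)[1])
    return released
```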

kulkarniamol added 3 commits August 23, 2018 09:49
…p mechanism.

Signed-off-by: kulkarniamol <amol.kulkarni@intel.com>
mechanism.

Signed-off-by: kulkarniamol <amol.kulkarni@intel.com>
>subtracting this timestamp from the current time (`WC = CurrentTime - BaseTime`).
>Note 2: Notice that the CC is a function of the WaitTime, which is computed within
>the enclave.


It's not important that it's computed within the enclave. The only trusted enclave function is the RNG. If we compute WaitTime in the enclave, it is for the convenience/readability of having all the logic in one method.

EndPoint registry (otherwise sender needs to re-sign).

4. Verify the `WaitCertificate.LocalMean` is correct by comparing
against `LocalMean` computed locally.


Verify the `WaitTime` similarly.
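A sketch of step 4 plus the suggested WaitTime check on the receiving side. The cert layout mirrors the structure quoted earlier, and the 2^256 normalization is an assumed PoET-v1-style conversion:

```python
import math

def verify_wait_certificate(cert, expected_local_mean, tolerance=1e-9):
    """Recompute LocalMean and WaitTime locally and compare them against the
    values claimed in the certificate. Field names are illustrative."""
    # Step 4: the claimed LocalMean must match the locally computed one.
    if abs(cert["LocalMean"] - expected_local_mean) > tolerance:
        return False
    # Recompute WaitTime from the 256-bit Duration, exactly as the sender did.
    u = (int.from_bytes(cert["Duration"], "big") + 1) / float(2 ** 256)
    expected_wait = -cert["LocalMean"] * math.log(u)
    return abs(cert["WaitTime"] - expected_wait) <= tolerance
```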

@hartm

hartm commented Sep 4, 2018

I have a question about this fork resolution process. Can someone explain why we only need to compare chain lengths (rather than checking the total amount of "work" done on the chain, i.e., incorporating the population estimate)?

Here's my thought process: suppose I controlled a small group of nodes. We could fork off everything to a side chain and then wait for the "difficulty" to go down. In the steady state, due to the population adjustment, we would be adding blocks at the same rate as the main chain. We could let this go on for a while; maybe we would get lucky and accumulate blocks at a slightly faster (sublinear) rate than the main chain. Then, at some point, we could add in a bunch of new members, use them to add many more blocks than expected, and try to catch up to the main chain. It's possible there's some mathematical reasoning that says this is impossible, but it's not immediate (and needs to be written up if this becomes the spec).

Does anyone have an explanation for why this doesn't work? Am I misunderstanding something here? Thanks!

@cmickeyb

cmickeyb commented Sep 4, 2018 via email

@hartm

hartm commented Sep 6, 2018

Thanks for the in-depth explanation Mic.

I guess my point is the following: since you're storing the randomness of the block winner in the chain, you can compute the population estimate trivially. Consider the following formula:

W(chain, block_start, block_end) = \sum_{i = block_start}^{block_end} Population_Estimate(block_i).

The function W can be computed (with the current and proposed estimators) solely from the raw random numbers in the enclave, and is probably the most direct measure of "work" on a blockchain using PoET that we can possibly manage.

Now consider the following rule for deciding which branch of a fork to take. Suppose we have two branches: branch_1 and branch_2, with b_1 and b_2 blocks each, respectively. Let block number b* be the last block that both have in common.

We choose branch_1 if:

W(branch_1, b* + 1, b* + b_1) > W(branch_2, b* + 1, b* + b_2)

and branch_2 otherwise (we can break ties based on equality in some deterministic manner, say based on the randomness of the most recent block). Note that this functionality exactly agrees with your analysis above and simplifies it considerably, eliminating the need for a case-by-case analysis.

Is there a reason something like this doesn't work? It seems like a much simpler rule than what you're proposing, and leads us nicely in the direction of provable security (which I care about, obviously). Thanks!
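Under hartm's proposal, the fork-choice rule might look like this; `population_estimate` and `randomness` are passed in as stand-ins for the estimator and the per-block stored random number, since neither is specified here:

```python
def total_work(branch, first, population_estimate):
    """W(branch, first, end): sum of per-block population estimates from
    block index `first` through the branch tip."""
    return sum(population_estimate(b) for b in branch[first:])

def choose_branch(branch_1, branch_2, b_star, population_estimate, randomness):
    """Prefer the branch with more accumulated estimated work past the common
    ancestor b_star; break ties deterministically on the newest block's
    stored randomness."""
    w1 = total_work(branch_1, b_star + 1, population_estimate)
    w2 = total_work(branch_2, b_star + 1, population_estimate)
    if w1 != w2:
        return branch_1 if w1 > w2 else branch_2
    # Tie: deterministic tie-break on the most recent block's randomness.
    if randomness(branch_1[-1]) >= randomness(branch_2[-1]):
        return branch_1
    return branch_2
```

Note that a longer branch loses to a shorter one whenever its summed population estimates are smaller, which is exactly the behavior the length-only rule cannot express.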

efficiency. Existing system clock synchronization mechanisms such as NTP
may be sufficient for PoET 2.0's requirements.

Network latencies may be exploited by malicious nodes to broadcast blocks


Some of the points in this section sound like they could be moved to, or repeated in, the currently empty Drawbacks section.


5 participants