New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Round quorum before advancement #2897
Conversation
Nice benches, I'm almost surprised |
Most ignored e2e tests are long running (e.g. manual testing) or are too costly to perform on CI. We run them locally from time to time as a sanity check but ideally, our coverage should be complete with regular tests. |
a96627d
to
668dc7b
Compare
668dc7b
to
3ff7e91
Compare
node/bft/src/primary.rs
Outdated
// If our peer is far ahead, check if a quorum of peers is ahead and consider updating our committee. | ||
} else if is_peer_far_in_future { | ||
// Get the highest round seen from a quorum of the current committee | ||
let committee = self.ledger.get_committee_for_round(self.current_round())?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be the get_previous_committee_for_round
.
In the mainnet
branch this should be changed to get_committee_lookback_for_round
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if you need me to create a new PR targeting mainnet.
|
||
// Check if we reached quorum on a new round | ||
if inserted { | ||
while committee.is_quorum_threshold_reached(&self.validators_in_support(committee)?) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The committee being used here may changed based on the round you are checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its fine to use the single given committee (from get_previous_committee_for_round
) to update completely, because who cares whether old outdated committees have high rounds..
Sidenote: if we were to only update by 1 round at a time based on the given committee, instead of always taking just <quorum>
batch proposals to update to the correct round, we will then need <quorum> + i
batch proposals to increase our round by i
. If <quorum> > max_gc_rounds
, then we might never catch up...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any case where you won't ever be able to see quorum because the fixed committee you are using doesn't include newly bonded validators?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great question, it seems so yes... Because we use our old outdated round to determine the committee: self.ledger.get_previous_committee_for_round(self.current_round())
.
The only alternative I see now is to use the peer's advertised batch_round
- but one question on that: is it possible that we don't know the committee yet for rounds far in the future so that get_previous_committee_for_round(batch_round)?
will keep failing? Or do we expect that to succeed at some point?
node/bft/src/helpers/cache_round.rs
Outdated
} | ||
|
||
/// Insert a validator at a round | ||
fn insert_validator_at_round(&mut self, round: u64, validator: AddressWithCoordinate<N>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why doesn't this add the validator to address_rounds
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function is badly named, we are only updating because we saw a new round for this validator: ef446e9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't test it right now, but LGTM code-wise 👍
Co-authored-by: ljedrz <ljedrz@users.noreply.github.com> Signed-off-by: vicsn <victor.s.nicolaas@protonmail.com>
Co-authored-by: ljedrz <ljedrz@users.noreply.github.com> Signed-off-by: vicsn <victor.s.nicolaas@protonmail.com>
Co-authored-by: ljedrz <ljedrz@users.noreply.github.com> Signed-off-by: vicsn <victor.s.nicolaas@protonmail.com>
@vicsn in light of major code drift from |
Yes, will coordinate and help Michel to first get this related issue over the finish line to see if it resolves the attack: #3119 |
/// The current highest round which has (stake-weighted) quorum | ||
last_highest_round_with_quorum: u64, | ||
/// A sorted list of (round, Vec<AddressWithCoordinate<N>>), indicating the last seen highest round for each address | ||
highest_rounds: Vec<(u64, Vec<AddressWithCoordinate<N>>)>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this Vec
have to be sorted? It seems only to quickly fetch an item? If that is true, why not use HashMap
? This gives you O(1) lookups and O(1) inserts. Now you have O(n) inserts and O(log n) lookups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will come back to this if we decide to keep this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's faster and more lightweight, unless the collection can become arbitrarily large
Deprecating this in favour of #3119 - which has been tested and works |
Motivation
Closes #2876
Open questions
AddressWithCoordinate<N>
upstream into snarkVMPerformance
In the tradition of Lukasz brilliance, I'm using Vecs+BinarySearch instead of maps.
In this repo you can find some benchmarks of different constructions. If we don't check quorum, we can do 250 updates per millisecond, if we do check quorum each time because we get increased round numbers, we can do 9 updates per millisecond. I expect on average an update will cost closer to 1/250 milliseconds. Moreover, an update will only be done if any peer actually sends a very high round number.
Also note that currently computing quorum requires us to collect into a HashSet, we could add a version which just uses an iterator (checking along the way if all elements are unique).
Persistence
I'm not storing the cache in the database, we should expect to receive a quorum's worth of updates from other nodes every round.