Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Round quorum before advancement #2897

Closed
wants to merge 15 commits into from
Closed

Conversation

vicsn
Copy link
Collaborator

@vicsn vicsn commented Dec 5, 2023

Motivation

Closes #2876

Open questions

  • we can consider merging AddressWithCoordinate<N> upstream into snarkVM

Performance

In the tradition of Lukasz brilliance, I'm using Vecs+BinarySearch instead of maps.

In this repo you can find some benchmarks of different constructions. If we don't check quorum, we can do 250 updates per millisecond, if we do check quorum each time because we get increased round numbers, we can do 9 updates per millisecond. I expect on average an update will cost closer to 1/250 milliseconds. Moreover, an update will only be done if any peer actually sends a very high round number.

Also note that currently computing quorum requires us to collect into a HashSet, we could add a version which just uses an iterator (checking along the way if all elements are unique).

Persistence

I'm not storing the cache in the database, we should expect to receive a quorum's worth of updates from other nodes every round.

@vicsn vicsn requested a review from niklaslong December 5, 2023 18:55
node/bft/src/primary.rs Outdated Show resolved Hide resolved
@niklaslong
Copy link
Contributor

Nice benches, I'm almost surprised BTreeMap wasn't the fastest with multiple inserts.

@niklaslong
Copy link
Contributor

niklaslong commented Dec 6, 2023

Do we usually run all ignore = tests manually, if so how?

Most ignored e2e tests are long running (e.g. manual testing) or are too costly to perform on CI. We run them locally from time to time as a sanity check but ideally, our coverage should be complete with regular tests.

@vicsn vicsn force-pushed the round_quorum_before_advancement branch from a96627d to 668dc7b Compare December 6, 2023 20:38
@vicsn vicsn requested a review from ljedrz December 6, 2023 20:38
@vicsn vicsn force-pushed the round_quorum_before_advancement branch from 668dc7b to 3ff7e91 Compare December 7, 2023 10:50
@vicsn vicsn marked this pull request as ready for review December 7, 2023 12:39
// If our peer is far ahead, check if a quorum of peers is ahead and consider updating our committee.
} else if is_peer_far_in_future {
// Get the highest round seen from a quorum of the current committee
let committee = self.ledger.get_committee_for_round(self.current_round())?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be the get_previous_committee_for_round.

In the mainnet branch this should be changed to get_committee_lookback_for_round.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e53bdb0

Let me know if you need me to create a new PR targeting mainnet.


// Check if we reached quorum on a new round
if inserted {
while committee.is_quorum_threshold_reached(&self.validators_in_support(committee)?) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The committee being used here may changed based on the round you are checking.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its fine to use the single given committee (from get_previous_committee_for_round) to update completely, because who cares whether old outdated committees have high rounds..

Sidenote: if we were to only update by 1 round at a time based on the given committee, instead of always taking just <quorum> batch proposals to update to the correct round, we will then need <quorum> + i batch proposals to increase our round by i. If <quorum> > max_gc_rounds, then we might never catch up...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any case where you won't ever be able to see quorum because the fixed committee you are using doesn't include newly bonded validators?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question, it seems so yes... Because we use our old outdated round to determine the committee: self.ledger.get_previous_committee_for_round(self.current_round()).

The only alternative I see now is to use the peer's advertised batch_round - but one question on that: is it possible that we don't know the committee yet for rounds far in the future so that get_previous_committee_for_round(batch_round)? will keep failing? Or do we expect that to succeed at some point?

}

/// Insert a validator at a round
fn insert_validator_at_round(&mut self, round: u64, validator: AddressWithCoordinate<N>) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why doesn't this add the validator to address_rounds?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function is badly named, we are only updating because we saw a new round for this validator: ef446e9

node/bft/src/primary.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@ljedrz ljedrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't test it right now, but LGTM code-wise 👍

vicsn and others added 3 commits February 16, 2024 13:58
Co-authored-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: vicsn <victor.s.nicolaas@protonmail.com>
Co-authored-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: vicsn <victor.s.nicolaas@protonmail.com>
Co-authored-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: vicsn <victor.s.nicolaas@protonmail.com>
@howardwu
Copy link
Contributor

howardwu commented Mar 9, 2024

@vicsn in light of major code drift from testnet3, can you rebase this PR onto mainnet-staging and test this PR on a network to confirm its validity/necessity?

@howardwu howardwu marked this pull request as draft March 9, 2024 00:38
@vicsn
Copy link
Collaborator Author

vicsn commented Mar 11, 2024

Yes, will coordinate and help Michel to first get this related issue over the finish line to see if it resolves the attack: #3119

/// The current highest round which has (stake-weighted) quorum
last_highest_round_with_quorum: u64,
/// A sorted list of (round, Vec<AddressWithCoordinate<N>>), indicating the last seen highest round for each address
highest_rounds: Vec<(u64, Vec<AddressWithCoordinate<N>>)>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this Vec have to be sorted? It seems only to quickly fetch an item? If that is true, why not use HashMap? This gives you O(1) lookups and O(1) inserts. Now you have O(n) inserts and O(log n) lookups.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will come back to this if we decide to keep this PR

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's faster and more lightweight, unless the collection can become arbitrarily large

@vicsn
Copy link
Collaborator Author

vicsn commented Mar 29, 2024

Deprecating this in favour of #3119 - which has been tested and works

@vicsn vicsn closed this Mar 29, 2024
@vicsn vicsn deleted the round_quorum_before_advancement branch March 29, 2024 09:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Leader sending certificate with round far in the future halts the network
6 participants