Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gossip: notify state machine of duplicate proofs #32963

Merged

Conversation

AshwinSekar
Copy link
Contributor

@AshwinSekar AshwinSekar commented Aug 23, 2023

Problem

We consume duplicate shred proofs from gossip, however we don't let fork choice know.

Summary of Changes

Notify the consensus state machine which will in turn update fork choice.

Split from #32862
Fixes #16508

bw-solana
bw-solana previously approved these changes Aug 25, 2023
Copy link
Contributor

@bw-solana bw-solana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@AshwinSekar
Copy link
Contributor Author

this looks like the commit which will allow Scenario 3, i'll wait to merge this until we have a fix.

@@ -84,6 +84,8 @@ pub enum Error {
TryFromIntError(#[from] TryFromIntError),
#[error("unknown slot leader: {0}")]
UnknownSlotLeader(Slot),
#[error("unable to send duplicate slot to state machine")]
DuplicateSlotSenderFailure,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please keep the enum sorted

@@ -25,6 +26,7 @@ const MAX_NUM_ENTRIES_PER_PUBKEY: usize = 128;
const BUFFER_CAPACITY: usize = 512 * MAX_NUM_ENTRIES_PER_PUBKEY;

type BufferEntry = [Option<DuplicateShred>; MAX_NUM_CHUNKS];
type DuplicateSlotSender = Sender<Slot>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove this typedef.
For someone reading the code, Sender<Slot> is more explicit what actual type we are working with, and does not need to jump elsewhere to find the definition.

@AshwinSekar AshwinSekar force-pushed the duplicate-shred-proof-from-gossip branch from d147176 to b63c9d7 Compare September 5, 2023 09:34
@github-actions github-actions bot added the stale [bot only] Added to stale content; results in auto-close after a week. label Sep 20, 2023
@github-actions github-actions bot closed this Sep 27, 2023
@AshwinSekar AshwinSekar reopened this Nov 29, 2023
@AshwinSekar AshwinSekar marked this pull request as draft November 29, 2023 18:10
@AshwinSekar AshwinSekar self-assigned this Nov 29, 2023
@AshwinSekar AshwinSekar force-pushed the duplicate-shred-proof-from-gossip branch from b63c9d7 to bd931bf Compare November 30, 2023 04:01
@AshwinSekar
Copy link
Contributor Author

reopening this now that Scenario 3 has been merged #34186
understandably some duplicate block local cluster tests are failing now that the duplicate proof is actually functional - will keep this as draft until I can fix up those tests.

@github-actions github-actions bot removed the stale [bot only] Added to stale content; results in auto-close after a week. label Nov 30, 2023
@AshwinSekar AshwinSekar force-pushed the duplicate-shred-proof-from-gossip branch from bd931bf to 3a0ba29 Compare December 1, 2023 08:38
@AshwinSekar AshwinSekar marked this pull request as ready for review December 1, 2023 08:50
@AshwinSekar AshwinSekar force-pushed the duplicate-shred-proof-from-gossip branch from 3a0ba29 to 048e1a0 Compare December 1, 2023 08:51
Copy link

codecov bot commented Dec 1, 2023

Codecov Report

Attention: 32 lines in your changes are missing coverage. Please review.

Comparison is base (b04765f) 81.6% compared to head (87bf3e5) 81.6%.
Report is 7 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##           master   #32963     +/-   ##
=========================================
- Coverage    81.6%    81.6%   -0.1%     
=========================================
  Files         828      830      +2     
  Lines      224339   224743    +404     
=========================================
+ Hits       183274   183569    +295     
- Misses      41065    41174    +109     

Comment on lines 154 to 162
let activated_slot = root_bank
.feature_set
.activated_slot(&feature_set::enable_gossip_duplicate_proof_ingestion::id())
.map(|slot| root_bank.epoch_schedule().get_epoch(slot));
activated_slot.is_some_and(|feature_epoch| {
root_bank.epoch_schedule().get_epoch(slot) > feature_epoch
// feature_epoch could only be 0 in tests and new cluster setup.
|| feature_epoch == 0
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more complicated than it should be. Can you make this more readable with something like:

let Some(activated_slot) = root_bank
            .feature_set
            .activated_slot(&feature_set::enable_gossip_duplicate_proof_ingestion::id()) else {
  return false;
}

@AshwinSekar AshwinSekar force-pushed the duplicate-shred-proof-from-gossip branch from bfa18af to 850dc1e Compare January 25, 2024 17:33
@AshwinSekar AshwinSekar force-pushed the duplicate-shred-proof-from-gossip branch from 850dc1e to 87bf3e5 Compare January 25, 2024 19:08
@AshwinSekar AshwinSekar merged commit 93271d9 into solana-labs:master Jan 26, 2024
45 checks passed
@AshwinSekar AshwinSekar added the v1.18 PRs that should be backported to v1.18 label Jan 29, 2024
Copy link
Contributor

mergify bot commented Jan 29, 2024

Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.

mergify bot pushed a commit that referenced this pull request Jan 29, 2024
* gossip: notify state machine of duplicate proofs

* Add feature flag for ingesting duplicate proofs from Gossip.

* Use the Epoch the shred is in instead of the root bank epoch.

* Fix unittest by activating the feature.

* Add a test for feature disabled case.

* EpochSchedule is now not copyable, clone it explicitly.

* pr feedback: read epoch schedule on startup, add guard for ff recache

* pr feedback: bank_forks lock, -cached_slots_in_epoch, init ff

* pr feedback: bank.forks_try_read() -> read()

* pr feedback: fix local-cluster setup

* local-cluster: do not expose gossip internals, use retry mechanism instead

* local-cluster: split out case 4b into separate test and ignore

* pr feedback: avoid taking lock if ff is already found

* pr feedback: do not cache ff epoch

* pr feedback: bank_forks lock, revert to cached_slots_in_epoch

* pr feedback: move local variable into helper function

* pr feedback: use let else, remove epoch 0 hack

---------

Co-authored-by: Wen <crocoxu@gmail.com>
(cherry picked from commit 93271d9)
mergify bot added a commit that referenced this pull request Jan 29, 2024
…32963) (#35006)

gossip: notify state machine of duplicate proofs (#32963)

* gossip: notify state machine of duplicate proofs

* Add feature flag for ingesting duplicate proofs from Gossip.

* Use the Epoch the shred is in instead of the root bank epoch.

* Fix unittest by activating the feature.

* Add a test for feature disabled case.

* EpochSchedule is now not copyable, clone it explicitly.

* pr feedback: read epoch schedule on startup, add guard for ff recache

* pr feedback: bank_forks lock, -cached_slots_in_epoch, init ff

* pr feedback: bank.forks_try_read() -> read()

* pr feedback: fix local-cluster setup

* local-cluster: do not expose gossip internals, use retry mechanism instead

* local-cluster: split out case 4b into separate test and ignore

* pr feedback: avoid taking lock if ff is already found

* pr feedback: do not cache ff epoch

* pr feedback: bank_forks lock, revert to cached_slots_in_epoch

* pr feedback: move local variable into helper function

* pr feedback: use let else, remove epoch 0 hack

---------

Co-authored-by: Wen <crocoxu@gmail.com>
(cherry picked from commit 93271d9)

Co-authored-by: Ashwin Sekar <ashwin@solana.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-gate Pull Request adds or modifies a runtime feature gate v1.18 PRs that should be backported to v1.18
Projects
Development

Successfully merging this pull request may close these issues.

Gossip proofs should be ingested into blockstore's, and used to notify ReplayStage of duplicates
5 participants