Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HackerOne-2311934] Verify transactions prior to speculation #2376

Merged
merged 12 commits into from Mar 13, 2024

Conversation

raychu86
Copy link
Contributor

@raychu86 raychu86 commented Mar 1, 2024

Motivation

This PR performs transaction verification in VM::speculate and aborts the transactions that are invalid. This is necessary because in snarkOS when we construct subdags, we do not validate transmissions found in proposals, which could lead to block production halting if there are malicious validators. With the new change, we can eliminate the invalid transactions prior to block production and ensure that Block::check_next_block doesn't error.

Note that now we perform transaction verification prior to creating the block itself, rather than just before the block is about to be added to storage. Because of the transaction partial verification cache, we do not perform duplicate verification on the execution/deployment, but we do have to perform the (cheaper) uniqueness checks twice.

Related PRs

This PR is related to AleoHQ/snarkOS#3104.

And should replace: AleoHQ/snarkOS#3083 and AleoHQ/snarkOS#3098.

This should fix: AleoHQ/snarkOS#2991 and AleoHQ/snarkOS#2990.

@vicsn
Copy link
Contributor

vicsn commented Mar 1, 2024

FYI when trying to compile this into snarkOS for testing I get the error:

error[E0038]: the trait `snarkos_node_bft_ledger_service::LedgerService` cannot be made into an object
   --> node/sync/src/block_sync.rs:118:49
    |
118 |     pub fn new(mode: BlockSyncMode, ledger: Arc<dyn LedgerService<N>>) -> Self {
    |                                                 ^^^^^^^^^^^^^^^^^^^^ `snarkos_node_bft_ledger_service::LedgerService` cannot be made into an object
    |
note: for a trait to be "object safe" it needs to allow building a vtable to allow the call to be resolvable dynamically; for more information visit <https://doc.rust-lang.org/reference/items/traits.html#object-safety>
   --> /Users/victorsintnicolaas/programs/snarkOS/node/bft/ledger-service/src/traits.rs:107:8
    |
107 |     fn prepare_advance_to_next_quorum_block<R: Rng + CryptoRng>(
    |        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait cannot be made into an object because method `prepare_advance_to_next_quorum_block` has generic type parameters

@ljedrz
Copy link
Contributor

ljedrz commented Mar 1, 2024

FYI when trying to compile this into snarkOS for testing I get the error:

As for the object safety issue, it's not ideal, but we could work around it by having 2 versions of prepare_advance_to_next_quorum_block, with TestRng hardcoded for #[cfg(test)] scenarios 🤔.

_ => {
let rngs = (0..candidate_transactions.len()).map(|_| StdRng::from_seed(rng.gen())).collect::<Vec<_>>();
// Verify the transactions and collect the error message if there is one.
let checked_transactions: Vec<_> = cfg_into_iter!(candidate_transactions)
Copy link
Contributor

@vicsn vicsn Mar 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment still applies: AleoHQ/snarkOS#3083 (comment)
I was able to kill all the nodes with some big batch proposals in a test suite.

With the system allocator (which is heavier than jemalloc):

  • The maximum RAM to verify a 1M constraint program is 2GB.
  • The maximum RAM for a 1 or 31-depth 1M constraint call execution is negligible, but to be safe we could assume 10MB

A thorough but brittle approach would be to estimate memory usage based on the number of constraints. But an easier setup would be a separate deploy and an execute verification maximum:

const MAX_PARALLEL_DEPLOY_VERIFICATIONS = 5;
const MAX_PARALLEL_EXECUTE_VERIFICATIONS = 1000;

let (deploys, executions) = candidate_transactions.partition(|tx|tx.is_deploy());
let deploy_iter = deploys.chunks(MAX_PARALLEL_DEPLOY_VERIFICATIONS);
let execution_iter = executions.chunks(MAX_PARALLEL_EXECUTE_VERIFICATIONS);
let tx_iter = deploy_iter.chain(execution_iter);

while let Some(txs) = tx_iter.next() {
    cfg_into_iter!(txs.zip(rngs).map(|| ...verify...)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this different from the current approach of transaction verification on the current mainnet branch in check_next_block?

Are you saying that fundamentally the verification logic should be updated?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that piece of parallel verification logic was prone to the same vulnerability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented the OOM fix here - #2387. If it is approved, we can apply the change to this PR as well.

@raychu86
Copy link
Contributor Author

raychu86 commented Mar 2, 2024

FYI when trying to compile this into snarkOS for testing I get the error:

@vicsn Similar to how we use a thread_rng for check_next_block and check_transaction_basic, this should allow testing to compile:

AleoHQ/snarkOS@mainnet...abort-malicious-txes

@vicsn
Copy link
Contributor

vicsn commented Mar 3, 2024

@vicsn Similar to how we use a thread_rng for check_next_block and check_transaction_basic, this should allow testing to compile:

Thank you, will test it and report back.

@ghostant-1017
Copy link
Contributor

Hi guys,
I was notified that [HackerOne-2310746/2311934] were duplicated. As I understand it and Report-2310746 focuses only on the Fee and is easier to fix, while Report-2311934 is more comprehensive. Will you reconsider the results of these two report assessments? Thank you! @vicsn @raychu86

@raychu86
Copy link
Contributor Author

raychu86 commented Mar 4, 2024

Hi guys, I was notified that [HackerOne-2310746/2311934] were duplicated. As I understand it and Report-2310746 focuses only on the Fee and is easier to fix, while Report-2311934 is more comprehensive. Will you reconsider the results of these two report assessments? Thank you! @vicsn @raychu86

Although both of these issues can be resolved with a single fix, in my opinion, they could be considered separate reports.

The snarkOS PR AleoHQ/snarkOS#3104, should address the Fee finding, and this PR should address both of the findings (for added safety).

@ghostant-1017
Copy link
Contributor

ghostant-1017 commented Mar 7, 2024

Thank you! @raychu86

@raychu86 raychu86 force-pushed the verify-in-speculate branch 2 times, most recently from 95c7e70 to aa6811e Compare March 13, 2024 01:51
@howardwu howardwu merged commit 1969efc into mainnet-staging Mar 13, 2024
78 checks passed
@howardwu howardwu deleted the verify-in-speculate branch March 13, 2024 21:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] BatchPropose with invalid transaction can halt the network
5 participants