Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: limit outstanding certificate requests #2956

Open
wants to merge 3 commits into
base: testnet3
Choose a base branch
from

Conversation

joske
Copy link
Contributor

@joske joske commented Dec 27, 2023

This attempts to fix https://github.com/AleoHQ/snarkOS/issues/2935.

I was able to reproduce using the following malicious code (credit @niklaslong): https://github.com/niklaslong/snarkOS/tree/tmp/2935-replication

Run one node with this code, and the rest with normal testnet3. After 20 rounds, the malicious node will send fake certificate requests.

With this fix, the nodes reject the bad certs after the max outstanding requests of 20 is reached, and then carry on, before all nodes would crash with stack overflow.

The MAX_OUTSTANDING_CERTIFICATE_REQUESTS is chosen arbitrarily at 20, please change as you see fit.

@joske joske requested a review from howardwu December 27, 2023 12:06
@joske joske force-pushed the fix/infinite_recursion branch 2 times, most recently from 90c00f1 to 4ca197f Compare December 27, 2023 12:09
node/bft/src/primary.rs Outdated Show resolved Hide resolved
@randomsleep
Copy link
Contributor

I am the reporter of #2935. The root cause of the issue is that the validator didn't verify the certificate before entering recursion. I think we can resolve the issue more directly:

  1. Upon receiving a BatchPropose, verify the direct parent certificates are all valid.
  2. Upon receiving a Certificate through broadcast, immediately verify if it is valid.

Also, we may support Batch Certificate Request and avoid using recursion.

@ghostant-1017
Copy link
Contributor

I am the reporter of #2935. The root cause of the issue is that the validator didn't verify the certificate before entering recursion. I think we can resolve the issue more directly:

  1. Upon receiving a BatchPropose, verify the direct parent certificates are all valid.
  2. Upon receiving a Certificate through broadcast, immediately verify if it is valid.

Also, we may support Batch Certificate Request and avoid using recursion.

Verify the BatchPropose or BatchCertificate rely on the previous certificates. And we don't know the committee has changed or not when receive a BatchPropose or BatchCertificate.

@randomsleep
Copy link
Contributor

I am the reporter of #2935. The root cause of the issue is that the validator didn't verify the certificate before entering recursion. I think we can resolve the issue more directly:

  1. Upon receiving a BatchPropose, verify the direct parent certificates are all valid.
  2. Upon receiving a Certificate through broadcast, immediately verify if it is valid.

Also, we may support Batch Certificate Request and avoid using recursion.

Verify the BatchPropose or BatchCertificate rely on the previous certificates. And we don't know the committee has changed or not when receive a BatchPropose or BatchCertificate.

Yes, this is a problem for the current implementation.

I submitted another bug report before, which is about the committee update. The report shows that updating the committee at every block will cause some consensus issues. Maybe we can fix that issue first to make it easier to validate the certificate.

@howardwu
Copy link
Contributor

howardwu commented Mar 8, 2024

@randomsleep @ghostant-1017 Is the issue still present in latest commit of mainnet-staging? If there is a better design, we can close this PR and work on a new issue.

@randomsleep
Copy link
Contributor

@howardwu I believe this issue still exists in mainnet-staging. Now the committee is fixed for every round. We can address this issue by:

  1. Upon receiving a Certificate, immediately check if it has reached quorum before recursion. If the Certificate is the parent of an already checked Certificate, we can skip the check.
  2. Upon receiving a BatchPropose, immediately perform some checks before recursion. Make sure the direct parent Certificate has reached quorum and parent_round == batch_propose_round + 1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants