
[Fix] Increase the inbound queue capacity (that leads into the memory pool) #3124

Merged
merged 6 commits into mainnet-staging from increase-cache-size on Mar 12, 2024

Conversation

@raychu86 (Contributor) commented Feb 23, 2024

Motivation

This PR increases the capacity of the inbound queue (that leads into the memory pool) to the following:

  • Deployments: 10 -> 1024
  • Executions: 40 -> 1024
  • Solutions: 50 -> 1024

And bounds the number of deployments the BFT processes at any given interval to 1 deployment from the inbound queue entering into the memory pool.

Previously the capacity was too low and would cause the nodes to drop transactions during times of high load. These numbers are not final, and should be tuned to support the expected network throughputs.
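For context on what these bounds mean in code, here is a minimal sketch of a bounded inbound queue using the capacities above. The constant and type names are hypothetical; the actual snarkOS constants (including the snapshot reviewed below) are named and wired differently.

```rust
use std::collections::VecDeque;

// Hypothetical capacities mirroring the values in this PR's description;
// the actual snarkOS constant names (and the reviewed snapshot below) differ.
const INBOUND_QUEUE_DEPLOYMENTS: usize = 1 << 10; // 1024
const INBOUND_QUEUE_EXECUTIONS: usize = 1 << 10; // 1024
const INBOUND_QUEUE_SOLUTIONS: usize = 1 << 10; // 1024

/// A bounded FIFO queue that evicts the oldest entry when full, approximating
/// the oldest-first eviction behavior discussed later in this thread.
struct BoundedQueue<T> {
    capacity: usize,
    items: VecDeque<T>,
}

impl<T> BoundedQueue<T> {
    fn new(capacity: usize) -> Self {
        Self { capacity, items: VecDeque::with_capacity(capacity) }
    }

    /// Inserts an item, returning the evicted (oldest) item if the queue was full.
    fn push(&mut self, item: T) -> Option<T> {
        let evicted =
            if self.items.len() == self.capacity { self.items.pop_front() } else { None };
        self.items.push_back(item);
        evicted
    }

    /// Removes and returns the oldest item, if any.
    fn pop(&mut self) -> Option<T> {
        self.items.pop_front()
    }
}

fn main() {
    let mut deployments: BoundedQueue<Vec<u8>> = BoundedQueue::new(INBOUND_QUEUE_DEPLOYMENTS);
    let _ = (INBOUND_QUEUE_EXECUTIONS, INBOUND_QUEUE_SOLUTIONS); // executions/solutions queues omitted
    assert!(deployments.push(vec![0u8; 16]).is_none());
    assert!(deployments.pop().is_some());
}
```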

@apruden2008 (Contributor)

Don't we want the nodes to drop transactions during high load in the event of a flood? What is the "expected behavior" supposed to be in this case?

/// Percentage of mempool transactions capacity reserved for executions.
const CAPACITY_FOR_EXECUTIONS: usize = 80;
/// The capacity of the mempool reserved for deployments.
const CAPACITY_FOR_DEPLOYMENTS: usize = 1 << 8;
@vicsn (Collaborator) commented Feb 24, 2024

The use of this constant further below (line 302) is now broken.

To avoid missing it in the future, I suggest we introduce a new constant, e.g. DEPLOY_VERIFICATION_RATE. This constant would ensure that, of the transactions we verify, at most DEPLOY_VERIFICATION_RATE percent are deployments, preventing DoS from bogus deployments.

In this PR a rate of 20 was approved, but in the future we can go up to 50 to increase deployment throughput.
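As a rough illustration of this suggestion, a percentage-based cap might look like the sketch below. The constant name follows the suggestion above; the function and values are illustrative only, not the snarkOS implementation.

```rust
/// Hypothetical percentage cap on deployments per verification batch,
/// following the constant name suggested above; illustrative only.
const DEPLOY_VERIFICATION_RATE: usize = 20; // percent

/// Maximum number of deployments to verify out of `batch_size` transactions,
/// so bogus deployments cannot monopolize verification.
fn max_deployments_per_batch(batch_size: usize) -> usize {
    batch_size * DEPLOY_VERIFICATION_RATE / 100
}

fn main() {
    // With a 50-transmission batch and a 20% rate, at most 10 deployments are verified.
    assert_eq!(max_deployments_per_batch(50), 10);
}
```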

@vicsn (Collaborator) commented Feb 24, 2024

Don't we want the nodes to drop transactions during high load in the event of a flood? What is the "expected behavior" supposed to be in this case?

@apruden2008 The expected behaviour is that we don't drop transactions too quickly, as this hurts usability. If the mempool hypothetically had a size of 1, almost all transactions would be dropped, rendering the chain unusable. Increasing the mempool size does not hurt performance, because transactions are just waiting in a queue. The only limiting factor is how much memory the nodes have.

Deployments: 10 -> 256

@raychu86 If we assume 300 MB apiece (as noted here), we will run out of memory with 256 deployments. It seems plausible programs can be that big; I just trivially made one of 50 MB. A maximum of 50 deployments seems prudent. Moreover, in contrast to the execution queue, the deployment queue can more easily be filled with bogus entries, so a big queue can also hurt usability.

@apruden2008 (Contributor)

In my view, we need to prioritize liveness over usability for now. If each program can be 300 MB, then even 10 deployments is 3 GB of not only memory but also bandwidth. Plus, I'm assuming all the circuits need to be synthesized.

Given the prior issues we had with consensus and syncing, it feels like increasing this parameter only opens up a DoS vector, and demand for transactions is clearly nowhere close to this level based on data from existing explorers.

I would argue for putting this down to the lowest possible setting per the acceptance criteria for launch. We can raise it later as chain usage grows.

@vicsn (Collaborator) commented Feb 27, 2024

In my view, we need to prioritize liveness over usability for now. If each program can be 300 MB, then even 10 deployments is 3 GB of not only memory but also bandwidth. Plus, I'm assuming all the circuits need to be synthesized.

Just so we're on the same page: increasing the mempool capacity does not consume more bandwidth or (synthesis) processing time.

But it does use memory and your other concerns are valid.

@raychu86 (Contributor, Author)

@apruden2008 I've noted your concerns and want to make sure we are all on the same page. As Victor has mentioned, the mempool capacity is just in-memory storage of transactions (which we have yet to verify and which hold no resources other than space in memory).

Memory capacity is usually not the limiting factor of node operation, as we usually see memory usage hover at single-digit percentages. The verification and processing of these transactions is the more important metric, which is why we bound the number of deployments we can process in any given round to reduce CPU usage.

In any case, I've lowered the bounds for deployments to 32 deployments in the mempool and a verification/process rate of 15% (at most 7 deployments per certificate, i.e. 15% of the 50-transmission batch limit) to further address any concerns.

@howardwu (Member) commented Mar 2, 2024

I just read this thread and there appears to be a game of telephone happening here.

Programs cannot be 300 MB, not even close.

@vicsn can you provide clear context as to how we arrived at this number?

@howardwu (Member) commented Mar 2, 2024

I just read this thread and there appears to be a game of telephone happening here.

Programs cannot be 300 MB, not even close.

@vicsn can you provide clear context as to how we arrived at this number?

Discussed offline. The 300 MB figure is a network message size; it is unrelated to programs. There was a dummy program written that sized up to 50 MB by leveraging struct construction.

Following this discussion, the decision was made to reduce the number of structs that can be defined in a program, and to bound any other constructs of a program so that the size of a program is inherently restricted.

@vicsn and @d0cd will be responsible for implementing a PR to contain this.

@raychu86 changed the title from "Increase the mempool capacity" to "Increase the inbound queue capacity (that leads into the memory pool)" on Mar 4, 2024
@howardwu (Member) commented Mar 4, 2024

Limiting the queue to 16 deployments is reasonable to mitigate potential DoS vectors (using malformed deployments), and restricting each interval to 1 deployment limits what the memory pool introduces into consensus. We will want to wait for the PR with program size restrictions, which will further reduce the networking overhead of deployments.

Note: this PR does not change the memory pool limits (which remain gated at 50 transmissions per batch proposal). Rather, this PR limits the inbound queue, which leads into the memory pool (recall that Narwhal is the memory pool by definition).
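To make the distinction concrete, here is a hedged sketch of one drain interval from the inbound queue into a batch, using the limits quoted in this thread (at most 1 deployment per interval, 50 transmissions per batch). The names and wiring are hypothetical, not the snarkOS code.

```rust
use std::collections::VecDeque;

// Illustrative limits quoted in this thread; actual snarkOS names and wiring differ.
const MAX_TRANSMISSIONS_PER_BATCH: usize = 50; // memory pool / batch proposal limit
const MAX_DEPLOYMENTS_PER_INTERVAL: usize = 1; // inbound-queue drain limit for deployments

/// One drain interval: take at most one deployment from the inbound queue,
/// then fill the remainder of the batch with executions.
fn drain_interval(
    deployments: &mut VecDeque<Vec<u8>>,
    executions: &mut VecDeque<Vec<u8>>,
) -> Vec<Vec<u8>> {
    let mut batch = Vec::with_capacity(MAX_TRANSMISSIONS_PER_BATCH);
    for _ in 0..MAX_DEPLOYMENTS_PER_INTERVAL {
        if let Some(deployment) = deployments.pop_front() {
            batch.push(deployment);
        }
    }
    while batch.len() < MAX_TRANSMISSIONS_PER_BATCH {
        match executions.pop_front() {
            Some(execution) => batch.push(execution),
            None => break,
        }
    }
    batch
}

fn main() {
    let mut deployments = VecDeque::from(vec![vec![0u8; 8]; 3]);
    let mut executions = VecDeque::from(vec![vec![1u8; 8]; 100]);
    // 1 deployment + 49 executions fill the 50-transmission batch.
    assert_eq!(drain_interval(&mut deployments, &mut executions).len(), 50);
}
```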

@vicsn (Collaborator) commented Mar 11, 2024

With #3141 merged, deployments are at most 100 KB. Given that we don't want to enable attackers to fill the queue with malformed deployments, @howardwu, do you want to keep CAPACITY_FOR_DEPLOYMENTS limited to 16?

@howardwu (Member) commented Mar 12, 2024 via email

@vicsn (Collaborator) commented Mar 12, 2024

At 16 deployments, we technically run the risk that a malicious actor could prevent program deployments by DoS'ing the network with a large quantity of new deployment transactions, thereby evicting the earlier deployment transactions, because we operate on an LRU policy in the inbound queue. As such, it is advisable for us to raise this limit as well. At 100 KB, we may want to consider the limit at 1024. Is there an alternative value you might propose?

After additional thought, I agree 1024 is the way to go. Pushed the adjustment: 12cca35

In the scenario where a validator is compromised, the network is very susceptible to attacks regardless of the queue length. But in the scenario where no validator is compromised, 1024 is strictly better.
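For a rough sense of the trade-off behind 1024, a back-of-the-envelope sketch, assuming the ~100 KB deployment cap from #3141:

```rust
// Back-of-the-envelope sizing, assuming the ~100 KB deployment cap from #3141.
const MAX_DEPLOYMENT_SIZE_BYTES: usize = 100 * 1024;
const CAPACITY_FOR_DEPLOYMENTS: usize = 1 << 10; // 1024

fn main() {
    // Worst-case memory held by a full deployment queue: ~100 MiB.
    let worst_case = MAX_DEPLOYMENT_SIZE_BYTES * CAPACITY_FOR_DEPLOYMENTS;
    println!("worst-case deployment queue memory: {} MiB", worst_case / (1024 * 1024));

    // Under oldest-first eviction, an attacker must submit at least `capacity` bogus
    // deployments to evict every honest one, so 1024 raises that bar versus 16
    // while keeping the memory bound modest (~100 MiB).
}
```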

@howardwu merged commit f8d3c90 into mainnet-staging on Mar 12, 2024
23 of 24 checks passed
@howardwu deleted the increase-cache-size branch on March 12, 2024 at 23:32
@howardwu changed the title from "Increase the inbound queue capacity (that leads into the memory pool)" to "[Fix] Increase the inbound queue capacity (that leads into the memory pool)" on Mar 26, 2024