EIP-7594: PeerDAS open questions #3652

Open
ralexstokes opened this issue Apr 5, 2024 · 8 comments
Labels: EIP-7594 PeerDAS

ralexstokes (Member) commented Apr 5, 2024

Context

General background for PeerDAS design and goals:

https://ethresear.ch/t/peerdas-a-simpler-das-approach-using-battle-tested-p2p-components/16541

https://ethresear.ch/t/from-4844-to-danksharding-a-path-to-scaling-ethereum-da/18046

Open questions

Parameterization

Determine final parameters for a robust and secure network.

Availability look-behind

One particular parameter is how tight the sampling has to be with respect to block/blob processing and fork choice. For example, nodes could sample in the same slot as a block and not consider a block valid until the sampling completes. In the event this requirement is too strict (e.g. because of network performance), we could relax the requirement to only complete sampling within some number of trailing slots from the head. If we go with a trailing approach, are there additional complications in the regime of long-range forks or network partitions? Does working in this "optimistic" setting cause undue complexity in implementations?
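As a purely illustrative sketch of the trailing option (the `SAMPLING_TRAILING_SLOTS` constant and helper names below are hypothetical, not spec values):

```python
# Hypothetical sketch of a trailing availability requirement in fork choice.
# SAMPLING_TRAILING_SLOTS and the function names are illustrative, not spec.
SAMPLING_TRAILING_SLOTS = 2

def is_block_viable(block_slot: int, current_slot: int, sampling_complete: bool) -> bool:
    # Within the trailing window, treat the block optimistically as available;
    # once the window has passed, sampling must have completed.
    if current_slot - block_slot < SAMPLING_TRAILING_SLOTS:
        return True
    return sampling_complete
```

The fork-choice questions above are about exactly this optimistic window: any branch whose tip is older than the window must have sampling fully resolved before it can be considered viable.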

Syncing

Some open questions around syncing under PeerDAS, including the possible deprecation of the EIP-4844-style blob sidecar methods.

Deprecate blob_sidecars_by_root and blob_sidecars_by_range?

Can we deprecate these RPC methods? Note that nodes would still sample anything inside the blob retention window.

DataColumnSidecarsByRoot and DataColumnSidecarsByRange

The spec is currently missing a method for ByRange, which is required for syncing in the regime where clients are expected to retain samples.
What is the exact layout of the RPC method? Multiple columns per request, or just one? See thread: #3574 (comment)

Peer scoring

How to downscore a peer who should custody some sample but can’t respond with it?

Network shards design

See here for more context on the proposal: #3623
Likely a good simplification. Would touch some of the PeerDAS details around mapping a given peer to their sample subnets.
Some additional implications: #3574 (comment)

Subnet design

Map one column per subnet, unless we need to do otherwise, see #3574 (comment)

ENR semantics

#3574 (comment)

Spec refactoring

Misc. refactoring to align with the general spec style:

#3574 (comment)
#3574 (comment)
#3574 (comment)
Ensure all comments that reference Deneb or EIP-4844 now reference EIP-7594
#3574 (comment)
#3574 (comment)

dapplion (Collaborator) commented Apr 6, 2024

Does working in this "optimistic" setting cause undue complexity in implementations?

Big yes, but note that a similar gadget is required by inclusion lists (ILs) in their current design

Deprecate blob_sidecars_by_root and blob_sidecars_by_range?

They don't appear necessary as the proposer should distribute columns directly.

DataColumnSidecarsByRange

Useful for column custodians to fetch all columns for a given subnet and epoch, as we do now for blobs

fradamt (Contributor) commented Apr 8, 2024

If we go with a trailing approach, are there additional complications in the regime of long-range forks or network partitions? Does working in this "optimistic" setting cause undue complexity in implementations?

Imho we should avoid having the whole validator set operating in an optimistic setting, even if we were to ignore implementation complexity and just worry about consensus security. One attack that this enables is:

  • A proposer or a builder (importantly, not someone controlling much stake) proposes an unavailable block B, in particular available only in 15 out of 32 subnets.
  • Everyone in the 15 subnets where it is available votes for B because sampling is not required yet
  • Though B has a lot of votes, the next proposer does not build on it because sampling fails
  • Data is meanwhile made fully available. Sampling now succeeds for everyone.
  • No one votes for the new proposal because B has weight > proposer boost and the proposal does not extend it

This can perhaps be fixed by requiring the attesters to have their sampling done by 10s into the previous slot, while the proposer has a bit more time. More complexity, more timing assumptions. Also, this is just one attack, and it's not clear what the entire attack surface looks like.

There is a clear solution: the custody requirement needs to be high enough to provide strong guarantees even before we get to sampling (see here as well). High enough here means somewhere between 4 and 8, depending on the adversarial model we want to work with. With that, an attacker that does not control a lot of validators would fail at accruing many votes for a < 50% available block, and so it would be easily reorgable through proposer boost.
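To make the pre-sampling guarantee concrete, a quick back-of-the-envelope check (my own arithmetic, not from the thread): a node custodying k distinct subnets out of n sees all of its columns for a block seeded to only m subnets, and so could be fooled into voting, with probability C(m,k)/C(n,k).

```python
from math import comb

# Probability that a node custodying k distinct subnets (of n) sees all of
# its columns when a block is only made available in m of the n subnets,
# i.e. the chance it could be fooled into voting before sampling completes.
def p_fooled(m: int, n: int, k: int) -> float:
    return comb(m, k) / comb(n, k)

for k in (1, 2, 4, 8):
    print(f"custody {k}: {p_fooled(15, 32, k):.4f}")
# custody 1: 0.4688, custody 2: 0.2117, custody 4: 0.0380, custody 8: 0.0006
```

With custody in the 4 to 8 range, only a small fraction of honest validators can be tricked into voting for the 15/32-available block from the attack above, so it never accrues enough weight to outweigh proposer boost.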

Some related things to keep in mind:

  • The efficiency gain we get in the distribution phase of PeerDAS compared to 4844 is DATA_COLUMN_SIDECAR_SUBNET_COUNT / CUSTODY_REQUIREMENT / 2, because nodes are required to custody CUSTODY_REQUIREMENT / DATA_COLUMN_SIDECAR_SUBNET_COUNT of the whole data, which is extended by 2x. For example, with current parameters PeerDAS would be 16x more efficient than 4844 (ignoring sampling): everyone downloads 1/32 of the 2x extended data, so an average throughput of 48 blobs would require the equivalent of the 4844 bandwidth for distribution. Even a much more modest ratio of 5x lets us move to 16/32 blobs with hardly any bandwidth increase (just a little bit for sampling). See the sketch after this list for these ratios worked out.
  • By increasing the number of subnets, we can increase CUSTODY_REQUIREMENT without affecting the above-mentioned ratio, or at least recover some of the lost efficiency. If we want to stick with 32 subnets, we could for example set the CUSTODY_REQUIREMENT to 4, which gives a 4x gain. In the initial rollout, we could be even more conservative, even if that does not allow much of a blob count increase. If we are ok with having 64 subnets like we do for attestations (and possibly all fitting together in the network shard paradigm?), then reasonable values could be 4/64 (8x), 6/64 (~5x), 8/64 (4x). Since in the short term we're likely not going to want to go past a max of 32 blobs, there might not be much reason to go beyond these values, e.g., up to 128 subnets.
  • A higher CUSTODY_REQUIREMENT / DATA_COLUMN_SIDECAR_SUBNET_COUNT ratio also means that we don't need as many honest peers in order to have good guarantees about being able to get our samples. Peer sampling can be generally more robust, and less dependent on there being many nodes with a high advertised custody.
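A minimal check of the ratios quoted above (just restating the formula, nothing new):

```python
# gain over 4844 = DATA_COLUMN_SIDECAR_SUBNET_COUNT / CUSTODY_REQUIREMENT / 2,
# since each node custodies CUSTODY / SUBNETS of the 2x-extended data.
for subnets, custody in [(32, 1), (32, 4), (64, 4), (64, 6), (64, 8)]:
    print(f"{custody}/{subnets}: {subnets / custody / 2:g}x")
# 1/32: 16x, 4/32: 4x, 4/64: 8x, 6/64: ~5.3x, 8/64: 4x
```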

Imo it makes a lot of sense to move from 4844 to PeerDAS gradually. We can do this not only by slowly increasing the blob count, but also by slowly decreasing the minimum proportion of data custodied by each node, i.e., the CUSTODY_REQUIREMENT / DATA_COLUMN_SIDECAR_SUBNET_COUNT ratio. For example, we could start with 3/6 blobs, 32 subnets, a custody requirement of 16, i.e., unchanged throughput and everyone still downloads the whole data, just changing the networking. At this point, we wouldn't even need sampling yet, and we could introduce it without it actually doing anything, just to test the behavior on mainnet. We could then fully introduce sampling while moving to 6/12 blobs and a custody requirement of 8, then 12/24 blobs and custody requirement of 4. From there, we can increase the subnet count to 64 etc...

How many SAMPLES_PER_SLOT to hit the security level we want?

I don't see why we would want more than 16, or even 16 - CUSTODY_REQUIREMENT.

jimmygchen (Contributor) commented
Is it worth also increasing the TARGET_NUMBER_OF_PEERS (currently 70), in addition to increasing the CUSTODY_REQUIREMENT?

With a target peer count of 70, and each peer subscribing to one subnet (out of 32), the average peer count per subnet would be ~2. This could impact the proposer's ability to disseminate data columns to all 32 subnets and could potentially lead to data loss, assuming the proposer isn't custodying all columns. We could make an exception for the proposer to custody all columns, but it feels cleaner to just make sure we disseminate the samples reliably.

Although if we increase CUSTODY_REQUIREMENT to 4 this would already significantly reduce the likelihood of having insufficient peers in a subnet.
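For intuition, a quick model (my own numbers, assuming peers pick their custody subnets uniformly at random, which the actual node-ID-based assignment only approximates):

```python
# Expected peers per subnet, and the chance that a given subnet has no peers
# at all, with `peers` peers each custodying k of n subnets chosen uniformly.
def subnet_stats(peers: int, k: int, n: int) -> tuple[float, float]:
    return peers * k / n, (1 - k / n) ** peers

print(subnet_stats(70, 1, 32))  # ~2.2 peers on average, ~11% chance of an empty subnet
print(subnet_stats(70, 4, 32))  # ~8.8 peers on average, ~0.01% chance
print(subnet_stats(70, 6, 64))  # ~6.6 peers on average, ~0.1% chance
```

This matches the ~2 average above, and shows why raising CUSTODY_REQUIREMENT (or TARGET_NUMBER_OF_PEERS) quickly eliminates empty subnets.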

fradamt (Contributor) commented May 3, 2024

Is it worth also increasing the TARGET_NUMBER_OF_PEERS (currently 70), in addition to increasing the CUSTODY_REQUIREMENT? […]

We really shouldn't keep the CUSTODY_REQUIREMENT as is (even 4 is low) unless we go with a non-trailing fork-choice, so this shouldn't be as much of a problem in the short term. That said, if all clients agree that it's ok to do so, I think increasing the TARGET_NUMBER_OF_PEERS would be great, because even in the best case we'd have an average of ~7 peers per subnet (e.g. with CUSTODY_REQUIREMENT = 6 and 64 subnets). It also gives us more room to relax the custody ratio later.

fradamt (Contributor) commented May 3, 2024

Something that I think should be added to the open questions is validator custody: should validators have their own custody assignment, at the very least when they're voting, if not even in every slot? This has two benefits:

  • If an unavailable block is finalized, validators can be asked (out of protocol) to provide the data they were supposed to custody, and socially slashed if they fail to do so after some deadline
  • There are two reasons to increase the CUSTODY_REQUIREMENT. One is to ensure that the average number of peers per subnet is sufficiently high, and another is to ensure that most validators won't vote for an unavailable block (the pre-sampling guarantees discussed here). Depending on TARGET_NUMBER_OF_PEERS, the former might require less custody than the latter, so the extra load can just be on validators, which need the extra custody for voting securely, and not on simple full nodes, for which it is unnecessary extra work.

Just as an example, we could set CUSTODY_REQUIREMENT to 4 and VALIDATOR_CUSTODY_REQUIREMENT to 2.

cc @adietrichs

cskiraly commented
How many SAMPLES_PER_SLOT to hit the security level we want?

I have my LossyDAS for PeerDAS notebook here:
https://colab.research.google.com/drive/18uUgT2i-m3CbzQ5TyP9XFKqTn1DImUJD

Of course it also covers the case of zero allowed losses.
The main question here, I think, is setting the security level we want to achieve. Any thoughts on that?
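For reference, a minimal version of that computation (a binomial approximation of sampling with replacement; the notebook's exact figures use the hypergeometric): if just over half of the extended data is withheld, each sample misses with probability at least 1/2, so the false-accept probability with up to `losses` allowed misses out of s samples is a binomial tail.

```python
from math import comb

# False-accept probability for an unavailable (<50% recoverable) extension:
# each of the s samples misses with probability >= 1/2, and sampling passes
# only if at most `losses` samples miss (the LossyDAS-style allowance).
def p_false_accept(s: int, losses: int = 0) -> float:
    return sum(comb(s, i) for i in range(losses + 1)) / 2**s

print(p_false_accept(16))     # ~1.5e-05 with zero losses allowed
print(p_false_accept(20, 1))  # ~2.0e-05 with one loss allowed
```

In other words, SAMPLES_PER_SLOT trades off directly against the per-slot false-accept probability we are willing to tolerate.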

cskiraly commented
I see the following in the spec:
TARGET_NUMBER_OF_PEERS should be tuned upward in the event of failed sampling.

What are we trying to address with this? If it remains in the spec, I think there should also be a mechanism (or recommendations) for returning to the original value.

cskiraly commented
Regarding TARGET_NUMBER_OF_PEERS:
We need peers for two different things:

  • building the overlays, which is at the subnet level
  • sampling, which is at the column level

For sampling, peer count is important, because the mechanism to sample quickly from nodes that are not peers is not there yet, so I see this driving the TARGET_NUMBER_OF_PEERS requirement.
For the subnets, instead, my assumption would be that you can change your peerset based on the subnets assigned. If rotation is not too fast (or if there is no rotation), this should be doable. In that case, what you need is to reach the target degree (plus some margin) on custody_size subnets.

I think TARGET_NUMBER_OF_PEERS should be tuned based on these two requirements, with sufficient safety margins.
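One rough way to put numbers on the overlay side of this (illustrative only; the mesh degree D and the uniform-custody assumption are mine, not from the spec):

```python
# Peers needed so that the expected number of peers on each of our custody
# subnets reaches a gossipsub mesh degree D, assuming every peer custodies
# k of n subnets uniformly at random: peers * k / n >= D.
def peers_for_degree(D: float, k: int, n: int) -> float:
    return D * n / k

print(peers_for_degree(8, 4, 32))  # 64 peers
print(peers_for_degree(8, 6, 64))  # ~85 peers
```

The sampling requirement then adds on top of this, since samples outside our custody columns need peers (or a fast non-peer query mechanism) covering those columns too.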
