mitigate coordinator deanonymization attacks based on timing and data withholding #83
Perhaps I misunderstand. It seems like such an attack is not practical under current conditions. Under which conditions do you imagine this timing/withholding attack to be a problem, or what is the worst-case example scenario of such an attack? Given sufficient (i.e. current) liquidity it would require isolating and serving only ~1/20 of the requests and guessing that they are correlated at a given time. This would need to be done over and over again to have any sort of certainty that inputs are correlated, no? Is this what you mean by "grind chunks"? In that case it seems like a complexity that solves a problem well out of scope for a proof of concept, MVP, or even a production application in the current landscape of privacy tools. I'm happy to see this level of care go into delivery and at the same time want to keep eyes on the prize: state of the art.
So Tor circuits are less robust than, say, a mixnet against a global passive adversary colluding with the coordinator, and there is also plain timing information to consider (although as I described it these updates are passive; I need to refresh my memory on how Tor assures reliable delivery and whether or not that reveals timing information). Since the coordinator can do blameless DoS, it can more easily do intersection attacks, which amplifies these concerns. Suppose the coordinator observes a targeted input and wants to learn more information about it: during the round and subsequent blame rounds it could selectively provide or withhold data to narrow down which connections are associated with that input. This kind of attack may not need to fully deanonymize to be a concern. If inputs are targeted there is probably some data external to the blockchain leading to that, but this is a way to ensure that e.g. government coercion of the coordinator, in order to exploit this inherent weakness of the asynchronous, centralized nature of the protocol, is made more observable by clients.
A slightly more concrete explanation of how this helps from the client's perspective: suppose all inputs have been registered, and clients now provide an ownership proof, which is a signature (the message should commit to the number of inputs and the round parameters) made with the key preimages of the script. The inputs and these ownership proofs are serialized together and padded, resulting in a string of n*k bytes, where k is, say, around 100 bytes.

The coordinator then starts sending packets of k bytes, the XORed blocks, one at a time round robin to each connection (clients have multiple connections) that is currently listening for round status updates, but the number of packets pertaining to a round is limited to some fraction of n per connection. This both limits bandwidth consumption of non-active observers without distinguishing them from active participants and ensures that clients must obtain the data over multiple connections. During critical phases of a round, every new connection gets some fraction of the data pertaining to the round when it's in the status querying part of the state machine.

After a client has seen at least n messages, the probability that it is able to decode the encrypted list of inputs and output signatures (the key is given in response to successful connection confirmation) rapidly rises. When it does this successfully, it can then verify that the packets it was sent before were indeed correct (if not, it should not register outputs or sign). At this point the client should continue receiving more data pertaining to the round, to ensure that other users were also able to successfully decode the data, and then proceed to output registration. These steps are then repeated for the output data when providing the transaction signatures; clients should submit input signatures only when confident that other clients are also able to do so.
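The padding, chunking, and XOR encoding/decoding described above could be sketched as follows. The chunk size, the hash-based selection function, and the degree schedule are illustrative assumptions, not a specification:

```python
import hashlib

K = 100  # assumed chunk size in bytes

def pad_and_chunk(payload: bytes, k: int = K) -> list[bytes]:
    """Pad the serialized inputs + ownership proofs to a multiple of k
    and split into fixed size chunks."""
    padded = payload + b"\x00" * (-len(payload) % k)
    return [padded[i:i + k] for i in range(0, len(padded), k)]

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def chunk_indices(msg_seq: int, n: int) -> set[int]:
    """Deterministically select which of the n chunks go into message msg_seq,
    so both sides agree on the selection (degree schedule is an assumption)."""
    degree = 1 + msg_seq % 3
    h = hashlib.sha256(msg_seq.to_bytes(8, "big")).digest()
    return {int.from_bytes(h[4 * i:4 * i + 4], "big") % n for i in range(degree)}

def encode(chunks: list[bytes], msg_seq: int) -> bytes:
    """Coordinator side: XOR the selected chunks into one k-byte packet."""
    acc = bytes(len(chunks[0]))
    for i in chunk_indices(msg_seq, len(chunks)):
        acc = xor_bytes(acc, chunks[i])
    return acc

def peel_decode(packets: list[tuple[set[int], bytes]], n: int):
    """Client side: peeling decoder. Repeatedly find a packet covering exactly
    one still-unknown chunk, recover it, and substitute it into the rest.
    Returns the n chunks, or None if more packets are needed."""
    known: dict[int, bytes] = {}
    progress = True
    while progress and len(known) < n:
        progress = False
        for idxs, payload in packets:
            unknown = idxs - known.keys()
            if len(unknown) == 1:
                acc = payload
                for i in idxs & known.keys():
                    acc = xor_bytes(acc, known[i])
                known[unknown.pop()] = acc
                progress = True
    return [known[i] for i in range(n)] if len(known) == n else None
```

A client that has received packets with sequence numbers it knows can feed `[(chunk_indices(s, n), payload) for s, payload in received]` into `peel_decode`; as the footnote notes, the degree distribution would need to be modeled properly for decoding to succeed with high probability near n messages.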
Clients can't verify that everything was done round robin, but they can verify that their own connections (including new ones) don't observe violations of round robin order, which constrains the coordinator. So if a client observes that their connections were reordered, or that a large gap formed in between messages (as the message broadcasting loops around, the gaps should be more or less consistent because different iterations of the loop should be on the order of seconds or less), then they can abort the round and broadcast the messages they received. Unfortunately this proves nothing on its own, but adding non-repudiation only needs some sort of signature made by the coordinator (ideally with a static key, I'd say). This data can be published freely since it contains no information about client actions pertaining to registration; it only discloses that the user had at least that many connections active at that time observing a round.
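The per-connection ordering and gap checks described above might look something like this minimal sketch; the gap-tolerance heuristic and the class/parameter names are assumptions for illustration:

```python
import statistics

class RoundRobinAuditor:
    """Tracks message sequence numbers per connection and flags round robin
    violations or anomalous gaps that may indicate data withholding."""

    def __init__(self, gap_tolerance: float = 3.0):
        self.gap_tolerance = gap_tolerance  # assumed threshold, not specified
        self.last_seq: dict[str, int] = {}  # connection id -> last seq seen
        self.gaps: list[int] = []           # observed per-connection gaps

    def observe(self, conn_id: str, seq: int) -> bool:
        """Returns False (abort the round, publish transcript) on a violation."""
        prev = self.last_seq.get(conn_id)
        self.last_seq[conn_id] = seq
        if prev is None:
            return True  # first message on a new connection
        if seq <= prev:
            return False  # reordering on this connection
        gap = seq - prev
        self.gaps.append(gap)
        if len(self.gaps) >= 10:  # wait for a baseline before judging gaps
            typical = statistics.median(self.gaps)
            if gap > self.gap_tolerance * typical:
                return False  # anomalous gap: possible withholding
        return True
```

Since the broadcast loop iterations should take roughly consistent time, a gap much larger than the median gap is suspicious; the exact threshold would need tuning against real loop timing.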
Based on direct messages with @nothingmuch this now has my (admittedly accompanied by limited understanding) support
tl;dr - Using a pretty simple scheme, `getStatus` can be modified to allow auditing by clients, allowing them to detect partitioning or attempts to reveal the final transaction data in a way that targets any specific user.

Motivation
Before registering outputs, clients must validate ownership proofs during connection confirmation for the full set of inputs. Similarly, before providing signatures, clients need to know the full list of outputs and the final order of the transaction.
In the current protocol description this is mitigated by providing this data in `getStatus` responses, encrypted under a random symmetric key that is given to clients in response to successful connection confirmation.

However, this still allows the coordinator to withhold or delay data on specific connections, only allowing a small fraction of users at a time to learn the transaction data in the hope of learning correlations between clustered inputs.
Proposed Solution
We can instead guarantee that no connection is favored by a simple application of erasure coding: whenever a round is waiting for data from clients with dependencies, the required data (inputs and ownership proofs, or outputs) is split into small fixed size chunks, and each status message broadcasts the XOR of a deterministically selected subset of chunks, sent round robin across connections subject to a per-connection cap.
The chunks are reassembled by decoding with a simple linear time algorithm: since the chunks selected into each message are calculated deterministically by each side, the client knows which messages contain which chunks and therefore how to decode them by simply XORing.
The client can only verify the XOR is correct with respect to those chunks after decoding the entire payload. If the coordinator provides a signature of some kind, this can be used to prove malfeasance in case it attempts to withhold data.
The coordinator knows which connection received which chunks, but since the chunks are uniformly distributed between the different messages, clients should wait until redundant messages have been acquired before posting the final signature, though they can randomly schedule providing signatures from other inputs once the transaction has been decoded.
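For concreteness, the coordinator-side round robin broadcast with a per-connection cap could look like the following sketch; the function name, the string connection identifiers, and the parameter names are hypothetical:

```python
from collections import defaultdict
from itertools import cycle

def schedule_broadcast(connections: list[str], num_messages: int,
                       per_conn_cap: int) -> list[tuple[str, int]]:
    """Assign erasure-coded messages to listening connections round robin,
    sending each message to one connection and never exceeding the cap
    (some fraction of n) on any single connection."""
    sent: dict[str, int] = defaultdict(int)
    schedule: list[tuple[str, int]] = []
    conns = cycle(connections)
    seq, skips = 0, 0
    while seq < num_messages and skips <= len(connections):
        conn = next(conns)
        if sent[conn] >= per_conn_cap:
            skips += 1  # every connection may be capped; stop when all are
            continue
        skips = 0
        schedule.append((conn, seq))
        sent[conn] += 1
        seq += 1
    return schedule
```

The cap is what forces a client to maintain at least roughly n / per_conn_cap connections to recover the payload, which is the property the Rationale relies on.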
Rationale
Since clients maintain multiple connections to the coordinator and messages are totally ordered, they can observe the message order between their connections and ensure that it is stable (consistent with round robin ordering). All connections should lie on the same circular buffer; if a new connection does not get interleaved with the existing ones, or two connections receive the same message, that may indicate a partition attack. (Is it a problem that the coordinator can inflate message numbers? This allows the coordinator to grind chunks in order to partition subsets of connections, mitigated by small chunk size and multiple connections.)
Since the messages contain the XOR of the selected chunks, and that selection is based on client randomness, the choice can't be adversarially controlled by the coordinator. Therefore, in combination with the connection ordering, so long as chunks are sufficiently small, the rate at which clients learn the full payload depends only on the number of connections they have open.
In short, this technique can be used to detect attempts to partition users or perform timing attacks from the client's perspective, and to ensure that clients maintain at least some number of circuits (based on the per-connection cap).
Clients can therefore be reasonably sure that the order in which they learn the full payload compared to other clients is random, allowing them to audit the coordinator by publishing transcripts of status updates if they contain order violations or anomalous gaps that may indicate data withholding attempts.
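A transcript audit along these lines might check for the violations mentioned above: the same message delivered to two of a client's connections, and reordering within a connection. The event tuple format here is an assumption:

```python
def audit_transcript(events: list[tuple[str, int]]) -> list[str]:
    """events: (connection_id, message_sequence_number) in arrival order.
    Returns human-readable anomalies that would justify aborting the round
    and publishing the (ideally coordinator-signed) transcript."""
    anomalies: list[str] = []
    first_seen: dict[int, str] = {}  # seq -> connection that received it
    last_seq: dict[str, int] = {}    # connection -> last seq observed
    for conn, seq in events:
        # Duplicate delivery across connections suggests a partition attempt.
        if seq in first_seen and first_seen[seq] != conn:
            anomalies.append(f"seq {seq} sent to both {first_seen[seq]} and {conn}")
        first_seen.setdefault(seq, conn)
        # Sequence numbers on a single connection must strictly increase.
        if conn in last_seq and seq <= last_seq[conn]:
            anomalies.append(f"reordering on {conn}: {seq} after {last_seq[conn]}")
        last_seq[conn] = seq
    return anomalies
```

As noted above, without coordinator signatures on the messages a published transcript proves nothing by itself, but it makes coercion observable and signatures would add non-repudiation.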
As a side benefit this means that clients that don't care about the round data can contribute to privacy with only a fraction of the bandwidth requirements (parameterized by the per payload/connection cap) of including all data in every `getStatus` response.

[1] the number of chunks has to be sampled from a correctly modeled distribution for decoding to work well