Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wallet2: prevent duplicate outs #8047

Closed
wants to merge 1 commit into from
Closed

wallet2: prevent duplicate outs #8047

wants to merge 1 commit into from

Conversation

ghost
Copy link

@ghost ghost commented Nov 7, 2021

Seeing a bug in Aeon transactions where multiple inputs in one transaction use the same output. Sorry, I have not confirmed this bug exists in Monero yet, but I anticipate it does. Is this possibly the culprit?

aeon#253

@SChernykh
Copy link
Contributor

SChernykh commented Nov 8, 2021

I'm not sure this is a even bug. Properly random output selection will have overlapping outputs sometimes and actively avoiding them only creates one more statistical skew. Also, even if you have the same output used twice, you can't know which one is decoy even if you know one of them is a decoy. It's definitely not a bug.

Edit: it's not much different from a case where the same output is used in two different transactions.

@ghost
Copy link
Author

ghost commented Nov 8, 2021

Being double or triple spent makes them distinct from other outputs thereby reducing the uniformity and weakening the group signature. That makes these outputs known spends/decoys. Also using distinct outputs in group signatures can increase the anonymity pool.

@SChernykh
Copy link
Contributor

That makes these outputs known spends/decoys

Absolutely not. It's only "N-1 of N are decoys, but unknown which ones". Same logic can be applied to multiple transactions spending the same output. Using distinct outputs moves the output distribution away from truly random because truly random can have duplicates!

@SChernykh
Copy link
Contributor

Or do you mean double spent in a single decoy ring?

@ghost
Copy link
Author

ghost commented Nov 8, 2021

No in different inputs in one transaction.

@ghost
Copy link
Author

ghost commented Nov 8, 2021

Yes just so we are clear the knowledge gained is "N-1 of N are decoys and one is a definite spend OR N of N are decoys". Please review the issue I linked. By including the same output multiple times, the anonymity is severely reduced. Of course with Monero, it is not so harsh, but the same principle applies.

The goal of decoy selection is to protect the privacy of monero users, not achieve perfect randomness. It is already skewed and not properly random by design.

@ghost
Copy link
Author

ghost commented Nov 8, 2021

I need to think about this more, I think you might be right @SChernykh

@ghost ghost closed this Nov 8, 2021
@ghost
Copy link
Author

ghost commented Nov 8, 2021

The difference is that one transaction is necessarily spent by one person whereas different transactions can be spent by multiple people.

Imagine I have a 3 IN transaction with group size 3 and the same OUT appears in each IN. Then there is only 7 unique OUT being spent and I am claiming ownership of 3 of the 7.

In the case where duplicates are not allowed, I have improved deniability, 3 of 9 outs are claimed to be mine.

In the case of three transactions, each spending one IN, there could be three different people making the transaction whereby each one is claiming ownership of 1 of 3 OUT.

@ghost ghost reopened this Nov 8, 2021
@ghost
Copy link
Author

ghost commented Nov 8, 2021

20211108_160747

@SChernykh
Copy link
Contributor

Even with this observation, it needs fixing only if it happens statistically more often in the same multi-input transaction than in different transactions.

  • If it does, fixing it removes the statistical bias
  • If it doesn't, fixing it introduces the statistical bias

I think the Monero research lab must look into it first.

@Rucknium
Copy link

Rucknium commented Nov 9, 2021

I agree that this needs more study by MRL before we can give a green or red light to incorporating this commit into Monero. Here are my initial thoughts:

  • From a quick glance at the Aeon code, it seems that it uses the same gamma distribution as Monero in its decoy selection algorithm. This means that decoy draws are concentrated on recently-created outputs, which could indeed create this particular issue. I note also that Aeon enforces a ring size of three. Based on what I see in an Aeon block explorer, the number of transactions on the Aeon blockchain is much lower than on the Monero blockchain.
  • It would seem to me that the low number of Aeon transactions would tend to result in a high number of these "collisions" compared to what happens on the Monero blockchain. Monero's ring sizes are larger, of course, but from my observations it seems that the larger ring sizes do not cancel out the effect of the low total number of Aeon transactions. Furthermore, due to the larger ring sizes any "collisions" would have less severe consequences for user privacy, if indeed there are any negative consequences. Finally, with a ring size of 3, Aeon is skirting close to chain-reaction issue (which admittedly I personally do not yet understand well).
  • Ultimately, it would be good to ppair this issue with ring signature statistical attack issues. If there is an problem, it could be analyzed in the context of specific adversarial actions that could be employed against users. That way, we would know the costs and benefits of any proposed countermeasure. This is an area of very active research.
  • This issue would be particularly salient for transactions that involve large consolidations, i.e. a large number of inputs in a single transaction. If there are a huge number of inputs and we disallow selecting outputs as decoys for multiple rings, do we create a problem in "saturating" the available set of recent outputs, possibly triggering a bug if we are not careful?

Some potential near-term action items are:

  1. @neptuneresearch 's tools could be used to determine how often these collisions occur on the Monero blockchain and possibly the Aeon blockchain.
  2. Bring this item up for discussion at MRL's next meeting on Wednesday.

@ghost
Copy link
Author

ghost commented Nov 9, 2021

Ok great. Yes more attention to the issue would be welcome. @Rucknium is exactly on point with their summation. Let me know if I can be of any further assistance.

@neptuneresearch
Copy link

Thanks for the mention @Rucknium. This is an interesting question, I will check if and how it has occurred onchain.

@Rucknium
Copy link

According to my preliminary results produced with this R code, @neptuneresearch 's Monero SQL database, and @Gingeropolous 's server:

  • About 0.88% of Monero transactions since the beginning of time have at least one such "collision".

  • In the last few months this figure has been around 0.4%.

  • For txs with exactly 2 inputs, collisions have occurred in about 0.23% of such txs since the genesis block.

  • In the last few months this figure has been around 0.1%.

This makes sense. We expect as the number of inputs increase, the probability of a collision occurring also increases. Two or fewer inputs would be consistent with a typical user transaction. A larger number of inputs would be more typical of a service like an exchange or consolidation of mining pool payments.

Whether or not these empirical results constitute a big deal is in the eye of the beholder. IMHO at this point, it seems that the low incidence of the issue on the Monero blockchain plus the fact that the tracing vulnerability -- if it exists -- may not be severe might suggest that this is a back burner issue for Monero for now.

Aeon may be a different story, as I mentioned above. I lack the infrastructure to do a empirical similar analysis on the Aeon blockchain data, but if infrastructure could be set up, we could do it for Aeon as well.

Among txs that have a single input, a collision has occurred a total of 4 (four) times in the entire history of Monero. I am told by @j-berman that such collisions are "prohibited at consensus since hf v6:

bool core::check_tx_inputs_ring_members_diff(const transaction& tx) const
{
const uint8_t version = m_blockchain_storage.get_current_hard_fork_version();
if (version >= 6)
{
for(const auto& in: tx.vin)
{
CHECKED_GET_SPECIFIC_VARIANT(in, const txin_to_key, tokey_in, false);
for (size_t n = 1; n < tokey_in.key_offsets.size(); ++n)
if (tokey_in.key_offsets[n] == 0)
return false;
}
}
return true;
}

@ghost
Copy link
Author

ghost commented Nov 12, 2021

Excellent work, @Rucknium. I'm confused why a bad thing happening rarely means it can be ignored, when it can easily be prevented. According to contributing guidelines for aeon, all requests must come upstream from monero. If this is a non-issue or even just a tiny speck of an improvement, would you approve it to be merged?

@Rucknium
Copy link

I'm confused why a bad thing happening rarely means it can be ignored, when it can easily be prevented.

We don't have to ignore it, but it seems to be minor in comparison to much bigger issues with the decoy selection algorithm. I outline some of those issues here. It may make sense to put greater priority on the more salient issues when thinking about how to allocate research labor hours. I understand that you have a fix ready, but that fix also has to be vetted, at least in terms of the code and probably also in terms of the statistical conceptual issues.

I'll defer to others on whether this change should be merged at this time. I can only speak from the research angle.

@ghost
Copy link
Author

ghost commented Nov 12, 2021

Ok, thanks for your input, that is understandable. I have been following your ospead research closely and wish you best of luck.

@@ -8444,13 +8445,17 @@ void wallet2::get_outs(std::vector<std::vector<tools::wallet2::get_outs_entry>>
bool own_found = false;
for (const auto &out: ring)
{
if (seen_outputs.count({amount, out}))
seen_indices.emplace(out);
continue;
Copy link
Collaborator

@j-berman j-berman Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if statement seems unnecessary to me.

Assume you have 2 known rings to be used as inputs in a tx, and the first ring includes the real output from the second ring as a decoy. This if statement looks like it would prevent the real output from being added to the second ring, which would then cause the function to throw just outside the for loop because own_found won't get set to true. The user's likely recourse in that case is to then remove the ring entirely and have their client construct a new one, which is worse than allowing duplicates.

It seems it's ok to let the duplicates slide here and remove the if statement altogether. The assumption with known rings is that the rings have already been seen by the world, i.e. you're not gaining something tangible by removing the duplicates from rings that have already been seen by the world.

Copy link
Collaborator

@j-berman j-berman Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding to this, it seems the algorithm in general is still able to select duplicates because of this logic

Assume you have 2 rings to be used as inputs in a tx, and the first ring includes the real output from the second ring as a decoy.

I'm not seeing what prevents this, this could use a deeper look.

@j-berman
Copy link
Collaborator

My inclination is that it makes sense to not allow duplicates in order to increase the anon set. Allowing duplicates (very) marginally increases the chances you re-select your own output as a decoy in your own transaction relative to not allowing duplicates, and reduces the chances your output can be picked up as a decoy in someone else's transaction (since others are also marginally more likely to re-select their own outputs as decoys). Ideally you'd want to maximize the chances your output gets picked up as a decoy by other people, and hide among the largest crowd.

I generally agree with @Rucknium the incidence level isn't very significant. But it seems like a simple, sensible thing to do. It also would make sense to start doing at a fork because this would be a way to fingerprint an older wallet if a tx still includes duplicates.

@@ -8484,6 +8489,7 @@ void wallet2::get_outs(std::vector<std::vector<tools::wallet2::get_outs_entry>>
{
num_found = 1;
seen_indices.emplace(td.m_global_output_index);
seen_outputs.emplace({amount, td.m_global_output_index});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding to this comment, I think you need to emplace the real outputs in seen_outputs before iterating to select decoys (much earlier on in get_outs, otherwise if you have 2 rings, ring 1 would be able to include a decoy that is a real output from ring 2)

Copy link
Collaborator

@moneromooo-monero moneromooo-monero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're going to do this, you should take care to allow these if there are few enough outputs on the chain to prevent a tx being made with the patch. For pre-rct outputs of uncommon value, this may well be the case, and the patch without the special case would make these outputs unspendable.

@@ -8444,13 +8445,17 @@ void wallet2::get_outs(std::vector<std::vector<tools::wallet2::get_outs_entry>>
bool own_found = false;
for (const auto &out: ring)
{
if (seen_outputs.count({amount, out}))
seen_indices.emplace(out);
continue;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes the loop a noop. I assume that is not intended.

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants