Feature: Optional epsilon-Differential Privacy #36

TKussel · 2021-01-27T11:58:29Z

Some time ago I implemented a mechanism to protect the data using DP (last commit of branch 29e943c). I have not merged this branch because it is not automatically tested (which is non-trivial due to its probabilistic nature). Should we include this feature as an "advanced study option"? Maybe with different epsilons and on/off states per bin?
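A minimal sketch of what per-bin options could look like. It assumes each bin is an integer count, that a hypothetical `BinPrivacy` setting holds an on/off flag and an epsilon, and that noise is drawn from the Laplace distribution by inverse-CDF sampling. None of these names come from the EasySMPC code base or from branch 29e943c.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BinPrivacy:
    """Hypothetical per-bin setting: noise on/off and the epsilon to use."""
    enabled: bool = False
    epsilon: float = 1.0

    def __post_init__(self) -> None:
        if self.enabled and self.epsilon <= 0:
            raise ValueError("epsilon must be positive when noise is enabled")


def sample_laplace(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) via the inverse CDF."""
    u = rng.random() - 0.5  # uniform in [-0.5, 0.5)
    while u == -0.5:        # would give log(0); resample in this rare case
        u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def privatize_bins(counts: List[int], settings: List[BinPrivacy],
                   sensitivity: float = 1.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Add Laplace(sensitivity / epsilon) noise to every bin whose setting is enabled."""
    if len(counts) != len(settings):
        raise ValueError("one setting per bin is required")
    rng = rng or random.Random()
    result = []
    for count, setting in zip(counts, settings):
        if setting.enabled:
            result.append(count + sample_laplace(sensitivity / setting.epsilon, rng))
        else:
            result.append(float(count))  # bin is released without noise
    return result
```

Assuming every individual contributes to exactly one bin and neighboring datasets differ by adding or removing one individual, enabling noise on all bins gives epsilon-DP with epsilon equal to the largest per-bin value (parallel composition); any bin released without noise removes that guarantee for the histogram as a whole.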


TKussel · 2021-02-02T07:45:38Z

@prasser @fnwirth I'd appreciate your input

prasser · 2021-02-02T08:54:12Z

We should definitely keep this. The question is how. The noise that needs to be added probably depends on the sensitivity of the method used to calculate the count for a certain bin. However, EasySMPC does not know this sensitivity. Could we have the user specify it as well when adding noise?

prasser · 2021-02-02T08:59:42Z

Ah. I just saw that the sensitivity is already a parameter in your methods :)

TKussel · 2021-02-02T09:01:10Z

@prasser This is true, but for the subset of counting queries (especially histograms) the sensitivity is 1, so an appropriate distribution can be chosen (cf. Dwork, Roth, "The Algorithmic Foundations of Differential Privacy", Examples 3.1ff.).
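For reference, the Laplace mechanism discussed here can be written out as follows, using the definitions in Dwork and Roth, where datasets are histograms and neighbors differ by one individual:

```latex
% l1-sensitivity of a query f that maps a dataset to k numbers
\Delta f = \max_{\substack{x, y \\ \lVert x - y \rVert_1 = 1}} \lVert f(x) - f(y) \rVert_1

% Laplace mechanism: add i.i.d. Laplace noise with scale \Delta f / \varepsilon to each output
\mathcal{M}_L(x, f, \varepsilon) = f(x) + (Y_1, \dots, Y_k),
\qquad Y_i \sim \operatorname{Lap}\!\left(\tfrac{\Delta f}{\varepsilon}\right),
\qquad \operatorname{Lap}(z \mid b) = \frac{1}{2b} \exp\!\left(-\frac{|z|}{b}\right)

% Counting query, or histogram with each individual in exactly one bin: \Delta f = 1
\tilde{c}_j = c_j + Y_j, \qquad Y_j \sim \operatorname{Lap}\!\left(\tfrac{1}{\varepsilon}\right)
```

This assumes neighbors differ by adding or removing one individual. If neighbors are instead defined by changing one individual's record, one unit moves between two bins and a histogram has sensitivity 2, which is one of the subtleties that makes a configurable sensitivity useful.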

prasser · 2021-02-02T09:06:05Z

I know. But there are a lot of subtleties around this. How do we know that bins are counts of individuals? How can we know that the values have been generated by count queries? We need to make this configurable at least. But we can do that.

The implementation looks fairly straightforward and should be "correct" ;) One last question: Is this an issue? https://dl.acm.org/doi/10.1145/2382196.2382264
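The linked paper concerns information leaking through the low-order bits of floating-point Laplace samples. One commonly used way to sidestep floating-point outputs for integer counts is the two-sided geometric (discrete Laplace) mechanism, sketched below. This is a swapped-in technique, not the fix proposed in the paper and not what branch 29e943c does; the function names are hypothetical. The sampler still relies on floating-point `log` and a 53-bit uniform variate, so its distribution is only approximately two-sided geometric (for example, the noise is bounded), and it should not be read as a vetted implementation.

```python
import math
import random
from typing import Optional


def sample_geometric(alpha: float, rng: random.Random) -> int:
    """Sample G with P(G = k) = (1 - alpha) * alpha**k for k = 0, 1, 2, ..."""
    u = 1.0 - rng.random()  # uniform in (0, 1], avoids log(0)
    # P(G >= k) = P(u <= alpha**k) = alpha**k, i.e. a geometric distribution
    return math.floor(math.log(u) / math.log(alpha))


def geometric_mechanism(count: int, epsilon: float, sensitivity: int = 1,
                        rng: Optional[random.Random] = None) -> int:
    """Release count + Z with P(Z = k) proportional to alpha**|k|, alpha = exp(-epsilon / sensitivity)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    alpha = math.exp(-epsilon / sensitivity)
    # The difference of two i.i.d. geometric variables is two-sided geometric.
    noise = sample_geometric(alpha, rng) - sample_geometric(alpha, rng)
    return count + noise  # integer in, integer out
```

Because both the count and the noise are integers, the released value never exposes floating-point low-order bits; the noise variance is 2 alpha / (1 - alpha)^2, comparable to Laplace noise with scale sensitivity / epsilon.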

TKussel · 2021-02-02T09:22:31Z

I know that you know :) My response was intended more as documentation of my reasons for implementing it.
We definitely need to make this configurable, in my opinion even additionally "hidden" under an "advanced options" tab, as it is potentially harmful and not intuitive.
Implementation details of floats/doubles might be exploitable, but for our applications it is nevertheless important to forbid multiple queries over the same data, as we don't save the noise but sample it anew for every query. Thanks for the linked paper; I need to study it a bit more before giving a more specific answer.
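Why fresh noise per query makes repeated queries dangerous can be seen by averaging: if the same count is released n times with independent Laplace(1/epsilon) noise, the mean of the answers has standard deviation sqrt(2)/(epsilon * sqrt(n)), and by sequential composition the n releases cost n * epsilon in total. The sketch below assumes datasets can be identified by a string key and uses a hypothetical `OneShotRelease` guard that refuses a second noisy release for the same key; it illustrates the policy described above and is not EasySMPC code.

```python
import random
import statistics
from typing import Optional, Set


def laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class OneShotRelease:
    """Hypothetical guard: each dataset key may be released with noise at most once."""

    def __init__(self, epsilon: float, rng: Optional[random.Random] = None) -> None:
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self._released: Set[str] = set()  # kept in memory only

    def release(self, dataset_key: str, count: int) -> float:
        if dataset_key in self._released:
            raise PermissionError(f"dataset {dataset_key!r} has already been queried")
        self._released.add(dataset_key)
        return count + laplace(1.0 / self.epsilon, self.rng)


def averaged_estimate(count: int, epsilon: float, n: int, rng: random.Random) -> float:
    """What an analyst could do without the guard: average n freshly noised answers."""
    return statistics.fmean(count + laplace(1.0 / epsilon, rng) for _ in range(n))
```

An alternative to refusing repeats would be to store the sampled noise and return the same noisy answer for a repeated query; as noted above, the noise is currently not stored.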

prasser · 2021-02-02T09:33:25Z

but for our applications it is nevertheless important to forbid multiple queries over the same data,

That's a good point!
