Feature: Optional epsilon-Differential Privacy #36

Open
TKussel opened this issue Jan 27, 2021 · 7 comments

Comments

@TKussel
Collaborator

TKussel commented Jan 27, 2021

Some time ago I implemented a mechanism to protect the data using DP (last commit of branch 29e943c). I have not merged this branch because it was never tested automatically (which is nontrivial due to the probabilistic nature of the mechanism). Should we include this feature as an "advanced study option"? Maybe with different epsilons, or on/off states per bin?

@TKussel
Collaborator Author

TKussel commented Feb 2, 2021

@prasser @fnwirth I'd appreciate your input

@prasser
Collaborator

prasser commented Feb 2, 2021

We should definitely keep this. The question is how. The noise that needs to be added probably depends on the sensitivity of the method used to calculate the count for a certain bin. We do not know this sensitivity, however, in EasySMPC. We could have the user specify this as well when adding noise?

@prasser
Collaborator

prasser commented Feb 2, 2021

Ah. I just saw that the sensitivity is already a parameter in your methods :)

@TKussel
Collaborator Author

TKussel commented Feb 2, 2021

@prasser this is true, but for the subset of counting queries (especially histograms) the sensitivity is 1, so that an appropriate distribution can be chosen (cf. Dwork, Roth "The Algorithmic Foundations of Differential Privacy" Examples 3.1ff.).
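
For illustration, here is a minimal sketch of what such a mechanism might look like for a histogram of counts. It is not the code from the branch mentioned above: the function names `laplace_noise` and `noisy_histogram` are hypothetical, and it assumes Laplace noise with scale sensitivity/epsilon, with sensitivity defaulting to 1 as for counting queries.

```python
import random


def laplace_noise(sensitivity: float, epsilon: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, sensitivity / epsilon)."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponential variates with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_histogram(counts, epsilon, sensitivity=1.0, rng=None):
    """Add independent Laplace noise to every bin (hypothetical helper).

    Assumes each individual contributes to at most one bin, which is the
    situation in which a sensitivity of 1 is appropriate.
    """
    rng = rng or random.Random()
    return [c + laplace_noise(sensitivity, epsilon, rng) for c in counts]


if __name__ == "__main__":
    print(noisy_histogram([12, 7, 30], epsilon=0.5))
```

Note that this sketch uses ordinary double-precision floats, so it does not address concerns about exploitable floating-point behaviour.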

@prasser
Collaborator

prasser commented Feb 2, 2021

I know. But there are a lot of subtleties around this. How do we know that bins are counts of individuals? How can we know that the values have been generated by count queries? We need to make this configurable at least. But we can do that.

The implementation looks fairly straightforward and should be "correct" ;) One last question: Is this an issue? https://dl.acm.org/doi/10.1145/2382196.2382264

@TKussel
Collaborator Author

TKussel commented Feb 2, 2021

I know that you know :) My response was more intended as documentation of my reasons for implementing it.
We definitely need to make this configurable, in my opinion even additionally "hidden" under an "advanced options" tab, as it is possibly harmful and not intuitive.
Implementation details of floats/doubles might be exploitable, but for our applications it is nevertheless important to forbid multiple queries over the same data, as we don't save the noise but sample it afresh for every query. Thanks for the linked paper; I need to study it a bit more before giving a more specific answer.
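
To make the point about repeated queries concrete: if fresh noise is drawn on every query, averaging n independent answers shrinks the noise standard deviation by roughly a factor of sqrt(n). One possible guard, sketched below under the same Laplace assumption, releases a noisy value at most once. The class name `OneShotNoisyCount` is hypothetical and not part of EasySMPC.

```python
import random


class OneShotNoisyCount:
    """Release a noisy count at most once (hypothetical helper).

    A second call to release() is refused, so fresh noise cannot be
    averaged away by querying the same data repeatedly.
    """

    def __init__(self, true_count, epsilon, sensitivity=1.0, rng=None):
        if sensitivity <= 0 or epsilon <= 0:
            raise ValueError("sensitivity and epsilon must be positive")
        self._true_count = true_count
        self._scale = sensitivity / epsilon
        self._rng = rng or random.Random()
        self._released = False

    def release(self) -> float:
        if self._released:
            raise RuntimeError("this count has already been released once")
        self._released = True
        # Laplace(0, scale) as the difference of two exponential variates.
        noise = (self._rng.expovariate(1.0 / self._scale)
                 - self._rng.expovariate(1.0 / self._scale))
        return self._true_count + noise


if __name__ == "__main__":
    guard = OneShotNoisyCount(true_count=42, epsilon=1.0)
    print(guard.release())
    try:
        guard.release()
    except RuntimeError as err:
        print("refused:", err)
```

An alternative design would store the first noisy answer and return the same value on later calls; either way, no second independent noise sample is exposed.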

@prasser
Collaborator

prasser commented Feb 2, 2021

> but for our applications it is nevertheless important to forbid multiple queries over the same data,

That's a good point!


2 participants