
EPFL/UCL paper #130

Open
gmingas opened this issue Nov 24, 2020 · 2 comments
Labels
literature Literature to review

Comments


gmingas commented Nov 24, 2020

An interesting paper from EPFL/UCL was just published that describes a privacy metric applicable to dataset synthesis and compares various synthetic data generation methods and datasets (including CTGAN). Can be found here.

Key takeaway:
Our evaluation framework enabled us to study the privacy gain provided by a wide variety of generative models for different datasets and adversarial settings. Our results challenge the claim that synthetic data provides a silver-bullet solution to the privacy problem of microdata publishing. Our experiments surface two fundamental reasons why generative models are unsuitable privacy mechanisms. First, it is not possible to predict what data characteristics will be preserved in a model’s stochastic output. Thus, the more complex the model, the harder it is to know in advance, or even bound, the level of protection it will provide for a given target record. Furthermore, as the model selectively amplifies some signals, synthetic data provides differential protection for target records. Second, the utility of generative models comes from their ability to extract patterns and replicate these in synthetic datasets. As a result, synthetic data that is useful for analysis, by definition, also contains enough information to mount inference attacks. Likely for the same reasons, differential privacy-based defenses fail to increase privacy gain. The perturbations required to achieve differential privacy make it even harder to predict which records will remain vulnerable, and might even increase the exposure of some data records. Besides, existing techniques provide protection for a select set of data features only, leaving the synthetic data open to inference attacks that leverage other preserved characteristics.
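The "privacy gain" idea in the excerpt can be sketched as: run the same inference attack against the raw data release and the synthetic release, and measure how much the attack's success rate drops per target. This is a minimal illustrative sketch, not the paper's actual framework; the toy nearest-record "attack", the threshold, and all names below are assumptions for illustration only.

```python
import random

def attack_success(release, target):
    # Toy inference attack (illustrative assumption): claim the target was
    # in the training data if a very similar record appears in the release.
    return any(abs(r - target) < 0.5 for r in release)

def privacy_gain(raw, synthetic, targets):
    # Privacy gain = drop in attack success when publishing synthetic data
    # instead of the raw microdata, averaged over the target records.
    raw_hits = sum(attack_success(raw, t) for t in targets)
    syn_hits = sum(attack_success(synthetic, t) for t in targets)
    return (raw_hits - syn_hits) / len(targets)

random.seed(0)
raw = [random.gauss(0, 1) for _ in range(100)]
# Stand-in for a generative model's output; a real evaluation would sample
# from a trained generator such as CTGAN.
synthetic = [random.gauss(0, 1.5) for _ in range(100)]
targets = raw[:10]  # records whose exposure we care about
gain = privacy_gain(raw, synthetic, targets)
print(round(gain, 2))
```

Note how the gain is per-target: as the excerpt argues, a generator can amplify some records' signals and not others, so the same release can protect one target well and another hardly at all.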

@gmingas gmingas added the paper Papers we are planning to write ourselves label Nov 24, 2020
@gmingas gmingas added this to Milestone backlog in Project board via automation Nov 24, 2020
@gmingas gmingas moved this from Milestone backlog to Upcoming in Project board Nov 24, 2020

gmingas commented Nov 24, 2020

Added to Zotero

@ots22 ots22 added literature Literature to review and removed paper Papers we are planning to write ourselves labels Nov 24, 2020
@ots22 ots22 removed this from Upcoming in Project board Nov 24, 2020

gmingas commented Nov 30, 2020

2 participants