NeurIPS hide and seek synthetic data challenge #161

Open
martintoreilly opened this issue Apr 15, 2021 · 0 comments

At this year's NeurIPS, Mihaela van der Schaar's lab is running their Hide-and-Seek Privacy Challenge for the second time. It's an adversarial competition where "hiders" try to generate privacy-preserving synthetic datasets and "seekers" try to re-identify individuals who are present in the real data used to generate those synthetic datasets.

@vollmersj and I were thinking this would be a cool competition to get involved with. Last year Microsoft sponsored two $5,000 prizes for each of the top hider and seeker teams, and we thought we could use some of our £32,000 workshop and travel budget to sponsor prizes this year.

What are people's thoughts on (i) getting involved in the evaluation of methods in the challenge and (ii) sponsoring prizes?

The Hide-and-Seek Privacy Challenge is a novel two-tracked competition to simultaneously accelerate progress in tackling both problems. In our head-to-head format, participants in the synthetic data generation track (i.e. “hiders”) and the patient re-identification track (i.e. “seekers”) are directly pitted against each other by way of a new, high-quality intensive care time-series dataset: the AmsterdamUMCdb dataset. Ultimately, we seek to advance generative techniques for dense and high-dimensional temporal data streams that are (1) clinically meaningful in terms of fidelity and predictivity, as well as (2) capable of minimizing membership privacy risks in terms of the concrete notion of patient re-identification.

Importantly, rather than falling back on fixed theoretical notions of anonymity, we allow participants on both sides to uncover the best approaches in practice for launching or defending against privacy attacks.

This competition provides a two-sided platform for synthetic data generation and patient re-identification methods to compete among and against each other. Our aim is to understand—through the practical task of membership inference attacks—the strengths and weaknesses of machine learning techniques on both sides of the privacy battle, in particular to organically uncover what existing (and potentially novel) notions of privacy and anonymity end up being the most meaningful in practice. We therefore invite participants to compete in either or both of two submission tracks of the interactive challenge: (1) the hider (i.e. synthetic data generation) track, and (2) the seeker (i.e. patient re-identification) track.
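
To make the hider/seeker setup a bit more concrete, here is a rough toy sketch of one round of membership inference. It is not the challenge's actual API, data format or scoring: it assumes flat numeric records rather than the ICU time series, a deliberately weak hider that just adds Gaussian noise, and a seeker that flags the candidates lying closest to the synthetic data. The names `naive_hider`, `distance_seeker` and `run_round` are made up for illustration.

```python
import numpy as np


def naive_hider(real, noise_scale, rng):
    """Toy hider (assumption): release the real records with Gaussian noise added."""
    return real + rng.normal(scale=noise_scale, size=real.shape)


def nearest_distance(candidates, synthetic):
    """Euclidean distance from each candidate to its closest synthetic record."""
    diffs = candidates[:, None, :] - synthetic[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1)).min(axis=1)


def distance_seeker(candidates, synthetic, n_members):
    """Toy seeker (assumption): guess that the n_members candidates nearest
    to the synthetic data were in the hider's training set."""
    dist = nearest_distance(candidates, synthetic)
    guess = np.zeros(len(candidates), dtype=bool)
    guess[np.argsort(dist)[:n_members]] = True
    return guess


def run_round(noise_scale, n=200, dim=10, seed=0):
    """Play one hider-vs-seeker round and return the seeker's accuracy."""
    rng = np.random.default_rng(seed)
    # Half of the population is given to the hider; the rest are non-members.
    population = rng.normal(size=(2 * n, dim))
    is_member = np.zeros(2 * n, dtype=bool)
    is_member[:n] = True
    synthetic = naive_hider(population[is_member], noise_scale, rng)
    guess = distance_seeker(population, synthetic, n_members=n)
    return (guess == is_member).mean()


if __name__ == "__main__":
    for scale in (0.01, 1.0, 5.0):
        print(f"noise_scale={scale}: seeker accuracy = {run_round(scale):.2f}")
```

With very little noise this seeker should recover almost every member, and with heavy noise it should drift towards chance (0.5), at which point the synthetic data is unlikely to be useful either. That privacy versus fidelity trade-off is the tension the two tracks are meant to probe.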
