Evaluation metrics for synthetic generated PII informations. #335

Open
yash-rathore opened this issue Apr 11, 2023 · 1 comment
Labels
feature request Request for a new feature

Comments

@yash-rathore

Problem Description

Which metrics can I use to check the quality of the generated PII values?
report.get_diagnostics() checks the coverage and range of numerical/categorical data. But is there a single metric I can use to check things like duplication or quality of the generated PII?

@yash-rathore added the "feature request" and "new" labels on Apr 11, 2023
@npatki
Contributor

npatki commented Apr 14, 2023

Thanks for filing this issue @yash-rathore. This requires some more thought. We can keep it open to communicate updates and have discussions.

At a high level, it would be interesting to identify the useful properties of PII columns.

  • Duplication may be one, in the sense that we can check whether the sensitive values in the synthetic data repeat values from the real data. Do note that some duplication might be ok -- and it may even be good for privacy, as it prevents an "attack by omission" (wherein an attacker infers what is in the real data by identifying what is missing from the synthetic data).
  • Quality is an interesting one. How are you thinking about quality in PII values?
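
To make the duplication idea concrete, here is a minimal sketch using plain pandas. It is not an existing SDV or SDMetrics metric; the function names `pii_overlap_rate` and `pii_internal_duplicate_rate` and the sample email values are hypothetical and only illustrate one possible way to measure this.

```python
import pandas as pd


def pii_overlap_rate(real: pd.Series, synthetic: pd.Series) -> float:
    """Fraction of non-null synthetic values that also appear in the real column.

    Hypothetical helper, not part of any library.
    """
    synthetic = synthetic.dropna()
    if synthetic.empty:
        return 0.0
    real_values = list(real.dropna().unique())
    # isin() marks each synthetic value that matches some real value.
    return float(synthetic.isin(real_values).mean())


def pii_internal_duplicate_rate(synthetic: pd.Series) -> float:
    """Fraction of non-null synthetic values that repeat an earlier synthetic value.

    Hypothetical helper, not part of any library.
    """
    synthetic = synthetic.dropna()
    if synthetic.empty:
        return 0.0
    # duplicated() marks every occurrence after the first one.
    return float(synthetic.duplicated().mean())


# Made-up example data, for illustration only.
real_emails = pd.Series(["a@x.com", "b@x.com", "c@x.com"])
synthetic_emails = pd.Series(["a@x.com", "d@x.com", "d@x.com", "e@x.com"])

overlap = pii_overlap_rate(real_emails, synthetic_emails)
internal = pii_internal_duplicate_rate(synthetic_emails)
print(f"overlap with real data: {overlap:.2f}")
print(f"duplicates within synthetic data: {internal:.2f}")
```

A score of 0 for the overlap is not necessarily the goal here: as noted above, a small amount of overlap may help against an attack by omission, so any threshold would be a judgment call.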

@npatki added the "under discussion" label and removed the "new" label on Apr 14, 2023
@npatki removed the "under discussion" label on May 15, 2023