Better way to ignore columns when running a report #548

Open
npatki opened this issue Apr 2, 2024 · 0 comments
Labels: feature request (Request for a new feature)

npatki commented Apr 2, 2024

Problem Description

As described in #546, I may want to ignore certain columns in a dataset when running a quality or diagnostic report. It is not obvious how to do this:

  1. The metadata requires that all columns be described, so you cannot ask a report to ignore a column simply by removing it from the metadata.
  2. It is unclear from the metadata spec which columns will be ignored and which will be used for evaluation.

Actual Solution: If you mark a column with an "other" sdtype (anything that is not categorical, numerical, datetime, etc.), SDV assumes it is non-statistical PII and ignores the column. For example, setting the sdtype to 'text' is enough to make a report ignore the column.
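
To make the workaround concrete, here is a minimal sketch. It assumes the metadata is a plain dictionary with a `columns` mapping of column names to `{'sdtype': ...}` entries; the helper name `mark_columns_ignored` and the example column names are hypothetical and not part of the library.

```python
import copy


def mark_columns_ignored(metadata, columns, placeholder_sdtype="text"):
    """Return a copy of the metadata in which the given columns use an "other" sdtype.

    Hypothetical helper: it only rewrites the metadata dictionary so that a
    report treats these columns as non-statistical and skips them.
    """
    updated = copy.deepcopy(metadata)  # leave the caller's metadata untouched
    for name in columns:
        # Replace the whole entry so sdtype-specific keys do not linger.
        updated["columns"][name] = {"sdtype": placeholder_sdtype}
    return updated


# Example metadata (made-up column names, for illustration only).
metadata = {
    "columns": {
        "age": {"sdtype": "numerical"},
        "country": {"sdtype": "categorical"},
        "notes": {"sdtype": "categorical"},
    }
}

# 'notes' is now described as 'text', so a report should ignore it.
report_metadata = mark_columns_ignored(metadata, ["notes"])
```

The resulting `report_metadata` would then be passed to the report in place of the original metadata, while the original stays unchanged for other uses.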

Expected behavior

The metadata spec should probably remain as-is, because in the future we may decide to add metrics for specific sdtypes.

However, perhaps the report itself should allow you to specify which columns to ignore?
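
One possible shape for this, purely as a sketch: a wrapper that accepts a list of columns to ignore and leaves the user's metadata as-is. The function `generate_ignoring` and its `ignore_columns` parameter are hypothetical and do not exist in the library today. The sketch assumes the report object exposes `generate(real_data, synthetic_data, metadata)` and that the metadata is a dictionary with a `columns` mapping. Internally it reuses the 'text' sdtype behavior described above.

```python
import copy


def generate_ignoring(report, real_data, synthetic_data, metadata, ignore_columns=()):
    """Run a report while skipping the columns named in ignore_columns.

    Hypothetical wrapper, not an existing API. It validates the column names,
    marks them with the 'text' sdtype in a private copy of the metadata, and
    delegates to report.generate().
    """
    unknown = sorted(set(ignore_columns) - set(metadata["columns"]))
    if unknown:
        raise ValueError(f"Columns not described in the metadata: {unknown}")

    report_metadata = copy.deepcopy(metadata)  # the caller's metadata is not modified
    for name in ignore_columns:
        report_metadata["columns"][name] = {"sdtype": "text"}

    report.generate(real_data, synthetic_data, report_metadata)
    return report
```

Keeping the ignore list separate from the metadata means the metadata keeps describing each column's true sdtype, which matters if metrics for more sdtypes are added later.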

npatki added the "feature request" label Apr 2, 2024