[Feature]: make flexible validations #76

nbbn · 2023-07-27T09:32:13Z

Guidelines

I agree to follow this project's Contributing Guidelines.

Description

data.validator strongly follows the idea of a table, with validations running on that table.
IMO this doesn't fit most use cases.

E.g. I do:

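library(data.validator)

# Report object that collects the validation results
report <- data_validation_report()

# The empty data frame is only a placeholder required by validate()
validate(data.frame(), name = "Comparing testing vs postgres data") |>
  validate_if(
    identical(
      names(get_cols(...)),
      names(get_cols(...))
    ),
    description = "Column names are the same in 1 table"
  ) |>
  validate_if(
    identical(
      as.vector(get_cols(...)),
      as.vector(get_cols(...))
    ),
    description = "Column types are the same in 1 table"
  ) |>
  add_results(report)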

As you can see, I have to pass an empty data frame to validate(), even though I never use it.

Then, when I run print(report), I see:

|table_name                  |description                         |type    | total_violations|
|:---------------------------|:-----------------------------------|:-------|----------------:|
|Comparing testing vs ci data|Column names are the same in 1 table|success |               NA|
|Comparing testing vs ci data|Column names are the same in 1 table|success |               NA|

The column name table_name doesn't make sense to me in this situation. Maybe it should be Group?

Also, Violated data doesn't work with this flexible approach.

Another example from practice

We used data.validator to show rows returned by queries. The queries were built so that they return only invalid rows, and return nothing when there is no invalid data. More documentation on how to hack data.validator for these cases would be nice.

Problem

My use of this package doesn't fit its standard usage. I think the package should be more flexible and allow validations based on multiple data frames without specifying them explicitly in the validate() call.

Proposed Solution

Change the column names in the report object.
Remove the requirement of a data frame in validate().
Update the docs with examples of more advanced and customized use cases.
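
To make the request more concrete, here is a minimal base R sketch of what a table-less check report could look like. It is not data.validator code: new_check_report(), add_check(), the group column, and the two sample data frames are all hypothetical names made up for illustration, and the "success"/"error" labels only mirror the type column shown in the report above.

```r
# Hypothetical helpers, NOT part of data.validator: they record checks
# under a free-form group label instead of a table name.
new_check_report <- function() {
  data.frame(
    group = character(),
    description = character(),
    type = character(),
    stringsAsFactors = FALSE
  )
}

add_check <- function(report, group, description, condition) {
  # A check passes only when the condition is a single TRUE.
  type <- if (isTRUE(condition)) "success" else "error"
  rbind(
    report,
    data.frame(
      group = group,
      description = description,
      type = type,
      stringsAsFactors = FALSE
    )
  )
}

# Two made-up data frames standing in for the tables being compared.
testing <- data.frame(id = 1:3, value = c("a", "b", "c"))
postgres <- data.frame(id = 1:3, value = c("a", "b", "c"))

report <- new_check_report() |>
  add_check(
    "Comparing testing vs postgres data",
    "Column names are the same",
    identical(names(testing), names(postgres))
  ) |>
  add_check(
    "Comparing testing vs postgres data",
    "Column types are the same",
    identical(sapply(testing, class), sapply(postgres, class))
  )

print(report)
```

The point of the sketch is only that each check takes an arbitrary logical condition and a group label, so no placeholder data frame is needed.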

Alternatives Considered

Stick to what you have. State explicitly in the docs that the package is dedicated to working with data frames.


D3SL · 2023-08-02T09:38:23Z

It's not just that data.validator works only with data frames; it's that it works only with columns and rows. I may have missed an obvious solution, but from what I've seen there's no way to do something like your names() check, which operates at the data frame level.

nbbn added the enhancement (New feature or request) label on Jul 27, 2023
