[Feature]: Document that Report object is mutable #71

Open
1 task done
alexverse opened this issue Jun 20, 2023 · 0 comments
Labels
enhancement New feature or request

Guidelines

  • I agree to follow this project's Contributing Guidelines.

Description

Mutable objects in a data workflow can undermine reproducibility: a report passed to a downstream function is modified in place, so its earlier state is lost.

Example:

library(dplyr, warn.conflicts = FALSE)
library(data.validator)
library(assertr)

validator_a <- function(data_) {
  report <- data_validation_report()
  validate(data_) %>%
    validate_cols(
      \(x) not_na(x),
      Sepal.Length,
      description = "Sepal.Length not_na"
    ) %>%
    add_results(report)
  report
}

validator_b <- function(data_, report) {
  validate(data_) %>%
    validate_if(
      nrow(data_) > 0,
      description = "Non empty table"
    ) %>%
    add_results(report)
  report
}


report_a <- validator_a(iris)
print(report_a)
#> Validation summary: 
#>  Number of successful validations: 1
#>  Number of failed validations: 0
#>  Number of validations with warnings: 0
#> 
#> Advanced view: 
#> 
#> 
#> |table_name |description         |type    | total_violations|
#> |:----------|:-------------------|:-------|----------------:|
#> |data_      |Sepal.Length not_na |success |               NA|

report_b <- validator_b(iris, report_a)

### report_a is mutated
print(report_a)
#> Validation summary: 
#>  Number of successful validations: 1
#>  Number of failed validations: 1
#>  Number of validations with warnings: 0
#> 
#> Advanced view: 
#> 
#> 
#> |table_name |description         |type    | total_violations|
#> |:----------|:-------------------|:-------|----------------:|
#> |data_      |Non empty table     |error   |                1|
#> |data_      |Sepal.Length not_na |success |               NA|

Created on 2023-06-20 with reprex v2.0.2

Problem

The report_a object changes after it is passed to validator_b. In a functional data analysis workflow, most users will not expect this.
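
The same behavior can be reproduced without data.validator. The following is a minimal sketch using the R6 package directly; `ToyReport` and `add_check` are hypothetical names made up for illustration and are not part of data.validator.

```r
library(R6)

# Hypothetical stand-in for a report object (NOT data.validator's Report class).
ToyReport <- R6Class("ToyReport",
  public = list(
    results = character(),
    add = function(entry) {
      # Modifies the object in place: R6 objects have reference semantics.
      self$results <- c(self$results, entry)
      invisible(self)
    }
  )
)

# A downstream function that looks pure but mutates its argument.
add_check <- function(report, entry) {
  report$add(entry)
  report
}

report_a <- ToyReport$new()
report_a$add("check 1")

report_b <- add_check(report_a, "check 2")

# Both names point at the same object, so report_a also holds "check 2".
length(report_a$results)
identical(report_a, report_b)
```

Here report_a and report_b are two names for one object, which is why printing report_a after calling validator_b shows the extra row.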

Proposed Solution

Update the documentation to highlight that Report objects use reference semantics, and that a copy made with the R6 clone() method can be passed to downstream functions so the original is left unchanged.

Example:

library(dplyr, warn.conflicts = FALSE)
library(data.validator)
library(assertr)

validator_a <- function(data_) {
  report <- data_validation_report()
  validate(data_) %>%
    validate_cols(
      \(x) not_na(x),
      Sepal.Length,
      description = "Sepal.Length not_na"
    ) %>%
    add_results(report)
  report
}

validator_b <- function(data_, report) {
  validate(data_) %>%
    validate_if(
      nrow(data_) > 0,
      description = "Non empty table"
    ) %>%
    add_results(report)
  report
}


report_a <- validator_a(iris)
print(report_a)
#> Validation summary: 
#>  Number of successful validations: 1
#>  Number of failed validations: 0
#>  Number of validations with warnings: 0
#> 
#> Advanced view: 
#> 
#> 
#> |table_name |description         |type    | total_violations|
#> |:----------|:-------------------|:-------|----------------:|
#> |data_      |Sepal.Length not_na |success |               NA|

report_b <- validator_b(iris, report_a$clone())

### report_a is not mutated
print(report_a)
#> Validation summary: 
#>  Number of successful validations: 1
#>  Number of failed validations: 0
#>  Number of validations with warnings: 0
#> 
#> Advanced view: 
#> 
#> 
#> |table_name |description         |type    | total_violations|
#> |:----------|:-------------------|:-------|----------------:|
#> |data_      |Sepal.Length not_na |success |               NA|

Created on 2023-06-20 with reprex v2.0.2

Alternatives Considered

Maybe refactor so that standard value (copy-on-modify) semantics are used instead of reference semantics.
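
Short of a refactor, value-like behavior could be approximated in user code. The sketch below assumes a hypothetical helper, `add_results_copy`, which is not part of data.validator; it only combines calls already used in this issue (`add_results()` and the R6 `clone()` method).

```r
library(data.validator)
library(assertr)
library(magrittr)

# Hypothetical helper (not a data.validator function): adds results to a
# copy of `report` and returns the copy, leaving the caller's report as is.
add_results_copy <- function(data, report) {
  report_copy <- report$clone()
  add_results(data, report_copy)
  report_copy
}

report_a <- data_validation_report()

report_b <- validate(iris) %>%
  validate_cols(
    \(x) not_na(x),
    Sepal.Length,
    description = "Sepal.Length not_na"
  ) %>%
  add_results_copy(report_a)

# report_b is a separate object; report_a was not modified.
identical(report_a, report_b)
```

By default clone() makes a shallow copy. If a report ever held other reference objects internally, R6's clone(deep = TRUE) would be the safer choice; whether that is needed here is an assumption that would have to be checked against the package source.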

@alexverse alexverse added the enhancement New feature or request label Jun 20, 2023