[Bug]: validate_if with regex not working as expected #86

Open
1 task done
nick-youngblut opened this issue Sep 15, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@nick-youngblut

Guidelines

  • I agree to follow this project's Contributing Guidelines.

Project Version

0.1.2

Platform and OS Version

macOS 13.3

Existing Issues

No response

What happened?

My validation function:

```r
#' validate whether the table column contains nucleotide strings
is_nuc = function(val, col_name){
  msg = glue::glue('"{x}" column is a nucleotide sequence', x={{col_name}})
  validate_if(val, grepl('^[ACGTURYKMSWBHDV]+$', {{col_name}}, perl=TRUE), 
              description = msg) 
}
```

The validation workflow:

```r
library(data.validator)
library(magrittr)  # for %>%

report = data_validation_report()
read.delim(infile) %>%
  validate(name = "Verifying samples table") %>%
  is_nuc("TARGET_COLUMN") %>%
  add_results(report)

render_semantic_report_ui(get_results(report))
```

Example values in the TARGET_COLUMN of the data.frame:

"ATTCGTCC" "GCCTAATG" "GAGTCAAA" "AGACGTGG" "GACGGGAG" "AGTAAAGA"

With the pattern ^.+$ the validation passes, but with ^[A-Z]+$ it fails.

Every value in the column consists only of the letters A, C, G and T (i.e. it matches [ACGT]+), so I don't see why ^[A-Z]+$ and ^[ACGTURYKMSWBHDV]+$ fail.

Steps to reproduce

See above

Expected behavior

See above

Attachments

No response

Screenshots or Videos

No response

Additional Information

No response

nick-youngblut added the bug (Something isn't working) label on Sep 15, 2023
@nick-youngblut (Author)

I'm guessing the issue is caused by incorrect non-standard evaluation in:

```r
validate_if(val, grepl('^[ACGTURYKMSWBHDV]+$', {{col_name}}, perl=TRUE), 
            description = msg)
```

...which results in grepl() being applied to the column name instead of the column values.

I'm not sure how to fix my code. An example would be appreciated.
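The reported symptoms are consistent with that diagnosis. Assuming grepl() ends up receiving the literal string "TARGET_COLUMN" rather than the column, the three patterns behave exactly as described: the name contains "_" (and letters outside the nucleotide class), so only ^.+$ matches it. A minimal base-R sketch, using a hypothetical `df` built from the example values in the report (it does not call data.validator):

```r
# Hypothetical data frame built from the example values in the report
df <- data.frame(TARGET_COLUMN = c("ATTCGTCC", "GCCTAATG", "GAGTCAAA",
                                   "AGACGTGG", "GACGGGAG", "AGTAAAGA"))
col_name <- "TARGET_COLUMN"
nuc_regex <- "^[ACGTURYKMSWBHDV]+$"

# Testing the column *name* reproduces the reported behaviour
grepl("^.+$", col_name, perl = TRUE)      # TRUE: any non-empty string matches
grepl("^[A-Z]+$", col_name, perl = TRUE)  # FALSE: "_" is not in [A-Z]
grepl(nuc_regex, col_name, perl = TRUE)   # FALSE: "E", "O", "L", "N", "_" are not in the class

# Testing the column *values* passes, one result per row
grepl("^[A-Z]+$", df[[col_name]], perl = TRUE)  # all TRUE
grepl(nuc_regex, df[[col_name]], perl = TRUE)   # all TRUE
```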

@nick-youngblut (Author)

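ChatGPT-4 eventually led me to a working function:

```r
is_nuc = function(val, col_name){
  # bquote() splices the data frame (val) and the column name (col_name)
  # into the call, so grepl() is applied to the column values, not the name
  expr = bquote(grepl('^[ACGTURYKMSWBHDV]+$', .(val)[[.(col_name)]], perl=TRUE))
  validate_if(val, eval(expr)) 
}
```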
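The working function relies on bquote(): every .() is replaced by the *value* of its argument before the expression is evaluated, so [[ receives the literal column name and grepl() sees the column contents. A minimal base-R sketch of that substitution, using a hypothetical `df` in place of the data passed to validate_if(); unlike the function above, it inlines only col_name and leaves `df` as a symbol so the printed call stays readable:

```r
# Hypothetical data frame standing in for read.delim(infile)
df <- data.frame(TARGET_COLUMN = c("ATTCGTCC", "GCCTAATG", "GAGTCAAA"))
col_name <- "TARGET_COLUMN"
nuc_regex <- "^[ACGTURYKMSWBHDV]+$"

# .(col_name) is replaced by the string "TARGET_COLUMN" inside the call
expr <- bquote(grepl(nuc_regex, df[[.(col_name)]], perl = TRUE))
print(expr)
#> grepl(nuc_regex, df[["TARGET_COLUMN"]], perl = TRUE)

# Evaluating the call gives one TRUE/FALSE per row of the column
eval(expr)
#> [1] TRUE TRUE TRUE
```

Inlining the whole data frame, as `.(val)` does in the function above, also works but embeds every row in the call object; splicing only the column name keeps the expression small.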
