Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Method dispatch for rx_string #11

Open
dmi3kno opened this issue Mar 10, 2019 · 11 comments
Open

Method dispatch for rx_string #11

dmi3kno opened this issue Mar 10, 2019 · 11 comments
Labels
enhancement New feature or request

Comments

@dmi3kno
Copy link

dmi3kno commented Mar 10, 2019

Basically boils down to detecting that .data is not of rx_string class and acting as though first argument is the value argument.
Reference: http://adv-r.had.co.nz/S3.html

UPDATED: sanitize now has method dispatch, so we can simply write

## unexported function for sanitizing arguments
sanitize_args <- function(...){
   if (missing(...)) return(NULL) 
  res <- sapply(list(...), sanitize) 
  Reduce(paste0, res)
}

is.rx_string <- function(x){
  inherits(x, "rx_string")
}

# class constructor - also unexported function. 
rx <- function(x){
  if(is.rx_string(x)) return(x)
  class(x) <- c("rx_string", class(x)) 
  x
}

rx_literal <- function(.data, ...) {
  UseMethod("rx_literal", .data)
}

rx_literal.character <- function(.data, ...){
  res <- paste0(sanitize(.data), sanitize_args(...))
  rx(res)
}

rx_literal.rx_string <- function(.data, ...) {
  res <- paste0(.data, sanitize_args(...))
  rx(res)
}

Now you dont need a constructor. Function works both in chain and stand alone

rx_literal("?@")
#> [1] "\\?@"
#> attr(,"class")
#> [1] "rx_string" "character"

rx_literal("?") %>% rx_literal("@")
#> [1] "\\?@"
#> attr(,"class")
#> [1] "rx_string" "character"

Hadley says we should also implement a few essential methods. We should rethink all of our functions with vectorization in mind.

When implementing a vector class, you should implement these methods: length, [, [<-, [[, [[<-, c. (If [ is implemented rev, head, and tail should all work).

@dmi3kno
Copy link
Author

dmi3kno commented Mar 10, 2019

In ‘Code smells’ Jenny Bryan argued that whenever you have if..inherits, it means you need to write a method.

Perhaps we need sanitize.rx_string and sanitize.character

@tylerlittlefield
Copy link
Member

tylerlittlefield commented Mar 10, 2019

This is exactly what the package needs! What is the use case for sanitize.rx_string, wouldn't it already be sanitized?

@dmi3kno
Copy link
Author

dmi3kno commented Mar 10, 2019

Exactly the point. It does nothing. It is needed to prevent sanitization. For rx_string objects sanitize.rx_string class method is called before sanitize.character. I will put together end to end example later today.

@tylerlittlefield tylerlittlefield added the enhancement New feature or request label Mar 10, 2019
@dmi3kno
Copy link
Author

dmi3kno commented Mar 10, 2019

Here's S3 method for sanitize

sanitize <- function(.data = NULL){
  UseMethod("sanitize", .data) 
}

sanitize.rx_string <- function(.data = NULL) {
  .data
}
# your function, unchanged, although we will need to rethink the error message now
sanitize.default <- function(.data = NULL) {
  if(missing(.data))
    stop("The 'value' argument is missing. Did you forget to start the rx chain with rx()", .call = FALSE)
  escape_chrs <- c(".", "|", "*", "?", "+", "(", ")", "{", "}", "^", "$", "\\", ":", "=", "[", "]")
  string_chrs <- strsplit(.data, "")[[1]]
  idx <- which(string_chrs %in% escape_chrs)
  idx_new <- paste0("\\", string_chrs[idx])
  paste0(replace(string_chrs, idx, idx_new), collapse = "")
}

rx_literal("?@") %>% sanitize()
#> [1] "\\?@"
#> attr(,"class")
#> [1] "rx_string" "character"

"?@" %>% sanitize()
#> [1] "\\?@"

I simplified the funcitons above assuming method dispatch for sanitize

@tylerlittlefield
Copy link
Member

tylerlittlefield commented Mar 10, 2019

Nice, I recently changed sanitize to:

sanitize <- function(.data = NULL) {
  if(missing(.data))
    stop("The 'value' argument is missing. Did you forget to start the rx chain with rx()?")
  esc <- c(".", "|", "*", "?", "+", "(", ")", "{", "}", "^", "$", "\\", ":", "=", "[", "]")
  gsub(paste0("([\\", paste0(collapse = "\\", esc), "])"), "\\\\\\1", .data, perl = TRUE)
}

The gsub call comes from rex. It's a bit more cryptic but also more concise. Do you prefer one over the other?

Regarding the error message, maybe something like "Nothing to sanitize, please provide a character vector or rx call."

@dmi3kno
Copy link
Author

dmi3kno commented Mar 10, 2019

I am fine with gsub if you promise to maintain it. Six escape characters are horrifying, but you might be very well right. I am indifferent,

@tylerlittlefield
Copy link
Member

Haha, yes it does look scary. I mainly just assumed its bullet proof given who wrote the package. Let's stick with gsub.

@dmi3kno
Copy link
Author

dmi3kno commented Mar 10, 2019

To do here:

  • Implement method dispatch for sanitize. Unexport it.
  • Make a template for method dispatch in multi-argument functions using rx_literal as example (also for two-argument functions, i.e. without ...).

@dmi3kno
Copy link
Author

dmi3kno commented Mar 15, 2019

Still relevant, because we really need to go away from the need to use rx() constructor

@tylerlittlefield
Copy link
Member

tylerlittlefield commented Mar 15, 2019

I've been trying to figure out how to avoid rx() and allow nested rx_ functions that have the ... but I can't come up with anything. At this point, if users want to nest things, throw a dot in front.

rx() %>% 
  rx_either_of(
    rx_find(., "foo"),
    rx_find(., "bar")
  )

@tylerlittlefield
Copy link
Member

One lazy solution I've came up with is:

`~` <- function(...) {
  args <- as.list(sys.call())[-1]
  args <- paste0("rx() %>% ", args)
  args <- sapply(args, function(x) eval(parse(text = x)), USE.NAMES = FALSE)
  paste0(args, collapse = "")
}

vals <- `~`

# but keep in mind
fortunes::fortune("answer is parse")

This allows something like:

# with tilde
rx() %>%
  rx_group(
    ~ rx_either_of("cat", "dog")
  ) %>%
  rx_literal(" food") %>%
  stringr::str_extract(c("cat food", "dog food", "fish food"), .)
#> [1] "cat food" "dog food" NA

# with vals
rx() %>%
  rx_group(
    vals(rx_either_of("cat", "dog"))
  ) %>%
  rx_literal(" food") %>%
  stringr::str_extract(c("cat food", "dog food", "fish food"), .)
#> [1] "cat food" "dog food" NA

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants