`data_filter`: Add support for loop indices within functions? #309

rempsyc · 2022-11-06T01:03:24Z

Still within #301, I wonder if it would make sense to add support for loop indices within functions for data_filter, @etiennebacher?

library(datawizard)

df1 <- data.frame(
  id = c(1, 2, 3, 1, 3),
  item1 = c(NA, 1, 1, 2, 3),
  item2 = c(NA, 1, 1, 2, 3),
  item3 = c(NA, 1, 1, 2, 3)
)

# Attempt 1
fun <- function(data, id) {
  min.index <- NULL
  for (i in unique(data[[id]])) {
    min.index[i] <- 2
    x <- data_filter(data, item3 == min.index[i])
  }
  x
}
fun(df1, id = "id")
#> Error: Filtering did not work. Please check the syntax of your `filter`
#>   argument.

# Attempt 2, using quotes
fun <- function(data, id) {
  min.index <- NULL
  for (i in unique(data[[id]])) {
    min.index[i] <- 2
    x <- data_filter(data, "item3 == min.index[i]")
  }
  x
}
fun(df1, id = "id")
#> Error: Filtering did not work. Please check the syntax of your `filter`
#>   argument.

# Attempt 3, using curly brackets
fun <- function(data, id) {
  min.index <- NULL
  for (i in unique(data[[id]])) {
    min.index[i] <- 2
    x <- data_filter(data, item3 == min.index[{i}])
  }
  x
}
fun(df1, id = "id")
#> Error: Filtering did not work. Please check the syntax of your `filter`
#>   argument.

# Workaround is to create the index manually first
fun <- function(data, id) {
  min.index <- NULL
  for (i in unique(data[[id]])) {
    min.index[i] <- 2
    index <- which(data$item3 == min.index[i])
    x <- data_filter(data, index)
  }
  x
}
fun(df1, id = "id")
#>   id item1 item2 item3
#> 4  1     2     2     2

^{Created on 2022-11-05 with reprex v2.0.2}


strengejacke · 2022-11-06T09:27:20Z

Yeah, .select_nse() works fine, but looks somehow "unmaintainable" due to its confusing complexity...

I'm not sure, but in this particular case, data_filter(data, "item3 == min.index[i]"), could it be an issue of having the wrong environment when we evaluate the string? If so, there could be an "easy" solution, but this environment stuff, especially in combination with NSE, is still somewhat opaque to me.
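For what it's worth, here is a minimal base R sketch of the "wrong environment" idea for the string case. `filter_string_sketch()` is a hypothetical helper, not datawizard code. It parses the string and evaluates it with the data frame's columns first and the caller's frame second. That is only an assumption about what a fix could look like, not a description of how data_filter() works.

```r
# Hypothetical helper (not part of datawizard): parse a filter string and
# evaluate it in the data first, then in the frame of the calling function.
filter_string_sketch <- function(data, filter, env = parent.frame()) {
  condition <- parse(text = filter, keep.source = FALSE)[[1]]
  rows <- eval(condition, envir = data, enclos = env)
  # treat NA as "do not keep", as subset() does
  data[!is.na(rows) & rows, , drop = FALSE]
}

df1 <- data.frame(
  id = c(1, 2, 3, 1, 3),
  item1 = c(NA, 1, 1, 2, 3),
  item2 = c(NA, 1, 1, 2, 3),
  item3 = c(NA, 1, 1, 2, 3)
)

fun <- function(data, id) {
  min.index <- NULL
  for (i in unique(data[[id]])) {
    min.index[i] <- 2
    # `min.index` and `i` are local to fun(); they are found via `env`
    x <- filter_string_sketch(data, "item3 == min.index[i]")
  }
  x
}

result <- fun(df1, id = "id")
print(result)
```

The key point is the `enclos` argument of eval(): names that are not columns of the data are looked up in the frame of the function that called the helper, rather than in the helper's own frame.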

etiennebacher · 2022-11-06T09:41:53Z

The problem is that data_filter() tries to evaluate the condition directly, whereas here we would like to first evaluate min.index[i] to get its value, and then filter based on this value.

Currently, if the evaluation fails in data_filter(), we check whether the expression contains curly brackets, and if it doesn't, we throw an error. Supporting this kind of situation means that we would also need to evaluate the RHS of the condition before evaluating the condition itself. There could be a solution, but I think we could end up with very messy code, as in .select_nse().

@strengejacke what do you think?
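To illustrate "evaluate min.index[i] first, then filter", here is a small base R sketch using bquote(). It is only an illustration of the idea outside of data_filter(). The variable names are taken from the example above, and nothing here reflects how datawizard does or should implement it.

```r
df1 <- data.frame(
  id = c(1, 2, 3, 1, 3),
  item3 = c(NA, 1, 1, 2, 3)
)

min.index <- c(2, 2, 2)
i <- 3

# bquote() evaluates the part wrapped in .() right away and splices the
# resulting value into the expression, so the RHS no longer refers to
# `min.index` or `i`.
condition <- bquote(item3 == .(min.index[i]))
print(condition)

# The condition now only depends on columns of the data.
rows <- eval(condition, envir = df1)
filtered <- df1[which(rows), , drop = FALSE]
print(filtered)
```

Once the RHS has been replaced by its value, the environment in which the condition is evaluated no longer matters for `min.index` and `i`.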

strengejacke · 2023-06-16T06:20:37Z

I tried to debug this issue. I saw that at this line of code:

datawizard/R/data_match.R

Line 209 in 9b2e2b5

eval_symbol <- .dynEval(symbol, ifnotfound = NULL)

.dynEval() returns NULL for the expression item3 == min.index[i].

When it comes to subsetting:

datawizard/R/data_match.R

Lines 228 to 233 in 9b2e2b5

    
           # filter data 
        
           out <- tryCatch( 
        
             subset(out, subset = eval(symbol, envir = new.env())), 
        
             warning = function(e) e, 
        
             error = function(e) e 
        
           )

symbol is item3 == min.index[i], and subset() errors at this point. Even simpler variants of the example function do not work, like:

library(datawizard)

df1 <- data.frame(
  id = c(1, 2, 3, 1, 3),
  item1 = c(NA, 1, 1, 2, 3),
  item2 = c(NA, 1, 1, 2, 3),
  item3 = c(NA, 1, 1, 2, 3)
)

# Attempt 1
fun <- function(data, id) {
  min.index <- NULL
  for (i in unique(data[[id]])) {
    min.index <- 2
    x <- data_filter(data, item3 == min.index)
  }
  x
}
fun(df1, id = "id")
#> Error: Variable "min.index" was not found in the dataset.
#>   Possibly misspelled?

^{Created on 2023-06-16 with reprex v2.0.2}

Not sure how/if we can solve this?
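For comparison, the same simpler variant does work with base R's subset() when it is called directly inside the function, because subset() evaluates the condition in the data first and then in the calling frame. This is just a reference point in base R (`fun_base()` is a made-up name) and does not say how data_filter() should resolve it.

```r
df1 <- data.frame(
  id = c(1, 2, 3, 1, 3),
  item1 = c(NA, 1, 1, 2, 3),
  item2 = c(NA, 1, 1, 2, 3),
  item3 = c(NA, 1, 1, 2, 3)
)

# Made-up function name, base R only
fun_base <- function(data) {
  min.index <- 2
  # `item3` is found in `data`, `min.index` in the frame of fun_base()
  subset(data, item3 == min.index)
}

result <- fun_base(df1)
print(result)
```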

rempsyc changed the title ~~Add support for loop indices within functions?~~ data_filter: Add support for loop indices within functions? Nov 6, 2022

etiennebacher modified the milestone: 1.0 Dec 9, 2022

etiennebacher mentioned this issue May 30, 2023: evaluate variables without curleys #426 (Merged)
