Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

collector4 #2

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

collector4 #2

wants to merge 3 commits into from

Conversation

moodymudskipper
Copy link
Collaborator

This is a general approach that should work on any closure, S3 Generic or not.
We don't use quosures, just lazy bindings.

I believe it was wrong to start from the execution environment, the caller is where we have all we need, combined with the call itself.

We do a clone of the caller and its parents up to a named env (the global env has no name so it will be cloned too), with the difference that we populate these envs purely with lazy bindings.

On exit we cleanup, and keep only regular bindings, the assumption being that bindings that are still lazy were not used, which I can only imagine being wrong in very artificial cases.

Semi manual reprex:

library(dplyr, w = F)

mutate
#> function (.data, ...) 
#> {
#>     collect_and_rethrow()
#>     {
#>         UseMethod("mutate")
#>     }
#> }
#> <environment: namespace:dplyr>
#> attr(,"unmodified")
#> function(.data, ...) {
#>   UseMethod("mutate")
#> }
#> <bytecode: 0x10adfe398>
#> <environment: namespace:dplyr>

starwars %>%
  select(name, mass) %>%
  mutate(
    mass2 = mass * 2,
    mass2_squared = mass2 * mass2
  )
#> # A tibble: 87 × 4
#>    name                mass mass2 mass2_squared
#>    <chr>              <dbl> <dbl>         <dbl>
#>  1 Luke Skywalker        77   154         23716
#>  2 C-3PO                 75   150         22500
#>  3 R2-D2                 32    64          4096
#>  4 Darth Vader          136   272         73984
#>  5 Leia Organa           49    98          9604
#>  6 Owen Lars            120   240         57600
#>  7 Beru Whitesun lars    75   150         22500
#>  8 R5-D4                 32    64          4096
#>  9 Biggs Darklighter     84   168         28224
#> 10 Obi-Wan Kenobi        77   154         23716
#> # ℹ 77 more rows

mtcars %>%
  group_by(cyl) %>%
  summarise(disp = mean(disp), sd = sd(disp))
#> # A tibble: 3 × 3
#>     cyl  disp    sd
#>   <dbl> <dbl> <dbl>
#> 1     4  105.    NA
#> 2     6  183.    NA
#> 3     8  353.    NA

qs::qsave(dplyr:::globals$archive, "archive.qs")
#> Warning in qs::qsave(dplyr:::globals$archive, "archive.qs"): 'package:dplyr'
#> may not be available when loading

#> Warning in qs::qsave(dplyr:::globals$archive, "archive.qs"): 'package:dplyr'
#> may not be available when loading

.rs.restartR()
archive <- qs::qread("archive.qs")

eval(archive$call[[1]], archive$env[[1]])
#> # A tibble: 87 × 4
#>    name                mass mass2 mass2_squared
#>    <chr>              <dbl> <dbl>         <dbl>
#>  1 Luke Skywalker        77   154         23716
#>  2 C-3PO                 75   150         22500
#>  3 R2-D2                 32    64          4096
#>  4 Darth Vader          136   272         73984
#>  5 Leia Organa           49    98          9604
#>  6 Owen Lars            120   240         57600
#>  7 Beru Whitesun lars    75   150         22500
#>  8 R5-D4                 32    64          4096
#>  9 Biggs Darklighter     84   168         28224
#> 10 Obi-Wan Kenobi        77   154         23716
#> # ℹ 77 more rows

eval(archive$call[[2]], archive$env[[2]])
#> # A tibble: 3 × 3
#>     cyl  disp    sd
#>   <dbl> <dbl> <dbl>
#> 1     4  105.    NA
#>.2     6  183.    NA
#> 3     8  353.    NA

@moodymudskipper moodymudskipper mentioned this pull request Apr 5, 2024
@krlmlr
Copy link

krlmlr commented Apr 10, 2024

Have you committed the right collector.R here?

@moodymudskipper
Copy link
Collaborator Author

It believe so, I just checked again, does the code fail ?

@krlmlr
Copy link

krlmlr commented Apr 10, 2024

My mistake, it works:

options(conflicts.policy = list(warn = FALSE))
library(dplyr)

mutate
#> function (.data, ...) 
#> {
#>     collect_and_rethrow()
#>     {
#>         UseMethod("mutate")
#>     }
#> }
#> <environment: namespace:dplyr>
#> attr(,"unmodified")
#> function(.data, ...) {
#>   UseMethod("mutate")
#> }
#> <bytecode: 0x117814ed0>
#> <environment: namespace:dplyr>

factor <- 2

starwars %>%
  select(name, mass) %>%
  mutate(
    mass2 = mass * factor,
    mass2_squared = mass2 * mass2
  )
#> # A tibble: 87 × 4
#>    name                mass mass2 mass2_squared
#>    <chr>              <dbl> <dbl>         <dbl>
#>  1 Luke Skywalker        77   154         23716
#>  2 C-3PO                 75   150         22500
#>  3 R2-D2                 32    64          4096
#>  4 Darth Vader          136   272         73984
#>  5 Leia Organa           49    98          9604
#>  6 Owen Lars            120   240         57600
#>  7 Beru Whitesun lars    75   150         22500
#>  8 R5-D4                 32    64          4096
#>  9 Biggs Darklighter     84   168         28224
#> 10 Obi-Wan Kenobi        77   154         23716
#> # ℹ 77 more rows

fun <- function() {
  three <- 3

  mtcars %>%
    group_by(cyl) %>%
    summarise(disp = mean(disp), sd = sd(disp), high = three + factor)
}

fun()
#> # A tibble: 3 × 4
#>     cyl  disp    sd  high
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     4  105.    NA     5
#> 2     6  183.    NA     5
#> 3     8  353.    NA     5

suppressWarnings(qs::qsave(dplyr:::globals$archive, "archive.qs"))

callr::r(function() {
  archive <- qs::qread("archive.qs")

  list(
    eval(archive$call[[1]], archive$env[[1]]),
    eval(archive$call[[2]], archive$env[[2]])
  )
})
#> [[1]]
#> # A tibble: 87 × 4
#>    name                mass mass2 mass2_squared
#>    <chr>              <dbl> <dbl>         <dbl>
#>  1 Luke Skywalker        77   154         23716
#>  2 C-3PO                 75   150         22500
#>  3 R2-D2                 32    64          4096
#>  4 Darth Vader          136   272         73984
#>  5 Leia Organa           49    98          9604
#>  6 Owen Lars            120   240         57600
#>  7 Beru Whitesun lars    75   150         22500
#>  8 R5-D4                 32    64          4096
#>  9 Biggs Darklighter     84   168         28224
#> 10 Obi-Wan Kenobi        77   154         23716
#> # ℹ 77 more rows
#> 
#> [[2]]
#> # A tibble: 3 × 4
#>     cyl  disp    sd  high
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     4  105.    NA     5
#> 2     6  183.    NA     5
#> 3     8  353.    NA     5

Created on 2024-04-10 with reprex v2.1.0

rlang::eval_bare(bquote(on.exit(env_cleanup(.(new_caller_env)))), parent.frame())

call[[1]] <- attr(sys.function(-1), "unmodified")
rlang::return_from(parent.frame(), eval(call, new_caller_env))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return_from() is overkill, we can have eval(call, new_caller_env) and remove the original body in the curried function.

@@ -119,6 +119,7 @@
summarise <- function(.data, ..., .groups = NULL) {
UseMethod("summarise")
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops

mutate <<- set_collector(mutate)
summarise <<- set_collector(summarise)
summarize <<- set_collector(summarize)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once in a package we can do this in place from outside, and we can easily have a function to reverse the operation, very similar to untrace()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
2 participants