Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

more functions should be generic #1071

Open
1 of 7 tasks
mgirlich opened this issue Dec 7, 2020 · 6 comments · May be fixed by #1101
Open
1 of 7 tasks

more functions should be generic #1071

mgirlich opened this issue Dec 7, 2020 · 6 comments · May be fixed by #1101
Labels
feature a feature request or enhancement nesting 🐦 nesting, chopping, and packing pivoting ♻️ pivot rectangular data to different "shapes" rectangling 🗄️ converting deeply nested lists into tidy data frames

Comments

@mgirlich
Copy link
Contributor

mgirlich commented Dec 7, 2020

while bringing some functions from tidyr to dbplyr I noticed that a couple of tidyr functions should probably be generic:

  • uncount()

the pivot helpers would make sense as the dbplyr implemention of pivot_*() uses them anyway

  • build_longer_spec()/build_wider_spec()
  • pivot_longer_spec()/pivot_wider_spec()

data frame column helpers would make sense as nested tables in SQL are becoming more common (e.g. see tidyverse/dbplyr#515)

  • pack()/unpack()
  • unnest_longer()/unnest_wider()

list column helpers: not sure if they are needed but for completeness I also listed them here:

  • chop()/unchop()
  • hoist()
@abalter

This comment has been minimized.

@hadley hadley added feature a feature request or enhancement pivoting ♻️ pivot rectangular data to different "shapes" labels Feb 17, 2021
@hadley

This comment has been minimized.

@DavisVaughan
Copy link
Member

As we've refined these new-ish verbs in the 1.2.0 release (tweaking defaults, adding arguments), it has been extremely helpful that they aren't generic so we don't have to deal with S3 methods in other packages needing changes (unlike some issues in the generic pivot_longer() and pivot_wider() #1291). I would encourage holding out as long as we can on this to really stabilize all of these verbs.

@mgirlich
Copy link
Contributor Author

mgirlich commented Jan 4, 2022

@DavisVaughan I see your point. Though, for lazy tables it would make sense to have pivot_wider_spec() be generic. Alternatively, they might have to export something like dbplyr_pivot_wider_spec() temporarily.

The issue with pivot_wider() is that the result has to be collected in order to know which columns to generate. Therefore, pivot_wider() cannot be lazy. In pivot_wider_spec() this is not an issue because we already know the columns. There are some issues in dbplyr: tidyverse/dbplyr#703 and tidyverse/dbplyr#598

@abalter
Copy link

abalter commented Mar 12, 2023

Is there any chance nesting and pivoting will be available for lazy tables? Particularly database connections?

@abalter
Copy link

abalter commented Mar 12, 2023

@hadley -- Your comment sounded to me as though nesting is currently implemented. However, I can't get it to work. Am I missing something?

library(RPostgreSQL)
#> Loading required package: DBI
library(RMySQL)
#> 
#> Attaching package: 'RMySQL'
#> The following object is masked from 'package:RPostgreSQL':
#> 
#>     dbApply
library(dbplyr)
library(tidyverse)

con = DBI::dbConnect(RSQLite::SQLite(), ":memory:")

t = tibble(
  A = sample(letters[1:5], 20, replace=T),
  B=sample(1:5, 20, replace=T),
  C = rnorm(20)
)

lf = lazy_frame(
  A = sample(letters[1:5], 20, replace=T),
  B=sample(1:5, 20, replace=T),
  C = rnorm(20)
)

copy_to(con, t, "test", overwrite = T)

dbListTables(con)
#> [1] "sqlite_stat1" "sqlite_stat4" "test"

t %>%
  nest(data = -A) %>%
  slice_sample(n=2) %>%
  unnest(cols = data)
#> # A tibble: 6 x 3
#>   A         B      C
#>   <chr> <int>  <dbl>
#> 1 c         5  0.233
#> 2 c         1  1.62 
#> 3 c         1 -0.373
#> 4 c         4  0.275
#> 5 c         3 -1.47 
#> 6 b         5 -0.958

lf %>%
  nest(data = -A) %>%
  slice_sample(n=2) %>%
  unnest(cols = data)
#> Error in UseMethod("nest"): no applicable method for 'nest' applied to an object of class "c('tbl_TestConnection', 'tbl_lazy', 'tbl')"

tdb = tbl(con, "test")
tdb %>%
  nest(data = -A)
#> Error in UseMethod("nest"): no applicable method for 'nest' applied to an object of class "c('tbl_SQLiteConnection', 'tbl_dbi', 'tbl_sql', 'tbl_lazy', 'tbl')"

Created on 2023-03-12 with reprex v2.0.2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature a feature request or enhancement nesting 🐦 nesting, chopping, and packing pivoting ♻️ pivot rectangular data to different "shapes" rectangling 🗄️ converting deeply nested lists into tidy data frames
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants