pivot_wider: unexpected behavior specifying id_cols to omit that are included in names/values_from #1506

Open
jwhendy opened this issue Jun 28, 2023 · 8 comments
Labels
feature a feature request or enhancement pivoting ♻️ pivot rectangular data to different "shapes"

@jwhendy

jwhendy commented Jun 28, 2023

I was running pivot_wider on some data and was surprised by the inability to use -c(col1, col2) to choose my id_cols, resulting in the error:

Error in `pivot_wider()`:
`id_cols` can't select a column already selected by `names_from`.
Column `type` has already been selected.

Repro:

library(dplyr)
library(tidyr)

tmp <- data.frame(id1 = c("a", "a", "b", "b"),
                  unused = c(NA, NA, NA, NA),
                  type = c("c", "d", "c", "d"),
                  values = c(1, 2, 3, 4))

Base case:

tmp %>% pivot_wider(id_cols = id1, names_from = type, values_from = values)

# A tibble: 2 × 3
  id1       c     d
  <chr> <dbl> <dbl>
1 a         1     2
2 b         3     4

But say you had a lot of columns; it's more concise to remove a few than to name them all. Neither of these works; both produce the error above:

tmp %>% pivot_wider(id_cols = -c(type, values, unused), names_from = type, values_from = values)
tmp %>% pivot_wider(id_cols = c(-type, -values, -unused), names_from = type, values_from = values)

My failure mode may be covered by this statement from the docs:

id_cols [...] Defaults to all columns in data except for the columns specified through names_from and values_from. If a tidyselect expression is supplied, it will be evaluated on data after removing the columns specified through names_from and values_from.

This is why I included the "unused" column: with many columns, one has to think, "OK, type and values are removed 'for free' because they're used in other arguments, but I still need to remember to remove the other columns."

tmp %>% pivot_wider(id_cols = -unused, names_from = type, values_from = values)

# A tibble: 2 × 3
  id1       c     d
  <chr> <dbl> <dbl>
1 a         1     2
2 b         3     4
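To make the documented rule concrete, here is a rough emulation of it using tidyselect::eval_select() directly. It is a sketch, not tidyr's actual implementation: it assumes id_cols is evaluated on the data after the names_from/values_from columns (type, values) have been dropped, which is what the quoted docs describe and what the select_wider_id_cols() → eval_select() frames in the error backtrace suggest. Under that assumption, -unused succeeds while -c(type, values, unused) fails, because type no longer exists in the data being selected from.

```r
library(tidyselect)

tmp <- data.frame(id1 = c("a", "a", "b", "b"),
                  unused = c(NA, NA, NA, NA),
                  type = c("c", "d", "c", "d"),
                  values = c(1, 2, 3, 4))

# Assumption: mimic the docs by dropping the names_from/values_from
# columns before evaluating the id_cols selection.
remaining <- tmp[setdiff(names(tmp), c("type", "values"))]

# Dropping only the extra column works: `unused` exists in `remaining`.
kept <- eval_select(quote(-unused), remaining)
names(kept)  # "id1"

# Naming `type` fails: it is no longer a column of `remaining`, so
# tidyselect raises an out-of-bounds error (which tidyr rethrows as
# "already selected by `names_from`").
err <- tryCatch(eval_select(quote(-c(type, values, unused)), remaining),
                error = function(e) e)
inherits(err, "error")  # TRUE
```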

Thoughts:

  • this is a bug, in that there should be no problem specifying columns to drop, even if they are implicitly dropped by being passed to names_from or values_from
  • this is not a bug, but the documentation could be improved. I was confused by the message "Column type has already been selected": selected by what, and how? It was non-intuitive to me that the column counts as "selected" when I'm explicitly trying not to select it as an id_col
  • this is not a bug, and the documentation is perfectly clear. I admittedly don't use pivot functions that often, so it could be a misunderstanding on my part.
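One user-side workaround, hedged and based only on the documented behavior quoted above: tidyselect's any_of() silently skips names that aren't present, so a negative any_of() selection can list type and values alongside the extra columns without tripping the error. This is a sketch for callers, not a proposed fix for pivot_wider() itself.

```r
library(tidyr)

tmp <- data.frame(id1 = c("a", "a", "b", "b"),
                  unused = c(NA, NA, NA, NA),
                  type = c("c", "d", "c", "d"),
                  values = c(1, 2, 3, 4))

# any_of() ignores names that don't exist in the data id_cols is
# evaluated on, so listing `type` and `values` here is harmless.
wide <- pivot_wider(tmp,
                    id_cols = -any_of(c("type", "values", "unused")),
                    names_from = type,
                    values_from = values)
wide
```

The result should match the base case: one row per id1 and one column each for c and d.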
@hadley
Member

hadley commented Nov 1, 2023

Could you please rework your reproducible example to use the reprex package? That makes it easier to see both the input and the output, formatted in such a way that I can easily re-run it in a local session. Thanks!

@hadley hadley added the reprex needs a minimal reproducible example label Nov 1, 2023
@jwhendy
Author

jwhendy commented Nov 1, 2023

@hadley I admit I haven't used this before, so hopefully the below is the RightWay. I don't see much advantage, other than that the setup code and the failing code are no longer separated by a line of my commentary.

Here's a reprex for the case that surprised me:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)

tmp <- data.frame(id1 = c("a", "a", "b", "b"),
                  unused = c(NA, NA, NA, NA),
                  type = c("c", "d", "c", "d"),
                  values = c(1, 2, 3, 4))
tmp %>% pivot_wider(id_cols = -c(type, values, unused), names_from = type, values_from = values)
#> Error in `pivot_wider()`:
#> ! `id_cols` can't select a column already selected by `names_from`.
#> ℹ Column `type` has already been selected.
#> Backtrace:
#>      ▆
#>   1. ├─tmp %>% ...
#>   2. ├─tidyr::pivot_wider(...)
#>   3. ├─tidyr:::pivot_wider.data.frame(., id_cols = -c(type, values, unused), names_from = type, values_from = values)
#>   4. │ └─tidyr:::build_wider_id_cols_expr(...)
#>   5. │   └─tidyr:::select_wider_id_cols(...)
#>   6. │     ├─rlang::try_fetch(...)
#>   7. │     │ └─base::withCallingHandlers(...)
#>   8. │     └─tidyselect::eval_select(...)
#>   9. │       └─tidyselect:::eval_select_impl(...)
#>  10. │         ├─tidyselect:::with_subscript_errors(...)
#>  11. │         │ └─rlang::try_fetch(...)
#>  12. │         │   └─base::withCallingHandlers(...)
#>  13. │         └─tidyselect:::vars_select_eval(...)
#>  14. │           └─tidyselect:::walk_data_tree(expr, data_mask, context_mask)
#>  15. │             └─tidyselect:::eval_minus(expr, data_mask, context_mask, error_call)
#>  16. │               └─tidyselect:::eval_bang(expr, data_mask, context_mask)
#>  17. │                 └─tidyselect:::walk_data_tree(expr[[2]], data_mask, context_mask)
#>  18. │                   └─tidyselect:::eval_c(expr, data_mask, context_mask)
#>  19. │                     └─tidyselect:::reduce_sels(node, data_mask, context_mask, init = init)
#>  20. │                       └─tidyselect:::walk_data_tree(new, data_mask, context_mask)
#>  21. │                         └─tidyselect:::as_indices_sel_impl(...)
#>  22. │                           └─tidyselect:::as_indices_impl(...)
#>  23. │                             └─tidyselect:::chr_as_locations(x, vars, call = call, arg = arg)
#>  24. │                               └─vctrs::vec_as_location(...)
#>  25. ├─vctrs (local) `<fn>`()
#>  26. │ └─vctrs:::stop_subscript_oob(...)
#>  27. │   └─vctrs:::stop_subscript(...)
#>  28. │     └─rlang::abort(...)
#>  29. │       └─rlang:::signal_abort(cnd, .file)
#>  30. │         └─base::signalCondition(cnd)
#>  31. ├─rlang (local) `<fn>`(`<vctrs___>`)
#>  32. │ └─handlers[[1L]](cnd)
#>  33. │   └─rlang::cnd_signal(cnd)
#>  34. │     └─rlang:::signal_abort(cnd)
#>  35. │       └─base::signalCondition(cnd)
#>  36. └─rlang (local) `<fn>`(`<vctrs___>`)
#>  37.   └─handlers[[1L]](cnd)
#>  38.     └─tidyr:::rethrow_id_cols_oob(...)
#>  39.       └─tidyr:::stop_id_cols_oob(i, "names_from", call = call)
#>  40.         └─cli::cli_abort(...)
#>  41.           └─rlang::abort(...)
Created on 2023-11-01 with [reprex v2.0.2](https://reprex.tidyverse.org/)

@hadley
Member

hadley commented Nov 1, 2023

@jwhendy it doesn't look like you copied and pasted it correctly. And it does make my life easier having all the code in one block, because there's just one thing to copy and paste, rather than having to stitch together multiple pieces. You can also remove the call to dplyr, because that doesn't seem necessary.

library(tidyr)

tmp <- data.frame(
  id1 = c("a", "a", "b", "b"),
  unused = c(NA, NA, NA, NA),
  type = c("c", "d", "c", "d"),
  values = c(1, 2, 3, 4)
)
tmp |> pivot_wider(id_cols = -c(type, values, unused), names_from = type, values_from = values)
#> Error in `pivot_wider()`:
#> ! `id_cols` can't select a column already selected by `names_from`.
#> ℹ Column `type` has already been selected.

Created on 2023-11-01 with reprex v2.0.2

The error message doesn't seem correct because you're not actually selecting type; you're unselecting it.

@hadley hadley added feature a feature request or enhancement pivoting ♻️ pivot rectangular data to different "shapes" and removed reprex needs a minimal reproducible example labels Nov 1, 2023
@jwhendy
Author

jwhendy commented Nov 2, 2023

Understood, and I don't want to be a hassle! Boy, the docs are confusing (edit: not a good word choice; I just mean that, given how simple the steps are, it's a bummer not to have gotten the right result) for how simple this is supposed to be!

> Let’s say you copy this code onto your clipboard (or, on RStudio Server or Cloud, select it):

I selected this, then copied it to the clipboard.

library(dplyr)
library(tidyr)

tmp <- data.frame(id1 = c("a", "a", "b", "b"),
                  unused = c(NA, NA, NA, NA),
                  type = c("c", "d", "c", "d"),
                  values = c(1, 2, 3, 4))
tmp %>% pivot_wider(id_cols = -c(type, values, unused), names_from = type, values_from = values)

> Then call reprex(), where the default target venue is GitHub:

So I ran reprex() in the RStudio R console.

> The relevant bit of GitHub-flavored Markdown is ready to be pasted from your clipboard.

Coming back here to paste:

tmp %>% pivot_wider(id_cols = -c(type, values, unused), names_from = type, values_from = values)
#> Error in tmp %>% pivot_wider(id_cols = -c(type, values, unused), names_from = type, : could not find function "%>%"
library(tidyr)

tmp <- data.frame(id1 = c("a", "a", "b", "b"),
                  unused = c(NA, NA, NA, NA),
                  type = c("c", "d", "c", "d"),
                  values = c(1, 2, 3, 4))
tmp %>% pivot_wider(id_cols = -c(type, values, unused), names_from = type, values_from = values)
#> Error in `pivot_wider()`:
#> ! `id_cols` can't select a column already selected by `names_from`.
#> ℹ Column `type` has already been selected.
#> Backtrace:
#>      ▆
#>   1. ├─tmp %>% ...

### snipped for brevity, but same as above

#>  41.           └─rlang::abort(...)

<sup>Created on 2023-11-01 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>

At least the output seems reproducible twice in a row :)

> You can also remove the call to dplyr, because that doesn't seem necessary.

My bad. I thought %>% came from dplyr.

> The error message doesn't seem correct because you're not actually selecting type; you're unselecting it.

That was my thinking, though I regularly consider myself in noob territory despite having used R for a long time! At the least, I thought the message could clarify why this is problematic. The example seems trivial, but at the time there were a bunch of columns, so I'd much rather have written id_cols = -c(start_col:end_col).
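For the range case, one possible approach (again a sketch; unused:values is a stand-in for the hypothetical start_col:end_col) is to resolve the range against the full data first, where every column still exists, and then pass the resulting names to -any_of() so that columns already consumed by names_from/values_from are skipped.

```r
library(tidyr)
library(tidyselect)

tmp <- data.frame(id1 = c("a", "a", "b", "b"),
                  unused = c(NA, NA, NA, NA),
                  type = c("c", "d", "c", "d"),
                  values = c(1, 2, 3, 4))

# Resolve the range on the full data frame, where `type` and `values`
# still exist; `unused:values` stands in for start_col:end_col.
excluded <- names(eval_select(quote(unused:values), tmp))
excluded  # "unused" "type" "values"

# any_of() then tolerates the names pivot_wider() has already removed.
wide <- pivot_wider(tmp,
                    id_cols = -any_of(excluded),
                    names_from = type,
                    values_from = values)
```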

Thanks for taking a look!

@hadley
Member

hadley commented Nov 2, 2023

Do you have the latest version of reprex? And how are you running R? (e.g. in RStudio on your desktop?)

@jwhendy
Author

jwhendy commented Nov 2, 2023

Would you like me to create a ticket in the reprex repo? I just installed it yesterday. After installing, I wasn't sure if any loaded environment objects would goof things up, so my process was:

  • install.packages("reprex")
  • quit/restart RStudio
  • put the code above into a scratch .Rmd file, select it, and copy it (Cmd+C)
  • reprex()
  • copy the output here
> sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.6.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tibble_3.2.1     vroom_1.6.1      readr_2.1.4      writexl_1.4.2    stringr_1.5.0    readxl_1.4.2     openxlsx_4.2.5.2 odbc_1.3.4      
 [9] dotenv_1.0.3     DBI_1.1.3        tidyr_1.3.0      dplyr_1.1.2      reprex_2.0.2    

loaded via a namespace (and not attached):
 [1] zip_2.3.0        Rcpp_1.0.10      cellranger_1.1.0 compiler_4.2.3   pillar_1.9.0     tools_4.2.3      digest_0.6.31    bit_4.0.5       
 [9] evaluate_0.20    lifecycle_1.0.3  pkgconfig_2.0.3  rlang_1.1.1      cli_3.6.1        rstudioapi_0.14  yaml_2.3.7       xfun_0.39       
[17] fastmap_1.1.1    withr_2.5.0      knitr_1.42       generics_0.1.3   fs_1.6.1         vctrs_0.6.3      hms_1.1.3        bit64_4.0.5     
[25] tidyselect_1.2.0 glue_1.6.2       R6_2.5.1         processx_3.8.2   fansi_1.0.4      rmarkdown_2.21   tzdb_0.3.0       purrr_1.0.1     
[33] callr_3.7.3      clipr_0.8.0      blob_1.2.4       magrittr_2.0.3   ps_1.7.5         htmltools_0.5.5  utf8_1.2.3       stringi_1.7.12  
[41] crayon_1.5.2  
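One way to take the clipboard out of the loop entirely (assuming a standard reprex install where rmarkdown and pandoc are available) is to pass the code to reprex() directly as a braced expression, so nothing has to be selected and copied first. This sketch uses only reprex()'s x, venue and html_preview arguments; the rendered Markdown is also returned invisibly as a character vector.

```r
library(reprex)

# Passing the code as a braced expression skips the copy-to-clipboard step.
out <- reprex(
  {
    library(tidyr)
    tmp <- data.frame(id1 = c("a", "a", "b", "b"),
                      unused = c(NA, NA, NA, NA),
                      type = c("c", "d", "c", "d"),
                      values = c(1, 2, 3, 4))
    pivot_wider(tmp, id_cols = -c(type, values, unused),
                names_from = type, values_from = values)
  },
  venue = "gh",
  html_preview = FALSE
)
```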

@hadley
Member

hadley commented Nov 2, 2023

@jwhendy yes please!

@jwhendy
Author

jwhendy commented Nov 11, 2023

@hadley Sorry for the delay, but 'tis done. Thanks!
