Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change in pivot_wider behavior and error messages when using column numbers for id_cols #1541

Open
bpcolman opened this issue Feb 15, 2024 · 1 comment

Comments

@bpcolman
Copy link

bpcolman commented Feb 15, 2024

In going through some old code that I hadn't touched in a while, I found that pivot_wider was not working and was giving me the following error:

#> Error in `rethrow_id_cols_oob()`:
#> ! `i` must be a single string, not an integer vector.
#> ℹ This is an internal error that was detected in the tidyr package.
#>   Please report it at <https://github.com/tidyverse/tidyr/issues> with a reprex
#>   (<https://tidyverse.org/help/>) and the full backtrace.

Digging deeper, I think I've sorted out the overarching timing of the issue, and it is due to changes made between tidyr 1.1.4 and 1.2.0 that changes the behavior, and between versions 1.2.1 and 1.3.0 where the error message changed to the above. I've included a reprex below that shows behavior in ver 1.1.4 and 1.3.1:

# Install `dplyr` and `devtools` if they aren't already:
list.of.packages <- c("dplyr", "devtools")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)

# Load dplyr and devtools packages
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(devtools)
#> Loading required package: usethis


# Record installed version of tidyr if there is one, to allow reinstallation of
# the current version, or provide an error if there is not one:
tidy_ver <- packageVersion("tidyr") # record the version number of existing tidyr

detach("package:tidyr", unload = TRUE)
#> Error in detach("package:tidyr", unload = TRUE): invalid 'name' argument

remove.packages("tidyr")
#> Removing package from '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
#> (as 'lib' is unspecified)

install_version("tidyr", 
                version = "1.1.4", 
                repos = "http://cran.us.r-project.org",
                quiet = TRUE)

library(tidyr)

# Set a seed for consistency
set_seed = 312

# Make some data (albeit with not the tidiest structure)

(df <- data.frame(site = c(rep((1:3), each = 2)),
                 time = rep(c("day", "night"), 3),
                 temp = rnorm(6, rep(c(20, 10), each = 1), 4),
                 day_of_exp = c(rep(1, 2), 
                                rep(11, 2),
                                rep(21, 2))))
#>   site  time      temp day_of_exp
#> 1    1   day 24.681337          1
#> 2    1 night  2.646830          1
#> 3    2   day 19.473059         11
#> 4    2 night 10.124520         11
#> 5    3   day 23.304546         21
#> 6    3 night  7.414085         21

# Pivot data based on site (column 1) and day_of_exp (column 4) using column
# numbers:

df %>% pivot_wider(id_cols = c(1, 4),
                   names_from = time,
                   values_from = temp)
#> # A tibble: 3 × 4
#>    site day_of_exp   day night
#>   <int>      <dbl> <dbl> <dbl>
#> 1     1          1  24.7  2.65
#> 2     2         11  19.5 10.1 
#> 3     3         21  23.3  7.41

# Works well. We can also Pivot data based on site and day_of_exp using
# column names:

df %>% pivot_wider(id_cols = c(site, day_of_exp),
                   names_from = time,
                   values_from = temp)
#> # A tibble: 3 × 4
#>    site day_of_exp   day night
#>   <int>      <dbl> <dbl> <dbl>
#> 1     1          1  24.7  2.65
#> 2     2         11  19.5 10.1 
#> 3     3         21  23.3  7.41

# If for some reason, we include a variable that doesn't exist in the dataset,
# we get a simple error message:

df %>%
  pivot_wider(id_cols = c(1, 5),
              names_from = time,
              values_from = temp)
#> Error in `pivot_wider_spec()`:
#> ! Can't subset columns past the end.
#> ℹ Location 5 doesn't exist.
#> ℹ There are only 4 columns.

# So that gives an error message, and the error message makes it clear that the
# error is caused by choosing a column that doesn't exist. 

# Working with `tidyr` v. 1.3.1, things don't work in the same way.

# Detach and uninstall `tidyr`1.1.3 and install and load 1.3.1:

detach("package:tidyr", unload = TRUE)
remove.packages("tidyr")
#> Removing package from '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
#> (as 'lib' is unspecified)

install_version("tidyr", 
                version = "1.3.1", 
                repos = "http://cran.us.r-project.org", 
                quiet = TRUE)

library("tidyr")

# In ver 1.3.1, se can try to pivot data wider based on site (column 1) and
# day_of_exp (column 4) using column numbers:

df %>% pivot_wider(id_cols = c(1, 4),
                   names_from = time,
                   values_from = temp)
#> Error in `rethrow_id_cols_oob()`:
#> ! `i` must be a single string, not an integer vector.
#> ℹ This is an internal error that was detected in the tidyr package.
#>   Please report it at <https://github.com/tidyverse/tidyr/issues> with a reprex
#>   (<https://tidyverse.org/help/>) and the full backtrace.

# It does not work, and we get a new error message. We can still use names:

df %>% pivot_wider(id_cols = c(site, day_of_exp),
                   names_from = time,
                   values_from = temp)
#> # A tibble: 3 × 4
#>    site day_of_exp   day night
#>   <int>      <dbl> <dbl> <dbl>
#> 1     1          1  24.7  2.65
#> 2     2         11  19.5 10.1 
#> 3     3         21  23.3  7.41

# If we include a variable that doesn't exist in the dataset, we get the same
# error message as when we tried to call the 4th column:

df %>%
  pivot_wider(id_cols = c(1, 5),
              names_from = time,
              values_from = temp)
#> Error in `rethrow_id_cols_oob()`:
#> ! `i` must be a single string, not an integer vector.
#> ℹ This is an internal error that was detected in the tidyr package.
#>   Please report it at <https://github.com/tidyverse/tidyr/issues> with a reprex
#>   (<https://tidyverse.org/help/>) and the full backtrace.

# It appears that the cause of this error when calling "id_cols" by number is
# that pivot_wider calls the columns from an object where the columns that are
# called in the "names_from" or "values_from" arguments have already been
# removed, leaving a object in this case that that has dimensions 6 x 2 instead
# of 6 x 4. This is corroborated if we call columns 1 and 2, which should call
# our site and day_of_exp variables if time and temp are removed:

df %>%
  pivot_wider(id_cols = c(1, 2),
              names_from = time,
              values_from = temp)
#> # A tibble: 3 × 4
#>    site day_of_exp   day night
#>   <int>      <dbl> <dbl> <dbl>
#> 1     1          1  24.7  2.65
#> 2     2         11  19.5 10.1 
#> 3     3         21  23.3  7.41

# This works, suggesting that this is where the error lies. 

# Detach library, remove tidyr, and reinstall original version if it was already
# installed:

detach("package:tidyr", unload = TRUE)

remove.packages("tidyr")
#> Removing package from '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
#> (as 'lib' is unspecified)

install_version("tidyr", 
                version = tidy_ver, 
                repos = "http://cran.us.r-project.org",
                quiet = TRUE)

Created on 2024-02-15 with reprex v2.1.0

A workaround is to use the column names in the id_cols argument; however, given older versions of the pivot_wider function allowed the use of column numbers and there is an increased potential for error when entering names, it would be nice if pivot_wider functioned like it did in tidyr 1.1.4 and before. I would also advocate for a return to the more descriptive error message when choosing variables that don't exist if that is possible, and including a descriptive warning/error if one attempts to choose a variable that is included in the names_from or values_from arguments.

If the removal of straightforward selection by column number is a feature and not a bug, I would request an update for the help and vignette to make this even clearer.

@rion-saeon
Copy link

rion-saeon commented Apr 11, 2024

In my case, I could specify id_cols like this and it worked. All indexes (col numbers) did not (i.e., when month_n is specified as 8)

pivot_wider(id_cols = 
                c(1:4, month_n), 
              names_from = "records_or_cards", 
              values_from = value)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants