crossing() adds missing factor levels (either a bug or a documentation issue) #1526

Open
billdenney opened this issue Oct 28, 2023 · 0 comments

When crossing() is given a factor whose levels are not all present in the data, it adds the missing levels to the output. My guess is that this is an intentional feature and that the documentation should clarify it. The expansion was unexpected when I used crossing(), since it added values that do not appear in any of the inputs.

The observed behavior is below, and I expected the second behavior in both cases.

I think the following documentation changes would have made the behavior expected:

Here, add "and the nesting() and crossing() helpers" after "complete()":

tidyr/R/expand.R

Lines 36 to 38 in ad62841

#' When used with factors, [expand()] and [complete()] use the full set of
#' levels, not just those that appear in the data. If you want to use only the
#' values seen in the data, use `forcats::fct_drop()`.

And, in the expand_grid() documentation, point back to the fact that it uses expand(), perhaps by adding something like the following to the details section of the expand_grid() docs: "expand_grid() uses expand() to generate all combinations."

library(tidyr)

# factor levels are expanded
crossing(a = c("A", "B"), b = factor(c("a", "b"), levels = c("a", "b", "c")))
#> # A tibble: 6 × 2
#>   a     b    
#>   <chr> <fct>
#> 1 A     a    
#> 2 A     b    
#> 3 A     c    
#> 4 B     a    
#> 5 B     b    
#> 6 B     c
# factor levels are not expanded
crossing(
  a = c("A", "B"),
  data.frame(b = factor(c("a", "b"), levels = c("a", "b", "c")))
)
#> # A tibble: 4 × 2
#>   a     b    
#>   <chr> <fct>
#> 1 A     a    
#> 2 A     b    
#> 3 B     a    
#> 4 B     b

Created on 2023-10-28 with reprex v2.0.2
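
For anyone who hits this in the meantime, here is a minimal sketch of the workaround that the existing expand() documentation suggests, applied to crossing(). It assumes the forcats package is installed; the variable names `b` and `observed_only` are only illustrative.

```r
library(tidyr)

# A factor that declares level "c" but only contains "a" and "b"
b <- factor(c("a", "b"), levels = c("a", "b", "c"))

# Drop the unused level before crossing so that only observed values are combined
observed_only <- crossing(a = c("A", "B"), b = forcats::fct_drop(b))

nrow(observed_only)
levels(observed_only$b)
```

This should give the same four rows as the data.frame case above, with "c" no longer among the levels of `b`. Base R's `droplevels()` should work in place of `forcats::fct_drop()` if avoiding the extra dependency matters.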

If helpful, I can make the documentation PR.
