unnest_wider unstable with respect to row order when elements have duplicated names #1501

Open
miketcassidy opened this issue Jun 5, 2023 · 1 comment
Labels
bug an unexpected problem or unintended behavior

@miketcassidy

I've encountered what seems to be a bug in unnest_wider.

Consider the following setup: You have a tibble where one of the columns contains a nested list, the elements of which may include duplicate names repeated a variable number of times across observations (as in a nested XML document).

If the "larger" observation (the one with the most duplicate-named list elements across all observations) comes first, unnest_wider with names_sep and names_repair flattens the data the way it should.

However, if the "smaller" observation comes first, unnest_wider drops all but the first instance of the duplicate-named list element, deleting information that should be flattened into additional distinct columns.

See below for a simple example.


Brief description of the problem

library(tidyverse)

df1 <- tibble(
  a = 1:2,
  b = list(list(c = list(3), c = list(4)), list(c = list(5)))
)
df1
#> # A tibble: 2 × 2
#>       a b               
#>   <int> <list>          
#> 1     1 <named list [2]>
#> 2     2 <named list [1]>

df1 |>
  tidyr::unnest_wider(b, names_sep = "_", names_repair = "unique") |>
  tidyr::unnest_wider(contains("c"), names_sep = "_", names_repair = "unique")
#> New names:
#> • `b_c` -> `b_c...2`
#> • `b_c` -> `b_c...3`
#> # A tibble: 2 × 3
#>       a b_c...2_1 b_c...3_1
#>   <int>     <dbl>     <dbl>
#> 1     1         3         4
#> 2     2         5         5


df1[c(2, 1), ] |>
  tidyr::unnest_wider(b, names_sep = "_", names_repair = "unique") |>
  tidyr::unnest_wider(contains("c"), names_sep = "_", names_repair = "unique")
#> # A tibble: 2 × 2
#>       a b_c_1
#>   <int> <dbl>
#> 1     2     5
#> 2     1     3
@miketcassidy miketcassidy changed the title unnest_wider unstable with respect to row order unnest_wider unstable with respect to row order Jun 5, 2023
@hadley
Member

hadley commented Nov 1, 2023

This reprex is sufficient to illustrate the problem, right?

library(tidyverse)

df <- tibble(
  a = 1:2,
  b = list(
    list(c = list(3), c = list(4)),
    list(c = list(5))
  )
)
df
#> # A tibble: 2 × 2
#>       a b               
#>   <int> <list>          
#> 1     1 <named list [2]>
#> 2     2 <named list [1]>

df |>
  unnest_wider(b, names_sep = "_", names_repair = "unique")
#> New names:
#> • `b_c` -> `b_c...2`
#> • `b_c` -> `b_c...3`
#> # A tibble: 2 × 3
#>       a b_c...2    b_c...3   
#>   <int> <list>     <list>    
#> 1     1 <list [1]> <list [1]>
#> 2     2 <list [1]> <list [1]>

df[c(2, 1), ] |>
  unnest_wider(b, names_sep = "_") 
#> # A tibble: 2 × 2
#>       a b_c       
#>   <int> <list>    
#> 1     2 <list [1]>
#> 2     1 <list [1]>

Created on 2023-11-01 with reprex v2.0.2

It seems likely that it's the duplicated names causing problems here.
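
If duplicated names are indeed the cause, one possible workaround (not a fix for unnest_wider itself) is to make the names unique within each list element before unnesting. The sketch below assumes that approach. `dedup_names()` is a hypothetical helper that is not part of tidyr, and the `_1` suffix scheme comes from base R's `make.unique()`.

```r
library(tidyverse)

df <- tibble(
  a = 1:2,
  b = list(
    list(c = list(3), c = list(4)),
    list(c = list(5))
  )
)

# Hypothetical helper (not part of tidyr): make the names of one list
# element unique, so c, c becomes c, c_1.
dedup_names <- function(x) {
  names(x) <- make.unique(names(x), sep = "_")
  x
}

# Use the row order that loses data in the reprex above.
fixed <- df[c(2, 1), ] |>
  mutate(b = map(b, dedup_names)) |>
  unnest_wider(b, names_sep = "_")

fixed
```

Because every element now has unique names before unnest_wider sees it, the set of columns should no longer depend on which row comes first. Rows with fewer repeats should get missing values in the extra columns.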

@hadley hadley added the bug an unexpected problem or unintended behavior label Nov 1, 2023
@hadley hadley changed the title unnest_wider unstable with respect to row order unnest_wider unstable with respect to row order when elements have duplicated names Nov 1, 2023
@hadley hadley changed the title unnest_wider unstable with respect to row order when elements have duplicated names unnest_wider unstable with respect to row order when elements have duplicated names Nov 1, 2023