`unnest_wider` unstable with respect to row order when elements have duplicated names #1501

miketcassidy · 2023-06-05T19:20:39Z

I've encountered what seems to be a bug in unnest_wider.

Consider the following setup: you have a tibble in which one column contains nested lists whose elements may share a name, with the number of repeats varying across observations (as in a nested XML document).

If the "larger" observation (the one with the most repeated, identically named list elements across all observations) comes first, unnest_wider with names_sep and names_repair flattens the data as it should.

However, if the "smaller" observation comes first, unnest_wider drops all but the first instance of the repeated, identically named list element, deleting information that should be flattened into additional distinct columns.

See below for a simple example.

Brief description of the problem

library(tidyverse)

df1 = tibble(
  a=1:2,
  b=list( list(c=list(3),c=list(4)), list(c=list(5)) ) )
df1
#> # A tibble: 2 × 2
#>       a b               
#>   <int> <list>          
#> 1     1 <named list [2]>
#> 2     2 <named list [1]>

df1 |> 
  tidyr::unnest_wider(b,names_sep ="_",names_repair = 'unique') |>
  tidyr::unnest_wider(contains("c"),names_sep ="_",names_repair = 'unique')
#> New names:
#> • `b_c` -> `b_c...2`
#> • `b_c` -> `b_c...3`
#> # A tibble: 2 × 3
#>       a b_c...2_1 b_c...3_1
#>   <int>     <dbl>     <dbl>
#> 1     1         3         4
#> 2     2         5         5


df1[c(2,1),] |> 
  tidyr::unnest_wider(b,names_sep ="_",names_repair = 'unique') |>
  tidyr::unnest_wider(contains("c"),names_sep ="_",names_repair = 'unique')
#> # A tibble: 2 × 2
#>       a b_c_1
#>   <int> <dbl>
#> 1     2     5
#> 2     1     3
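
To see which observations are affected before unnesting, a quick check along these lines may help. It is only a sketch built on the same example data; `has_dup_names` is a name introduced here for illustration, and the check uses base R only.

```r
library(tibble)

df1 <- tibble(
  a = 1:2,
  b = list(list(c = list(3), c = list(4)), list(c = list(5)))
)

# For each row, flag whether the nested list repeats any element name.
# anyDuplicated() returns 0 when there are no duplicates.
has_dup_names <- vapply(
  df1$b,
  function(x) anyDuplicated(names(x)) > 0L,
  logical(1)
)
has_dup_names
```

Rows flagged `TRUE` are the ones whose extra elements appear to be at risk when a row without duplicates comes first.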

hadley · 2023-11-01T19:06:40Z

This reprex is sufficient to illustrate the problem, right?

library(tidyverse)

df <- tibble(
  a = 1:2,
  b = list(
    list(c = list(3), c = list(4)),
    list(c = list(5))
  )
)
df
#> # A tibble: 2 × 2
#>       a b               
#>   <int> <list>          
#> 1     1 <named list [2]>
#> 2     2 <named list [1]>

df |>
  unnest_wider(b, names_sep = "_", names_repair = "unique")
#> New names:
#> • `b_c` -> `b_c...2`
#> • `b_c` -> `b_c...3`
#> # A tibble: 2 × 3
#>       a b_c...2    b_c...3   
#>   <int> <list>     <list>    
#> 1     1 <list [1]> <list [1]>
#> 2     2 <list [1]> <list [1]>

df[c(2, 1), ] |>
  unnest_wider(b, names_sep = "_") 
#> # A tibble: 2 × 2
#>       a b_c       
#>   <int> <list>    
#> 1     2 <list [1]>
#> 2     1 <list [1]>

Created on 2023-11-01 with reprex v2.0.2

It seems likely that it's the duplicated names causing problems here.
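
If duplicated names are indeed the cause, one possible workaround (not a fix for the underlying behaviour) is to make the names unique within each list element before calling unnest_wider, so that the result no longer depends on which row comes first. The sketch below assumes that approach; `dedupe_names` is a hypothetical helper, not part of tidyr, and it relies on base R's `make.unique()`, so the resulting column names differ from those produced by `names_repair = "unique"`.

```r
library(tibble)
library(tidyr)

df <- tibble(
  a = 1:2,
  b = list(
    list(c = list(3), c = list(4)),
    list(c = list(5))
  )
)

# Hypothetical helper: rename repeated names within one list element,
# e.g. c("c", "c") becomes c("c", "c_1").
dedupe_names <- function(x) {
  names(x) <- make.unique(names(x), sep = "_")
  x
}

fixed <- df
fixed$b <- lapply(fixed$b, dedupe_names)

# Unnest in both row orders; with unique names inside every element,
# both calls should produce the same set of columns.
out_original <- unnest_wider(fixed, b, names_sep = "_")
out_reversed <- unnest_wider(fixed[c(2, 1), ], b, names_sep = "_")

setequal(names(out_original), names(out_reversed))
```

Rows that lack one of the repeated elements get a missing entry in the corresponding column rather than losing data from the other rows.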

miketcassidy changed the title ~~unnest_wider unstable with respect to row order~~ unnest_wider unstable with respect to row order Jun 5, 2023

hadley added the bug an unexpected problem or unintended behavior label Nov 1, 2023

hadley changed the title ~~unnest_wider unstable with respect to row order~~ unnest_wider unstable with respect to row order when elements have duplicated names Nov 1, 2023

hadley changed the title ~~unnest_wider unstable with respect to row order when elements have duplicated names~~ unnest_wider unstable with respect to row order when elements have duplicated names Nov 1, 2023
