Exposing cell content in list-columns? #892

courtiol · 2021-05-25T17:12:12Z

When displaying tibbles with list-columns, it would be nice to be able to give a glimpse of the content within each cell.
For example, if the width is sufficient, instead of:

> tibble::tibble(x = list(1:2, 1:100))
# A tibble: 2 x 1
  x          
  <list>     
1 <int [2]>  
2 <int [100]>

it would be great to have something like what str() produces:

> tibble::tibble(x = list(1:2, 1:100))
# A tibble: 2 x 1
  x          
  <list>     
1 int [1:2] 1 2
2 int [1:100] 1 2 3 4 5 6 7 8 9 10 ...

I guess this could be done by defining one's own class and pillar method, but I think that it would be useful for any tibble.
Perhaps whether to expose the content or not could be controlled by a global formatting option.
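
As a rough illustration of the str()-style display requested above, one-line summaries can be produced in base R by capturing what str() prints for each element. This is only a sketch: `str_preview()` is a hypothetical helper name, and this is not how pillar formats list-columns.

```r
# Hypothetical helper: one str()-style line per list element (base R only)
str_preview <- function(x) {
  vapply(
    x,
    function(el) {
      # str() prints rather than returns, so capture its output;
      # keep only the first line for nested objects such as lists
      trimws(utils::capture.output(utils::str(el))[1])
    },
    character(1)
  )
}

previews <- str_preview(list(1:2, 1:100))
previews[1]
#> [1] "int [1:2] 1 2"
```

Keeping only the first line of the str() output is a deliberate simplification: nested lists would otherwise span several rows, which does not fit in a single tibble cell.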

A motivation is that it could play well with dplyr::summarise() when using functions that do not return scalars:

> iris %>%
+   group_by(Species) %>%
+   summarise(range = list(range(Sepal.Length)),
+             quartiles = list(quantile(Sepal.Length)))
# A tibble: 3 x 3
  Species    range     quartiles
  <fct>      <list>    <list>   
1 setosa     <dbl [2]> <dbl [5]>
2 versicolor <dbl [2]> <dbl [5]>
3 virginica  <dbl [2]> <dbl [5]>

A difficulty is that any kind of object can be nested within a cell, not just vectors, but perhaps specific displays could be set up for the main classes.

This is probably an issue for pillar, but the motivation is the display of tibbles.


krlmlr · 2021-06-09T04:27:19Z

Thanks. I think a way to move forward could indeed be the creation of a custom class that applies the desired formatting. If this is useful and stable, we might incorporate a variant in pillar.

courtiol · 2021-06-09T15:36:04Z

I don't know anything about pillar & vctrs, so I don't know how stable the code below is, but here is a simple proof of concept:

list_col <- function(x) {
  vctrs::new_vctr(x, class = "list_col")
}

formatter_list_element <- function(x, width) {
  start_txt <- "<"
  end_txt <- ">"
  ptype_txt  <-  vctrs::vec_ptype_abbr(x) # note: not working if element is not a vector (e.g. a function), do we care?
  context_text <- ifelse(length(x) > 0,
                         paste0(" [", length(x), "] ",
                                toString(x,
                                         width = width - nchar(ptype_txt) - nchar(paste0("<[]>", length(x))))),
                         "")
  paste0(start_txt, ptype_txt, context_text, end_txt)
}

format.list_col <- function(x, ..., width = 25, formatter = formatter_list_element) {
  res <- purrr::map_chr(x, ~ formatter(.x, width))
  format(res, justify = "left")
}

vec_ptype_abbr.list_col <- function(x) {
  "list-col"
}

pillar_shaft.list_col <- function(x, ...) {
  out <- format(x, width = 25) # how to define width?
  pillar::new_pillar_shaft_simple(out, min_width = 10) # what should min_width be?
}

## Example 1:
x <- list(1:2, TRUE, NA, NULL, 1.3, list(1, b = 2:10), matrix(1:9, nrow = 3))
     
y <- list_col(x)

tibble::tibble(x = x, y = y)
#> # A tibble: 7 x 2
#>   x                y                          
#>   <list>           <list-col>                 
#> 1 <int [2]>        <int [2] 1, 2>             
#> 2 <lgl [1]>        <lgl [1] TRUE>             
#> 3 <lgl [1]>        <lgl [1] NA>               
#> 4 <NULL>           <NULL>                     
#> 5 <dbl [1]>        <dbl [1] 1.3>              
#> 6 <named list [2]> <named list [2] 1, 2:10>   
#> 7 <int [3 × 3]>    <int[,3] [9] 1, 2, 3, ....>

# note: display could be improved:
# - in the console, colors are not consistent
# - matrix dimensions are displayed oddly
# - named lists don't show their names

## Example 2:
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
iris %>%
  group_by(Species) %>%
  summarise(range = list_col(list(range(Sepal.Length))),
            quartiles = list_col(list(quantile(Sepal.Length))))
#> # A tibble: 3 x 3
#>   Species    range              quartiles                  
#>   <fct>      <list-col>         <list-col>                 
#> 1 setosa     <dbl [2] 4.3, 5.8> <dbl [5] 4.3, 4.8, 5, ....>
#> 2 versicolor <dbl [2] 4.9, 7>   <dbl [5] 4.9, 5.6, 5.9....>
#> 3 virginica  <dbl [2] 4.9, 7.9> <dbl [5] 4.9, 6.225, 6....>

Created on 2021-06-09 by the reprex package (v2.0.0)

krlmlr · 2021-06-09T16:34:27Z

Nice!

@hadley: What do you think?

hadley · 2021-06-14T19:28:02Z

Seems like a reasonable idea, but I'd want to see a fuller exploration of what would be displayed for types other than atomic vector.

courtiol · 2021-06-15T11:13:26Z

Default outputs for non-atomic vectors

Redefining formatter_list_element() above as:

formatter_list_element <- function(x, width) {
  ptype_txt  <- pillar::obj_sum(x)
  context_text <- ifelse(length(x) > 0,
                         paste0(" ", toString(x, width = width - nchar(ptype_txt) - 3L)),
                         "")
  paste0("<", ptype_txt, context_text, ">")
}

to benefit from the dimensions and ptype extracted by pillar::obj_sum(),
and increasing the width in pillar_shaft.list_col() to 50 to show more of the output here,

pillar_shaft.list_col <- function(x, ...) {
  out <- format(x, width = 50)
  pillar::new_pillar_shaft_simple(out, min_width = 10)
}

we get the following for types other than atomic vectors:

> x <- list(a = matrix(1:9, nrow = 3), b = array(1:27, dim = c(3, 3, 3)), c = list(z = 1, zz = list(1, 2)))
> y <- list_col(x)
> tibble::tibble(x = x, y = y)
# A tibble: 3 x 2
  x                 y                                                 
  <named list>      <list-col>                                        
1 <int [3 × 3]>     <int [3 × 3] 1, 2, 3, 4, 5, 6, 7, 8, 9>           
2 <int [3 × 3 × 3]> <int [3 × 3 × 3] 1, 2, 3, 4, 5, 6, 7, 8, 9, 1....>
3 <named list [2]>  <named list [2] 1, list(1, 2)>

The first 2 rows are not that different from what str() (and thus glimpse()) does:

> str(x)
List of 3
 $ a: int [1:3, 1:3] 1 2 3 4 5 6 7 8 9
 $ b: int [1:3, 1:3, 1:3] 1 2 3 4 5 6 7 8 9 10 ...
 $ c:List of 2
  ..$ z : num 1
  ..$ zz:List of 2
  .. ..$ : num 1
  .. ..$ : num 2

The list looks quite different, since it is compacted into a single row for display in the tibble.
As lists are Pandora's boxes, perhaps we could also opt not to reveal their guts...

For fun, I tried a list of lm objects:

> iris %>%
+   group_nest(Species) %>%
+   rowwise() %>%
+   summarise(lm = list(lm(Sepal.Length ~ Petal.Length, data = data))) %>%
+   mutate(lm = list_col(lm))
`summarise()` has ungrouped output. You can override using the `.groups` argument.
# A tibble: 3 x 1
  lm                                                                                                  
  <list-col>                                                                                          
1 <lm c(`(Intercept)` = 4.21316822303424, Petal.Length = 0.542292597103803), c(`1` = 0.1276221410....>
2 <lm c(`(Intercept)` = 2.40752310536045, Petal.Length = 0.828280961182994), c(`1` = 0.6995563770....>
3 <lm c(`(Intercept)` = 1.05965909090909, Petal.Length = 0.995738636363637), c(`1` = -0.734090909....>

That could certainly be improved, but it shows that it should be possible to deal with various classes of non-atomic vectors.

Improved outputs via methods for `toString()`

If the default outputs are not good enough, perhaps we could build on the fact that toString() is a generic function.
We could thus define specific methods for objects that aren't atomic vectors.

For example, we could imagine a toy method for arrays as follows:

toString.array <- function(x, width = NULL, ...) {
  cols <- apply(x, 2, \(col) toString(col, width = floor(width/ncol(x))))
  toString(paste0("{", paste(cols, collapse = "}{"), "}"), width)
}

yielding:

# A tibble: 3 x 2
  x                 y                                                                                                                          
  <named list>      <list-col>                                                                                                                 
1 <int [3 × 3]>     <int [3 × 3] {1, 2, 3}{4, 5, 6}{7, 8, 9}>                                                                                  
2 <int [3 × 3 × 3]> <int [3 × 3 × 3] {{1, 2, 3}{10, 11, 12}{19, 20, 21}}{{4, 5, 6}{13, 14, 15}{22, 23, 24}}{{7, 8, 9}{16, 17, 18}{25, 26, 27}}>
3 <named list [2]>  <named list [2] 1, list(1, 2)>

or

# A tibble: 3 x 2
  x                 y                                                 
  <named list>      <list-col>                                        
1 <int [3 × 3]>     <int [3 × 3] {1, 2, 3}{4, 5, 6}{7, 8, 9}>         
2 <int [3 × 3 × 3]> <int [3 × 3 × 3] {{1,.......}{{4,.......}{{7,....>
3 <named list [2]>  <named list [2] 1, list(1, 2)>

depending on the width argument for toString().

That could certainly be improved, but it shows that authors of other packages could implement their own toString() methods to control how their classes are displayed in a list-column (without needing to define vctrs classes).
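
To make this concrete, here is a minimal sketch of such a method for lm objects, which summarises each model by its formula instead of dumping its components. The method `toString.lm()` is an assumption for illustration only (neither base R, stats nor pillar defines it), and it would only take effect with a formatter that calls toString(x, width = ...) on each element, as formatter_list_element() does above.

```r
# Hypothetical method: show an lm object by its formula (an assumption,
# not something base R, stats or pillar provides)
toString.lm <- function(x, width = NULL, ...) {
  # deparse() can return several strings for long formulas, so collapse them
  txt <- paste(deparse(stats::formula(x)), collapse = " ")
  # Delegate truncation to the default method for character vectors
  toString(txt, width = width, ...)
}

fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
toString(fit)
#> [1] "Sepal.Length ~ Petal.Length"
```

Delegating to the default method keeps the truncation behaviour consistent with the other cells when a width is supplied.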

krlmlr · 2021-08-06T04:06:05Z

Thanks. I think the easiest way to start is to expand the contents only for elements where is_bare_atomic() holds, to show only the first three elements, and to do so only if there's space. I'll take a look in pillar.
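
The rule described above could be sketched roughly as follows. This is an illustration under stated assumptions, not pillar's implementation: `is_bare_atomic_base()` is a hypothetical base-R stand-in for rlang::is_bare_atomic(), `preview_element()` is a hypothetical helper, and "if there's space" is approximated by a fixed `width` argument.

```r
# Hypothetical base-R stand-in for rlang::is_bare_atomic():
# an atomic vector type without a class attribute
is_bare_atomic_base <- function(x) {
  !is.object(x) &&
    typeof(x) %in% c("logical", "integer", "double", "complex", "character", "raw")
}

# Hypothetical helper: expand an element only if it is a bare atomic,
# show at most `n` values, and only if the result fits in `width`
preview_element <- function(x, width = 25, n = 3) {
  if (!is_bare_atomic_base(x)) {
    return(paste0("<", class(x)[1], ">"))
  }
  short <- paste0("<", typeof(x), " [", length(x), "]>")
  if (length(x) == 0) {
    return(short)
  }
  values <- paste(as.character(utils::head(x, n)), collapse = ", ")
  if (length(x) > n) {
    values <- paste0(values, ", ...")
  }
  long <- paste0("<", typeof(x), " [", length(x), "] ", values, ">")
  # Fall back to the compact summary when there is not enough space
  if (nchar(long) <= width) long else short
}

preview_element(1:2)
#> [1] "<integer [2] 1, 2>"
preview_element(1:100, width = 30)
#> [1] "<integer [100] 1, 2, 3, ...>"
preview_element(1:100, width = 25)
#> [1] "<integer [100]>"
```

Classed objects such as factors, and lists of any kind, are left unexpanded here, which sidesteps the question of how to display arbitrary nested content.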

krlmlr added this to the 3.1.4 milestone Jul 29, 2021

krlmlr removed this from the 3.1.4 milestone Aug 6, 2021

krlmlr mentioned this issue Dec 25, 2021

Expand contents of list columns with bare atomics r-lib/pillar#403

Open

krlmlr added this to the 3.1.7 milestone Dec 25, 2021
