Exposing cell content in list-columns? #892

Open
courtiol opened this issue May 25, 2021 · 6 comments

@courtiol

When displaying tibbles with list-columns, it would be nice to be able to give a glimpse of the content within each cell.
For example, if the width is sufficient, instead of:

> tibble::tibble(x = list(1:2, 1:100))
# A tibble: 2 x 1
  x          
  <list>     
1 <int [2]>  
2 <int [100]>

it would be great to have something like what str() produces:

> tibble::tibble(x = list(1:2, 1:100))
# A tibble: 2 x 1
  x          
  <list>     
1 int [1:2] 1 2
2 int [1:100] 1 2 3 4 5 6 7 8 9 10 ...

I guess this could be done by defining one's own class and pillar method, but I think that it would be useful for any tibble.
Perhaps whether to expose the content or not could be set with a global formatting option.

A motivation is that it could play well with dplyr::summarise() when using functions that do not return scalars:

> iris %>%
+   group_by(Species) %>%
+   summarise(range = list(range(Sepal.Length)),
+             quartiles = list(quantile(Sepal.Length)))
# A tibble: 3 x 3
  Species    range     quartiles
  <fct>      <list>    <list>   
1 setosa     <dbl [2]> <dbl [5]>
2 versicolor <dbl [2]> <dbl [5]>
3 virginica  <dbl [2]> <dbl [5]>

A difficulty is that any kind of content can be nested within a cell, not just vectors, but perhaps specific displays could be set up for the main classes.

This is probably an issue for pillar, but the motivation is the display of tibbles.

@krlmlr
Member

krlmlr commented Jun 9, 2021

Thanks. I think a way to move forward could indeed be the creation of a custom class that applies the desired formatting. If this is useful and stable, we might incorporate a variant in pillar.

@courtiol
Author

courtiol commented Jun 9, 2021

I don't know anything about pillar & vctrs, so I don't know how stable the code below may be, but here is a simple proof of concept:

list_col <- function(x) {
  vctrs::new_vctr(x, class = "list_col")
}

formatter_list_element <- function(x, width) {
  start_txt <- "<"
  end_txt <- ">"
  ptype_txt  <-  vctrs::vec_ptype_abbr(x) # note: not working if element is not a vector (e.g. a function), do we care?
  context_text <- ifelse(length(x) > 0,
                         paste0(" [", length(x), "] ",
                                toString(x,
                                         width = width - nchar(ptype_txt) - nchar(paste0("<[]>", length(x))))),
                         "")
  paste0(start_txt, ptype_txt, context_text, end_txt)
}

format.list_col <- function(x, ..., width = 25, formater = formatter_list_element) {
  res <- purrr::map_chr(x, ~  formater(.x, width))
  format(res, justify = "left")
}

vec_ptype_abbr.list_col <- function(x) {
  "list-col"
}

pillar_shaft.list_col <- function(x, ...) {
  out <- format(x, width = 25) # how to define width?
  pillar::new_pillar_shaft_simple(out, min_width = 10) # what should min_width be?
}

## Example 1:
x <- list(1:2, TRUE, NA, NULL, 1.3, list(1, b = 2:10), matrix(1:9, nrow = 3))
     
y <- list_col(x)

tibble::tibble(x = x, y = y)
#> # A tibble: 7 x 2
#>   x                y                          
#>   <list>           <list-col>                 
#> 1 <int [2]>        <int [2] 1, 2>             
#> 2 <lgl [1]>        <lgl [1] TRUE>             
#> 3 <lgl [1]>        <lgl [1] NA>               
#> 4 <NULL>           <NULL>                     
#> 5 <dbl [1]>        <dbl [1] 1.3>              
#> 6 <named list [2]> <named list [2] 1, 2:10>   
#> 7 <int [3 × 3]>    <int[,3] [9] 1, 2, 3, ....>

# note: display could be improved:
# - in console, colors are not consistent
# - matrix dims are weird
# - named lists don't show names

## Example 2:
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
iris %>%
  group_by(Species) %>%
  summarise(range = list_col(list(range(Sepal.Length))),
            quartiles = list_col(list(quantile(Sepal.Length))))
#> # A tibble: 3 x 3
#>   Species    range              quartiles                  
#>   <fct>      <list-col>         <list-col>                 
#> 1 setosa     <dbl [2] 4.3, 5.8> <dbl [5] 4.3, 4.8, 5, ....>
#> 2 versicolor <dbl [2] 4.9, 7>   <dbl [5] 4.9, 5.6, 5.9....>
#> 3 virginica  <dbl [2] 4.9, 7.9> <dbl [5] 4.9, 6.225, 6....>

Created on 2021-06-09 by the reprex package (v2.0.0)
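
On the two open questions in pillar_shaft.list_col() above (how to define width, and what min_width should be), here is a minimal sketch of one possible approach, assuming pillar's two-step protocol in which the shaft declares a preferred and a minimum width and a format() method for the shaft class later receives the width actually granted. The helpers cell_short() and cell_long() and the class name "pillar_shaft_list_col" are hypothetical names made up for this illustration; this is not how pillar itself handles list-columns.

```r
# Constructor from the proof of concept above
list_col <- function(x) vctrs::new_vctr(x, class = "list_col")

# Hypothetical helpers: short form shows type and size only,
# long form adds the contents
cell_short <- function(x) paste0("<", pillar::obj_sum(x), ">")
cell_long <- function(x) {
  if (length(x) == 0L) return(cell_short(x))
  paste0("<", pillar::obj_sum(x), " ", toString(x), ">")
}

pillar_shaft.list_col <- function(x, ...) {
  cells <- vctrs::vec_data(x)
  short <- vapply(cells, cell_short, character(1))
  long <- vapply(cells, cell_long, character(1))
  pillar::new_pillar_shaft(
    list(short = short, long = long),
    width = pillar::get_max_extent(long),      # preferred width
    min_width = pillar::get_max_extent(short), # smallest acceptable width
    class = "pillar_shaft_list_col"
  )
}

# Receives the width granted to the column and picks the form that fits
format.pillar_shaft_list_col <- function(x, width, ...) {
  out <- if (pillar::get_max_extent(x$long) <= width) x$long else x$short
  pillar::new_ornament(out, align = "left")
}
```

With this, the contents are shown only when the whole column fits in the available width; otherwise the column falls back to the usual type-and-size display. In a package, both methods would need to be registered as S3 methods rather than defined in the global environment.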

@krlmlr
Member

krlmlr commented Jun 9, 2021

Nice!

@hadley: What do you think?

@hadley
Member

hadley commented Jun 14, 2021

Seems like a reasonable idea, but I'd want to see a fuller exploration of what would be displayed for types other than atomic vectors.

@courtiol
Author

Default outputs for non-atomic vectors

Redefining formatter_list_element() above as:

formatter_list_element <- function(x, width) {
  ptype_txt  <- pillar::obj_sum(x)
  context_text <- ifelse(length(x) > 0,
                         paste0(" ", toString(x, width = width - nchar(ptype_txt) - 3L)),
                         "")
  paste0("<", ptype_txt, context_text, ">")
}

to benefit from the dimensions and ptype extracted by pillar::obj_sum(),
and increasing the width in pillar_shaft.list_col() to 50 to show more of the output here,

pillar_shaft.list_col <- function(x, ...) {
  out <- format(x, width = 50)
  pillar::new_pillar_shaft_simple(out, min_width = 10)
}

we get the following for types other than atomic vectors:

> x <- list(a = matrix(1:9, nrow = 3), b = array(1:27, dim = c(3, 3, 3)), c = list(z = 1, zz = list(1, 2)))
> y <- list_col(x)
> tibble::tibble(x = x, y = y)
# A tibble: 3 x 2
  x                 y                                                 
  <named list>      <list-col>                                        
1 <int [3 × 3]>     <int [3 × 3] 1, 2, 3, 4, 5, 6, 7, 8, 9>           
2 <int [3 × 3 × 3]> <int [3 × 3 × 3] 1, 2, 3, 4, 5, 6, 7, 8, 9, 1....>
3 <named list [2]>  <named list [2] 1, list(1, 2)> 

The first 2 rows are not that different from what str() (and thus glimpse()) does:

> str(x)
List of 3
 $ a: int [1:3, 1:3] 1 2 3 4 5 6 7 8 9
 $ b: int [1:3, 1:3, 1:3] 1 2 3 4 5 6 7 8 9 10 ...
 $ c:List of 2
  ..$ z : num 1
  ..$ zz:List of 2
  .. ..$ : num 1
  .. ..$ : num 2

The list looks quite different, since it is compacted into a single row for the display of the tibble.
As lists are Pandora's boxes, perhaps we could also opt not to reveal their guts...

For fun, I tried a list of objects of class lm:

> iris %>%
+   group_nest(Species) %>%
+   rowwise() %>%
+   summarise(lm = list(lm(Sepal.Length ~ Petal.Length, data = data))) %>%
+   mutate(lm = list_col(lm))
`summarise()` has ungrouped output. You can override using the `.groups` argument.
# A tibble: 3 x 1
  lm                                                                                                  
  <list-col>                                                                                          
1 <lm c(`(Intercept)` = 4.21316822303424, Petal.Length = 0.542292597103803), c(`1` = 0.1276221410....>
2 <lm c(`(Intercept)` = 2.40752310536045, Petal.Length = 0.828280961182994), c(`1` = 0.6995563770....>
3 <lm c(`(Intercept)` = 1.05965909090909, Petal.Length = 0.995738636363637), c(`1` = -0.734090909....>

That could certainly be improved but that shows that it should be possible to deal with various classes of non-atomic vectors.

Improved outputs via methods for toString()

If the default outputs are not good enough, perhaps we could build on the fact that toString() is a generic function.
We could thus try to define specific methods for objects that aren't atomic vectors.

For example, we could imagine a toy method for arrays as follows:

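toString.array <- function(x, width = NULL, ...) {
  # Split the width budget across columns; NULL means no truncation
  col_width <- if (is.null(width)) NULL else floor(width / ncol(x))
  cols <- apply(x, 2, \(col) toString(col, width = col_width))
  toString(paste0("{", paste(cols, collapse = "}{"), "}"), width)
}

yielding: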

# A tibble: 3 x 2
  x                 y                                                                                                                          
  <named list>      <list-col>                                                                                                                 
1 <int [3 × 3]>     <int [3 × 3] {1, 2, 3}{4, 5, 6}{7, 8, 9}>                                                                                  
2 <int [3 × 3 × 3]> <int [3 × 3 × 3] {{1, 2, 3}{10, 11, 12}{19, 20, 21}}{{4, 5, 6}{13, 14, 15}{22, 23, 24}}{{7, 8, 9}{16, 17, 18}{25, 26, 27}}>
3 <named list [2]>  <named list [2] 1, list(1, 2)> 

or

# A tibble: 3 x 2
  x                 y                                                 
  <named list>      <list-col>                                        
1 <int [3 × 3]>     <int [3 × 3] {1, 2, 3}{4, 5, 6}{7, 8, 9}>         
2 <int [3 × 3 × 3]> <int [3 × 3 × 3] {{1,.......}{{4,.......}{{7,....>
3 <named list [2]>  <named list [2] 1, list(1, 2)> 

depending on the width argument for toString().

That could certainly be improved, but it shows that authors of other packages could implement their own toString() methods to control how their classes are displayed in a list-column, without having to define vctrs classes.
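
As an illustration of that last point, here is a minimal sketch of what a package author might write for lm objects, so that the formatter above would show rounded coefficients instead of the deparsed internals. The method below is hypothetical (base R does not define toString.lm()), and the choice of three significant digits is arbitrary.

```r
# Hypothetical toString() method for lm objects: summarise a fit by its
# coefficients, then let the default method handle width truncation
toString.lm <- function(x, width = NULL, ...) {
  coefs <- stats::coef(x)
  txt <- paste0(names(coefs), " = ", signif(coefs, 3), collapse = ", ")
  toString(txt, width = width, ...)
}

fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
toString(fit)              # full coefficient summary
toString(fit, width = 20)  # truncated to at most 20 characters
```

Because formatter_list_element() calls toString() on each cell, defining such a method is enough to change the display of lm objects in a list_col column.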

@krlmlr krlmlr added this to the 3.1.4 milestone Jul 29, 2021
@krlmlr
Member

krlmlr commented Aug 6, 2021

Thanks. I think the easiest way to start is to expand the contents only for elements where is_bare_atomic() holds, showing only the first three values of each such element, and only if there's space. I'll take a look in pillar.
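
A rough sketch of that rule, for discussion only: it uses a base-R stand-in for rlang::is_bare_atomic(), and typeof() in place of pillar::obj_sum() so that it runs without extra packages. The function names preview_cell() and is_bare_atomic_base() and the default width of 25 are assumptions made for this example, not pillar's implementation.

```r
# Base-R stand-in for rlang::is_bare_atomic():
# atomic, not NULL, and without a class attribute
is_bare_atomic_base <- function(x) {
  !is.null(x) && is.atomic(x) && !is.object(x)
}

# Hypothetical helper: preview one list-column cell
preview_cell <- function(x, width = 25L, n = 3L) {
  summary <- paste0(typeof(x), " [", length(x), "]")
  # Anything that is not a bare atomic vector keeps the plain summary
  if (!is_bare_atomic_base(x) || length(x) == 0L) {
    return(paste0("<", summary, ">"))
  }
  # Show at most the first n values
  shown <- paste(as.character(x[seq_len(min(n, length(x)))]), collapse = ", ")
  if (length(x) > n) shown <- paste0(shown, ", ...")
  expanded <- paste0("<", summary, " ", shown, ">")
  # Fall back to the plain summary when the expansion does not fit
  if (nchar(expanded) <= width) expanded else paste0("<", summary, ">")
}

preview_cell(1:2)                  # "<integer [2] 1, 2>"
preview_cell(1:100)                # "<integer [100]>" (expansion too wide)
preview_cell(1:100, width = 30L)   # "<integer [100] 1, 2, 3, ...>"
preview_cell(list(1, 2))           # "<list [2]>"
```

Lists, data frames, models and other classed objects are left unexpanded here, which sidesteps the question of how to display non-atomic contents.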

@krlmlr krlmlr removed this from the 3.1.4 milestone Aug 6, 2021
@krlmlr krlmlr added this to the 3.1.7 milestone Dec 25, 2021