
Feature request: Option to print both head and tail of tables? #651

Open
DarwinAwardWinner opened this issue Sep 28, 2023 · 6 comments


@DarwinAwardWinner

The S4Vectors package from Bioconductor implements an S4 class called DataFrame (which exists to allow S4 vectors as data frame columns, I believe). One of the nice features of this class is that when printing, it shows both the first and last few rows of the data frame, e.g.:

library(dplyr)
library(S4Vectors)
as(arrange(mtcars, cyl), "DataFrame")
#> DataFrame with 32 rows and 11 columns
#>                        mpg       cyl      disp        hp      drat        wt
#>                  <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
#> Datsun 710            22.8         4     108.0        93      3.85     2.320
#> Merc 240D             24.4         4     146.7        62      3.69     3.190
#> Merc 230              22.8         4     140.8        95      3.92     3.150
#> Fiat 128              32.4         4      78.7        66      4.08     2.200
#> Honda Civic           30.4         4      75.7        52      4.93     1.615
#> ...                    ...       ...       ...       ...       ...       ...
#> AMC Javelin           15.2         8       304       150      3.15     3.435
#> Camaro Z28            13.3         8       350       245      3.73     3.840
#> Pontiac Firebird      19.2         8       400       175      3.08     3.845
#> Ford Pantera L        15.8         8       351       264      4.22     3.170
#> Maserati Bora         15.0         8       301       335      3.54     3.570
#>                       qsec        vs        am      gear      carb
#>                  <numeric> <numeric> <numeric> <numeric> <numeric>
#> Datsun 710           18.61         1         1         4         1
#> Merc 240D            20.00         1         0         4         2
#> Merc 230             22.90         1         0         4         2
#> Fiat 128             19.47         1         1         4         1
#> Honda Civic          18.52         1         1         4         2
#> ...                    ...       ...       ...       ...       ...
#> AMC Javelin          17.30         0         0         3         2
#> Camaro Z28           15.41         0         0         3         4
#> Pontiac Firebird     17.05         0         0         3         2
#> Ford Pantera L       14.50         0         1         5         4
#> Maserati Bora        14.60         0         1         5         8

Created on 2023-09-28 with reprex v2.0.2

Would it be possible to implement this as an option in pillar, at least for tables whose tail is easily accessible (i.e. probably not tables representing database queries)? Overall I prefer the formatting of pillar, but often seeing both the head and tail of a table is useful, because if the table is sorted by a particular column, it may not be clear from just the head that this column varies, e.g.:

library(dplyr)
print(as_tibble(arrange(mtcars, cyl)))
#> # A tibble: 32 × 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  22.8     4 108      93  3.85  2.32  18.6     1     1     4     1
#>  2  24.4     4 147.     62  3.69  3.19  20       1     0     4     2
#>  3  22.8     4 141.     95  3.92  3.15  22.9     1     0     4     2
#>  4  32.4     4  78.7    66  4.08  2.2   19.5     1     1     4     1
#>  5  30.4     4  75.7    52  4.93  1.62  18.5     1     1     4     2
#>  6  33.9     4  71.1    65  4.22  1.84  19.9     1     1     4     1
#>  7  21.5     4 120.     97  3.7   2.46  20.0     1     0     3     1
#>  8  27.3     4  79      66  4.08  1.94  18.9     1     1     4     1
#>  9  26       4 120.     91  4.43  2.14  16.7     0     1     5     2
#> 10  30.4     4  95.1   113  3.77  1.51  16.9     1     1     5     2
#> # ℹ 22 more rows

Created on 2023-09-28 with reprex v2.0.2

As for implementation, I imagine either a logical option to include the tail, in which case the number of rows to be printed would be split equally between head and tail; or else a fraction between 0 and 1 indicating the desired split of rows between head and tail. But maybe you have better ideas.
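To make the proposed split concrete, here is a minimal sketch of how a row budget could be divided between head and tail. The helper `split_head_tail()` and its arguments `n` and `tail_frac` are hypothetical names used only for illustration; nothing like this exists in pillar as far as this thread shows.

```r
# Hypothetical helper, not part of pillar or prt.
# nrow_x:    number of rows in the table
# n:         total number of rows to print
# tail_frac: share of `n` given to the tail, between 0 and 1
split_head_tail <- function(nrow_x, n = 10L, tail_frac = 0.5) {
  stopifnot(nrow_x >= 0, n >= 0, tail_frac >= 0, tail_frac <= 1)
  # Small tables are shown in full, so there is nothing to split.
  if (nrow_x <= n) {
    return(list(head = seq_len(nrow_x), tail = integer(0)))
  }
  n_tail <- floor(n * tail_frac)
  n_head <- n - n_tail
  list(
    head = seq_len(n_head),
    # Last `n_tail` row indices in ascending order (empty when n_tail is 0).
    tail = nrow_x - rev(seq_len(n_tail)) + 1L
  )
}

# Usage: show the first and last rows of a sorted base data frame.
idx <- split_head_tail(nrow(mtcars), n = 10, tail_frac = 0.5)
sorted <- mtcars[order(mtcars$cyl), ]
sorted[c(idx$head, idx$tail), c("mpg", "cyl")]
```

A logical "include the tail" option would then just be the special case `tail_frac = 0.5`, and `tail_frac = 0` would reproduce the current head-only behaviour.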

@DarwinAwardWinner
Author

I had a look through the code to see if I could implement this myself, but there were a few too many layers of indirection for me to follow. If you can point me to the appropriate place in the code, I can try implementing this when I have time.

@krlmlr
Member

krlmlr commented Sep 29, 2023

Thanks. The prt package implements output in this way, see, e.g., https://github.com/nbenn/prt/blob/main/tests/testthat/_snaps/format.md .

CC @nbenn.

@DarwinAwardWinner
Author

Interesting. So it looks like I could potentially define my own print method for data frames and/or tibbles that calls prt::format_dt. Is there an easy way to determine if a given tibble's backend supports efficient random access so that I can avoid trying to e.g. get the tail of a database query result?

@krlmlr
Member

krlmlr commented Sep 29, 2023

None that I'm aware of, perhaps you could implement some heuristics? Happy to review if you'd be willing to share an implementation.
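One possible heuristic, offered only as a sketch: treat in-memory data frames as having a cheap tail and treat lazy tables as not having one. The function name `has_cheap_tail()` is hypothetical, and the check assumes that lazy database tables carry the `tbl_lazy` or `tbl_sql` class, as dbplyr's do; other backends may need their own entries.

```r
# Hypothetical heuristic, not an API of pillar, dplyr, or dbplyr.
has_cheap_tail <- function(x) {
  # Assumption: lazy tables (e.g. dbplyr's "tbl_lazy"/"tbl_sql") are not
  # data frames, and fetching their last rows may run the whole query.
  lazy_classes <- c("tbl_lazy", "tbl_sql")
  is.data.frame(x) && !inherits(x, lazy_classes)
}

# Usage: an in-memory data frame passes, a stand-in lazy table does not.
has_cheap_tail(mtcars)
fake_lazy <- structure(list(), class = c("tbl_sql", "tbl_lazy", "tbl"))
has_cheap_tail(fake_lazy)
```

Anything that fails the check could simply fall back to the existing head-only output.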

@DarwinAwardWinner
Author

I will definitely share if I figure it out. Do you have any opinions on how the options should be set up?

@DarwinAwardWinner
Author

DarwinAwardWinner commented Sep 29, 2023

A minimal implementation for tibbles, meant to be put in ~/.Rprofile:

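print.tbl <- function(x, width = NULL, ..., n = NULL, max_extra_cols = NULL, max_footer_lines = NULL) {
    # Fall back to pillar's printer on any error. The handler takes `e` rather
    # than `...`, so the outer `...` is forwarded instead of the condition object.
    tryCatch({
        # Split the requested row count between head and tail; NULL keeps prt's default.
        n_half <- if (!is.null(n)) ceiling(n / 2)
        prt:::cat_line(prt:::format_dt(x = x, ..., n = n_half, width = width, max_extra_cols = max_extra_cols, max_footer_lines = max_footer_lines))
    }, error = \(e) pillar:::print.tbl(x = x, width = width, ..., n = n, max_extra_cols = max_extra_cols, max_footer_lines = max_footer_lines))
    # Print methods conventionally return their input invisibly.
    invisible(x)
}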

I also came up with something for base data frames, but I print them using the aforementioned S4Vectors code, since the dplyr/pillar stuff doesn't print row names, which can't be ignored for base data frames.

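print.data.frame <- function(x, ...) {
    # Fall back to base printing on any error. The handler takes `e` rather
    # than `...`, so the outer `...` is forwarded instead of the condition object.
    tryCatch({
        # Cap max.print at roughly 15 rows' worth of cells.
        withr::with_options(
            list(max.print = ncol(x) * 15),
            S4Vectors:::.show_DataFrame(x)
        )
    }, error = \(e) base::print.data.frame(x = x, ...))
    # Print methods conventionally return their input invisibly.
    invisible(x)
}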
