Should tabyls always be tibbles? #301

Open
k5cents opened this issue May 20, 2019 · 10 comments
Labels
seeking comments: Users and any interested parties should please weigh in - this is in a discussion phase!

Comments


k5cents commented May 20, 2019

Feature requests

I think it would be nice to print tabyl frames as tibbles! Especially since "janitor is a #tidyverse-oriented package."

It's a small thing, but consistency just makes the whole process smoother. In the example below, you can see how the tibble object makes it clear that color is an ordinal variable and removes non-significant digits from percent.

library(janitor) # '1.2.0'
library(ggplot2) # '3.2.1'
library(tibble)  # '2.1.3'

t <- tabyl(diamonds, color)
print(t)
#>  color     n    percent
#>      D  6775 0.12560252
#>      E  9797 0.18162773
#>      F  9542 0.17690026
#>      G 11292 0.20934372
#>      H  8304 0.15394883
#>      I  5422 0.10051910
#>      J  2808 0.05205784
as_tibble(t)
#> # A tibble: 7 x 3
#>   color     n percent
#>   <ord> <dbl>   <dbl>
#> 1 D      6775  0.126 
#> 2 E      9797  0.182 
#> 3 F      9542  0.177 
#> 4 G     11292  0.209 
#> 5 H      8304  0.154 
#> 6 I      5422  0.101 
#> 7 J      2808  0.0521

Created on 2019-09-30 by the reprex package (v0.3.0)

Edit: Use reprex::reprex() for example

Owner

sfirke commented May 20, 2019

I prefer tibbles, too, in general. Looking back at #44, the hassle of tibbles not printing all their rows was the deciding factor in moving to data.frame.

Now I suppose the print.tabyl() method could pass a large value to n so that, say, the first 100 rows print. Would tibbles be preferable then? I like the truncating of the digits, I think, and the labeling of the column types. What do others think?

There might be implementation wrinkles I'm not thinking of b/c all of the tabyl class and metadata info would have to be attached to a tbl, but I think it could be done.

Owner

sfirke commented May 20, 2019

(also a small note, when you do the dplyr::arrange() it strips the tabyl of its tabyl class which does have its own print method that does not show line numbers. Maybe a better example without the arrange. I'm only pointing that out because we are comparing outputs of the print methods and that's not actually print.tabyl())

ggplot2::diamonds %>%
  janitor::tabyl(color)
 color     n    percent
     D  6775 0.12560252
     E  9797 0.18162773
     F  9542 0.17690026
     G 11292 0.20934372
     H  8304 0.15394883
     I  5422 0.10051910
     J  2808 0.05205784

Author

k5cents commented May 20, 2019

I think passing a larger number into print(n = ) makes a lot of sense. I often do that myself when exploring an object. 100 rows is already much more manageable than 1000, especially when the overflow columns are hidden.

Owner

sfirke commented May 5, 2020

Well, now I'm on board with prioritizing this and saying tabyl should always return a tibble. This just burned me. I couldn't figure out why a line like x == "foo" was not matching in a case_when recoding that I was feeding into tabyl. Turns out the value had leading whitespace, " foo". Adding as_tibble() made that immediately evident.
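
For anyone following along, here is a minimal sketch of the whitespace problem described above. The data frame is made up for illustration, and the note about tibble printing reflects how pillar is expected to format strings with leading or trailing whitespace; treat it as an assumption rather than a guarantee for every version.

```r
library(tibble)

# Toy data (made up for illustration): one value carries a leading space.
df <- data.frame(x = c(" foo", "foo"), n = c(3, 5), stringsAsFactors = FALSE)

# The comparison silently fails for the padded value.
matches <- df$x == "foo"

# Base printing right-aligns character columns, so the padding is hard to spot.
print(df)

# Tibble printing (via pillar) is expected to put quotes around such values.
print(as_tibble(df))

# A direct check that does not rely on printing at all.
padded <- df$x[df$x != trimws(df$x)]
```

The `trimws()` comparison is a useful fallback when the printed output is ambiguous either way.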

sfirke added the "seeking comments" label May 5, 2020
Contributor

jzadra commented May 5, 2020

I agree that this is a good idea. I'm often converting tabyls to tibbles to work with them further if it's not for immediate interactive checking or for printing in a report.

I would also hope that it's a rare case for someone to be creating a tabyl with more than 50 rows; at that point it seems unlikely that they have a var that is actually a meaningful categorical var. So I think 50 makes sense. The case where it would be more would be 3-way+, in which case perhaps it would make sense to start reducing the number of printed rows per tabyl to limit the overall to 50 or 100.

I think the only drawback here is what happens if I create a basic tabyl object earlier, and then I want to adorn things to it later on in my code? Without it being a tabyl object, would these functions still work without lots of recoding?

Edit: Just realized we're talking about "printing". Would the object remain a tabyl, but only be printed as a tibble? If so my above drawback is moot.

Author

k5cents commented May 5, 2020

Well something simple like tibble::as_tibble() changes the class and removes the tabyl part. But you can have a data frame with both classes that prints like a tibble but should (?) keep all the aspects of a tabyl object.
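
A rough sketch of the two situations, for illustration only. Assigning the class vector by hand is an assumption about how a dual-class object could be built, not something janitor does; whether the adorn_* functions and printing behave well on such an object is untested here.

```r
library(janitor)
library(tibble)

t <- tabyl(mtcars, cyl)

# as_tibble() rebuilds the object as a plain tibble, so the tabyl class is lost.
plain_tbl <- as_tibble(t)
inherits(plain_tbl, "tabyl")

# Hypothetical dual-class object: keep "tabyl" and add the tibble classes by hand.
# This only changes the class vector; tabyl attributes on `t` are left in place.
both <- t
class(both) <- c("tabyl", "tbl_df", "tbl", "data.frame")
class(both)
```

Note that with "tabyl" listed first, method dispatch would still reach print.tabyl() before any tibble print method, so the order of the class vector matters.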

Owner

sfirke commented May 6, 2020

The lightest change would be for print.tabyl to convert the tabyl to a tibble at print time. That's adding just a single line of code. But I wonder if that would confuse people, because the tabyl would look like a tibble to the user but not actually be one.

I could try rewriting the tabyl class to also be a tibble, as well as changing print.tabyl. Funnily, right now mtcars %>% count(gear) %>% as_tabyl() %>% class gets you an object that is also a tibble but prints like it's not a tibble, kind of backwards of the first option here.

This rewriting might take more work, and could surface problems I'm not thinking of right now... but right now I feel like if a tabyl is going to look like a tibble when it prints, it should be a tibble.
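
To make the first option concrete, here is a minimal sketch of converting at print time. `print_tabyl_as_tibble()` is a hypothetical helper written for this example, not janitor's actual print.tabyl(), and the default of 100 rows is just the number floated earlier in the thread.

```r
library(janitor)
library(tibble)

# Hypothetical helper (not part of janitor): show a tabyl using tibble's
# print method, leaving the object itself untouched.
print_tabyl_as_tibble <- function(x, n = 100, ...) {
  print(tibble::as_tibble(x), n = n, ...)
  invisible(x)  # return the original tabyl, as print methods conventionally do
}

t <- tabyl(mtcars, cyl)
out <- print_tabyl_as_tibble(t)

# The printed header says "A tibble", but the object is still a plain tabyl.
class(out)
```

This shows the mismatch under discussion: the output looks like a tibble while `class(out)` does not include the tibble classes.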

Contributor

jzadra commented May 12, 2020

Yes, I see your points. I agree that printing something that looks different from what the object actually is is not a good idea and would be confusing.

sfirke added this to the v2.2 milestone Nov 24, 2020
sfirke changed the title from "Print tabyls as tibbles" to "Should tabyls always be tibbles?" Nov 24, 2020
Owner

sfirke commented Dec 27, 2020

I'm still on board with making tabyls always be tibbles, but I'm not including this in v2.2 because it involves updating many, many tests.

sfirke removed this from the v2.2 milestone Dec 27, 2020
sfirke self-assigned this Dec 27, 2020
sfirke added this to the v2.3 milestone Dec 27, 2020
@Sibojang9

This issue has been around for more than two years.

I do not expect tabyls to always output tibbles. However, the output should retain the class of the input data frame. Stripping the tibble class from an input data frame is highly unexpected.
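
As a stopgap, the behavior described here can be approximated in user code. This is a sketch under assumptions: `restore_input_class()` is a hypothetical helper, not a janitor function, and converting with as_tibble() drops the tabyl class, so the adorn_* functions may no longer treat the result as a tabyl.

```r
library(janitor)
library(tibble)

# Hypothetical helper: if the input was a tibble, hand back a tibble;
# otherwise return the tabyl result unchanged.
restore_input_class <- function(result, input) {
  if (tibble::is_tibble(input)) tibble::as_tibble(result) else result
}

cars_tbl <- as_tibble(mtcars)

from_tbl <- restore_input_class(tabyl(cars_tbl, cyl), cars_tbl)
from_df  <- restore_input_class(tabyl(mtcars, cyl), mtcars)

is_tibble(from_tbl)
is_tibble(from_df)
```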

sfirke removed this from the v2.2 milestone Jan 4, 2023
sfirke mentioned this issue Aug 19, 2023
1 task