extend dplyr to include tabyl class so that tabyl attributes are preserved by dplyr operations #527

Open
larry77 opened this issue Feb 3, 2023 · 6 comments

larry77 commented Feb 3, 2023

Hello,
In my workflow I often call tabyl on some data, and as a result I get an object with both the "tabyl" and "data.frame" classes.
It seems that if the resulting object inherits the "tabyl" class, adorn_totals() fails on it.
Why? I don't think this happened in the past.
In the reprex below, df has only the "data.frame" class and everything is fine.
df2 also has the "tabyl" class (in reality it comes from calling tabyl on some data), and adorn_totals() fails on it.
Any idea whether this is intended?

Thanks!

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(janitor)
#> 
#> Attaching package: 'janitor'
#> The following objects are masked from 'package:stats':
#> 
#>     chisq.test, fisher.test

df <- structure(list(procedure_name = c("Agriculture Block Exemption Regulation", 
"Fisheries Block Exemption Regulation", "General Block Exemption Regulation", 
"Notified Aid"), n = c(38L, 1L, 215L, 51L), percent = c(12.5, 
0.3, 70.5, 16.7)), row.names = c(NA, -4L), class = "data.frame")  # "tabyl" class deliberately left out

df
#>                           procedure_name   n percent
#> 1 Agriculture Block Exemption Regulation  38    12.5
#> 2   Fisheries Block Exemption Regulation   1     0.3
#> 3     General Block Exemption Regulation 215    70.5
#> 4                           Notified Aid  51    16.7

is.data.frame(df)
#> [1] TRUE

df |>
    adorn_totals()
#>                          procedure_name   n percent
#>  Agriculture Block Exemption Regulation  38    12.5
#>    Fisheries Block Exemption Regulation   1     0.3
#>      General Block Exemption Regulation 215    70.5
#>                            Notified Aid  51    16.7
#>                                   Total 305   100.0

df2 <- structure(list(procedure_name = c("Agriculture Block Exemption Regulation", 
"Fisheries Block Exemption Regulation", "General Block Exemption Regulation", 
"Notified Aid"), n = c(38L, 1L, 215L, 51L), percent = c(12.5, 
0.3, 70.5, 16.7)), row.names = c(NA, -4L), class = c( "tabyl", 
                                                     "data.frame"))


df2 |>
    adorn_totals()
#> Error in 1:nrow(attr(dat, "core")): argument of length 0


sessionInfo()
#> R version 4.2.2 (2022-10-31)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Debian GNU/Linux 11 (bullseye)
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.13.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
#>  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
#>  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] janitor_2.2.0 dplyr_1.1.0  
#> 
#> loaded via a namespace (and not attached):
#>  [1] knitr_1.40        magrittr_2.0.3    tidyselect_1.2.0  timechange_0.2.0 
#>  [5] R.cache_0.16.0    R6_2.5.1          rlang_1.0.6       fastmap_1.1.0    
#>  [9] fansi_1.0.4       stringr_1.5.0     styler_1.8.0      highr_0.9        
#> [13] tools_4.2.2       xfun_0.34         R.oo_1.25.0       utf8_1.2.3       
#> [17] cli_3.6.0         withr_2.5.0       htmltools_0.5.3   yaml_2.3.6       
#> [21] digest_0.6.30     tibble_3.1.8      lifecycle_1.0.3   purrr_1.0.1      
#> [25] vctrs_0.5.2       R.utils_2.12.1    fs_1.5.2          snakecase_0.11.0 
#> [29] glue_1.6.2        evaluate_0.17     rmarkdown_2.17    reprex_2.0.2     
#> [33] stringi_1.7.12    pillar_1.8.1      compiler_4.2.2    generics_0.1.3   
#> [37] R.methodsS3_1.8.2 lubridate_1.9.1   pkgconfig_2.0.3

Created on 2023-02-03 with reprex v2.0.2

@sfirke sfirke self-assigned this Feb 3, 2023
Owner

sfirke commented Feb 3, 2023

Could you share more about how df2 gets created? I see your object has class tabyl but it does not have the attributes "core" or "tabyl_type" which a tabyl should. The new adorn_totals() will call as_tabyl on an input even if it's a tabyl, for the minor improvement of re-sorting its core attribute data.frame. But calling as_tabyl on an object of class tabyl assumes it has a core attribute data.frame already.

I could start exploring how to mitigate this - but I would need to understand how your input is created, since I can't conceive of a way that someone is getting an object that's a tabyl without core or tabyl_type attributes.
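
The error text in the reprex can be reproduced with base R alone. This is a minimal sketch, assuming only what the error message itself shows, namely that `1:nrow(attr(dat, "core"))` gets evaluated; `fake_tabyl` is a hypothetical name and no janitor code is involved:

```r
# An object that carries the "tabyl" class but none of the tabyl attributes
# (fake_tabyl is a hypothetical stand-in for df2 in the reprex)
fake_tabyl <- structure(
  list(procedure_name = c("a", "b"), n = c(1L, 2L)),
  row.names = c(NA, -2L),
  class = c("tabyl", "data.frame")
)

core <- attr(fake_tabyl, "core")  # a missing attribute comes back as NULL
is.null(core)
is.null(nrow(core))               # nrow(NULL) is NULL as well

# 1:NULL is what raises the error; capture the message instead of stopping
msg <- tryCatch(1:nrow(core), error = function(e) conditionMessage(e))
msg
```

So the failure is a consequence of the missing "core" attribute rather than of the class vector itself.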

@sfirke sfirke changed the title Adorn_totals() working only with tibbles? adorn_totals() fails when input is a tabyl but lacks tabyl attributes Feb 3, 2023
Author

larry77 commented Feb 3, 2023

Hello,
Below you can find (I hope!) a more useful reprex.
In real life, sometimes I need to adjust/round the percent column while not changing its type, but it seems that adorn_totals() does not like this now.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(janitor)
#> 
#> Attaching package: 'janitor'
#> The following objects are masked from 'package:stats':
#> 
#>     chisq.test, fisher.test


df <- structure(list(member_state_3_letter_codes = c("AUT", "AUT", 
"AUT", "AUT", "AUT", "AUT", "AUT", "AUT", "AUT", "AUT"), case_no = c("N 135/2010", 
"N 197/2010", "N 418/2007", "N 521/2009", "N 564b/2004", "N 622/2003", 
"SA.100539", "SA.32485", "SA.33384", "SA.33496"), year = c(2021, 
2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021), amount_spent_aid_element_in_eur_million = c(4.831, 
0.25, 0.01, 0.004, 0.5668, 0.2883, 0.005, 0.981, 314.074, 0.042
), procedure_name = c("Notified Aid", "Notified Aid", "Notified Aid", 
"Notified Aid", "Notified Aid", "Notified Aid", "General Block Exemption Regulation", 
"General Block Exemption Regulation", "Notified Aid", "Notified Aid"
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))

df2 <- df |> 
    tabyl(procedure_name) |> 
    ## mutate(percent=round_preserve_sum(percent*100,1)) %>%
    adorn_totals() 

df2
#>                      procedure_name  n percent
#>  General Block Exemption Regulation  2     0.2
#>                        Notified Aid  8     0.8
#>                               Total 10     1.0


df3 <- df |> 
    tabyl(procedure_name) |> 
    mutate(percent=round(percent*100,1)) |> 
    adorn_totals()
#> Error in 1:nrow(attr(dat, "core")): argument of length 0

df3
#> Error in eval(expr, envir, enclos): object 'df3' not found

Created on 2023-02-03 with reprex v2.0.2

Owner

sfirke commented Feb 6, 2023

Interesting. dplyr::mutate() has destroyed data.frame attributes in the past but I had understood that was changing. I will ask in a dplyr issue for thoroughness. Compare the attributes stored on the data.frame before and after your mutate call:

Beginning Attributes

df |>
  tabyl(procedure_name) |>
  attributes()

$names
[1] "procedure_name" "n"              "percent"       

$class
[1] "tabyl"      "data.frame"

$row.names
[1] 1 2

$core
                      procedure_name n percent
1 General Block Exemption Regulation 2     0.2
2                       Notified Aid 8     0.8

$tabyl_type
[1] "one_way"

Attributes after dplyr::mutate

df |>
  tabyl(procedure_name) |>
  mutate(percent=round(percent*100,1)) |> 
  attributes()

$names
[1] "procedure_name" "n"              "percent"       

$row.names
[1] 1 2

$class
[1] "tabyl"      "data.frame"

I see how this change in 2.2.0 could feel like a regression to you, but it was accidental that it worked in the first place, and the new behavior is the result of an improvement in janitor. Perhaps I could mitigate this on the janitor side, but not right now - I'm tired from the last rewrite and this feels like it's caused by another package.

For now here is the janitor-y way to do what you're doing:

df |>
  tabyl(procedure_name) |>
  adorn_rounding(digits = 1,,,percent) %>%
  adorn_totals() %>%
  adorn_pct_formatting()

Or if you want an integer output multiplied by 100, then simply do that mutate step last, after adorn_totals().
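
The "mutate last" ordering can be sketched as follows. The `toy` data frame is a hypothetical stand-in that mirrors the counts in the reprex (8 "Notified Aid", 2 "General Block Exemption Regulation"); the point is only that adorn_totals() runs before any dplyr verb touches the tabyl:

```r
library(dplyr)
library(janitor)

# Hypothetical toy data mirroring the counts in the reprex
toy <- data.frame(
  procedure_name = c(rep("Notified Aid", 8),
                     rep("General Block Exemption Regulation", 2))
)

result <- toy |>
  tabyl(procedure_name) |>
  adorn_totals() |>                          # runs on the untouched tabyl
  mutate(percent = round(percent * 100, 1))  # rescale only afterwards

result
```

Because the totals row already exists when mutate() runs, the rescaling applies to it as well.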

Author

larry77 commented Feb 7, 2023

Thanks. Beyond the case in the reprex, for which you show an elegant solution, I may want to call tabyl, mutate all sorts of things in the resulting object, and then call adorn_totals().
So far my workaround is to insert "|> as_tibble()" after calling tabyl, but there must be a better way to handle this (though I fully understand this should not block a release).
Keep up the good work!
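
A possible alternative to the as_tibble() workaround is janitor's own untabyl(), placed just before adorn_totals(). This is a sketch rather than a recommendation from the maintainers; `toy` is hypothetical data, and it relies on adorn_totals() accepting a plain data.frame, as the first reprex in this thread shows:

```r
library(dplyr)
library(janitor)

# Hypothetical toy data mirroring the counts in the reprex
toy <- data.frame(
  procedure_name = c(rep("Notified Aid", 8),
                     rep("General Block Exemption Regulation", 2))
)

result <- toy |>
  tabyl(procedure_name) |>
  mutate(percent = round(percent * 100, 1)) |>  # arbitrary dplyr work
  untabyl() |>       # strip the "tabyl" class and any leftover tabyl attributes
  adorn_totals()     # input is now treated as an ordinary data.frame

result
```

The trade-off is that whatever the original "core" attribute held is discarded and rebuilt from the mutated data.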

@renatocava

dplyr::select() destroys the "core" attribute too

Owner

sfirke commented Apr 9, 2023

In this discussion, Davis Vaughan provided good insight into what's going on here and how janitor could extend dplyr such that attributes would be preserved. I do not have the bandwidth to implement this anytime soon unfortunately, but there's the full scoop on the situation.
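
The linked discussion is not reproduced here, so the following is only an illustration of the general mechanism, not janitor code and not necessarily what was proposed there. Base R's `[.data.frame` keeps the class vector but drops custom attributes, which matches the attribute listings earlier in the thread (class still "tabyl", "core" gone). One way a data.frame subclass can carry such attributes through subsetting is to supply its own `[` method; dplyr's documented extension generics such as dplyr_reconstruct() may also be involved. The class `demo_tab` is hypothetical:

```r
# Hypothetical class "demo_tab" standing in for a tabyl-like subclass
x <- data.frame(grp = c("a", "b"), n = c(2L, 8L))
attr(x, "core") <- data.frame(grp = c("a", "b"), n = c(2L, 8L))
class(x) <- c("demo_tab", "data.frame")

# Default behaviour: the class survives subsetting, the custom attribute does not
before <- x["n"]
class(before)
is.null(attr(before, "core"))

# A `[` method that copies the custom attribute back after subsetting
`[.demo_tab` <- function(x, ...) {
  out <- NextMethod()
  if (is.data.frame(out)) {
    attr(out, "core") <- attr(x, "core")
  }
  out
}

after <- x["n"]
is.null(attr(after, "core"))
```

A real implementation would also need to decide what "core" should contain after rows or columns are removed, which this sketch ignores.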

@sfirke sfirke changed the title adorn_totals() fails when input is a tabyl but lacks tabyl attributes extend dplyr to include tabyl class so that tabyl attributes are preserved by dplyr operations Aug 19, 2023
@sfirke sfirke mentioned this issue Aug 19, 2023