extend dplyr to include tabyl class so that tabyl attributes are preserved by dplyr operations #527

larry77 · 2023-02-03T14:19:43Z

Hello,
In my workflow I often call tabyl on some data, so I end up with an object that has both the "tabyl" and "data.frame" classes.
It seems that if the resulting object inherits the "tabyl" class, adorn_totals will fail on it.
Why? I don't think this happened in the past.
In the reprex below, df has only the "data.frame" class and everything is fine.
df2 also has the tabyl class (in reality it comes from calling tabyl on some data), and adorn_totals() fails on it.
Is this intended?

Thanks!

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(janitor)
#> 
#> Attaching package: 'janitor'
#> The following objects are masked from 'package:stats':
#> 
#>     chisq.test, fisher.test

df <- structure(list(procedure_name = c("Agriculture Block Exemption Regulation", 
"Fisheries Block Exemption Regulation", "General Block Exemption Regulation", 
"Notified Aid"), n = c(38L, 1L, 215L, 51L), percent = c(12.5, 
0.3, 70.5, 16.7)), row.names = c(NA, -4L), class = c(## "tabyl", 
                                                     "data.frame"))

df
#>                           procedure_name   n percent
#> 1 Agriculture Block Exemption Regulation  38    12.5
#> 2   Fisheries Block Exemption Regulation   1     0.3
#> 3     General Block Exemption Regulation 215    70.5
#> 4                           Notified Aid  51    16.7

is.data.frame(df)
#> [1] TRUE

df |>
    adorn_totals()
#>                          procedure_name   n percent
#>  Agriculture Block Exemption Regulation  38    12.5
#>    Fisheries Block Exemption Regulation   1     0.3
#>      General Block Exemption Regulation 215    70.5
#>                            Notified Aid  51    16.7
#>                                   Total 305   100.0

df2 <- structure(list(procedure_name = c("Agriculture Block Exemption Regulation", 
"Fisheries Block Exemption Regulation", "General Block Exemption Regulation", 
"Notified Aid"), n = c(38L, 1L, 215L, 51L), percent = c(12.5, 
0.3, 70.5, 16.7)), row.names = c(NA, -4L), class = c( "tabyl", 
                                                     "data.frame"))


df2 |>
    adorn_totals()
#> Error in 1:nrow(attr(dat, "core")): argument of length 0


sessionInfo()
#> R version 4.2.2 (2022-10-31)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Debian GNU/Linux 11 (bullseye)
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.13.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
#>  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
#>  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] janitor_2.2.0 dplyr_1.1.0  
#> 
#> loaded via a namespace (and not attached):
#>  [1] knitr_1.40        magrittr_2.0.3    tidyselect_1.2.0  timechange_0.2.0 
#>  [5] R.cache_0.16.0    R6_2.5.1          rlang_1.0.6       fastmap_1.1.0    
#>  [9] fansi_1.0.4       stringr_1.5.0     styler_1.8.0      highr_0.9        
#> [13] tools_4.2.2       xfun_0.34         R.oo_1.25.0       utf8_1.2.3       
#> [17] cli_3.6.0         withr_2.5.0       htmltools_0.5.3   yaml_2.3.6       
#> [21] digest_0.6.30     tibble_3.1.8      lifecycle_1.0.3   purrr_1.0.1      
#> [25] vctrs_0.5.2       R.utils_2.12.1    fs_1.5.2          snakecase_0.11.0 
#> [29] glue_1.6.2        evaluate_0.17     rmarkdown_2.17    reprex_2.0.2     
#> [33] stringi_1.7.12    pillar_1.8.1      compiler_4.2.2    generics_0.1.3   
#> [37] R.methodsS3_1.8.2 lubridate_1.9.1   pkgconfig_2.0.3

Created on 2023-02-03 with reprex v2.0.2

sfirke · 2023-02-03T16:44:25Z

Could you share more about how df2 gets created? I see your object has class tabyl but it does not have the attributes "core" or "tabyl_type", which a tabyl should have. The new adorn_totals() calls as_tabyl on its input even if it's already a tabyl, for the minor improvement of re-sorting its core attribute data.frame. But calling as_tabyl on an object of class tabyl assumes it already has a core attribute data.frame.

I could start exploring how to mitigate this - but I would need to understand how your input is created, since I can't conceive of a way that someone is getting an object that's a tabyl without core or tabyl_type attributes.
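The error from the first reprex can be reproduced with base R alone. The following is a minimal sketch, assuming only what the error message itself shows (that the failing expression is 1:nrow(attr(dat, "core"))); the object fake_tabyl is a made-up stand-in and not something janitor produces.

```r
# A hand-built object that claims the "tabyl" class but carries no
# "core" or "tabyl_type" attributes (hypothetical stand-in for df2).
fake_tabyl <- structure(
  list(procedure_name = c("a", "b"), n = c(1L, 2L)),
  row.names = c(NA, -2L),
  class = c("tabyl", "data.frame")
)

core <- attr(fake_tabyl, "core")
is.null(core)  # the attribute is absent, so attr() returns NULL
nrow(core)     # nrow() of NULL is also NULL

# 1:NULL is the step that raises "argument of length 0"
msg <- tryCatch(1:nrow(core), error = function(e) conditionMessage(e))
msg
```

So any object that has the tabyl class without the core attribute would be expected to hit the same error, however it was created.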

larry77 · 2023-02-03T21:58:30Z

Hello,
Below you can find (I hope!) a more useful reprex.
In real life, sometimes I need to adjust/round the percent column while not changing its type, but it seems that adorn_totals() does not like this now.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(janitor)
#> 
#> Attaching package: 'janitor'
#> The following objects are masked from 'package:stats':
#> 
#>     chisq.test, fisher.test


df <- structure(list(member_state_3_letter_codes = c("AUT", "AUT", 
"AUT", "AUT", "AUT", "AUT", "AUT", "AUT", "AUT", "AUT"), case_no = c("N 135/2010", 
"N 197/2010", "N 418/2007", "N 521/2009", "N 564b/2004", "N 622/2003", 
"SA.100539", "SA.32485", "SA.33384", "SA.33496"), year = c(2021, 
2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021), amount_spent_aid_element_in_eur_million = c(4.831, 
0.25, 0.01, 0.004, 0.5668, 0.2883, 0.005, 0.981, 314.074, 0.042
), procedure_name = c("Notified Aid", "Notified Aid", "Notified Aid", 
"Notified Aid", "Notified Aid", "Notified Aid", "General Block Exemption Regulation", 
"General Block Exemption Regulation", "Notified Aid", "Notified Aid"
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))

df2 <- df |> 
    tabyl(procedure_name) |> 
    ## mutate(percent=round_preserve_sum(percent*100,1)) %>%
    adorn_totals() 

df2
#>                      procedure_name  n percent
#>  General Block Exemption Regulation  2     0.2
#>                        Notified Aid  8     0.8
#>                               Total 10     1.0


df3 <- df |> 
    tabyl(procedure_name) |> 
    mutate(percent=round(percent*100,1)) |> 
    adorn_totals()
#> Error in 1:nrow(attr(dat, "core")): argument of length 0

df3
#> Error in eval(expr, envir, enclos): object 'df3' not found

Created on 2023-02-03 with reprex v2.0.2

sfirke · 2023-02-06T16:48:45Z

Interesting. dplyr::mutate() has destroyed data.frame attributes in the past but I had understood that was changing. I will ask in a dplyr issue for thoroughness. Compare the attributes stored on the data.frame before and after your mutate call:

Beginning Attributes

df |>
  tabyl(procedure_name) |>
  attributes()

$names
[1] "procedure_name" "n"              "percent"       

$class
[1] "tabyl"      "data.frame"

$row.names
[1] 1 2

$core
                      procedure_name n percent
1 General Block Exemption Regulation 2     0.2
2                       Notified Aid 8     0.8

$tabyl_type
[1] "one_way"

Attributes after dplyr::mutate

df |>
  tabyl(procedure_name) |>
  mutate(percent=round(percent*100,1)) |> 
  attributes()

$names
[1] "procedure_name" "n"              "percent"       

$row.names
[1] 1 2

$class
[1] "tabyl"      "data.frame"
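The same comparison can be done programmatically. This is a small sketch using made-up toy data (the toy data frame and the variable names are assumptions for illustration); with the package versions from the reprex it should report the attributes missing from the second listing above, but the result may differ with other versions of dplyr or janitor.

```r
library(dplyr)
library(janitor)

# Toy data, invented for illustration
toy <- data.frame(
  procedure_name = c("Notified Aid", "Notified Aid",
                     "General Block Exemption Regulation")
)

before <- toy |> tabyl(procedure_name)
after  <- before |> mutate(percent = round(percent * 100, 1))

# Names of attributes present before mutate() but absent afterwards
lost <- setdiff(names(attributes(before)), names(attributes(after)))
lost
```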

I see how this change in 2.2.0 could feel like a regression to you, but it was accidental that it worked in the first place, and the new behavior is the result of an improvement in janitor. Perhaps I could mitigate this on the janitor side, but not right now - I'm tired from the last rewrite and this feels like it's caused by another package.

For now here is the janitor-y way to do what you're doing:

df |>
  tabyl(procedure_name) |>
  adorn_rounding(digits = 1,,,percent) %>%
  adorn_totals() %>%
  adorn_pct_formatting()

Or if you want an integer output multiplied by 100, then simply do that mutate step last, after adorn_totals().
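As a sketch of that second option, with toy data invented for illustration (8 and 2 rows, mirroring the counts in the reprex): adorn_totals() runs while the tabyl attributes are still intact, and mutate() comes last, so it no longer matters whether it drops them.

```r
library(dplyr)
library(janitor)

# Toy data, invented for illustration
toy <- data.frame(
  procedure_name = c(rep("Notified Aid", 8),
                     rep("General Block Exemption Regulation", 2))
)

res <- toy |>
  tabyl(procedure_name) |>
  adorn_totals() |>                          # needs the "core" attribute
  mutate(percent = round(percent * 100, 1))  # last step, nothing downstream needs the attributes

res
```

If further adorn_* calls are needed after the mutate step, this ordering would presumably run into the same attribute problem again.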

larry77 · 2023-02-07T14:51:59Z

Thanks. Beyond the case in the reprex, for which you show an elegant solution, I may want to call tabyl, mutate all sorts of things in the resulting object, and then call adorn_totals.
So far my solution is to insert "|> as_tibble()" after calling tabyl, but there must be a better way to handle this (though I totally understand this should not be a blocking condition for a release).
Keep up the good work!
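The as_tibble() workaround described above might look like the following sketch, again on invented toy data. Converting to a tibble removes the tabyl class, so adorn_totals() treats the mutated object as an ordinary data frame, as it does for df in the first reprex.

```r
library(dplyr)
library(janitor)

# Toy data, invented for illustration
toy <- data.frame(
  procedure_name = c(rep("Notified Aid", 8),
                     rep("General Block Exemption Regulation", 2))
)

res <- toy |>
  tabyl(procedure_name) |>
  as_tibble() |>                                # drop the tabyl class before mutating
  mutate(percent = round(percent * 100, 1)) |>
  adorn_totals()

res
```

janitor also exports untabyl(), which strips the tabyl class and attributes and could probably be used in place of as_tibble() here.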

renatocava · 2023-04-06T16:52:32Z

dplyr::select() destroys the "core" attribute too

sfirke · 2023-04-09T16:44:56Z

In this discussion, Davis Vaughan provided good insight into what's going on here and how janitor could extend dplyr such that attributes would be preserved. I do not have the bandwidth to implement this anytime soon unfortunately, but there's the full scoop on the situation.
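The linked discussion is not reproduced here, so the following is only one plausible shape such an extension could take, not the approach it recommends. It assumes the attributes are lost because subsetting a tabyl with `[` falls through to the base data.frame method, which keeps the class but not custom attributes. The `[.tabyl` method below is hypothetical and not part of janitor, and the tabyl is hand-built so the sketch runs without janitor or dplyr.

```r
# Hypothetical method, NOT part of janitor: carry the tabyl attributes
# across `[` subsetting.
`[.tabyl` <- function(x, ...) {
  out <- NextMethod()
  if (is.data.frame(out)) {
    # Naive carry-over. A real implementation would have to decide how
    # "core" should change when rows or columns are dropped.
    attr(out, "core") <- attr(x, "core")
    attr(out, "tabyl_type") <- attr(x, "tabyl_type")
  }
  out
}

# Hand-built stand-in for a one-way tabyl
core <- data.frame(
  procedure_name = c("a", "b"),
  n = c(2L, 8L),
  percent = c(0.2, 0.8)
)
tab <- structure(core, core = core, tabyl_type = "one_way",
                 class = c("tabyl", "data.frame"))

sub <- tab[c("procedure_name", "n")]
names(attributes(sub))
```

Whether this alone would be enough for mutate() and select() depends on dplyr internals that are not covered in this thread, so it would need to be checked against dplyr's documentation on extending data frame subclasses.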

sfirke self-assigned this Feb 3, 2023

sfirke changed the title ~~Adorn_totals() working only with tibbles?~~ adorn_totals() fails when input is a tabyl but lacks tabyl attributes Feb 3, 2023

sfirke changed the title ~~adorn_totals() fails when input is a tabyl but lacks tabyl attributes~~ extend dplyr to include tabyl class so that tabyl attributes are preserved by dplyr operations Aug 19, 2023

sfirke added the pull-request-welcome label Aug 19, 2023

sfirke mentioned this issue Aug 19, 2023

Submit 2.3.0 to CRAN #558 (Open, 1 task)
