adorn_ns() adds excluded values to an adorn_totals() call in a pipe #533

Open
panporter opened this issue Mar 14, 2023 · 3 comments

panporter commented Mar 14, 2023

This might be the correct behavior, but it took me by surprise.

Excluding a variable in adorn_totals() works as expected, but adding percentages and Ns later on in the pipe reintroduces the excluded values.

Looking through the other issues, this reminds me of #195.

library(tidyverse)
library(janitor)

example <- tribble(
  ~year, ~var1, ~var2, ~ignored_num,
  2019,20,12,99,
  2020,30,11,99)

# totals are correct
example %>%
  adorn_totals(c("col", "row"), fill = "-", na.rm = TRUE, name = "Total", !c(year, ignored_num))
#>   year var1 var2 ignored_num Total
#>   2019   20   12          99    32
#>   2020   30   11          99    41
#>  Total   50   23           -    73

# row totals include the ignored_num
example %>%
  adorn_totals(c("col", "row"), fill = "-", na.rm = TRUE, name = "Total", !c(year, ignored_num)) %>%
  adorn_percentages() %>%
  adorn_pct_formatting(digits = 1) %>% 
  adorn_ns()
#>   year       var1       var2 ignored_num        Total
#>   2019 62.5% (20) 37.5% (12)    99  (99) 100.0% (131)
#>   2020 73.2% (30) 26.8% (11)    99  (99) 100.0% (140)
#>  Total 68.5% (50) 31.5% (23)     - (198) 100.0% (271)

# I can tidyselectively exclude `ignored_num` (and have to exclude `year` then as well) from calculating/displaying Ns in its column,
# but it stays as a summand in the rows' total
example %>%
  adorn_totals(c("col", "row"), fill = "-", na.rm = TRUE, name = "Total", !c(year, ignored_num)) %>%
  adorn_percentages() %>%
  adorn_pct_formatting(digits = 1) %>% 
  adorn_ns(,,,!c(year, ignored_num))
#>   year       var1       var2 ignored_num        Total
#>   2019 62.5% (20) 37.5% (12)          99 100.0% (131)
#>   2020 73.2% (30) 26.8% (11)          99 100.0% (140)
#>  Total 68.5% (50) 31.5% (23)           - 100.0% (271)

Created on 2023-03-14 by the reprex package (v2.0.1)

sfirke (Owner) commented Mar 14, 2023

Hello! I don't see a way to pass along that information from adorn_totals specifying which columns should subsequently be ignored. So I'm afraid you'll have to specify it again as you have done in your example.

That last execution looks good to me. Can you say more about:

but it stays as a summand in the rows' total

Can you show what you were expecting in that last execution?

panporter (Author) commented

Thanks for your response and clarification. Sorry for being ambiguous.

At least in the last example I would expect the same results as in the first:

example %>%
  adorn_totals(c("col", "row"), fill = "-", na.rm = TRUE, name = "Total", !c(year, ignored_num)) %>%
  adorn_percentages() %>%
  adorn_pct_formatting(digits = 1) %>% 
  adorn_ns(,,,!c(year, ignored_num))

# I get:
#>   year       var1       var2 ignored_num        Total
#>   2019 62.5% (20) 37.5% (12)          99 100.0% (131)
#>   2020 73.2% (30) 26.8% (11)          99 100.0% (140)
#>  Total 68.5% (50) 31.5% (23)           - 100.0% (271)

# I'd expect:
#   year       var1       var2 ignored_num        Total
#   2019 62.5% (20) 37.5% (12)          99 100.0% (32)
#   2020 73.2% (30) 26.8% (11)          99 100.0% (41)
#  Total 68.5% (50) 31.5% (23)           - 100.0% (73)

So with adorn_ns(,,,!c(year, ignored_num)) I can hide the Ns in the ignored_num column, but its values are still added into the row Total: 20 + 12 + 99 = 131.

I would expect 20 + 12 = 32, which is what adorn_totals(c("col", "row"), fill = "-", na.rm = TRUE, name = "Total", !c(year, ignored_num)) computes in the first place.
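The arithmetic above can be checked without janitor. This is only a base R sketch of the two sums being compared, not a description of what adorn_ns() does internally; the names expected_totals and observed_totals are made up for illustration.

```r
# Same data as the reprex, built with base R so nothing else is needed.
example <- data.frame(
  year        = c(2019, 2020),
  var1        = c(20, 30),
  var2        = c(12, 11),
  ignored_num = c(99, 99)
)

# Row totals over the columns I selected (what I expect in parentheses).
expected_totals <- unname(rowSums(example[, c("var1", "var2")]))

# Row totals that also include ignored_num (what shows up in parentheses).
observed_totals <- unname(rowSums(example[, c("var1", "var2", "ignored_num")]))

print(expected_totals)  # 32 41
print(observed_totals)  # 131 140
```

The second vector matches the Ns shown in the Total column of my output, which is why I suspect the excluded column is being summed again.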

N.B.: The documentation states that the first column and all non-numeric columns are always ignored.

So I guess I have to cast ignored_num to character.
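A minimal sketch of that workaround, assuming the documented rule that non-numeric columns are skipped applies to every adorn_* step in the pipe. I have not confirmed the exact output formatting, so none is shown here; result is just an illustrative name.

```r
library(janitor)  # also provides the %>% pipe

example <- data.frame(
  year        = c(2019, 2020),
  var1        = c(20, 30),
  var2        = c(12, 11),
  ignored_num = c(99, 99)
)

# Assumption: once the column is character, the adorn_* functions leave it
# alone, so no tidyselect exclusion has to be repeated in each call.
example$ignored_num <- as.character(example$ignored_num)

result <- example %>%
  adorn_totals(c("col", "row"), fill = "-", na.rm = TRUE, name = "Total") %>%
  adorn_percentages() %>%
  adorn_pct_formatting(digits = 1) %>%
  adorn_ns()

print(result)
```

The drawback is that ignored_num is no longer numeric afterwards, so it would have to be converted back if it is needed for later calculations.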

My understanding of the docs is that I could exclude (numeric) columns via tidyselect specifications as well:

columns to adorn. This takes a tidyselect specification. By default, all columns are adorned except for the first column and columns not of class numeric

I might be misinterpreting this.

panporter (Author) commented

Hello,

Is there anything new on this subject?

I created a minimal example to pinpoint what's unexpected:

library(tidyverse)
library(janitor)
 
# For the sake of the example, month is numeric.
 
example <- tribble(
  ~year, ~month, ~var1, ~var2,
  2019,11,20,10,
  2020,12,30,10)
 
# I aim to get the row totals (the Total column) of var1 and var2, ignoring year and month.
# Those should be *30* and *40*
 
# Attempt 1: Do nothing (not expecting the intended results)
 
example %>%
  adorn_totals(c("col", "row"), fill = "-", na.rm = TRUE, name = "Total") %>%
  adorn_ns()
#>   year   month    var1    var2   Total
#>   2019 11 (11) 20 (20) 10 (10) 41 (41)
#>   2020 12 (12) 30 (30) 10 (10) 52 (52)
#>  Total 23 (23) 50 (50) 20 (20) 93 (93)
 
# Obviously this didn't work, because month is numeric and gets added up too.
# I get 30+11 = 41 and 40+12 = 52.
# So far so good (and correct).
 
# Attempt 2: Exclude month from _totals and from _ns:
 
example %>%
  adorn_totals(c("col", "row"), fill = "-", na.rm = TRUE, name = "Total", !c(year, month)) %>%
  adorn_ns(,,,!c(year, month))
#>   year month    var1    var2   Total
#>   2019    11 20 (20) 10 (10) 30 (41)
#>   2020    12 30 (30) 10 (10) 40 (52)
#>  Total     - 50 (50) 20 (20) 70 (93)
 
# Now the _ns in the Total column are wrong: (41) instead of (30), (52) instead of (40)
# Even though we excluded the month column.
 
# Attempt 3:
# Of course, when I convert the month to something non-numeric, it works:
 
example <- example %>% mutate(month = as.factor(month))
 
example %>%
  adorn_totals(c("col", "row"), fill = "-", na.rm = TRUE, name = "Total") %>%
  adorn_ns()
#>   year month    var1    var2   Total
#>   2019    11 20 (20) 10 (10) 30 (30)
#>   2020    12 30 (30) 10 (10) 40 (40)
#>  Total     - 50 (50) 20 (20) 70 (70)
 
# This is the result I expected when excluding month.

Attempt 2 is what puzzles me: I exclude one numeric column, but this exclusion is partly ignored in the Total column. The Ns in parentheses there still include month (41, 52 and 93 instead of 30, 40 and 70):

  year month    var1    var2   Total
  2019    11 20 (20) 10 (10) 30 (41)
  2020    12 30 (30) 10 (10) 40 (52)
 Total     - 50 (50) 20 (20) 70 (93)
