Use group_by with freq_table #40

Open
2 of 11 tasks
mbcann01 opened this issue Nov 29, 2021 · 1 comment

Comments


mbcann01 commented Nov 29, 2021

Overview

Previously, in #1, I removed the ability to use a grouped tibble with freq_table(). Now, I'm finding that using group_by() might be the most dplyr-like way to do things. Remember, freq_table() is intended to be integrated with a dplyr pipeline.

Additionally, using group_by() might help with issue #9 in that group_var_1, group_var_2, etc. would naturally flow from the variables added to group_by().

Adding multiple var names to the group_by() function could result in multiple tables rather than being used as grouping variables. (Nah, I don't think I like this idea).

Passing one var name to freq_table() should still produce a one-way frequency table. In other words, you shouldn't need to use group_by() to produce a one-way frequency table.

It turns out that just removing

  if (("grouped_df" %in% .data_class)) {
    .data <- dplyr::ungroup(.data)
  }

from the freq_table() code will make it so that group_by() works again. All the stats still work too. The only issue that I can see is that

mtcars %>% 
  freq_table(am, cyl)

and

mtcars %>% 
  group_by(am) %>%
  freq_table(cyl)

now return the exact same result. I'm not sure if that's good or not. I guess one problem is that it makes it harder to rename the output columns as described in #9 (i.e., group and outcome). Does it though? Need to think more about this.
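The overlap between the two calling styles can be illustrated with plain dplyr. This is only a sketch: dplyr::count() is used as a stand-in for freq_table() (an assumption made so the example runs without freqtables installed), and it reproduces only the n column, not the percentages or confidence limits.

```r
library(dplyr)

# Two variables passed directly
# (stand-in for `freq_table(am, cyl)`).
direct <- mtcars %>%
  count(am, cyl)

# One variable set by group_by(), one passed directly
# (stand-in for `group_by(am) %>% freq_table(cyl)`).
piped <- mtcars %>%
  group_by(am) %>%
  count(cyl) %>%
  ungroup()

# Same rows and same counts; only the grouping metadata differed
# before the ungroup() call.
identical(direct$n, piped$n)
```

Because both forms carry the same information, the choice between them is mostly about which interface reads better in a pipeline and how the output columns should be named.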

One good thing is that we don't have to worry about previous groupings messing up the groups we expect when using group_by() with freq_table(). According to the group_by() documentation, applying group_by() to an already grouped dataset overwrites the existing grouping variables by default (use .add = TRUE to add to them instead).
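A quick check of that group_by() behavior, using dplyr::group_vars() to inspect the grouping. The object names here are just for illustration.

```r
library(dplyr)

# Start with a data frame that is already grouped by vs.
grouped_once <- mtcars %>% group_by(vs)
group_vars(grouped_once)

# Default behavior: the new grouping replaces the old one.
regrouped <- grouped_once %>% group_by(am)
group_vars(regrouped)

# With .add = TRUE, the new variable is appended to the existing groups.
added <- grouped_once %>% group_by(am, .add = TRUE)
group_vars(added)
```

So a stale grouping only matters if the user pipes a grouped data frame into freq_table() without calling group_by() again.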

Left off at

2023-03-17

Working through the stuff below in test.Rmd. Decided to create some test files that I can use to compare freqtables to Stata and SAS. The specifics are outlined in #22.

2022-07-31

Trying to decide whether to soft deprecate or hard deprecate the ... argument in freq_table(). In the iss-40-group-by branch, I have four different versions of the freq_table() function:

  1. freq_table(): The current CRAN version of the function.
  2. freq_table_v2(): In this version, I'm soft deprecating .... It still works, but I'm also adding a .x argument and an informative warning message for users about deprecating .... This is probably the safest route, but it feels like it will slow me down from doing what I actually want to do with freqtables. Also, not being able to use the .x argument by position feels wrong.
  3. freq_table_v3(): In this version, I'm hard deprecating .... I'm just replacing it with the .x argument and an informative warning message for users about deprecating .... Of course, this approach risks breaking existing code.
  4. freq_table_v4(): In this version, I'm also hard deprecating .... This is the most extreme version and what I was last working on. It begins from the new freq_tbl function and builds on from there. Not only might this fix the group_by issue, but we might also address #9 (Group and subgroup make more sense than row and col), #14 (Update @return in freq_table), #39 (Create the freq_tbl function), and #22 (Add ability to make n-way tables). It would also modularize the code a little more, which is something I've been wanting to do for a while. Of course, this approach carries a lot of risk of breaking existing code.
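To make the soft-deprecation option (version 2) more concrete, here is a minimal sketch. It is not the code in the iss-40-group-by branch: freq_count_v2_sketch() is a hypothetical name, the warning text is made up, dplyr::count() stands in for the real statistics, and a base warning() is used where the lifecycle package might be used in practice.

```r
library(dplyr)

# Hypothetical stand-in for freq_table_v2(): counts only, no percentages or CIs.
freq_count_v2_sketch <- function(.data, ..., .x) {
  if (...length() > 0) {
    # Old interface: still works, but warns and ignores existing groups.
    warning(
      "Passing variables through `...` is deprecated (hypothetical message); ",
      "use `group_by()` and the `.x` argument instead.",
      call. = FALSE
    )
    return(dplyr::count(dplyr::ungroup(.data), ...))
  }
  # New interface: groups set with group_by() are respected.
  dplyr::count(.data, {{ .x }})
}

# Old style (warning suppressed here to keep the output quiet).
old_style <- suppressWarnings(freq_count_v2_sketch(mtcars, am, cyl))

# New style: group first, then name the variable to count.
new_style <- mtcars %>%
  group_by(am) %>%
  freq_count_v2_sketch(.x = cyl)
```

Note that because .x comes after ... in this sketch, it has to be passed by name, which is the positional drawback mentioned for version 2.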

Task list

  • Remove group_by check from freq_table() code
  • Remove group_by check from freq_table() documentation
  • Change row_var to group_var and row_cat to group_cat
  • Change col_var to freq_var and col_cat to freq_cat
  • Change ... to .x or something like that. .x = Calculate the number of times each value of this variable is observed in the data frame. If the data frame is grouped with group_by(), then freq_table() will calculate the number of times each value of this variable is observed separately for each value of the grouping variable(s).
  • Look into documenting the deprecations/changes with the lifecycle package
  • Update README
  • Update documentation
  • Build check
  • Update version number
  • Submit to CRAN
@mbcann01 mbcann01 created this issue from a note in Bug fixes and enhancements (To do) Nov 29, 2021
@mbcann01 mbcann01 moved this from To do to In progress in Bug fixes and enhancements Apr 1, 2022
mbcann01 added a commit that referenced this issue Apr 3, 2022

mbcann01 commented Jul 10, 2022

What to do when two variables are passed to freq_table()?

cyl is the outcome var of interest

mtcars %>% 
  freq_table(cyl)

Now, cyl within levels of am

mtcars %>% 
  group_by(am) %>%
  freq_table(cyl)
# A tibble: 6 × 17
  row_var row_cat col_var col_cat     n n_row n_total percent_total se_total t_crit_total lcl_total ucl_total percent_row se_row
  <chr>   <chr>   <chr>   <chr>   <int> <int>   <int>         <dbl>    <dbl>        <dbl>     <dbl>     <dbl>       <dbl>  <dbl>
1 am      0       cyl     4           3    19      32          9.38     5.24         2.04      2.86      26.7        15.8   8.59
2 am      0       cyl     6           4    19      32         12.5      5.94         2.04      4.51      30.2        21.1   9.61
3 am      0       cyl     8          12    19      32         37.5      8.70         2.04     22.0       56.1        63.2  11.4 
4 am      1       cyl     4           8    13      32         25        7.78         2.04     12.5       43.7        61.5  14.0 
5 am      1       cyl     6           3    13      32          9.38     5.24         2.04      2.86      26.7        23.1  12.2 
6 am      1       cyl     8           2    13      32          6.25     4.35         2.04      1.45      23.2        15.4  10.4 
# … with 3 more variables: t_crit_row <dbl>, lcl_row <dbl>, ucl_row <dbl>

The code above gives the result we want. However, this works too:

mtcars %>% 
  freq_table(am, cyl)

If that didn't work, what would we want it to return instead? A list of one-way tables?

The cleanest thing to do for now is change ... to freq_var or something like that. Only accept one variable. If we want multiple n-way tables, we can use purrr.
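As a rough sketch of the purrr route, the loop below builds one table per variable of interest, each within levels of am. dplyr::count() is again a stand-in for a single-variable freq_table() (an assumption, so the example runs without freqtables), and freq_vars is a hypothetical name.

```r
library(dplyr)
library(purrr)

# Variables we want separate tables for.
freq_vars <- c("cyl", "gear")

tables <- freq_vars %>%
  set_names() %>%            # name each list element after its variable
  map(function(v) {
    mtcars %>%
      group_by(am) %>%
      count(!!rlang::sym(v)) # turn the string into a column reference
  })

names(tables)
tables$cyl
```

The result is a named list with one tibble per variable, which keeps freq_table() itself limited to a single variable per call.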

In the future, we may want multiple vars passed to freq_table() to create multiple n-way tables (#36). But if we go that route, it should be done as a separate issue.

Projects
Status: Active Development