summarise_at using different functions for different variables #3101
Comments
Hi @profdave, I don't know if this will help, but here are some examples to illustrate what I understand you want. First, a reminder that summarize_at applies every listed function to every selected column:
library(dplyr, warn.conflicts = FALSE)
df <- tribble(
~category, ~x, ~y, ~z,
#----------------------
'a', 4, 6, 8,
'a', 7, 3, 0,
'a', 7, 9, 0,
'b', 2, 8, 8,
'b', 5, 1, 8,
'b', 8, 0, 1,
'c', 2, 1, 1,
'c', 3, 8, 0,
'c', 1, 9, 1
)
df %>%
  group_by(category) %>%
  summarize_at(vars(x, y), funs(min, max))
#> # A tibble: 3 x 5
#> category x_min y_min x_max y_max
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 a 4 3 7 9
#> 2 b 2 0 8 8
#> 3 c 1 1 3 9
I understand you want to map several functions to different specific columns.
library(purrr)
list(c("x"), c("y")) %>%
  map2(lst(min = min, max = max), ~ df %>% group_by(category) %>% summarise_at(.x, .y)) %>%
  reduce(inner_join)
#> Joining, by = "category"
#> # A tibble: 3 x 3
#> category x y
#> <chr> <dbl> <dbl>
#> 1 a 4 9
#> 2 b 2 8
#> 3 c 1 9
In the example above, you first put the columns you want to summarise in a list, then map them against a list of the same length holding the functions you want; each function is applied to its corresponding columns in summarise_at. It can use every feature of summarise_at:
list(.vars = lst("x", "y", c("y", "z")),
     .funs = lst(min, max, funs(mean = mean, median = median))) %>%
  pmap(~ df %>% group_by(category) %>% summarise_at(.x, .y)) %>%
  reduce(inner_join, by = "category")
#> # A tibble: 3 x 7
#> category x y y_mean z_mean y_median z_median
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 a 4 9 6 2.6666667 6 0
#> 2 b 2 8 3 5.6666667 1 8
#> 3 c 1 9 6 0.6666667 8 1
You can do the same with all the scoped variants. Is this the kind of result you are looking for? If not, I will delete this post. Finally, I do not know whether we could implement a single function to do that or include it in dplyr.
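On the idea of wrapping the map-then-join pattern in a single function, here is a rough sketch of what such a helper might look like. The name summarise_pairs and its arguments are hypothetical (nothing like it exists in dplyr), and the sketch assumes dplyr 1.0.0 or later so that across() and all_of() are available.

```r
library(dplyr, warn.conflicts = FALSE)
library(purrr)

df <- tribble(
  ~category, ~x, ~y, ~z,
  'a', 4, 6, 8,
  'a', 7, 3, 0,
  'a', 7, 9, 0,
  'b', 2, 8, 8,
  'b', 5, 1, 8,
  'b', 8, 0, 1,
  'c', 2, 1, 1,
  'c', 3, 8, 0,
  'c', 1, 9, 1
)

# Hypothetical helper, not part of dplyr.
# `vars` is a list of character vectors of column names; `fns` is a list of
# the same length whose elements are a function or a named list of functions.
summarise_pairs <- function(data, group, vars, fns) {
  map2(vars, fns, function(v, f) {
    # One grouped summary per (columns, functions) pair
    data %>%
      group_by(across(all_of(group))) %>%
      summarise(across(all_of(v), f), .groups = "drop")
  }) %>%
    # Stitch the per-pair summaries back together on the grouping column
    reduce(inner_join, by = group)
}

res <- summarise_pairs(
  df, "category",
  vars = list("x", c("y", "z")),
  fns  = list(min, list(mean = mean, median = median))
)
# Columns: category, x, y_mean, y_median, z_mean, z_median
```

Each pair is summarised separately and then joined, so the cost grows with the number of pairs; for a handful of pairs that should be negligible.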
Thanks very much @cderv, it looks like this is exactly what I was talking about. I'll study it more closely (and get myself 100% up to date on purrr) to understand it better. But would it really be so hard to incorporate this functionality into dplyr? You know better than I do, of course, but I think it would be very helpful to the average user.
library(dplyr, warn.conflicts = FALSE)
df <- tribble(
~category, ~x, ~y, ~z,
#----------------------
'a', 4, 6, 8,
'a', 7, 3, 0,
'a', 7, 9, 0,
'b', 2, 8, 8,
'b', 5, 1, 8,
'b', 8, 0, 1,
'c', 2, 1, 1,
'c', 3, 8, 0,
'c', 1, 9, 1
)
df %>%
  group_by(category) %>%
  summarise_all(funs(mean, median, first))
#> # A tibble: 3 x 10
#> category x_mean y_mean z_mean x_median y_median z_med… x_fi… y_fi… z_fi…
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 a 6.00 6.00 2.67 7.00 6.00 0 4.00 6.00 8.00
#> 2 b 5.00 3.00 5.67 5.00 1.00 8.00 2.00 8.00 8.00
#> 3 c 2.00 6.00 0.667 2.00 8.00 1.00 2.00 1.00 1.00
When I use group_by and summarise in dplyr, I can naturally apply different summary functions to different variables. For instance:
results in output:
My question is, how would I do this with summarise_at? Obviously for this example it's unnecessary, but it would be useful if I have lots of variables that I want to take the mean of, lots of medians, etc.
Obviously, this issue is the same for all the new _all's, _at's and _if's. Perhaps this is a feature still in development; if so, I would be a fan of seeing it released as soon as possible.
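For anyone reading this on a later release: assuming dplyr 1.0.0 or newer (an assumption about your installed version, not something available when this issue was opened), several across() calls can sit inside one summarise(), each with its own columns and functions. A minimal sketch on toy data of my own choosing:

```r
library(dplyr, warn.conflicts = FALSE)

df <- tribble(
  ~category, ~x, ~y, ~z,
  'a', 4, 6, 8,
  'a', 7, 3, 0,
  'a', 7, 9, 0,
  'b', 2, 8, 8,
  'b', 5, 1, 8,
  'b', 8, 0, 1,
  'c', 2, 1, 1,
  'c', 3, 8, 0,
  'c', 1, 9, 1
)

res <- df %>%
  group_by(category) %>%
  summarise(
    # min for x only
    across(x, min, .names = "{.col}_min"),
    # mean and median for y and z
    across(c(y, z), list(mean = mean, median = median)),
    .groups = "drop"
  )
# Columns: category, x_min, y_mean, y_median, z_mean, z_median
```

The column sets can also be chosen with selection helpers such as starts_with(), which is what makes this convenient when there are many variables per function.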