-
Notifications
You must be signed in to change notification settings - Fork 412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FR: order of columns resulting from pivot_wider #839
Comments
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
I think this issue is really two separate issues, 1) one for specifying the order in which multiply variables are combined, as per @mattantaliss original comment, and 2) one for how values within a variable are ordered, as in #850 (which was closed as a duplicate of this issue) and @EarlGlynn’s comments. Here are my comments and regexp for the second part. The columns (within a variable) should be ordered the same as in library(tidyr)
library(dplyr)
d = tibble(day_int = c(4,3,5,1,2),
day_fac = factor(day_int, levels=1:5,
labels=c("Mon","Tue", "Wed","Thu","Fri")),
day_char = as.character(day_fac))
d
#> # A tibble: 5 x 3
#> day_int day_fac day_char
#> <dbl> <fct> <chr>
#> 1 4 Thu Thu
#> 2 3 Wed Wed
#> 3 5 Fri Fri
#> 4 1 Mon Mon
#> 5 2 Tue Tue
levels(d$day_fac)
#> [1] "Mon" "Tue" "Wed" "Thu" "Fri"
# spread() respects the ordering of the factor levels ...
d %>%
select(day_fac, day_int) %>%
spread(key = "day_fac", value = "day_int")
#> # A tibble: 1 x 5
#> Mon Tue Wed Thu Fri
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 3 4 5
# ... but pivot_wider() does not
d %>%
select(day_fac, day_int) %>%
pivot_wider(names_from = day_fac, values_from = day_int)
#> # A tibble: 1 x 5
#> Thu Wed Fri Mon Tue
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 4 3 5 1 2
# spread() respects the ordering of numeric variables ...
d %>%
select(day_int) %>%
spread(key = "day_int", value = "day_int")
#> # A tibble: 1 x 5
#> `1` `2` `3` `4` `5`
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 3 4 5
# ... but pivot_wider() does not
d %>%
select(day_int) %>%
pivot_wider(names_from = day_int, values_from = day_int)
#> # A tibble: 1 x 5
#> `4` `3` `5` `1` `2`
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 4 3 5 1 2
# spread() respects the (alphabetic) ordering of character variables ...
d %>%
select(day_char, day_int) %>%
spread(key = "day_char", value = "day_int")
#> # A tibble: 1 x 5
#> Fri Mon Thu Tue Wed
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 5 1 4 2 3
# ... but pivot_wider() does not
d %>%
select(day_char, day_int) %>%
pivot_wider(names_from = day_char, values_from = day_int)
#> # A tibble: 1 x 5
#> Thu Wed Fri Mon Tue
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 4 3 5 1 2 Basically, the request (for the second sub-issue) is that order(d$day_fac)
#> [1] 4 5 2 1 3
order(d$day_int)
#> [1] 4 5 2 1 3
order(d$day_char)
#> [1] 3 4 1 5 2 |
@huftis they're currently ordered by their first appearance in the data frame. Unfortunately, I don't think ordering to match |
For the ordering case, I think I don't know what to call the argument for the other case which is about whether |
Random idea: maybe us_rent_income %>%
pivot_wider(names_from = c(variable, .value), values_from = c(estimate, moe))
us_rent_income %>%
pivot_wider(names_from = c(.value, variable), values_from = c(estimate, moe)) |
No, I don't think I can make that work without a lot more thinking. So then the only choice you get is whether |
Does it seem right to change the order of the column names when you change the position of the variable name? library(tidyr)
pivot_wider(
us_rent_income,
names_from = variable,
values_from = c(estimate, moe),
names_value_loc = "first"
)
#> # A tibble: 52 x 6
#> GEOID NAME estimate_income estimate_rent moe_income moe_rent
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 01 Alabama 24476 747 136 3
#> 2 02 Alaska 32940 1200 508 13
#> 3 04 Arizona 27517 972 148 4
#> 4 05 Arkansas 23789 709 165 5
#> 5 06 California 29454 1358 109 3
#> 6 08 Colorado 32401 1125 109 5
#> 7 09 Connecticut 35326 1123 195 5
#> 8 10 Delaware 31560 1076 247 10
#> 9 11 District of Columbia 43198 1424 681 17
#> 10 12 Florida 25952 1077 70 3
#> # … with 42 more rows
pivot_wider(
us_rent_income,
names_from = variable,
values_from = c(estimate, moe),
names_value_loc = "last"
)
#> # A tibble: 52 x 6
#> GEOID NAME income_estimate income_moe rent_estimate rent_moe
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 01 Alabama 24476 136 747 3
#> 2 02 Alaska 32940 508 1200 13
#> 3 04 Arizona 27517 148 972 4
#> 4 05 Arkansas 23789 165 709 5
#> 5 06 California 29454 109 1358 3
#> 6 08 Colorado 32401 109 1125 5
#> 7 09 Connecticut 35326 195 1123 5
#> 8 10 Delaware 31560 247 1076 10
#> 9 11 District of Columbia 43198 681 1424 17
#> 10 12 Florida 25952 70 1077 3
#> # … with 42 more rows Created on 2020-04-27 by the reprex package (v0.3.0) |
I don't think I wanted to add any more than two arguments for these features as the tradeoff increased complexity of |
Hmm, I would be okay with relying on generating the pivot spec myself if that'd produce what you are getting from your idea for modifying column orderings. My original use-case, however, seems to be slightly different. It looks like your idea was to handle the case of having columns be <value>_<name> or <name>_<value>. What I was looking for, however, is to have <value1>_<name1>, <value2>_<name1>, <value2>_<name2>, etc. instead of <value1>_<name1>, <value1>_<name2>, etc. I'm thinking about all of this as a pair of nested loops. I want to be able to say whether the outer-loop (or inner-loop, whichever is easier or makes more sense to specify) is over names or values, with the current/default behavior being that the outer-loop is iterating over values and the inner-loop is iterating over names. |
@mattantaliss yes, sounds like you just want to control this yourself with a pivot spec. I'll work on more docs for this. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This is true, but so does the ordering used in I have now used |
This comment has been minimized.
This comment has been minimized.
Hm. I think I misunderstood the example (didn’t take a proper look at the data set first). I’m not really sure what I think of this anymore. But it seems more natural to always use the |
This comment has been minimized.
This comment has been minimized.
Note that it’s often not actually possible to fix the column order by running d = head(mtcars) %>%
group_by(gear, cyl) %>%
summarise(mean_hp = mean(hp)) %>%
ungroup() I want to show the A plain d %>%
pivot_wider(names_from = cyl, values_from = mean_hp)
## A tibble: 2 x 4
# gear `6` `8` `4`
# <dbl> <dbl> <dbl> <dbl>
# 1 3 108. 175 NA
# 2 4 110 NA 93 Sorting the tibble by the column variable gives the wrong row order: arrange(d, cyl) %>%
pivot_wider(names_from = cyl, values_from = mean_hp)
# # A tibble: 2 x 4
# gear `4` `6` `8`
# <dbl> <dbl> <dbl> <dbl>
# 1 4 93 110 NA
# 2 3 NA 108. 175 Subsorting by the original row variable still gives the wrong row order. arrange(d, cyl, gear) %>%
pivot_wider(names_from = cyl, values_from = mean_hp)
# # A tibble: 2 x 4
# gear `4` `6` `8`
# <dbl> <dbl> <dbl> <dbl>
# 1 4 93 110 NA
# 2 3 NA 108. 175 So it looks like it’s actually impossible to get both the row and the column order correctly. Unless one uses spread(d, key = cyl, value = mean_hp)
# # A tibble: 2 x 4
# gear `4` `6` `8`
# <dbl> <dbl> <dbl> <dbl>
# 1 3 NA 108. 175
# 2 4 93 110 NA |
hello. there is a way to sort the order of the col as needed using "contains()"
hope it helps :) |
@huftis you may be right that |
Note to self for when I come back to this: before encoding the options in |
For what it's worth, the situation @huftis describes for factor values is exactly the scenario that brought me here. My package, pollster, depends on using factor levels to arrange the columns and the rows. I've tested the development version of tidyr, and the It's not my place to say whether or not the default value of |
It tried adding .value last in names_from hoping to iterate on value columns in the innermost loop - before reading your comment, so it is intuitive! I would suggest that the documentation names_sort would say "Should the columns be sorted?" rather than "Should the column names be sorted?". I only realized they are sorted by how the columns are ordered in names_from after using names_glue where I changed the order. |
I'm looking for the same feature mentioned by @mattantaliss above:
... as well as in this Stack Overflow issue (where a user has provided a pretty funky workaround). Hope this feature may be added someday! If it's possible to do with |
I came here via this StackOverflow question where the user shows what I wanted. The solutions so far are to fix it post-hoc with gnarly |
Many of these comments are a bit off topic, and are more related to #770. What makes this issue unique is that this is about the order in which
To summarize the above conversation,
but some people want it to combine with
This is currently impossible. It is worth noting that it is already possible to combine in I don't think it is really about whether names_vary = c("fastest", "slowest") ( |
For the questions that are on topic, note that this issue of whether Example 1 - Original reprex in this issuelibrary(tidyverse)
spec <- build_wider_spec(us_rent_income, names_from = variable, values_from = c(estimate, moe))
spec
#> # A tibble: 4 × 3
#> .name .value variable
#> <chr> <chr> <chr>
#> 1 estimate_income estimate income
#> 2 estimate_rent estimate rent
#> 3 moe_income moe income
#> 4 moe_rent moe rent
spec <- arrange(spec, variable, .value)
spec
#> # A tibble: 4 × 3
#> .name .value variable
#> <chr> <chr> <chr>
#> 1 estimate_income estimate income
#> 2 moe_income moe income
#> 3 estimate_rent estimate rent
#> 4 moe_rent moe rent
pivot_wider_spec(us_rent_income, spec) %>%
print(n = 2)
#> # A tibble: 52 × 6
#> GEOID NAME estimate_income moe_income estimate_rent moe_rent
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 01 Alabama 24476 136 747 3
#> 2 02 Alaska 32940 508 1200 13
#> # … with 50 more rows Example 2 - SO post 1From #839 (comment) library(tidyverse)
spec <- build_wider_spec(mtcars, names_from = gear, values_from = c(vs, am, carb))
spec
#> # A tibble: 9 × 3
#> .name .value gear
#> <chr> <chr> <dbl>
#> 1 vs_4 vs 4
#> 2 vs_3 vs 3
#> 3 vs_5 vs 5
#> 4 am_4 am 4
#> 5 am_3 am 3
#> 6 am_5 am 5
#> 7 carb_4 carb 4
#> 8 carb_3 carb 3
#> 9 carb_5 carb 5
# don't arrange by `.value` too, user didn't want that
spec <- arrange(spec, gear)
spec
#> # A tibble: 9 × 3
#> .name .value gear
#> <chr> <chr> <dbl>
#> 1 vs_3 vs 3
#> 2 am_3 am 3
#> 3 carb_3 carb 3
#> 4 vs_4 vs 4
#> 5 am_4 am 4
#> 6 carb_4 carb 4
#> 7 vs_5 vs 5
#> 8 am_5 am 5
#> 9 carb_5 carb 5
pivot_wider_spec(mtcars, spec) %>%
print(n = 2)
#> # A tibble: 32 × 16
#> mpg cyl disp hp drat wt qsec vs_3 am_3 carb_3 vs_4 am_4
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 NA NA NA 0 1
#> 2 21 6 160 110 3.9 2.88 17.0 NA NA NA 0 1
#> # … with 30 more rows, and 4 more variables: carb_4 <dbl>, vs_5 <dbl>,
#> # am_5 <dbl>, carb_5 <dbl> Example 3 - SO post 2From #839 (comment) library(tidyverse)
df1 <- structure(
list(
Session_ID = c(1L, 1L, 1L, 3L, 7L, 7L),
Side_Effect = c("anxious", "dizzy", "relaxed", "dizzy", "nauseous", "anxious"),
Number_Code = c(1L, 2L, 3L, 2L, 4L, 1L)
),
class = "data.frame",
row.names = c(NA, -6L)
)
df1 <- df1 %>%
group_by(Session_ID) %>%
mutate(per_session_id = row_number()) %>%
ungroup()
df1
#> # A tibble: 6 × 4
#> Session_ID Side_Effect Number_Code per_session_id
#> <int> <chr> <int> <int>
#> 1 1 anxious 1 1
#> 2 1 dizzy 2 2
#> 3 1 relaxed 3 3
#> 4 3 dizzy 2 1
#> 5 7 nauseous 4 1
#> 6 7 anxious 1 2
spec <- build_wider_spec(df1, per_session_id, c(Side_Effect, Number_Code))
spec
#> # A tibble: 6 × 3
#> .name .value per_session_id
#> <chr> <chr> <int>
#> 1 Side_Effect_1 Side_Effect 1
#> 2 Side_Effect_2 Side_Effect 2
#> 3 Side_Effect_3 Side_Effect 3
#> 4 Number_Code_1 Number_Code 1
#> 5 Number_Code_2 Number_Code 2
#> 6 Number_Code_3 Number_Code 3
spec <- arrange(spec, per_session_id)
spec
#> # A tibble: 6 × 3
#> .name .value per_session_id
#> <chr> <chr> <int>
#> 1 Side_Effect_1 Side_Effect 1
#> 2 Number_Code_1 Number_Code 1
#> 3 Side_Effect_2 Side_Effect 2
#> 4 Number_Code_2 Number_Code 2
#> 5 Side_Effect_3 Side_Effect 3
#> 6 Number_Code_3 Number_Code 3
pivot_wider_spec(df1, spec)
#> # A tibble: 3 × 7
#> Session_ID Side_Effect_1 Number_Code_1 Side_Effect_2 Number_Code_2
#> <int> <chr> <int> <chr> <int>
#> 1 1 anxious 1 dizzy 2
#> 2 3 dizzy 2 <NA> NA
#> 3 7 nauseous 4 anxious 1
#> # … with 2 more variables: Side_Effect_3 <chr>, Number_Code_3 <int> Example 4 - Issue 1064 |
In case I'm not just missing an obvious pre-existing solution, it would be nice to be able to specify how resulting columns from
pivot_wider
will be ordered. Using theus_rent_income
example,produces the columns,
estimate_income
,estimate_rent
,moe_income
, andmoe_rent
. In my use-case (getting to a double-header table), I want them to be in the order,estimate_income
,moe_income
,estimate_rent
, andmoe_rent
.The text was updated successfully, but these errors were encountered: