I think it would be really useful for pivot_longer to preserve the variable labels as the value labels after pivoting. Unfortunately, this is not currently possible. To clarify, I don't think the original value labels should be preserved.
I work with survey data that are usually saved as .sav files. I then use the haven package to import them to R. This gives me both variable labels and value labels. In this case, only variable labels are relevant. Often, the variable labels are the questions used in the survey and are quite long. When a question is "select all that apply", each response option is split into a new variable. In order to analyze the questions, I use pivot_longer to make it into one variable.
The issue I run into is that I would like to maintain the original variable labels as the value labels. Here is an example data frame with variable labels.
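As a rough illustration of the setup described above (not the original example from this issue), a small labelled data frame might look like the sketch below. The column names `id`, `q2_1`, `q2_2` and the label text are made up; the labels are attached with `labelled::var_label()`.

```r
library(tibble)
library(tidyr)
library(labelled)

# Hypothetical "select all that apply" data; names and labels are invented
df <- tibble(
  id   = 1:3,
  q2_1 = c(1, 0, 1),
  q2_2 = c(0, 1, 1)
)

# Attach a variable label (the question wording) to each pivoted column
var_label(df) <- list(
  q2_1 = "Which apply? Option A",
  q2_2 = "Which apply? Option B"
)

# Pivot the response-option columns into a single name/value pair
df_long <- pivot_longer(
  df,
  cols = c(q2_1, q2_2),
  names_to = "var",
  values_to = "resp"
)

# The new "var" column is a plain character vector, so this returns NULL
val_labels(df_long$var)
```

In this sketch the question wording only survives as the short column names stored in "var".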
We can see in df_long that none of the variable labels made it in as value_labels for "var". We can manually set them using labelled::set_value_labels like this:
We can see now that the "var" variable has value labels. These value labels are the same as the variable labels of the variables that were pivoted. This is the ideal output.
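A minimal sketch of that manual step, assuming `labelled::set_value_labels()` as the setter and using an invented long-format tibble (the names and label text are hypothetical, not taken from the original example):

```r
library(tibble)
library(labelled)

# Hypothetical long-format data; in practice this would come from pivot_longer()
df_long <- tibble(
  id   = c(1, 1, 2, 2),
  var  = c("q2_1", "q2_2", "q2_1", "q2_2"),
  resp = c(1, 0, 0, 1)
)

# Value labels are a named vector: names are the labels, values are the codes
df_long <- set_value_labels(
  df_long,
  var = c(
    "Which apply? Option A" = "q2_1",
    "Which apply? Option B" = "q2_2"
  )
)

# Inspect the value labels now attached to "var"
val_labels(df_long$var)
```

Writing the label vector out by hand like this scales poorly when many columns are pivoted, which is the motivation for doing it automatically.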
Is it possible to change pivot_longer() so that it provides you the option of using the variable labels as the new value_labels in the "var" column?
The current workaround I have found is with the sjlabelled::label_to_colname() function as seen below:
However, this is not ideal because it renames the variables immediately and doesn't add value_labels, which makes subsetting the "var" variable incredibly cumbersome.
I also think that this should be possible since each variable has only one variable_label and therefore there shouldn't be any conflicts when pivoting, unlike with value_labels.
I believe that I have found a simple solution. I am not sure how easy it would be to add this into pivot_longer as it exists right now but this was the function I created:
library(dplyr)  # provides %>% and select(); tidyr and labelled must also be installed

pivot_longer_values <- function(data, cols, names_to, values_to, add_value_labels = TRUE) {
  long <- data %>%
    tidyr::pivot_longer(
      cols = {{ cols }},
      names_to = names_to,
      values_to = values_to
    )

  if (isTRUE(add_value_labels)) {
    # create a vector containing the variable labels
    var_labs <- labelled::var_label(x = data %>% dplyr::select({{ cols }})) %>%
      unlist()
    # flip the names and values of the vector
    var_labs <- setNames(names(var_labs), var_labs)
    # add the vector of labels as value labels to the new column of names
    labelled::val_labels(long[names_to]) <- var_labs
  }

  long
}
This function is basically a wrapper around pivot_longer that uses the same argument names. I also added add_value_labels so that people have the option of including the variable labels as value labels in the new column specified under names_to. I tested this with the original data set I created and with other data I have, and it seems to work.
# use original data set and make it longer with the value labels included
tbl_long <- tbl %>%
  pivot_longer_values(
    cols = -c(q1, q3),
    names_to = "var",
    values_to = "resp",
    add_value_labels = TRUE
  )

# use labelled::look_for() to see if it worked
labelled::look_for(tbl_long)

# make a new column with the labels as the names
tbl_long %>% mutate(var_f = as_factor(var))

# here is a more in-depth example of a problem that this solves.
tbl_long %>%
  # get the labels in a new column
  mutate(var_f = as_factor(var)) %>%
  # filter out some of the variables
  filter(!var %in% c("q2_6", "q2_7", "q2_8")) %>%
  # group it by the labels
  group_by(var_f) %>%
  # get the frequency
  count(q1)
Hopefully this is something that can be added into tidyr.
JDenn0514 changed the title from "Pivot_longer converts variable labels to new value labels" to "pivot_longer converts variable labels to new value labels" on May 25, 2024