pivot_longer converts variable labels to new value labels #1535

Open
JDenn0514 opened this issue Dec 19, 2023 · 1 comment
JDenn0514 commented Dec 19, 2023

I think it would be really useful for pivot_longer to offer a way to carry the variable labels of the pivoted columns over as the value labels of the new names column. Unfortunately, this is not currently possible. To clarify, I don't think the original value labels should be preserved.


I work with survey data that are usually saved as .sav files, which I import into R with the haven package. This gives me both variable labels and value labels. In this case, only the variable labels are relevant. Often, the variable labels are the questions used in the survey and are quite long. When a question is "select all that apply", each response option is split into its own variable. To analyze such questions, I use pivot_longer to combine those variables into one.
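For readers less familiar with the two kinds of metadata, here is a minimal sketch of the difference. The vector `x` and its "Selected"/"Not selected" value labels are made up for illustration; they are not part of the example data in this issue.

```r
# a labelled vector carrying both kinds of metadata (illustrative values only)
x <- haven::labelled(
  c(0L, 1L, 1L),
  labels = c("Not selected" = 0L, "Selected" = 1L),  # value labels: one per code
  label = "Question 2, Response Option 1"            # variable label: one per column
)

# the single variable label attached to the whole vector
labelled::var_label(x)

# the value labels: a named vector whose names are the labels
# and whose values are the underlying codes
labelled::val_labels(x)
```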

The issue I run into is that I would like to maintain the original variable labels as the value labels. Here is an example data frame with variable labels.

library(tidyr)

# create fake data
df <- tibble(
  q1 = haven::labelled(0:1, label = "Question 1"),
  q2_1 = haven::labelled(0:1, label = "Question 2, Response Option 1"),
  q2_2 = haven::labelled(0:1, label = "Question 2, Response Option 2"),
  q2_3 = haven::labelled(0:1, label = "Question 2, Response Option 3"),
  q2_4 = haven::labelled(0:1, label = "Question 2, Response Option 4"),
  q2_5 = haven::labelled(0:1, label = "Question 2, Response Option 5"),
  q2_6 = haven::labelled(0:1, label = "Question 2, Response Option 6"),
  q2_7 = haven::labelled(0:1, label = "Question 2, Response Option 7"),
  q2_8 = haven::labelled(0:1, label = "Question 2, Response Option 8"),
  q3 = haven::labelled(0:1, label = "Question 3")
)

# pivot the data
df_long <- df %>% 
  pivot_longer(
    cols = -c(q1, q3),
    names_to = "var",
    values_to = "resp"
  )

df_long

labelled::look_for(df_long)

We can see in df_long that none of the variable labels were carried over as value labels for "var". We can set them manually with labelled::set_value_labels() like this:

# value labels are a named vector: the names are the labels,
# the values are the values stored in the column (here, the old column names)
df_long <- df_long %>% 
  labelled::set_value_labels(
    var = c("Question 2, Response Option 1" = "q2_1",
            "Question 2, Response Option 2" = "q2_2",
            "Question 2, Response Option 3" = "q2_3",
            "Question 2, Response Option 4" = "q2_4",
            "Question 2, Response Option 5" = "q2_5",
            "Question 2, Response Option 6" = "q2_6",
            "Question 2, Response Option 7" = "q2_7",
            "Question 2, Response Option 8" = "q2_8")
  )

labelled::look_for(df_long)

We can see now that the "var" variable has value labels. These value labels are the same as the variable labels of the variables that were pivoted. This is the ideal output.
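To illustrate why this output is convenient, here is a small self-contained sketch. The vector `var` below is a hypothetical stand-in for the "var" column, with only two labels: the stored values stay short for subsetting, while the long labels are still available for display through haven::as_factor().

```r
# hypothetical stand-in for the "var" column of df_long
var <- haven::labelled(
  c("q2_1", "q2_2", "q2_1"),
  labels = c(
    "Question 2, Response Option 1" = "q2_1",
    "Question 2, Response Option 2" = "q2_2"
  )
)

# subsetting still uses the short stored values
keep <- var %in% c("q2_1")

# converting to a factor swaps in the long labels for display
var_f <- haven::as_factor(var)
```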

Is it possible to change pivot_longer() so that it provides you the option of using the variable labels as the new value_labels in the "var" column?

The current workaround I have found uses the sjlabelled::label_to_colnames() function, as seen below:

df_long <- df %>% 
  sjlabelled::label_to_colnames(q2_1:q2_8) %>% 
  pivot_longer(
    cols = -c(q1, q3),
    names_to = "var",
    values_to = "resp"
  )

df_long

However, this is not ideal: it renames the variables to their full labels before pivoting and doesn't add value labels, which makes subsetting the "var" variable incredibly cumbersome.

I also think that this should be possible since each variable has only one variable_label and therefore there shouldn't be any conflicts when pivoting, unlike with value_labels.
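As a rough check of that claim, the sketch below uses a two-column stand-in for the example data (`small` is a name chosen here for illustration) and shows that labelled::var_label() yields exactly one label per column, so the labels can be turned into an unambiguous lookup keyed by label.

```r
# two-column stand-in for the example data
small <- tibble::tibble(
  q2_1 = haven::labelled(0:1, label = "Question 2, Response Option 1"),
  q2_2 = haven::labelled(0:1, label = "Question 2, Response Option 2")
)

# one variable label per column, as a character vector named by column
var_labs <- unlist(labelled::var_label(small))

# each column contributes exactly one label
stopifnot(length(var_labs) == ncol(small))

# flip names and values so the labels become names, as value labels require
lookup <- setNames(names(var_labs), var_labs)
```

One caveat under this approach: two columns that share the same variable label would produce duplicated label names in `lookup`, so that case would still need handling.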

@JDenn0514
Author

I believe that I have found a simple solution. I am not sure how easy it would be to add this into pivot_longer as it exists right now but this was the function I created:

pivot_longer_values <- function(data, cols, names_to, values_to, add_value_labels = TRUE) {
  long <- tidyr::pivot_longer(
    data,
    cols = {{ cols }},
    names_to = names_to,
    values_to = values_to
  )

  if (add_value_labels) {
    # create a named vector containing the variable labels of the pivoted columns
    var_labs <- unlist(labelled::var_label(dplyr::select(data, {{ cols }})))

    # flip the names and values of the vector: value labels need the
    # label as the name and the stored value (the old column name) as the value
    var_labs <- setNames(names(var_labs), var_labs)

    # add the vector of labels as value labels to the new column of names
    labelled::val_labels(long[[names_to]]) <- var_labs
  }

  long
}

This function is basically a wrapper around pivot_longer that uses the same argument names. I also added add_value_labels so that people can choose whether to include the variable labels as value labels in the new column specified by names_to. I tested this with the original data set I created and with other data I have, and it seems to work.

library(dplyr)

# use the original data set and make it longer with the value labels included
tbl_long <- df %>% 
  pivot_longer_values(
    cols = -c(q1, q3),
    names_to = "var",
    values_to = "resp",
    add_value_labels = TRUE
  )

# use labelled::look_for() to see if it worked
labelled::look_for(tbl_long)

# make a new column with the labels as the factor levels
tbl_long %>% mutate(var_f = haven::as_factor(var))

# here is a more in-depth example of a problem that this solves.
tbl_long %>% 
  # get the labels in a new column
  mutate(var_f = haven::as_factor(var)) %>% 
  # filter out some of the variables
  filter(!var %in% c("q2_6", "q2_7", "q2_8")) %>% 
  # group it by the labels
  group_by(var_f) %>% 
  # get the frequency 
  count(q1)

Hopefully this is something that can be added into tidyr.

@JDenn0514 JDenn0514 changed the title Pivot_longer converts variable labels to new value labels pivot_longer converts variable labels to new value labels May 25, 2024