Feature request: names_to_row (an approximate inverse function for row_to_names) #434

Open
billdenney opened this issue Mar 12, 2021 · 7 comments

Comments

@billdenney
Collaborator

Feature requests

I think a new function that moves the header row to become the first data row would be helpful.

I sometimes receive data where I need to do a lot of mucking about to get the results into a usable format. I use rio::import() to import the data files, and then I need to get the data ready for analysis. Sometimes the header rows of those files contain information that I need to convert into a column, so it would be more helpful to have the header row as part of the data.

This would be approximately the inverse of the row_to_names() function (so it seems like it could be a fit in janitor).

The proposed function is below (it would need some cleanup before just dropping into the repo). Some features that it has are:

  • Try to detect values in the header that should be NA, and set them to NA.
  • Default to forcing the columns to be character. That may be a general requirement all the time (i.e. maybe it shouldn't be an option).
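library(dplyr)  # provides %>%, mutate(), across(), everything(), and bind_rows()

names_to_row <- function(data, na_pattern="^\\.\\.\\.[0-9]+", force_character=TRUE) {
  # Header values matching na_pattern (e.g. auto-generated "...1") become NA
  first_row_char <- names(data)
  first_row_char[grepl(x=first_row_char, pattern=na_pattern)] <- NA_character_
  # One-row data.frame holding the old header, with placeholder names X1, X2, ...
  first_row <-
    setNames(
      as.data.frame(as.list(first_row_char), stringsAsFactors=FALSE),
      nm=paste0("X", seq_along(first_row_char))
    )
  # bind_rows() cannot combine the character header with non-character columns,
  # so force_character=FALSE only works when data is already all character
  ret_prep <-
    if (force_character) {
      data %>%
        mutate(across(.cols=everything(), .fns=as.character))
    } else {
      data
    }
  # Use the same placeholder names so the header row and the data line up
  names(ret_prep) <- paste0("X", seq_along(ret_prep))
  ret <-
    bind_rows(
      first_row,
      ret_prep
    )
  ret
}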
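To illustrate the intended behavior, here is a minimal base-R sketch of the same idea, assuming input whose blank headers were auto-named "...1", "...2", and so on at import. The function name names_to_row_base and the example data are hypothetical and are not part of janitor.

```r
# Hypothetical base-R variant of the proposal (not part of janitor).
names_to_row_base <- function(data, na_pattern = "^\\.\\.\\.[0-9]+$") {
  header <- names(data)
  # Auto-generated placeholder names carry no information, so treat them as missing.
  header[grepl(na_pattern, header)] <- NA_character_
  # Convert every column to character so the header values can share the columns.
  body <- as.data.frame(lapply(data, as.character), stringsAsFactors = FALSE)
  new_names <- paste0("X", seq_along(header))
  names(body) <- new_names
  # A 1-row character matrix becomes a 1-row data.frame of character columns.
  header_row <- as.data.frame(matrix(header, nrow = 1), stringsAsFactors = FALSE)
  names(header_row) <- new_names
  out <- rbind(header_row, body)
  rownames(out) <- NULL
  out
}

# Example data: the first column had a blank header in the source file.
raw <- data.frame(c("t1", "t2"), c(1.5, 2.5), stringsAsFactors = FALSE)
names(raw) <- c("...1", "Male")

demoted <- names_to_row_base(raw)
print(demoted)
```

The old header ends up as the first row, with the placeholder name replaced by NA, and the original rows follow as character values.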
@jzadra
Contributor

jzadra commented Mar 12, 2021

Wouldn't this do the trick? Unless I'm missing some other necessary functionality?

names(mtcars) <- mtcars %>% slice(1) %>% unlist() %>% as.character() %>% make_clean_names()
mtcars %>% slice(-1)

@billdenney
Collaborator Author

That's more like the functionality of row_to_names() (it has a few more bells and whistles, but you've got the gist).

The goal here is to move the names down to a row, and to make cells that should be NA into NA.
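The two directions can be contrasted with a small base-R sketch. It makes no janitor call; the data and the placeholder names V1, V2 are hypothetical, and the "promote" step only approximates what row_to_names() does.

```r
# Hypothetical example data; base R only, so no janitor function is called here.
df <- data.frame(X1 = c("id", "a", "b"), X2 = c("score", "1", "2"),
                 stringsAsFactors = FALSE)

# Direction 1 (roughly what row_to_names() does): promote row 1 to the names.
promoted <- df[-1, , drop = FALSE]
names(promoted) <- unlist(df[1, ], use.names = FALSE)
rownames(promoted) <- NULL

# Direction 2 (this request): demote the names back into the first data row.
demoted <- rbind(
  as.data.frame(matrix(names(promoted), nrow = 1), stringsAsFactors = FALSE),
  setNames(promoted, paste0("V", seq_along(promoted)))
)
rownames(demoted) <- NULL

print(promoted)
print(demoted)
```

Demoting after promoting recovers the original cell values, which is why the request is described as an approximate inverse: the original column names and column types are not recovered.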

@jzadra
Contributor

jzadra commented Mar 12, 2021

Oops, I did not give my coffee enough time to work before responding :)

@systemnova

Can I second this idea as a valuable convenience addition to janitor? 👍

@sfirke
Owner

sfirke commented Oct 29, 2021

Can you say a little more about the use case? Bill, if you say it's useful I trust that, and thanks @systemnova for chiming in, but I don't follow. If the names were supposed to be data, I would usually specify col_names = FALSE or col_names = c('myvar1', ...) when importing, which puts the header line into the first row. The use case must be different from that, though.
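For reference, the import-time approach looks like this. col_names is the argument name in readr and readxl; the sketch below uses the base-R analogue, header = FALSE in read.csv(), so that it runs without extra packages. The CSV text is a hypothetical example.

```r
# Hypothetical two-column CSV, inlined so the example is self-contained.
csv_text <- "dose,conc\n10,1.2\n20,2.4"

# Default import: the first line becomes the column names.
with_header <- read.csv(text = csv_text)

# header = FALSE keeps the first line as data; columns are auto-named V1, V2.
# colClasses = "character" makes the all-character result explicit.
no_header <- read.csv(text = csv_text, header = FALSE, colClasses = "character")

print(names(with_header))
print(no_header)
```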

Sorry for my inconsistent attention to new feature requests!

@systemnova

Firstly, thanks for creating such a wonderfully useful package!

I think it would just feel more convenient & complete. When I first started looking for convenient pipe-able ways to manipulate names, I kind of felt that the existing function names in this package and others were a little confusing & ended up mapping them to my own function names.

In my head the logical set would be:

  • RowName_To_Column
  • Column_To_RowName
  • ColumnName_To_Row
  • Row_To_ColumnName

Or:

  • PromoteHeader
  • DemoteHeader
  • PromoteRowname
  • DemoteRowname

Although I say this after spending way too much time in SPSS & PowerBI. So it might just be me that feels this way 😉

@billdenney
Collaborator Author

tl;dr: To me, the value of the function is borderline, but it can have use.

The correct way to get the data out is usually to use col_names = FALSE or something similar to that depending on how you're loading the data.

The times that I find it useful are when I get ugly data with several layers of column names. In a recent example, the first row had animal sex, the second row had animal number, and then subsequent rows had the data I wanted (drug concentrations over time). I've started using the unpivotr library for some of these scenarios, but I do think that there is a use case for this function. Maybe it should have a warning like "Consider loading the data without column names". :)
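A layered header like the one described can be handled once every row is available as data. The following is a minimal base-R sketch, not the unpivotr approach mentioned above; the file layout, values, and column names (time, sex, animal, conc) are all hypothetical.

```r
# Hypothetical layout: row 1 = animal sex, row 2 = animal number,
# remaining rows = concentrations by time point.
csv_text <- paste(
  "time,Male,Male,Female",
  "time,101,102,201",
  "0,0.0,0.1,0.0",
  "1,5.2,4.8,6.1",
  sep = "\n"
)

# Read everything as data so that both header layers stay available.
raw <- read.csv(text = csv_text, header = FALSE, colClasses = "character")

sex    <- unlist(raw[1, -1], use.names = FALSE)
animal <- unlist(raw[2, -1], use.names = FALSE)
body   <- raw[-(1:2), ]

# Build one long table: one row per time point per animal.
long <- do.call(rbind, lapply(seq_along(animal), function(i) {
  data.frame(
    time   = as.numeric(body[[1]]),
    sex    = sex[i],
    animal = animal[i],
    conc   = as.numeric(body[[i + 1]]),
    stringsAsFactors = FALSE
  )
}))
rownames(long) <- NULL

print(long)
```

If the file has already been imported with the first layer as column names, demoting those names to a row gets back to the same starting point as `raw` here.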

@systemnova, regarding the suggested function names, I agree that the current ones don't necessarily please everyone. rownames_to_column() is in the tibble library, and since janitor is generally tidyverse-oriented, I'd prefer not to replicate the function. For column_to_rowname(), tibbles don't support row names, so that wouldn't be supported in general. As for the last two, which are part of janitor, names within R refers to column names for a data.frame or tibble, so while the word doesn't say rows or columns explicitly, it should generally be clear within the idiom of the R ecosystem.

4 participants