NA handling in unite #203

voxnonecho · 2016-06-13T12:07:24Z

Consider the following df:

ID   d1   d2   
1    G    G
2    A    G
3    A    A
4    G    A
5    NA   NA
6    G    G

When uniting d1 and d2:

tidyr::unite(df, new, d1, d2, remove = FALSE, sep = "")

Row 5 gives NANA instead of the expected NA

  ID  new   d1   d2
1  1   GG    G    G
2  2   AG    A    G
3  3   AA    A    A
4  4   GA    G    A
5  5 NANA <NA> <NA>
6  6   GG    G    G

voxnonecho · 2016-06-13T13:36:52Z

Well, I think unite() should work like paste() but could maybe provide an additional argument to handle NAs, à la na.rm = TRUE
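
For reference, the paste-like behaviour mentioned here can be seen in base R alone. The snippet below is only an illustrative sketch; the vectors `d1` and `d2` are made up to mirror a few rows of the example data frame above.

```r
# Base R paste0() coerces NA to the string "NA" before concatenating,
# which is where the "NANA" in the united column comes from.
d1 <- c("G", "A", NA)
d2 <- c("G", "G", NA)

pasted <- paste0(d1, d2)
print(pasted)
```

The last element is the literal string "NANA" rather than a missing value, so `is.na()` on it is FALSE.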

danrlu · 2016-09-08T22:24:13Z

I think in some cases the omit NA option could be useful. My df has many columns that contain mostly NA, as a result of multiple rounds of joins.

recipe  potato  tomato  cucumber    rock
A       potato  NA      cucumber    NA
B       NA      NA      NA          rock
C       NA      tomato  NA          NA
...

So I was trying to combine the columns into one and remove the NA to see things better.

recipe  ingredients
A       potato,cucumber
B       rock
C       tomato
...

The solution is not hard, just not quite as tidy.
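
The comment does not show the workaround it has in mind. One possible base R version, offered only as a sketch under the assumption that the ingredient columns are all character, pastes the non-NA values of each row together. The data frame below is reconstructed from the three rows shown above.

```r
# Rebuild the three example rows (the "..." rows are not known, so they are omitted).
df <- data.frame(
  recipe   = c("A", "B", "C"),
  potato   = c("potato", NA, NA),
  tomato   = c(NA, NA, "tomato"),
  cucumber = c("cucumber", NA, NA),
  rock     = c(NA, "rock", NA),
  stringsAsFactors = FALSE
)

ingredient_cols <- c("potato", "tomato", "cucumber", "rock")

# For each row, keep only the non-NA entries and join them with commas.
ingredients <- apply(df[ingredient_cols], 1, function(row) {
  paste(row[!is.na(row)], collapse = ",")
})

result <- data.frame(recipe = df$recipe, ingredients = unname(ingredients),
                     stringsAsFactors = FALSE)
print(result)
```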

jennybc · 2016-10-27T00:37:44Z

This is not the requested solution, but a clean way to get the desired result is:

library(tidyverse)
df <- tribble(
  ~ID, ~d1, ~d2,   
    1, "G", "G",
    2, "A", "G",
    3, "A", "A",
    4, "G", "A",
    5,  NA,  NA,
    6, "G", "G")
df %>% 
  replace_na(list(d1 = "", d2 = "")) %>% 
  unite(new, d1, d2, remove = FALSE, sep = "")
#> # A tibble: 6 × 4
#>      ID   new    d1    d2
#> * <dbl> <chr> <chr> <chr>
#> 1     1    GG     G     G
#> 2     2    AG     A     G
#> 3     3    AA     A     A
#> 4     4    GA     G     A
#> 5     5                  
#> 6     6    GG     G     G

alistaire47 · 2017-09-11T14:33:20Z

I'm not convinced that unite should work like paste, as it's a very rare situation when a user would actually want to turn NA values into strings. More concerningly, in terms of API consistency separate will introduce NAs in a way that unite can't reverse:

library(tidyr)

example <- tibble::data_frame(x = c('foo', 'foo bar', 'foo bar baz'))

example %>% separate(x, c('foo', 'bar', 'baz'), fill = 'right')    # without `fill = 'right'` same result with a message 
#> # A tibble: 3 x 3
#>     foo   bar   baz
#> * <chr> <chr> <chr>
#> 1   foo  <NA>  <NA>
#> 2   foo   bar  <NA>
#> 3   foo   bar   baz

example %>% 
    separate(x, c('foo', 'bar', 'baz'), fill = 'right') %>% 
    unite(x, foo:baz, sep = ' ')
#> # A tibble: 3 x 1
#>             x
#> *       <chr>
#> 1   foo NA NA
#> 2  foo bar NA
#> 3 foo bar baz

If NAs are in the middle of columns that get united and then separated then paste-like behavior would allow the NA location to be saved (at the cost of requiring them to be converted from strings to actual NA again), but most of the time the NA handling keeps the functions from being inverses. Making na.rm = TRUE the default would be a breaking change, but probably not one that would break much code.

hadley · 2017-11-16T21:00:30Z

There are actually two feature requests in this thread:

1. Make NAs infectious so that if any input is NA, then the output is NA.
2. Provide an easy way to drop NAs.

Option 2 seems like the more useful option, so I will implement that.
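
To make the difference between the two requests concrete, here is a rough base R sketch. The helper names `unite_infectious()` and `unite_drop_na()` are hypothetical and written only for illustration; they are not part of tidyr and do not reflect how tidyr implements either behaviour.

```r
x <- c("a", "a", NA, NA)
y <- c("b", NA, "b", NA)

# Request 1 (hypothetical helper): any NA input makes the whole output NA.
unite_infectious <- function(..., sep = "_") {
  cols <- list(...)
  out <- do.call(paste, c(cols, sep = sep))
  any_na <- Reduce(`|`, lapply(cols, is.na))
  out[any_na] <- NA_character_
  out
}

# Request 2 (hypothetical helper): drop NA inputs before joining.
unite_drop_na <- function(..., sep = "_") {
  cols <- cbind(...)
  unname(apply(cols, 1, function(row) paste(row[!is.na(row)], collapse = sep)))
}

infectious <- unite_infectious(x, y)
dropped <- unite_drop_na(x, y)
print(infectious)
print(dropped)
```

Note that under the second approach a row whose inputs are all NA collapses to an empty string, not to NA.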

@alexpghayes the plan is to extract a general helper for turning the vectorised functions that power many tidyr functions into a tibblicious version.

hadley · 2019-03-07T21:58:27Z

Minimal reprex

library(tidyr)
df <- expand_grid(x = c("a", NA), y = c("b", NA))
unite(df, z, c("x", "y"), remove = FALSE)
#> # A tibble: 4 x 3
#>   z     x     y    
#>   <chr> <chr> <chr>
#> 1 a_b   a     b    
#> 2 a_NA  a     <NA> 
#> 3 NA_b  <NA>  b    
#> 4 NA_NA <NA>  <NA>

Created on 2019-03-07 by the reprex package (v0.2.1.9000)

hadley · 2019-03-07T22:16:41Z

Note that you'll need na.rm = TRUE (I left the default as is to preserve backward compatibility, since it seems like many people have probably worked around the previous behaviour in various ways).

library(tidyr)
df <- expand_grid(x = c("a", NA), y = c("b", NA))
df %>% unite("z", x:y, na.rm = TRUE, remove = FALSE)
#> # A tibble: 4 x 3
#>   z     x     y    
#>   <chr> <chr> <chr>
#> 1 a_b   a     b    
#> 2 a     a     <NA> 
#> 3 b     <NA>  b    
#> 4 ""    <NA>  <NA>

Created on 2019-03-07 by the reprex package (v0.2.1.9000)

kasperav · 2019-03-28T11:50:57Z

Hi @hadley ,

I am having trouble getting na.rm = TRUE to work within the unite() function.

I tried the following:

1. Update R from 3.5.1 to 3.5.3
2. Delete the old tidyverse and tidyr packages
3. Install a fresh tidyverse package
4. Run the following code:

> library("tidyr")
> df <- expand.grid(x = c("a", NA), y = c("b", NA))
> df
     x    y
1    a    b
2 <NA>    b
3    a <NA>
4 <NA> <NA>
> df %>% unite("z", x:y, na.rm = TRUE, remove = FALSE)
Error: `TRUE` must evaluate to column positions or names, not a logical vector
Call `rlang::last_error()` to see a backtrace

Which gives me this error:

Error: `TRUE` must evaluate to column positions or names, not a logical vector
Call `rlang::last_error()` to see a backtrace

Backtracing error:

> rlang::last_error()
<error>
message: `TRUE` must evaluate to column positions or names, not a logical vector
class:   `rlang_error`
backtrace:
  1. tidyr::unite(., "z", x:y, na.rm = TRUE, remove = FALSE)
 10. tidyselect::vars_select(colnames(data), ...)
 11. tidyselect:::bad_calls(bad, "must evaluate to { singular(.vars) } positions or names, \\\n       not { first_type }")
 12. tidyselect:::glubort(fmt_calls(calls), ..., .envir = .envir)
 13. tidyr::unite(., "z", x:y, na.rm = TRUE, remove = FALSE)
Call `rlang::last_trace()` to see the full backtrace

> rlang::last_trace()
     x
  1. \-df %>% unite("z", x:y, na.rm = TRUE, remove = FALSE)
  2.   +-base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
  3.   \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
  4.     \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
  5.       \-global::`_fseq`(`_lhs`)
  6.         \-magrittr::freduce(value, `_function_list`)
  7.           +-base::withVisible(function_list[[k]](value))
  8.           \-function_list[[k]](value)
  9.             +-tidyr::unite(., "z", x:y, na.rm = TRUE, remove = FALSE)
 10.             \-tidyr:::unite.data.frame(., "z", x:y, na.rm = TRUE, remove = FALSE)
 11.               \-tidyselect::vars_select(colnames(data), ...)
 12.                 \-tidyselect:::bad_calls(bad, "must evaluate to { singular(.vars) } positions or names, \\\n       not { first_type }")
 13.                   \-tidyselect:::glubort(fmt_calls(calls), ..., .envir = .envir)

hadley · 2019-03-28T13:14:19Z

@kasperav you probably have not installed the development version of tidyr.

kasperav · 2019-03-28T15:56:48Z

@hadley you are right! I have no luck with installing the dev version, so I'll wait for this to be implemented in a CRAN version of tidyr :)

jameshowison · 2019-12-27T13:50:16Z

FWIW, I found the behavior where unite takes two NA values and produces an empty string to be very confusing and unexpected. Seems clear to me that uniting two NA values should produce an NA value.

I'm guessing this is clearer to people who have used paste a lot :) Simple to fix up with a na_if("") (but one has to hope that the empty string wasn't a meaningful value distinct from NA_character_ in the original columns!)
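
A sketch of that fix-up, assuming a tidyr version that supports `na.rm` in `unite()` and that dplyr is installed; the data is the same `expand_grid()` example used earlier in the thread.

```r
library(tidyr)
library(dplyr)

df <- expand_grid(x = c("a", NA), y = c("b", NA))

res <- df %>%
  unite("z", x:y, na.rm = TRUE, remove = FALSE) %>%
  # Turn the empty string produced by an all-NA row back into NA.
  mutate(z = na_if(z, ""))

print(res)
```

As the comment warns, this also converts any empty string that was a genuine value in the inputs, so it is only safe when "" carries no meaning of its own.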

lindsayplatt · 2020-02-25T15:43:08Z

I have a use case where I need to use na.rm = TRUE and unite for 8 columns. One of the columns is all NA. Using na.rm = T with unite seems to have different behavior when one of the columns is all NA. Is that expected behavior? Should I just ignore columns that are all NA before using unite?

library(tidyr)
df_notwork <- expand_grid(x = c("a", NA), y = c(NA, NA))
df_notwork %>% unite("z", x:y, na.rm = TRUE, remove = FALSE)

# A tibble: 4 x 3
  z     x     y    
  <chr> <chr> <lgl>
1 a_NA  a     NA   
2 a_NA  a     NA   
3 NA    NA    NA   
4 NA    NA    NA

jzadra · 2020-02-25T18:38:49Z

What version are you using? That's not the result I get (on 1.0.2.9000)

suppressPackageStartupMessages(require(tidyverse))
df_notwork <- expand_grid(x = c("a", NA), y = c(NA, NA))
df_notwork %>% unite("z", x:y, na.rm = T, remove = FALSE)
#> # A tibble: 4 x 3
#>   z     x     y    
#>   <chr> <chr> <lgl>
#> 1 "a"   a     NA   
#> 2 "a"   a     NA   
#> 3 ""    <NA>  NA   
#> 4 ""    <NA>  NA

Created on 2020-02-25 by the reprex package (v0.3.0)

lindsayplatt · 2020-02-25T19:03:38Z

I am using a newer version.

packageVersion("tidyverse")
[1] ‘1.3.0’

jzadra · 2020-02-25T19:05:16Z

tidyverse is different from tidyr; it is a collection of other packages put together for easy loading. So it will have a different version than all the packages within it. Check your tidyr version.

lindsayplatt · 2020-02-25T19:14:01Z

Oh, sorry I saw that you were loading tidyverse so I assumed that was the version you were referring to. I always assumed that updating tidyverse would update the packages within it so I normally just update that one. I guess that is an inappropriate assumption!

Even with updating tidyr using the GitHub version, I still have that issue. Maybe it is another out-of-date package?

packageVersion("tidyr")
[1] ‘1.0.2.9000’
> library(tidyr)
> df_notwork <- expand_grid(x = c("a", NA), y = c(NA, NA))
> df_notwork %>% unite("z", x:y, na.rm = TRUE, remove = FALSE)
# A tibble: 4 x 3
  z     x     y    
  <chr> <chr> <lgl>
1 a_NA  a     NA   
2 a_NA  a     NA   
3 NA    NA    NA   
4 NA    NA    NA

jzadra · 2020-02-25T19:18:27Z

Interesting. I'm not sure why we are getting different results.

Regardless, it looks to me as if your NAs aren't being removed despite na.rm = TRUE.

Yes, I would try updating your other packages to see if that solves it. But since both expand_grid and unite are from tidyr, I'm not sure why that would be the case.

lindsayplatt · 2020-02-25T20:44:10Z

It appears that my version of tidyselect was quite out-of-date (<1.0). I updated that and now it is functioning as expected.

packageVersion("tidyr")
[1] ‘1.0.2.9000’

packageVersion("tidyselect")
[1] ‘1.0.0’

library(tidyr)
df_notwork <- expand_grid(x = c("a", NA), y = c(NA, NA))
df_notwork %>% unite("z", x:y, na.rm = TRUE, remove = FALSE)

# A tibble: 4 x 3
  z     x     y    
  <chr> <chr> <lgl>
1 "a"   a     NA   
2 "a"   a     NA   
3 ""    NA    NA   
4 ""    NA    NA

anjaollodart · 2020-04-01T02:57:55Z

Hello,

I've updated to all the latest versions of the packages (tidyr 1.0.2.900, tidyselect 1.0.0) and I'm still getting the same error. I tried Lindsay's df_notwork and get the same result as she had prior to the updates. Any help would be appreciated!

lindsayplatt · 2020-04-02T14:02:10Z

@anjaollodart - perhaps you can try updating additional packages that tidyr depends on. It's just a guess, but the need to separately update tidyselect from tidyr was surprising to me, so maybe there is another package dependency that has the same issue.

jvpon · 2020-04-02T14:24:34Z

Dear Lindsay, I solved this issue on my own when I used my own data frame (not one that was in the example). And as soon as I did it, in 10-15 minutes, I deleted the comment because the issue was not about this function. It is strange that GitHub still put this comment through. Thank you, Julia

voxnonecho changed the title ~~Strange way of handling NA in unite~~ NA handling in unite Jun 13, 2016

hadley added the feature a feature request or enhancement label Jun 23, 2017

1danjordan mentioned this issue Aug 24, 2017

A paste with NA handling r-lib/vctrs#39

Closed

hadley added the strings 🎻 label Nov 16, 2017

japhir mentioned this issue Nov 8, 2018

allow easy recoding of data columns isoverse/isoreader#11

Closed

hadley closed this as completed in 58df41d Mar 7, 2019
