When splitting a delimited character variable using the newer `separate_wider_delim()` function from the tidyr package (v 1.3.0), if you:
- specify the `names_sep=` argument,
- do NOT specify the `names=` argument, and
- specify `cols_remove=FALSE`,

then the original variable is retained in the output data set (as expected) but:
1. the original variable name has been duplicated using the value specified in the `names_sep=` argument such that, e.g., `names_sep='_'` with `cols=varname` produces a variable named `varname_varname` in the output data, and
2. the original variable is located after the new separated columns, which is different from how the older `separate()` function behaves (placing the original column before the new columns).

Note that the first point above (variable renaming) is the major issue. The second point is just something that I was not expecting.
library(dplyr, warn.conflicts=FALSE)
library(tidyr)
library(reprex)
# Create test data set
## 1 character variable (`v`):
## * semicolon-delimited values,
## * includes NA,
## * inconsistent/unpredictable number of delimiters per value
test <- tibble(
v= c('a;b', 'c', NA, 'd;e;f', 'g;h')
)
# specifying `names` (not `names_sep`)
# `cols_remove` is FALSE => behaves as expected (original column name unchanged)
separate_wider_delim(
data=test,
cols=v,
delim=';',
names= c('v_1', 'v_2', 'v_3'),
too_few='align_start',
cols_remove=FALSE
)
#> # A tibble: 5 × 4
#>   v_1   v_2   v_3   v
#>   <chr> <chr> <chr> <chr>
#> 1 a     b     <NA>  a;b
#> 2 c     <NA>  <NA>  c
#> 3 <NA>  <NA>  <NA>  <NA>
#> 4 d     e     f     d;e;f
#> 5 g     h     <NA>  g;h

# specifying `names_sep` only
# `cols_remove` is TRUE (default) => behaves as expected
separate_wider_delim(
data=test,
cols=v,
delim=';',
names_sep='_',
too_few='align_start',
cols_remove=TRUE
)
#> # A tibble: 5 × 3
#>   v_1   v_2   v_3
#>   <chr> <chr> <chr>
#> 1 a     b     <NA>
#> 2 c     <NA>  <NA>
#> 3 <NA>  <NA>  <NA>
#> 4 d     e     f
#> 5 g     h     <NA>

# specifying `names_sep` only
# `cols_remove` is FALSE => **unexpected renaming of original variable**
separate_wider_delim(
data=test,
cols=v,
delim=';',
names_sep='_',
too_few='align_start',
cols_remove=FALSE
)
#> # A tibble: 5 × 4
#>   v_1   v_2   v_3   v_v
#>   <chr> <chr> <chr> <chr>
#> 1 a     b     <NA>  a;b
#> 2 c     <NA>  <NA>  c
#> 3 <NA>  <NA>  <NA>  <NA>
#> 4 d     e     f     d;e;f
#> 5 g     h     <NA>  g;h

## Expected output from previous code chunk:
## * note original column name unchanged
# # A tibble: 5 × 4
#   v_1   v_2   v_3   v
#   <chr> <chr> <chr> <chr>
# 1 a     b     <NA>  a;b
# 2 c     <NA>  <NA>  c
# 3 <NA>  <NA>  <NA>  <NA>
# 4 d     e     f     d;e;f
# 5 g     h     <NA>  g;h

# old behavior (with `separate()`)
# * original variable located before new `separate()`d columns
separate(
data=test,
col=v,
into= c('v_1', 'v_2', 'v_3'),
sep=';',
remove=FALSE,
fill='right'
)
#> # A tibble: 5 × 4
#>   v     v_1   v_2   v_3
#>   <chr> <chr> <chr> <chr>
#> 1 a;b   a     b     <NA>
#> 2 c     c     <NA>  <NA>
#> 3 <NA>  <NA>  <NA>  <NA>
#> 4 d;e;f d     e     f
#> 5 g;h   g     h     <NA>
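A possible workaround, offered only as a sketch and not as the intended fix: copy the original column under a temporary name before splitting, leave `cols_remove` at its default of TRUE, and rename the copy back afterwards. The name `v_orig` is a hypothetical placeholder and is not part of tidyr. This assumes `mutate()` supports the `.before` argument (dplyr 1.0.0 or later) and that no column called `v_orig` already exists in the data.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# Same test data as in the reprex
test <- tibble(
  v = c('a;b', 'c', NA, 'd;e;f', 'g;h')
)

result <- test %>%
  # `v_orig` is a hypothetical temporary name for a copy of the original column,
  # placed before `v` so that it ends up ahead of the new columns
  mutate(v_orig = v, .before = v) %>%
  # `cols_remove` is left at its default (TRUE), so `v` itself is dropped
  # and replaced by the separated columns
  separate_wider_delim(
    cols = v,
    delim = ';',
    names_sep = '_',
    too_few = 'align_start'
  ) %>%
  # restore the original column name
  rename(v = v_orig)

names(result)
```

With this approach the original column keeps its name and sits before the new columns, which mirrors the column order that `separate()` produces with `remove=FALSE`.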
tszberkowitz changed the title from "separate_wider_delim renames original column when col_remove=TRUE and names= not specified" to "separate_wider_delim renames original column when col_remove=FALSE and names= not specified" on May 16, 2023
Created on 2023-05-14 with reprex v2.0.2