Rbind in 1.14.3 doesn't like POSIX #5309

Closed
dcaseykc opened this issue Jan 12, 2022 · 9 comments · Fixed by #5857 · May be fixed by #5446

@dcaseykc

Looks like rbind (and by extension merge(all = T)) doesn't like POSIXct data in 1.14.3. To the extent this is a legit bug, I think it's been introduced relatively recently (code running on an older build of 1.14.3 seemed to work just fine).

Sys.setenv(TZ = 'America/Los_Angeles')
library('data.table')
#> Warning: package 'data.table' was built under R version 4.1.2
t2 = data.table(a = NA_real_, b = NA_real_, c = NA_real_, d= as.Date(NA))
t1 = data.table(a = 1.1, b = 1.1, c = 1.1, d = as.Date('2021-10-05'), e = as.POSIXct("2021-10-06 13:58:00 UTC"))
r2 = rbind(t1,t2, fill = T, use.names = F)
#> Error in rbindlist(l, use.names, fill, idcol): Class attribute on column 5 of item 2 does not match with column 5 of item 1.

sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows Server x64 (build 19041)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.1252 
#> [2] LC_CTYPE=English_United States.1252   
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] data.table_1.14.3
#> 
#> loaded via a namespace (and not attached):
#>  [1] ps_1.6.0        digest_0.6.27   withr_2.4.2     magrittr_2.0.1 
#>  [5] reprex_2.0.1    evaluate_0.14   highr_0.9       stringi_1.6.2  
#>  [9] rlang_0.4.11    cli_2.5.0       rstudioapi_0.13 fs_1.5.0       
#> [13] rmarkdown_2.11  tools_4.1.0     stringr_1.4.0   glue_1.4.2     
#> [17] xfun_0.29       yaml_2.2.1      fastmap_1.1.0   compiler_4.1.0 
#> [21] htmltools_0.5.2 knitr_1.33

Under 1.14.2:

Sys.setenv(TZ = 'America/Los_Angeles')
library('data.table')
#> Warning: package 'data.table' was built under R version 4.1.2
t2 = data.table(a = NA_real_, b = NA_real_, c = NA_real_, d= as.Date(NA))
t1 = data.table(a = 1.1, b = 1.1, c = 1.1, d = as.Date('2021-10-05'), e = as.POSIXct("2021-10-06 13:58:00 UTC"))
r2 = rbind(t1,t2, fill = T, use.names = F)
#> Warning in rbindlist(l, use.names, fill, idcol): use.names= cannot be FALSE when
#> fill is TRUE. Setting use.names=TRUE.

sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows Server x64 (build 19041)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.1252 
#> [2] LC_CTYPE=English_United States.1252   
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] data.table_1.14.2
#> 
#> loaded via a namespace (and not attached):
#>  [1] ps_1.6.0        digest_0.6.27   withr_2.4.2     magrittr_2.0.1 
#>  [5] reprex_2.0.1    evaluate_0.14   highr_0.9       stringi_1.6.2  
#>  [9] rlang_0.4.11    cli_2.5.0       rstudioapi_0.13 fs_1.5.0       
#> [13] rmarkdown_2.11  tools_4.1.0     stringr_1.4.0   glue_1.4.2     
#> [17] xfun_0.29       yaml_2.2.1      fastmap_1.1.0   compiler_4.1.0 
#> [21] htmltools_0.5.2 knitr_1.33
@dcaseykc
Author

Some git bisecting suggests that commit 4922384 is where this was introduced.

@jangorecki jangorecki added the dev label Jan 18, 2022
@jangorecki jangorecki added this to the 1.14.3 milestone Jan 18, 2022
@fox34

fox34 commented Jan 28, 2022

Combining NAs and POSIXct does not work with version 1.14.2 either. So this is related to missing columns defaulting to NA with fill=TRUE (in your example, column e only exists in t1):

rbindlist(list(data.table(a=NA), data.table(a=as.POSIXct("2021-01-01"))))
#> Error in rbindlist(list(data.table(a = NA), data.table(a = as.POSIXct("2021-01-01")))) : 
#>   Class attribute on column 1 of item 2 does not match with column 1 of item 1.

I stumbled across this issue, too, and my workaround was to temporarily convert POSIXct to double and then convert back as needed with as.POSIXct(..., origin="1970-01-01", tz="...").
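
A minimal sketch of that workaround, assuming the UTC time zone and the two one-column tables from the example above (the names x, y, y_num and res are only illustrative):

```r
library(data.table)

x <- data.table(a = NA)                                    # logical NA column
y <- data.table(a = as.POSIXct("2021-01-01", tz = "UTC"))  # POSIXct column

# Drop the POSIXct class by storing plain seconds since the epoch (double).
# copy() keeps the original y untouched.
y_num <- copy(y)[, a := as.numeric(a)]

# Logical NA and double are both plain atomic types, so rbindlist can
# bump the logical column to double without a class conflict.
res <- rbindlist(list(x, y_num))

# Restore the POSIXct class once the tables are combined.
res[, a := as.POSIXct(a, origin = "1970-01-01", tz = "UTC")]
```

The time zone has to be supplied again on the way back because the plain double no longer carries the tzone attribute.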

sessionInfo()
#> R version 4.1.2 (2021-11-01)
#> Platform: aarch64-apple-darwin21.1.0 (64-bit)
#> Running under: macOS Monterey 12.2
#> 
#> Matrix products: default
#> BLAS:   /opt/homebrew/Cellar/openblas/0.3.19/lib/libopenblasp-r0.3.19.dylib
#> LAPACK: /opt/homebrew/Cellar/r/4.1.2/lib/R/lib/libRlapack.dylib
#> 
#> locale:
#> [1] de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/de_DE.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] data.table_1.14.2
#> 
#> loaded via a namespace (and not attached):
#> [1] compiler_4.1.2

@adrian-quintario

Can confirm this happens with both base::Date and zoo::Date too:

> library(data.table)
data.table 1.14.3 IN DEVELOPMENT built 2022-07-20 18:26:12 UTC; root using 1 threads (see ?getDTthreads).  Latest news: r-datatable.com
**********
This development version of data.table was built more than 4 weeks ago. Please update: data.table::update_dev_pkg()
**********
**********
This installation of data.table has not detected OpenMP support. It should still work but in single-threaded mode.
This is a Mac. Please read https://mac.r-project.org/openmp/. Please engage with Apple and ask them for support. Check r-datatable.com for updates, and our Mac instructions here: https://github.com/Rdatatable/data.table/wiki/Installation. After several years of many reports of installation problems on Mac, it's time to gingerly point out that there have been no similar problems on Windows or Linux.
**********
> dt1 <- data.table(a = 1:2, b = base::as.Date("2022-03-09"))
> dt1_zoo <- data.table(a = 1:2, b = zoo::as.Date("2022-03-09"))
> dt1_ok <- data.table(a = 1:2, b = letters[1:2])
> dt2 <- data.table(a = 6:7)
> rbind(dt1_ok, dt2, use.names=FALSE, fill=TRUE) # ok
       a      b
   <int> <char>
1:     1      a
2:     2      b
3:     6   <NA>
4:     7   <NA>
> rbind(dt1, dt2, use.names=T, fill=TRUE) # ok
       a          b
   <int>     <Date>
1:     1 2022-03-09
2:     2 2022-03-09
3:     6       <NA>
4:     7       <NA>
> rbind(dt1, dt2, use.names=FALSE, fill=TRUE) # error
Error in rbindlist(l, use.names, fill, idcol) : 
  Class attribute on column 2 of item 2 does not match with column 2 of item 1.
> rbind(dt1_zoo, dt2, use.names=FALSE, fill=TRUE) # error
Error in rbindlist(l, use.names, fill, idcol) : 
  Class attribute on column 2 of item 2 does not match with column 2 of item 1.
> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.3.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] es_ES.UTF-8/es_ES.UTF-8/es_ES.UTF-8/C/es_ES.UTF-8/es_ES.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.14.3

loaded via a namespace (and not attached):
[1] zoo_1.8-10      compiler_4.2.0  tools_4.2.0     grid_4.2.0      lattice_0.20-45
> 

@berg-michael

I've run into this issue with IDates. I note that while rbind fails on both 1.14.2 and 1.14.3 with the toy data below, the merge is actually successful on 1.14.2 but not on 1.14.3. I haven't exhaustively looked into this, but reverting the change made to rbindlist.c in #5263 at least makes the merge work. I think this is probably the same issue as #5391.

1.14.3 (neither merge nor rbind work with IDate)

library(data.table)

item1 <- data.table(col1 = c(1,2,3,4), 
                    col2_x = c(as.IDate("2016-01-01"), 
                               as.IDate("2016-01-02"), 
                               as.IDate("2016-01-03"), 
                               as.IDate("2016-01-04")),
                    col2_y = c(NA, NA, NA, NA))

item2 <- data.table(col1 = c(5,6,7,8),
                    col2_x = c(NA, NA, NA, NA),
                    col2_y = c("p", "q", "r", "s"))


item3 <- data.table(col1 = c(1,2,3,4), 
                    col2_x = c("2016-01-01", 
                               "2016-01-02", 
                               "2016-01-03", 
                               "2016-01-04"),
                    col2_y = c(NA, NA, NA, NA))


rbind(item1, item2)
#> Error in rbindlist(l, use.names, fill, idcol): Class attribute on column 2 of item 2 does not match with column 2 of item 1.
rbind(item3, item2)
#>     col1     col2_x col2_y
#>    <num>     <char> <char>
#> 1:     1 2016-01-01   <NA>
#> 2:     2 2016-01-02   <NA>
#> 3:     3 2016-01-03   <NA>
#> 4:     4 2016-01-04   <NA>
#> 5:     5       <NA>      p
#> 6:     6       <NA>      q
#> 7:     7       <NA>      r
#> 8:     8       <NA>      s

item1_merge <- data.table(col1 = c(1,2,3,4), 
                          col2 = c(as.IDate("2016-01-01"), 
                                   as.IDate("2016-01-02"), 
                                   as.IDate("2016-01-03"), 
                                   as.IDate("2016-01-04")))

item2_merge <- data.table(col1 = c(5,6,7,8),
                          col2 = c(NA, NA, NA, NA))

merge(x = item1_merge,
      y = item2_merge,
      by = "col1",
      all = T)
#> Error in rbindlist(l, use.names, fill, idcol): Class attribute on column 3 of item 2 does not match with column 3 of item 1.

sessionInfo()
#> R version 4.1.3 (2022-03-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.1252 
#> [2] LC_CTYPE=English_United States.1252   
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] data.table_1.14.3
#> 
#> loaded via a namespace (and not attached):
#>  [1] rstudioapi_0.13   knitr_1.38        magrittr_2.0.2    R.cache_0.15.0   
#>  [5] rlang_1.0.2       fastmap_1.1.0     fansi_1.0.3       stringr_1.4.0    
#>  [9] styler_1.7.0      highr_0.9         tools_4.1.3       xfun_0.30        
#> [13] R.oo_1.24.0       utf8_1.2.2        cli_3.2.0         withr_2.5.0      
#> [17] htmltools_0.5.2   ellipsis_0.3.2    yaml_2.3.5        digest_0.6.29    
#> [21] tibble_3.1.6      lifecycle_1.0.1   crayon_1.5.1      purrr_0.3.4      
#> [25] R.utils_2.11.0    vctrs_0.3.8       fs_1.5.2          glue_1.6.2       
#> [29] evaluate_0.15     rmarkdown_2.13    reprex_2.0.1      stringi_1.7.6    
#> [33] compiler_4.1.3    pillar_1.7.0      R.methodsS3_1.8.1 pkgconfig_2.0.3

Created on 2022-08-22 by the reprex package (v2.0.1)

1.14.2 (merge works, rbind doesn't)

library(data.table)

item1 <- data.table(col1 = c(1,2,3,4), 
                    col2_x = c(as.IDate("2016-01-01"), 
                               as.IDate("2016-01-02"), 
                               as.IDate("2016-01-03"), 
                               as.IDate("2016-01-04")),
                    col2_y = c(NA, NA, NA, NA))

item2 <- data.table(col1 = c(5,6,7,8),
                    col2_x = c(NA, NA, NA, NA),
                    col2_y = c("p", "q", "r", "s"))


item3 <- data.table(col1 = c(1,2,3,4), 
                    col2_x = c("2016-01-01", 
                               "2016-01-02", 
                               "2016-01-03", 
                               "2016-01-04"),
                    col2_y = c(NA, NA, NA, NA))


rbind(item1, item2)
#> Error in rbindlist(l, use.names, fill, idcol): Class attribute on column 2 of item 2 does not match with column 2 of item 1.
rbind(item3, item2)
#>    col1     col2_x col2_y
#> 1:    1 2016-01-01   <NA>
#> 2:    2 2016-01-02   <NA>
#> 3:    3 2016-01-03   <NA>
#> 4:    4 2016-01-04   <NA>
#> 5:    5       <NA>      p
#> 6:    6       <NA>      q
#> 7:    7       <NA>      r
#> 8:    8       <NA>      s

item1_merge <- data.table(col1 = c(1,2,3,4), 
                          col2 = c(as.IDate("2016-01-01"), 
                                   as.IDate("2016-01-02"), 
                                   as.IDate("2016-01-03"), 
                                   as.IDate("2016-01-04")))

item2_merge <- data.table(col1 = c(5,6,7,8),
                          col2 = c(NA, NA, NA, NA))

merge(x = item1_merge,
      y = item2_merge,
      by = "col1",
      all = T)
#>    col1     col2.x col2.y
#> 1:    1 2016-01-01     NA
#> 2:    2 2016-01-02     NA
#> 3:    3 2016-01-03     NA
#> 4:    4 2016-01-04     NA
#> 5:    5       <NA>     NA
#> 6:    6       <NA>     NA
#> 7:    7       <NA>     NA
#> 8:    8       <NA>     NA

sessionInfo()
#> R version 4.1.3 (2022-03-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.1252 
#> [2] LC_CTYPE=English_United States.1252   
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] data.table_1.14.2
#> 
#> loaded via a namespace (and not attached):
#>  [1] rstudioapi_0.13   knitr_1.38        magrittr_2.0.2    R.cache_0.15.0   
#>  [5] rlang_1.0.2       fastmap_1.1.0     fansi_1.0.3       stringr_1.4.0    
#>  [9] styler_1.7.0      highr_0.9         tools_4.1.3       xfun_0.30        
#> [13] R.oo_1.24.0       utf8_1.2.2        cli_3.2.0         withr_2.5.0      
#> [17] htmltools_0.5.2   ellipsis_0.3.2    yaml_2.3.5        digest_0.6.29    
#> [21] tibble_3.1.6      lifecycle_1.0.1   crayon_1.5.1      purrr_0.3.4      
#> [25] R.utils_2.11.0    vctrs_0.3.8       fs_1.5.2          glue_1.6.2       
#> [29] evaluate_0.15     rmarkdown_2.13    reprex_2.0.1      stringi_1.7.6    
#> [33] compiler_4.1.3    pillar_1.7.0      R.methodsS3_1.8.1 pkgconfig_2.0.3

Created on 2022-08-22 by the reprex package (v2.0.1)

1.14.3 but with the change to rbindlist.c reverted

library(data.table)

item1 <- data.table(col1 = c(1,2,3,4), 
                    col2_x = c(as.IDate("2016-01-01"), 
                               as.IDate("2016-01-02"), 
                               as.IDate("2016-01-03"), 
                               as.IDate("2016-01-04")),
                    col2_y = c(NA, NA, NA, NA))

item2 <- data.table(col1 = c(5,6,7,8),
                    col2_x = c(NA, NA, NA, NA),
                    col2_y = c("p", "q", "r", "s"))


item3 <- data.table(col1 = c(1,2,3,4), 
                    col2_x = c("2016-01-01", 
                               "2016-01-02", 
                               "2016-01-03", 
                               "2016-01-04"),
                    col2_y = c(NA, NA, NA, NA))


rbind(item1, item2)
#> Error in rbindlist(l, use.names, fill, idcol): Class attribute on column 2 of item 2 does not match with column 2 of item 1.
rbind(item3, item2)
#>     col1     col2_x col2_y
#>    <num>     <char> <char>
#> 1:     1 2016-01-01   <NA>
#> 2:     2 2016-01-02   <NA>
#> 3:     3 2016-01-03   <NA>
#> 4:     4 2016-01-04   <NA>
#> 5:     5       <NA>      p
#> 6:     6       <NA>      q
#> 7:     7       <NA>      r
#> 8:     8       <NA>      s

item1_merge <- data.table(col1 = c(1,2,3,4), 
                    col2 = c(as.IDate("2016-01-01"), 
                               as.IDate("2016-01-02"), 
                               as.IDate("2016-01-03"), 
                               as.IDate("2016-01-04")))

item2_merge <- data.table(col1 = c(5,6,7,8),
                    col2 = c(NA, NA, NA, NA))

merge(x = item1_merge,
      y = item2_merge,
      by = "col1",
      all = T)
#> Warning in rbindlist(l, use.names, fill, idcol): use.names= cannot be FALSE when
#> fill is TRUE. Setting use.names=TRUE.
#> Key: <col1>
#>     col1     col2.x col2.y
#>    <num>     <IDat> <lgcl>
#> 1:     1 2016-01-01     NA
#> 2:     2 2016-01-02     NA
#> 3:     3 2016-01-03     NA
#> 4:     4 2016-01-04     NA
#> 5:     5       <NA>     NA
#> 6:     6       <NA>     NA
#> 7:     7       <NA>     NA
#> 8:     8       <NA>     NA

sessionInfo()
#> R version 4.1.3 (2022-03-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.1252 
#> [2] LC_CTYPE=English_United States.1252   
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] data.table_1.14.3
#> 
#> loaded via a namespace (and not attached):
#>  [1] rstudioapi_0.13   knitr_1.38        magrittr_2.0.2    R.cache_0.15.0   
#>  [5] rlang_1.0.2       fastmap_1.1.0     fansi_1.0.3       stringr_1.4.0    
#>  [9] styler_1.7.0      highr_0.9         tools_4.1.3       xfun_0.30        
#> [13] R.oo_1.24.0       utf8_1.2.2        cli_3.2.0         withr_2.5.0      
#> [17] htmltools_0.5.2   ellipsis_0.3.2    yaml_2.3.5        digest_0.6.29    
#> [21] tibble_3.1.6      lifecycle_1.0.1   crayon_1.5.1      purrr_0.3.4      
#> [25] R.utils_2.11.0    vctrs_0.3.8       fs_1.5.2          glue_1.6.2       
#> [29] evaluate_0.15     rmarkdown_2.13    reprex_2.0.1      stringi_1.7.6    
#> [33] compiler_4.1.3    pillar_1.7.0      R.methodsS3_1.8.1 pkgconfig_2.0.3

Created on 2022-08-22 by the reprex package (v2.0.1)

@jangorecki
Member

Thank you for the extra report. We need to ensure that it goes into the unit tests in order to close this issue.

@berg-michael

BTW, in the current dev release, the order of x and y in the merge command can determine whether this issue is triggered.

library(data.table)

item1_merge <- data.table(col1 = c(1,2,3,4), 
                          col2 = c(as.IDate("2016-01-01"), 
                                   as.IDate("2016-01-02"), 
                                   as.IDate("2016-01-03"), 
                                   as.IDate("2016-01-04")))

item2_merge <- data.table(col1 = c(5,6,7,8),
                          col2 = c(NA, NA, NA, NA))


merge(x = item1_merge,
      y = item2_merge,
      by = "col1",
      all = T)
#> Error in rbindlist(l, use.names, fill, idcol): Class attribute on column 3 of item 2 does not match with column 3 of item 1.

merge(x = item2_merge,
      y = item1_merge,
      by = "col1",
      all = T)
#> Key: <col1>
#>     col1 col2.x     col2.y
#>    <num> <lgcl>     <IDat>
#> 1:     1     NA 2016-01-01
#> 2:     2     NA 2016-01-02
#> 3:     3     NA 2016-01-03
#> 4:     4     NA 2016-01-04
#> 5:     5     NA       <NA>
#> 6:     6     NA       <NA>
#> 7:     7     NA       <NA>
#> 8:     8     NA       <NA>

sessionInfo()
#> R version 4.1.3 (2022-03-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.1252 
#> [2] LC_CTYPE=English_United States.1252   
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] data.table_1.14.3
#> 
#> loaded via a namespace (and not attached):
#>  [1] rstudioapi_0.13   knitr_1.38        magrittr_2.0.2    R.cache_0.15.0   
#>  [5] rlang_1.0.2       fastmap_1.1.0     fansi_1.0.3       stringr_1.4.0    
#>  [9] styler_1.7.0      highr_0.9         tools_4.1.3       xfun_0.30        
#> [13] R.oo_1.24.0       utf8_1.2.2        cli_3.2.0         withr_2.5.0      
#> [17] htmltools_0.5.2   ellipsis_0.3.2    yaml_2.3.5        digest_0.6.29    
#> [21] tibble_3.1.6      lifecycle_1.0.1   crayon_1.5.1      purrr_0.3.4      
#> [25] R.utils_2.11.0    vctrs_0.3.8       fs_1.5.2          glue_1.6.2       
#> [29] evaluate_0.15     rmarkdown_2.13    reprex_2.0.1      stringi_1.7.6    
#> [33] compiler_4.1.3    pillar_1.7.0      R.methodsS3_1.8.1 pkgconfig_2.0.3

Created on 2022-08-23 by the reprex package (v2.0.1)

@ben-schwen
Member

ben-schwen commented Aug 24, 2022

@berg-michael IMO your first example (I shortened it)

x <- data.table(a = 1, b = as.IDate(16801))
y <- data.table(a = 5, b = NA)
rbind(x, y)

should never work because of the different types. Of course we could check whether all values of a column are NA and bump types, but the cost and the loss of understandability do not seem worth it.
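
For reference, a sketch of how the caller can sidestep the mismatch by supplying an NA that already carries the IDate class, so that both columns agree before rbind is called. This is a user-side workaround under that assumption, not a change to rbindlist, and the object names are illustrative:

```r
library(data.table)

x <- data.table(a = 1, b = as.IDate("2016-01-01"))

# A typed NA: same class (IDate) and same storage type (integer) as x$b,
# instead of the bare logical NA used in the failing example.
y <- data.table(a = 5, b = as.IDate(as.Date(NA)))

res <- rbind(x, y)
```

Because the classes now match, no type or class bumping is needed at all.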

@berg-michael

This may be an ignorant question, but doesn't your example work for non-date classes? I can run something like

library(data.table)
x <- data.table(a = 1, b = "2016-01-01")
y <- data.table(a = 5, b = NA_integer_)
str(rbind(x, y))
#> Classes 'data.table' and 'data.frame':   2 obs. of  2 variables:
#>  $ a: num  1 5
#>  $ b: chr  "2016-01-01" NA
#>  - attr(*, ".internal.selfref")=<externalptr>

It seems like rbind will coerce classes/types in most but not all situations, and merge with all = T can rely on that behavior when there are non-matching rows between two datasets. I don't fully understand, though, why the call you gave fails in both 1.14.2 and 1.14.3, while merge works fine in these cases in 1.14.2 and sometimes breaks in 1.14.3. I believe it is related to allowing use.names = F when fill = T, since if I force use.names to T in 1.14.3 the merge works fine.
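
As an rbind-level illustration of that last point (a sketch only; it does not touch the merge internals), passing use.names = TRUE together with fill = TRUE binds a Date column against a table that lacks it, as in the case reported as "ok" earlier in this thread:

```r
library(data.table)

dt1 <- data.table(a = 1:2, b = as.Date("2022-03-09"))
dt2 <- data.table(a = 6:7)  # no column b at all

# Columns are matched by name; the missing b is filled with NA and
# keeps the Date class of dt1$b.
res <- rbind(dt1, dt2, use.names = TRUE, fill = TRUE)
```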

I guess what you describe is really just #3911.

@ben-schwen
Member

ben-schwen commented Aug 24, 2022

I see, so apparently we do type bumping for atomic types. If we allow different classes, we have to take care of things like this, where e.g. the first class (or the class with the highest type) determines the class of the result:

x = data.table(a = 1, b = as.IDate(16801))
y = data.table(a = 5, b = NA)
rbind(x, y)
#>        a          b
#>    <num>     <IDat>
#> 1:     1 2016-01-01
#> 2:     5       <NA>
rbind(y, x)
#>        a     b
#>    <num> <int>
#> 1:     5    NA
#> 2:     1 16801

ben-schwen added a commit that referenced this issue Aug 25, 2022
@jangorecki jangorecki modified the milestones: 1.14.9, 1.15.0 Oct 29, 2023
ben-schwen added a commit that referenced this issue Dec 27, 2023
jangorecki pushed a commit that referenced this issue Dec 27, 2023

* add regression fix

* add tests from #5309

* added comment about NA rectangle

* emphasize subtle part about attributes too

---------

Co-authored-by: Michael Chirico <chiricom@google.com>