Joining a keyed table on a non-keyed table is not working sometimes #3441

Closed
symbalex opened this issue Mar 4, 2019 · 3 comments · Fixed by #3443

@symbalex

symbalex commented Mar 4, 2019

I am joining two data.table objects on a column called ROLE_TYPE: dt_tbl (which carries a key created automatically by dcast) and Y (which has no key). I expect the NumTxns column in the final object to be 86 for ROLE_TYPE == "A", but I get NA instead.

Interestingly, the first join on ROLE_TYPE (dt_tbl on the dcast-ed object) works fine.

Reproducible example

library(data.table)

dt_tbl <- data.table(
  ROLE_TYPE = c("D", "A"), 
  CountCases = c(16L, 25L)
)

X <- data.table(
  outlier = c(FALSE, TRUE), 
  ROLE_TYPE = c("A", "A"),
  N = c(220L, 29L)
  )

# a dcast-ed table is now keyed
str(dcast(X, ROLE_TYPE ~ outlier, value.var = "N", fill = 0)) 

# cast and join
dt_tbl <- dcast(X, ROLE_TYPE ~ outlier, value.var = "N", fill = 0)[
  dt_tbl,
  on = "ROLE_TYPE"
  ]
# this is correct
dt_tbl
str(dt_tbl)

Y <- data.table(ROLE_TYPE = "A", NumTxns = 86L)

dt_tbl <- Y[
  dt_tbl,
  on = "ROLE_TYPE"
  ]
# why is NumTxns NA?
dt_tbl
#    ROLE_TYPE NumTxns FALSE TRUE CountCases
# 1:         D      NA    NA   NA         16
# 2:         A      NA   220   29         25

Output of sessionInfo()

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.12.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0        rstudioapi_0.7    magrittr_1.5      usethis_1.4.0     devtools_2.0.1    pkgload_1.0.2     R6_2.3.0          rlang_0.3.1      
 [9] tools_3.5.1       pkgbuild_1.0.2    sessioninfo_1.1.1 cli_1.0.1         withr_2.1.2       remotes_2.0.2     yaml_2.2.0        assertthat_0.2.0 
[17] digest_0.6.18     rprojroot_1.3-2   crayon_1.3.4      processx_3.2.0    callr_3.0.0       base64enc_0.1-3   fs_1.2.6          ps_1.2.1         
[25] curl_3.3          testthat_2.0.0    glue_1.3.0        memoise_1.1.0     compiler_3.5.1    desc_1.2.0        backports_1.1.2   prettyunits_1.0.2
@franknarf1
Contributor

Yeah, the key for x should not be preserved after x[i, on=key(x)], so the first join is also incorrect and is where the problem started.

library(data.table)
dx = data.table(id = "A", key = "id")
di = list(c("D", "A"))
(res <- dx[di])
#    id
# 1:  D
# 2:  A
key(res)
# [1] "id"

A table that reports a key should be sorted by that key, but here the rows come back in the order D, A.

Btw, overwriting objects / reusing names makes the example more confusing than it needs to be.
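
As a possible workaround on affected versions, here is a minimal sketch that drops the key from the intermediate result before the second join. It reuses the data from the original report but gives each step its own name; `wide`, `step1` and `step2` are illustrative names, not from the report. It assumes the stale key on the intermediate table is what misleads the second join, which the thread suggests but does not spell out.

```r
library(data.table)

dt_tbl <- data.table(ROLE_TYPE = c("D", "A"), CountCases = c(16L, 25L))
X <- data.table(
  outlier = c(FALSE, TRUE),
  ROLE_TYPE = c("A", "A"),
  N = c(220L, 29L)
)
Y <- data.table(ROLE_TYPE = "A", NumTxns = 86L)

# dcast returns a table keyed on ROLE_TYPE
wide <- dcast(X, ROLE_TYPE ~ outlier, value.var = "N", fill = 0)

# first join: on affected versions the result may wrongly keep that key
step1 <- wide[dt_tbl, on = "ROLE_TYPE"]
key(step1)

# remove any key so later joins do not assume step1 is sorted (assumed workaround)
setkey(step1, NULL)

# second join, now against an unkeyed table
step2 <- Y[step1, on = "ROLE_TYPE"]
step2[ROLE_TYPE == "A", NumTxns]
```

If `key(step1)` prints `NULL` before the `setkey()` call, the installed version most likely already contains the fix and the extra call is harmless.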

@symbalex
Author

symbalex commented Mar 4, 2019

Thanks - sorry for the confusing example :)

@jangorecki
Member

jangorecki commented Mar 5, 2019

patch submitted
