Don't inspect i=CJ(...) names in DT[CJ(...), on=] joins #2081

Open
franknarf1 opened this issue Mar 27, 2017 · 2 comments

Comments

@franknarf1
Contributor

Looking again at #1596, one point against it is that some folks may be relying on the default V1, V2, ... names, so auto-assigning different names would break their code.

One alternative, which would address my primary use case, would be for CJ not to need names during a join:

library(data.table)
DT = unique(data.table(datasets::CO2)[, .(Plant, Type, Treatment)])
setkey(DT, Plant, Type)

# good -- no names needed using list
DT[.("Mc1", "Mississippi"), .N, by=.EACHI]

# good -- no names needed using CJ and (implicitly) on = key
DT[CJ(Plant, Type, unique=TRUE), .N, by = .EACHI]

# bad -- breaks for explicit on = key 
DT[CJ(Plant, Type, unique=TRUE), on=key(DT)]

# bad -- breaks for on=some-non-key
DT[CJ(Plant, Treatment, unique = TRUE), on=.(Plant, Treatment)]

So I guess I'm asking for an exception in [.data.table that ignores names in i when i=CJ(...), similar to how names are ignored with list inputs or when on= is implicitly the key.

Of course, my desired syntax would also work if FR #1596 went through.

@MichaelChirico
Member

I think I'm fine with breaking some code relying on V1. Just filed #2977.

@franknarf1
Contributor Author

franknarf1 commented Jul 17, 2018

@MichaelChirico Great! Just to clarify, autonaming solves #1596 and the example given here but not the broader case I meant:

library(data.table)
DT = unique(data.table(datasets::CO2)[, .(Plant, Type, Treatment)])
setkey(DT, Plant, Type)

f = function(myplants, mytypes) DT[CJ(myplants, mytypes), on=key(DT)]
f(c("Qn1", "Qc2"), "Quebec") # error: i is a table and so must have correct names

# or even more generally
g = function(..., d = DT) d[do.call(CJ, list(...)), on=key(d)]
g(c("Qn1", "Qc2"), "Quebec") # error: i is a table and so must have correct names

What I mean by "must have correct names" is that unnamed lists don't have the same requirements:

DT[unname(unclass(CJ(c("Qn1", "Qc2"), "Quebec"))), on=key(DT)] # works

So I'm hoping for special parsing of i that recognizes when CJ is the top-level call and then "ignores" i's names, in the same way that lists in i get away without having correct names (when on= is an equi join referring only to x columns).

Not sure if that's reasonable, and it's certainly not a big deal if I'm writing functions as above (since I can just add unname + unclass or similar) ... so the request is pretty much just for easier interactive data exploration.

EDIT: Hm, another resolution would be to let CJ return an unnamed, unclassed list via some new argument (bringing it in line with, e.g., the behavior of shift), though that seems like overkill if it only serves to cover this use case.
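For functions like f and g above, the unname(unclass(...)) workaround can be wrapped once instead of repeated at every call site. This is a minimal sketch: CJ_unnamed is a hypothetical helper name (it is not part of data.table), and it relies on the behavior described in this thread, where an unnamed list in i is matched to the on= columns by position.

```r
library(data.table)

# Hypothetical helper, NOT part of data.table: build the cross join with CJ(),
# then drop the data.table class and the column names so that `i` is treated
# like an unnamed list when joining with on=.
CJ_unnamed = function(...) unname(unclass(CJ(...)))

DT = unique(data.table(datasets::CO2)[, .(Plant, Type, Treatment)])
setkey(DT, Plant, Type)

# The earlier wrapper f, rewritten to use the helper
f = function(myplants, mytypes) DT[CJ_unnamed(myplants, mytypes), on = key(DT)]
f(c("Qn1", "Qc2"), "Quebec")
```

This only hides the names at the call site; it does not change how [.data.table inspects the names of a real CJ() result, which is what this request is about.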

@jangorecki changed the title from "[Request] Don't inspect i=CJ(...) names in DT[CJ(...), on=] joins" to "Don't inspect i=CJ(...) names in DT[CJ(...), on=] joins" on Apr 6, 2020

4 participants