Keyby/by not returning unique groups with subsetting #2713
Agree it's a bug. For the record, the recommended code in this case is:
Which gives the correct answer since this version will activate
Of course this is no help if your actual code can't be kludged like this. Interestingly, if we pass the subset rows directly, the code works:
I see the following difference in the
This led me to install from CRAN; the code runs without error on
So I guess this is something from @MarkusBonsch's work on subset optimization? I also see the same error if we make the join explicit:
But the keyed version is fine:
Thanks @cathine for reporting and to @MichaelChirico for investigating. Will probably get solved when issue #2591 is solved.
Thanks @cathine! Confirmed this is dev-only and can be alleviated with the datatable.optimize setting shown in the session below:
> DT = data.table(
id = c("a","a","a","b","b","c","c","d","d"),
group = c(1,1,1,1,1,2,2,2,2),
num = 1)
> DT[, uniqueN(id), by=group] # ok
group V1
<num> <int>
1: 1 2
2: 2 2
> DT[num==1, uniqueN(id), by=group] # group column wrong
group V1
<num> <int>
1: 1 2
2: 1 2
> options(datatable.optimize=2)
> DT[num==1, uniqueN(id), by=group] # ok
group V1
<num> <int>
1: 1 2
2: 2 2
> options(datatable.optimize=3) # not ok
> DT[num==1, uniqueN(id), by=group]
group V1
<num> <int>
1: 1 2
2: 1 2
> DT[num==1, sum(num), by=group] # ok
group V1
<num> <num>
1: 1 7
2: 2 4
> DT[num==1, length(num), by=group] # not ok
group V1
<num> <int>
1: 1 7
2: 1 4
> options(datatable.optimize=2) # ok
> DT[num==1, length(num), by=group]
group V1
<num> <int>
1: 1 7
2: 2 4
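If changing the option globally is too blunt, the setting from the session above could be scoped to a single query. A rough sketch, assuming a small wrapper is acceptable; `with_optimize` is a hypothetical helper name, not something data.table provides:

```r
library(data.table)

# Hypothetical helper (not part of data.table): evaluate `expr` with
# datatable.optimize set to `level`, then restore the previous value.
with_optimize <- function(level, expr) {
  old <- options(datatable.optimize = level)  # options() returns the old value
  on.exit(options(old), add = TRUE)           # restore even if expr errors
  force(expr)                                 # promise is evaluated while the option is set
}

DT <- data.table(
  id    = c("a","a","a","b","b","c","c","d","d"),
  group = c(1,1,1,1,1,2,2,2,2),
  num   = 1
)

before <- getOption("datatable.optimize")
res <- with_optimize(2, DT[num == 1, uniqueN(id), by = group])
print(res)
```

This relies on R's lazy argument evaluation: the query is not run until `force(expr)`, by which point the option has already been lowered. It only sidesteps the problem for the wrapped call and does nothing about the underlying bug.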
Why did it slip through tests? Because it only occurs if the grouping column is sorted (see code below)! I didn't check grouping on sorted columns specifically.
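For illustration only (this is not the code from the comment above, and not the test that was added to the package), a check along these lines would cover both a sorted and an unsorted grouping column. It reuses the DT from earlier in the thread; `groups_are_unique` is a hypothetical helper name:

```r
library(data.table)

# Hypothetical helper (not part of data.table): TRUE when a subsetted,
# grouped query returns every group value exactly once.
groups_are_unique <- function(DT) {
  res <- DT[num == 1, uniqueN(id), by = group]
  !anyDuplicated(res$group)
}

# Grouping column already sorted: the case reported as failing on dev.
sorted_DT <- data.table(
  id    = c("a","a","a","b","b","c","c","d","d"),
  group = c(1,1,1,1,1,2,2,2,2),
  num   = 1
)

# Same rows in a different order, so `group` is no longer sorted.
unsorted_DT <- sorted_DT[c(6, 1, 8, 2, 3, 7, 4, 9, 5)]

stopifnot(groups_are_unique(sorted_DT))
stopifnot(groups_are_unique(unsorted_DT))
```

On a build affected by this issue, the first `stopifnot` would be expected to fail, since the sorted case is the one that returns group 1 twice in the session above.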
Below is a simple example where keyby (also by) is not returning unique groups with subsetting.
However, once subsetting is removed, keyby works properly.