[R-Forge #5222] 'not found' when DT[, list(sum(non-.SD-col), lapply(.SD,mean)), by=..., .SDcols=...] #495

arunsrinivasan · 2014-06-08T13:15:02Z

Submitted by: Matt Weller; Assigned to: Nobody; R-Forge link

When using .SDcols (for the purpose of applying a function to multiple columns) I cannot reference other columns in the original table (v1) using the following syntax:

dt = data.table(grp=c(2,3,3,1,1,2,3), v1=1:7, v2=7:1, v3=10:16)
dt.out = dt[, list(v1 = sum(v1), lapply(.SD,mean)), by = grp, .SDcols = v2:v3]
# Error in `[.data.table`(dt, , list(v1 = sum(v1), lapply(.SD, mean)), by = grp,  : 
#   object 'v1' not found

A similar error happens when I use c instead of list; clearly, the column v1 cannot be accessed within the j clause.

I resorted to the following code, which includes column v1 in .SDcols even though I do not want it in the lapply portion; I then have to drop it after the computation.

sd.cols = c("v1","v2", "v3")
dt.out = dt[, c(sum.v1 = sum(v1), lapply(.SD,mean)), by = grp, .SDcols = sd.cols]

According to eddi on Stack Overflow this is a bug, and he asked me to report it. I cannot provide much more detail, as I'm not exactly sure which part he thinks is a bug; the accepted answer by Arun and their ensuing discussion should highlight where the problem lies.

Here is the relevant SO post.

arunsrinivasan · 2015-01-04T07:34:28Z

Another post to update: http://stackoverflow.com/questions/27755518/data-table-sd-lapply-multiple-columns-in-argument

MichaelChirico · 2015-07-11T15:59:12Z

Bit late, but adding this question of mine to the pile

jangorecki · 2015-07-11T16:31:09Z

I didn't even think of it as a bug. Usually I add the extra required fields to .SDcols, and later in j I use .SD[, !"total", with=FALSE] to exclude the unwanted column.
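A minimal sketch of that workaround, applied to the table from the original report. The column v1 stands in for "total" here, and the output name sum.v1 is only illustrative. The extra column is listed in .SDcols so that j can see it, then dropped from .SD before the lapply:

```r
library(data.table)

dt = data.table(grp = c(2,3,3,1,1,2,3), v1 = 1:7, v2 = 7:1, v3 = 10:16)

# v1 is included in .SDcols only so that sum(v1) can find it;
# it is removed from .SD again before taking the means.
out = dt[, c(list(sum.v1 = sum(v1)),
             lapply(.SD[, !"v1", with = FALSE], mean)),
         by = grp, .SDcols = c("v1", "v2", "v3")]
```

Subsetting .SD inside every group adds some overhead, so this is a stopgap rather than a recommendation.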

MichaelChirico · 2015-07-11T16:41:19Z

That's another good workaround; I wonder about the performance difference vis-a-vis using dt$total. And yes, this sort of dances the line between FR and bug, IMO.

DavidArenburg · 2015-08-19T18:16:03Z

Bumping this up again. Looks like this could be a very important fix. This question seems to be related and could potentially be solved via DT[, (deltaColsNewNames) := lapply(.SD, normalDelta, price), .SDcols = deltaColsNames]
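To make that call concrete, here is a small sketch. The data and the body of normalDelta are hypothetical, since the linked question's definition is not reproduced in this thread. The point is only that price is not in .SDcols but is still referenced in j. It assumes a data.table version that includes the fix for this issue:

```r
library(data.table)

# Hypothetical data: a reference price plus two columns to compare against it.
DT = data.table(price = c(10, 20, 40), a = c(11, 18, 44), b = c(9, 25, 40))

# Hypothetical helper: relative difference of x from a reference column.
normalDelta = function(x, ref) (x - ref) / ref

deltaColsNames    = c("a", "b")
deltaColsNewNames = paste0(deltaColsNames, "_delta")

# price is outside .SDcols but is passed to every lapply call.
DT[, (deltaColsNewNames) := lapply(.SD, normalDelta, price),
   .SDcols = deltaColsNames]
```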

franknarf1 · 2015-09-10T15:19:00Z

Here's another simple case where this would be useful: http://stackoverflow.com/a/32498711/1191259

rentrop · 2015-10-05T08:53:45Z

Here's another simple case that suffers: http://stackoverflow.com/questions/32944060/using-data-table-to-calculate-new-columns/32944519#32944519

franknarf1 · 2015-10-09T17:02:52Z

Another to update when fixed: http://stackoverflow.com/q/32915770/1191259

arunsrinivasan · 2016-03-07T23:41:30Z

Yay! We can now do this:

require(data.table)
dt = data.table(grp=c(2,3,3,1,1,2,3), v1=1:7, v2=7:1, v3=10:16)
dt.out = dt[, c(v1 = sum(v1),  lapply(.SD,mean)), by = grp, .SDcols = v2:v3]
#    grp v1  v2   v3
# 1:   2  7 4.5 12.5
# 2:   3 12 4.0 13.0
# 3:   1  9 3.5 13.5

arunsrinivasan · 2016-03-08T00:29:56Z

Updated all SO posts linked here. Thanks to all.

DavidArenburg · 2016-03-08T07:44:51Z

Thanks, @arunsrinivasan. I was waiting for this fix for a couple of years.

rentrop · 2016-03-08T09:22:59Z

Awesome! Thank you

arunsrinivasan mentioned this issue Aug 4, 2014

[R-Forge #5285] specifying .SDcols shouldn't restrict all variables to just .SDcols #540

Closed

arunsrinivasan added this to the v1.9.6 milestone Sep 24, 2014

arunsrinivasan added the High label Sep 24, 2014

arunsrinivasan self-assigned this Oct 10, 2014

arunsrinivasan modified the milestones: v1.9.8, v1.9.6 Oct 10, 2014

arunsrinivasan mentioned this issue Oct 10, 2014

Bug in grouping external variables - related to #495. #875

Closed

arunsrinivasan closed this as completed in 68091d8 Mar 7, 2016

arunsrinivasan added a commit that referenced this issue Mar 18, 2016

Closes #484, handles .SD and other cols in j when .SDcols isn't provi…

ff04c71

…ded. Related to #495.

MichaelChirico mentioned this issue Dec 19, 2016

names(.SD) sometimes incorrect #1965

Closed

ChandlerLutz mentioned this issue Mar 26, 2017

Map in j does not work with .SDcols and other columns as function attributes #2079

Closed
