GForce should be able to work with `:=` as well. #1414

arunsrinivasan · 2015-10-29T14:02:41Z

No description provided.

franknarf1 · 2016-05-18T19:39:37Z

Just ran into this today looking at a question on SO:

```r
library(data.table)
actions = data.table(User_id = c("Carl","Carl","Carl","Lisa","Moe"),
                     category = c(1,1,2,2,1),
                     value= c(10,20,30,40,50))
# note: := works by reference, so `users` is the same object as `actions`, not a copy
users = actions[, other_var := 1, by=User_id]

# verbose says: the following is not optimized
users[, value_one := 0 ]
users[actions[category==1], value_one := sum(value), on="User_id", by=.EACHI, verbose=TRUE]

# verbose says: the following is optimized
rbind( 
    actions[category==1], 
    unique(actions[,"User_id", with=FALSE])[, value := 0 ],
fill=TRUE)[, sum(value), by=User_id, verbose=TRUE]
```

To me, the first way looks more idiomatic, since the variable needs to end up in `users` anyway.
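Until `:=` itself is optimized, one workaround is to split the job in two: a plain grouped aggregation (which should be GForce-eligible, since `j` is just `sum()`), followed by an update join that copies the result in by reference. This is a sketch on the same toy data, assuming `users` is meant to be a separate one-row-per-user table; whether step 1 is actually optimized depends on the data.table version, so check with `verbose=TRUE`.

```r
library(data.table)

actions <- data.table(User_id  = c("Carl", "Carl", "Carl", "Lisa", "Moe"),
                      category = c(1, 1, 2, 2, 1),
                      value    = c(10, 20, 30, 40, 50))
# Assumption: one row per user, built separately from actions
users <- unique(actions[, .(User_id)])

# Step 1: plain grouped aggregation; sum() in j with by= is a GForce candidate
agg <- actions[category == 1, .(s = sum(value)), by = User_id]

# Step 2: update join by reference; users without category-1 rows keep 0
users[, value_one := 0]
users[agg, value_one := i.s, on = "User_id"]
users
```

Carl gets 30 (10 + 20), Moe gets 50, and Lisa keeps 0 because she has no category-1 rows.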

Another: https://stackoverflow.com/a/47338118/ (gtail)

Another: https://stackoverflow.com/a/51569126/ should do `DT[, mx := max(pt), by=Subject][, diff := mx - pt][]`, I guess.

Another, specifically interested in memory performance: https://stackoverflow.com/q/52189712 "data.table reference semantics: memory usage of iterating through all columns"

Another, wants to scale/demean multiple variables: https://stackoverflow.com/q/52528123

Another, taking max by group with a subsetting condition and adding with `:=` (see akrun's answer): https://stackoverflow.com/a/54911855/. Also related to the already-completed part of #971.
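For reference, the one-liner suggested for https://stackoverflow.com/a/51569126/, run on a small made-up table (the `Subject` and `pt` values are hypothetical). The first step broadcasts a per-group `max()` to every row with `:=`, which is exactly the grouped-assignment pattern this issue asks GForce to handle; the second step is an ordinary vectorized column operation.

```r
library(data.table)

# Hypothetical toy data; column names follow the linked answer
DT <- data.table(Subject = c(1, 1, 1, 2, 2),
                 pt      = c(2, 5, 3, 7, 4))

# Grouped max assigned back to every row, then a plain row-wise difference
DT[, mx := max(pt), by = Subject][, diff := mx - pt][]
```

Base R's `ave(DT$pt, DT$Subject, FUN = max)` gives the same `mx` column and is a handy cross-check.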

brodieG · 2019-03-11T16:02:39Z

Just wanted to emphasize that enabling this would allow GForce to be used effectively for complex expressions, albeit with some work. For example, I show in this post how to enable it for:

```r
slope <- function(x, y) {
  x_ux <- x - mean(x)
  uy <- mean(y)
  sum(x_ux * (y - uy)) / sum(x_ux ^ 2)
}
```

By doing:

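```r
library(data.table)
# grp, x and y are pre-existing vectors of equal length (not shown in the post)
DT <- data.table(grp, x, y)
setkey(DT, grp)
# per-group means (GForce), joined back onto DT by key
DTsum <- DT[, .(ux=mean(x), uy=mean(y)), keyby=grp]
DT[DTsum, `:=`(x_ux=x - ux, y_uy=y - uy)]
DT[, `:=`(x_ux.y_uy=x_ux * y_uy, x_ux2=x_ux^2)]
# per-group sums (GForce), then the ratio
DTsum <- DT[, .(x_ux.y_uy=sum(x_ux.y_uy), x_ux2=sum(x_ux2)), keyby=grp]
res.slope.dt2 <- DTsum[, .(grp, V1=x_ux.y_uy / x_ux2)]
```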

Whereas if GForce were supported in `:=`, we could do:

```r
DT <- data.table(grp, x, y)
# relies on GForce being applied inside :=, which is what this issue requests
DT[, `:=`(ux=mean(x), uy=mean(y)), keyby=grp]
DT[, `:=`(x_ux=x - ux, y_uy=y - uy)]
DT[, `:=`(x_ux.y_uy=x_ux * y_uy, x_ux2=x_ux^2)]
DTsum <- DT[, .(x_ux.y_uy=sum(x_ux.y_uy), x_ux2=sum(x_ux2)), keyby=grp]
res.slope.dt3 <- DTsum[, .(grp, V1=x_ux.y_uy / x_ux2)]
```

Which looks cleaner and should be faster.
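A quick consistency check of the keyed-join decomposition, on hypothetical toy inputs (the post does not say how `grp`, `x` and `y` were built, so the `set.seed`/`runif`/`rnorm` data below is made up). It compares calling `slope()` once per group, which cannot use GForce because `slope` is a user-defined function, against the decomposed version; the two should agree up to floating-point tolerance.

```r
library(data.table)

slope <- function(x, y) {
  x_ux <- x - mean(x)
  uy <- mean(y)
  sum(x_ux * (y - uy)) / sum(x_ux ^ 2)
}

# Hypothetical toy inputs
set.seed(1)
n   <- 1000
grp <- sample(1:10, n, replace = TRUE)
x   <- runif(n)
y   <- 2 * x + rnorm(n)

# Reference: user-defined function evaluated once per group (no GForce)
res.ref <- data.table(grp, x, y)[, .(V1 = slope(x, y)), keyby = grp]

# Decomposed version: grouped j only uses mean() and sum()
DT <- data.table(grp, x, y)
setkey(DT, grp)
DTsum <- DT[, .(ux = mean(x), uy = mean(y)), keyby = grp]
DT[DTsum, `:=`(x_ux = x - ux, y_uy = y - uy)]   # join group means back on the key
DT[, `:=`(x_ux.y_uy = x_ux * y_uy, x_ux2 = x_ux^2)]
DTsum <- DT[, .(x_ux.y_uy = sum(x_ux.y_uy), x_ux2 = sum(x_ux2)), keyby = grp]
res.slope.dt2 <- DTsum[, .(grp, V1 = x_ux.y_uy / x_ux2)]

# Both should agree up to floating-point tolerance
stopifnot(identical(res.ref$grp, res.slope.dt2$grp))
stopifnot(isTRUE(all.equal(res.ref$V1, res.slope.dt2$V1)))
```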

brodieG · 2019-06-10T19:49:47Z

Discussions with @MichaelChirico made me realize that a very close cousin of this issue is:

```
>   DT <- data.table(x, y, grp)
>   DT[, .(x, mean(x)), keyby=grp]
Detected that j uses these columns: x 
Finding groups using forderv ... 1.049s elapsed (0.946s cpu) 
Finding group sizes from the positions (can be avoided to save RAM) ... 0.011s elapsed (0.011s cpu) 
lapply optimization is on, j unchanged as 'list(x, mean(x))'
GForce is on, left j unchanged
Old mean optimization changed j from 'list(x, mean(x))' to 'list(x, .External(Cfastmean, x, FALSE))'
Making each group and running j (GForce FALSE) ... 
  collecting discontiguous groups took 1.293s for 999953 groups
  eval(j) took 1.860s for 999953 calls
5.517s elapsed (3.862s cpu) 
              grp         x        V2
       1:       1 0.2151365 0.5512966
       2:       1 0.5358256 0.5512966
       3:       1 0.8496598 0.5512966
       4:       1 0.8480730 0.5512966
       5:       1 0.3464458 0.5512966
      ---                            
 9999996: 1000000 0.2601940 0.5474986
 9999997: 1000000 0.7940921 0.5474986
 9999998: 1000000 0.3825493 0.5474986
 9999999: 1000000 0.1786861 0.5474986
10000000: 1000000 0.9179119 0.5474986
```

Cross linking to #523.
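A small illustration, on a made-up table, that the `.(x, mean(x))` form above and a grouped `:=` compute the same values; the `:=` form adds the column to `DT` by reference instead of building a new table. Which of the two ends up GForce-optimized depends on the data.table version (this issue was eventually closed by #5245), so `verbose=TRUE` is the way to check on a given install.

```r
library(data.table)

# Hypothetical small input, already sorted by grp so row order matches keyby
DT <- data.table(grp = c(1L, 1L, 2L, 2L, 2L),
                 x   = c(1, 3, 2, 4, 9))

# Form from the comment: keep every row and attach the group mean
res.j <- DT[, .(x, V2 = mean(x)), keyby = grp]

# Same values via :=, the form this issue asks GForce to optimize
DT[, V2 := mean(x), by = grp]

stopifnot(isTRUE(all.equal(res.j$V2, DT$V2)))
```

Both give group means of 2 for `grp == 1` and 5 for `grp == 2`, repeated on every row of the group.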

arunsrinivasan added High enhancement labels Oct 29, 2015

arunsrinivasan added this to the v1.9.8 milestone Oct 29, 2015

arunsrinivasan added the performance label Oct 29, 2015

arunsrinivasan self-assigned this Nov 12, 2015

arunsrinivasan modified the milestones: v2.0.0, v1.9.8 Apr 10, 2016

mattdowle removed this from the Candidate milestone May 10, 2018

MichaelChirico added the GForce issues relating to optimized grouping calculations (GForce) label Feb 25, 2019

franknarf1 mentioned this issue Jul 24, 2019: uniqueN could be GForce optimised + GForce could be optimised for := too. #3725 (Closed)

arunsrinivasan removed their assignment Aug 31, 2019

myoung3 mentioned this issue Sep 29, 2019: making use of data.table GFORCE optimizations kaufman-lab/timeperiods#15 (Closed)

jangorecki removed the High label Oct 15, 2020

MichaelChirico mentioned this issue May 14, 2021: Master list of most-requested issues #3189 (Open)

ben-schwen mentioned this issue Oct 9, 2021: gshift as gforce optimized shift #5205 (Merged)

ben-schwen mentioned this issue Nov 1, 2021: := works with GForce #5245 (Merged)

mattdowle added this to the 1.14.3 milestone Dec 9, 2021

mattdowle closed this as completed in #5245 Dec 9, 2021

mattdowle mentioned this issue Mar 15, 2022: gforce := follow up #5348 (Merged)

ben-schwen mentioned this issue Jun 5, 2022: :=(...) with shift() - gforce follow-up #5245, #5348 #5404 (Merged)

jangorecki modified the milestones: 1.14.9, 1.15.0 Oct 29, 2023
