grouping sets #1377

jangorecki · 2015-10-05T20:51:39Z

Some keywords: GROUPING SETS, ROLLUP, CUBE, GROUPING
Some references: postgres, Oracle, SQL Server, groupings combined with arbitrary functions

Grouping sets and friends are useful for pre-calculating several aggregation levels at once, which is often desired. The current API for this in data.table is not very friendly; see Aggregating sub totals and grand totals with data.table.

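In the case of rollup, the aggregations are computed over the columns provided in by, from the most detailed level up to the grand total, dropping the last column at each step. See the description from the postgres manual, and the example code below.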

ROLLUP ( e1, e2, e3, ... )

is equivalent to:

GROUPING SETS (
    ( e1, e2, e3, ... ),
    ...
    ( e1, e2 ),
    ( e1 ),
    ( )
)
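
The expansion quoted above can be sketched in a few lines of base R. The helper names rollup_sets and cube_sets are hypothetical, introduced only for illustration; they are not part of data.table, and the ordering of the cube sets is an arbitrary choice made here.

```r
# Hypothetical helpers (not data.table API): enumerate the grouping sets
# that ROLLUP and CUBE expand to for a character vector of column names.
rollup_sets <- function(by) {
  # prefixes of `by`, from the full vector down to the empty set (grand total)
  lapply(rev(c(0L, seq_along(by))), function(i) by[seq_len(i)])
}

cube_sets <- function(by) {
  # every subset of `by`, largest first; 2^length(by) sets in total
  sizes <- rev(c(0L, seq_along(by)))
  unlist(
    lapply(sizes, function(k) {
      if (k == 0L) list(character(0)) else combn(by, k, simplify = FALSE)
    }),
    recursive = FALSE
  )
}

str(rollup_sets(c("e1", "e2", "e3")))
str(cube_sets(c("e1", "e2")))
```

For three expressions, rollup_sets() yields four sets (three prefixes plus the empty set), while cube_sets() on the same input would yield eight.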

I wonder if there could be a cheap speed-up of that process, since this is potentially a heavy computing task. It would be great to have the computation of grouping sets developed in C, so that rollup, cube and other features could be built on top of grouping sets more easily in R while still running at full speed.
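
For illustration, the layering described here (rollup expressed through grouping sets) can be sketched as follows. This assumes a data.table version that exports rollup() and groupingsets() with the j, by, sets and id arguments used in the examples in this thread; the sets list is written out by hand, and the choice of mtcars columns is arbitrary.

```r
library(data.table)  # assumes a version that exports rollup() and groupingsets()

DT <- as.data.table(mtcars)
by_cols <- c("cyl", "gear")

# rollup over cyl, gear: (cyl, gear), (cyl), ()
r1 <- rollup(DT, j = .(agg = mean(mpg)), by = by_cols)

# the same aggregation levels spelled out as explicit grouping sets
r2 <- groupingsets(
  DT, j = .(agg = mean(mpg)), by = by_cols,
  sets = list(c("cyl", "gear"), "cyl", character(0))
)

# compare the two results regardless of row order
all.equal(r1, r2, ignore.row.order = TRUE)
```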

Answers to update when closed:

https://stackoverflow.com/questions/21366138/multi-level-aggregations-like-grouping-sets-via-ddply-or-other-r-function

library(plyr)
grp.cols <- c("vs", "am", "gear", "carb", "cyl")
plyr.r = do.call(
    rbind.fill,
    lapply(seq_along(grp.cols), function(x) ddply(mtcars, grp.cols[1:x], summarize, agg=mean(mpg)))
)

library(data.table) # 1.9.7+
dt.r = rollup(as.data.table(mtcars), j = .(agg=mean(mpg)), by=grp.cols)
all.equal(
    as.data.table(plyr.r),
    dt.r[-.N], # exclude grand total, not present in BrodieG answer
    ignore.row.order = TRUE,
    ignore.col.order = TRUE
)
#[1] TRUE
# install.packages("data.table", type = "source", repos = "https://Rdatatable.github.io/data.table")

https://stackoverflow.com/questions/9315258/aggregating-sub-totals-and-grand-totals-with-data-table/

library(data.table)
set.seed(1)
DT = data.table(
    group=sample(letters[1:2],100,replace=TRUE), 
    year=sample(2010:2012,100,replace=TRUE),
    v=runif(100))

cube(DT, mean(v), by=c("group","year"))
#    group year        V1
#1:     a 2011 0.4176346
#2:     b 2010 0.5231845
#3:     b 2012 0.4306871
#4:     b 2011 0.4997119
#5:     a 2012 0.4227796
#6:     a 2010 0.2926945
#7:    NA 2011 0.4463616
#8:    NA 2010 0.4278093
#9:    NA 2012 0.4271160
#10:     a   NA 0.3901875
#11:     b   NA 0.4835788
#12:    NA   NA 0.4350153
cube(DT, mean(v), by=c("group","year"), id=TRUE)
#    grouping group year        V1
#1:        0     a 2011 0.4176346
#2:        0     b 2010 0.5231845
#3:        0     b 2012 0.4306871
#4:        0     b 2011 0.4997119
#5:        0     a 2012 0.4227796
#6:        0     a 2010 0.2926945
#7:        2    NA 2011 0.4463616
#8:        2    NA 2010 0.4278093
#9:        2    NA 2012 0.4271160
#10:        1     a   NA 0.3901875
#11:        1     b   NA 0.4835788
#12:        3    NA   NA 0.4350153

# install.packages("data.table", type = "source", repos = "https://Rdatatable.github.io/data.table")
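
The grouping values in the id=TRUE output above are consistent with a bitmask over the by columns: one bit per column, with the bit set when that column is aggregated away (shown as NA). The sketch below is inferred from the printed output only, not from the data.table source, and grouping_id is a hypothetical helper name.

```r
# Hypothetical helper (not data.table API): bitmask inferred from the output
# above. Each `by` column contributes one bit, leftmost column = highest bit;
# a bit is set when that column is NOT part of the grouping set.
grouping_id <- function(by, set) {
  aggregated <- which(!by %in% set)
  as.integer(sum(2^(length(by) - aggregated)))
}

by <- c("group", "year")
grouping_id(by, c("group", "year"))  # both columns kept
grouping_id(by, "group")             # year aggregated away
grouping_id(by, "year")              # group aggregated away
grouping_id(by, character(0))        # grand total
```

The four calls return 0, 1, 2 and 3, matching the grouping column printed for the corresponding rows.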

Some other questions could get new answers as well:


UweBlock · 2018-01-26T10:50:17Z

https://stackoverflow.com/questions/42376032/how-to-use-data-table-within-functions-and-loops

library(data.table) # version 1.10.5 required
dt = data.table(ggplot2::diamonds)
groupingsets(dt, c(lapply(.SD, mean), list(COUNT = .N)), 
     by = names(dt)[2:4], .SDcols = 5:10, id = FALSE,
     sets = as.list(names(dt)[2:4]))

          cut color clarity    depth    table    price        x        y        z COUNT
 1:     Ideal    NA      NA 61.70940 55.95167 3457.542 5.507451 5.520080 3.401448 21551
 2:   Premium    NA      NA 61.26467 58.74610 4584.258 5.973887 5.944879 3.647124 13791
 3:      Good    NA      NA 62.36588 58.69464 3928.864 5.838785 5.850744 3.639507  4906
 4: Very Good    NA      NA 61.81828 57.95615 3981.760 5.740696 5.770026 3.559801 12082
 5:      Fair    NA      NA 64.04168 59.05379 4358.758 6.246894 6.182652 3.982770  1610
 6:        NA     E      NA 61.66209 57.49120 3076.752 5.411580 5.419029 3.340689  9797
 7:        NA     I      NA 61.84639 57.57728 5091.875 6.222826 6.222730 3.845411  5422
 8:        NA     J      NA 61.88722 57.81239 5323.818 6.519338 6.518105 4.033251  2808
 9:        NA     H      NA 61.83685 57.51781 4486.669 5.983335 5.984815 3.695965  8304
10:        NA     F      NA 61.69458 57.43354 3724.886 5.614961 5.619456 3.464446  9542
11:        NA     G      NA 61.75711 57.28863 3999.136 5.677543 5.680192 3.505021 11292
12:        NA     D      NA 61.69813 57.40459 3169.954 5.417051 5.421128 3.342827  6775
13:        NA    NA     SI2 61.77217 57.92718 5063.029 6.401370 6.397826 3.948478  9194
14:        NA    NA     SI1 61.85304 57.66254 3996.001 5.888383 5.888256 3.639845 13065
15:        NA    NA     VS1 61.66746 57.31515 3839.455 5.572178 5.581828 3.441007  8171
16:        NA    NA     VS2 61.72442 57.41740 3924.989 5.657709 5.658859 3.491478 12258
17:        NA    NA    VVS2 61.66378 57.02499 3283.737 5.218454 5.232118 3.221465  5066
18:        NA    NA    VVS1 61.62465 56.88446 2523.115 4.960364 4.975075 3.061294  3655
19:        NA    NA      I1 62.73428 58.30378 3924.169 6.761093 6.709379 4.207908   741
20:        NA    NA      IF 61.51061 56.50721 2864.839 4.968402 4.989827 3.061659  1790
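
In the call above, sets = as.list(names(dt)[2:4]) turns the three column names into three single-column grouping sets, so only the one-way marginal aggregates are returned. A smaller sketch of the same pattern, assuming a data.table version that exports groupingsets(); mtcars and the column choice are used here only to keep the example small.

```r
library(data.table)  # assumes a version that exports groupingsets()

DT <- as.data.table(mtcars)
by_cols <- c("cyl", "gear", "carb")

# as.list() on a character vector yields one single-column set per name
sets <- as.list(by_cols)
str(sets)

# one block of rows per set: by cyl only, by gear only, by carb only
marginals <- groupingsets(
  DT, j = .(mean_mpg = mean(mpg), COUNT = .N),
  by = by_cols, sets = sets, id = FALSE
)
marginals
```

Because every row of DT falls into exactly one group per set, the COUNT column sums to three times nrow(DT).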

intael · 2018-11-03T19:45:32Z

This is just awesome. Makes working with pivot tables in Shiny way easier.

jangorecki · 2022-04-25T07:57:55Z

Could be. You are welcome to file a feature request (including a minimal example of current vs requested behavior), or, if one already exists, to upvote the existing one.

arunsrinivasan added the feature request label Oct 6, 2015

This comment was marked as off-topic.

jangorecki mentioned this issue Apr 21, 2016

[R-Forge #2695] Add 'margin' argument to [.data.table #574

Closed

jangorecki self-assigned this Apr 21, 2016

jangorecki mentioned this issue Apr 21, 2016

RC - Grouping Sets, rollup, cube. #1377 #1667

Merged

19 tasks

jangorecki added a commit that referenced this issue Apr 24, 2016

Grouping Sets: rollup, cube, Closes #1377

114a2f5

jangorecki mentioned this issue May 11, 2016

[Request] Piping into Rbindlist with a data.table, not automatically assigning "l = ." #1697

Closed

jangorecki added this to the v1.9.10 milestone Nov 23, 2016

mattdowle closed this as completed in ac85018 Aug 7, 2017

mattdowle added a commit that referenced this issue Aug 7, 2017

Merge pull request #1667 from Rdatatable/groupingsets

1bc0553

RC - Grouping Sets, rollup, cube. #1377

mattdowle modified the milestones: v1.10.6, Candidate Aug 7, 2017

avimallu mentioned this issue Apr 3, 2023

GROUPING SETS feature in Polars pola-rs/polars#7948

Open
