New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add total.label argument for groupingsets, cube, rollup #5973
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #5973 +/- ##
==========================================
+ Coverage 97.51% 97.52% +0.01%
==========================================
Files 80 80
Lines 14915 14987 +72
==========================================
+ Hits 14544 14616 +72
Misses 371 371 ☔ View full report in Codecov by Sentry. |
As for the arg name I would keep it as "label" |
R/groupingsets.R
Outdated
@@ -57,6 +57,16 @@ groupingsets.data.table = function(x, j, by, sets, .SDcols, id = FALSE, jj, ...) | |||
stopf("Argument 'sets' must be a list of character vectors.") | |||
if (!is.logical(id)) | |||
stopf("Argument 'id' must be a logical scalar.") | |||
if (!(is.null(total.label) || | |||
(is.character(total.label) && length(total.label) == 1L) || | |||
(is.list(total.label) && all(vapply_1b(total.label, is.character)) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can rule out list because AFAIU char vector will be sufficient. Please correct me if I am wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jangorecki Thanks for reviewing. To make sure I understand you correctly, do you mean that total.label = list(Region = "National", Product = "Total")
should not be allowed, and it should instead be total.label = c(Region = "National", Product = "Total")
? I would have to try it to be sure, but I think that would be OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, as long as each element in the list is scalar then list is not necessary and character vector is sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jangorecki Thanks, I'll try that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I was wrong and labels must not necessarily be of character type, is that correct? Then we need a list type so mixed types can be provided.
Would be good to use such little bit more complex example so it comes up straight away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jangorecki. Currently it only supports character and factor grouping variables. For other types, it's left as NA.
In #5351 you wrote
Then we of course want total.label="Total" to be recycled to list(State = "Total", Group = "Total"). Of course only if all columns are character.
I interpreted this as meaning the label argument should only apply to variables of type character, but now I realise that maybe I interpreted incorrectly and you might have meant that total.label = "Total"
should only be recycled to character variables, but non-character labels for non-character variables are still allowed in a list.
A couple of options (and there might be others):
- Only apply the label argument to grouping variables of type character or factor, and require
label = c(Region = "National", Product = "Total")
instead oflabel = list(Region = "National", Product = "Total")
. - Allow the label argument to apply to grouping variables that aren't character or factor, and allow total.label to be a list so that different types can be specified for different variables.
Which do you prefer? As I mentioned above, if I had a date or integer grouping variable, I would probably still want a character total label (e.g. "Total") rather than a date or integer total label. To do this, currently the user would have to convert the variable to character or factor before using groupingsets/cube/rollup.
Maybe for non-character grouping variables, we could allow a choice of specifying a label of the same type as the variable, or a character label, and if a character label is provided then the variable will be converted to character in the output. For example, if A
is character and B
is integer, then label = list(A = "LabelA", B = 999L)
and label(A = "LabelA", B = "LabelB")
would both be allowed, and the second would result in B being character in the output. If we do that, then I think I would favour having label = "Label"
apply to all variables, not just character variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing column type is not an option, column type needs to be as original columns used in 'by' and not in labels (so no for option B). Option 2. We cannot apply it to all variables because we must match column types, we don't just print results but return a data.table so saying it is only for output won't be useful. Our output is data.table and column types cannot depend on labels arg. It is the labels arg has to match to column types in the output 'by'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By "output", I meant the data.table that is returned. Sorry for the confusion.
Thanks for clarifying that the column types have to stay the same. That makes it simpler.
Do you think the shorter form label = "Total"
should be allowed, or should label
always be a named list? If the shorter form is allowed, should it always have to be character, or would something like label = 999L
be allowed and apply to all the integer grouping variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be allowed but for non-char column will result an NA, which anyway is likely to be expected. Similarly 999L can be also allowed but would result NA for non int columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'll give that a try.
@jangorecki Thanks for your comment. Do you mean change it from "total.label" to "label", or keep it as "total.label"? |
Yes, change to shorter "label" groupings sets are all about (sub-)totals so we don't need to be so explicitly in naming variable so verbosely. Possibly even better than "label" will he better to use "labels" so it is clear that multiple values are supported. |
Very good PR. What would be even more useful is minimal example usage in NEWS entry. Including different data types for labels. |
@jangorecki I prefer "total.label" or "total.labels" because they make it easier to guess where the label is going to be used, but I don't feel strongly about this, so I'm happy to change it to "labels". |
@jangorecki Thanks for the feedback. I can add an example to the NEWS entry. |
Another reason why I prefer single word argument name is it does not enforce any style, like total.label or totalLabel or total_label will do. |
OK, I can change it. Maybe "label" is better than "labels", since "labels" is a base function. |
Add 'label' argument to the groupingsets.data.table(), cube.data.table(), and rollup.data.table() functions.
Add to the Usage, Arguments, Details, and Examples sections.
@jangorecki I've updated the functions, tests, documentation, and news, based on our discussions. When you have time, please let me know what you think. I wasn't sure about the correct way to update a PR. Apologies if I've done it incorrectly. I don't understand the reason for the red cross next to the News commit. When I click on the cross, it says "R-CMD-check / windows-latest (devel) (pull_request)" was cancelled, but in the list of checks, it says that check was successful. Thanks. |
Closes #5351
Added a
total.label
argument to therollup.data.table
,cube.data.table
, andgroupingsets.data.table
functions.A couple of comments:
all.label
might be a better name thantotal.label
, since the thing being calculated won't always be a total. But I left it astotal.label
because the documentation uses the terminology "(sub-)totals", and the word "total" is used in other software, so "total" seems to be widely accepted.