Fast 'groups' of individual rows #1004
Comments
I guess it's because d[, j, by=1:nrow(d)] But I agree Update: Seems to be linked to this SO post. |
I can't think of a reason for any base function (including Also, it'd be nice to know what kind of operations require having to group on each row.. If it's useful to have, we could try and optimise it internally. |
Yes, my question is a followup from that SO post you mention above. Here are two examples, and I'd be curious to hear your thoughts.
I said mapply didn't play well with data.table because I was doing this:
Which seemed nice and natural, but wasn't working because it defaults to SIMPLIFY=TRUE, which returns an array. This works:
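For illustration only (this is not the snippet from the original comment), here is a minimal sketch of the `SIMPLIFY` behaviour described above. The function `f`, the table `d`, and its columns `x` and `y` are hypothetical, and `f` is assumed to return a length-2 vector per row:

```r
library(data.table)

d <- data.table(x = 1:3, y = 4:6)

# Hypothetical non-vectorized function that returns two values per call
f <- function(a, b) c(a, b)

# Default SIMPLIFY = TRUE collapses the results into a 2 x 3 matrix,
# which does not map onto one value per row of d
m <- mapply(f, d$x, d$y)

# SIMPLIFY = FALSE keeps one list element per row
l <- mapply(f, d$x, d$y, SIMPLIFY = FALSE)

# Wrapping in list() stores the per-row results as a list column
d[, pair := list(l)]
```

With `SIMPLIFY = FALSE` the result has exactly `nrow(d)` elements, so it can sit alongside the other columns as a list column.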
|
Also, will |
David,
Such a function exists in Also note that |
Arun, Thanks for the replies.
As far as using vecseq for 2), that's cool to know about, but this is still a "just vectorize it" solution. What would you recommend if vecseq didn't exist? [rhetorical question] Writing a new C function to vectorize? In this situation (and in most situations I encounter), the speed is not important enough to invest that kind of time into vectorization. Certainly a large proportion of the time when a novice user wants to do something "by row", there is a relatively straightforward vectorization. But I don't think that's always the case. I tried to give a couple examples above. Whether somebody's a novice or expert, vectorizing certain computations will require more time than it's worth in that situation, and a by-row operation is sufficient. So, I'm trying to make the case that 'by-row' should be more fully supported, without having to face the indignity of |
I haven't been through the code to see how .EACHI works and so I don't know how this would impact the current code, but data.table could work in a way that
will execute
would do what David says without the need of new syntax. |
On 1), that's great! And I agree. @nigmastar I thought about it too.. but using |
Any comments about possible API to use simply |
It seems harder than it needs to be to group by individual row in a data.table. An idiom I have seen suggested several times is something like:
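As a rough sketch of the kind of idiom meant here (an assumption, not the snippet from the original post): add a row-id column, optionally key on it, and group by it. The table `d`, its columns, and the `sum()` call standing in for `j` are all hypothetical:

```r
library(data.table)

d <- data.table(x = 1:3, y = 4:6)

# Decorator column holding the row number
d[, id := seq_len(.N)]

# Some versions of the idiom also set the key on the id column
setkey(d, id)

# Each group is exactly one row, so j is evaluated once per row
res <- d[, .(total = sum(x, y)), by = id]
```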
I'm not familiar with the data.table codebase, but I imagine it might not be difficult to add a feature that gives the same effect described above (or better), without having to create the id decorator column or (worse) setting the key.
might be a reasonable notation.
I sometimes wonder if this is not done for philosophical reasons, because doing things by row is "wrong". But the request here can be viewed as a way to use data.table to conveniently vectorize operations in a data.table context. Suppose we have a function f that is not vectorized and cannot easily be vectorized (e.g. it just wraps some C function). For example,
Would be nice to just be:
Usual R approaches for vectorization that I know of (e.g. mapply) don't play well with data.table, in my experience.
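The `by=1:nrow(d)` form mentioned in the comments already gives by-row evaluation without an id column. A minimal sketch, assuming a hypothetical scalar-only function `f` and a hypothetical table `d`:

```r
library(data.table)

# Hypothetical function that only accepts scalars (errors on vector input)
f <- function(a, b) {
  stopifnot(length(a) == 1L, length(b) == 1L)
  a * b
}

d <- data.table(x = 1:3, y = 4:6)

# Group on the row index so f() sees one row at a time
by_row <- d[, .(z = f(x, y)), by = 1:nrow(d)]

# Alternative: let mapply() do the looping; f returns scalars,
# so the default simplification yields a plain vector
d[, z := mapply(f, x, y)]
```

Both forms call `f` once per row. Neither is vectorized, so they suit cases where convenience matters more than speed.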