Row operations in data.table using by = .I #1732

Closed
rafapereirabr opened this issue Jun 7, 2016 · 6 comments · Fixed by #5235

@rafapereirabr

I was exploring alternative ways to do row operations in data.table, and I think I've found a bug.

These three lines of code should return the same result. However, by = .I seems to return a wrong result.

dt[, sdd := sd(.SD[, 2:4, with=FALSE]), by = 1:NROW(dt) ]
dt[, rowpos := .I][ , sdd := sd(.SD[, -1, with=FALSE]), by = rowpos ]
dt[ , sdd := sd(.SD[, -1, with=FALSE]), by = .I ]

sample data:
dt <- data.table(V0 =LETTERS[c(1,1,2,2,3)], V1=1:5, V2=3:7, V3=5:1)
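For comparison, a row-wise result that does not depend on how by = .I is interpreted can be computed without any grouping. This is a minimal sketch, assuming the numeric columns are V1 to V3 of the sample data above. rowSums() and apply() are base R and .SDcols is standard data.table, but the helper name num_cols is only illustrative.

```r
library(data.table)

dt <- data.table(V0 = LETTERS[c(1, 1, 2, 2, 3)], V1 = 1:5, V2 = 3:7, V3 = 5:1)

# Illustrative helper: restrict .SD to the numeric columns so V0 (character) is never used
num_cols <- c("V1", "V2", "V3")

# Row sums: rowSums() handles every row at once, so no by is needed
dt[, ssum := rowSums(.SD), .SDcols = num_cols]

# Row standard deviations: apply sd() across the rows of a numeric matrix
dt[, sdd := apply(as.matrix(.SD), 1, sd), .SDcols = num_cols]

dt
```

Because .SDcols names the columns explicitly, a column created by an earlier call (such as sdd or rowpos) cannot slip into the calculation the way it can with .SD[, -1, with=FALSE].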

@eantonya
Contributor

eantonya commented Jun 7, 2016

Alternatively, by = .I should give an error, though it would be nice to have it work with an i-expression present.

A similar issue occurs with .N in by (again, something one might naively try: dt[, ..., by = 1:.N]). That expression does give an error, but it's not really the "right" error.

@leoluyi

leoluyi commented Jun 26, 2018

Why not just add a "rowwise" feature through by = .I? That syntax sounds intuitive.

#1063

@rafapereirabr
Author

Hi @leoluyi,

the behaviour of by = .I is currently equivalent to by = NULL (i.e. no grouping at all). Have a look at this SO discussion: https://stackoverflow.com/questions/37667335/row-operations-in-data-table-using-by-i

@rafapereirabr
Author

Thank you all. This is extremely helpful! I may have found a small bug, though: by = .I seems to work with the sum() operation, but it throws an error with sd(). See the reprex below.

library(data.table)

dt <- data.table(V0 =LETTERS[c(1,1,2,2,3)],
                 V1=1:5,
                 V2=3:7,
                 V3=5:1)

# it runs with the sum operation, but every row gets the grand total (55), not its own row sum
dt[ ,  ssum := sum(.SD[, -1, with=FALSE]), by = .I ] 
dt
>    V0 V1 V2 V3 ssum
> 1:  A  1  3  5   55
> 2:  A  2  4  4   55
> 3:  B  3  5  3   55
> 4:  B  4  6  2   55
> 5:  C  5  7  1   55

# it does NOT work with the standard deviation operation
dt[ ,  sdd := sd(.SD[, -1, with=FALSE]), by = .I ] 

> Error in is.data.frame(x) : 
>   'list' object cannot be coerced to type 'double'
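A plausible explanation for the asymmetry, consistent with the by = NULL behaviour described earlier in the thread: sd() passes any non-vector argument through as.double(). That coercion succeeds for a list whose elements all have length one (a one-row .SD) but fails for a list of longer columns (the whole table). sum() has a data.frame method and never needs that coercion. The base R sketch below stands in plain data.frames for .SD; the object names are illustrative only.

```r
# One-row "table": every column has length 1, so as.double() can flatten it
one_row <- data.frame(V1 = 1L, V2 = 3L, V3 = 5L)
sd(one_row)  # 2, i.e. sd(c(1, 3, 5))

# Whole table: columns have length 5, so as.double() cannot flatten the list
whole <- data.frame(V1 = 1:5, V2 = 3:7, V3 = 5:1)
res <- tryCatch(sd(whole), error = function(e) conditionMessage(e))
res  # the same "cannot be coerced to type 'double'" message as above

# sum() dispatches to the data.frame method and returns the grand total
sum(whole)  # 55
```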

@ben-schwen
Member

ben-schwen commented Jan 31, 2023

@rafapereirabr

Both work for me on the dev version. Note that this might only come to CRAN with 1.14.8, since the last CRAN versions were merely hotfixes.

library(data.table)

dt = data.table(V0 =LETTERS[c(1,1,2,2,3)], V1=1:5, V2=3:7, V3=5:1)

dt[,  sum(.SD[, -1, with=FALSE]), by = .I]
#>        I    V1
#>    <int> <int>
#> 1:     1     9
#> 2:     2    10
#> 3:     3    11
#> 4:     4    12
#> 5:     5    13
dt[,  sd(.SD[, -1, with=FALSE]), by = .I]
#>        I       V1
#>    <int>    <num>
#> 1:     1 2.000000
#> 2:     2 1.154701
#> 3:     3 1.154701
#> 4:     4 2.000000
#> 5:     5 3.055050
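On a data.table version that contains the fix (this issue was moved to the 1.15.0 milestone, so 1.15.0 or later is assumed here; check packageVersion("data.table")), the row-wise calls can also name their columns through .SDcols rather than dropping the first column with -1. This is a minimal sketch under that version assumption.

```r
library(data.table)
# Assumption: data.table >= 1.15.0, where by = .I groups by row

dt <- data.table(V0 = LETTERS[c(1, 1, 2, 2, 3)], V1 = 1:5, V2 = 3:7, V3 = 5:1)

# Each group's .SD is a single row restricted to V1..V3
dt[, ssum := sum(.SD), by = .I, .SDcols = c("V1", "V2", "V3")]
dt[, sdd := sd(.SD), by = .I, .SDcols = c("V1", "V2", "V3")]

dt
```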

What's your sessionInfo() output?

@rafapereirabr
Author

Oh, I thought this was already implemented in version 1.14.6. I've tried the dev version and it works fine! Thanks for the clarification, @ben-schwen.

jangorecki modified the milestones: 1.14.9, 1.15.0 (Oct 29, 2023)