Feature Request: Make tidyr functions play nicely with data.table #171
Comments
I'd be willing to review pull requests to do this, but I don't have the time (or knowledge of data.table) to do it myself.
@mindymallory, what you describe is not an issue with tidyr. There are a few errors in the tests you've done with data.table.
So the code you tried has these problems independently of tidyr. Here are some ways to use fill() with data.table:

library(data.table)
library(tidyr)
dt <- data.table(
year = c(2015, NA, NA, NA),
trt = c("A", NA, "B", NA)
)
tracemem(dt)
#> [1] "<0000000019B796C8>"

If you want to use fill() inside dt[...], pass .SD as the data:

dt[, fill(.SD, year)]
#> year trt
#> 1: 2015 A
#> 2: 2015 NA
#> 3: 2015 B
#> 4: 2015 NA

You can then select a column with $:

dt[, fill(.SD, year)$year]
#> [1] 2015 2015 2015 2015

If you give several columns to fill(), the result has all of these columns filled.

dt[, fill(.SD, year, trt)]
#> year trt
#> 1: 2015 A
#> 2: 2015 A
#> 3: 2015 B
#> 4: 2015 B

So, how do you apply fill() so that dt is modified in place? You could extract the vector you want and assign it with :=

dt[, year := fill(.SD, year)$year]
dt
#> year trt
#> 1: 2015 A
#> 2: 2015 NA
#> 3: 2015 B
#> 4: 2015 NA
dt[, trt := fill(.SD, trt)$trt]
dt
#> year trt
#> 1: 2015 A
#> 2: 2015 A
#> 3: 2015 B
#> 4: 2015 B

You could also apply both changes at once in a single := call:

dt <- data.table(
year = c(2015, NA, NA, NA),
trt = c("A", NA, "B", NA)
)
tracemem(dt)
#> [1] "<0000000006104428>"
dt[, c("year", "trt") := .(fill(.SD, year)$year, fill(.SD, trt)$trt)]
dt
#> year trt
#> 1: 2015 A
#> 2: 2015 A
#> 3: 2015 B
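#> 4: 2015 B

However, as fill() returns all the filled columns together (a list of columns), its result can be assigned directly:

dt <- data.table(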
year = c(2015, NA, NA, NA),
trt = c("A", NA, "B", NA)
)
tracemem(dt)
#> [1] "<0000000005B57B98>"
dt[, c("year", "trt") := fill(.SD, year, trt)]
dt
#> year trt
#> 1: 2015 A
#> 2: 2015 A
#> 3: 2015 B
#> 4: 2015 B

Finally, you could use an approach more suited for programming.

dt <- data.table(
year = c(2015, NA, NA, NA),
trt = c("A", NA, "B", NA)
)
tracemem(dt)
#> [1] "<00000000187CF230>"
cols <- c("year", "trt")
dt
#> year trt
#> 1: 2015 A
#> 2: NA NA
#> 3: NA B
#> 4: NA NA
dt[, (cols) := fill_(.SD, cols), .SDcols = cols]
dt
#> year trt
#> 1: 2015 A
#> 2: 2015 A
#> 3: 2015 B
#> 4: 2015 B

Note that this way, you could apply fill() to only a subset of the columns:

dt <- data.table(
year = c(2015, NA, NA, NA),
trt = c("A", NA, "B", NA),
trt2 = c(NA, "C", NA, "D")
)
tracemem(dt)
#> [1] "<00000000049EFD30>"
cols <- c("year", "trt")
dt
#> year trt trt2
#> 1: 2015 A NA
#> 2: NA NA C
#> 3: NA B NA
#> 4: NA NA D
dt[, (cols) := fill_(.SD, cols), .SDcols = cols]
dt
#> year trt trt2
#> 1: 2015 A NA
#> 2: 2015 A C
#> 3: 2015 B NA
#> 4: 2015 B D

Hope this answers your question and covers the feature request you want. Note that I use tracemem() in these examples to show that dt is not copied. @hadley, I think you won't have any PR to review on this subject.
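For readers on a newer tidyr, where the underscore variants such as fill_() are deprecated, the same subset update can be sketched with a tidyselect helper instead. This is only a sketch, assuming tidyr 1.0.0 or later (where fill() accepts tidyselect helpers and everything() is re-exported by tidyr); it is not part of the original answer. Because .SDcols restricts .SD to the columns in cols, everything() selects exactly those columns.

```r
library(data.table)
library(tidyr)

dt <- data.table(
  year = c(2015, NA, NA, NA),
  trt  = c("A", NA, "B", NA),
  trt2 = c(NA, "C", NA, "D")
)

# Columns to fill; trt2 is deliberately left out
cols <- c("year", "trt")

# .SD only contains `cols` here, so everything() fills exactly those columns,
# and `:=` assigns the filled columns back into dt
dt[, (cols) := fill(.SD, everything()), .SDcols = cols]
dt
```

The design is the same as in the fill_() example above: fill() works on the .SD subset and := writes the result back, so trt2 keeps its missing values.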
@cderv Thanks for this detailed explanation! You've saved me a lot of memory usage!
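As a complement to tracemem(), one way to check that an update like the ones above leaves the data.table object itself in place is to compare its memory address before and after. The sketch below assumes data.table's address() helper and reuses the toy data from this thread; the variable name addr_before is only illustrative. Note that fill() still builds the filled columns before := assigns them, so this only shows that dt as a whole is not reallocated.

```r
library(data.table)
library(tidyr)

dt <- data.table(
  year = c(2015, NA, NA, NA),
  trt  = c("A", NA, "B", NA)
)

# Memory address of the data.table object before the update
addr_before <- address(dt)

# Fill both columns and assign the result by reference with `:=`
dt[, c("year", "trt") := fill(.SD, year, trt)]

# Should be TRUE if dt was updated by reference rather than replaced
identical(addr_before, address(dt))
```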
The tidyr functions don't seem to work with data.table functionality. For example, I would like to fill missing values of a large data.table in the following way:
but the code above throws the following error:
Since sequentially applying fill() to each column does not modify in place, the method below is costly for large data.tables.

Thanks for taking a look!