why data.table is faster with vectorized column subset than list column subset #3477
Comments
You can fix the formatting of your post by using a single line of three backticks before and after the code chunk:
I guess repeatedly selecting columns from small tables is something that should, and in most cases can, be avoided...? Because ...
Regarding other ways to approach your problem (and maybe this would be a better fit for Stack Overflow if you want to talk about it more) ... Depending on what else you want to do with the table, you could just delete the other cols.
If you still prefer to take the subset, grabbing column pointers is much faster than either option you considered here, though this way edits to the result will affect the original table:
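The code that accompanied this comment did not survive on this page, so the following is only a sketch of what the two suggestions could look like. The table contents, the `cols` variable, and the use of `unclass()` followed by `setDT()` to reuse the existing column vectors are assumptions for illustration, not necessarily what the commenter wrote.

```r
library(data.table)

# Made-up table; only the column names id1 and id5 come from the thread.
DT <- data.table(id1 = c(1L, 2L, 3L), id2 = c(3L, 2L, 1L), id5 = c("a", "b", "c"))
cols <- c("id1", "id5")  # hypothetical helper variable

# Option A (assumed form): delete the other columns by reference.
# copy() is used here only so that DT stays intact for option B.
DT_a <- copy(DT)
drop_cols <- setdiff(names(DT_a), cols)
DT_a[, (drop_cols) := NULL]

# Option B (assumed form): reuse the existing column vectors.
# unclass() exposes the underlying list, `[` on a list picks elements
# without copying the vectors, and setDT() converts that list in place.
DT_b <- setDT(unclass(DT)[cols])

print(names(DT_a))
print(names(DT_b))
```

Option A changes the table it is applied to, and option B returns a table that shares its columns with `DT`, so neither is a drop-in replacement for `DT[, .(id1, id5)]` when an independent copy is needed.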
(I took randomization out of your example and reduced the number of times in the benchmark because I was impatient.) I've never found a way to directly call R's list subset (which gets used after the ...).
Regarding "edits to the result will modify the original table", I mean:
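The original demonstration is missing here. A minimal sketch of how one might check the side effect, assuming the subset was built by reusing the column vectors via `unclass()` and `setDT()` (an assumption, as are the data and the variable names `view`, `before`, and `changed`):

```r
library(data.table)

# Made-up table for illustration.
DT <- data.table(id1 = c(1L, 2L, 3L), id5 = c("a", "b", "c"))

# Keep an independent copy of the original column for later comparison.
before <- copy(DT$id1)

# Assumed pointer-style subset: a new data.table built on the same vectors.
view <- setDT(unclass(DT)[c("id1", "id5")])

# Sub-assign by reference on the subset only.
set(view, i = 1L, j = "id1", value = 99L)

# Report whether the original table's column differs from its earlier state.
changed <- !identical(DT$id1, before)
print(changed)
print(DT)
```

If `changed` prints `TRUE`, the edit made through `view` has reached `DT` as well, which is the trade-off described in the comment. Calling `copy()` on the subset first avoids that, at the cost of the copy.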
OK, I learned something new and speedy (the oddballs) today, and I take note that there is a trade-off between speed and parsimonious coding. So the glass is half full! Thanks!
I guess this is related to #852.
I like data.table, equally for its execution speed and for its parsimonious way of scripting.
I use it even on small tables.
I regularly subset tables this way: `DT[, .(id1, id5)]`
and not this way: `DT[, c("id1", "id5")]`
Today I measured the speed of the two and was astonished by the difference on small tables. The parsimonious method is much slower.
Is this difference intended?
Is there any plan to bring the parsimonious form closer to the other one in execution speed?
(It matters when I have to subset several small tables repeatedly.)
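The benchmark code from the post is not preserved on this page. As a rough sketch of how the two forms could be compared on a small table, using only base R timing (the table contents, the `time_it` helper, and the repetition count are assumptions, not the original benchmark):

```r
library(data.table)

# Made-up small table; column names follow the post, values are illustrative.
DT <- data.table(id1 = c(1L, 2L, 3L, 4L),
                 id2 = c(4L, 3L, 2L, 1L),
                 id5 = c("a", "b", "c", "d"))

# Hypothetical helper: elapsed seconds for n repeated calls of f.
time_it <- function(f, n = 1000L) {
  system.time(for (i in seq_len(n)) f())[["elapsed"]]
}

t_list <- time_it(function() DT[, .(id1, id5)])      # list-style j
t_char <- time_it(function() DT[, c("id1", "id5")])  # character-vector j

# Both forms should select the same columns with the same values.
same_result <- isTRUE(all.equal(DT[, .(id1, id5)], DT[, c("id1", "id5")]))

print(c(list_style = t_list, char_vector = t_char))
print(same_result)
```

Timings depend heavily on the machine and the data.table version, so the numbers are only meaningful relative to each other on the same setup.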
Ubuntu 18.04
R version 3.5.3 (2019-03-11)
data.table 1.12.0
RAM 32GB
Intel® Core™ i7-8565U CPU @ 1.80GHz × 8