union_all() fills up disk when passed many tables #197
Comments
Could you perhaps provide me with a reproducible example? It's possible there is an issue here. |
Unfortunately, I'm not allowed to share the dataset. As I said, it's made of about 160 individual files with 10,000 to 600,000 rows, for a total of 19M rows. There are 50 variables, most of them strings with a small number of unique values. I can try to generate random data to reproduce the problem, but maybe you already have toy datasets for that? |
I don't have a toy dataset, so generated files that provoke the issue would be welcome! |
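Here's a simple example where the database takes 500MB before calling
library(dplyr)
library(dbplyr)
library(MonetDBLite)

# One column of 100,000 random letters, duplicated into 50 extra columns
x <- sample(LETTERS[1:10], 100000, replace = TRUE)
df <- data.frame(x = x)
for (i in 1:50) {
  df[paste0("x", i)] <- x
}

# Write 100 identical permanent tables to an on-disk MonetDBLite database
db <- src_monetdblite("test.monetdb", create = TRUE)
tables <- list()
for (i in 1:100) {
  tables[[i]] <- copy_to(db, df, paste0("table", i), temporary = FALSE)
}

# Concatenate all tables and materialize the result
total <- compute(Reduce(union_all, tables))
|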
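To put numbers on the disk usage discussed in this thread, the size of the database directory can be measured from R before and after the compute() call. This is only a sketch: dir_size_mb is a hypothetical helper, not part of dplyr, dbplyr or MonetDBLite, and it assumes the database lives entirely inside one directory (such as "test.monetdb" in the example above).

```r
# Hypothetical helper (not from any package): total size in MB of all
# files under a directory, including files in subdirectories.
dir_size_mb <- function(path) {
  files <- list.files(path, recursive = TRUE, full.names = TRUE, all.files = TRUE)
  sum(file.info(files)$size, na.rm = TRUE) / 1024^2
}

# Self-contained demonstration on a fresh temporary directory
# holding a single 2 MB file.
demo_dir <- tempfile("size_demo")
dir.create(demo_dir)
writeBin(raw(2 * 1024^2), file.path(demo_dir, "blob.bin"))
size_mb <- dir_size_mb(demo_dir)
```

Calling dir_size_mb("test.monetdb") right before and right after the compute() line would then show how much the database grows during the union.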
I'm getting
|
Hmm, weird. Is that with the latest released dplyr/dbplyr? I'm using dplyr_0.7.4 and dbplyr_1.1.0. What's the |
|
Actually that was just because of a silly mistake: should have used |
Confirmed, there is something really fishy going on here. Thanks for creating the example! |
I need to concatenate about 160 tables stored in a MonetDBLite database into a single table. I used to do so with the SQLite backend to dplyr, like this: Reduce(union_all, tables_list). Unfortunately, this doesn't work with MonetDBLite: the database grows from 3.5GB to 70GB, and the merge fails once there's no free disk space left.
I've eventually found a workaround by splitting the operation into smaller parts:
In the end, the database only takes 8.5GB (including original small tables and the big concatenated table).
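The code for that workaround is not shown in this thread. One possible shape for it, as a minimal sketch and not the original poster's actual code, is to combine the tables a few at a time and materialize each intermediate result before continuing. The helper name reduce_in_chunks and its chunk_size and materialize arguments are made up for this illustration.

```r
# Hypothetical helper (not part of dplyr/dbplyr): combine a list of items
# with a binary function `f`, at most `chunk_size` items at a time.
# `materialize` is applied to every intermediate result, e.g. dplyr::compute
# for lazy database tables.
reduce_in_chunks <- function(items, f, chunk_size = 10, materialize = identity) {
  stopifnot(length(items) > 0, chunk_size >= 2)
  while (length(items) > 1) {
    # Split positions 1..n into consecutive groups of at most chunk_size.
    groups <- split(seq_along(items), ceiling(seq_along(items) / chunk_size))
    # Combine each group and materialize it before moving on.
    items <- lapply(groups, function(idx) materialize(Reduce(f, items[idx])))
  }
  items[[1]]
}

# Self-contained check with plain data frames, where rbind stands in
# for union_all.
parts <- lapply(1:25, function(i) data.frame(id = i, x = LETTERS[1:4]))
combined <- reduce_in_chunks(parts, rbind, chunk_size = 10)
```

With the lazy tables from this issue, the call would presumably be reduce_in_chunks(tables_list, union_all, chunk_size = 10, materialize = compute). That has not been tested against MonetDBLite here, and the intermediate tables created by compute() may need to be dropped afterwards.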
Is this expected? FWIW, I've checked that the SQL commands generated by dplyr are very clean, i.e. a series of (SELECT * FROM TABLE1) UNION ALL ... clauses. I was wondering whether some temporary files were not being freed as they should be in the middle of the operation.