The default limit on open file descriptors in macOS Monterey 12.4 seems to be 256 (a bit low compared with the Linux default, which is 1024, IIRC), but regardless:
```
(base) rvalls@m1 out % dsq SBJ02239_PRJ221187.parquet
open SBJ02239_PRJ221187.parquet: too many open files
(base) rvalls@m1 out % dsq SBJ02239_PRJ221187.parquet "SELECT * FROM {} LIMIT 10"
open SBJ02239_PRJ221187.parquet: too many open files
(base) rvalls@m1 out % ulimit -a
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8176
-c: core file size (blocks)         0
-v: address space (kbytes)          unlimited
-l: locked-in-memory size (kbytes)  unlimited
-u: processes                       2666
-n: file descriptors                256
(base) rvalls@m1 out % ulimit -n 8912
(base) rvalls@m1 out % dsq SBJ02239_PRJ221187.parquet "SELECT * FROM {} LIMIT 10"
```
(cannot show the actual contents of the file, but it works after `ulimit`)
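For anyone who wants to apply the same workaround from a script rather than an interactive shell, here is a minimal Python sketch of what `ulimit -n` does under the hood. The function name `raise_nofile_soft_limit` and the target of 4096 are made up for illustration, and none of this is part of dsq. It only raises the soft limit of the current process (child processes started afterwards inherit it), and the OS may still refuse the request.

```python
import resource


def raise_nofile_soft_limit(target):
    """Try to raise the soft RLIMIT_NOFILE to `target`, capped by the hard limit.

    Returns the (soft, hard) pair in effect afterwards. Hypothetical helper,
    for illustration only.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        # Already unlimited, nothing to raise.
        return soft, hard
    if hard != resource.RLIM_INFINITY:
        # An unprivileged process cannot go above the hard limit.
        target = min(target, hard)
    if target > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            # Some platforms reject values above a kernel-wide cap;
            # in that case the previous limit stays in effect.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("before:", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("after: ", raise_nofile_soft_limit(4096))
```

This is still only a workaround: it hides the symptom but does not explain why so many descriptors are needed in the first place.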
How come so many (intermediate?) files need to be opened to read a regular .parquet file of only ~200KB?
Sorry, I can't share the contents of that specific file, but Instruments on macOS tells me that dsq performs ~343 open() syscalls, most of them on the same input file, thus creating a ton of file descriptors. Why can't the same fd be shared?
The file contains around 1500 columns in its original .tsv form. I'm not sure I'll have time to put together a reproducer with non-private data, but I hope that serves as a hint about what might be going wrong. Adjusting ulimit seems like a poor patch for what looks like an underlying library issue.
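To make the "why can't the same fd be shared?" question concrete, here is a small Python sketch (not dsq's code, which is written in Go, and not a claim about how its Parquet library actually works). It assumes a reader that opens the file once per column chunk and compares it with a reader that opens the file once and uses positional reads (`os.pread`, Unix only). The function names and the `(offset, length)` ranges are hypothetical stand-ins for column chunks.

```python
import os
import tempfile


def read_ranges_fd_per_range(path, ranges):
    """Open the file once per (offset, length) range, all held open at once.

    Returns (chunks, descriptors_used). Peak descriptor use grows with the
    number of ranges, e.g. with the number of columns.
    """
    fds = [os.open(path, os.O_RDONLY) for _ in ranges]
    try:
        chunks = []
        for fd, (offset, length) in zip(fds, ranges):
            os.lseek(fd, offset, os.SEEK_SET)
            chunks.append(os.read(fd, length))
        return chunks, len(fds)
    finally:
        for fd in fds:
            os.close(fd)


def read_ranges_shared_fd(path, ranges):
    """Open the file once and use positional reads for every range.

    os.pread does not move a shared file offset, so one descriptor can
    serve any number of ranges.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        return [os.pread(fd, length, offset) for offset, length in ranges], 1
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(range(256)) * 4)
    ranges = [(i * 8, 8) for i in range(100)]  # 100 fake "column chunks"
    try:
        a, n_a = read_ranges_fd_per_range(tmp.name, ranges)
        b, n_b = read_ranges_shared_fd(tmp.name, ranges)
        print(a == b, n_a, n_b)
    finally:
        os.unlink(tmp.name)
```

Both functions return the same bytes, but the first needs one descriptor per range while the second needs exactly one. With ~1500 columns, a per-column open pattern like the first one would blow past a 256 limit easily, which would be consistent with what Instruments shows, though that remains a guess until someone checks the library.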