SHM size exceeded Error #2272

Closed
tcederquist opened this issue Jul 14, 2017 · 4 comments
@tcederquist

Feature request: check the size of /dev/shm to see whether it matches the mmap size. In my case I was running R inside a Docker container, where the default size of /dev/shm is 64M. When the code mmaps the output of a piped command (such as hadoop fs -cat /myfile.csv), only the first shm-size bytes from the pipe end up in the mapping. No error is reported through the C API, which I suspect is normal. However, working out why fread complained about the file format took a deep dive into the R and C code of data.table to discover that it uses this mechanism. The error reported was (the exact message is arbitrary; it depends on where the pipe happened to be cut off):

 (ERROR):Expected sep (',') but ' ' ends field 0 when detecting types from point 10: 14183667
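The check requested here could be sketched from the shell as below. This only illustrates the idea and is not what fread.c does: the function names `avail_bytes` and `fits_in` are made up for this example, and 400000000 bytes is a stand-in for the size of the piped data. In C the equivalent lookup would presumably go through something like POSIX `statvfs`.

```shell
#!/bin/sh
# Free space, in bytes, on the filesystem holding directory $1.
# With POSIX `df -P -k`, row 2 column 4 is the available space in KiB.
avail_bytes() {
    df -P -k "$1" | awk 'NR == 2 { printf "%.0f\n", $4 * 1024 }'
}

# Succeeds (exit 0) when $2 bytes fit into the free space of directory $1.
fits_in() {
    avail=$(avail_bytes "$1")
    # awk compares as floating point, so very large sizes do not overflow
    awk -v need="$2" -v avail="${avail:-0}" 'BEGIN { exit !(need + 0 <= avail + 0) }'
}

# Hypothetical usage: warn before staging roughly 400 MB of pipe output in /dev/shm.
if [ -d /dev/shm ] && ! fits_in /dev/shm 400000000; then
    echo "/dev/shm is too small for 400000000 bytes" >&2
fi
```

The size of a pipe's output is not known in advance, so a real check would more likely compare the bytes actually written with the bytes the command produced; the sketch only covers the case where the expected size is known.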

This can be reproduced as follows:

  • Build a file about 5 MB larger than /dev/shm
  • Set /dev/shm to something like 64M (the default for a Docker container)
  • Run fread on "cat ~/myfile.csv" (the cat is what creates the pipe)

Environment:

  • Docker v1.12+
  • CentOS latest image from Docker Hub
  • R-open v3.4.0 (Microsoft)
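A test file for the first step can be generated with a small helper like the one below. It is a sketch: `make_csv`, the column names, and the 69 MiB target are assumptions chosen for illustration, and the Docker and R commands are left as comments because they depend on the local setup.

```shell
#!/bin/sh
# Write a 4-column CSV (header plus integer rows) to file $1, at least $2 bytes long.
make_csv() {
    awk -v target="$2" 'BEGIN {
        header = "a,b,c,d"
        print header
        written = length(header) + 1          # +1 for the newline
        for (i = 1; written < target; i++) {
            line = sprintf("%d,%d,%d,%d", i, i * 2, i * 3, i * 4)
            print line
            written += length(line) + 1
        }
    }' > "$1"
}

# Hypothetical reproduction, left as comments because it needs Docker and R:
#   docker run --shm-size="64m" ... other stuff ...     # /dev/shm capped at 64M
#   make_csv ~/myfile.csv $(( 69 * 1024 * 1024 ))       # about 5 MB over the cap
#   fread("cat ~/myfile.csv")                           # in R, inside the container
```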

In the code: https://github.com/Rdatatable/data.table/blob/master/src/fread.c
Around line 788, perhaps it should compare the size of /dev/shm with the size of the file it just read into memory. In my Docker case, here is the verbose output of the failing run:

 > dat.df3<-fread("/opt/cloudera/parcels/CDH/bin/hadoop fs -cat /user/tcederquist/tim_pop_comm_14_5 | head -3668850" ,sep=",", header=TRUE, verbose=TRUE)
 Input contains no \n. Taking this to be a filename to open
 File opened, filesize is 0.062500 GB.
 Memory mapping ... ok
 ....basic output
 Type codes (point  8): 1111
 Type codes (point  9): 1111
 (ERROR):Expected sep (',') but ' ' ends field 0 when detecting types from point 10: 14183667

Expected results:

 File opened, filesize is 0.373524 GB.
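Converting the two figures makes the truncation easy to see. The sketch below assumes the verbose output uses 1 GB = 1024 MiB, which is consistent with the 64M /dev/shm default mentioned earlier; `gb_to_mib` is a name invented for this example.

```shell
#!/bin/sh
# Convert a "filesize is X GB" figure to MiB, assuming 1 GB = 1024 MiB.
gb_to_mib() {
    awk -v gb="$1" 'BEGIN { printf "%.1f\n", gb * 1024 }'
}

gb_to_mib 0.062500   # prints 64.0  (the size fread actually saw)
gb_to_mib 0.373524   # prints 382.5 (the size it should have seen)
```

The observed size matches the container's /dev/shm limit exactly, which is the sign that the data was cut off there and not by a malformed file.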

In addition, when fread fails it knows the line number and other useful information internally. I had to zero in on the offending row by hand before I discovered the verbose flag. It would be nice if normal error messages indicated the row number. verbose=TRUE shows the location when it counts the delimiters; for this test case the following would have been useful output on error (since I knew the file had 20M records):

 Count of eol: 3668839 (including 0 at the end)
 nrow = MIN( nsep [11006514] / (ncol [4] -1), neol [3668839] - endblanks [0] ) = 3668838
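The row estimate in that output can be reproduced by hand. This is a sketch of the printed formula only, with integer division as an assumption; `estimate_nrow` is a hypothetical helper, not a data.table function.

```shell
#!/bin/sh
# nrow = MIN( nsep / (ncol - 1), neol - endblanks ), as printed by the verbose output.
# Arguments: nsep ncol neol endblanks
estimate_nrow() {
    nsep=$1 ncol=$2 neol=$3 endblanks=$4
    by_sep=$(( nsep / (ncol - 1) ))     # rows implied by the separator count
    by_eol=$(( neol - endblanks ))      # rows implied by the line-ending count
    if [ "$by_sep" -lt "$by_eol" ]; then echo "$by_sep"; else echo "$by_eol"; fi
}

estimate_nrow 11006514 4 3668839 0   # prints 3668838
```

An estimate of about 3.7M rows against a file known to hold 20M records is the kind of mismatch that would have pointed at truncated input straight away.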
@tcederquist
Author

For anyone hitting the same issue, the short-term fix is to increase the shared memory size of the container, or of your OS if its /dev/shm is too small. A typical modern OS sizes /dev/shm at 50% of available memory. On my 64G Amazon EC2 instance I set the Docker container to use:

 docker run --shm-size="30g" ... other stuff ...
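To derive a value from the host's memory using the 50% rule of thumb above, something like the following could be used on Linux. It is a sketch: `half_mem_shm_size` is a hypothetical helper, the 50% share is the rule of thumb from this comment and not a Docker recommendation, and the result still has to fit your workload.

```shell
#!/bin/sh
# Read /proc/meminfo-style text on stdin; print half of MemTotal as "<MiB>m".
# MemTotal is reported in kB, so divide by 1024 for MiB and by 2 for the 50% share.
half_mem_shm_size() {
    awk '/^MemTotal:/ { printf "%dm\n", $2 / 1024 / 2 }'
}

# Linux only: print a candidate value for this machine.
if [ -r /proc/meminfo ]; then
    half_mem_shm_size < /proc/meminfo
fi

# Hypothetical use:
#   docker run --shm-size="$(half_mem_shm_size < /proc/meminfo)" ... other stuff ...
```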

@mattdowle mattdowle added the fread label Mar 3, 2018
@mattdowle
Member

Agreed. Sorry about that.
A recent change in dev, quoted from NEWS:

Ram disk (/dev/shm) is no longer used for the output of system command input. Although faster when it worked, it was causing too many device full errors; e.g., #1139 and zUMIs/19. Thanks to Kyle Chung for reporting. Standard tempdir() is now used. If you wish to use ram disk, set TEMPDIR to /dev/shm; see ?tempdir.

Please try dev 1.10.5 and open a new issue if it's still a problem.
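For anyone who still wants the ram disk when there is room, the temporary directory could be picked before R starts, roughly as below. This is a sketch under assumptions: `pick_tmpdir` and `my_script.R` are made-up names, 400000000 bytes stands in for the expected size of the command output, and the variable is set as TMPDIR because R's ?tempdir lists TMPDIR, TMP and TEMP as the environment variables it checks.

```shell
#!/bin/sh
# Print directory $1 if it exists and has at least $2 bytes free, else print fallback $3.
pick_tmpdir() {
    if [ -d "$1" ]; then
        # POSIX df: row 2, column 4 is the available space in KiB
        avail_kib=$(df -P -k "$1" 2>/dev/null | awk 'NR == 2 { print $4 }')
        if awk -v need="$2" -v kib="${avail_kib:-0}" 'BEGIN { exit !(need + 0 <= kib * 1024) }'; then
            echo "$1"
            return 0
        fi
    fi
    echo "$3"
}

# Hypothetical use before starting R:
#   TMPDIR=$(pick_tmpdir /dev/shm 400000000 /tmp) Rscript my_script.R
```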

@barosmarko

Guys, I'm the one who reported zUMIs/19. I've tried your dev 1.10.5 and it works perfectly. Great work guys!
