Possible fits memmap bug: memmap just doesn't work. #1380

Closed
keflavich opened this issue Aug 26, 2013 · 12 comments

Comments

@keflavich
Contributor

I'm trying to load some gigantic FITS record tables using memmap=True, and I'm getting error: [Errno 12] Cannot allocate memory.

An example session:

filename = '/home/sdfits/AGBT12B_221_01/AGBT12B_221_01.raw.acs.fits'
import astropy.io.fits as fits
filefits = fits.open(filename,memmap=True)
data = filefits[2].data[:50]

The error is at this line:

/users/aginsbur/anaconda/lib/python2.7/site-packages/numpy/core/memmap.py(253)__new__()
--> 253             mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)

ipdb> bytes
23718381056L
ipdb> bytes/1024**2
22619L
ipdb> start
413921280
ipdb> acc
3

I don't really know what's going on, but I suspect memmap is improperly deciding on how much data to read. Any tips on how to further debug? Is this actually a FITS issue, or a numpy issue?

Details:

In [15]: numpy.__version__
Out[15]: '1.7.1'

In [16]: astropy.__version__
Out[16]: '0.2.4'

In [18]: sys.maxint
Out[18]: 9223372036854775807

@embray
Member

embray commented Aug 27, 2013

What OS?

What does ulimit -v return?

@keflavich
Contributor Author

The OS is some flavor of Linux; I don't know which off the top of my head, or the easiest command to find out.

Also, I was using the Anaconda install of python/astropy/numpy, but upgraded astropy via pip.

$ ulimit -v
unlimited


Adam

@embray
Member

embray commented Aug 28, 2013

What does cat /proc/meminfo show?

@keflavich
Contributor Author

$ cat /proc/meminfo
MemTotal:        1903396 kB
MemFree:          203864 kB
Buffers:          215320 kB
Cached:           884708 kB
SwapCached:         2268 kB
Active:           492052 kB
Inactive:         954324 kB
Active(anon):     165684 kB
Inactive(anon):   181096 kB
Active(file):     326368 kB
Inactive(file):   773228 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       1048568 kB
SwapFree:        1031460 kB
Dirty:                24 kB
Writeback:             0 kB
AnonPages:        344352 kB
Mapped:            65676 kB
Shmem:               432 kB
Slab:             191348 kB
SReclaimable:     151148 kB
SUnreclaim:        40200 kB
KernelStack:        2312 kB
PageTables:        22940 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     2000264 kB
Committed_AS:     847268 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      286128 kB
VmallocChunk:   34359439336 kB
HardwareCorrupted:     0 kB
AnonHugePages:     12288 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        8188 kB
DirectMap2M:     2070528 kB

@keflavich
Contributor Author

...that seems like a tiny amount of memory; 2 GB? Hrmph.

@astrofrog
Member

@keflavich - are you still seeing this issue?

@embray
Member

embray commented Dec 2, 2013

Totally forgot about this. MemTotal is just the total available physical memory. 2 GB is not a lot, sure, but that shouldn't be the issue. You have about 32 TB for VmallocTotal, which is what should matter here--in principle mmap should be able to use most of that. So there's something fishy going on here.

@embray
Member

embray commented Dec 2, 2013

Ah! I think I see the issue here. By default PyFITS uses MAP_PRIVATE when opening a file in readonly mode, so that users can still modify the data array in place as they would if the entire file were mapped into main memory.
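
The copy-on-write behaviour described here can be seen with the standard-library mmap module alone. The following is a minimal sketch, not PyFITS code. It assumes CPython, where mmap.ACCESS_COPY (the acc value of 3 in the debugger session above) requests a private, writable mapping; the helper name demo_copy_on_write is made up for illustration.

```python
import mmap
import os
import tempfile


def demo_copy_on_write():
    """Modify a private (copy-on-write) mapping and show the file is untouched."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"original")
        os.close(fd)
        with open(path, "rb") as f:
            # ACCESS_COPY: writes go to private pages and never reach the file.
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
            mm[:8] = b"modified"
            in_memory = bytes(mm[:8])
            mm.close()
        with open(path, "rb") as f:
            on_disk = f.read()
        return in_memory, on_disk
    finally:
        os.remove(path)
```

Calling demo_copy_on_write() returns the modified bytes seen through the mapping together with the unchanged bytes on disk, which is the convenience PyFITS was aiming for.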

The problem is, that means in principle the entire file can be overwritten, so mmap needs to be able to allocate enough memory ahead of time should that occur. That's why this is happening here. PyFITS/Astropy should definitely catch that scenario and provide a more helpful error.
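
A rough back-of-the-envelope check with the numbers posted earlier in this thread makes the mismatch concrete. This is only a sketch: the kernel's real overcommit accounting is more involved and depends on its overcommit settings, so comparing against MemTotal + SwapTotal is an assumed simplification, and could_back_private_mapping is a hypothetical helper.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def could_back_private_mapping(n_bytes, meminfo):
    """Crude check: could RAM plus swap hold n_bytes of modified pages?"""
    backing_kb = meminfo["MemTotal"] + meminfo["SwapTotal"]
    return n_bytes <= backing_kb * 1024


# Values copied from the /proc/meminfo output and debugger session above.
MEMINFO = parse_meminfo("""\
MemTotal:        1903396 kB
SwapTotal:       1048568 kB
""")
REQUESTED = 23718381056  # the `bytes` argument passed to mmap.mmap

print(could_back_private_mapping(REQUESTED, MEMINFO))  # False
```

Roughly 22 GiB of potentially writable pages against about 3 GB of RAM plus swap is consistent with the ENOMEM error reported at the top.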

Currently there are two ways around this: You can open the file with mode='denywrite'. I added that a while ago specifically for this case, but it's rarely used. That opens the mmap with MAP_SHARED | PROT_READ--this means the pages are read-only (any attempt to modify the array will result in an exception). But if all you need is to read the data this works fine, and I don't think it requires allocating any swap space.

Another possibility is to open with mode='update'. Then any changes to the array can be synced directly back to the file which is fine if you want that, but obviously not so much if you don't.

Looking at the man page, it looks like there's also a flag, at least on Linux, called MAP_NORESERVE which will prevent it from pre-allocating space for copy-on-write. So if you don't need to write any changes to the entire array that could work too. But we'd have to be able to catch the SIGSEGV that results if you do end up running out of swap space.
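
A sketch of what that could look like from Python, under stated assumptions: it uses the Unix-only flags and prot arguments of mmap.mmap, and it only adds MAP_NORESERVE when the mmap module of the running Python actually exposes that constant, which is not guaranteed. The helper name map_private_noreserve is hypothetical, and nothing here deals with the signal raised if the system later runs out of memory.

```python
import mmap


def map_private_noreserve(fileno, length, offset=0):
    """Create a private writable mapping, adding MAP_NORESERVE if it is exposed.

    Returns (mapping, used_noreserve).
    """
    noreserve = getattr(mmap, "MAP_NORESERVE", 0)  # 0 if the constant is missing
    mm = mmap.mmap(
        fileno,
        length,
        flags=mmap.MAP_PRIVATE | noreserve,
        prot=mmap.PROT_READ | mmap.PROT_WRITE,
        offset=offset,
    )
    return mm, bool(noreserve)
```

The second return value tells the caller whether the flag was really applied, so a fallback to one of the other workarounds remains possible.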

@embray
Member

embray commented Dec 4, 2013

Managed to reproduce this directly--indeed, both of the workarounds I offered (mode='denywrite' and mode='update') work. I will still try to see what I can do about MAP_NORESERVE, and otherwise about catching this error and providing a better error message.

@embray
Member

embray commented Dec 4, 2013

Annoyance: numpy.memmap doesn't allow tweaking the flags that are passed to the mmap call. Though looking at it, it's not much more than a light subclass of ndarray that handles the work of creating an mmap with the right flags and then calling ndarray.__new__ with the mmap as its buffer. It also adds a flush method.
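
To illustrate how little is involved, here is a minimal sketch of building an array directly on top of an mmap. It is not numpy's or Astropy's implementation: it uses np.frombuffer rather than ndarray.__new__, handles only the read-only case, and map_readonly_array is a made-up name.

```python
import mmap

import numpy as np


def map_readonly_array(path, dtype, count, offset=0):
    """Map `count` items of `dtype` starting `offset` bytes into `path`, read-only."""
    dtype = np.dtype(dtype)
    # mmap offsets must be multiples of the allocation granularity, so map from
    # the preceding boundary and skip the remainder inside the mapping.
    start = offset - offset % mmap.ALLOCATIONGRANULARITY
    skip = offset - start
    length = skip + count * dtype.itemsize
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), length, access=mmap.ACCESS_READ, offset=start)
    # The array holds a reference to `mm`, which keeps the mapping alive.
    return np.frombuffer(mm, dtype=dtype, count=count, offset=skip)
```

Because the mapping is created explicitly, the access mode or flags can be chosen freely, which is the flexibility numpy.memmap does not offer.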

It should be easy enough to just eschew numpy.memmap altogether and handle mmap ourselves. But that's still more than I want to do on this for now. So instead I'll resolve this issue by catching the error and suggesting one of the existing workarounds.
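
The general shape of such a fix might be as follows. This is a sketch, not the actual patch: open_mapping and its make_mapping argument are hypothetical, and the message wording is invented.

```python
import errno


def open_mapping(make_mapping):
    """Call make_mapping(); turn ENOMEM into an error that suggests workarounds."""
    try:
        return make_mapping()
    except OSError as exc:
        if exc.errno != errno.ENOMEM:
            raise  # unrelated failures are passed through unchanged
        raise OSError(
            exc.errno,
            "Could not memory-map the file in copy-on-write mode; "
            "try mode='denywrite' or mode='update' instead.",
        ) from exc
```

Here make_mapping stands for whatever callable creates the mmap, so the same wrapper works whether numpy.memmap or a hand-rolled mapping is used underneath.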

@saimn
Contributor

saimn commented Aug 27, 2018

Note that with #7597 we no longer use np.memmap, so it should be easier to use other flags if that is useful.

@astrofrog
Member

This can now be closed, as a workaround has been merged in #7926. MAP_NORESERVE is not available from Python, so that isn't a solution unfortunately.
