Clarify issues with opening lots of FITS files in FAQ #3827

Merged
merged 1 commit into from Jul 21, 2015

embray
Member

@embray embray commented Jul 14, 2015

Got a question via e-mail where a user was opening thousands of FITS files in a loop and running out of file handles. As I pointed out to them, this happens because the data arrays are accessed through mmap by default, and HDUList.close() does not normally close those mmaps, so that the data stays transparently accessible after the file is closed. The mmap can be closed by running:

del hdulist[extnum].data

as mentioned in the docs (http://docs.astropy.org/en/stable/io/fits/index.html#working-with-large-files). However, this is completely non-obvious, and the warning in the docs is easy to miss: it appears in the context of large files, though it applies equally to a large number of files.
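The underlying behaviour can be reproduced with nothing but the standard library's mmap module. This is a minimal sketch of the mechanism, not astropy code; the scratch file and its contents are made up for illustration.

```python
import mmap
import os
import tempfile

# Create a small scratch file to map (illustrative only).
fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 4096)
os.close(fd)

f = open(path, "rb")
mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# Closing the file object does not invalidate the mapping:
# the mapped bytes can still be read afterwards.
f.close()
still_readable = mm[:4] == b"xxxx"

# The mapping's resources are only released once it is closed
# explicitly (or garbage collected).
mm.close()
mapping_closed = mm.closed

os.remove(path)
```

A memory-mapped array holds a mapping like `mm` internally, which is why dropping the last reference to the array (as `del hdulist[extnum].data` does, provided nothing else still references it) is what lets the handle go.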

This question has come up a few times, so it deserves an FAQ entry.

It might also be worth adding an optional argument to HDUList.close() to close all mmaps as well, since in many cases this is fine, as the data won't need to be accessed again (such as in the case of reading from files in a loop).
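Until such an argument exists, the same effect can be approximated from user code. The helper below is a hypothetical sketch (`close_with_data` is not an astropy function); it only assumes that the object is iterable over HDUs, that each HDU's `data` attribute can be deleted, and that the list has a `close()` method.

```python
def close_with_data(hdulist):
    """Hypothetical helper: drop each HDU's data, then close the file."""
    for hdu in hdulist:
        try:
            # Drop the HDU's reference to its (possibly mmap-backed) array.
            del hdu.data
        except AttributeError:
            # Nothing to delete on this HDU; carry on with the rest.
            pass
    hdulist.close()
```

Note that the mmap is only released if no other references to the array remain, so any variables still pointing at the data need to go out of scope as well.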

@astrofrog
Member

In Matplotlib there is now a warning if there are more than 20 figures open. We could consider a warning if there are more than say 1000 FITS files open?

@embray
Member Author

embray commented Jun 8, 2015

I don't see why--if they're opening too many FITS files then it will fail anyways. What I could do is catch the relevant I/O error and provide a useful message pointing to the relevant docs. And an extra option in close() would also help for this case.
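Roughly what catching the error could look like, as a sketch only: `open_with_hint` and the message text are hypothetical, and `opener` stands in for whatever function actually opens the file. It assumes the failure surfaces as an OSError with errno EMFILE.

```python
import errno

HINT = (
    "Too many open files. If FITS files are being opened in a loop, "
    "memory-mapped data arrays may be keeping file handles alive; see "
    "'Working with large files' in the astropy.io.fits documentation."
)


def open_with_hint(opener, path):
    """Call opener(path); re-raise EMFILE errors with a pointer to the docs."""
    try:
        return opener(path)
    except OSError as exc:
        if exc.errno == errno.EMFILE:
            # Same errno, more helpful message; the original is chained.
            raise OSError(exc.errno, HINT, path) from exc
        raise  # any other I/O error is passed through untouched
```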

Where I might consider a warning is for some large number of open mmaps, specifically, since those are more subtle to the user.

I also would like, if I had the time, to create my own version of Numpy's memmap array type, for a couple reasons:

  1. To handle things like issue #1380 ("Possible fits memmap bug: memmap just doesn't work").
  2. I'd like a sort of "lazy" mmap array that can close mmaps when running low on file handles (basically an LRU cache of mmaps) and automatically reopen them on an as-needed basis (i.e. when the array is read or written to). Obviously this is trickier for read/write, since it would have to flush changes before closing an mmap, but that's not really a big deal generally.
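To make the second idea concrete, here is a rough sketch of an LRU pool of mmaps using only the standard library. `MmapPool` and its methods are hypothetical names, it is read-only (so the flush-before-close problem does not arise), and it works on raw bytes rather than Numpy arrays.

```python
import mmap
import os
import tempfile
from collections import OrderedDict


class MmapPool:
    """Hypothetical LRU pool of read-only mmaps, reopened on demand."""

    def __init__(self, max_open=2):
        self.max_open = max_open
        self._open = OrderedDict()  # path -> mmap, least recently used first

    def get(self, path):
        mm = self._open.pop(path, None)
        if mm is None:
            # Evict least recently used mappings to stay under the limit.
            while len(self._open) >= self.max_open:
                _, old = self._open.popitem(last=False)
                old.close()
            with open(path, "rb") as f:
                mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        self._open[path] = mm  # (re)insert as most recently used
        return mm

    def read(self, path, start, stop):
        """Read bytes, reopening the mapping first if it was evicted."""
        return self.get(path)[start:stop]

    def close(self):
        while self._open:
            self._open.popitem()[1].close()


# Usage: three scratch files, but never more than two mappings open.
paths = []
for i in range(3):
    fd, p = tempfile.mkstemp()
    os.write(fd, bytes([65 + i]) * 16)
    os.close(fd)
    paths.append(p)

pool = MmapPool(max_open=2)
first_bytes = [pool.read(p, 0, 1) for p in paths]  # third read evicts the first
reopened = pool.read(paths[0], 0, 1)               # reopened transparently
open_count = len(pool._open)

pool.close()
for p in paths:
    os.remove(p)
```

A read/write version would additionally need to flush a mapping before evicting it.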

@astrofrog
Member

Right, in the case of matplotlib the warning is emitted because opening too many will actually make the computer run out of memory and hang, which is bad, but if here you just get an error, then this is different.

@embray
Member Author

embray commented Jun 8, 2015

Yeah, it shouldn't lead to any sort of severe performance issues.

@Mondrik

Mondrik commented Jun 8, 2015

I would like to point out another interaction that can masquerade as this issue.

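If one is opening FITS files in a for loop and plotting the data, say:

from astropy.io import fits
import matplotlib.pyplot as plt

# mylist: paths of the FITS files to plot
for myfile in mylist:
    d = fits.open(myfile)
    a = d[1].data.field('myfield1')
    b = d[1].data.field('myfield1')
    plt.plot(a, b)  # matplotlib may hold on to a and b until show()
    del a
    del b
    del d[1].data
    d.close()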

I still receive an error saying too many files are open. This can be confusing if the user has this issue, adds the del d[1].data line, and continues to receive the exact same error even after applying the fix (as the problem arises when one is attempting to open the next file, and the error is thus associated with opening the fits files).

I think this particular version of this issue has to do with how matplotlib stores data before a show command is called. A simple workaround is to just append all the data to be plotted to a list, then plot the list after all files have been read.

@embray
Member Author

embray commented Jun 8, 2015

A simpler workaround to that, when possible, is to use memmap=False when opening the files :) But I get the point. All the more reason to add an option to .close() as well. Perhaps close_data or close_memmaps or something.
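A sketch of how the two workarounds might be combined: read every file with memory mapping disabled, collect the columns, and plot only after all files are closed. `fits.open(path, memmap=False)` is the real call; `open_fits_without_mmap`, `collect_fields`, and the `opener` parameter are hypothetical names added so that the loop can be exercised without astropy installed.

```python
def open_fits_without_mmap(path):
    """Open a FITS file with memory mapping disabled (requires astropy)."""
    # Imported lazily so the sketch can be loaded without astropy.
    from astropy.io import fits
    return fits.open(path, memmap=False)


def collect_fields(paths, xname, yname, ext=1, opener=open_fits_without_mmap):
    """Return one (x, y) pair of columns per file, closing each file."""
    pairs = []
    for path in paths:
        hdulist = opener(path)
        try:
            table = hdulist[ext].data
            pairs.append((table.field(xname), table.field(yname)))
        finally:
            # No mmap is involved, so closing here releases the file handle.
            hdulist.close()
    return pairs
```

The collected pairs can then be passed to `plt.plot` in a second loop once every file has been read and closed.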

embray added a commit that referenced this pull request Jul 21, 2015
Clarify issues with opening lots of FITS files in FAQ
@embray embray merged commit 658cf5e into astropy:master Jul 21, 2015
@embray embray deleted the fits/issue-3827 branch July 21, 2015 16:42
embray added a commit that referenced this pull request Aug 5, 2015
Clarify issues with opening lots of FITS files in FAQ
embray added a commit that referenced this pull request Aug 7, 2015
Clarify issues with opening lots of FITS files in FAQ
embray added a commit that referenced this pull request Aug 11, 2015
Clarify issues with opening lots of FITS files in FAQ
embray added a commit that referenced this pull request Aug 11, 2015
Clarify issues with opening lots of FITS files in FAQ

3 participants