
Avoid writing out a temporary memmap if the input data can already be recognized as a memmap #393

Open

astrofrog opened this issue Sep 14, 2023 · 1 comment

@astrofrog (Member) commented:
Currently, in parallel mode, if a user specifies a filename for the input file, we load the data with astropy.io.fits; even if that data is memory-mapped, it is then written out to a new memory-mapped file for the purposes of the parallel computation (to avoid copying the array in memory to every process).

We should find a way, whenever possible, to avoid writing out a new memmap if the original data is backed by a file on disk.

I'm not sure there is a way to do this if HDU objects are passed in, since hdu.data is a regular Numpy array and the details of the memmap are hidden in the buffer. However, if a filename is passed, we should be able to set up a memmap ourselves using the BITPIX and NAXISn values in the header for the HDU. If parallel mode is specified, we could then warn, when an HDU or HDUList is passed, that this is not optimal and that a filename should be passed instead.
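For the filename case, a minimal sketch of what this could look like (memmap_from_fits is a hypothetical helper, not existing reproject API; it assumes a simple uncompressed, unscaled image HDU):

```python
import numpy as np
from astropy.io import fits

# FITS stores data big-endian; map BITPIX values to Numpy dtypes
BITPIX_TO_DTYPE = {8: '>u1', 16: '>i2', 32: '>i4', 64: '>i8',
                   -32: '>f4', -64: '>f8'}

def memmap_from_fits(filename, ext=0):
    # Hypothetical helper: memmap the data section of an image HDU
    # without going through hdu.data (assumes no BSCALE/BZERO scaling
    # and no tile compression).
    with fits.open(filename) as hdulist:
        hdu = hdulist[ext]
        dtype = np.dtype(BITPIX_TO_DTYPE[hdu.header['BITPIX']])
        # NAXISn is fastest-varying first, so reverse for Numpy's order
        shape = tuple(hdu.header[f'NAXIS{i}']
                      for i in range(hdu.header['NAXIS'], 0, -1))
        offset = hdu.fileinfo()['datLoc']
    return np.memmap(filename, mode='r', dtype=dtype, shape=shape,
                     offset=offset)
```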

We should also make sure we support passing in np.memmap objects and handle them properly (again avoiding any re-writing of the arrays to disk).
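For np.memmap inputs, detecting and reusing the backing file could look something like this sketch (as_readonly_memmap is hypothetical, not existing API):

```python
import numpy as np

def as_readonly_memmap(data):
    # Hypothetical helper: if the input already lives in a file-backed
    # memmap, re-open that same file read-only for the workers instead
    # of writing a temporary copy.
    if isinstance(data, np.memmap):
        return np.memmap(data.filename, mode='r', dtype=data.dtype,
                         shape=data.shape, offset=data.offset)
    return None  # caller falls back to the temporary-memmap path
```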

@astrofrog (Member, Author) commented:

Given an HDU, we can actually do:

```python
np.memmap(hdu.fileinfo()['file'].name, mode='r', dtype=hdu.data.dtype,
          shape=hdu.data.shape, offset=hdu.fileinfo()['datLoc'])
```

to extract a Numpy memmap, so perhaps that's the way to go for FITS input.
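Wrapped into a helper, and with the caveat that hdu.data.dtype reflects any BSCALE/BZERO scaling rather than the raw on-disk bytes, this might look like (memmap_from_hdu is hypothetical):

```python
import numpy as np

def memmap_from_hdu(hdu):
    # Hypothetical helper based on the snippet above; assumes the HDU
    # was opened from an uncompressed file on disk and that the data
    # are not scaled (no BSCALE/BZERO), since hdu.data.dtype describes
    # the scaled array rather than the raw bytes in the file.
    info = hdu.fileinfo()
    return np.memmap(info['file'].name, mode='r', dtype=hdu.data.dtype,
                     shape=hdu.data.shape, offset=info['datLoc'])
```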
