Allow reading/modifying arrays with Mmap #235
Comments
Quick fix: what if when |
This feature would be great (even if read-only). However, my use case would be to read only a very small portion of a large array from a file on an NFS volume instead of the whole file. Would that even work the way I think it would? Or do I remember correctly that there were problems with mmap over NFS? Currently, to speed things up, I copy the whole file to a local directory before reading, because I expect that to be faster than accessing it directly (especially if there are multiple random accesses to the file), but that consideration might change if it were possible to read only a small portion over the network.
I am not completely sure, but it seems that my issue is related to the improvement discussed here. Therefore I would like to ask: what is the status of this proposal? Is there a WIP solution? What parts of it are working and what parts of it are not? What else needs to be done? I wanted to fill a large dataset array part by part in a loop. For that, I have tried to create the dataset of the required size in advance and then update its parts afterwards. However, it did not work because all the changes I introduced were not saved to the disk. As an MWP of what I tried to do, consider
What is interesting, if I use HDF5 to update the array (replace Since JLD2 realizes a subset of HDF5 and HDF5 is able to update the array, can we borrow the approach used there? |
Hi @Gregstrq,
this bit will already load the whole array from the file into a As described above, one might be able to use |
To allow modification of arrays in existing files we should be able to make use of the `mmaparrays` keyword.

Essentially one can modify `read_array` to check for the `mmaparrays` flag and, if set, use `Mmap.mmap` to return a memory-backed array. I have tested that locally and it works (some work on alignment is required, but that seems solvable).
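To make the idea concrete, here is a minimal sketch of a memory-backed array over a region of a plain binary file. It does not touch JLD2 internals: the 16-byte header, the file layout, and all variable names are assumptions made up for illustration. Only `Mmap.mmap` and `Mmap.sync!` come from Julia's standard library.

```julia
using Mmap

# Hypothetical layout: a small header followed by raw Float64 data.
path = tempname()
header = zeros(UInt8, 16)      # stand-in for metadata preceding the dataset
data = collect(1.0:8.0)
open(path, "w") do io
    write(io, header)
    write(io, data)
end

# Map only the data region, read-write, starting right after the header.
io = open(path, "r+")
A = Mmap.mmap(io, Vector{Float64}, (length(data),), length(header))
A[3] = 42.0                    # the write goes to the mapped pages
Mmap.sync!(A)                  # flush the modified pages to disk
close(io)

# Re-read the data region through ordinary file I/O to confirm the change.
reread = open(path, "r") do f
    skip(f, length(header))
    read!(f, Vector{Float64}(undef, length(data)))
end
```

Only the mapped region is involved here, which is also what would make reading a small slice of a large dataset cheap.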
The main problem is one with checksums. JLD2 computes a checksum for every dataset, and when you modify an array that obviously invalidates the checksum, so it has to be recomputed.
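The invalidation can be illustrated with a small sketch. It uses `crc32c` from the `CRC32c` standard library purely as a stand-in; this is an assumption for illustration, not necessarily the hash JLD2 stores, and `payload` and `stored` are made-up names.

```julia
using CRC32c   # stdlib; a stand-in, not necessarily the hash JLD2 uses

payload = collect(UInt8, 1:32)      # pretend these are a dataset's raw bytes
stored = crc32c(payload)            # checksum recorded when the dataset was written

payload[5] = 0xff                   # in-place modification, as through a mmapped array
stale = crc32c(payload) != stored   # the recorded checksum no longer matches

stored = crc32c(payload)            # recomputing restores consistency
```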
When and how to recompute it is the tricky part. Suppose the following case
One of my initial ideas was to recompute the checksum inside `close(f)`. This works, but only when there are no further updates to the array after the file is closed.

What should happen when someone tries to edit the array after the file has been closed is open for discussion (error / nothing / array not accessible anymore). I just don't want the above to corrupt the file or segfault Julia.
Other ideas include implementing a `finalizer`, but I must admit that I don't fully understand the docs for that and haven't been able to make it work successfully.
Yet another approach could be to implement our own Array wrapper type that knows about the state of the file and takes care of `sync!`ing and recomputing the checksum.
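A rough sketch of what such a wrapper might look like, under stated assumptions: `TrackedArray`, `rawbytes`, and `flush!` are hypothetical names that do not exist in JLD2, `crc32c` stands in for the real checksum, and the backing file is a plain binary file rather than a JLD2 file.

```julia
using Mmap, CRC32c

# Copy the array's bytes so crc32c receives a plain Vector{UInt8}.
rawbytes(a) = collect(reinterpret(UInt8, vec(a)))

# Hypothetical wrapper: tracks whether the mapped data changed since the last flush.
mutable struct TrackedArray{T,N} <: AbstractArray{T,N}
    data::Array{T,N}     # array returned by Mmap.mmap
    checksum::UInt32     # stand-in for the checksum stored in the file
    dirty::Bool
end
TrackedArray(data::Array{T,N}) where {T,N} =
    TrackedArray{T,N}(data, crc32c(rawbytes(data)), false)

Base.size(t::TrackedArray) = size(t.data)
Base.IndexStyle(::Type{<:TrackedArray}) = IndexLinear()
Base.getindex(t::TrackedArray, i::Int) = t.data[i]
function Base.setindex!(t::TrackedArray, v, i::Int)
    t.dirty = true       # remember that the checksum is now stale
    t.data[i] = v
    return t
end

# Flush mapped pages to disk and recompute the checksum if anything changed.
function flush!(t::TrackedArray)
    t.dirty || return t
    Mmap.sync!(t.data)
    t.checksum = crc32c(rawbytes(t.data))
    t.dirty = false
    return t
end

# Usage on a temporary plain binary file.
path = tempname()
write(path, zeros(Float64, 4))
io = open(path, "r+")
t = TrackedArray(Mmap.mmap(io, Vector{Float64}, (4,)))
old = t.checksum
t[2] = 3.5
flush!(t)
close(io)
ondisk = read!(path, Vector{Float64}(undef, 4))
```

In a real implementation the wrapper would also need to know whether the file is still open, so that writes after `close(f)` can error out instead of silently leaving a stale checksum.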