Load JLD2 from IOBuffer #346

Open
cwiese opened this issue Sep 2, 2021 · 4 comments

@cwiese

cwiese commented Sep 2, 2021

I can't find a method to load a JLD2 file from an IOBuffer. This seems basic for saving to and loading from cloud storage such as S3 or Azure Blob Storage (which is my case). I would like to avoid fetching the data with a REST call and saving it locally just so I can load it by filename. Perhaps I am missing something?

@JonasIsensee
Collaborator

JonasIsensee commented Sep 5, 2021

Hi @cwiese ,
it's not possible to load JLD2 files directly from IOBuffers. ( it has to be seekable)
JLD2 is based on the HDF5 format and is therefore a hierarchical file format, which requires non-linear access during both reading and writing.
I typically load remote files either by copying them locally (duh) or through a file system abstraction such as sshfs.

I also remember this discussion #233

It should absolutely be possible to add this feature to JLD2
by implementing a more general buffered reader that reads linearly from the file and holds an incremental copy in memory.
(Note: the "table of contents" is typically located at the end of a file, so this will be rather inefficient if bandwidth or file size is a concern.)
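
To make the buffered-reader idea above concrete, here is a minimal sketch of a wrapper that only ever reads forward from its source and keeps an incremental in-memory copy, so earlier offsets can be revisited. The names `CachingReader`, `fill_to!`, `seek_to!` and `read_bytes!` are hypothetical and not part of JLD2; this only illustrates the access pattern, not how JLD2 would integrate it.

```julia
# Hypothetical helper, NOT part of JLD2: wraps a forward-only stream and keeps
# every byte read so far in memory so that earlier positions can be revisited.
mutable struct CachingReader{T<:IO}
    source::T              # underlying stream, only ever read forward
    cache::Vector{UInt8}   # incremental in-memory copy of the bytes read so far
    pos::Int               # current zero-based read position
end

CachingReader(source::IO) = CachingReader(source, UInt8[], 0)

# Pull bytes from the source until the cache holds at least `n` bytes (or EOF).
function fill_to!(r::CachingReader, n::Integer)
    while length(r.cache) < n && !eof(r.source)
        append!(r.cache, read(r.source, n - length(r.cache)))
    end
    return r
end

# Move to zero-based offset `pos`, fetching more of the source if needed.
function seek_to!(r::CachingReader, pos::Integer)
    fill_to!(r, pos)
    pos <= length(r.cache) || throw(EOFError())
    r.pos = pos
    return r
end

# Read `n` bytes starting at the current position and advance past them.
function read_bytes!(r::CachingReader, n::Integer)
    fill_to!(r, r.pos + n)
    r.pos + n <= length(r.cache) || throw(EOFError())
    out = r.cache[r.pos+1:r.pos+n]
    r.pos += n
    return out
end

# Usage: an IOBuffer stands in for a stream that we only read forward.
src = IOBuffer(UInt8.(0:9))
r = CachingReader(src)
seek_to!(r, 8)
tail = read_bytes!(r, 2)   # jumping near the end pulls everything before it
seek_to!(r, 0)
head = read_bytes!(r, 3)   # served from the cache, no further source reads
```

Seeking to the end first forces the whole stream into memory, which is the inefficiency mentioned in the note above.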

@bartvanerp

Ran into the same issue today when loading from AWS S3. I think it would be a nice addition to this package. For other people who will encounter this same limitation, I wanted to share my snippet of code to help them save time. The code basically hacks around the issue by saving locally as described above:

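using FileIO  # provides `load`; JLD2 must be installed for ".jld2" files

# Named differently from `load` so it does not shadow FileIO.load
function load_from_bytes(x::Vector{UInt8}, format::String)

    # create a unique temp folder; it is removed when the block exits, even on error
    mktempdir() do dir
        path = joinpath(dir, "file." * format)

        # write the raw bytes to a file
        write(path, x)

        # load and return the object
        return load(path)
    end

end

object = load_from_bytes(s3("GET", "/your-bucket/file.jld2"), "jld2")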

@stevengj
Member

stevengj commented Jan 3, 2024

> it's not possible to load JLD2 files directly from IOBuffers. ( it has to be seekable)

An IOBuffer is seekable! (It's just a wrapper around an array, so random access is no problem.)

For example:

julia> buf = IOBuffer("foo bar")
IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=7, maxsize=Inf, ptr=1, mark=-1)

julia> seek(buf, 4)
IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=7, maxsize=Inf, ptr=5, mark=-1)

julia> read(buf, Char)
'b': ASCII/Unicode U+0062 (category Ll: Letter, lowercase)

julia> seek(buf, 1)
IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=7, maxsize=Inf, ptr=2, mark=-1)

julia> read(buf, Char)
'o': ASCII/Unicode U+006F (category Ll: Letter, lowercase)

The main thing seems to be to implement the analogue of https://github.com/JuliaIO/JLD2.jl/blob/ad988caea3e2da874aaf0140ef37165e6189b64b/src/mmapio.jl for IOBuffer streams (which are effectively already "memory-mapped").
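
As a small illustration of why an in-memory byte vector already supports this kind of access, the sketch below wraps downloaded bytes in an IOBuffer, reads a footer at the end, and then jumps back to the offset stored there. The byte layout is invented for the example and is not the JLD2 format; whether JLD2 itself accepts an IO object is not assumed here.

```julia
# Build a toy byte layout (NOT the JLD2 format): padding, a payload, and an
# 8-byte little-endian footer that stores the payload's zero-based offset.
out = IOBuffer()
write(out, UInt8[0xff, 0xff])        # two bytes of padding
offset = position(out)               # payload starts here
write(out, "hello")
write(out, htol(UInt64(offset)))     # footer pointing back at the payload
bytes = take!(out)                   # e.g. what a download would return

# Random access over the bytes: read the footer first, then jump backwards.
io = IOBuffer(bytes)
seek(io, length(bytes) - sizeof(UInt64))
payload_offset = ltoh(read(io, UInt64))
seek(io, payload_offset)
payload = String(read(io, 5))
```

Only `seek`, `position` and `read` from Base are used, so no data has to be copied to disk to follow a pointer stored at the end of the buffer.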

@JonasIsensee
Collaborator

@stevengj You are right. This should definitely be possible to implement.

I wonder if it would make sense to implement a more general internal IO buffer that can also wrap non-seekable IO objects (read-only) and buffer their contents internally.
If done cleverly, this could allow reading from e.g. compression streams.
