JLD2 writes zeros to first 3440 bytes in file #533

Open
tiagopereira opened this issue Jan 16, 2024 · 4 comments

Comments

@tiagopereira

When writing arrays or other structures to a JLD2 file, JLD2 writes zeros to the first 3440 bytes in the file. Here is some sample code:

using JLD2
tmp = ones(Float64, 100, 100, 100)
jldsave("test.jld2"; tmp)
aa = load("test.jld2")["tmp"]

The array read back, aa, is then all zeros up to element 431 (the first 3440 bytes). If I write a Float32 array instead, it is zero up to element 861 (still the first 3440 bytes).

The error persists if I write different structures containing arrays: the array parts of the structure also have their first elements filled with zeros. It makes no difference whether I use the FileIO interface or jldsave.

Using Julia 1.10.0 and JLD2 0.4.43 on Linux. The error only happens on Linux. I tested on an M1 Mac with Julia for ARMv8 and it worked fine.
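The element counts above can be turned into a byte count with a small helper, which makes it easy to confirm that the Float64 and Float32 cases describe the same 3440-byte region. This is a minimal sketch: the function name leading_zero_bytes is hypothetical (it is not part of JLD2), and the two arrays are synthetic stand-ins for the data read back from the file.

```julia
# Hypothetical helper (not part of JLD2): size in bytes of the run of
# zero-valued elements at the start of an array.
function leading_zero_bytes(a::AbstractArray)
    idx = findfirst(!iszero, vec(a))               # linear index of the first non-zero element
    nzeros = idx === nothing ? length(a) : idx - 1 # all elements are zero if none was found
    return nzeros * sizeof(eltype(a))
end

# Synthetic stand-ins for the arrays described in the report:
# first non-zero element at index 431 (Float64) and 861 (Float32).
a64 = ones(Float64, 1000); a64[1:430] .= 0
a32 = ones(Float32, 1000); a32[1:860] .= 0

leading_zero_bytes(a64)  # 430 * 8 = 3440
leading_zero_bytes(a32)  # 860 * 4 = 3440
```

In practice the helper would be called on the array returned by load, for example leading_zero_bytes(aa).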

@JonasIsensee
Collaborator

Hi @tiagopereira,
that seems rather frustrating.
I cannot reproduce this issue locally on my Linux machine.

Can you please test whether this is
a) fixed by passing IOStream when saving: jldsave("test.jld2", IOStream; tmp)

b) just a loading problem, by loading with load("test.jld2"; iotype=IOStream)

c) do

jldopen("test.jld2", "w") do f
    f["prep"] = ones(100)
    f["tmp"] = ones(100,100,100)
end

to see if the length of the first array controls how many floats are still zero in the second array.

@tiagopereira
Author

Thank you for the quick answer. Doing jldsave("test.jld2", IOStream; tmp) makes the problem go away, so maybe it is the MmapIO interface?

I am sure it is not a loading problem. I checked the file with h5dump and it was written with zeros.

I did the last check you suggested, and the first array written (prep) was correctly written, but the second array (tmp) had zeros in the first elements. In this case, it was not 3440 bytes, but the first non-zero element was index 322. Strangely, it was written with 3.0517578125e-5, and all the subsequent elements with ones.
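The odd value 3.0517578125e-5 is exactly 2^-15, which is what a Float64 1.0 becomes on a little-endian machine when its seven low-order bytes are zeroed and only the top byte (0x3f) survives. That would be consistent with the zeroed region ending partway through element 322, so covering 321 × 8 + 7 = 2575 bytes of tmp, though this is an inference from the value rather than something confirmed in the thread. A minimal sketch that reproduces the effect on an in-memory array:

```julia
# Zero the first 15 bytes of a small Float64 array of ones, mimicking a
# zeroed region that ends 7 bytes into the second element.
# Assumes a little-endian machine (x86-64, ARM64 in its usual mode).
a = ones(Float64, 4)
bytes = reinterpret(UInt8, a)   # byte view sharing memory with `a`
bytes[1:15] .= 0x00

a[1]                # 0.0: all 8 bytes zeroed
a[2] == 2.0^(-15)   # true: only the top byte 0x3f of 1.0 survives
a[3]                # 1.0: untouched
```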

@tiagopereira
Author

I should add that the filesystem I am writing into is a networked file system (StorNext SAN via Infiniband). Are there known issues with MmapIO and writing to networked file systems?

@JonasIsensee
Collaborator

JonasIsensee commented Jan 16, 2024

> I should add that the filesystem I am writing into is a networked file system (StorNext SAN via Infiniband). Are there known issues with MmapIO and writing to networked file systems?

There have been issues with Mmap and network file systems in the past.
This is certainly linked.
What is likely happening is that the first Mmap page (up to byte 4096 in the file) is not properly flushed to disk.

You can avoid the issue by using IOStream. On network file systems this should not be a big performance hit.

If you want to dig a little deeper, you can test the size dependence:

Have a look at
https://github.com/JuliaIO/JLD2.jl/blob/master/src%2Fdataio.jl#L102
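To check whether the filesystem drops memory-mapped pages independently of JLD2, one option is a probe built on Julia's Mmap standard library. The sketch below is not JLD2's write path: the function name mmap_roundtrip and the chosen sizes are hypothetical, and path should be pointed at the network filesystem under test (a temporary file is used here as a placeholder).

```julia
using Mmap

# Hypothetical probe (not JLD2 code): write `n` Float64 ones through a
# memory map, flush, then read the file back with ordinary file I/O.
# Returns the index of the first element that is not 1.0, or `nothing`
# if the whole file round-trips correctly.
function mmap_roundtrip(path::AbstractString, n::Integer)
    open(path, "w+") do io
        A = Mmap.mmap(io, Vector{Float64}, (n,))  # grows the file to n * 8 bytes
        A .= 1.0
        Mmap.sync!(A)                             # ask the OS to flush the mapping to disk
    end
    data = Vector{Float64}(undef, n)
    read!(path, data)                             # plain read, bypassing mmap
    return findfirst(!=(1.0), data)
end

# Vary the size to see whether any damage tracks the 4096-byte page boundary.
path = tempname()
for n in (100, 512, 1_000, 100_000)
    println(n, " => ", mmap_roundtrip(path, n))
end
```

If every size returns nothing on a local disk but not on the SAN mount, that points at the filesystem's handling of mmap rather than at JLD2's file layout.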
