Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Zstd compression #560

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open

Add Zstd compression #560

wants to merge 6 commits into from

Conversation

milankl
Copy link

@milankl milankl commented May 13, 2024

fixes #357

@mkitti I'm not sure what the 4-element tuple in ID_TO_COMPRESSOR is supposed to be? Package name, compressor name, decompressor name, some short name in caps?

Copy link

codecov bot commented May 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.92%. Comparing base (ccd60eb) to head (aa60b84).
Report is 8 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #560      +/-   ##
==========================================
- Coverage   86.94%   86.92%   -0.02%     
==========================================
  Files          31       31              
  Lines        4311     4321      +10     
==========================================
+ Hits         3748     3756       +8     
- Misses        563      565       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mkitti
Copy link
Member

mkitti commented May 13, 2024

I'm not an expert on this package, but you seem to be correct.

modname, compressorname, decompressorname, = ID_TO_DECOMPRESSOR[filter_id]

@milankl
Copy link
Author

milankl commented May 13, 2024

Manual test:

julia> using JLD2, CodecZstd

julia> A = zeros(1000,1000);

julia> A[1] = rand()
0.44335952762563824

julia> sizeof(A)/1000^2   # 8 MB array
8.0

julia> save("test_without_compression.jld2", "A", A)

julia> save("test_with_compression.jld2", "A", A, compress=ZstdCompressor())

julia> A == load("test_with_compression.jld2", "A")
true

julia> A == load("test_without_compression.jld2", "A")
true

and I have two files on disk then

-rw-r--r--@   1 milan  staff   1.1K 13 May 12:59 test_with_compression.jld2
-rw-r--r--@   1 milan  staff   7.6M 13 May 12:58 test_without_compression.jld2

one uncompressed of about 8MB and one compressed to 1KB

@mkitti
Copy link
Member

mkitti commented May 13, 2024

You should test if you can load the file via HDF5.jl.

using HDF5
using H5Zzstd

h5open("test_with_compression.jld2") do h5f
   h5f["A"][]
end

@milankl
Copy link
Author

milankl commented May 13, 2024

Okay that's not working correctly

julia> h5open("test_with_compression.jld2") do h5f
          h5f["A"][]
       end
1000×1000 Matrix{Float64}:
   0.0           0.0             0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.0           0.0              0.0  0.0  0.0  0.0  0.0  0.0  0.0
   7.41892e-310  0.0              0.0  0.0  0.0  0.0  0.0  0.0  0.0
   1.95e-321     0.0              0.0  0.0  0.0  0.0  0.0  0.0  0.0
   7.41892e-310  0.0              0.0  0.0  0.0  0.0  0.0  0.0  0.0
   1.42e-321     0.0             0.0  0.0  0.0  0.0  0.0  0.0  0.0
 NaN             0.0              0.0  0.0  0.0  0.0  0.0  0.0  0.0
 NaN             0.0              0.0  0.0  0.0  0.0  0.0  0.0  0.0
   7.41892e-310  0.0              0.0  0.0  0.0  0.0  0.0  0.0  0.0
   3.6e-322      0.0              0.0  0.0  0.0  0.0  0.0  0.0  0.0
                                          
   0.0           4.0e-323        0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.0           0.0              0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.0           0.0              0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.0           0.0              0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.0           5.17085e170      0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.0           9.56977e-315    0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.0           0.0              0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.0           0.0              0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.0           1.11254e-308     0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.0           1.11255e-308     0.0  0.0  0.0  0.0  0.0  0.0  0.0

(it should be A[1] == 0.4436 otherwise zero), even though the file without zstd compression is loaded correctly. Why is that?

edit: The same works perfectly fine with compress=ZlibCompressor(), but I'm also confused the decompressed matrix is different every time it's decompressed...

@mkitti
Copy link
Member

mkitti commented May 13, 2024

So the problem is that we are using ZSTD_compressStream and ZSTD_decompressStream here. While the upstream reference uses ZSTD_compress and ZSTD_decompress.

https://github.com/HDFGroup/hdf5_plugins/blob/master/ZSTD/src/H5Zzstd.c

https://facebook.github.io/zstd/zstd_manual.html

@mkitti
Copy link
Member

mkitti commented May 13, 2024

OK, I'm going to start working on a ZstdFrameCompressor / ZstdFrameDecompressor . I'm not sure if that is the final name yet, but it's basically the NotStream compressor / decompressor.

I'm not completely sure how to make that work in the TranscodingStream API, but we'll see.

@milankl
Copy link
Author

milankl commented May 13, 2024

Awesome, let me know if I can help!

@milankl
Copy link
Author

milankl commented May 14, 2024

@mkitti I've written already the tests for CodecZstd here using ZstdFrameCompressor, however, they are commented until JuliaIO/CodecZstd.jl#46 is merged and released

@milankl
Copy link
Author

milankl commented May 30, 2024

I've set the compressor here from ZstdFrameCompressor to ZstdCompressor, but we need to release a new version of CodecZstd.jl to actually test this here properly: JuliaIO/CodecZstd.jl#52 (comment)

@mkitti
Copy link
Member

mkitti commented May 30, 2024

I think it should still be ZstdFrameCompressor. Sorry for the confusion. I'll will work on a release tonight.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Zstd compression
2 participants