Support Zstd compression #357

Open
PallHaraldsson opened this issue Nov 3, 2021 · 9 comments · May be fixed by #560

Comments

@PallHaraldsson

PallHaraldsson commented Nov 3, 2021

I noticed it done here: https://github.com/JuliaIO/HDF5Plugins.jl/blob/main/Project.toml

I noticed that Blosc is commented out in this package for some reason (it is also missing from the package above and from the main HDF5 package). I'm not sure why, and I don't really mind whether it's supported, as long as the best format is supported, which I think is Zstd (that also seems important for compatibility with the main format):

#UInt16(32001) => (:Blosc, :BloscCompressor, :BloscDecompressor, "BLOSC"),

I also commented on another closed PR (on pointers to null), regarding a typo (and slipped in a question). I hope people see it, not sure if they do since it's closed. Another question, do (Ptr) nulls take up space in the output stream?

@PallHaraldsson PallHaraldsson changed the title Support Zstsdd compression Support Zstsd compression Nov 3, 2021
@PallHaraldsson PallHaraldsson changed the title Support Zstsd compression Support Zstd compression Nov 3, 2021
@JonasIsensee
Collaborator

JonasIsensee commented Nov 3, 2021

This should be relatively easy to add.
Here's the list of registered HDF5 filter plugins (for the id)
https://confluence.hdfgroup.org/display/support/Registered+Filter+Plugins

@mkitti
Member

mkitti commented Dec 7, 2021

I should point out that the plugins in HDF5Plugins.jl are implemented in Julia, having been ported from C. They depend on the Codec* packages but mainly use them for their C library wrappers.

For example, the Zstandard plugin implementation is found here:
https://github.com/JuliaIO/HDF5Plugins.jl/blob/main/src/H5Zzstd.jl
It uses the CodecZstd.LibZstd submodule:
https://github.com/JuliaIO/CodecZstd.jl/blob/master/src/libzstd/LibZstd.jl

Also note that we are merging the plugins directly into the HDF5.jl git tree:
JuliaIO/HDF5.jl#875
Currently they are being integrated as submodules, but we have plans to spin the individual modules out as subtree packages.

@mkitti
Member

mkitti commented Dec 8, 2021

Looking at how the current compression filters are implemented, I'm not sure if they are compatible with the way the HDF5 plugins work. For example, for LZ4 compression the HDF5 plugin stores the original size of the file as well as the blocksize before storing the compressed data.

https://github.com/JuliaIO/HDF5Plugins.jl/blob/main/src/H5Zlz4.jl
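
To illustrate the kind of header described above, here is a minimal sketch of reading the two fields that precede the compressed data. The helper name `read_lz4_chunk_header`, the field widths (8 bytes for the original size, 4 bytes for the block size), and the big-endian byte order are assumptions for illustration and should be checked against H5Zlz4.jl before relying on them.

```julia
# Hypothetical helper, not part of JLD2 or HDF5Plugins.jl.
# Assumed layout: 8-byte big-endian original size, then a 4-byte
# big-endian block size, then the compressed blocks.
function read_lz4_chunk_header(chunk::Vector{UInt8})
    length(chunk) >= 12 || throw(ArgumentError("chunk is shorter than the 12-byte header"))
    io = IOBuffer(chunk)
    original_size = ntoh(read(io, UInt64))  # convert from big-endian to host order
    block_size = ntoh(read(io, UInt32))
    # 1-based index of the first byte after the header
    return (; original_size, block_size, payload_offset = position(io) + 1)
end

# Build a synthetic chunk with the assumed layout and read it back.
buf = IOBuffer()
write(buf, hton(UInt64(1_000_000)))  # original (uncompressed) size
write(buf, hton(UInt32(65_536)))     # block size
write(buf, UInt8[0xde, 0xad])        # stand-in for compressed data
header = read_lz4_chunk_header(take!(buf))
```

A plain LZ4 frame has no such prefix, which is one possible reason a file written with a frame compressor would not be readable by the HDF5 plugin.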

@JonasIsensee
Collaborator

JonasIsensee commented Dec 8, 2021

Hi @mkitti ,

it's great to see you taking an interest in this.

I'm not sure if they are compatible with the way the HDF5 plugins work

This is important. JLD2 compression tries to conform to the HDF5 standards, so JLD2 files using Bzip2 and Zlib compression can be decoded by HDF5.jl. Sadly, this is not the case for LZ4. I was unable to figure out why, but you may be pointing in the right direction.

It would be very neat if the knowledge gained from HDF5Plugins.jl could be used to enable the same compression algorithms in JLD2.

@milankl

milankl commented May 13, 2024

@JonasIsensee bumping this up, I still get a

ArgumentError: Unsupported Compressor
Supported Compressors are 
	JLD2.ShuffleFilter
	CodecBzip2.Bzip2Compressor
	CodecLz4.LZ4FrameCompressor
	CodecZlib.ZlibCompressor

when using compress = ZstdCompressor(). How much work would it be to add this? Happy to help.

@JonasIsensee
Collaborator

Shouldn't be much work at all.
Here's where the compressor needs to be added:

const COMPRESSOR_TO_ID = Dict(
    :ZlibCompressor => UInt16(1),
    :ShuffleFilter => UInt16(2),
    :Bzip2Compressor => UInt16(307),
    #:BloscCompressor => UInt16(32001),
    :LZ4FrameCompressor => UInt16(32004),
)
# For loading need filter_ids as keys
const ID_TO_DECOMPRESSOR = Dict(
    UInt16(1) => (:CodecZlib, :ZlibCompressor, :ZlibDecompressor, ""),
    UInt16(2) => (:JLD2, :ShuffleFilter, :ShuffleFilter, ""),
    UInt16(307) => (:CodecBzip2, :Bzip2Compressor, :Bzip2Decompressor, "BZIP2"),
    #UInt16(32001) => (:Blosc, :BloscCompressor, :BloscDecompressor, "BLOSC"),
    UInt16(32004) => (:CodecLz4, :LZ4FrameCompressor, :LZ4FrameDecompressor, "LZ4"),
)

The corresponding filter ID can be looked up here:

This should be relatively easy to add. Here's the list of registered HDF5 filter plugins (for the id) https://confluence.hdfgroup.org/display/support/Registered+Filter+Plugins
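
As a rough sketch of what the change could look like, the two tables can be extended with one entry each. The filter ID 32015 (the value I believe is registered for Zstandard), the package name `:CodecZstd`, and the `"ZSTD"` label in the last tuple slot are assumptions here and should be verified against the registered filter list and the actual pull request.

```julia
# Sketch only: a standalone copy of the two tables with a Zstd entry added.
# Assumptions: filter ID 32015, package :CodecZstd, and the "ZSTD" label.
const COMPRESSOR_TO_ID = Dict(
    :ZlibCompressor => UInt16(1),
    :ShuffleFilter => UInt16(2),
    :Bzip2Compressor => UInt16(307),
    :LZ4FrameCompressor => UInt16(32004),
    :ZstdCompressor => UInt16(32015),  # assumed ID, check the registry
)

# For loading, the filter IDs are the keys.
const ID_TO_DECOMPRESSOR = Dict(
    UInt16(1) => (:CodecZlib, :ZlibCompressor, :ZlibDecompressor, ""),
    UInt16(2) => (:JLD2, :ShuffleFilter, :ShuffleFilter, ""),
    UInt16(307) => (:CodecBzip2, :Bzip2Compressor, :Bzip2Decompressor, "BZIP2"),
    UInt16(32004) => (:CodecLz4, :LZ4FrameCompressor, :LZ4FrameDecompressor, "LZ4"),
    UInt16(32015) => (:CodecZstd, :ZstdCompressor, :ZstdDecompressor, "ZSTD"),
)

# Consistency check: every compressor name maps to an ID whose
# decompressor entry names the same compressor.
for (name, id) in COMPRESSOR_TO_ID
    @assert ID_TO_DECOMPRESSOR[id][2] === name
end
```

Registering the ID only makes the compressor selectable; whether the resulting files are readable by the HDF5 Zstandard plugin depends on the byte layout matching, as discussed earlier in the thread for LZ4.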

@milankl

milankl commented May 13, 2024

Great, I'll create a PR! Is :ZstdCompressor the preferred name?

@mkitti
Member

mkitti commented May 13, 2024

The new registered filter plugins reference page is here:

https://github.com/HDFGroup/hdf5_plugins/blob/master/docs/RegisteredFilterPlugins.md

@milankl milankl linked a pull request May 13, 2024 that will close this issue