Blosc filters have no effect #273

Open
watsaig opened this issue Mar 8, 2024 · 2 comments

Comments

@watsaig
Contributor

watsaig commented Mar 8, 2024

Creating a dataset with any of the blosc filters compiles and runs with no errors, but does not compress the data at all.
If I use lzf or szip instead, the dataset is compressed as expected.

Just to be clear, the filter does appear to be applied (looking at the output of h5dump), but there is no compression.

Are there any external dependencies needed for blosc to work?

Here is a minimal example:

use hdf5::filters;
use ndarray::Array2;
use std::env::temp_dir;
fn main() -> anyhow::Result<()> {
    println!("Blosc available? {:}", filters::blosc_available());
    println!("LZF available? {:}", filters::lzf_available());
    println!("SZIP available? {:}", filters::szip_available());

    let path_uncomp = temp_dir().join("uncompressed.h5");
    let path_comp = temp_dir().join("compressed.h5");
    let file_uncomp = hdf5::File::create(&path_uncomp)?;
    let file_comp = hdf5::File::create(&path_comp)?;

    let data = Array2::<f32>::ones((1000, 1000));
    file_uncomp
        .new_dataset_builder()
        .with_data(data.view())
        .create("data")?;

    file_comp
        .new_dataset_builder()
        .blosc_lz4(9, true)
        //.blosc_zstd(9, true)
        //.blosc_snappy(9, true)
        //.lzf()
        //.szip(filters::SZip::NearestNeighbor, 16)
        .with_data(data.view())
        .create("data")?;

    println!(
        "Uncompressed file size: {:} kB",
        path_uncomp.metadata()?.len() / 1024
    );
    println!(
        "Compressed file size: {:} kB",
        path_comp.metadata()?.len() / 1024
    );
    Ok(())
}

Cargo.toml:

[dependencies]
anyhow = "1.0.80"
hdf5 = { git = "https://github.com/aldanor/hdf5-rust.git", features = [
    "blosc",
    "lzf",
] }
ndarray = { version = "0.15.6" }

The output is:

Blosc available? true
LZF available? true
SZIP available? true
Uncompressed file size: 3908 kB
Compressed file size: 3910 kB

Using szip, the compressed file size is 12 kB.
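Because the write succeeds either way, one way to catch a silently skipped filter in a test is to compare the two file sizes for highly compressible data. The following is only a sketch: `size_ratio`, `filter_had_effect`, and the threshold are hypothetical names and values chosen for illustration, not part of the hdf5 crate.

```rust
/// Fraction of the original size that remains after compression
/// (1.0 means no saving at all). Returns None for an empty baseline.
fn size_ratio(uncompressed_bytes: u64, compressed_bytes: u64) -> Option<f64> {
    if uncompressed_bytes == 0 {
        return None;
    }
    Some(compressed_bytes as f64 / uncompressed_bytes as f64)
}

/// Hypothetical guard: a ratio at or above `threshold` is treated as
/// "the filter did nothing". The threshold is an assumption to tune per data set.
fn filter_had_effect(uncompressed_bytes: u64, compressed_bytes: u64, threshold: f64) -> bool {
    match size_ratio(uncompressed_bytes, compressed_bytes) {
        Some(ratio) => ratio < threshold,
        None => false,
    }
}
```

With the sizes reported above and a threshold of 0.9, `filter_had_effect(3908 * 1024, 3910 * 1024, 0.9)` is false for the blosc run, while `filter_had_effect(3908 * 1024, 12 * 1024, 0.9)` is true for the szip run.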

@mulimoen
Collaborator

This happens when the requested compressor is not compiled into blosc. If one specifies --features blosc-src/lz4,blosc-src/zlib, the file gets down to 19 kB with the blosc-lz4 filter and 8 kB with blosc-zlib.

It is unfortunate that we skip the filter instead of returning an error when it is not available. Changing the flag in

H5Pset_filter(plist_id, filter_id, H5Z_FLAG_OPTIONAL, cd_nelmts, cd_values)
from H5Z_FLAG_OPTIONAL to H5Z_FLAG_MANDATORY would produce such an error.
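The difference between the optional and the mandatory flag can be illustrated with a toy model. This is not the HDF5 filter pipeline or the hdf5 crate API: `FilterFlag`, `Stored`, `toy_filter`, and `write_chunk` are hypothetical names, and the run-length encoder merely stands in for a real compressor. It assumes, as described above, that a failing optional filter leaves the chunk stored unfiltered, while a failing mandatory filter makes the write fail.

```rust
/// Toy stand-ins for H5Z_FLAG_MANDATORY / H5Z_FLAG_OPTIONAL (not the real constants).
#[derive(Clone, Copy, Debug, PartialEq)]
enum FilterFlag {
    Mandatory,
    Optional,
}

/// What ends up on disk in this model.
#[derive(Debug, PartialEq)]
enum Stored {
    Filtered(Vec<u8>),
    Raw(Vec<u8>),
}

/// Hypothetical filter that fails when its compressor was not compiled in.
fn toy_filter(chunk: &[u8], compressor_available: bool) -> Result<Vec<u8>, String> {
    if !compressor_available {
        return Err("compressor not available".to_string());
    }
    // Placeholder "compression": run-length encode as (count, byte) pairs.
    let mut out = Vec::new();
    let mut i = 0;
    while i < chunk.len() {
        let byte = chunk[i];
        let mut run = 1usize;
        while i + run < chunk.len() && chunk[i + run] == byte && run < 255 {
            run += 1;
        }
        out.push(run as u8);
        out.push(byte);
        i += run;
    }
    Ok(out)
}

/// Optional: a filter failure is swallowed and the raw chunk is stored.
/// Mandatory: a filter failure is reported to the caller.
fn write_chunk(
    chunk: &[u8],
    flag: FilterFlag,
    compressor_available: bool,
) -> Result<Stored, String> {
    match toy_filter(chunk, compressor_available) {
        Ok(bytes) => Ok(Stored::Filtered(bytes)),
        Err(_) if flag == FilterFlag::Optional => Ok(Stored::Raw(chunk.to_vec())),
        Err(e) => Err(e),
    }
}
```

In this model, `write_chunk(&data, FilterFlag::Optional, false)` returns `Ok(Stored::Raw(..))`, which mirrors the silent no-op reported in this issue, whereas the same call with `FilterFlag::Mandatory` returns an `Err`.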

@watsaig
Contributor Author

watsaig commented Mar 11, 2024

I see, thank you. I added blosc-src = { version = "0.3.0", features = ["lz4", "zlib", "zstd"] } to Cargo.toml to make it work. May I suggest adding this to the documentation of the blosc_ functions?

Agreed that an error would be great in this case, or maybe even a more in-depth function like blosc_available that would return which of the blosc filters are available.
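As a rough sketch of what such a check could build on: C-Blosc provides blosc_list_compressors(), which returns the compiled-in compressors as a comma-separated string. Whether and how the hdf5 crate could surface that call is not established in this thread, so the code below only covers the parsing side. `parse_compressor_list` and `missing_compressors` are hypothetical helpers, and the sample strings in the note afterwards are made up for illustration.

```rust
use std::collections::BTreeSet;

/// Split a comma-separated compressor list such as "blosclz,lz4,zlib"
/// into a set of lowercase names, ignoring whitespace and empty entries.
fn parse_compressor_list(list: &str) -> BTreeSet<String> {
    list.split(',')
        .map(|name| name.trim().to_ascii_lowercase())
        .filter(|name| !name.is_empty())
        .collect()
}

/// Hypothetical check: the names from `wanted` that do not appear in `list`,
/// in the order they were requested.
fn missing_compressors(list: &str, wanted: &[&str]) -> Vec<String> {
    let available = parse_compressor_list(list);
    wanted
        .iter()
        .map(|name| name.to_ascii_lowercase())
        .filter(|name| !available.contains(name))
        .collect()
}
```

For example, `missing_compressors("blosclz", &["lz4", "zstd"])` yields both requested names, while `missing_compressors("blosclz,lz4,lz4hc,zlib,zstd", &["lz4", "zlib", "zstd"])` yields an empty list.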
