
Blosc Compression-Performance Tips #231

Open
dwerner95 opened this issue Feb 7, 2023 · 3 comments
Comments

@dwerner95
Hey All,

I've been playing around with the Blosc compression implemented in 0.8.0, and I have some questions about its performance.

For my 400 MB CSV file that I convert to an HDF5 file, I see compression of ~84%, which is amazing; however, the throughput doesn't seem to be affected at all.
Looking at this graph of the official HDF5 website:
[graph from the HDF5 website: performance vs. size]

I should see an enormous boost in throughput.
Looking at my CPU usage, the program uses only a single thread, despite me setting the number of Blosc threads with blosc_set_nthreads.
Additionally, blosc_get_nthreads returns only a single thread, which makes me wonder whether there is an additional flag that needs to be set.

Overall, I wish there were some kind of performance guide on this topic. Is that something that could be included in the documentation?

Best wishes,
Dominik

@mulimoen
Collaborator

mulimoen commented Feb 8, 2023

Performance tuning is difficult, and what works for one dataset might not work for another. Is your data similar to the data used for the graph?

I'll have a look at the blosc bug when I am back at a computer.

@dwerner95
Author

All my datasets have an identical structure. Each HDF5 file contains at least three datasets: one 1D ndarray and two 2D ndarrays, plus additional 1D ndarrays; all of them have the same size in the first dimension. I'm not entirely sure what data is plotted in the image, but I would imagine arrays are the easiest to compress (?).

I found an issue in the h5py GitHub repository about this. It seems that even if the number of threads is set, the program falls back to serial compression if the chunk size is not sufficiently large. However, even if I set the chunk size to the size of the array, I still don't see any improvement.

@mulimoen
Collaborator

mulimoen commented May 29, 2024

I can trace it back to https://github.com/Blosc/c-blosc/blob/d306135aaf378ade04cd4d149058c29036335758/blosc/blosc.c#L913. One can force a block size by calling e.g. blosc_sys::blosc_set_blocksize(256), which enables parallel compression. (No idea if such a small block size makes sense; it should likely be much, much larger.)
