NTNDArray Blosc compression byte order #6

Open
mrkraimer opened this issue Dec 4, 2018 · 7 comments

@mrkraimer
Contributor

The NDPluginCodec supports scalar arrays of all numeric types: int8, uint8, ..., int64, uint64.
In all cases the compressed array has type byte (which is the same as int8).
For all types except int8 and uint8, if client and server have different byte orders, then the byte order must be switched by either the client or the server.

Let's assume the server compresses and the client decompresses.
Then it should be the client that switches byte order after decompression.
In order to do this the NTNDArray.codec.attribute structure must have fields like:

    bool serverByteOrderBigEndian
    bool clientByteOrderBigEndian

If these differ then the client must switch byte order.
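
A minimal sketch (assuming such fields existed; serverByteOrderBigEndian is the hypothetical field proposed above) of how a Java client could decide whether a swap is needed, using java.nio.ByteOrder to determine its own native order:

    import java.nio.ByteOrder;

    // Hypothetical check: compare the server's byte order (carried in the
    // proposed codec.attribute field) against the client's native order.
    static boolean mustSwapBytes(boolean serverByteOrderBigEndian) {
        boolean clientBigEndian = ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN;
        return serverByteOrderBigEndian != clientBigEndian;
    }

Note that with ByteOrder.nativeOrder() the client can determine its own order locally, so only the server's byte order would actually need to be transmitted.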

In the C code for blosc there are methods:

int blosc_compress(int clevel, int doshuffle, size_t typesize,
                   size_t nbytes, const void *src, void *dest,
                   size_t destsize);

and

BLOSC_EXPORT int blosc_decompress(const void *src, void *dest, size_t destsize);

The doshuffle argument can be one of

 #define BLOSC_NOSHUFFLE   0  /* no shuffle */
 #define BLOSC_SHUFFLE     1  /* byte-wise shuffle */
 #define BLOSC_BITSHUFFLE  2  /* bit-wise shuffle */

I think that BLOSC_SHUFFLE just means switch byte order.

Only compress has an argument to switch byte order.

But we want the client to switch the byte order.

There is also a method:

void shuffle(const size_t bytesoftype, const size_t blocksize,
             const uint8_t* _src, const uint8_t* _dest);

The client can call this if byte order needs to be changed.

BUT the Java blosc code does not provide this method.
Thus fir

@MarkRivers
Member

I think that BLOSC_SHUFFLE just means switch byte order.

No, that is not correct. BLOSC_SHUFFLE is an additional operation to improve compression. It is exposed in the NDPluginCodec, and by selecting BLOSC_SHUFFLE or BLOSC_BITSHUFFLE the compression can be greatly improved.
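
For illustration, byte-wise shuffle regroups the bytes of the input so that the i-th byte of every element is stored together, which typically makes the stream more compressible; it is not a byte-order swap. A conceptual sketch (not the actual Blosc implementation):

    // Conceptual byte-wise shuffle: for elements of size typeSize, collect
    // byte 0 of every element, then byte 1 of every element, and so on.
    static byte[] byteShuffle(byte[] src, int typeSize) {
        int numElements = src.length / typeSize;
        byte[] dest = new byte[src.length];
        for (int b = 0; b < typeSize; b++) {
            for (int e = 0; e < numElements; e++) {
                dest[b * numElements + e] = src[e * typeSize + b];
            }
        }
        return dest;
    }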

NDPluginCodec passes the shuffle argument on compression:

    int compSize = blosc_compress_ctx(clevel, shuffle, info.bytesPerElement,
            info.totalBytes, input->pData, output->pData, output->dataSize,
            compname, blockSize, numThreads);

But it is transparent on decompression, because the shuffle is encoded in the byte stream:

    int ret = blosc_decompress_ctx(input->pData, output->pData,
            output->dataSize, numThreads);

I suspect the Blosc compressor assumes the input is in the native byte order of the host and compresses it into a well-defined stream of bytes that is identical whether the host is big-endian or little-endian. The decompressor knows the datatype of what it is decompressing and converts it to the native endianness of the machine doing the decompressing. One reason I think this is that the Blosc compressor is widely used for compressing data in files like HDF5, which are commonly written and read on machines with different endianness, so it needs to be transparent.

@mrkraimer
Contributor Author

OK thanks for this information.
I will close this issue.

@MarkRivers
Member

I'm not saying I am certain of this, so it is definitely worth testing.

I don't have a big-endian machine to test with, because mine all run vxWorks which does not support Blosc compression.

@MarkRivers reopened this Dec 4, 2018
@brunoseivam
Member

We could encode the byte order in the codec.params NTNDArray field. Currently NDPluginCodec puts a PVInt32 there to indicate the original datatype, but there's nothing preventing us from putting a more complex structure in params.
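
A minimal sketch of what such a structure could look like, written with pvDataJava (on the server side NDPluginCodec would build the equivalent with pvDataCPP; the field names dataType and bigEndian are illustrative, not an agreed convention):

    import java.nio.ByteOrder;
    import org.epics.pvdata.factory.FieldFactory;
    import org.epics.pvdata.factory.PVDataFactory;
    import org.epics.pvdata.pv.PVStructure;
    import org.epics.pvdata.pv.ScalarType;
    import org.epics.pvdata.pv.Structure;

    // Illustrative codec.params structure carrying both the original data type
    // and the byte order of the machine that compressed the data.
    Structure paramsType = FieldFactory.getFieldCreate().createFieldBuilder()
        .add("dataType", ScalarType.pvInt)
        .add("bigEndian", ScalarType.pvBoolean)
        .createStructure();
    PVStructure params = PVDataFactory.getPVDataCreate().createPVStructure(paramsType);
    params.getIntField("dataType").put(1);  // original NDArray data type code
    params.getBooleanField("bigEndian").put(ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN);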

@MarkRivers
Member

Does anyone have a big-endian machine we can test with? I am not sure there is really a problem.

Here are 2 notes from the Blosc release notes: https://github.com/Blosc/c-blosc/blob/master/RELEASE_NOTES.rst

Changes from 1.11.2 to 1.11.3
Fixed #181: bitshuffle filter for big endian machines.

Changes from 0.9.3 to 0.9.4
Support for cross-platform big/little endian compatibility in Blosc headers has been added.

@mrkraimer
Contributor Author

But what does the following mean?

Support for cross-platform big/little endian compatibility in Blosc headers has been added.

I think that it only means that it handles byte order in its private fields.
I looked briefly at the source code and that appears to be what it is doing.
Sounds like a test is required.
Sorry I do not have access to two systems with different byte orders.

@mrkraimer
Contributor Author

Note that if we have to switch byte order, then java.nio.ByteBuffer has these methods:

public final ByteOrder order()
Retrieves this buffer's byte order.
The byte order is used when reading or writing multibyte values, 
and when creating buffers that are views of this byte buffer.
The order of a newly-created byte buffer is always BIG_ENDIAN.

and

public final ByteBuffer order(ByteOrder bo)
Modifies this buffer's byte order.
Parameters:
bo - The new byte order, either BIG_ENDIAN or LITTLE_ENDIAN
Returns:
This buffer
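
A minimal sketch of how the client could use this after decompression, assuming the decompressed bytes hold 32-bit integers and the server's byte order is known (serverBigEndian is the hypothetical flag discussed above):

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;

    // Hypothetical post-decompression fixup: interpret the decompressed bytes
    // using the server's byte order; the resulting ints are ordinary Java ints,
    // independent of the client's endianness.
    static int[] toInts(byte[] decompressed, boolean serverBigEndian) {
        ByteBuffer buffer = ByteBuffer.wrap(decompressed)
            .order(serverBigEndian ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN);
        int[] values = new int[decompressed.length / Integer.BYTES];
        buffer.asIntBuffer().get(values);
        return values;
    }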
