support zstd compressed CZI #600

Merged
merged 12 commits into from
May 22, 2024

Conversation

iewchen
Contributor

@iewchen iewchen commented May 15, 2024

CZI has two zstd compression modes: zstd0 and zstd1.

In zstd0 mode, pixel data is compressed with zstd and stored in the
subblock as is.

zstd1 mode differs in that it prefixes the zstd-compressed data with a
header. This header is either 1 byte or 3 bytes long, and its first byte
gives its length. CZI may use a trick called high-low byte packing,
which places the less significant byte of each 16-bit sample in the
first half of the image array and the more significant byte in the
second half, before the array is compressed with zstd. A reader has to
undo this (high-low byte unpack) after decompressing. This trick is used if:

  • the header length is 3, and

  • the second byte in the header is 1, and

  • the lowest bit of the third byte is 1

This trick applies only to 16-bit grayscale images and 48-bit color
images, since it relies on 16-bit samples.

@openslide-bot
Member

openslide-bot commented May 15, 2024

DCO signed off ✔️

All commits have been signed off. You have certified to the terms of the Developer Certificate of Origin, version 1.1. In particular, you certify that this contribution has not been developed using information obtained under a non-disclosure agreement or other license terms that forbid you from contributing it under the GNU Lesser General Public License, version 2.1.

iewchen and others added 12 commits May 21, 2024 01:13
Signed-off-by: Wei Chen <chenw1@uthscsa.edu>
While almost all CZI slide files contain SizeS in the XML metadata and
an 'S' dimension in the dimension entry, one exception is the embedded
SlidePreview CZI file, which lacks the Scene dimension in both the XML
metadata and the dimension entry. The embedded SlidePreview is valid
CZI: Zeiss ZEN software can view it when it is extracted and saved as
an individual file.

The SlidePreview CZI is the only 48-bit color image available, which
makes it a good candidate for testing zstd1 mode high-low byte packing.

Signed-off-by: Wei Chen <chenw1@uthscsa.edu>
Signed-off-by: Wei Chen <chenw1@uthscsa.edu>
Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
Decode raw big-endian ARGB pixels.

Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
Rename the function for consistency with _openslide_inflate_buffer().
Don't bother checking the size of the first compressed frame, since there
might be more than one, and libzstd should fail if there isn't enough
output space.  Do check that the decompressed data matches the expected
length.  Use int64_t arguments rather than ones with arch-dependent
widths.  Use g_try_malloc().

Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
and add a redundant packed attribute.

Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
Use a single function to process both uncompressed and zstd images, rather
than duplicating code.  Clean up zstd1 header parsing and add some
additional error checks.

Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
@bgilbert
Member

I've done some refactoring and added tests and some error checking. While I think the refactored version is a net improvement, czi_read_raw() is somewhat unwieldy, and may need to be revisited when we get to JXR support.

This series seems clean enough to merge without squashing, so I've rebased to pick up some CI changes in main.

Please take a look! I think this is ready to land.

@iewchen
Contributor Author

iewchen commented May 21, 2024

Thank you for the review!

I tested the latest commit. It works.

@bgilbert bgilbert merged commit 637b213 into openslide:main May 22, 2024
17 checks passed
@bgilbert
Member

Great, thank you for the PR!

The next two PRs will probably take longer to land. I assume the JXR one will be pretty straightforward to review, but it may not be able to land right away because of the libjxr situation. The SIMD one will likely require substantial effort to review and test. If it makes sense to submit both in parallel, feel free, and otherwise I'd suggest submitting the JXR one next. I may not be able to review the SIMD one for a couple months or more.
