support content-type specific compression/decompression, like jpeg xl? #8092

Open
ThomasWaldmann opened this issue Feb 12, 2024 · 5 comments
@ThomasWaldmann
Member

ThomasWaldmann commented Feb 12, 2024

Let's discuss here whether and how borg could support this, assuming there is a JPEG XL library (with Python / Cython bindings) that supports bit-identical compression (transformation to the JPEG XL format) and decompression (transformation back to the original file).

Notable:

  • borg usually works on CHUNKS (pieces of files, as output by the borg buzhash or fixed chunker): file data -> chunk -> compress -> encrypt/auth -> store
  • borg usually compresses all chunks in the same way, using the same compression algorithm, e.g. zstd or lz4
  • in the past we already tried once to implement file-type-specific compression, but we abandoned that because of the configuration hassle and went with an "auto" compressor that is simpler to use and does not need per-file-type configuration.
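
For readers less familiar with that pipeline, here is a minimal sketch of "file data -> chunk -> compress -> store". It is not borg's code: the fixed chunk size, the content-hash chunk ids, the dict used as a repository, and zlib (standing in for zstd or lz4) are all assumptions made for illustration, and the encrypt/auth step is left out.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen only for this demo


def fixed_chunker(data: bytes, size: int = CHUNK_SIZE):
    """Yield consecutive pieces of at most `size` bytes."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


def store_file(data: bytes, repo: dict) -> list:
    """Chunk and compress `data` into `repo`; return the ordered chunk ids."""
    chunk_ids = []
    for chunk in fixed_chunker(data):
        chunk_id = hashlib.sha256(chunk).hexdigest()
        if chunk_id not in repo:
            # Every chunk goes through the same generic compressor.
            # The encrypt/auth step would follow here; it is omitted.
            repo[chunk_id] = zlib.compress(chunk)
        chunk_ids.append(chunk_id)
    return chunk_ids


def load_file(chunk_ids: list, repo: dict) -> bytes:
    """Reassemble the original bytes from the stored chunks."""
    return b"".join(zlib.decompress(repo[cid]) for cid in chunk_ids)
```

The point of the sketch is that the compressor only ever sees one chunk at a time and knows nothing about the file the chunk came from.
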
@knutov

knutov commented Feb 12, 2024

It's a good possibility to improve compression in some cases, but:

  • it requires a lot of CPU, so it should definitely be optional and disabled by default
  • there seem to be a lot of tasks with higher priority, like a stable v2 release

@alexandervlpl

alexandervlpl commented Feb 12, 2024

I didn't realize it was file data -> chunk -> compress, that rules out any simple implementation. JXL is not a compression algorithm like lz4 that takes any bytes you throw at it. If you're starting with a JPEG it needs a complete file with a header and all the pixels.
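
A small sketch of why chunking gets in the way, assuming a simple fixed-size split (the helper names are made up for this example). A JPEG file starts with the SOI marker `FF D8` and normally ends with the EOI marker `FF D9`; once the file is cut into chunks, no single chunk has both. The start/end check is only a rough heuristic, not real validation.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def looks_like_complete_jpeg(blob: bytes) -> bool:
    """Rough check: does the blob start and end like a whole JPEG file?"""
    return blob.startswith(SOI) and blob.endswith(EOI)


def split(data: bytes, size: int) -> list:
    """Cut data into consecutive pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


# Not a real image, just the markers around filler bytes.
fake_jpeg = SOI + bytes(1000) + EOI
chunks = split(fake_jpeg, 256)

whole_ok = looks_like_complete_jpeg(fake_jpeg)
chunks_ok = [looks_like_complete_jpeg(c) for c in chunks]
```

Here `whole_ok` is true while every entry of `chunks_ok` is false, so a per-chunk transcoder would never be handed something it can parse as a complete image.
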

I could probably write a separate "chunker" not just for JPEG, but for all image formats supported by Pillow. Split the image into raw tiles (chunks) of the size you need and then compress each chunk as a separate, lossless JXL image. There's a Pillow JXL plugin with lossless support. Additionally, to achieve a bit-identical reversal of the entire process, the original image header (EXIF metadata, etc.) will need to be stored in a separate chunk and reconstituted.
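
Roughly what the tiling part could look like, as a sketch under heavy assumptions: the image is already decoded to a row-major 8-bit grayscale buffer, zlib stands in for a lossless JXL encoder, and the function names are hypothetical. Pillow, the JXL plugin, and the separate header chunk are not shown, and this only round-trips raw pixels, not the original file's encoded bytes.

```python
import zlib


def split_tiles(pixels: bytes, width: int, height: int, tile: int) -> list:
    """Cut a row-major grayscale buffer into independently compressed tiles."""
    tiles = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            w = min(tile, width - left)   # edge tiles may be narrower
            h = min(tile, height - top)   # edge tiles may be shorter
            rows = [
                pixels[(top + r) * width + left:(top + r) * width + left + w]
                for r in range(h)
            ]
            # zlib is a placeholder for a lossless image codec.
            tiles.append((left, top, w, h, zlib.compress(b"".join(rows))))
    return tiles


def join_tiles(tiles: list, width: int, height: int) -> bytes:
    """Decompress every tile and paste it back at its original position."""
    out = bytearray(width * height)
    for left, top, w, h, blob in tiles:
        raw = zlib.decompress(blob)
        for r in range(h):
            start = (top + r) * width + left
            out[start:start + w] = raw[r * w:(r + 1) * w]
    return bytes(out)
```

Even in this toy form, each tile needs its position and size stored alongside it, and the whole decoded image has to sit in memory before it can be split.
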

  1. That's a lot of extra complexity.
  2. @knutov in addition to CPU usage this will be very memory intensive for large images.

Seems like it's not worth it?

@FabioPedretti
Contributor

This discussion reminds me of some of the arguments detailed here:
https://www.nongnu.org/lzip/xz_inadequate.html
(I read it years ago and don't remember the details, but the point was to keep data archiving formats simple in order to minimize issues.)

@ThomasWaldmann
Member Author

If we don't come up with a good/easy solution, an alternate way to use jpeg xl is of course that the users convert their photos to that format at the primary storage location.

If there is an easy transformation back to the original format, that seems the better idea anyway because then it also uses less storage at the primary location. Only issue could be that the tools preferred by the users do not (yet) read/display that format.

@alexandervlpl

> an alternate way to use jpeg xl is of course that the users convert their photos to that format at the primary storage location.

That's what I do, I use the official CLI tools to encode/decode as needed before/after running borg.
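
A sketch of that kind of workflow, with assumptions spelled out: the tools are taken to be libjxl's `cjxl` and `djxl`, both on `PATH`, and `cjxl` is assumed to transcode JPEG input losslessly by default (check the help output of your version). The helper names and file naming are made up for this example.

```python
import hashlib
import subprocess
from pathlib import Path


def encode_cmd(src: Path, dst: Path) -> list:
    """Command line to convert a JPEG to JXL (assumes libjxl's cjxl)."""
    return ["cjxl", str(src), str(dst)]


def decode_cmd(src: Path, dst: Path) -> list:
    """Command line to convert a JXL back to a JPEG (assumes libjxl's djxl)."""
    return ["djxl", str(src), str(dst)]


def sha256(path: Path) -> str:
    """Hex digest of the file contents, used to compare files bit for bit."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def roundtrip_is_identical(jpeg: Path, workdir: Path) -> bool:
    """Encode, decode again, and check that the restored JPEG matches the original."""
    jxl = workdir / (jpeg.stem + ".jxl")
    restored = workdir / (jpeg.stem + ".restored.jpg")
    subprocess.run(encode_cmd(jpeg, jxl), check=True)
    subprocess.run(decode_cmd(jxl, restored), check=True)
    return sha256(jpeg) == sha256(restored)
```

Running `roundtrip_is_identical` on a sample of photos before deleting any originals is a cheap way to confirm that the reversal really is bit-identical for your files.
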

> Only issue could be that the tools preferred by the users do not (yet) read/display that format.

This is the real problem. Adoption has stalled: currently, to browse thumbnails and open the images, you pretty much need to be on Linux and compile something like gThumb yourself. 0.000001% of users will do this and it looks like that won't change.

So I was hoping JXL could at least have a future as an archive format used internally by tools like borg. In my case it already saves me 50+ GB of space and bandwidth, so it would be very useful to make that available to everyone.


4 participants