This repository has been archived by the owner on Feb 14, 2023. It is now read-only.

Compress without decompressing the whole JPEG #84

Open
StephanBusch opened this issue Mar 16, 2017 · 1 comment

Comments

@StephanBusch

There must be some workaround for the high memory requirements Lepton currently has.
I suspect solutions like PAQ and WinZip may be able to compress without having to decompress
the whole JPEG into memory first; their memory requirement stays roughly the same no matter how big the input file is.

In PAQ7 source, Matt wrote:
"Files are further compressed by partially uncompressing back to the DCT coefficients to provide context for the next Huffman code."

Does anyone have an idea how this partial decompression could be implemented in Lepton?

@danielrh
Contributor

Hi Stephan... it may actually work with some tiny tweaks.
Right now you can already pass the command line flags -startbyte=X -trunc=Y.
It should only allocate the memory needed to store the data between those offsets (unless there's a bug).
So if you were willing to split your JPEG into N pieces, with a lepton file for each chunk, it may work as is.
Of course, ideally we could teach lepton to decode those pieces without re-invoking it on each chunk, but you could theoretically tar them up at that point.
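A minimal sketch of the chunked invocation described above, assuming a hypothetical input `photo.jpg`, a 4 MiB chunk size, and that `-trunc` takes the end byte offset (the `-startbyte`/`-trunc` flags are the ones mentioned in this comment; everything else here is illustrative):

```python
# Sketch: build one lepton invocation per fixed-size byte range of a JPEG.
# Assumptions: -startbyte is the start offset, -trunc is the end offset,
# and the output naming scheme is made up for illustration.
CHUNK = 4 * 1024 * 1024  # 4 MiB

def lepton_commands(jpeg_path, size):
    """Return (start, end, argv) for each chunk of a file of `size` bytes."""
    cmds = []
    for start in range(0, size, CHUNK):
        end = min(start + CHUNK, size)
        cmds.append((start, end,
                     ["lepton", f"-startbyte={start}", f"-trunc={end}",
                      jpeg_path, f"{jpeg_path}.{start}.lep"]))
    return cmds

# A 10 MiB input splits into three ranges:
# [0, 4 MiB), [4 MiB, 8 MiB), [8 MiB, 10 MiB)
cmds = lepton_commands("photo.jpg", 10 * 1024 * 1024)
```

Each command could then be run independently (e.g. via `subprocess.run`), producing one lepton file per chunk.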

The missing gaps here are:

a) Ideally it would spit out the whole JPEG at once. Right now the algorithm does O(N^2) I/O for a file of N chunks, since you need to feed the data into N compression operations, one per chunk. Although it simply ignores the data outside of the range, it still needs that data to track the bit offsets and pixel locations within the JPEG. JPEGs with restart markers, of course, would make it possible to do this rather quickly.

b) Reassembling the file from the multiple pieces. We'd need some sort of meta-archive format to contain each piece of the file so it could be reassembled into a whole.
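One way such a meta-archive could look (a hedged sketch, not an existing lepton format): a trivial length-prefixed container that concatenates the per-chunk lepton outputs in order, so a reader can split them back apart for reassembly.

```python
# Sketch of a hypothetical meta-archive: a piece count followed by
# length-prefixed payloads. This is an illustrative container, not a
# format lepton actually defines.
import io
import struct

def pack_pieces(pieces):
    """Pack an ordered list of byte strings into one container blob."""
    out = io.BytesIO()
    out.write(struct.pack("<I", len(pieces)))   # number of pieces
    for p in pieces:
        out.write(struct.pack("<Q", len(p)))    # length of this piece
        out.write(p)                            # piece payload
    return out.getvalue()

def unpack_pieces(blob):
    """Inverse of pack_pieces: recover the ordered list of pieces."""
    buf = io.BytesIO(blob)
    (n,) = struct.unpack("<I", buf.read(4))
    return [buf.read(struct.unpack("<Q", buf.read(8))[0]) for _ in range(n)]

pieces = [b"lepton-chunk-0", b"lepton-chunk-1", b"tail"]
assert unpack_pieces(pack_pieces(pieces)) == pieces
```

Reassembling the JPEG would then just be decoding each piece and concatenating the results in order.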

Dropbox doesn't need this technology because right now we always store data in chunks of no more than 4 MiB, so no individual JPEG piece ever exceeds 4 MiB.
