Faster MCU_Decode #657
base: main
Definitely not 40%. Using the benchmark procedures and test images described here, I observe:
2.8 GHz Intel Xeon W3530
3.6 GHz Intel Xeon W2123
Other issues:
Please bear in mind that libjpeg-turbo is used indirectly by billions of people every day (through Firefox, Chrome, Linux, Android, etc.) and is an ISO/ITU-T reference implementation, so extreme care must be taken when modifying it. The Huffman decoder is an attack surface for security exploits, so even more care must be taken when modifying that particular module. The issues above do not instill me with confidence that such care was taken.
I did observe a 40% speedup for large-image decompression in my image library. A key idea behind it is to spend more time building a more sophisticated Huffman decompression table in exchange for more efficient Huffman decompression. The performance gain therefore depends on image size: for tiny images, the advantage of faster Huffman decompression is offset by the slower construction of the Huffman decompression table.
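The trade-off described above can be illustrated with a minimal C sketch (not the code from this PR) that fills a k-bit lookahead table from JPEG-style BITS/HUFFVAL arrays using canonical code assignment. The names `lut_entry`, `build_lookahead()`, and `MAX_LOOKAHEAD` are hypothetical. For a complete code whose lengths are all at most k bits, the fill loop writes exactly 2^k slots, so the build cost is fixed per table and independent of image size, while the per-symbol saving grows with the number of symbols decoded.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_LOOKAHEAD 12 /* hypothetical upper bound on k for this sketch */

typedef struct {
  uint8_t len; /* code length in bits; 0 = code longer than k (slow path) */
  uint8_t sym; /* decoded symbol (HUFFVAL entry) */
} lut_entry;

/* Build a k-bit lookahead table.
 * bits[1..16] = number of codes of each length (bits[0] unused);
 * huffval     = symbols in canonical code order;
 * lut         = array of at least (1 << k) entries.
 * Returns the number of table slots written (the build cost). */
int build_lookahead(const uint8_t bits[17], const uint8_t *huffval, int k,
                    lut_entry *lut)
{
  int written = 0, sym_index = 0;
  unsigned code = 0;

  assert(k >= 1 && k <= MAX_LOOKAHEAD);
  memset(lut, 0, sizeof(lut_entry) * (1u << k));

  for (int len = 1; len <= 16; len++) {
    for (int i = 0; i < bits[len]; i++, sym_index++, code++) {
      if (len > k)
        continue; /* too long for the table: left to a slower fallback path */
      /* Every k-bit window whose top len bits equal code decodes to this
       * symbol, so the code owns 2^(k - len) consecutive slots. */
      unsigned first = code << (k - len);
      unsigned count = 1u << (k - len);
      for (unsigned j = 0; j < count; j++) {
        lut[first + j].len = (uint8_t)len;
        lut[first + j].sym = huffval[sym_index];
      }
      written += (int)count;
    }
    code <<= 1; /* canonical codes: moving to the next length appends a 0 bit */
  }
  return written;
}
```

With one code of length 1, one of length 2, and two of length 3, `build_lookahead(bits, vals, 3, lut)` writes all 8 slots; with k = 2, only 3 slots are written, and the window `11` is left for the fallback path. Larger k means more build work per table but fewer fallbacks per symbol.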
The images I tested were 1 to 10 megapixels in size, so not "tiny" by any means. How are you measuring performance?
Sorry for my delayed responses. See my answers below.
Decompression does not seem to complete with 32-bit code (infinite loop?)
Your code should be based on the main branch. As it is, I had to disable C_LOSSLESS_SUPPORTED and D_LOSSLESS_SUPPORTED in order to build it.
Your code should not introduce trailing whitespace.
Your code should be formatted and spaced consistently with the rest of the libjpeg-turbo code base.
Code comments should not be personal. That is, do not refer to yourself, and when describing modifications, mention the baseline against which the modifications were made.
Commented debugging code should be removed.
Existing code comments should not be removed unless there is a good reason to do so.
The performance comparison was actually carried out by a coworker who recently quit. I have been scrambling to pick up his workload, including replicating his testing environment. One key checkpoint is that MCU_Decode() accounts for 73% of CPU time in the existing version. I also know that the pool includes some huge images of ~400 MB, and that it contains NO progressive images. In addition, the performance is system dependent on the parameter pair
Given that the libjpeg-turbo General Fund is exhausted for the next 13 months, it is unlikely that I will be able to look at this any time soon. I have no reason to believe that your fixes magically fixed the performance. I tested with fairly large (up to 7.4-megapixel) images. You have never indicated the type of CPU on which you ran your benchmarks, but 73% CPU time spent on Huffman decoding is definitely not normal.
AMD EPYC 7742 64-core, base clock 2.25 GHz
In libjpeg-turbo, decompression is dominated by MCU and Huffman decoding,
which account for about 73% of decompression time in our observations of a large pool of images.
We have redesigned MCU and Huffman decoding to achieve substantially higher speed.
With the new algorithmic design, we observed an overall 40% speedup in JPEG decompression.
Our new design consists mainly of four aspects:
in at most two lookups. It also includes an effective preprocessing step that dynamically skips
the second lookup.
Their structural difference lies in lookup table sizes, which allows for better caching.
to remove a critical branch inside the MCU decoding loop.
table decoding. This allows the cumbersome function jpeg_huff_decode() to be deprecated.
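The two-lookup idea can be sketched in C under stated assumptions: this is not the PR's implementation, and the 9-bit first level (`FIRST_BITS`), the entry layout, and all names (`entry`, `two_level_table`, `build_two_level()`, `decode_two_level()`, `free_two_level()`) are hypothetical. The first table is indexed by the top `FIRST_BITS` bits of a 16-bit window; an entry either decodes the symbol directly or points to a subtable indexed by the remaining `16 - FIRST_BITS` bits. Because JPEG Huffman codes are at most 16 bits long, every valid code resolves in at most two lookups, with no bit-by-bit fallback.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FIRST_BITS  9                 /* hypothetical first-level width */
#define SECOND_BITS (16 - FIRST_BITS) /* remaining bits of a 16-bit code */

/* len > 0: direct hit.  len == 0 && sub >= 0: index of a subtable.
 * len == 0 && sub < 0: no code starts with this prefix. */
typedef struct {
  uint8_t len;
  uint8_t sym;
  int16_t sub;
} entry;

typedef struct {
  entry first[1 << FIRST_BITS];
  entry *second; /* nsub subtables of (1 << SECOND_BITS) entries each */
  int nsub;
} two_level_table;

/* Build from JPEG-style BITS/HUFFVAL.  Returns 0 on success, -1 on OOM. */
int build_two_level(const uint8_t bits[17], const uint8_t *huffval,
                    two_level_table *t)
{
  int nlong = 0;
  for (int len = FIRST_BITS + 1; len <= 16; len++)
    nlong += bits[len];
  /* At most one subtable per long code (usually far fewer). */
  t->second = calloc((size_t)(nlong ? nlong : 1) << SECOND_BITS, sizeof(entry));
  if (!t->second)
    return -1;
  t->nsub = 0;
  for (int i = 0; i < (1 << FIRST_BITS); i++) {
    t->first[i].len = 0;
    t->first[i].sym = 0;
    t->first[i].sub = -1;
  }

  unsigned code = 0;
  int k = 0;
  for (int len = 1; len <= 16; len++) {
    for (int i = 0; i < bits[len]; i++, k++, code++) {
      entry *tab;
      unsigned first, count;
      if (len <= FIRST_BITS) { /* short code: fill first-level slots */
        tab = t->first;
        first = code << (FIRST_BITS - len);
        count = 1u << (FIRST_BITS - len);
      } else {                 /* long code: fill a subtable */
        entry *e = &t->first[code >> (len - FIRST_BITS)];
        if (e->sub < 0)
          e->sub = (int16_t)t->nsub++;
        tab = t->second + ((size_t)e->sub << SECOND_BITS);
        unsigned rest = code & ((1u << (len - FIRST_BITS)) - 1);
        first = rest << (16 - len);
        count = 1u << (16 - len);
      }
      for (unsigned j = 0; j < count; j++) {
        tab[first + j].len = (uint8_t)len;
        tab[first + j].sym = huffval[k];
      }
    }
    code <<= 1;
  }
  return 0;
}

/* Decode one symbol from the next 16 stream bits (MSB first).
 * Stores the code length in *len (0 = invalid code). */
int decode_two_level(const two_level_table *t, unsigned window16, int *len)
{
  assert(window16 <= 0xFFFFu);
  const entry *e = &t->first[window16 >> SECOND_BITS];     /* lookup 1 */
  if (e->len == 0 && e->sub >= 0)                           /* long code */
    e = &t->second[((size_t)e->sub << SECOND_BITS) +
                   (window16 & ((1u << SECOND_BITS) - 1))]; /* lookup 2 */
  *len = e->len;
  return e->sym;
}

void free_two_level(two_level_table *t)
{
  free(t->second);
  t->second = NULL;
}
```

In this sketch, the second lookup is taken only for codes longer than `FIRST_BITS` bits, so a table with no such codes never performs it; the PR's preprocessing step may skip it differently. Table size also matters for caching: with this 4-byte entry layout, the 512-entry first level occupies 2 KB, whereas a flat table indexed by all 16 bits would need 65,536 entries (256 KB).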
We preserved the original function inputs/outputs, so the new files can directly replace their existing counterparts
in libjpeg-turbo and are ready to run.
Functional modifications