Reduce artefacts in decoding by more artefact-aware dequantization #558
Comments
Such a deep modification of the libjpeg algorithms is a difficult proposition at best. Too much software has come to rely upon the literal bitwise output of the library, which has largely remained unchanged since the late 1990s. (libjpeg-turbo's mission statement is more about accelerating existing algorithms than improving those algorithms significantly.) The SIMD extensions for various CPUs produce the same bitwise output as the C code, so changing any of the algorithms would probably require changing about 8 or 9 different SIMD implementations as well, some of which are written in hand-tuned assembly. libjpeg-turbo is also an ISO reference implementation, so that limits what we can do. Before I could even begin to look into something like this, I would need funding for my labor, even if that labor just involves peer review of someone else's code. I would also need answers to a handful of burning questions:
We would derive new code from first principles.
Dequantization is affected, i.e., after entropy decoding and before the inverse DCT, a cheap algorithm is added that peeks at each 8x8 block and its four connected 8x8 neighbours. This algorithm adjusts a small number of the low-frequency components (in a way that they remain in their quantization buckets). For example, if we read a level of 0 and the corresponding quantization matrix value is 31, the true coefficient could have been anything between -15 and 15. Instead of just computing 0 * 31 == 0 for the coefficient, we can replace that value with 7 (or any value in the range from -15 to 15) to reduce image discontinuities. This is a big improvement for smooth gradients, which tend to become 8x8-blocky at low qualities.
I believe that the new algorithms can be implemented in a way that maintains a very high execution speed.
We could quantify the improvement by using butteraugli, dssim, ssimulacra, and a small number of human viewers. The improvement is relatively subtle for complex images, but surprisingly large for images with gradients such as a sunset or a blue sky. I believe a 2-5 % improvement can be achieved for the worst case (gradients) in the lower end of the quality range still used on the internet.
I don't understand your last comment. Are you saying that a 2-5% improvement is the minimum achievable or the maximum achievable? Also, a 2-5% improvement in which metric? Qualitative statements like "surprisingly big" aren't helpful. I want to see some quantitative metrics before I will even consider the idea. If you want to discuss funding, contact me through direct e-mail.
To make the "2-5% improvement" more quantitative, I looked for a random image of "blue sky" on Wikimedia Commons (https://commons.wikimedia.org/wiki/File:Blue_Sky.png) and compared djpeg from libjpeg-turbo with knusperli using the following script:
Then I ran that with the 640 x 480 version of the Blue_Sky.png image from above at quality 30 and 36 to get the following results:
and
Hence with knusperli we can decompress an image compressed at quality 30 to about the same PSNR as djpeg achieves at quality 36, saving about 6.24% of the bytes. Of course I agree one should rather use a perceptual metric for these comparisons; when doing it for only one image, it is probably best to just look at the images. Quality 36: Quality 30: I find that knusperli at q 30 looks better even than libjpeg-turbo at q 36, even though the PSNR is similar.
Q30 and Q36 are not very common cases in this day and age. To use @kornelski's words, comparing images at that low of a JPEG quality is sort of like driving a Formula 1 race car in a muddy field and thus concluding that tractors are faster than Formula 1 race cars. Cell phones and other digital cameras typically use JPEG quality levels well above Q90. Even Facebook will tend to use JPEG quality levels in the 70s or 80s for its heavily recompressed images. And you admitted above that a blue sky was one of the best cases for this new algorithm. So let's look at a wider range of photographic cases and compare DSSIM, which quantifies perceptual loss. I also included the full-resolution blue sky image for reference. In all cases, the images were compressed using libjpeg-turbo with the "slow" integer forward DCT and no subsampling, in order to eliminate as many artifacts as possible that are not related to the issue at hand.
Prepare test directory:
Run tests:
Results:
(NOTE: It is unclear what the perceptual threshold is in terms of DSSIM, so I tend to use DSSIM only as a relative measurement. However, other research into perceptual loss with JPEG images has shown that, under most viewing conditions, Q90 is perceptually lossless. In the interest of being overly conservative, I make the hand-waving assumption below that a < 0.0002 difference in DSSIM is below the perceptual threshold, but I freely admit that this may not be valid.) Bearing in mind that lower DSSIM numbers are better, Knusperli produced a worse DSSIM than libjpeg-turbo did for the vast majority of the images. The exceptions were:
It is possible that Knusperli is using a less accurate IDCT algorithm, in which case the comparison above is not strictly apples-to-apples. I compared some of the images visually at Q70. I had to zoom in to about 400% to even see the difference, and the difference mostly involved a change in the apparent brightness of the artifacts rather than a reduction in their overall number. For images in which Knusperli performed worse, the artifacts were slightly brighter; for images in which Knusperli performed better, the artifacts were slightly dimmer. Let's look at the quality range you focused on:
Run tests:
Results:
I quantitatively reproduced your results for the blue sky image, and similar results were achieved with flower_foveon, hdr, nightshot_iso_100, and spider_web. For the rest of the images, however, Knusperli produced the equivalent at Q30 of libjpeg-turbo at about Q31-Q33. That is not terribly compelling. I would consider adopting this algorithm as a non-default option in libjpeg-turbo if:
Otherwise, it is not a very good fit for libjpeg-turbo, at least not at the moment.
This is effective guidance for the effort, and I wholeheartedly agree with it. Would you be OK with measuring success using butteraugli and ssimulacra instead of DSSIM?
How about limiting smoothing to the DC coefficients only (without touching any ACs at all)? Posterization of the DC is very visible at terribly low quality levels. It could especially help chroma channels, which can tolerate heavier compression/a smoother image. OTOH, smoothing only the DC shouldn't harm high-quality images, since it won't interfere with higher-frequency coefficients and won't smooth "energy" out of an image. I think the case for DC is simpler. It's likely computationally much faster (it could probably be reduced to a "smart blur" of the DC coefficients).
@kornelski I'd have to see hard numbers regarding both methods. It seems like it should be possible for the decompressor to detect the amount of quantization and automatically enable the appropriate smoothing algorithm. For better or worse, the bitwise output of libjpeg v6b has become something of a pseudo-standard upon which many projects rely, so nothing we're proposing here can be enabled by default. Thus, I feel strongly that if we want the feature to be relevant enough for people to enable it, it must do no harm for high-quality images and show significant improvement for low-quality images.
@jyrkialakuijala Yes, that would be OK.
This was not a valid assumption, and I should have known that from the math. Updated test script that uses DSSIM to compare the two output images:
The JPEG1 standard allows quantization/dequantization to reconstruct any value within a coefficient's quantization interval. Knusperli exploits this.
Center-bucket dequantization of a quality-50 JPEG:
Block artefact-aware dequantization (knusperli) of the same input:
The artefact-aware dequantization is compliant with the JPEG1 standard -- adjustments are done in DCT space and remain within their respective quantization buckets.
A prototype exists at https://github.com/google/knusperli