Expected speed? #179

Open
vitacon opened this issue Mar 9, 2023 · 6 comments

@vitacon

vitacon commented Mar 9, 2023

Today I used LodePNG in my little tool for blending images. It works as expected but I am quite surprised by its relative slowness compared to C# libraries.

I have a few similar tools written in C# and I thought converting them to C/C++ could make them faster (I run them on thousands of files), but the first test with optimized .exes is rather bad. The first tool is very simple and quick (the others are more complicated), so the calculation time has improved only from 31 ms to 23 ms. However, the time needed for PNG manipulation (loading plus saving) has increased much more - from 100 ms to 511 ms.

The differences are caused by these three lines:

error1 = lodepng_decode32_file(&bitmap1, &xsize1, &ysize1, filename1);
(...)
error2 = lodepng_decode32_file(&bitmap2, &xsize2, &ysize2, filename2);
(...)
lodepng_encode32_file(fileout, bitmap1, xsize1, ysize1);

Is it expected behaviour or can I make LodePNG faster by some less obvious compiler options? I am really surprised that the .NET functions are so much faster. What kind of sorcery could Microsoft use?

f:\color-compare>PNG-color.exe 004999.png 004999-2.png 1.png
PNG Colorizer v0.60 {C#)

Grayscale:      004999.png
Colormask:      004999-2.png
Output:         1.png
Saturation:     100 %
Force size:     False
Resolution 1:   1490 x 1080
Resolution 2:   745 x 540
Zoom:           2

........

Preparation:       31 ms
Calculation:       31 ms
Saving:            69 ms
Total duration:   131 ms

Done.

f:\color-compare>PNG-color-cpp.exe 004999.png 004999-2.png 2.png
PNG Colorizer CPP v0.52

Grayscale:      004999.png
Colormask:      004999-2.png
Output:         2.png
Saturation:     100 %
Force size:     0

Resolution 1:   1490 x 1080
Resolution 2:   745 x 540
Zoom:           2

........
Saving

Preparation:        52ms
Calculation:        23ms
Saving:            459ms
Total duration:    534ms
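
Per-stage timings like the ones in the logs above can be collected with a small wrapper around `std::chrono::steady_clock`. This is only a minimal sketch: the helper name `time_ms` is made up for illustration and is not taken from the tool in this thread.

```cpp
#include <chrono>
#include <utility>

// Run `work` once and return the elapsed wall-clock time in milliseconds.
// steady_clock is used because it never jumps backwards.
// Hypothetical usage with the call from the post:
//   double ms = time_ms([&] {
//     error1 = lodepng_decode32_file(&bitmap1, &xsize1, &ysize1, filename1);
//   });
template <typename F>
double time_ms(F&& work) {
  const auto start = std::chrono::steady_clock::now();
  std::forward<F>(work)();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Timing each decode and the encode separately makes it easier to see which of the three calls dominates. A single run on one file is noisy, so averaging over several files is probably more reliable.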
@lvandeve
Owner

Hi, 4x slower is not expected; however, it could also depend on settings:

For compiler settings: do you compile with optimizations enabled, such as with the -O2 or -O3 flag? Or if you use an IDE, are there any optimization settings or debug vs release compile settings that can be tuned for speed in there?

For PNG encoding itself: there are various compression related settings, such as windowsize, nicematch and lazymatching, see those values in lodepng.h. Tuning these values may make the encoding faster but the compression worse. How does the image size compare to what you get out of the C# library?
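
As a rough illustration of the encoder tuning mentioned above, the sketch below encodes through a `LodePNGState` instead of `lodepng_encode32_file`. The function name `encode32_file_fast` and the specific values are assumptions chosen for illustration, not recommended settings, and the actual speed gain has to be measured. It needs `lodepng.h` and the LodePNG source file in the build.

```cpp
#include <cstdio>
#include <cstdlib>
#include "lodepng.h"

// Encode 8-bit RGBA pixels to a PNG file, trading compression for speed.
// The chosen values are illustrative guesses, not tuned recommendations.
unsigned encode32_file_fast(const char* filename, const unsigned char* rgba,
                            unsigned w, unsigned h) {
  LodePNGState state;
  lodepng_state_init(&state);  // defaults: 8-bit RGBA input and output

  // Skip the search for a smaller color type and use one fixed filter.
  // With auto_convert off the file is written as RGBA even if the image
  // has no alpha, which costs some file size.
  state.encoder.auto_convert = 0;
  state.encoder.filter_strategy = LFS_ZERO;

  // The zlib settings named in the comment above.
  state.encoder.zlibsettings.windowsize = 512;  // power of two, at most 32768
  state.encoder.zlibsettings.nicematch = 32;    // stop searching at this match length
  state.encoder.zlibsettings.lazymatching = 0;  // turn lazy matching off

  unsigned char* png = nullptr;
  size_t pngsize = 0;
  unsigned error = lodepng_encode(&png, &pngsize, rgba, w, h, &state);
  if (!error) error = lodepng_save_file(png, pngsize, filename);
  if (error) {
    std::fprintf(stderr, "lodepng error %u: %s\n", error,
                 lodepng_error_text(error));
  }

  std::free(png);  // lodepng allocates with malloc unless configured otherwise
  lodepng_state_cleanup(&state);
  return error;
}
```

A call such as `encode32_file_fast(fileout, bitmap1, xsize1, ysize1)` would replace the `lodepng_encode32_file` line from the original post. Comparing the resulting file size with the C# output shows how much compression was given up.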

Thanks!

@vitacon
Author

vitacon commented Mar 11, 2023

Thank you for your reply.

Yes, the optimization was enabled.

Meanwhile I checked FPNG-test that compresses and decompresses PNGs using different libraries and the results are different too:

SSE 4.1 supported: 1
Filename: 004999.png
Dimensions: 1490x1080, Has Alpha: 0, Total Pixels: 1609200, bytes: 4827600 (4.603958 MB)

** Decoding:
FPNG:    0.034443 secs, 44.556 MP/sec
lodepng: 0.196648 secs,  7.804 MP/sec
stbi:    0.094508 secs, 16.238 MP/sec
wuffs:   0.090095 secs, 17.034 MP/sec
qoi:     0.020679 secs, 74.213 MP/sec

** Encoding:
FPNG:    0.036475 secs, 2256206 bytes, 2.152 MB, 42.075 MP/sec
lodepng: 1.283517 secs, 1455642 bytes, 1.388 MB,  1.196 MP/sec
stbi:    0.883381 secs, 2194322 bytes, 2.093 MB,  1.737 MP/sec
qoi:     0.026551 secs, 1972663 bytes, 1.881 MB, 57.800 MP/sec

This led me to enabling different instruction sets but it did not affect the results that much.
My priority for this project is speed and not compression rate, so for now I stuck to WUFFS for loading and FPNG for saving.

By the way, WUFFS's loading seems to be slightly faster than C# - maybe about 25%.

f:\color-compare>PNG-color.exe 004999.png 004999-2.png 1.png
PNG Colorizer v0.60

Grayscale:      004999.png
Colormask:      004999-2.png
Output:         1.png
Saturation:     100 %
Force size:     False
Resolution 1:   1490 x 1080
Resolution 2:   745 x 540
Zoom:           2

........

Preparation:       33 ms
Calculation:       25 ms
Saving:            79 ms
Total duration:   137 ms

Done.

f:\color-compare>PNG-color-cpp.exe 004999.png 004999-2.png 2.png
PNG Colorizer CPP v0.53

Grayscale:      004999.png
Colormask:      004999-2.png
Output:         2.png
Saturation:     100 %
Force size:     0

FPNG - SSE 4.1: 1
Resolution 1:   1490 x 1080
Resolution 2:   745 x 540
Zoom:           2

........
Saving

Preparation:        23ms
Calculation:        25ms
Saving:             14ms
Total duration:     62ms

Thanks again!

@lvandeve
Owner

Thanks for testing! Which platform are you using?

When I run the fpng_test benchmark, I get:

SSE 4.1 supported: 1
Filename: test.png
Dimensions: 2560x1440, Has Alpha: 0, Total Pixels: 3686400, bytes: 11059200 (10.546875 MB)
** Encoding:
FPNG:    0.007738 secs, 2276818 bytes, 2.171 MB, 454.333 MP/sec
lodepng: 0.203284 secs, 1872993 bytes, 1.786 MB, 17.294 MP/sec
stbi:    0.140620 secs, 2245940 bytes, 2.142 MB, 25.001 MP/sec
qoi:     0.007633 secs, 2047762 bytes, 1.953 MB, 460.582 MP/sec
** Decoding:
FPNG:    0.008594 secs, 409.079 MP/sec
lodepng: 0.024952 secs, 140.896 MP/sec
stbi:    0.018491 secs, 190.126 MP/sec
wuffs:   0.012571 secs, 279.662 MP/sec
qoi:     0.007874 secs, 446.485 MP/sec

For the compression, note that lodepng produces a much smaller result in bytes than the others; compressing harder is slower.

lodepng doesn't use SIMD instructions (directly) which is why FPNG is much faster. QOI is a different image format.

@vitacon
Author

vitacon commented Mar 11, 2023

My home CPU is AMD Ryzen 5 3600 (6 cores, 12 threads). I believe Intel would produce different ratios of measured times because I already noticed that while measuring my apps. I'll try to run this test on my office computer too.

> note that lodepng produces a much smaller result in bytes than the others, compressing harder is slower.

I noticed that =) but speed is more important for my current project. Better compression might be useful next time. =)

@lvandeve
Owner

lvandeve commented Mar 11, 2023

I would actually expect your CPU to be faster on all codecs here (I'd even expect 10x faster decoding/encoding for most of them). I'm not sure what could be going wrong if optimizations are enabled, though. I also measured on AMD above, on a CPU from the Ryzen 7xxx series.

Unless it's a very special image? But I don't know what kind of pixel pattern could make encoding and decoding 10x slower either.

@vitacon
Author

vitacon commented Mar 13, 2023

Weeell, I think I pasted a log from the debug version of FPNG-test by mistake. It was just an illustration of the ratios, so I did not notice the absolute values were too high. Sorry for that. =}

These two are from the release build (with "default optimize"):

AMD Ryzen 5 3600 (6 cores, 12 threads)

SSE 4.1 supported: 1
Filename: 004999.png
Dimensions: 1490x1080, Has Alpha: 0, Total Pixels: 1609200, bytes: 4827600 (4.603958 MB)
** Encoding:
FPNG:    0.009921 secs, 2256206 bytes, 2.152 MB, 154.687 MP/sec
lodepng: 0.472853 secs, 1455642 bytes, 1.388 MB, 3.246 MP/sec
stbi:    0.230889 secs, 2194322 bytes, 2.093 MB, 6.647 MP/sec
qoi:     0.018489 secs, 1972663 bytes, 1.881 MB, 83.003 MP/sec
** Decoding:
FPNG:    0.011660 secs, 131.620 MP/sec
lodepng: 0.058879 secs, 26.065 MP/sec
stbi:    0.033797 secs, 45.408 MP/sec
wuffs:   0.025916 secs, 59.216 MP/sec
qoi:     0.013002 secs, 118.034 MP/sec

Intel i5-8400 (6 cores, 6 threads)

SSE 4.1 supported: 1
Filename: 004999.png
Dimensions: 1490x1080, Has Alpha: 0, Total Pixels: 1609200, bytes: 4827600 (4.603958 MB)
** Encoding:
FPNG:    0.011236 secs, 2256206 bytes, 2.152 MB, 136.580 MP/sec
lodepng: 0.507247 secs, 1455642 bytes, 1.388 MB, 3.025 MP/sec
stbi:    0.277720 secs, 2194322 bytes, 2.093 MB, 5.526 MP/sec
qoi:     0.017358 secs, 1972663 bytes, 1.881 MB, 88.411 MP/sec
** Decoding:
FPNG:    0.013338 secs, 115.063 MP/sec
lodepng: 0.050619 secs, 30.318 MP/sec
stbi:    0.032889 secs, 46.662 MP/sec
wuffs:   0.027274 secs, 56.267 MP/sec
qoi:     0.012056 secs, 127.289 MP/sec

This is the test image:
[image: 004999]
