Optimize NEON encoding for high dynamic range blocks #663

Developer-Ecosystem-Engineering · 2024-01-02T18:33:03Z

We can improve the encoding performance on Apple silicon configurations with the following enhancements

Replace the switch with a for loop
Invert qlp_factors & remove else conditions as branches not taken are returned out

13% - 17% improvement depending on compiler, configuration and .wav

Clang 16 or GCC 13 necessary

Platform	MBP18,2	Mac14,12	AMD Ryzen 9 7900X x 24
Compiler	GCC 13 ---- Clang 16	GCC 13 ---- Clang 16	GNU 13.2.1
Original	6.554 ---- 6.176	5.725 ---- 5.377	5.018
Optimized	5.642 ---- 5.131	4.803 ---- 4.672	5
Overall	14% ---- 17%	16% ---- 13%	0

We can improve the encoding performance on Apple silicon configurations with the following enhancements - Replace the switch with a for loop - Invert qlp_factors & remove else conditions as branches not taken are returned out 13% - 17% improvement depending on compiler, configuration and .wav Clang 16 or GCC 13 necessary

ktmf01 · 2024-02-26T19:34:05Z

What kind of audio did you use to get these numbers?

Perhaps somewhat rude, but I am currently not inclined to accept this PR. This change is not covered by fuzzing, provides a modest improvement for a very niche use. So, I feel like the maintenance burden of this addition is not proportional to the gains.

Developer-Ecosystem-Engineering · 2024-03-11T21:40:31Z

Hi @ktmf01

The optimization is designed to improve high bit rate audio. The test WAVs are detailed below. It's all stereo, 24bps, music files across many genres.

This table was against macOS 14.4 + 16 inch MBP M2.

	Duration	Sample rate	Base time (s)	Optimized time (s)	Improvement (x times)
Sample 1	08:47	44.1	1.843	1.685	1.09
Sample 2	22:35	48	4.856	4.376	1.11
Sample 3	15:36	88.2	8.28	6.537	1.27

This change is not covered by fuzzing, provides a modest improvement for a very niche use.

We'd be happy to review the PR further, including if you'd like additional fuzzing tests. A performance improvement for high res audio seems inline with FLAC use cases, i.e. #669

ktmf01 · 2024-03-12T07:55:01Z

Please provide some more information on the tests, because I cannot explain the results you're getting. The problem I have with this, is that the function you are optimizing should only be rarely used for 24-bit material, if at all. So, what kind of preset are you using, and what times are you measuring? Is it possible to provide gprof reports with and without this optimization on your hardware?

Okay, nevermind, this function is indeed used for 24-bit material. But why have you made this "NEON-exclusive"? It seems to me this code would work just fine for AMD64 as well and provide a performance benefit.

Considering fuzzing, the problem is not in the code, but in the hardware. Currently fuzzing is not run on ARMv8/v9 hardware, meaning this code is not fuzzed, because it is exclusive for that hardware. If it were not exclusive, fuzzing would not be a problem

Developer-Ecosystem-Engineering · 2024-03-14T03:57:19Z

Appreciate the dialog, thank you for taking the time to review and consider these changes!

We made it exclusive because the benefit was much more pronounced on that architecture and flat on others. You are correct that it could be enabled for other architectures and that would also resolve the fuzzing issue.

We'd be happy to modify the PR to not constrain it to only NEON. If there are specific WAVs and presets the project would like to see it tested against let us know! We will update with the results using those inputs and the expanded architecture.

ktmf01 · 2024-05-16T14:06:23Z

Okay, nevermind, this function is indeed used for 24-bit material.

While this was not the intent of this PR, thank you for (indirectly) pointing me to this. I've merged two PRs, #700 and #702, which should make flac use the function you optimized, FLAC__lpc_compute_residual_from_qlp_coefficients_limit_residual, way less often.

Could you perhaps rebase this PR against the current state of the repository, and re-evaluate this PR?

Developer-Ecosystem-Engineering · 2024-05-16T15:47:18Z

Of course, standby.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize NEON encoding for high dynamic range blocks #663

Optimize NEON encoding for high dynamic range blocks #663

Developer-Ecosystem-Engineering commented Jan 2, 2024 •

edited

ktmf01 commented Feb 26, 2024

Developer-Ecosystem-Engineering commented Mar 11, 2024

ktmf01 commented Mar 12, 2024 •

edited

Developer-Ecosystem-Engineering commented Mar 14, 2024 •

edited

ktmf01 commented May 16, 2024

Developer-Ecosystem-Engineering commented May 16, 2024

Optimize NEON encoding for high dynamic range blocks #663

Are you sure you want to change the base?

Optimize NEON encoding for high dynamic range blocks #663

Conversation

Developer-Ecosystem-Engineering commented Jan 2, 2024 • edited

ktmf01 commented Feb 26, 2024

Developer-Ecosystem-Engineering commented Mar 11, 2024

ktmf01 commented Mar 12, 2024 • edited

Developer-Ecosystem-Engineering commented Mar 14, 2024 • edited

ktmf01 commented May 16, 2024

Developer-Ecosystem-Engineering commented May 16, 2024

Developer-Ecosystem-Engineering commented Jan 2, 2024 •

edited

ktmf01 commented Mar 12, 2024 •

edited

Developer-Ecosystem-Engineering commented Mar 14, 2024 •

edited