There aren't any serious issues that demand this at the moment, but as compilers and CPUs evolve, new features are introduced that are difficult for us to implement at the assembly level and that can lead to various problems if we don't implement them. (Refer to #350, #707, #708, and #729.)

I would never attempt an intrinsics implementation of the i386 SIMD algorithms, because the lack of available registers would make such an implementation very difficult, if not impossible, to optimize. (The i386 SIMD algorithms are basically a legacy feature anyhow.) SSE2 on x86-64 still doesn't have a comfortable number of registers, when you consider that using xmm8-xmm15 can cause performance issues in some cases. However, an x86-64 intrinsics implementation might at least be worth attempting. Just bear in mind that it would likely require hundreds of hours of labor to get this right, and an organization would have to be willing to fund 100% of that labor.

My experience with the Arm Neon intrinsics implementation was that certain compilers (GCC < v12) generated poorly-optimized code from Neon intrinsics, so it was necessary to keep some of the assembly algorithms when using those compilers. The same may be necessary in the near term with the proposed x86-64 intrinsics implementation, so it may similarly be a forward-looking feature that isn't always enabled by default.
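To illustrate the distinction between assembly and intrinsics, here is a minimal sketch in C. It is not taken from libjpeg-turbo: the helper `add_offset_sat16` and the `HAVE_SSE2_INTRINSICS` macro are hypothetical names chosen for illustration. With intrinsics, the compiler rather than the programmer allocates the xmm registers and schedules the instructions, which is why the quality of the result depends on the compiler. The scalar fallback shows one way such a code path could be compiled out when it isn't enabled.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: detect SSE2 support at compile time.  A real build system would
   more likely make this decision in its configuration step. */
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2_INTRINSICS 1
#else
#define HAVE_SSE2_INTRINSICS 0
#endif

/* Hypothetical helper (not a libjpeg-turbo function): add a constant offset
   to an array of 16-bit samples, saturating to the int16_t range. */
static void add_offset_sat16(int16_t *samples, size_t n, int16_t offset)
{
  size_t i = 0;

#if HAVE_SSE2_INTRINSICS
  /* Broadcast the offset to all eight 16-bit lanes. */
  const __m128i voffset = _mm_set1_epi16(offset);

  /* Process eight samples per iteration.  The compiler, not the programmer,
     decides which xmm registers hold v and voffset. */
  for (; i + 8 <= n; i += 8) {
    __m128i v = _mm_loadu_si128((const __m128i *)(samples + i));
    v = _mm_adds_epi16(v, voffset);  /* signed saturating add */
    _mm_storeu_si128((__m128i *)(samples + i), v);
  }
#endif

  /* Scalar path: handles the remaining samples, or all of them if the
     intrinsics path is compiled out. */
  for (; i < n; i++) {
    int32_t sum = (int32_t)samples[i] + offset;
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    samples[i] = (int16_t)sum;
  }
}
```

Whether a compiler turns code like this into something competitive with hand-written assembly is exactly the open question described above, so any real implementation would need to be benchmarked per compiler.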
I will not accept PRs for this feature, because:
- I have more experience with the x86-64 libjpeg-turbo SIMD extensions than anyone else in the world, so this is one of those features for which it would take me longer to clean up, optimize, test, document, and integrate an outside contribution than it would take me to implement the feature myself.
- Since the feature would be "nice to have" rather than critical, my primary interest in it is frankly monetary. (I do this for a living, not as a hobby.)