Intrinsics implementation of x86-64 SIMD algorithms #732

Open
dcommander opened this issue Oct 24, 2023 · 0 comments

There aren't any serious issues that demand this at the moment, but as compilers and CPUs evolve, new features are introduced that are difficult for us to implement at the assembly level and that can lead to various problems if we don't implement them. (Refer to #350, #707, #708, and #729.)

I would never attempt an intrinsics implementation of the i386 SIMD algorithms, because the lack of available registers would make such an implementation very difficult, if not impossible, to optimize. (The i386 SIMD algorithms are basically a legacy feature anyhow.) Even with SSE2 on x86-64, the number of registers isn't comfortable, considering that using xmm8-xmm15 can cause performance issues in some cases. However, it might at least be worth attempting. Just bear in mind that it would likely require hundreds of hours of labor to get this right, and an organization would have to be willing to fund 100% of that labor.

My experience with the Arm Neon intrinsics implementation was that certain compilers (GCC < v12) generated poorly-optimized code for Neon intrinsics, so it was necessary to keep some of the assembly algorithms when using those compilers. The same may be necessary in the near term with the proposed x86-64 intrinsics implementation, so it may similarly be a forward-looking feature that isn't always enabled by default.

I will not accept PRs for this feature, because:

  1. I have more experience with the x86-64 libjpeg-turbo SIMD extensions than anyone else in the world, so this is one of those features for which it would take me longer to clean up, optimize, test, document, and integrate an outside contribution than it would take me to implement the feature myself.
  2. Since the feature would be "nice to have" rather than critical, my primary interest in it is frankly monetary. (I do this for a living, not for a hobby.)
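
For readers unfamiliar with the distinction, here is a minimal sketch of what "intrinsics" means in practice, as opposed to hand-written assembly. It is not code from libjpeg-turbo. The function name `level_shift_16` and the `HAVE_SSE2_SKETCH` macro are hypothetical, chosen only for illustration. The routine adds 128 to sixteen signed 16-bit samples and clamps each result to 0..255. With intrinsics, the compiler chooses the xmm registers rather than the programmer, and a scalar fallback can be selected at compile time when SSE2 is unavailable.

```c
#include <stdint.h>

/* Assumption: use SSE2 intrinsics only when the compiler reports support. */
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2_SKETCH 1
#endif

/* Hypothetical helper: out[i] = clamp(in[i] + 128, 0, 255) for 16 samples. */
static void level_shift_16(const int16_t *in, uint8_t *out)
{
#ifdef HAVE_SSE2_SKETCH
  /* Unaligned loads of 8 x int16 each. */
  __m128i lo = _mm_loadu_si128((const __m128i *)in);
  __m128i hi = _mm_loadu_si128((const __m128i *)(in + 8));
  /* Pack to 16 x int8 with signed saturation to [-128, 127]. */
  __m128i packed = _mm_packs_epi16(lo, hi);
  /* Adding 0x80 with wraparound maps [-128, 127] onto [0, 255]. */
  packed = _mm_add_epi8(packed, _mm_set1_epi8(-128));
  _mm_storeu_si128((__m128i *)out, packed);
#else
  /* Scalar fallback producing the same results. */
  int i;
  for (i = 0; i < 16; i++) {
    int v = in[i] + 128;
    out[i] = (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
  }
#endif
}
```

The quality of the machine code generated from a routine like this depends entirely on the compiler, which is why an intrinsics implementation may need to coexist with the assembly implementation for some time.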