Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify codepath selection #130

Open
diizzyy opened this issue Dec 26, 2023 · 4 comments
Open

Simplify codepath selection #130

diizzyy opened this issue Dec 26, 2023 · 4 comments
Assignees
Labels

Comments

@diizzyy
Copy link

diizzyy commented Dec 26, 2023

Is there a reason why codepath selection logic seems to be trying to work around compilers? Wouldn't it be easier and cleaner to just rely on what gets passed to the compiler?

Drop all HAVE_EXTENSION and just check for __EXTENSION__ instead?
https://stackoverflow.com/questions/28939652/how-to-detect-sse-sse2-avx-avx2-avx-512-avx-128-fma-kcvi-availability-at-compile

@aklomp aklomp self-assigned this Dec 26, 2023
@aklomp
Copy link
Owner

aklomp commented Dec 26, 2023

Good question, you'd think we could just use those macros at compile time and all would be good. But unfortunately it's a case of Chesterton's Fence.

On x86, this library supports runtime dispatch for multiple architectures, so that the same library can run unmodified on everything from an Atom to the latest AVX512 machine. This simplifies the life of a distro packager because they can ship one binary library that works everywhere.

Let's take a concrete example and focus on the SSE2 codec for really old Pentiums. On the code path for these old Pentiums, none of the code can use any instructions newer than SSE2, or the machine will crash with an illegal instruction error. That means that the library dispatch code should be compiled for nothing higher than generic x86, and the SSE2 codec should be compiled with nothing higher than SSE2 support.

Because what happens when you compile the SSE2 codec with a compiler that has -mavx512 enabled? The compiler will happily "optimize" the code to use all sorts of newer instructions. And that will crash our Pentium, or anything that can't handle AVX512 in fact, if the compiler happens to use AVX512 instructions in its optimizations.

For that reason, this library has to have full control over the architecture arguments to the compiler. It has to ensure that the SSE2 codec is only compiled at SSE2 compatibility level and strictly nothing higher. The way to inform the library that you want support for a certain architecture, and to set the compiler flags for that certain architecture, is through the HAVE_XXX stuff.

I hope it now becomes clear that the __SSE2__ macro doesn't actuallly help much. If it's defined, you could still be generating code that won't run on an SSE2 machine.

@diizzyy
Copy link
Author

diizzyy commented Dec 26, 2023

I actually asked about this to simplify packaging. While I do understand your concern about safeguarding for incorrect compiler flags I would argue that it causes more issues than it solves. If you define lets say -march=tigerlake (11th gen with AVX512) and expect the final binary to run on a Haswell based Pentium (no AVX instructions at all) that's hardly the project at fault. That would also cause the majority of packages by far to be unusable since they'll be optimized for the target hardware defined by compiler flags. Some projects do however provide a "baseline" option which disables all optimization (ffmpeg for example) but that's usually for debugging rather than safeguarding.

fwiw, I started hacking on this mostly as an exercise but also to see if it were possible to simplify packaging.

@aklomp
Copy link
Owner

aklomp commented Dec 26, 2023

You're advocating for a library that selects a single codepath at compile time. That's a reasonable position to take, but it's not the current design intent of this library. This library is positioned as a multi-arch, runtime-dispatched library that can run on all sorts of hardware without recompilation. It's also intended to provide the best possible performance by dispatching based on the capabilities of the actual hardware, rather than whatever baseline the packager chose. You are free to question these assumptions (and I'm not too happy with them after many years of maintaining this library), but they are quite fundamental. Changing them would mean a major break in compatibility.

Full disclosure: I am unhappy about the runtime dispatching and would like to get rid of it (or rather make it optional) at some point in the future. I also agree with you that the compilation and configuration system is complex and messy. To fix this, I have some not-very-concrete plans about rebranding this project and doing a large rewrite at some point.

I'd like to have standalone codecs which are exported as first-class symbols and are callable directly by the user, so that the user can do their own runtime dispatch. The internal runtime dispatcher needs to be optional, or maybe needs to disappear entirely, because I don't want to maintain platform selection code any more. The tree of #includes needs to be cleaned up. I'd like to simplify the compilation system and supply the arch-specific compiler arguments through some other mechanism, possibly #pragma's embedded in the codecs. The build system should be drastically simplified. Cleaner directory layout, and so on, and so on.

If you have any good suggestions in this area after hacking on the library, I'm certainly interested.

@BurningEnlightenment
Copy link
Contributor

I'd like to simplify the compilation system and supply the arch-specific compiler arguments through some other mechanism, possibly #pragma's embedded in the codecs.

This would introduce strong coupling between compiler flags and source files which I really dislike. And while I concur that the buildsystem can be simplified, I find it surprising that you'd want to move details about the build process from the buildsystem into the source files.

I've rewritten the BLAKE3 buildsystem within the last year--a library which faces similar build challenges--and I've arrived at a simpler design by dropping most of the compiler introspection support and fine grained codec (SSE/AVX/...) selection.

I'd like to have standalone codecs which are exported as first-class symbols and are callable directly by the user, so that the user can do their own runtime dispatch.

Sure this is fine for simple cases, but as soon as you have more than one independent component/library which takes a dependency on this library you will get multiple runtime dispatch implementations (each has to include their own) or have to make the optional component... uh... non-optional 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants