crypto: uniform runtime capability detection #3512

patricklodder · 2024-04-06T22:13:14Z

Because we want to check for hardware capabilities at runtime among a set of current experimental features when they mature, it's good to have a uniform, central facility that checks capabilities. Detection code is riddled with macros making it hard to review/debug, because for each architecture/compiler there can be quirks.

This PR does 3 things:

It backports compat/cpuid.h from Bitcoin Core, to provide a generic, low level interface for reading out x86 CPU properties. This helps us in the future outside of crypto algorithms too, because we'll need it for RNG enhancements; see for example this finding on a proposed RDSEED implementation.
It implements a detection function in crypto/hwcap.*, where all capabilities can be tested (once), and decoupled from implementation selectors. It can be easily extended by enumerating new booleans into the HardwareCapabilities struct, which can then be detected in the DetectHWCapabilities() function. The values in this struct can also be overridden in init.cpp in the future, for example: there are plenty products where you would not want to run RDSEED or AVX2 on because either the implementation is terribly slow or has serious security flaws. This construct allows us to create a facility that turns those off using a startup argument that overrides the detection. This would allow users to simply run a released binary but runtime override what features to use, sparing them the need for compile expertise.
Implementation on the scrypt-sse2 experimental code, which is now a lot cleaner. I've slightly refactored it to have namespaces, like the Bitcoin sha256 and the sha512 proposal from crypto: added runtime checks for SHA hardware #3188 has.

Testing

I think this needs a couple of specific tests that we don't have a CI for:

Compiling with clang
Compiling with MSVC
Compiling with newer mingw
Sync at least testnet with the experimental scrypt code, and once without it, because scrypt is of course part of consensus code.

Backported from: 4437d6e 63c16ed 723c796

A generic and reusable hardware capability detection mechanism to not query and reimplement multi-architecture code all over the place.

victorsk2019 · 2024-04-10T18:40:01Z

Hi,

I ran bench_dogecoin for this PR and original 1.15.0-dev branch on my Dell XPS Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz system without explicitly enabling scrypt-sse2 support.

For this PR I've got for scrypt:
Scrypt,2048,0.000507116317749,0.000533625483513,0.000520408619195,1119821,1178271,1149121

For original 1.15.0-dev benchmark run I've got:
Scrypt,4096,0.000255910679698,0.000260507687926,0.000257457257248,565081,575221,568493

Judging by the results (and skimming over the code), my understanding is that this PR detects and automatically sets SSE2 CPU features if they are supported by hardware? This is really awesome! Does this mean sse2 and avx2 features can be regarded as standard and not that much "experimental" features anymore? The reason why I ask is in my net-p2p/dogecoin-qt I make it a requirement to specify experimental use flag as part of avx2 and scrypt-sse2 selection options (to model our current requirement for these features.

However, if CPU supports such features as avx2 and/or sse2, it is more preferable in Gentoo ecosystem to automatically detect and set CPU features during package installation without explicitly specifying them with USE flag (something similar to what this PR aims at doing), provided the user correctly specifies them for CPU_FLAGS_X86 variable in make.conf. Does this mean I can modify net-p2p/dogecoin-qt ebuild to automatically use sse2 and avx2 during package installation without experimental requirement?

Thank you,
Victor.

patricklodder · 2024-04-10T19:25:50Z

Thank you.

without explicitly enabling scrypt-sse2 support

This PR does not change the experimental-ness of scrypt-sse2 and you should still pass --enable-experimental --enable-scrypt-sse2

For this PR I've got [..]

Those are odd results, could you please attach your config.log for both versions?

Judging by the results (and skimming over the code), my understanding is that this PR detects and automatically sets SSE2 CPU features if they are supported by hardware?

It already did that before - I just made the method available generically.

However, if CPU supports such features as avx2 and/or sse2, it is more preferable in Gentoo ecosystem to automatically detect and set CPU features during package installation

We can support that once we take this out of experimental. However, the results you posted are not consistent with what I would expect!

victorsk2019 · 2024-04-10T19:37:35Z

Hi,

Thank you for clarifying. Attaching both config.log files as requested.
config_3512.log
config_orig.log

patricklodder · 2024-04-10T20:19:28Z

I reviewed the logs... your result is a 2x worse performance when compiling without SSE2 enabled. I will try to reproduce these results, because that would be an unintended effect.

victorsk2019 · 2024-04-11T18:03:02Z

Hi,

Just a quick note that I also ran test on another Linux installation in VirtualBox and got following compilation error:
sync.cpp:60:14: error: ‘set’ in namespace ‘std’ does not name a template type 60 | typedef std::set<std::pair<void*, void*> > InvLockOrders; | ^~~ sync.cpp:14:1: note: ‘std::set’ is defined in header ‘<set>’; did you forget to ‘#include <set>’? 13 | #include <boost/thread.hpp> +++ |+#include <set> 14 | sync.cpp:72:5: error: ‘InvLockOrders’ does not name a type; did you mean ‘LockOrders’? 72 | InvLockOrders invlockorders; | ^~~~~~~~~~~~~ | LockOrders sync.cpp: In function ‘void push_lock(void*, const CLockLocation&, bool)’: sync.cpp:123:18: error: ‘struct LockData’ has no member named ‘invlockorders’; did you mean ‘lockorders’? 123 | lockdata.invlockorders.insert(p2); | ^~~~~~~~~~~~~ | lockorders sync.cpp: In function ‘void DeleteLock(void*)’: sync.cpp:172:18: error: ‘struct LockData’ has no member named ‘invlockorders’; did you mean ‘lockorders’? 172 | lockdata.invlockorders.erase(invitem); | ^~~~~~~~~~~~~ | lockorders sync.cpp:175:5: error: ‘InvLockOrders’ has not been declared 175 | InvLockOrders::iterator invit = lockdata.invlockorders.lower_bound(item); | ^~~~~~~~~~~~~ sync.cpp:176:12: error: ‘invit’ was not declared in this scope; did you mean ‘int’? 176 | while (invit != lockdata.invlockorders.end() && invit->first == cs) { | ^~~~~ | int sync.cpp:176:30: error: ‘struct LockData’ has no member named ‘invlockorders’; did you mean ‘lockorders’? 176 | while (invit != lockdata.invlockorders.end() && invit->first == cs) { | ^~~~~~~~~~~~~ | lockorders sync.cpp:179:18: error: ‘struct LockData’ has no member named ‘invlockorders’; did you mean ‘lockorders’? 179 | lockdata.invlockorders.erase(invit++); | ^~~~~~~~~~~~~ | lockorders make[2]: *** [Makefile:6600: libdogecoin_util_a-sync.o] Error 1 make[2]: Leaving directory '/home/me/projects/fromgit/dogecoin/PR-3512/dogecoin/src' make[1]: *** [Makefile:9882: all-recursive] Error 1 make[1]: Leaving directory '/home/me/projects/fromgit/dogecoin/PR-3512/dogecoin/src' make: *** [Makefile:695: all-recursive] Error 1

This error was fixed by adding #include <set> in sync.cpp (as message advises). I ran bench_dogecoin on that installation and got this for Scrypt:

Scrypt,1792,0.000569975003600,0.000772126019001,0.000594207617853,1258510,1704819,1312008

config_vbox.log

patricklodder · 2024-04-11T20:11:12Z

@victorsk2019 2 notes:

Using --enable-debug may not be optimal when doing benchmarks, as that compiles with -O0 (and your make log must be full of warnings if you do this on Gentoo). Without it, you also do not get the include error.
On 1.15 you can use --enable-c++14 instead of adding the CXX flag; you can see in your config.log that it adds 2 c++ standard instructions, but only does checks for c++11. (And after build: update AX_CXX_COMPILE_STDCXX to serial 18 #3494 goes in, I will also propose --enable-c++17.)

victorsk2019 · 2024-04-11T23:31:07Z

Thank you for additional information. I've recompiled this PR's code without --enable-debug and got the following bench results for Scrypt (this is probably what was expected):
Scrypt,4096,0.000250319950283,0.000264337286353,0.000253840873484,552764,583719,560546
Maybe had something to do with mutexes locking ~~processes~~ threads causing slower benchmark performance 😕

patricklodder added 3 commits April 6, 2024 12:16

compat: introduce unified cpuid

1cc0a41

Backported from: 4437d6e 63c16ed 723c796

crypto: detect hardware capabilities uniformly

bdfbd0f

A generic and reusable hardware capability detection mechanism to not query and reimplement multi-architecture code all over the place.

crypto: use generic hwcap detection for scrypt sse2

f5c21c4

patricklodder added enhancement consensus labels Apr 6, 2024

patricklodder added this to the 1.15.0 milestone Apr 6, 2024

patricklodder requested a review from a team April 6, 2024 22:13

patricklodder added this to 🚀 needs review in Review & merge board Apr 6, 2024

patricklodder mentioned this pull request Apr 7, 2024

crypto: added runtime checks for SHA hardware #3188

Open

patricklodder mentioned this pull request Apr 11, 2024

Missing include for std::set in sync.cpp #3516

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

crypto: uniform runtime capability detection #3512

crypto: uniform runtime capability detection #3512

patricklodder commented Apr 6, 2024 •

edited

victorsk2019 commented Apr 10, 2024

patricklodder commented Apr 10, 2024

victorsk2019 commented Apr 10, 2024

patricklodder commented Apr 10, 2024

victorsk2019 commented Apr 11, 2024

patricklodder commented Apr 11, 2024

victorsk2019 commented Apr 11, 2024 •

edited

crypto: uniform runtime capability detection #3512

Are you sure you want to change the base?

crypto: uniform runtime capability detection #3512

Conversation

patricklodder commented Apr 6, 2024 • edited

Testing

victorsk2019 commented Apr 10, 2024

patricklodder commented Apr 10, 2024

victorsk2019 commented Apr 10, 2024

patricklodder commented Apr 10, 2024

victorsk2019 commented Apr 11, 2024

patricklodder commented Apr 11, 2024

victorsk2019 commented Apr 11, 2024 • edited

patricklodder commented Apr 6, 2024 •

edited

victorsk2019 commented Apr 11, 2024 •

edited