
Automatic detection of CPU capabilities. #45

Open
dietmarwo opened this issue Oct 19, 2022 · 2 comments

Comments

dietmarwo commented Oct 19, 2022

Is there a way the CPU capabilities can be detected automatically?
As of today, EigenRand is used in all of the optimization algorithms of fcmaes, currently with SSE2 support hard-coded. It would be better if AVX support were detected automatically, since fcmaes is used from Python and I cannot predict on which CPUs it will be deployed.

But I already see a significant speedup, especially with machine learning tasks (EvoJax uses some of my algorithms), which require up to a million random variable generations per iteration (each lasting a few seconds).
Thanks for this great project!

bab2min (Owner) commented Oct 20, 2022

Hi @dietmarwo,
Like Eigen, EigenRand targets compile-time architecture selection only. Dynamic dispatch across CPU architectures cannot be supported, because Eigen, the library underlying EigenRand, does not support dynamic dispatch.

For Python deployment, there are a few options you can choose from.

  1. Deploying by source code:

Since the package is compiled on the native machine (e.g. with gcc's -march option), you don't need to care about which SIMD instruction set is supported; the compiler will select the best one. However, this option doesn't work well on Windows.

  2. Deploying multiple compiled binaries in one package and selecting the best one at runtime using cpuinfo, py-cpuinfo, etc.:

It is the most convenient distribution method for users, but it is quite difficult to configure the deployment.

I chose this option for my Python package tomotopy, which uses Eigen & EigenRand. You can see a simple example at https://github.com/bab2min/tomotopy/blob/main/src/python/py_rt.cpp. If you want to see the internal structure of the package, you can unzip any of the wheel files at https://pypi.org/project/tomotopy/#files. A minimal Python-side sketch of this approach is shown after this list.

  3. Deploying multiple versions by SIMD architecture and letting users choose:

A simple solution, but users need to know the details of their CPU architecture.
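
To illustrate option 2, here is a minimal Python-side sketch of selecting a prebuilt binary at import time with py-cpuinfo. The variant module names (_core_sse2, _core_avx2, _core_avx512) are hypothetical and not taken from fcmaes or tomotopy; a real package would ship whichever variants it actually builds.

```python
# Hypothetical __init__.py: pick the best prebuilt extension at import time.
# Assumes the wheel ships several variants of the same compiled module,
# e.g. _core_sse2, _core_avx2, _core_avx512 (illustrative names only).
import importlib

import cpuinfo  # pip install py-cpuinfo


def _select_simd_variant():
    flags = set(cpuinfo.get_cpu_info().get("flags", []))
    # Check from most to least capable; fall back to the SSE2 build.
    if "avx512f" in flags:
        return "_core_avx512"
    if "avx2" in flags:
        return "_core_avx2"
    return "_core_sse2"


# Import the selected variant and re-export it under a stable name.
_core = importlib.import_module("." + _select_simd_variant(), package=__name__)
```

The py_rt.cpp linked above performs a similar selection on the native side, but the idea is the same: detect the CPU flags once and load the matching binary.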

dietmarwo (Author) commented

Thanks for the detailed answer. For now I decided to use only SSE2 for the public version and compile
an AVX version only for personal use. Performance of random number generation is only relevant for high-dimensional
use cases such as machine learning (two of the fcmaes C++ algorithms are wrapped in
https://github.com/google/evojax/tree/main/evojax/algo). More important was the quality of the generated random sequence, which is the main reason I switched to EigenRand.
