
Automatic detection of CPU capabilities. #45

Open
dietmarwo opened this issue Oct 19, 2022 · 2 comments

Comments

dietmarwo commented Oct 19, 2022

Is there a way the CPU capabilities can be detected automatically?
As of today, EigenRand is used in all of the optimization algorithms of fcmaes, currently with SSE2 support hard-coded. It would be better if AVX support were detected automatically, since fcmaes is used from Python and I cannot predict on which CPUs it will be deployed.

But I already see a significant speedup, especially with machine learning tasks (EvoJax uses some of my algorithms), which require up to a million random variable generations per iteration (each lasting a few seconds).
Thanks for this great project!

bab2min (Owner) commented Oct 20, 2022

Hi @dietmarwo,
Like Eigen, EigenRand targets compile-time architecture selection only. Dynamic dispatch across CPU architectures cannot be supported, because Eigen, the library underlying EigenRand, does not support dynamic dispatch.

For Python deployment, there are a few options you can choose from.

  1. Deploying by source code:

Since the package is compiled on the native machine (e.g. with gcc's -march option), you don't need to care about which SIMD instruction set is supported; the compiler will select the best one. However, this option doesn't work well on Windows.

  2. Deploying multiple compiled binaries in one package and selecting the best one at runtime using cpuinfo, py-cpuinfo, etc.:

It is the most convenient distribution method for users, but it is quite difficult to configure the deployment.

I chose this option for my Python package tomotopy, which uses Eigen & EigenRand. You can see a simple example at https://github.com/bab2min/tomotopy/blob/main/src/python/py_rt.cpp. If you want to see the internal structure of the package, you can unzip any of the wheel files at https://pypi.org/project/tomotopy/#files. A minimal Python-side sketch of this approach is shown after this list.

  3. Deploying multiple versions by SIMD architecture and letting users choose:

A simple solution, but users need to know the details of their CPU architecture.
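
To illustrate option 2, here is a minimal Python-side sketch of selecting a prebuilt binary at import time with py-cpuinfo. The variant module names (_core_sse2, _core_avx2, _core_avx512) are hypothetical and not taken from fcmaes or tomotopy; a real package would ship whichever variants it actually builds.

```python
# Hypothetical __init__.py: pick the best prebuilt extension at import time.
# Assumes the wheel ships several variants of the same compiled module,
# e.g. _core_sse2, _core_avx2, _core_avx512 (illustrative names only).
import importlib

import cpuinfo  # pip install py-cpuinfo


def _select_simd_variant():
    flags = set(cpuinfo.get_cpu_info().get("flags", []))
    # Check from most to least capable; fall back to the SSE2 build.
    if "avx512f" in flags:
        return "_core_avx512"
    if "avx2" in flags:
        return "_core_avx2"
    return "_core_sse2"


# Import the selected variant and re-export it under a stable name.
_core = importlib.import_module("." + _select_simd_variant(), package=__name__)
```

The py_rt.cpp linked above performs a similar selection on the native side, but the idea is the same: detect the CPU flags once and load the matching binary.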

dietmarwo (Author) commented

Thanks for the detailed answer. For now I decided to use only SSE2 for the public version and compile
an AVX version only for personal use. Performance of random number generation is only relevant for high-dimensional
use cases such as machine learning (two of the fcmaes C++ algorithms are wrapped in
https://github.com/google/evojax/tree/main/evojax/algo). More important was the quality of the generated random sequence, which is the main reason I switched to EigenRand.
