Intel compiler optimization #1808
Thanks a lot for the finding! This is really interesting.
I will share this plot, which shows some of the options (not a complete list of what is possible). It was produced on an Intel Xeon processor of a somewhat older model (a few years old). It also shows how the CPU time varies with the number of cores, to see how well the code scales for a cantilever beam simulation with around 20,000 shell elements (I would not expect it to scale very well beyond 4 cores). You can see that adding -Ofast and -xHost and removing the -ax... option reduces the CPU time. Adding -static gives no noticeable improvement over -Ofast in this plot. The black vertical line indicates a factor of two in the total CPU time. (Plot attached: Compiler_comparisons_CPU_time.pdf) Note that the speed-up might be even bigger on newer CPUs.
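When reading a core-count sweep like this one, the usual strong-scaling metrics are the speed-up S_p = T_1 / T_p and the parallel efficiency E_p = S_p / p. Below is a minimal Python sketch, assuming the timings are elapsed (wall-clock) times for the same model on 1 and p cores. The function names and the numbers in the usage block are hypothetical and are not taken from the attached plot.

```python
def speedup(t_ref: float, t_new: float) -> float:
    """Speed-up of a run relative to a reference run (> 1 means faster)."""
    if t_ref <= 0 or t_new <= 0:
        raise ValueError("timings must be positive")
    return t_ref / t_new


def parallel_efficiency(t_one_core: float, t_n_cores: float, n_cores: int) -> float:
    """Strong-scaling efficiency: speed-up on n cores divided by n (1.0 = ideal)."""
    if n_cores < 1:
        raise ValueError("n_cores must be >= 1")
    return speedup(t_one_core, t_n_cores) / n_cores


if __name__ == "__main__":
    # Hypothetical elapsed times in seconds, NOT measurements from the plot.
    timings = {1: 100.0, 2: 55.0, 4: 32.0, 8: 25.0}
    for n, t in timings.items():
        print(f"{n} cores: speed-up {speedup(timings[1], t):.2f}, "
              f"efficiency {parallel_efficiency(timings[1], t, n):.2f}")
```

An efficiency that drops well below 1.0 past a certain core count is what "not expected to scale very well beyond 4 cores" would look like in these terms.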
As requested, the details of the CPU are: Intel(R) Xeon(R) Gold 6234 CPU @ 3.30GHz (x86_64), 3300 MHz, 515398 MB RAM
Is your feature request related to a problem? Please describe.
I noticed that, with the default flags, the Intel compiler does not produce a significantly faster executable than the gfortran compiler. I experimented a bit with the compiler flags and found that different flags can make the engine more than 50% faster.
Describe the solution you'd like
I think it would be ideal if the build produced, by default, the fastest possible OpenRadioss executable for the machine it is compiled on.
The current default flags in cmake_linux64_intel.txt (lines 77 and 78):
-axSSE3,COMMON-AVX512 -no-fma -O3 -fp-model precise -fimf-use-svml=true -qopenmp
can be replaced by:
-no-fma -Ofast -xHost -static -fp-model precise -fimf-use-svml=true -qopenmp
and this change alone already makes the code 50% faster. There may be even better combinations of compiler flags.
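"50% faster" can be read two ways: run time cut in half (a factor of two, as marked by the line in the plot) or 50% more runs per unit time (run time cut by one third). A small Python sketch of both readings; the function names and the example timings are hypothetical illustrations, not benchmark results.

```python
def time_reduction(t_old: float, t_new: float) -> float:
    """Fraction of run time saved, e.g. 0.5 means the new run takes half as long."""
    return 1.0 - t_new / t_old


def throughput_gain(t_old: float, t_new: float) -> float:
    """Relative increase in runs per unit time, e.g. 0.5 means 1.5x as many runs."""
    return t_old / t_new - 1.0


if __name__ == "__main__":
    # Hypothetical timings in seconds: old flags vs new flags.
    t_old, t_new = 300.0, 200.0
    print(f"time saved: {time_reduction(t_old, t_new):.0%}")       # 33%
    print(f"throughput gain: {throughput_gain(t_old, t_new):.0%}")  # 50%
```

Stating which of the two is meant (or simply giving both timings) avoids ambiguity when comparing flag sets.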
Let me know what you think!