
Intel compiler optimization #1808

Open
Fonotec opened this issue Aug 23, 2023 · 3 comments

Comments

@Fonotec

Fonotec commented Aug 23, 2023

Is your feature request related to a problem? Please describe.
I noticed that, with the default settings, the Intel compiler does not produce a significantly faster executable than gfortran. After experimenting with the compiler flags, I found that a different set of flags can make the engine more than 50% faster.

Describe the solution you'd like
Ideally, the default build should produce the fastest possible OpenRadioss executable for the machine it is compiled on.

I found that the current default flags in cmake_linux64_intel.txt (lines 77 and 78):
-axSSE3,COMMON-AVX512 -no-fma -O3 -fp-model precise -fimf-use-svml=true -qopenmp
can be replaced by:
-no-fma -Ofast -xHost -static -fp-model precise -fimf-use-svml=true -qopenmp
and this change alone makes the code about 50% faster. There may be even better combinations of compiler flags.
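One way to find which option the gain comes from is to rebuild and rerun with one change applied at a time. The Python sketch below only generates the flag strings for such a ladder, starting from the current defaults. The `CHANGES` list and the helper names are hypothetical, and how each string is passed into the CMake build is left out.

```python
import shlex

# Current defaults and proposed flags, copied from the issue text.
DEFAULT = "-axSSE3,COMMON-AVX512 -no-fma -O3 -fp-model precise -fimf-use-svml=true -qopenmp"
PROPOSED = "-no-fma -Ofast -xHost -static -fp-model precise -fimf-use-svml=true -qopenmp"

# Hypothetical ladder of single changes: (flag to remove, flag to add).
# None means there is nothing to remove or nothing to add.
CHANGES = [
    ("-O3", "-Ofast"),
    ("-axSSE3,COMMON-AVX512", None),
    (None, "-xHost"),
    (None, "-static"),
]


def apply_change(flags, remove, add):
    """Return a copy of `flags` with one flag swapped, dropped, or appended."""
    out = list(flags)
    if remove is not None:
        idx = out.index(remove)  # ValueError if the flag is not present
        if add is not None:
            out[idx] = add  # swap in place to keep the original order
        else:
            del out[idx]
    elif add is not None:
        out.append(add)
    return out


def cumulative_variants(default, changes):
    """Return (label, flag string) for the baseline and after each change."""
    flags = shlex.split(default)
    variants = [("baseline", shlex.join(flags))]
    for remove, add in changes:
        flags = apply_change(flags, remove, add)
        label = f"{remove or '(none)'} -> {add or '(none)'}"
        variants.append((label, shlex.join(flags)))
    return variants


if __name__ == "__main__":
    for label, flags in cumulative_variants(DEFAULT, CHANGES):
        print(f"{label:35} {flags}")
```

The last variant contains the same flags as the proposed set, in a slightly different order. Each intermediate build would then be timed on the same test case. Flag order may matter when options override each other, so in this sketch -Ofast stays ahead of -fp-model precise, as in the proposal.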

Let me know what you think!

@elequiniou
Contributor

Thanks a lot for this finding! This is really interesting.
We are trying to reproduce it and to understand which option(s) the improvement comes from, and whether it changes the numerical results.
The generic options we provide allow the executable to run on many platforms. Using -xHost when the compilation and run machines are the same sounds like a good tip.
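-xHost asks the Intel compiler to target the highest instruction set available on the compilation host, so a binary built that way may not run on an older or different CPU. Below is a minimal, Linux-only sketch of a pre-run check, assuming the build host's `flags` line from /proc/cpuinfo has been saved. The helper names and the `SIMD_FLAGS` subset are hypothetical and not part of OpenRadioss.

```python
from pathlib import Path

# Hypothetical subset of x86-64 SIMD extensions worth comparing.
# Names follow the Linux /proc/cpuinfo spelling (SSE3 appears there as "pni").
SIMD_FLAGS = {"pni", "ssse3", "sse4_1", "sse4_2", "avx", "avx2", "fma", "avx512f"}


def cpu_flags(cpuinfo_text):
    """Return the set of flags on the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # e.g. ARM kernels list a 'Features' line instead


def missing_flags(build_flags, run_flags, of_interest=SIMD_FLAGS):
    """Extensions of interest present on the build host but absent on the run host."""
    return (build_flags & of_interest) - run_flags


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        here = cpu_flags(cpuinfo.read_text())
        print("SIMD flags on this host:", sorted(here & SIMD_FLAGS))
```

An empty result from `missing_flags` only means the listed extensions match. It does not prove that a given -xHost binary will run, because vendor checks and other instruction sets are not covered.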

@Fonotec
Author

Fonotec commented Aug 24, 2023

I am sharing this plot, which compares some of the options (not a complete list of what is possible). It was run on an Intel Xeon processor, a somewhat older model (a few years old). It also shows the variation with the number of cores, to check how well the code scales, for a cantilever beam simulation with around 20,000 shell elements (I would not expect it to scale very well beyond 4 cores). Here you can see that adding -Ofast, adding -xHost, and removing the -ax... option reduce the CPU time. Adding -static gives no noticeable further improvement over -Ofast in this plot. The black vertical line marks a factor of two in total CPU time.

Compiler_comparisons_CPU_time.pdf

Note that the speed-up might be even bigger on newer CPUs.
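The thread quotes gains both as "50% faster" and as "a factor of two", which are different amounts. Here is a small Python sketch of the usual conversions, with hypothetical helper names. The scaling part assumes elapsed (wall-clock) times per core count. If CPU time is summed over all threads, ideal scaling would keep that total constant rather than reduce it.

```python
def speedup(t_ref, t_new):
    """How many times faster the new run is than the reference run."""
    if t_ref <= 0 or t_new <= 0:
        raise ValueError("times must be positive")
    return t_ref / t_new


def percent_faster(t_ref, t_new):
    """'X% faster' in the throughput sense: (speedup - 1) * 100."""
    return (speedup(t_ref, t_new) - 1.0) * 100.0


def percent_time_saved(t_ref, t_new):
    """Reduction in run time relative to the reference, in percent."""
    return (1.0 - 1.0 / speedup(t_ref, t_new)) * 100.0


def parallel_efficiency(elapsed_by_cores):
    """Strong-scaling efficiency T(1) / (n * T(n)) from wall-clock times.

    Expects a dict {number_of_cores: elapsed_seconds} that includes 1 core.
    """
    t1 = elapsed_by_cores[1]
    return {n: t1 / (n * t) for n, t in sorted(elapsed_by_cores.items())}
```

With these definitions, the factor-of-two line corresponds to a speedup of 2, which means 100% faster or 50% less time. By contrast, "50% faster" means a speedup of 1.5, or about 33% less time.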

@Fonotec
Author

Fonotec commented Aug 24, 2023

As requested, the CPU details are: Intel(R) Xeon(R) Gold 6234 CPU @ 3.30GHz (x86_64), 3300 MHz, 515398 MB RAM
