Intel compiler optimization #1808

Fonotec · 2023-08-23T11:40:20Z

Is your feature request related to a problem? Please describe.
I noticed that, with the default flags, the Intel compiler is not significantly faster than the gfortran compiler. I played around with the compiler flags and found that a different set of flags could make the engine more than 50% faster.

Describe the solution you'd like
I think it would be ideal if the default build produced the fastest possible OpenRadioss executable for the machine you compile on.

I find that the current default flags in cmake_linux64_intel.txt (lines 77 and 78):
-axSSE3,COMMON-AVX512 -no-fma -O3 -fp-model precise -fimf-use-svml=true -qopenmp
can be replaced by:
-no-fma -Ofast -xHost -static -fp-model precise -fimf-use-svml=true -qopenmp
and this already gives code that is 50% faster. There may be even better combinations of compiler flags.
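
To make the change easier to review, here is a minimal Python sketch that diffs the two flag strings quoted above. The flag strings are copied from this post; the helper names `split_flags` and `diff_flags` are hypothetical and are not part of OpenRadioss or its build scripts.

```python
# Flag strings copied verbatim from the issue text above.
default_flags = "-axSSE3,COMMON-AVX512 -no-fma -O3 -fp-model precise -fimf-use-svml=true -qopenmp"
proposed_flags = "-no-fma -Ofast -xHost -static -fp-model precise -fimf-use-svml=true -qopenmp"


def split_flags(flags):
    """Split a flag string into options (hypothetical helper).

    A token that does not start with '-' is treated as the argument of the
    previous option, so '-fp-model precise' stays together as one entry.
    """
    options = []
    for token in flags.split():
        if token.startswith("-") or not options:
            options.append(token)
        else:
            options[-1] += " " + token
    return options


def diff_flags(old, new):
    """Return (removed, added) options, preserving their original order."""
    old_opts, new_opts = split_flags(old), split_flags(new)
    removed = [opt for opt in old_opts if opt not in new_opts]
    added = [opt for opt in new_opts if opt not in old_opts]
    return removed, added


removed, added = diff_flags(default_flags, proposed_flags)
print("removed:", removed)
print("added:  ", added)
```

This reports `-axSSE3,COMMON-AVX512` and `-O3` as removed, and `-Ofast`, `-xHost` and `-static` as added; the remaining options are unchanged. It only compares the strings and says nothing about which option is responsible for the speed-up.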

Let me know what you think!


elequiniou · 2023-08-24T13:29:33Z

Thanks a lot for the finding! This is really interesting.
We are trying to reproduce it and to better understand which option(s) the improvement comes from and whether they change the numerical answer.
The generic options we provide allow the code to run on many platforms. Using -xHost when the compile and run machines are the same sounds like a good tip.

Fonotec · 2023-08-24T13:49:20Z

I will share this plot, which shows some of the options (not a complete list of what is possible). It was produced on an Intel Xeon processor that is a few years old. The plot also shows the variation with the number of cores, to see how well the code scales for a cantilever beam simulation with around 20,000 shell elements (I would not expect it to scale very well beyond 4 cores). You can see that adding -Ofast, adding -xHost and removing the -ax... option reduce the CPU time. Adding -static gives no noticeable improvement over -Ofast on this plot. The black vertical line indicates a factor of two in the total CPU time.

Compiler_comparisons_CPU_time.pdf

Note that the speed-up might be even bigger on newer CPUs.

Fonotec · 2023-08-24T15:01:49Z

As requested, the details of the CPU are: Intel(R) Xeon(R) Gold 6234 CPU @ 3.30GHz (x86_64), 3300 MHz, 515398 MB RAM
