Check impact of -ffast-math on performance & correctness #82

Open
JamieJQuinn opened this issue Mar 27, 2024 · 5 comments

@JamieJQuinn
Collaborator

Just noticed we're setting -ffast-math (or -fast) in the release build. For a summary of why this is dangerous yet helpful, see this blog post. I haven't personally found it to affect the correctness of fluid simulations, but I also haven't found it to improve performance, so I tend to stick to plain -O3 just in case.

I think it's worth testing:

  1. whether -ffast-math actually improves performance, and
  2. whether we're potentially affecting the correctness of results (a rough comparison sketch follows below).
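
For (2), purely as a sketch of the kind of check I mean (the file names, record layout and field size below are made up), one could dump the same field from an -O3 build and an -O3 -ffast-math build and look at the largest relative difference:

```fortran
program compare_fields
  ! Sketch only: compare one output field from an -O3 build against the same
  ! field from an -O3 -ffast-math build.  File names and the field size here
  ! are hypothetical.
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 256**3
  real(dp), allocatable :: u_o3(:), u_fast(:)

  allocate(u_o3(n), u_fast(n))
  open(10, file='u_O3.bin', form='unformatted', access='stream')
  read(10) u_o3
  close(10)
  open(11, file='u_fast.bin', form='unformatted', access='stream')
  read(11) u_fast
  close(11)

  ! report the worst relative difference across the field
  print *, 'max relative difference:', &
    maxval(abs(u_fast - u_o3) / max(abs(u_o3), tiny(1.0_dp)))
end program compare_fields
```
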
@semi-h
Member

semi-h commented Apr 4, 2024

Thanks for the issue. We actually had a discussion on this around 6 months ago, and people were particularly worried about -ffast-math breaking the ordering of floating-point arithmetic (associative math). In the original codebase (Incompact3D) the derivative functions are written very carefully to specify a clear ordering when operating on data points. They're written so that the finite difference operations are always carried out as specified in the discretisations, e.g. $u_{i-3}$ is paired with $u_{i+3}$ when taking a derivative on a velocity point, or sometimes $u_{i-3}$ is paired with $u_{i+2}$ for staggered operations.
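
To make that concrete, here's an illustrative sketch (the routine, the coefficients and the spacing dx are hypothetical, not the actual Incompact3D/x3d2 code): each bracket pairs $u_{i+k}$ with $u_{i-k}$ exactly as written in the discretisation, and -ffast-math allows the compiler to re-associate those sums and break the pairing.

```fortran
module ddx_sketch
  ! Illustrative only: a made-up central-difference stencil, not the
  ! Incompact3D/x3d2 derivative routines.
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  pure function ddx_velocity(u, i, a1, a2, a3, dx) result(du)
    real(dp), intent(in) :: u(:), a1, a2, a3, dx
    integer, intent(in) :: i
    real(dp) :: du
    ! The brackets fix the rounding order: u(i+k) is paired with u(i-k)
    ! before the contributions are summed.  Under -ffast-math the compiler
    ! is free to re-associate these operations in any order.
    du = (a1 * (u(i+1) - u(i-1)) &
        + a2 * (u(i+2) - u(i-2)) &
        + a3 * (u(i+3) - u(i-3))) / dx
  end function ddx_velocity
end module ddx_sketch
```
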

Then @mathrack did some tests on the original codebase using the -ffast-math flag, which is definitely destructive to these delicate associative orderings alongside the many other things -ffast-math triggers, and as far as I remember the results were identical for a test case where an analytical solution exists. I guess this was convincing for all of us (especially for going ahead with the single, unified tds solver functions we have in the new framework), but it's a good idea to monitor this and report any differences it may cause in the quantities of interest and in performance as we move forward and support new cases in the codebase.

If I remember correctly the performance difference was around 2-3% for a part of the code, but we're still not running the entire solver on CPUs, so we don't know for certain how it'll be at the end. Considering -ffast-math can only speed up the floating-point work and we're bandwidth bound, its impact will likely be small. And when it comes to its effect on accuracy, I think it's very unlikely to have an impact, as you mentioned. In CFD simulations we operate on pairs of data with a similar order of magnitude, and if things go bad to the point where -ffast-math could affect accuracy, it probably won't make any difference; the simulation would blow up regardless. The only case I can think of where this might matter is an iLES simulation, but probably even there it won't have any measurable impact.

I also want to try some performance optimisations on the CUDA kernels that I put on hold a few months ago to do more work on the codebase. I'll try to use similar flags on the GPU and measure their impact on performance, and I think this issue is a good place to report those as well.

@pbartholomew08
Member

A very interesting article. If the observed performance difference is low, it might be better to favour the safer -O3 by default and document how to change this. Based on the article, though, we might want to check what impact it has on vectorisation across TDMA solves etc. (although given the explicit use of OMP SIMD in the CPU code at least, this should hopefully not be impacted).
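
For reference, this is the kind of loop I have in mind (a hypothetical example, not the x3d2 source): with an explicit `!$omp simd` the vectorisation request comes from the directive rather than from what -ffast-math happens to permit, so a plain -O3 build should hopefully still vectorise it.

```fortran
subroutine axpy_simd(n, a, x, y)
  ! Hypothetical example of an explicitly vectorised loop; not the x3d2 code.
  implicit none
  integer, intent(in) :: n
  real(kind(1.0d0)), intent(in) :: a, x(n)
  real(kind(1.0d0)), intent(inout) :: y(n)
  integer :: i

  ! The SIMD directive asks the compiler to vectorise this loop regardless
  ! of whether -ffast-math is in effect.
  !$omp simd
  do i = 1, n
    y(i) = y(i) + a * x(i)
  end do
end subroutine axpy_simd
```
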

@mathrack

mathrack commented Apr 4, 2024

I remember some tests on 2D TGV with similar results at machine precision for low-level and high-level optimization using gfortran. However, I am afraid this kind of test can change depending on the machine, the compiler, the compiler version, ...

@JamieJQuinn
Collaborator Author

JamieJQuinn commented Apr 9, 2024

So how do we feel about setting the default to just -O3, so that a new user compiles with no potential numerical issues (from fast-math at least...), while we document in the user guide & in the CMakeLists.txt how to change it, with info on the potential issues & performance gains?

OR we could keep fast-math on and print a warning when compiling & running, telling the user to check their results or turn the optimisations down to -O3.

👍 for option 1, 👎 for option 2, 👀 for a different suggestion.

@semi-h
Member

semi-h commented Apr 12, 2024

Having the following in the CMake file

set(CMAKE_Fortran_FLAGS_FAST "-O3 -ffast-math")

and then using -DCMAKE_BUILD_TYPE=fast when building worked for me. Does anyone know if this is something CMake actually supports, or if I'm just being lucky somehow, or missing something? If this is okay it will allow people to go ahead with their preference regarding -ffast-math.

My main concern is the vectorisation, as @pbartholomew08 mentioned, because it makes a meaningful difference, not just a few percent, and it can be tricky to get the compiler to generate the instructions, especially for AVX512, but it'll be easy to spot. Hopefully the compiler will vectorise even without -ffast-math, and I'm pretty sure it did in my earlier tests. However, [1] mentions the potential vectorisation issues for SSE and SSE2 in particular, and I'm sure it goes all the way up to AVX512 too. If it doesn't vectorise in some cases, we may consider adding a specific flag to enable vectorisation in the release build, so that the performance impact of not having -ffast-math will be minimal while not scaring anyone off. I'll probably go ahead with the fast build most of the time and do occasional checks out of curiosity, reporting anything interesting.

[1] https://gcc.gnu.org/wiki/FloatingPointMath
