I generate databases with gprMax on different computers, and I have found differences in the amplitude values depending on the processor.
I use an i3-4100M (CPU), an i7-8650U (CPU), a Xeon E5410 (CPU), a Xeon Gold 6130 (CPU) and a Tesla M60 (GPU).
I found similar amplitude levels between the i3-4100M and the Xeon E5410, and between the i7-8650U and the Xeon Gold 6130. But there are differences in the other cases (e.g. i3-4100M vs. i7-8650U, Xeon E5410 vs. Xeon Gold 6130, or Tesla M60 vs. all of them). The difference appears to be proportional to the amplitude.
I performed my tests in single and double precision.
I would like to know the reason for this gap. I suspect the architecture/generation of the processors is at issue. But my real question is: what is the correct value? Perhaps I should compare against an analytical model to obtain the true values, and so find out which processor is closest to reality.
I am sending you an attachment with an example ".in" file and the ".out" response for each processor.
Thank you very much !
Best regards,
Greg.
Hi Craig,
Thank you for your answer. At first, when I plotted Ez, I didn't see the difference. But when you compare the values precisely (with h5py, for example), there is a small difference.
For example, taking the maximum Ez value in single precision for each case I sent you, I have:
EzMax [i3-4100M] = 2358.80419921875 V/m
EzMax [i7-8650U] = 2360.99877929687 V/m
EzMax [Xeon E5410] = 2358.80419921875 V/m
EzMax [Xeon Gold 6130] = 2360.99877929687 V/m
EzMax [Tesla M60] = 2361.0029296875 V/m
I get the same results for the i3-4100M vs. the Xeon E5410, and for the i7-8650U vs. the Xeon Gold 6130.
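To quantify the gap, here is a quick check (using the EzMax values reported above) showing that the two clusters of results differ by roughly 0.09 %, i.e. a relative offset rather than an absolute one, consistent with accumulated rounding error:

```python
# Relative difference between the two clusters of single-precision EzMax
# values reported above.
ez_sse = 2358.80419921875   # i3-4100M and Xeon E5410
ez_avx = 2360.99877929687   # i7-8650U and Xeon Gold 6130

rel_diff = abs(ez_avx - ez_sse) / ez_sse
print(f"relative difference: {rel_diff:.2e}")  # about 9.30e-04, i.e. ~0.09 %
```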
I am attaching a spreadsheet comparing the Ez values in single precision from each .out file.
Thank you very much !
Best regards,
Greg.
I have investigated this some more, and I think the bottom line is that the differences you are seeing are related to how floating point calculations are handled, and how floating point numbers are approximated on different Intel CPU architectures and on NVIDIA GPUs. There are potentially several factors at play here, and I recommend you have a read of the following articles:
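As a toy illustration of the underlying issue (this is not gprMax code): floating-point addition is not associative, so the grouping of operations chosen by a particular compiler, SIMD instruction set, or GPU changes the last bits of a result, and those rounding differences can accumulate over millions of cells and time steps:

```python
# Floating-point addition is not associative: the same three numbers
# summed in a different order give results that differ in the last bit.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a, b, a == b)   # 0.6000000000000001 0.6 False
```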
I think the first thing to try is to follow the advice in the Intel CNR document, and force the Intel Math Kernel Library to follow the same code path irrespective of the Intel CPU type. It looks like you can do this by setting the MKL_CBWR environment variable before running gprMax, e.g.
I would try the SSE2 setting first, as it seems to be the lowest common denominator between different Intel CPUs, but it may also be worth trying the AVX2,STRICT setting.
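A minimal sketch of setting this, assuming a bash-like shell (on Windows, use `set` instead of `export`; the input filename is illustrative):

```shell
# Force Intel MKL to follow the same (SSE2) code path on every Intel CPU.
export MKL_CBWR=SSE2          # or: export MKL_CBWR=AVX2,STRICT
# ...then run gprMax as usual in the same shell, e.g.:
# python -m gprMax my_model.in
```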
The second thing to try is to force the NVIDIA compiler (nvcc) to not use the Fused Multiply-Add (FMA) operation which is possible on GPU but (as far as I know) does not exist on CPU. This may bring the GPU result closer to the CPU one. You can do this by making a small change to the gprMax code (NB no need to recompile gprMax after this change, just run it). If you go into the module 'model_build_run.py' and to line 498, you should see the code
compiler_opts = None
change it to,
compiler_opts = ['-fmad=false']
The above assumes you are not using MS Windows. If you are then change line 496 to add the above argument to the compiler options list.
I am interested to know how this goes, and will give it some more thought in the meantime. I'm also going to add it to our issue tracker on GitHub, so we have a record of it to refer back to in the future.
Quoted post from our Google Group - https://groups.google.com/g/gprmax/c/KLyUH4pnPxE/m/tEWRC3XpAQAJ