user@debian:~/FinanceBench/Monte-Carlo/OpenMP$ make
g++ -O3 -march=native -fopenmp monteCarloEngine.c -o monteCarloEngine.exe
user@debian:~/FinanceBench/Monte-Carlo/OpenMP$ ./monteCarloEngine.exe
Number of Samples: 400000
Run on CPU using OpenMP
Processing time on CPU using OpenMP: 33599.273438 (ms)
Average Price (CPU computation): 8.096899
Run on CPU
Processing time on CPU: 4020.650879 (ms)
Average Price (CPU computation): 8.085914
Speedup Using OpenMP: 0.119665
user@debian:~/FinanceBench/Monte-Carlo/OpenMP$
gcc 8.2.0, amd64
I have 16 cores and 32 hardware threads (AMD Threadripper 2950X).
When the benchmark runs, I noticed two things:
- Only 800% of CPU is used (8 threads) instead of 3200%.
- 85% of each core's time is spent in the kernel, probably on futexes (strace does show a lot of spinning on futex; I guess this comes from gcc's OpenMP implementation).
Instead of the OpenMP version being 8 times faster, it is about 8-9 times slower than the normal single-threaded code path! This is horrendous.
If I edit the `#pragma omp` in `monteCarloKernelsCpu.c` to use 32 threads, it does use 32 threads (why is 8 hardcoded?) and 3200% of CPU. However, the time spent in the kernel grows to 95% on each core, and the speedup drops to 0.109, so it gets even worse.
Solution: do not use `rand()` and remove all the unnecessary omp stuff.
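For readers without the patch at hand, here is a minimal sketch of the general idea, not the code from this patch. In glibc, `rand()` keeps one hidden state shared by all threads and guards it with a lock, which is consistent with the futex spinning seen in strace. The sketch gives every path its own private generator state instead. The names `next_uniform`, `simulate_path` and `average_over_paths` are hypothetical, the generator is splitmix64 swapped in for illustration, and the "path" is a stand-in (a mean of uniforms) rather than the benchmark's pricing model. The loop carries no `num_threads` clause, so the OpenMP runtime picks the thread count (it can be set with `OMP_NUM_THREADS`).

```c
#include <assert.h>
#include <stdint.h>

/* splitmix64: advances a 64-bit counter and returns a well-mixed value.
 * The state is owned by the caller, so nothing is shared between threads. */
static uint64_t splitmix64(uint64_t *state)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Uniform double in [0, 1), built from the top 53 bits of the output. */
static double next_uniform(uint64_t *state)
{
    return (double)(splitmix64(state) >> 11) * (1.0 / 9007199254740992.0);
}

/* Hypothetical stand-in for one Monte Carlo path: the mean of `steps`
 * uniform draws. The seed is derived from the path index (an assumption
 * made for this sketch), so results do not depend on the thread count. */
static double simulate_path(uint64_t path_index, int steps)
{
    assert(steps > 0);
    uint64_t state = path_index * 0xD1B54A32D192ED03ULL + 1; /* private state */
    double sum = 0.0;
    for (int i = 0; i < steps; ++i)
        sum += next_uniform(&state);
    return sum / steps;
}

/* Averages all paths. The reduction clause gives each thread a private
 * partial sum, so the loop body needs no locks or atomics. Without
 * -fopenmp the pragma is ignored and the loop runs serially. */
double average_over_paths(long num_paths, int steps)
{
    double total = 0.0;
    #pragma omp parallel for reduction(+:total) schedule(static)
    for (long p = 0; p < num_paths; ++p)
        total += simulate_path((uint64_t)p, steps);
    return total / (double)num_paths;
}
```

Calling `average_over_paths(400000, 64)` should give a value close to 0.5 for this stand-in path. Seeding by path index is only one option; a per-thread state seeded once inside the parallel region would avoid the shared lock equally well.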
In `getPathCpu`:
And CPU kernel, adjusted:
Result?
Still correct (actually finally correct) computations.
So, in total my patch makes it 171 times faster than before!