Monte-Carlo OpenMP benchmark is extremely slow due to use of the non-thread-safe rand(). Patch included to speed it up 170x! #6

baryluk opened this issue Dec 13, 2018 · 0 comments

baryluk commented Dec 13, 2018

As in the title:

user@debian:~/FinanceBench/Monte-Carlo/OpenMP$ make
g++ -O3 -march=native -fopenmp monteCarloEngine.c -o monteCarloEngine.exe
user@debian:~/FinanceBench/Monte-Carlo/OpenMP$ ./monteCarloEngine.exe
Number of Samples: 400000

Run on CPU using OpenMP
Processing time on CPU using OpenMP: 33599.273438 (ms)
Average Price (CPU computation): 8.096899

Run on CPU
Processing time on CPU: 4020.650879 (ms)
Average Price (CPU computation): 8.085914

Speedup Using OpenMP: 0.119665

user@debian:~/FinanceBench/Monte-Carlo/OpenMP$

gcc 8.2.0, amd64

I have 16 cores and 32 hardware threads (AMD ThreadRipper 2950X).

While the benchmark was running I noticed two things:

  1. Only 800% of the CPU is used (8 threads on 8 cores) instead of 3200%.
  2. About 85% of each core's time is spent in the kernel, probably on futexes; strace indeed shows a lot of live spinning on futex (I guess this is a GCC OpenMP implementation detail). See the command below.
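To see that quickly, strace's -f (follow threads) and -c (summary) flags print a per-syscall count for the whole run; in a run like this, futex should dominate both the call count and the time:

strace -f -c ./monteCarloEngine.exe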

Instead of the OpenMP version being about 8 times faster, it is actually about 8-9 times slower than the single-threaded code path! This is horrendous.

If I edit the #pragma omp in monteCarloKernelsCpu.c to use 32 threads (why is 8 hardcoded??!?), it indeed starts to use 32 threads and 3200% of CPU. However, the time spent in the kernel grows to 95% on each core! Speedup: 0.109, so even worse.
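For context, the pre-patch kernel boils down to something like the following (a rough sketch reconstructed from the symptoms above, not the exact FinanceBench source, including the guess that getPathCpu() took no seed argument): every iteration ends up calling rand(), whose single hidden state glibc protects with a lock, so the "parallel" loop mostly contends on that lock.

// Sketch only: the gist of the original OpenMP kernel, assuming it looked roughly like this.
#pragma omp parallel for num_threads(8)
for (int numSample = 0; numSample < numSamples; numSample++)
{
	float path[SEQUENCE_LENGTH];
	initializePathCpu(path);
	// getPathCpu() internally calls rand(); all 8 threads serialize on rand()'s internal lock.
	getPathCpu(path, numSample, dt, optionStructs[0]);
	samplePrices[numSample] = getPriceCpu(path[SEQUENCE_LENGTH-1]);
	sampleWeights[numSample] = DEFAULT_SEQ_WEIGHT;
}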

Solution: do not use rand(), and remove all the unnecessary OpenMP machinery:

void monteCarloGpuKernelCpuOpenMP(float* const __restrict samplePrices, float* const __restrict sampleWeights, const float* __restrict times, const float dt, const monteCarloOptionStruct* const __restrict optionStructs, const int numSamples)
{
	// One base seed for the run; each thread derives its own private seed from it,
	// so rand_r() never touches shared state and needs no locking.
	unsigned int seed = time(NULL);
	#pragma omp parallel
	{
		unsigned int my_id = omp_get_thread_num();
		unsigned int my_seed = seed + my_id;
		#pragma omp for schedule(static, 1000)
		for (size_t numSample = 0; numSample < numSamples; numSample++)
		{
			// Declare and initialize the path.
			float path[SEQUENCE_LENGTH];
			initializePathCpu(path);

			const int optionStructNum = 0;

			getPathCpu(path, numSample, dt, optionStructs[optionStructNum], &my_seed);
			const float price = getPriceCpu(path[SEQUENCE_LENGTH-1]);
		
			samplePrices[numSample] = price;
			sampleWeights[numSample] = DEFAULT_SEQ_WEIGHT;
		}
	}
}

In getPathCpu:

void getPathCpu(float* path, size_t sampleNum, float dt, monteCarloOptionStruct optionStruct, unsigned int* seedp)
{
	path[0] = getProcessValX0Cpu(optionStruct);

	for (size_t i = 1; i < SEQUENCE_LENGTH; i++)
	{
		float t = i * dt;
		// rand_r() only touches the caller-provided state, so it is safe to call concurrently.
		float randVal = ((float)rand_r(seedp)) / ((float)RAND_MAX);
		float inverseCumRandVal = compInverseNormDistCpu(randVal);
		path[i] = processEvolveCpu(t, path[i-1], dt, inverseCumRandVal, optionStruct);
	}
}
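Side note: rand_r() is POSIX (and marked obsolescent in POSIX.1-2008), so on platforms without it the same per-thread-state trick works with any tiny generator. A minimal sketch, where xorshift32 and nextUniform are illustrative names, not part of FinanceBench:

static inline unsigned int xorshift32(unsigned int* state)
{
	// Classic xorshift step; the state must start non-zero (time(NULL) + thread id is fine).
	unsigned int x = *state;
	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	return *state = x;
}

static inline float nextUniform(unsigned int* state)
{
	// Keep the top 24 bits so the value fits a float mantissa; result is in [0, 1).
	return (xorshift32(state) >> 8) * (1.0f / 16777216.0f);
}

getPathCpu() would then call nextUniform(seedp) instead of dividing rand_r(seedp) by RAND_MAX.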

And the single-threaded CPU kernel, adjusted to match:

void monteCarloGpuKernelCpu(float* samplePrices, float* sampleWeights, float* times, float dt, monteCarloOptionStruct* optionStructs, int numSamples)
{
	unsigned int seed = time(NULL);
	for (size_t numSample = 0; numSample < numSamples; numSample++)
	{
		//declare and initialize the path
		float path[SEQUENCE_LENGTH];
		initializePathCpu(path);

		int optionStructNum = 0;

		getPathCpu(path, numSample, dt, optionStructs[optionStructNum], &seed);
		float price = getPriceCpu(path[SEQUENCE_LENGTH-1]);
	
		samplePrices[numSample] = price;
		sampleWeights[numSample] = DEFAULT_SEQ_WEIGHT;
	}
}

Result?

Processing time on CPU using OpenMP: 188.975006 (ms)
Processing time on CPU: 3519.522949 (ms)

Speedup Using OpenMP: 18.624277
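(The reported speedup is simply the single-threaded time divided by the OpenMP time: 3519.52 ms / 188.98 ms ≈ 18.6.)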

The computations are still correct (actually, finally correct, now that the threads no longer race on rand()'s shared state).

So, in total, my patch makes the OpenMP version about 171 times faster than before (the new speedup of 18.62 versus the 0.109 measured with 32 threads: 18.62 / 0.109 ≈ 171)!
