Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bench/bowen/desul raja atomics #1624

Open
wants to merge 8 commits into
base: develop
Choose a base branch
from

Conversation

johnbowen42
Copy link
Contributor

Comparison of RAJA's implementation of atomic operations with desul

@johnbowen42 johnbowen42 force-pushed the bench/bowen/desul-raja-atomics branch from dad5812 to 7d0da79 Compare April 15, 2024 21:56
@johnbowen42 johnbowen42 force-pushed the bench/bowen/desul-raja-atomics branch from 7e171df to c4a9582 Compare April 23, 2024 01:57

template<int BLOCK_SZ>
struct ExecPolicyGPU {
using policy = RAJA::cuda_exec<BLOCK_SZ>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a synchronous policy.

void TimeAtomicOp(int num_iterations = 2, int array_size = 100) {
RAJA::Timer timer;

for (int i = 0; i < num_iterations; ++i) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good to not time the first run of the kernel. Then time multiple iterations of the loop, running asynchronously for gpus, instead of timing each loop individually.

@rhornung67 rhornung67 marked this pull request as ready for review April 30, 2024 19:20

template<int BLOCK_SZ>
struct ExecPolicyGPU {
using policy = RAJA::cuda_exec<BLOCK_SZ>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
using policy = RAJA::cuda_exec<BLOCK_SZ>;
using policy = RAJA::cuda_exec<BLOCK_SZ, true /*asynchronous*/>;

By default, RAJA exec policies are synchronous. To make them asynchronous, you add a template parameter with a bool true value.

#include "desul/atomics.hpp"
#include "RAJA/util/Timer.hpp"

#define N 1000000000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think N should be a command line arg to select at run time.

// GPU benchmarks
std::cout << "Executing CUDA benchmarks" << std::endl;
std::cout << INDENT << "Executing atomic add benchmarks" << std::endl;
TimeAtomicOp<ExecPolicyGPU<64>::policy, int, AtomicAdd<int, typename GPUAtomic::policy>, true>(4);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be cleaner and easier to work with to set a constexpr thread block size variable to a default value, such as 256, at the top of the file. I think we agreed in a recent group meeting that thread block size doesn't have a significant performance impact for the simple kernels in this file. @MrBurmark what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good to me, then we could vary it fairly easily if we wanted to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we could follow the pattern in RAJA Perf if we wanted to try block size sweeps.

Copy link
Member

@MrBurmark MrBurmark May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a for_each_type function into RAJA for just these kind of use cases https://github.com/LLNL/RAJA/blob/develop/include/RAJA/util/for_each.hpp#L88

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants