
Draft: Desul ordered atomic policies + litmus tests #1616

publixsubfan wants to merge 14 commits into develop
Conversation

publixsubfan

Summary

  • When the Desul atomic backend is enabled, adds atomic policies of the form RAJA::atomic_{mem_policy}_{scope} (see the usage sketch after this list), where:
    • mem_policy is one of relaxed, acquire, release, acq_rel, or seq_cst
    • scope is either empty (device scope), system for a system-wide atomic, or block for a block-wide atomic
  • Adds 2-thread litmus tests that check the ability of acquire-release and sequentially consistent (seq_cst) atomics to restore sequentially consistent behavior on relaxed-memory platforms
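
A hedged sketch of how these policies could be spelled at a call site, using the pass-by-value atomic overload quoted later in this thread; the exact set of exposed policy names and overloads depends on the build configuration, so treat this as illustrative:

```cpp
#include <RAJA/RAJA.hpp>

// Illustrative only: the policy type names follow the
// RAJA::atomic_{mem_policy}_{scope} naming scheme described above.
RAJA_HOST_DEVICE void update(int* counter, int* flag)
{
  // Device-scope atomics (empty scope suffix):
  RAJA::atomicAdd(RAJA::atomic_relaxed{}, counter, 1);

  // Release store of the flag, pairing with an acquire load elsewhere:
  RAJA::atomicExchange(RAJA::atomic_release{}, flag, 1);

  // System- and block-scoped variants:
  RAJA::atomicAdd(RAJA::atomic_seq_cst_system{}, counter, 1);
  RAJA::atomicAdd(RAJA::atomic_acq_rel_block{}, counter, 1);
}
```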

Motivation

On architectures that adopt relaxed memory models (ARM, PowerPC, most GPU architectures), the order in which memory modifications made by one thread are observed by another thread may differ from the "program order" of those operations. This can lead to unexpected results if, for example, an atomic variable is used as a mutex: writes performed inside the critical section may not be visible to another thread, because the memory subsystem reordered them.

x86 implements a much stronger, though not fully sequentially consistent, memory model (x86-TSO). The only reordering observable between threads is Store->Load reordering, where a store that is earlier in program order appears to take effect after a later load. The "Store Buffer" litmus test demonstrates this behavior: without fencing, it can appear as if the store instruction in both threads happened after their corresponding load instructions (sketched below).
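
For concreteness, a minimal store-buffer sketch in C++ with std::atomic (illustrative, not code from this PR):

```cpp
#include <atomic>

std::atomic<int> x{0}, y{0};
int r0 = 0, r1 = 0;

void thread0()
{
  x.store(1, std::memory_order_relaxed);   // store first...
  r0 = y.load(std::memory_order_relaxed);  // ...then load the other flag
}

void thread1()
{
  y.store(1, std::memory_order_relaxed);
  r1 = x.load(std::memory_order_relaxed);
}

// Run thread0 and thread1 concurrently on zeroed x and y. The outcome
// r0 == 0 && r1 == 0 is forbidden under sequential consistency, but
// x86-TSO allows it: each store can wait in a store buffer while the
// subsequent load executes. Making all four accesses
// std::memory_order_seq_cst rules that outcome out.
```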

Desul supports specifying a memory-order policy; choosing a sufficiently strong ordering pair between two threads restores a consistent view of their memory operations.

Litmus testing

The added GPU litmus tests are based on the work at https://gpuharbor.ucsc.edu/webgpu-mem-testing/ and on the paper "Foundations of Empirical Memory Consistency Testing", Kirkham et al. (OOPSLA 2020).

Litmus testing allows us to probe for relaxed memory behavior on GPU platforms. We implement a family of 2-thread tests in which each thread writes data to, or reads data from, a thread residing on a different block (see the message-passing sketch below).
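
As one instance of the pattern, a hedged message-passing (MP) sketch, assuming the RAJA::atomic_* policy spellings from this PR and an atomicLoad overload taking the policy by value; in the actual tests the two threads live on different blocks:

```cpp
#include <RAJA/RAJA.hpp>

// Thread 0 (one block): publish the payload, then set the flag with a
// release store so the write to *data cannot be reordered after it.
RAJA_HOST_DEVICE void producer(int* data, int* flag)
{
  *data = 42;
  RAJA::atomicExchange(RAJA::atomic_release{}, flag, 1);
}

// Thread 1 (another block): spin on the flag with acquire loads, then
// read the payload.
RAJA_HOST_DEVICE void consumer(int* data, int* flag, int* result)
{
  while (RAJA::atomicLoad(RAJA::atomic_acquire{}, flag) == 0) {}
  *result = *data;  // With relaxed atomics instead, result == 0 is a
                    // legal (weak) outcome on relaxed-memory GPUs.
}
```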

More references

"A Tutorial Introduction to the ARM and POWER Relaxed Memory Models"

Notes from the commit messages:

- Fiddling around with some parameters for the litmus test driver: having only a subset of the running blocks participate in the Message Passing litmus test increases the rate at which weak memory behaviors are observed, and pre-stressing memory doesn't seem to help on NVIDIA V100s.
- Store buffering is an observable behavior where a store may be reordered after a load; this exercises MemoryOrderSeqCst. Use a forall device kernel to check results, interleave the order of operations between testing threads, and only warn on a lack of observed relaxed behaviors.
- Correctly use the stress-testing formulation from the paper "Foundations of Empirical Memory Consistency Testing" (OOPSLA 2020): instead of having all stressing blocks scatter their accesses across the "stressing" array, select a small-ish subset of 64-word lines and stripe them across the stressing blocks (sketched below). This increases the stress on the contention hardware in a GPU. Synchronize testing blocks and stressing blocks together on each iteration.
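
A hypothetical sketch of that striping (CUDA-style; the constants and names here are illustrative, not taken from the PR):

```cpp
// Stripe a small set of 64-word lines across the stressing blocks so
// that many blocks contend on a few cache lines at once.
constexpr int WORDS_PER_LINE = 64;  // "64-word lines" from the note above
constexpr int NUM_LINES      = 16;  // "small-ish subset" -- assumed value

__device__ void stress_iteration(volatile int* stress_array,
                                 int block_id, int iter)
{
  // Consecutive stressing blocks target different lines, so each
  // selected line is hammered by many blocks simultaneously.
  int line = block_id % NUM_LINES;
  int word = (block_id / NUM_LINES + iter) % WORDS_PER_LINE;
  stress_array[line * WORDS_PER_LINE + word] += 1;
}
```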

trws commented Mar 20, 2024

One comment based on yours @publixsubfan: all previously existing RAJA atomics were relaxed. It's the stronger ones, and really the scopes, that are most interesting, because they can mean we can pass data without having to do all loads and stores with atomics. Block-scope atomics, with device-scope fences only when necessary, are also likely to help us greatly on El Cap. The atomics themselves won't be faster, but there will be substantially less expensive cache invalidation.

The code looks good to me with a cursory look over it. I'm not sure what we want to do with respect to these interfaces longer term, but this looks good to me as a place to explore.
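
A hedged sketch of the pattern being described, assuming Desul's order/scope interface (MemoryScopeCore stands in for block scope; the names are Desul's, but the pattern itself is illustrative):

```cpp
#include <desul/atomics.hpp>

// Do the frequent updates with block-scope atomics, and pay for a
// device-scope fence only at the point where the result must become
// visible to other blocks.
__device__ void accumulate(int* block_counter, bool publish)
{
  desul::atomic_fetch_add(block_counter, 1,
                          desul::MemoryOrderRelaxed(),
                          desul::MemoryScopeCore());
  if (publish) {
    desul::atomic_thread_fence(desul::MemoryOrderRelease(),
                               desul::MemoryScopeDevice());
  }
}
```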

@MrBurmark

Can you add something about how you get back non-atomic memory operations using this interface?

RAJA_HOST_DEVICE RAJA_INLINE T atomicAdd(AtomicPolicy, T volatile *acc, T value)
{
  using desul_order =
      typename detail::DesulAtomicPolicy<AtomicPolicy>::memory_order;
Does this mean that AtomicPolicy has to be detail_atomic_t<...> instead of DesulAtomicPolicy<...>?


trws commented Mar 22, 2024

> Can you add something about how you get back non-atomic memory operations using this interface?

When designing desul, we realized that the sequential option behaved more like a scope than a memory order. That scope is MemoryScopeCaller in the desul interface. The main reason for this is that a single implementation supports it on essentially all backends, and it makes no sense to offer a different scope when there's no coherence. Of course, there's no point in giving a memory order either, but if the scope is the caller, they at least all make sense.
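
A minimal sketch of that spelling, assuming desul's atomics interface:

```cpp
#include <desul/atomics.hpp>

// With MemoryScopeCaller, the operation only has to be atomic with
// respect to the calling thread, so it can lower to plain (non-atomic)
// memory operations -- which is how you get those back through the same
// interface.
void bump(int* p)
{
  desul::atomic_fetch_add(p, 1,
                          desul::MemoryOrderRelaxed(),
                          desul::MemoryScopeCaller());
}
```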
