Skip to content

Compare ways to access elements of a vector atomically.

License

Notifications You must be signed in to change notification settings

puzzlef/vector-atomic-access

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Compare ways to access elements of a vector atomically.

In this experiment we try to access a shared vector on mutiple threads (48 threads, using OpenMP). Our objective is to obtain the number size (in bits), beyond which memory reads and writes are not atomic (by default). Each thread generates a random number with zero parity, and writes to a random index in the vector. It then reads a number stored at another random index, and computes its parity. When a read and write occurrs to the same memory location at the same time (randomly), and the number is read before a full write is complete (or vice versa), it can be detected with 50% chance when we observe a parity of the number to be a 1 instead of a 0. However when reads and writes are atomic, a parity of 1 can never be observed. We sum the total parity of obtained on all threads.

We attempt both the default memory access, and the atomic memory access (using std::atomic). We adjust the size of the vector from 1 to 32. With each size, we initialize the vector with zeros (with parity 0) and perform a total of one million writes and reads on each thread (we repeat this 5 times to obtain an average time). Memory accesses are performed using default and atomic approach with the bit-width of number varying from 32 to 16384 bits. Numbers larger than 64 bits are represented using std::array of uint64_t.

From the results we observe a zero parity-sum for numbers of sizes 32 and 64 bits (on both default and atomic accesses). This indicates that the x86_64 architecture performs atomic reads and writes of upto 64-bit numbers (its word size) by default. For numbers larger that 64 bits, we observe a non-zero parity-sum for the default access approach, indicating that reads and writes are not atomic. Also note that as the size of the vector is increased from 1 to 32, the number of read-write conflicts decrease (as indicated by a decrease in parity-sum) as we would expect it to given the reducing chances two threads accessesing the same memory location at the same time. However, we observe no increase/decrease in the number of conflicts with increasing number size (in bits). In terms of time taken for 1M accesses, we also observe that increasing the size of the vector reduces the amount of time taken to complete. Increasing the bit-width of each number also increases the time take, but this is expected to the larger memory access requirement with larger numbers.

All outputs are saved in a gist and a small part of the output is listed here. Some charts are also included below, generated from sheets. This experiment was done with guidance from Prof. Kishore Kothapalli, Prof. Dip Sankar Banerjee, and Prof. Sathya Peri.


$ g++ -std=c++17 -O3 -latomic -fopenmp main.cxx
$ ./a.out

# OMP_NUM_THREADS=48
# [02686.566 ms; 0.0 par_sum] access32Default {size=1}
# [02742.435 ms; 0.0 par_sum] access32Atomic  {size=1}
# [02662.369 ms; 0.0 par_sum] access32Default {size=2}
# [02753.327 ms; 0.0 par_sum] access32Atomic  {size=2}
# [02675.026 ms; 0.0 par_sum] access32Default {size=4}
# [02797.603 ms; 0.0 par_sum] access32Atomic  {size=4}
# ...
#
# ...
# [1164004.250 ms; 0.0 par_sum] access16384Atomic  {size=4}
# [27715.729 ms; 2362642.0 par_sum] access16384Default {size=8}
# [1163569.125 ms; 0.0 par_sum] access16384Atomic  {size=8}
# [27597.848 ms; 1384799.0 par_sum] access16384Default {size=16}
# [1156279.500 ms; 0.0 par_sum] access16384Atomic  {size=16}
# [27711.156 ms; 576781.0 par_sum] access16384Default {size=32}
# [1156447.125 ms; 0.0 par_sum] access16384Atomic  {size=32}



References




ORG DOI