I am implementing a communication pattern where GPUs exchange parts of their local data vector. The exchanged vector entries are 'unstructured' (arbitrary indices) with a block size of ~8 KB: for each communication index, the sender GPU sends ~1k contiguous doubles.
I implemented this pattern with MPI_Type_indexed + MPI_Isend and it works. My question is: which of the following implementations is expected to be the most efficient?
1. Use MPI_Isend directly, without packing, with the newly defined indexed type (I guess Open MPI allocates some internal buffer?)
2. Use MPI_Pack with the newly defined type, copying the data to a GPU buffer, then MPI_Isend on the packed buffer
3. Use MPI_Pack with the newly defined type, copying the data to a CPU buffer, then MPI_Isend on the packed buffer
Or is there another, better way to implement such a scenario?
Thank you!