
[Feature Request] Implement MPI collective operations in KOKKOS implementation of KSPACE #4140

Open
hagertnl opened this issue Apr 17, 2024 · 2 comments · May be fixed by #4143

Comments

@hagertnl
Contributor

Creating this issue to track the implementation of `kspace_modify collective yes` in the Kokkos-enabled KSPACE package.
This has already been implemented in the non-Kokkos version of KSPACE.
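For context, the feature is controlled from the input script. A minimal usage example (the `pppm` style and accuracy value below are arbitrary placeholders) looks like:

```
kspace_style  pppm 1.0e-4
kspace_modify collective yes
```

With the KOKKOS package enabled, the same setting should switch the FFT remap from point-to-point sends/receives to MPI collectives, which is what is missing today.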

Detailed Description

The collective code in the regular (non-Kokkos) version is here:
https://github.com/lammps/lammps/blob/develop/src/KSPACE/remap.cpp#L113-L203
and it is missing from the Kokkos version: https://github.com/lammps/lammps/blob/develop/src/KOKKOS/remap_kokkos.cpp#L106

The Kokkos version needs to support running both with and without GPU-aware MPI, as is already done for the point-to-point version.
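For reference, here is a rough sketch of the general pattern (not the actual LAMMPS remap_kokkos code; the function name, `double`/`MPI_DOUBLE` types, and count/displacement arguments are placeholders): pass device buffers straight to `MPI_Alltoallv` when GPU-aware MPI is available, and stage through host mirrors otherwise.

```cpp
// Illustrative sketch only -- not the LAMMPS remap_kokkos implementation.
// Shows MPI_Alltoallv with a host-staging fallback when MPI is not GPU-aware.
#include <mpi.h>
#include <vector>
#include <Kokkos_Core.hpp>

using view_t      = Kokkos::View<double *>;       // packed device buffer
using host_view_t = typename view_t::HostMirror;  // host staging buffer

void alltoallv_maybe_gpu_aware(const view_t &d_send, const view_t &d_recv,
                               const std::vector<int> &sendcnts, const std::vector<int> &sdispls,
                               const std::vector<int> &recvcnts, const std::vector<int> &rdispls,
                               MPI_Comm comm, bool gpu_aware)
{
  if (gpu_aware) {
    // GPU-aware MPI: the library can read/write device memory directly.
    MPI_Alltoallv(d_send.data(), sendcnts.data(), sdispls.data(), MPI_DOUBLE,
                  d_recv.data(), recvcnts.data(), rdispls.data(), MPI_DOUBLE, comm);
  } else {
    // Fallback: copy packed data to the host, communicate, copy the result back.
    host_view_t h_send = Kokkos::create_mirror_view(d_send);
    host_view_t h_recv = Kokkos::create_mirror_view(d_recv);
    Kokkos::deep_copy(h_send, d_send);
    MPI_Alltoallv(h_send.data(), sendcnts.data(), sdispls.data(), MPI_DOUBLE,
                  h_recv.data(), recvcnts.data(), rdispls.data(), MPI_DOUBLE, comm);
    Kokkos::deep_copy(d_recv, h_recv);
  }
}
```

In the real remap this would sit between the existing pack and unpack kernels, and the buffer element type would be FFT_SCALAR (float or double depending on the LAMMPS FFT precision) rather than a hard-coded double.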

Further Information, Files, and Links

@hagertnl
Contributor Author

@stanmoore1 I started working on this and had a quick question -- do you know why the send/recv buffers for the Alltoallv are allocated during the execution stage of the remap:

```cpp
auto packedSendBuffer = (FFT_SCALAR *) malloc(sizeof(FFT_SCALAR) * sendBufferSize);
```

while the buffers for the MPI_Send/Irecv's are allocated during the plan phase?

```cpp
plan->sendbuf = (FFT_SCALAR *) malloc(size*sizeof(FFT_SCALAR));
```

I am considering moving the Alltoallv buffer allocation to be in the same place as the Send/Irecv's to avoid excessive memory allocations, since we do know the sizes ahead of time. Thanks!
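For illustration, a minimal sketch of that change might look like the following; the struct and function names are invented for the example and only mirror the idea of sizing the collective buffers once in the plan phase.

```cpp
// Illustrative sketch only -- struct/field names are placeholders, not LAMMPS code.
#include <cstdlib>

typedef double FFT_SCALAR;            // LAMMPS picks float or double at build time

struct remap_plan_sketch {
  FFT_SCALAR *sendbuf = nullptr;      // packed send buffer for MPI_Alltoallv
  FFT_SCALAR *recvbuf = nullptr;      // packed recv buffer for MPI_Alltoallv
  size_t sendbufsize = 0;
  size_t recvbufsize = 0;
};

// Plan phase: the per-rank counts/displacements are computed here, so the total
// buffer sizes are already known and the allocations can happen exactly once.
void plan_alloc_collective_buffers(remap_plan_sketch *plan, size_t sendsize, size_t recvsize)
{
  plan->sendbufsize = sendsize;
  plan->recvbufsize = recvsize;
  plan->sendbuf = (FFT_SCALAR *) malloc(sendsize * sizeof(FFT_SCALAR));
  plan->recvbuf = (FFT_SCALAR *) malloc(recvsize * sizeof(FFT_SCALAR));
}

// Execution phase: reuse plan->sendbuf / plan->recvbuf every remap; no malloc/free
// in the hot path.

// Destruction phase: release the buffers together with the rest of the plan.
void plan_free_collective_buffers(remap_plan_sketch *plan)
{
  free(plan->sendbuf);
  free(plan->recvbuf);
}
```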

@stanmoore1
Contributor

@hagertnl I think that would be a good optimization--I don't see a reason to allocate memory inside the execution stage.
