gmxapi simulator ranks with no work to do use a dummy Session and behave slightly differently in the Context. Specifically, since no Simulator is created, no pluggable `Potential` restraints are created, and the `potentials` attribute of the Context is empty or non-existent.
However, run_brer assumes that the `Context.potentials` list exists for any simulation that runs successfully, regardless of MPI rank.
This means that if a run_brer script is launched with more MPI ranks than the work can use, the script will fail on the ranks with no work when it checks for `potentials`.
In order to support a wider range of gmxapi versions, we should find a solution within the run_brer package.
There may be other bugs stemming from assumptions about whether work is performed on a given rank. To my knowledge, we have not been testing run_brer under MPI. I don't know how rigorous we want to be at this point (at least, pending progress on kassonlab/run_brer#18), but we should decide generally what we want to happen when someone launches a run_brer script with mpiexec, and then make sure the package behaves appropriately when the user (a) uses mpiexec in a valid way, if any, or (b) uses mpiexec to allocate more ranks than can be mapped to simulation work.
I think that there may not currently be any internal support for mpi4py-based ensembles, so it may be appropriate to just document this. For GROMACS 2023 and gmxapi >=0.4, though, we expect single simulations to be able to use a multi-rank MPI communicator, so at least in these circumstances, we need to behave appropriately when we are running simultaneously on multiple ranks. (This is especially important with respect to file writing.)
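For the file-writing concern in particular, a minimal rank-aware guard might look like the following (a sketch only; `write_state` is a hypothetical name, and whether run_brer should use mpi4py directly is exactly the question raised below):

```python
# Hedged sketch: when launched with mpiexec, only one rank should write
# shared output files, so that ranks running the same simulation do not
# clobber each other's output.
try:
    from mpi4py import MPI
    _rank = MPI.COMM_WORLD.Get_rank()
except ImportError:
    _rank = 0  # no mpi4py available: behave as a single-process run

def write_state(path, data):
    """Write shared state from a single rank only (hypothetical helper)."""
    if _rank == 0:
        with open(path, 'w') as fh:
            fh.write(data)
```

Non-root ranks would then need a barrier or broadcast before reading such a file, which is part of what "behave appropriately when running simultaneously on multiple ranks" would have to cover.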
At least in some cases, it may be appropriate to rely on version-specific details from gmxapi instead of querying the environment through mpi4py so that we aren't trying to intuit gmxapi behavior.
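For example, behavior could branch on the installed gmxapi version string rather than on MPI introspection (a sketch with stdlib-only parsing; the helper names are hypothetical, and a real implementation might instead use feature queries that gmxapi itself provides):

```python
# Hedged sketch: decide multi-rank behavior from the gmxapi version rather
# than trying to intuit it from the MPI environment.

def parse_version(version_string):
    """Parse a 'MAJOR.MINOR.PATCH' string into a comparable tuple of ints,
    stopping at the first non-numeric component (e.g. '0.4.0b1' -> (0, 4))."""
    parts = []
    for token in version_string.split('.'):
        if not token.isdigit():
            break
        parts.append(int(token))
    return tuple(parts)

def supports_multirank_simulation(gmxapi_version):
    # Per the expectation above: gmxapi >= 0.4 (with GROMACS 2023) should
    # allow a single simulation to use a multi-rank MPI communicator.
    return parse_version(gmxapi_version) >= (0, 4)

assert supports_multirank_simulation('0.4.0')
assert not supports_multirank_simulation('0.3.2')
```

In practice the version string would come from the installed package (e.g. via `importlib.metadata.version('gmxapi')`) rather than a literal.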
Relates to kassonlab/run_brer#18