gmxapi simulator ranks with no work to do use a dummy Session and behave slightly differently in the Context. Specifically, since no Simulator is created, no pluggable `Potential` restraints are created, and the `potentials` attribute of the Context is empty or non-existent.
However, run_brer assumes that the `Context.potentials` list exists for any simulation that runs successfully, regardless of MPI rank.
This means that if a run_brer script is launched with more MPI ranks than the work can use, the script will fail on the ranks with no work when it checks for `potentials`.
In order to support a wider range of gmxapi versions, we should find a solution within the run_brer package.
There may be other bugs stemming from assumptions about whether work is performed on a given rank. To my knowledge, we have not been testing run_brer under MPI. I don't know how rigorous we want to be at this point (at least, pending progress on kassonlab/run_brer#18), but we should decide generally what we want to happen when someone launches a run_brer script with mpiexec, and then make sure the package behaves appropriately when the user (a) uses mpiexec in a valid way, if any, or (b) uses mpiexec to allocate more ranks than can be mapped to simulation work.
I think that there may not currently be any internal support for mpi4py-based ensembles, so it may be appropriate to just document this. For GROMACS 2023 and gmxapi >=0.4, though, we expect single simulations to be able to use a multi-rank MPI communicator, so at least in these circumstances, we need to behave appropriately when we are running simultaneously on multiple ranks. (This is especially important with respect to file writing.)
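For the file-writing concern in particular, a minimal rank-aware guard might look like the following (a sketch only; `write_state` is a hypothetical name, and whether run_brer should use mpi4py directly is exactly the question raised below):

```python
# Hedged sketch: when launched with mpiexec, only one rank should write
# shared output files, so that ranks running the same simulation do not
# clobber each other's output.
try:
    from mpi4py import MPI
    _rank = MPI.COMM_WORLD.Get_rank()
except ImportError:
    _rank = 0  # no mpi4py available: behave as a single-process run

def write_state(path, data):
    """Write shared state from a single rank only (hypothetical helper)."""
    if _rank == 0:
        with open(path, 'w') as fh:
            fh.write(data)
```

Non-root ranks would then need a barrier or broadcast before reading such a file, which is part of what "behave appropriately when running simultaneously on multiple ranks" would have to cover.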
At least in some cases, it may be appropriate to rely on version-specific details from gmxapi instead of querying the environment through mpi4py so that we aren't trying to intuit gmxapi behavior.
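For example, behavior could branch on the installed gmxapi version string rather than on MPI introspection (a sketch with stdlib-only parsing; the helper names are hypothetical, and a real implementation might instead use feature queries that gmxapi itself provides):

```python
# Hedged sketch: decide multi-rank behavior from the gmxapi version rather
# than trying to intuit it from the MPI environment.

def parse_version(version_string):
    """Parse a 'MAJOR.MINOR.PATCH' string into a comparable tuple of ints,
    stopping at the first non-numeric component (e.g. '0.4.0b1' -> (0, 4))."""
    parts = []
    for token in version_string.split('.'):
        if not token.isdigit():
            break
        parts.append(int(token))
    return tuple(parts)

def supports_multirank_simulation(gmxapi_version):
    # Per the expectation above: gmxapi >= 0.4 (with GROMACS 2023) should
    # allow a single simulation to use a multi-rank MPI communicator.
    return parse_version(gmxapi_version) >= (0, 4)

assert supports_multirank_simulation('0.4.0')
assert not supports_multirank_simulation('0.3.2')
```

In practice the version string would come from the installed package (e.g. via `importlib.metadata.version('gmxapi')`) rather than a literal.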
Relates to kassonlab/run_brer#18