## Goals
This PR aims to completely remove the use of `gmxapi` from the implementation of EEXE, as functions like `gmxapi.commandline_operation` and `gmxapi.mdrun` cause non-negligible memory leaks (see issue #274 of the old gmxapi repo, and the examples below) when `n_iter` is large, which tremendously slows down the simulation and increases the computational cost. While refactoring the code around the gmxapi workflow/dataflow model might solve the issue, we found it challenging and probably not feasible to use `gmxapi.subgraph` and `gmxapi.while_loop` while having non-ensemble tasks (those that require only one rank) depend on ensemble input (e.g., quantities calculated from an ensemble of dhdl files or log files). Therefore, we decided to replace `gmxapi.commandline_operation` and `gmxapi.mdrun` with `subprocess.run` and remove the use of `gmxapi` from our implementation of synchronous EEXE. If the advantages of using `gmxapi` in our specific case, and the way to refactor our code accordingly, become clearer, we might switch back to `gmxapi`. We might also adapt the code to fit SCALE-MS.

## Comparison between `subprocess` and `gmxapi`
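To make the replacement concrete, here is a minimal sketch of the `subprocess.run`-based launch pattern; the helper names and file names are illustrative, not the PR's actual code:

```python
# Illustrative sketch (not the PR's actual code): replacing a
# gmxapi.commandline_operation(...) call that ran "gmx grompp ..."
# with an equivalent subprocess.run call.
import subprocess

def grompp_args(mdp, gro, top, tpr):
    """Build the argument list for a GROMACS grompp command."""
    return ["gmx", "grompp", "-f", mdp, "-c", gro, "-p", top, "-o", tpr]

def run_grompp_once(mdp, gro, top, tpr, workdir="."):
    """Launch grompp synchronously; raises CalledProcessError on failure."""
    args = grompp_args(mdp, gro, top, tpr)
    # capture_output=True keeps stdout/stderr for logging;
    # check=True turns a non-zero exit code into an exception.
    return subprocess.run(args, cwd=workdir, capture_output=True,
                          text=True, check=True)
```

Because `subprocess.run` returns as soon as the child process exits and keeps no long-lived references to it, memory usage stays flat across iterations, unlike the gmxapi-based workflow described below.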
Overall, using `subprocess.run` in the synchronous EEXE implementation has the following advantages compared with `gmxapi`:

- An EEXE simulation with `nst_sim=1250` takes 844 seconds to finish if `subprocess.run` is used, with the memory usage remaining constant at around 207 MB. However, if `gmxapi.mdrun` is used (with the `grompp` commands still launched by `subprocess.run`), the memory usage increases linearly with the number of iterations, which in turn makes the wall time of running the `mdrun` command increase linearly. For the same simulation, the maximum memory usage was 1562 MB, more than 7 times that of the other test. This slowed down the process: it took 982 seconds to finish the simulation (16% slower).
- As `n_iter` gets larger, the difference between using `subprocess.run` and the `gmxapi` functionalities grows, and the simulation eventually becomes unaffordable. For example, a 20000-iteration EEXE simulation with `nst_sim=1250` could take 30 hours to finish if `gmxapi.commandline_operation` and `gmxapi.mdrun` are used to launch the GROMACS `grompp` and `mdrun` commands, respectively. Using `subprocess.run`, the wall time can be cut down to less than 10 hours for the same simulation.
- `subprocess.run` does not impose a specific directory structure the way `gmxapi` does. This also means that we don't need to restructure the directory anymore, which further avoids overhead in Python execution.
- `gmxapi` (version 0.4.0) still does not support MPI-enabled GROMACS. With `subprocess.run`, we should be able to run either MPI-enabled or thread-MPI-enabled GROMACS. Currently, we assume a thread-MPI build, but changes will be made to allow higher flexibility here, so that no assumption is made about the type of GROMACS build or the CLI used to launch MPI processes.

While `gmxapi` is more than a wrapper of GROMACS and could have much more potential use in other applications, it is not really beneficial to our current implementation of synchronous EEXE. In addition, it is almost no longer under maintenance, as most research efforts will be invested in the development of SCALE-MS, which is expected to completely replace gmxapi. Therefore, we decided to completely remove the use of `gmxapi` from our EEXE code starting from version 0.6.0. Still, the implementation of synchronous EEXE might be refactored in the future to better work with SCALE-MS as needed.

## TODOs
- Modify `run_EEXE` in `ensemble_EXE.py` to replace `gmx.mdrun` with `subprocess.run`.
- Implement `run_grompp` in `ensemble_EXE.py` to run the GROMACS `grompp` commands in parallel.
- Clean up `run_EEXE.py` and `ensemble_EXE.py` (or any other code, if any).
- Remove the parameter `parallel`, as we will assume EEXE simulations are always performed in parallel. We might reconsider adding the parameter back if a serial analog proves useful for testing.
- Add a parameter `gmx_executable` to allow specification of the executable path.
- `ensemble_md`.

## Follow-up
There are some possible follow-up tasks that will not be addressed in this PR but in future PRs or issues. These include:

- Reconsidering the parameter `parallel` in the input YAML file.
- Better testing of `ensemble_EXE.py`. Ideally, we want to be able to test simulations in parallel. Some simple functional testing might be needed.
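As a rough sketch of how the per-replica `grompp` commands could be run in parallel with only the standard library (the helper and its interface are assumptions, not the PR's actual API):

```python
# Hypothetical helper for launching one command per replica concurrently.
# grompp is mostly I/O-bound, so threads suffice: each thread simply
# blocks in subprocess.run while the child process does the real work.
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_in_parallel(commands, workdirs):
    """Run each command in its own working directory, concurrently.

    Returns the CompletedProcess objects in input order and raises
    CalledProcessError if any command exited with a non-zero code.
    """
    def _run(cmd, cwd):
        return subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)

    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        results = list(pool.map(_run, commands, workdirs))
    for res in results:
        res.check_returncode()  # surface failures from any replica
    return results
```

In the EEXE setting, `commands` would be one `gmx grompp ...` argument list per replica and `workdirs` the corresponding per-replica directories.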