
Remove the use of gmxapi #9

Merged 11 commits from remove_gmxapi into master on May 1, 2023

Conversation


@wehs7661 wehs7661 commented Apr 30, 2023

Goals

This PR aims to completely remove the use of gmxapi from the implementation of EEXE, as functions like gmxapi.commandline_operation and gmxapi.mdrun cause non-negligible memory leaks when n_iter is large (see issue #274 of the old gmxapi repo, and the examples below), which tremendously slows down the simulation and increases the computational cost. While refactoring the code around gmxapi workflows/dataflows might solve the issue, we found it challenging, and probably not feasible, to use gmxapi.subgraph and gmxapi.while_loop while having non-ensemble tasks (those that require only one rank) depend on ensemble input (e.g. quantities calculated from an ensemble of dhdl or log files). Therefore, we decided to replace gmxapi.commandline_operation and gmxapi.mdrun with subprocess.run and remove the use of gmxapi from our implementation of synchronous EEXE. If the advantages of using gmxapi in our specific case, and a way to refactor our code accordingly, become clearer, we might switch back to gmxapi. We might also adapt the code to fit SCALE-MS.
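
As a rough illustration of the replacement (not the actual EEXE implementation; the helper name, error handling, and file names below are all hypothetical), a GROMACS command launched via subprocess.run could look like this:

```python
import subprocess

def run_gmx(executable, args):
    """Run a GROMACS (or any CLI) command and fail loudly on a nonzero exit code.

    `executable` would typically be a path to the gmx binary; any executable works.
    """
    result = subprocess.run(
        [executable, *args],
        capture_output=True,  # keep stdout/stderr for logging and debugging
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(f"Command failed with stderr:\n{result.stderr}")
    return result

# Hypothetical usage mirroring one EEXE iteration (file names are placeholders):
# run_gmx("gmx", ["grompp", "-f", "md.mdp", "-c", "conf.gro", "-p", "topol.top", "-o", "run.tpr"])
# run_gmx("gmx", ["mdrun", "-s", "run.tpr", "-deffnm", "run"])
```

Unlike gmxapi's operation objects, nothing here persists between iterations, so the memory footprint stays flat regardless of n_iter.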

Comparison between subprocess and gmxapi

Overall, using subprocess.run has the following advantages for synchronous EEXE implementation compared with gmxapi:

  • There are no memory leaks, and the simulation is not slowed down, as evidenced by the following two examples:
    • A 500-iteration EEXE simulation with nst_sim=1250 takes 844 seconds to finish if subprocess.run is used, with the memory usage remaining constant at around 207 MB. However, if gmxapi.mdrun is used (with the grompp commands launched by subprocess.run), the memory usage increases linearly with the number of iterations, in turn making the wall time of the mdrun command increase linearly as well. For the same simulation, the maximum memory usage was 1562 MB, more than 7 times that of the other test. This slowed down the process, and the simulation took 982 seconds to finish (16% slower).
    • As n_iter gets larger, the difference between using subprocess.run and the gmxapi functionalities grows, and the simulation eventually becomes unaffordable. For example, a 20000-iteration EEXE simulation with nst_sim=1250 could take 30 hours to finish if gmxapi.commandline_operation and gmxapi.mdrun are used to launch the GROMACS grompp and mdrun commands, respectively. With subprocess.run, the wall time for the same simulation can be cut down to less than 10 hours.
  • Using subprocess.run requires far fewer I/O operations, as it does not generate the large number of directories that gmxapi does. This also means that we no longer need to restructure the directories, which further avoids overhead in Python execution.
  • The current version of gmxapi (version 0.4.0) still does not support MPI-enabled GROMACS. With subprocess.run, we should be able to run either MPI-enabled or thread-MPI-enabled GROMACS. Currently, we assume thread-MPI, but changes will be made to allow more flexibility here, so that no assumption is made about the type of GROMACS build or the CLI used for launching MPI processes.
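
To sketch what that flexibility might look like (the function and parameter names here are illustrative, not the current API), the launch command could be assembled based on the GROMACS build type:

```python
def build_mdrun_command(gmx_executable, n_ranks, use_mpi=False, mpi_launcher="mpirun"):
    """Assemble an mdrun command line for either GROMACS build type.

    With an MPI-enabled build, ranks come from an external launcher (e.g. mpirun
    or srun); with a thread-MPI build, gmx spawns them itself via -ntmpi.
    All names here are hypothetical placeholders for a future, configurable API.
    """
    if use_mpi:
        # External launcher creates the ranks, then invokes the MPI-enabled gmx
        return [mpi_launcher, "-np", str(n_ranks), gmx_executable, "mdrun"]
    # Thread-MPI build: gmx itself spawns the requested number of thread-MPI ranks
    return [gmx_executable, "mdrun", "-ntmpi", str(n_ranks)]
```

The returned list can be passed directly to subprocess.run, so the rest of the code does not need to know which build is in use.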

While gmxapi is more than a wrapper of GROMACS and could have much greater potential in other applications, it is not really beneficial in our current implementation of synchronous EEXE. In addition, it is essentially no longer maintained, as most research effort is being invested in the development of SCALE-MS, which is expected to completely replace gmxapi. Therefore, we decided to completely remove the use of gmxapi from our EEXE code starting from version 0.6.0. Still, the implementation of synchronous EEXE might be refactored in the future to work better with SCALE-MS as needed.

TODOS

  • Rewrite the function run_EEXE in ensemble_EXE.py to replace gmx.mdrun with subprocess.run.
  • Rewrite the function run_grompp in ensemble_EXE.py to run the GROMACS grompp commands in parallel.
  • Remove all functions and variables using gmxapi from run_EEXE.py and ensemble_EXE.py (and any other modules, if any).
  • Remove all tests corresponding to functions using gmxapi.
  • Adjust the parameters allowed in the input YAML files. This includes:
    • Removing the parameter parallel, as we will assume EEXE simulations to be always performed in parallel. We might reconsider adding the parameter back if the serial analog is useful for testing.
    • Adding the compulsory parameter gmx_executable to allow specification of the executable path.
  • Fix any linting and CI errors.
  • Update the dependencies and protocols of installing ensemble_md.
  • Update the documentation, including adding instructions about how to allocate resources reasonably and specify additional runtime arguments in the input YAML file accordingly.
  • Make sure the code can still extend EEXE simulations.
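
For the parallel grompp item above, one simple approach without gmxapi (a sketch only; the actual run_grompp may differ, e.g. by distributing commands across MPI ranks) is to start all commands with subprocess.Popen and then wait for each:

```python
import subprocess

def run_commands_in_parallel(commands):
    """Start all commands concurrently, then block until every one finishes.

    `commands` is a list of argument lists, e.g. one grompp invocation per
    replica. Returns the exit code of each command, in input order.
    """
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        for cmd in commands
    ]
    return [p.wait() for p in procs]

# Hypothetical usage (file names are placeholders):
# run_commands_in_parallel([
#     ["gmx", "grompp", "-f", f"sim_{i}/md.mdp", "-o", f"sim_{i}/run.tpr"]
#     for i in range(4)
# ])
```

Since grompp is cheap and I/O-bound relative to mdrun, plain concurrent processes suffice here; no pool or MPI machinery is strictly needed.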

Follow-up

There are some possible follow-up tasks that are not targeted in this PR but will be addressed in future PRs or issues. These include:

  • Re-evaluating the use of the parameter parallel in the input YAML file.
  • Enabling the use of MPI-enabled GROMACS.
  • Developing unit tests for the newly added and modified functions in ensemble_EXE.py. Ideally, we want to be able to test simulations in parallel. Some simple functional testing might be needed.

@wehs7661 wehs7661 self-assigned this Apr 30, 2023
@wehs7661 wehs7661 added the enhancement New feature or request label Apr 30, 2023
@wehs7661 wehs7661 changed the title from "Replaced gmx.mdrun with subprocess" to "Remove the use of gmxapi" Apr 30, 2023
@wehs7661 wehs7661 merged commit 239776a into master May 1, 2023
2 checks passed
@wehs7661 wehs7661 deleted the remove_gmxapi branch May 1, 2023 20:55