gmxapi-79 Safe management of session working directories
Document proposed user interface for filesystem artifacts.
eirrgang committed May 30, 2018
1 parent b02b6e5 commit b7935a4
Showing 3 changed files with 86 additions and 12 deletions.
38 changes: 28 additions & 10 deletions src/gmx/context.py
@@ -286,10 +286,20 @@ class ParallelArrayContext(object):
... # rank = session.rank
... # The local context object knows where it fits in the global array.
... rank = context.rank
... output_path = os.path.join(context.workdir_list[rank], 'traj.trr')
... assert(os.path.exists(output_path))
... print('Worker {} produced {}'.format(rank, output_path))
... output = work[0]['traj.trr']
...
>>> output_path = str(output.extract())
>>> assert(os.path.exists(output_path))
When the session is created to run the workflow, a uniquely named directory is created in the filesystem to be the
session's working directory. This directory name is available in the attribute `context.path`. Each
operation on each rank has its own subdirectory. In the example above, the directory for MD artifacts for each of
the two ranks used can be accessed through `work.path[0]` and `work.path[1]`. Artifacts in each path can be accessed
as dictionary keys. E.g. `work.path[0]['traj.trr']`.
Note that these attributes are proxy objects that may not exist at the time they are referenced with this syntax.
To force the artifacts to be made available locally, use the `extract` method. The string representation of the
returned object is a valid local absolute filename.
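The proxy behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: `ArtifactProxy` and `OperationPath` are hypothetical names, not classes from gmxapi, and the sketch returns a plain string from `extract` where the real interface may return a richer object.

```python
import os


class ArtifactProxy(object):
    """Hypothetical stand-in for an output file that may not exist yet."""

    def __init__(self, workdir, name):
        self._workdir = workdir
        self._name = name

    def extract(self):
        """Return the absolute local path, failing if the file is not there."""
        path = os.path.abspath(os.path.join(self._workdir, self._name))
        if not os.path.exists(path):
            raise RuntimeError('Artifact {} is not available yet.'.format(self._name))
        return path


class OperationPath(object):
    """Hypothetical per-rank directory whose artifacts are looked up by key."""

    def __init__(self, workdir):
        self.workdir = workdir

    def __getitem__(self, name):
        # No filesystem access happens here: the proxy is only a promise.
        return ArtifactProxy(self.workdir, name)
```

Deferring the filesystem check to `extract` is what lets a script refer to `work.path[0]['traj.trr']` before the simulation has produced the file.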
Implementation notes:
@@ -479,7 +489,17 @@ def add_operation(self, namespace, operation, get_builder):
def __load_tpr(self, element):
"""Implement the gromacs.load_tpr operation.
Updates the minimum width of the workflow parallelism. Does not add any API object to the graph.
File paths are taken to be relative to the session directory. Helper functions implemented for the Context
should make sure to copy files into place or to ensure that the files are expected outputs of other operations.
If the element has other elements listed in `depends` then the working directories of those elements are used
to replace occurrences of the element names in the tpr filename arguments, using a forward slash (`/`) to separate
the part of the string naming an element and the part of the string naming a relative file path.
Absolute filenames are not allowed, as they imply relation to an element named with a null string, which we
would not want to respect even if it existed.
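A minimal sketch of the filename rule described above, assuming a hypothetical helper `resolve_tpr_path` and a hypothetical `element_dirs` mapping from the names of elements listed in `depends` to their working directories. Neither name comes from the commit.

```python
import os


def resolve_tpr_path(filename, session_dir, element_dirs):
    """Map a tpr filename argument to a path inside the session directory.

    element_dirs is an assumed mapping of element name -> working directory
    (relative to session_dir) for the elements listed in `depends`.
    """
    if filename.startswith('/'):
        # An absolute name would imply an element named with a null string.
        raise ValueError('Absolute filenames are not allowed: {}'.format(filename))
    head, sep, tail = filename.partition('/')
    if sep and head in element_dirs:
        # Replace the element name with that element's working directory.
        return os.path.join(session_dir, element_dirs[head], tail)
    # Otherwise the name is taken relative to the session directory.
    return os.path.join(session_dir, filename)
```

For example, with `element_dirs = {'md_sim': 'md_sim_0'}`, the argument `'md_sim/state.tpr'` resolves to `state.tpr` inside the `md_sim_0` subdirectory of the session directory.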
Updates the minimum width of the workflow parallelism. This operation is fused with the MD operation and does
not add any API object to the graph.
"""
class Builder(object):
def __init__(self, tpr_list):
@@ -578,17 +598,15 @@ def done():
def __enter__(self):
"""Implement Python context manager protocol, producing a Session for the specified work in this Context.
A session directory is created (if not yet present) with a unique key for the work specification. This prevents
different work specifications from getting mixed in the same output directory. Each element in the work has its
own subdirectory or subdirectories (one per worker) to hold artifacts and checkpoint information. The
Returns:
Session object that can be run and/or inspected.
Additional API operations are possible while the Session is active. When used as a Python context manager,
the Context will close the Session at the end of the `with` block by calling `__exit__`.
Note: this is probably where we will have to process the work specification to determine whether we
have appropriate resources (such as sufficiently wide parallelism). Until we have a better Session
abstraction, this means the clean approach should take two passes to first build a DAG and then
instantiate objects to perform the work. In the first implementation, we kind of muddle things into
a single pass.
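One way the session directory described in this docstring could be keyed to the work specification is sketched below. The commit does not say how the unique key is computed, so the SHA-256 hash of a JSON serialization and the function name `session_directory` are assumptions made for illustration.

```python
import hashlib
import json
import os


def session_directory(workspec, root='.'):
    """Create (if not yet present) and return a directory keyed by the work."""
    # Serialize deterministically so that equal specifications give equal keys.
    serialized = json.dumps(workspec, sort_keys=True).encode('utf-8')
    key = hashlib.sha256(serialized).hexdigest()
    path = os.path.join(os.path.abspath(root), key)
    if not os.path.isdir(path):
        os.makedirs(path)
    return path
```

Because the key depends only on the content of the specification, rerunning the same work finds the same directory, while a different specification gets a different one.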
"""
import numpy
try:
50 changes: 50 additions & 0 deletions src/gmx/test/test_context.py
@@ -28,3 +28,53 @@ def test_setting(self):
mdargs.set(param)
context = gmx.core.Context()
context.setMDArgs(mdargs)

class PathManagementTestCase(unittest.TestCase):
    """Test proper directory management for load_tpr and Session startup.
    - [ ] Session should use working directory keyed by WorkSpec unique identifier.
    - [ ] Existing directory should not be corrupted.
    - [ ] Existing directory should be checked for state.
    - [ ] File inputs should be made accessible to the Session.
    - [ ] Filesystem artifacts from an element should be accessible by another element.
    - [ ] Filesystem artifacts should be made accessible to the client.
    """
    # Use the harness features to set up a reusable temporary directory
    def setUp(self):
        return

    def tearDown(self):
        return

    def test_directory_creation(self):
        """Check that the session launched but not run in setUp() got created."""
        return

    def test_directory_safety(self):
        """Check that the Session logic refuses to overwrite existing data."""
        return

class PathManagementTestCase(unittest.TestCase):
    """Test proper directory management for load_tpr and Session startup.
    - [ ] Session should use working directory keyed by WorkSpec unique identifier.
    - [ ] Existing directory should not be corrupted.
    - [ ] Existing directory should be checked for state.
    - [ ] File inputs should be made accessible to the Session.
    - [ ] Filesystem artifacts from an element should be accessible by another element.
    - [ ] Filesystem artifacts should be made accessible to the client.
    """
    # Use the harness features to set up a reusable temporary directory
    def setUp(self):
        return

    def tearDown(self):
        return

    def test_directory_creation(self):
        """Check that the session launched but not run in setUp() got created."""
        return

    def test_directory_safety(self):
        """Check that the Session logic refuses to overwrite existing data."""
        return
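The placeholder `setUp` and `tearDown` methods above could manage a scratch directory along these lines. This is a sketch using only the standard library; the class name `TemporaryDirectoryTestCase` is hypothetical, and it creates a fresh directory per test, which may differ from the "reusable" directory the comment has in mind.

```python
import os
import shutil
import tempfile
import unittest


class TemporaryDirectoryTestCase(unittest.TestCase):
    """Hypothetical base class giving each test its own scratch directory."""

    def setUp(self):
        # Create a uniquely named directory for this test.
        self.workdir = tempfile.mkdtemp()

    def tearDown(self):
        # Remove the directory and anything the test left in it.
        shutil.rmtree(self.workdir, ignore_errors=True)

    def test_scratch_directory_exists(self):
        self.assertTrue(os.path.isdir(self.workdir))
```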
10 changes: 8 additions & 2 deletions src/gmx/workflow.py
@@ -555,6 +555,13 @@ def get_source_elements(workspec):
def from_tpr(input=None, **kwargs):
"""Create a WorkSpec from a (list of) tpr file(s).
Absolute filenames are interpreted in reference to the local filesystems where the script is run, but the path is
removed from the recorded work specification and the file is made available in the Session working directory.
Relative path names (or filenames without paths) are assumed to refer to files that already exist relative to the
Session working directory. They are either outputs from other elements or must be put in place between session
launch and session run. (See gmx.context)
Required Args:
input: string or list of strings giving the filename(s) of simulation input
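The distinction between absolute and relative filenames described in this docstring might look roughly like the sketch below. `record_input` is a hypothetical helper, and copying the file immediately is an assumption; the commit only says the file "is made available" in the Session working directory, not when or how.

```python
import os
import shutil


def record_input(filename, session_dir):
    """Return the name to record in the work specification for one input."""
    if os.path.isabs(filename):
        # Local file: place it in the session directory and drop the path.
        name = os.path.basename(filename)
        shutil.copy(filename, os.path.join(session_dir, name))
        return name
    # Relative names are assumed to exist relative to the session directory
    # already, or to be put in place before the session runs.
    return filename
```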
@@ -573,7 +580,7 @@ def from_tpr(input=None, **kwargs):
Produces a WorkSpec with the following data.
version: "gmxapi_workspec_1_0"
version: "gmxapi_workspec_0_1"
elements:
tpr_input:
namespace: "gromacs"
@@ -604,7 +611,6 @@ def from_tpr(input=None, **kwargs):
arg_path = os.path.abspath(arg)
raise exceptions.UsageError(usage + " Got {}".format(arg_path))

# \todo These are runner parameters, not MD parameters, and should be in the call to gmx.run() instead of here.
params = {}
for arg_key in kwargs:
if arg_key == 'grid' or arg_key == 'dd':
