-
Notifications
You must be signed in to change notification settings - Fork 237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add better logic in finding a temp directory for the Toil coordination directory #4918
Add better logic in finding a temp directory for the Toil coordination directory #4918
Conversation
…77-try-tmp-coordination-dir
…l argument for message bus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this will work right for relative-path --tmpdir-prefix
values; do we know those aren't allowed?
Also, I think we could do better at actually respecting --tmpdir-prefix
when it is in fact a prefix by creating a temporary directory with that prefix. Maybe take the basename and use it as prefix
and the dirname and use it as dir
to mkdtemp
?
Also also, instead of manually carrying through the prefix to use in particular places where we make temp files on the leader, maybe we should be setting tempfile.tempdir
on the leader and the worker if we have a tmpdir_prefix?
src/toil/common.py
Outdated
""" | ||
# if tmpdir_prefix exists, return its parent or None depending on if it is accessible | ||
if tmpdir_prefix is not None: | ||
return tmpdir_prefix if os.path.isdir(tmpdir_prefix) else os.path.dirname(tmpdir_prefix) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want tmpdir_prefix if os.path.isdir(tmpdir_prefix)
. If tmpdir_prefix
is /tmp/cwltmp-
but the directory /tmp/cwltmp-
happens to exist, we still want to make temproary files and directories like /tmp/cwltmp-XXXX
and not change to making them in that directory. The CWL-style tmpdir prefix will end in /
if it is itself a directory we are meant to make stuff in.
src/toil/common.py
Outdated
# gettempdir returns the current working directory as a last resort | ||
# but we don't want that, so return None if the cwd is reached |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if TMPDIR
is /tmp
and we happen to be standing in /tmp
when running the workflow? I don't think we want to disqualify a directory just because it's the current directory. If it has to fall all the way back to the current directory as our temp directory, maybe we should in fact use it as our temp directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we want to prioritize Toil's work dir over the current working directory? If I don't omit the current directory from gettempdir
, then I think Toil's work dir will never be reached, as gettempdir() should always be returning something (unless toil is somehow installed in a nonwritable directory).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if we could prioritize the workDir below actual temp directories like the ones from the environment variables and /tmp
, but above the current working directory as a fallback if /tmp
is unavailable.
But if the current working directory is itself /tmp
, we still want to use /tmp
.
src/toil/common.py
Outdated
If tmpdir_prefix is given, assuming it is accessible, it will return its parent directory | ||
if the path itself is not a directory path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might recommend creating a directory with that prefix. Then we could obey it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with that. The only downside is there is no easy way without a refactor to just tack on the workflow_id
onto the prefix path and use that as the coordination directory. The current solution tacks the workflow_id on but then lets get_local_workflow_coordination_dir
create another subdirectory with the same workflow_id name. This is since the batchsystem cleanup depends on get_toil_coordination_dir
to return the parent directory of the coordination files, even though all the coordination directory logic is there. So in the current setup, I think I have to return a full directory path. I think the only way to respect the prefix probably is to either carry arguments down the function call (as I did) or carry return values up (so get_local_workflow_coordination_dir
would understand when get_toil_coordination_dir
returns a prefix/is not a full directory path)
src/toil/bus.py
Outdated
""" | ||
Return a file path in tmp to store the message bus at. | ||
Calling function is responsible for cleaning the generated file. | ||
""" | ||
fd, path = tempfile.mkstemp() | ||
fd, path = tempfile.mkstemp(prefix=tmpdir_prefix) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like using an absolute path as a prefix actually does work. We might need a comment asserting this; I thought this was wrong when I looked at it, and the docs don't seem to promise that this works.
What if the --tmpdir-prefix
option is a relative path? It looks like mkstemp
interprets a non-absolute prefix as relative to its chosen temp directory, not relative to the current directory. Will something turn it into an absolute path before we get here? If so, we should cover that in the docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the prefix passed into mkstemp
is a string it looks like it will try to prefix that into the filename itself, choosing a valid tmp directory option
>>> tmpdir_prefix = "test"
>>> fd, path = tempfile.mkstemp(prefix=tmpdir_prefix)
>>> path
'/tmp/testmo78tq3y'
Looks like it fails if it's a relative path and the parent directory that it points at doesn't exist
>>> fd, path = tempfile.mkstemp(prefix=tmpdir_prefix)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.12/tempfile.py", line 496, in mkstemp
return _mkstemp_inner(dir, prefix, suffix, flags, output_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/tempfile.py", line 395, in _mkstemp_inner
fd = _os.open(file, flags, 0o600)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/testfolder/testqevkrm39'
I'm not sure how cwltool handles this case but we currently don't turn it into an absolute path before here, as this carries the option directly from cwltool's argparse.parse_args.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cwltool handles relative paths by creating the tmp file relative to the current working directory. I can probably mimic that.
@stxue1 Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there must be a simpler way to accomplish:
toil-cwl-runner --tmpdir-prefix whatever
doesn't write to/tmp
.workflow.py --workDir /shared/fs
doesn't use/shared/fs
for the coordination directory.
While preserving the somewhat important security guarantee of:
3. Only secure functions like tempfile.mkstemp
and tempfile.mkdtemp
are used to create files in world-writable temp directories.
I think that, rather than trying to solve the hard problem of getting all the workers to agree on startup about what the coordination directory is, given only a prefix for which any given name including it might be taken, we should instead expand what toil-cwl-runner
already does for the workDir and, at workflow start, come up with paths obeying that prefix for the coordination directory and the Python process temp directory. (We could maybe make sure to clean them all up at CWL interpreter shutdown.)
Then we don't have to touch the message bus path generator at all (since it uses the Python process temp directory), and we don't have to let the somewhat fiddly concept of a temp directory prefix leak out of the CWL interpreter implementation to create hard problems in other parts of Toil, that then create complexity when we solve them.
If we do add tmpdir_prefix
to the config, and we don't do something to teach the tempfile
module to obey it, we are not going to be able to use the normal functions in tempfile
to make internal Toil temp files, because they won't obey the prefix. That's a thing we'd have to remember at more or less all points in the Toil codebase, and we'd have to always pass around the config or prefix to anything that might need a temp file, whether CWL-related or not, so if we can avoid it we should.
…rdination-bus' into issues/4877-better-tmp-logic-coordination-bus
src/toil/common.py
Outdated
try_path('/run/lock') or | ||
# Before trying the workdir, try the /tmp directory as tmp directories | ||
# should be local to the node and not shared like a work directoy might be | ||
try_path('/tmp/toil_coordination') or | ||
# Finally, fall back on the work dir and hope it's a legit filesystem. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But now we're not actually avoiding using the workdir, for anything that isn't toil-cwl-runner
.
…rdination-bus' into issues/4877-better-tmp-logic-coordination-bus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think this will work. The workdir fallback is now I think dead code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice.
…n directory (#4918) * Try /tmp before the workdir * Add better logic for coordination dir tmpdir fallback; respect cwltool argument for message bus * Fix prefix/dir mixup * Carry through workflow_id to deal with prefixes in a return-directory-only function * mypy, make workflow_id argument mandatory * Use coordination_dir instead of a new config attribute * Add back tempdir to coordination_dir fallback --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Addresses #4914 and #4910
This currently lets the Toil bus tmpdir respect the
--tmpdir-prefix
from cwltool, if it exists. However, a lot of internal code depends on directory paths instead of prefixed paths, so the tmp coordination directory selection does not currently respect the full prefix path.This does add another option to Toil's config object, without an associated add_argument (at least for wdl + default toil). If that is not ideal, maybe Toil should get a command line argument to specify where the tmp directory should be?
Changelog Entry
To be copied to the draft changelog by merger:
--tmpdir-prefix
Reviewer Checklist
issues/XXXX-fix-the-thing
in the Toil repo, or from an external repo.camelCase
that want to be insnake_case
.docs/running/{cliOptions,cwl,wdl}.rst
Merger Checklist