Modules: workflow.source_path() does not respect tmpdir #1435

Closed
BEFH opened this issue Feb 26, 2022 · 7 comments · Fixed by #1436
Labels
bug Something isn't working

Comments

@BEFH

BEFH commented Feb 26, 2022

Snakemake version

Version 7.0.0 (not a regression; exists on 6.13.1 too)

Describe the bug

When using workflow.source_path('some/file'), Snakemake copies some/file to the /tmp directory instead of respecting the tmpdir resource. This is a problem because in many computing environments, including mine, /tmp is local to each node rather than shared across the cluster.

Logs

This log is from the example below, with the paths modified so that they do not reflect the actual location of the example:

Building DAG of jobs...
Job stats:
job        count    min threads    max threads
-------  -------  -------------  -------------
badtemp        1              1              1
total          1              1              1


[Sat Feb 26 14:56:21 2022]
rule badtemp:
    input: /tmp/tmp070w1xpdsnakemake-runtime-source-cache/c9cb0be31dd11615875aee7311f33afdd3fd5083fb94613e3cb3e6163077a54c
    output: moved_file
    jobid: 0
    resources: tmpdir=/path/to/workflow/temp

cp /tmp/tmp070w1xpdsnakemake-runtime-source-cache/c9cb0be31dd11615875aee7311f33afdd3fd5083fb94613e3cb3e6163077a54c moved_file
Job stats:
job        count    min threads    max threads
-------  -------  -------------  -------------
badtemp        1              1              1
total          1              1              1

This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

You can see that Snakemake is using the /tmp/ directory instead of the tmpdir.
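
The input path in the log has the shape that Python's tempfile module produces for a temporary directory created with a suffix, followed by a 64-character hexadecimal file name. Below is a minimal sketch of how such a path could arise. It assumes (this is not confirmed anywhere in this thread) a tempfile.TemporaryDirectory with the suffix seen in the log and a SHA-256 digest of the source path as the file name; cache_path_for is a hypothetical helper, not part of Snakemake's API.

```python
import hashlib
import os
import tempfile


def cache_path_for(source, cache_dir):
    """Hypothetical helper: map a source path to a hashed file name in cache_dir."""
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, digest)


# The directory is created under tempfile.gettempdir(), which is /tmp on many
# Linux systems unless TMPDIR points elsewhere. The suffix is copied from the log.
with tempfile.TemporaryDirectory(suffix="snakemake-runtime-source-cache") as cache_dir:
    cached = cache_path_for("some/file", cache_dir)
    print(cached)
```

A scheme along these lines would also explain why a file extension does not survive: the cached name is only a digest.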

Minimal example

Snakefile (placed in root of pipeline directory instead of workflow/):

rule badtemp:
    input: workflow.source_path('some/file')
    output: 'moved_file'
    resources:
        tmpdir = workflow.basedir + '/temp'
    shell: 'cp {input} {output}'

Running:

mkdir some
echo test > some/file
snakemake -np moved_file

Additional context

This functionality is necessary for modules that require resources unless the user copies the resources into the parent workflow directory. The fact that file extensions are not preserved is also potentially problematic.

I would be happy if somebody could provide a workaround, especially one that will work with module prefixes, before this is addressed in the code. I need this functionality in order to modularize workflows that have resources and workflow.source_path() doesn't work when submitting to my cluster.

BEFH added the bug label Feb 26, 2022
@johanneskoester
Contributor

This is not really a bug, but it certainly highlights two important corner cases:

  1. when the default temp folder in the system does not remain the same across cluster nodes, snakemake would run into problems with the current code.
  2. when the default temp folder (either /tmp or whatever is pointed to by the env var $TMPDIR) is not writable, this will fail.

Of course, a temp folder that is not writable is a misconfiguration, but one should at least get a proper error in such a case. What remains is case 1, which is still not optimal.

@BEFH
Author

BEFH commented Feb 26, 2022

Earlier, we discussed storing the files in a sub-directory of .snakemake, just like conda envs and singularity containers, and I think that's optimal.

BTW, it does not seem to be respecting the $TMPDIR environment variable either. I tried setting it both inside and outside of the snakefile and no dice.

I ran set TMPDIR $PWD/temp in fish and put the following in the snakefile (I also confirmed that the variable was set):

temp_dir = workflow.basedir + '/../temp'
os.environ['TMPDIR'] = temp_dir
if not os.path.exists(temp_dir):
    os.makedirs(temp_dir)

@johanneskoester
Contributor

Snakemake just uses Python's tempfile module; the behavior is documented here: https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir

If that does not work with setting TMPDIR in your case, something must interfere there.
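
As a rough illustration of the documented behavior: tempfile.gettempdir() consults TMPDIR (then TEMP, TMP, and platform defaults) the first time it is called, takes the first candidate in which it can create a file, and caches the result in tempfile.tempdir. The sketch below uses a throwaway directory as a stand-in for a shared temp location and resets the cache explicitly. It demonstrates the standard library only, not Snakemake's internals.

```python
import os
import tempfile

# A throwaway directory standing in for a shared, writable temp location.
shared_tmp = os.path.realpath(tempfile.mkdtemp())

# Setting TMPDIR now has no effect yet, because mkdtemp() above already
# triggered gettempdir() and the result is cached in tempfile.tempdir.
os.environ["TMPDIR"] = shared_tmp
cached_result = tempfile.gettempdir()

# Clearing the cache forces the candidate list to be evaluated again.
tempfile.tempdir = None
fresh_result = os.path.realpath(tempfile.gettempdir())

print("cached:", cached_result)
print("fresh: ", fresh_result)
```

If a program relies on tempfile in this way, an os.environ['TMPDIR'] assignment only matters when it runs before anything in that process has called tempfile.gettempdir(), and only when the directory already exists and is writable.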

@johanneskoester
Contributor

However, solved properly in PR #1436.

@johanneskoester
Contributor

Just a quick note: using .snakemake is not the right solution (although I initially thought it was), because it can cause problems when the user sets workdir: in the workflow, in which case the working directory changes after the source files have been read. Hence the different implementation in the PR, which is in any case more consistent with the non-runtime source cache mechanism in the same place (for git files etc.).
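
The workdir: concern can be illustrated with plain Python: a relative path such as .snakemake resolves against the current working directory, so the same relative name points to a different location once the working directory changes. This is a minimal sketch using two throwaway directories. The names launch and work are placeholders, and the code does not reproduce Snakemake's own logic.

```python
import os
import tempfile

start_dir = os.getcwd()
with tempfile.TemporaryDirectory() as root:
    launch_dir = os.path.join(root, "launch")
    work_dir = os.path.join(root, "work")
    os.makedirs(launch_dir)
    os.makedirs(work_dir)

    try:
        # Resolve the relative cache location while source files are being read.
        os.chdir(launch_dir)
        before = os.path.abspath(".snakemake")

        # Changing directory afterwards stands in for a later workdir: directive.
        os.chdir(work_dir)
        after = os.path.abspath(".snakemake")
    finally:
        # Restore the original directory so the temp tree can be removed.
        os.chdir(start_dir)

print(before)
print(after)
print(before == after)  # False: the relative path moved with the working directory
```

A cache location that is resolved to an absolute path once, independent of the working directory, avoids this ambiguity.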

@BEFH
Author

BEFH commented Feb 26, 2022

Yeah, this seems like a much simpler fix than I expected. Clearly the underlying Python module was well designed!

As for the TMPDIR env var problem, it's clearly on my end:
python -c 'import tempfile; tempfile.gettempdir()' returns nothing despite echo $TMPDIR returning the valid path.

@BEFH
Author

BEFH commented Feb 26, 2022

Actually, my test command was broken because it needed print(). python -c 'import tempfile; print(tempfile.gettempdir())' still returns the wrong value though. No idea why.
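
When tempfile.gettempdir() appears to ignore TMPDIR, a few causes are worth ruling out; none of them is confirmed as the culprit in this thread. The variable may be set in the shell but not exported to child processes, or the directory may not exist or not be writable, in which case the tempfile module falls through to the next candidate. Here is a small diagnostic sketch (describe_tmpdir is a hypothetical helper name):

```python
import os
import tempfile


def describe_tmpdir():
    """Collect what this Python process sees for TMPDIR and the chosen temp dir."""
    value = os.environ.get("TMPDIR")  # None if the variable was not exported
    return {
        "TMPDIR": value,
        "exists": bool(value) and os.path.isdir(value),
        "writable": bool(value) and os.access(value, os.W_OK),
        "gettempdir": tempfile.gettempdir(),
    }


if __name__ == "__main__":
    for key, val in describe_tmpdir().items():
        print(f"{key}: {val}")
```

If TMPDIR prints as None here while echo $TMPDIR shows a path in the shell, the variable is not reaching the Python process.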
