Reasonable caching of downloaded conda packages when --use-conda and --use-singularity together #1713
This is peripherally related (apologies if this is derailing the main issue!), but this would be great to have for containerizing workflows as well. I work on a shared HPC and my colleague determined that executing workflows with the There is also a sub-issue where using the default
# first time running containerized workflow, no /tmp/conda exists yet
$ ls -lah /tmp/conda/
ls: cannot access /tmp/conda/: No such file or directory
# execute without singularity
$ snakemake -s test/Snakefile --cores 4 --directory test_out --use-conda
# runs fine; still no issue with /tmp/conda, which remains nonexistent
$ ls -lah /tmp/conda/
ls: cannot access /tmp/conda/: No such file or directory
# running with singularity results in a curious issue where mamba cannot create temp dirs (my $TMPDIR is set to /scratch/watersn/)
$ rm -r test_out ; snakemake -s test/Snakefile --cores 4 --directory test_out --use-conda --use-singularity
Creating specified working directory test_out.
Building DAG of jobs...
Pulling singularity image docker://condaforge/mambaforge:4.12.0-0.
Creating conda environment ../test/envs/samtools.yaml...
Downloading and installing remote packages.
CreateCondaEnvironmentException:
Could not create conda environment from /lila/home/watersn/GitHub/vdblab-pipelines/test/rules/../envs/samtools.yaml:
Command:
SINGULARITYENV_CONDA_PKGS_DIRS=/tmp/conda/d19ae122-5ded-4832-94bb-b536827a2ea6 singularity exec --home /lila/home/watersn/GitHub/vdblab-pipelines/test_out /lila/home/watersn/GitHub/vdblab-pipelines/test_out/.snakemake/singularity/487a72edc080f28be9d6fa74dcc90b96.simg sh -c 'conda config --set channel_priority strict && mamba env create --quiet --file "/lila/home/watersn/GitHub/vdblab-pipelines/test_out/.snakemake/conda/8a4b67f2bd88f4485c9862bcd85f1291.yaml" --prefix "/lila/home/watersn/GitHub/vdblab-pipelines/test_out/.snakemake/conda/8a4b67f2bd88f4485c9862bcd85f1291"'
Output:
error libmamba Error opening for writing "/scratch/watersn/mambafpbcIjJHiVl": No such file or directory
error libmamba Error opening for writing "/scratch/watersn/mambafmRUfiscyMo": No such file or directory
error libmamba Error opening for writing "/scratch/watersn/mambaf5gu21jzX6a": No such file or directory
error libmamba Error opening for writing "/scratch/watersn/mambafswihDDfX0S": No such file or directory
error libmamba Error opening for writing "/scratch/watersn/mambafmlCUIPDfko": No such file or directory
error libmamba Error opening for writing "/scratch/watersn/mambafj3NUbn2R7T": No such file or directory
error libmamba Error opening for writing "/scratch/watersn/mambafpbcIjJHiVl": No such file or directory
error libmamba Could not open file for download /scratch/watersn/mambafpbcIjJHiVl: No such file or directory
# running with singularity while forcing the conda frontend avoids the mamba error
$ rm -r test_out ; snakemake -s test/Snakefile --cores 4 --directory test_out --use-conda --use-singularity --conda-frontend conda
# runs without error, but creates /tmp/conda/ that no one else can write to
$ ls -lah /tmp/conda/
total 84K
drwxr-xr-x 4 watersn brinkvd 94 Jun 27 22:45 .
drwxrwxrwt. 866 root root 416K Jun 27 22:48 ..
drwxr-xr-x 24 watersn brinkvd 4.0K Jun 27 22:47 4b10257a-f757-4c5b-b3fe-504f44c083fc
drwxr-xr-x 3 watersn brinkvd 35 Jun 27 22:43 d19ae122-5ded-4832-94bb-b536827a2ea6
This might be related to #1588. I'm running snakemake 7.8.3 and singularity 3.7.1, btw.
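One way to avoid the shared /tmp/conda permission problem shown in the listing above would be to give each user their own cache directory. The following is only a minimal sketch, not anything snakemake does: the helper name per_user_conda_pkgs_dir and the "conda-pkgs-<uid>" naming are hypothetical. It also assumes the chosen base directory is visible inside the container (for example via a bind mount), which the mamba errors above suggest is not the case for every $TMPDIR.

```python
import os
import tempfile
from pathlib import Path


def per_user_conda_pkgs_dir(base=None):
    """Return a conda package cache directory private to the current user.

    Hypothetical helper for illustration only; not part of snakemake.
    """
    # tempfile.gettempdir() honours $TMPDIR when it points at a usable directory.
    base_dir = Path(base) if base is not None else Path(tempfile.gettempdir())
    # Using the numeric uid keeps the path stable across runs and distinct per user.
    cache_dir = base_dir / "conda-pkgs-{}".format(os.getuid())
    # mode=0o700: only the owner may read or write; other users get their own directory.
    cache_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    return cache_dir
```

Because the path depends only on the base directory and the uid, repeated runs by the same user would land in the same place, and a second user would never need write access to a directory created by the first.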
Is your feature request related to a problem? Please describe.
Installing conda environments can be slow, which makes troubleshooting a problematic one very frustrating. It would be nice if package downloads were cached during install, in order to speed up subsequent environment building.
Describe the solution you'd like
I would like to be able to set the necessary environment variable with e.g.
export SINGULARITYENV_CONDA_PKGS_DIRS=$HOME/.snakemake/conda/pkgs
which should be enough to cache downloaded packages to this location. However, the current implementation overrides this environment variable for all singularity runs:
snakemake/snakemake/deployment/conda.py
Lines 605 to 607 in 41175b3
Just amending this line in the snakemake source seems to be sufficient to get the behavior I'm requesting. However, I don't know why the existing implementation was put into place, so I'm not sure this is a sustainable solution.
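To make the requested behavior concrete, here is a rough sketch of the logic, written as a standalone function rather than as a patch. The function name singularity_conda_envvars is hypothetical, and the assumption that the hook returns a mapping of un-prefixed variable names (with SINGULARITYENV_ added later) is a guess based on the command line in the error output, not something checked against the source.

```python
import os
import uuid


def singularity_conda_envvars(environ=None):
    """Pick the CONDA_PKGS_DIRS value to pass into the container.

    Hypothetical stand-in for illustration; the name and return shape
    are assumptions, not snakemake's actual API.
    """
    environ = os.environ if environ is None else environ
    # If the user already exported a cache location, respect it.
    user_value = environ.get("SINGULARITYENV_CONDA_PKGS_DIRS")
    if user_value:
        return {"CONDA_PKGS_DIRS": user_value}
    # Otherwise keep the current default: a fresh directory per invocation.
    return {"CONDA_PKGS_DIRS": "/tmp/conda/{}".format(uuid.uuid4())}
```

With this shape, existing behavior would be unchanged for anyone who has not set the variable, so it should be a low-risk way to opt in to caching.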
Describe alternatives you've considered
A variety of other interfaces would be acceptable, e.g. a CLI flag. Alternatively, get_singularity_envvars could hardcode a cache path in $TMPDIR or something. The current location ("/tmp/conda/{}".format(uuid.uuid4())) seems like the worst of all worlds (doesn't respect $TMPDIR, different every time and therefore doesn't help with caching).
Additional context
n/a