Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for named conda environments #494

Closed
wants to merge 3 commits into from

Conversation

nh13
Copy link
Contributor

@nh13 nh13 commented Jul 12, 2020

Use the conda_env directive

Implements: #352

@sonarcloud
Copy link

sonarcloud bot commented Jul 12, 2020

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities (and Security Hotspot 0 Security Hotspots to review)
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

class CondaEnv(RuleKeywordState):
@property
def keyword(self):
return "conda_env"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be condaenv but using snakecase to make it clearer

CondaEnvTuple = namedtuple(
"CondaEnvTuple",
"is_named env"
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also be removed, and instead of using is_named, we just using isinstance(conda_env, str) in its stead.

@johanneskoester
Copy link
Contributor

Thank you! I'll try to review this ASAP. Just give me some additional time as there is a deadline tomorrow and I need to release some urgent fixes in Varlociraptor.

Copy link
Contributor

@johanneskoester johanneskoester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! This is a very good start. In order to make the code even clearer, I think it might be better to implement an OO hierarchy for the specification of conda envs, i.e.:

class CondaEnvSpec(ABC):
    @abstractmethod
    def expand_wildcards(self, wildcards):
        ...

    @abstractmethod
    def get_conda_env(self):
        ...

class CondaEnvFile(CondaEnvSpec):
    ...

class CondaEnvName(CondaEnvSpec):
    ...

Does that make sense?

@johanneskoester
Copy link
Contributor

Further, may I ask you to format the PR with black?

@nh13
Copy link
Contributor Author

nh13 commented Aug 12, 2020

@johanneskoester I'll try to disentangle the reliance on an environment file existing, as that's what I was trying to make work. Likely I'll get to this in early September when I have some time off for side-projects :)

@conchoecia
Copy link
Contributor

I take it that this still isn't resolved? Just was trying to figure out calling up a conda env for a single rule today!

@nh13
Copy link
Contributor Author

nh13 commented Dec 2, 2020

@conchoecia I haven't had time to get around to this, and have stuck with boiler plate code for now.

I like to store all my conda environments in one place using conda-env-builder (which I wrote). It's super handy if I have a single git repo where I'm building multiple docker images (each a snakemake pipeline), each with one or more conda environments.

In the snakefile I use the following shell boilerplate:

rule:    
    params:
        conda_env = "some_prebuilt_and_named_env"
    shell:
        """
        set +eu
        && PS1=dummy
        && . $(conda info --base)/etc/profile.d/conda.sh
        && conda activate {params.conda_env};
        set -eu 
        ...
        """

@nh13 nh13 force-pushed the feature/named-conda-environments branch from a701c2e to 31b8b2e Compare December 18, 2020 08:25
@nh13
Copy link
Contributor Author

nh13 commented Dec 18, 2020

@johanneskoester I really tried to reason about how conda_env was really used in snakemake. So much of the code is predicated on having a valid and present environments YAML file, I had a lot of trouble finding every place it is used (jobs, persistence, cleanup), and coming up with a good abstraction. So many objects (workflows, rules, persistence) keep copies of things, and it's really hard to tell exactly what (path, Env class, string) without types. Sorry for the rant, I do think that it could really benefit from some refactoring like you suggested, but that's beyond the time I have right now.

I think the latest set of changes are good compromise for a minimal set of changes to get it working. Thank-you for your consideration!

@sonarcloud
Copy link

sonarcloud bot commented Dec 18, 2020

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@nh13
Copy link
Contributor Author

nh13 commented Dec 18, 2020

@johanneskoester tests pass, this is ready to go!

@mw55309
Copy link
Contributor

mw55309 commented Jul 24, 2021

Just adding a +1 for this feature - is there a timeline for implementation?

@LiterallyUniqueLogin
Copy link

I'm also interested in this. I also don't need snakemake to manage my conda environments for me. Given that the code has already been helpfully written, can we get traction on this?

@colinbrislawn
Copy link
Contributor

Any updates?

I'm looking forward to using Snakemake with complex (but standardized!) environments like this one:
https://data.qiime2.org/distro/core/qiime2-2021.11-py38-linux-conda.yml

@snystrom
Copy link

Chiming in to say this would be very useful in a couple of my more esoteric workflows with external non-conda deps.

@Kdreval
Copy link

Kdreval commented Jan 16, 2022

This would be useful enhancement of snakemake functionality! Any updates on the timeline for this feature?

Copy link
Contributor

@johanneskoester johanneskoester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for pointing me again to this. Sorry for the delay. This slipped my attention, because, to be honest, I personally rather favor to have the yaml in the workflow definition, because I think this is much more transparent and adaptable (at least for the average non-expert user). Nevertheless, Snakemake best practices are not a religion and if we can make a change that fits your habits without totally rewriting the way Snakemake works, I am fine with giving it a try.

Nevertheless, I don't want to merge this in the current form, because of the hack using something considered to be a file path as a name instead. I really think the OO solution is worth giving a try. But I know, this is quite some refactoring work, and I can understand that you probably also don't have the time for this. Given the fact that I forgot this PR for such a long time and you patiently waited for it, I will have a look into this today. Maybe I can come up with a solution quickly.

@nh13
Copy link
Contributor Author

nh13 commented Jan 18, 2022

Thank you @johanneskoester for giving it a shot. I commend you for making reproducibility easy and the first choice when running Snakemake, but also understand the pragmatic choices that other people (including myself) make that require alternatives. Like you said, we shouldn’t become dogmatic about the right way to ensure reproducibility. In some cases, I want Snakemake just to be the excellent workflow engine that it is, and not force me down a specific path to build my environment. I’d argue that it’s great both are included in Snakemake, but should be allowed to be decoupled if wanted.

I did try to figure out how to implement the OO approach, but like I said, there’s a lot of coupling in the code and as a newcomer to the code it was quite daunting. I’d love to learn and contribute more (along with our team) as we really love Snakemake and use it in our day to day across a large set of clients.

Seeing how you’d approach this feature could help us understand the code, and later we can add more functionality and/or code docs to help our future selves. Thank you for your consideration!

@colinbrislawn
Copy link
Contributor

Thanks for returning to this PR, @johanneskoester

I'm glad we waited to revisit this issue. Much has changed, so my use case for named environments is different.

My user story: most rules I write call a single program (bwa) or some unix tools (grep + sed). I find it easy to

  1. write env.yaml files and --use-conda or
  2. call snakemake from within whatever conda environment I am building

But sometimes I use Qiime2. Each version has +400 dependencies, and is installed with conda into an environment with a fixed name. Because the conda environment for each release of is consistent, calling conda activate qiime2-2021.11 should work for anyone already using Qiime2.

Before mamba, installing more copies of qiime2 was very slow. But times have changed! I'll try putting this into an snakemake conda env and see how it goes!

@johanneskoester
Copy link
Contributor

So, I think I have successfully finalized an object oriented flavor of this feature in PR #1340. Hence, closing in favor of that PR. Thanks for your help on this, for your patience, and for sharing your use cases! And sorry for being slow with the review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants