Add example of using an MLflow model in a Nextflow pipeline #306

Open
edmundmiller opened this issue Mar 10, 2021 · 15 comments
Labels
documentation (Improvements or additions to documentation)

Comments

@edmundmiller
Contributor

Nothing fancy, and not exactly in the scope of this project, but it might be helpful to the community.

https://twitter.com/LukasHeumos/status/1369573166130081793?s=20

Or maybe we should just make an nf-core module for MLflow models, @KevinMenden

edmundmiller added the documentation label on Mar 10, 2021
@Zethson
Member

Zethson commented Mar 10, 2021 via email

@edmundmiller
Contributor Author

Oh cool, I hadn't looked into all the repos deeply enough! I'm thinking a module for running the Python packages produced by mlf-core, and maybe one for system-intelligence.

@Zethson
Member

Zethson commented Mar 10, 2021

@emiller88 yeah. There is a lot of stuff that we can do to improve the bridging of mlf-core and nf-core. I might open a more detailed issue for that in a couple of weeks, but it has no priority atm.

Step 1: mlf-core/system-intelligence#147

And this may not be possible...

@KevinMenden
Contributor

To be honest, I'm not sure if I am entirely convinced that an mlf-core module would make sense for nf-core.
I just can't imagine it being a part of a pipeline ... but I'm happy to be convinced otherwise :)

What would the other steps of that pipeline be? What's the advantage of wrapping that with nextflow?

@Zethson
Member

Zethson commented Mar 11, 2021

@KevinMenden Not necessarily mlf-core itself, but rather for tools that perform predictions. What I could think of (going wild here) is a single package which takes in mlf-core trained pytorch/tensorflow/xgboost models (as a parameter) and outputs predictions as files. This package could be wrapped as a module.
Using GPUs with NF requires new labels (GPU labels). Additionally, the docker run and singularity exec etc. commands need to be amended. Possibly the module could take care of that?
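
A very rough sketch of what such a prediction wrapper could look like for the PyTorch case (the script name, CLI flags and file formats below are made up purely for illustration, this is not an existing mlf-core tool):

# predict.py - hypothetical CLI wrapper around an mlf-core trained PyTorch model;
# everything here (name, flags, file formats) is illustrative only.
import argparse

import numpy as np
import torch


def main():
    parser = argparse.ArgumentParser(description="Run predictions with a trained model")
    parser.add_argument("--model", required=True, help="Path to a TorchScript model file")
    parser.add_argument("--input", required=True, help="Input features as a .npy array")
    parser.add_argument("--output", required=True, help="Where to write predictions (.npy)")
    args = parser.parse_args()

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.jit.load(args.model, map_location=device)
    model.eval()

    features = torch.from_numpy(np.load(args.input)).to(device)
    with torch.no_grad():
        predictions = model(features)

    np.save(args.output, predictions.cpu().numpy())


if __name__ == "__main__":
    main()

A Nextflow module would then only need to stage the model file and call such a script inside the container.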

@KevinMenden
Contributor

Yeah, but that would imply that everything done with mlf-core somehow has similar inputs/outputs etc.; also, the file type will be different. And basically a model trained with mlf-core is, in the end, just a pytorch/tensorflow model that can be loaded and used.

And for mlf-core, in my opinion, it is very important not to add too many constraints/guidelines on how to code (looking at you, linter). It needs to be so flexible that most ML projects can use it. I had some issues when doing the syncing; I need to write that down and think about it a bit though 😁

Anyway, if you keep it flexible enough (which you should), then I'm not sure whether a module to encapsulate all the models is doable or makes sense. Just my two cents 🙂

@Zethson
Member

Zethson commented Mar 11, 2021

Anyway, if you keep it flexible enough (which you should), then I'm not sure whether a module to encapsulate all the models is doable or makes sense. Just my two cents 🙂

Yeah, I would need to think about it more as well.

And for mlf-core, in my opinion, it is very important not to add too many constraints/guidelines on how to code (looking at you, linter). It needs to be so flexible that most ML projects can use it. I had some issues when doing the syncing; I need to write that down and think about it a bit though

Yup, we're trying this. Being less centralized and more flexible are certainly the goals. After the preprint is out, @Imipenem and I will revisit the linter and also make it a little bit more customizable. But user feedback, a.k.a. your feedback, is always appreciated ^_^

@KevinMenden
Contributor

Cool :) Yes I definitely want to go through the syncing process again and write down what I thought could annoy me if I were to build something with mlf-core.
Will let you know!

@grst

grst commented Aug 9, 2021

Just dropping one of my use-cases here:

I use nextflow to make my single-cell analyses reproducible. The pipeline chains a bunch of scripts and jupyter notebooks together and generates HTML reports containing all results and figures. See also nf-core/modules#617 for the corresponding notebook modules.

When the pipeline involves scVI for data integration, it would be nice to rely on mlf-core to ensure reproducibility of the model. A nextflow module would be extremely nice for that, or alternatively a python module to import that does all the seed-setting.

@Zethson
Member

Zethson commented Aug 9, 2021

Just dropping one of my use-cases here:

I use nextflow to make my single-cell analyses reproducible. The pipeline chains a bunch of scripts and jupyter notebooks together and generates HTML reports containing all results and figures. See also nf-core/modules#617 for the corresponding notebook modules.

When the pipeline involves scVI for data integration, it would be nice to rely on mlf-core to ensure reproducibility of the model. A nextflow module would be extremely nice for that, or alternatively a python module to import that does all the seed-setting.

Hey Gregor,

cool to see you here :)

When the pipeline involves scVI for data integration, it would be nice to rely on mlf-core to ensure reproducibility of the model. A nextflow module would be extremely nice for that, or alternatively a python module to import that does all the seed-setting.

I would be very happy to support scVI to ensure that all models are deterministic. There's just a couple of things to keep in mind.

  1. Setting all required seeds and enforcing deterministic algorithms where available during the inference step is possible.
  2. Step 1 alone is not sufficient. All scVI modules would need to be screened to remove all non-deterministic algorithms. An example: 2D convolution is deterministic with deterministic algorithms forced; 3D convolution currently cannot be made deterministic. Therefore, a drop-in mlf-core module would not help at all. The models need to be trained based on mlf-core (which enforces deterministic algorithms only).
  3. Hardware needs to be tracked to reproduce deterministic results. A Nextflow module for system-intelligence would be very easy and would solve that.

@grst

grst commented Aug 9, 2021

3D convolution currently cannot be made deterministic.

Shouldn't pytorch raise an Exception in that case if use_deterministic_algorithms is set to True? (Of course it wouldn't train successfully if there is non-determinism, but it should at least be easy to detect?)
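
Something like this minimal probe should show it, I guess (assuming a CUDA device is available; whether a given op actually raises depends on the PyTorch version and backend):

# Minimal probe: with deterministic algorithms enforced, PyTorch is supposed to raise
# a RuntimeError as soon as an op without a deterministic implementation is executed.
# Whether 3D convolution actually raises depends on the PyTorch version and the device.
import torch

torch.use_deterministic_algorithms(True)

x = torch.randn(2, 3, 8, 8, 8, device="cuda", requires_grad=True)
conv = torch.nn.Conv3d(3, 4, kernel_size=3).to("cuda")

try:
    conv(x).sum().backward()  # the 3D convolution case discussed above
    print("No error raised - this op has a deterministic path in this version")
except RuntimeError as err:
    print(f"Non-deterministic op detected: {err}")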

A Nextflow module for system-intelligence would be very easy and would solve that.

Such a module would be neat, but that depends on mlf-core/system-intelligence#147 I guess. I was just trying to set up the lshw tool on our cluster this morning. Having all dependencies on conda would be quite helpful!

@grst

grst commented Aug 9, 2021

The models need to be trained based on mlf-core (which enforces deterministic algorithms only)

How do you actually do that? just by linting?

@Zethson
Member

Zethson commented Aug 9, 2021

3D convolution currently cannot be made deterministic.

Shouldn't pytorch raise an Exception in that case if use_deterministic_algorithms is set to True?

Yes.

  1. This flag is rather new (it was introduced just when we got mlf-core out).
  2. I am not sure whether it raised exceptions for all algorithms that we tested? CC @luiskuhn
  3. This is of course PyTorch-specific and does not help with TensorFlow.

A Nextflow module for system-intelligence would be very easy and would solve that.

Such a module would be neat, but that depends on mlf-core/system-intelligence#147 I guess. I was just trying to set up the lshw tool on our cluster this morning. Having all dependencies on conda would be quite helpful!

Yeah, sorry about the missing Conda package. Honestly, I don't have the time in the near future to do that. Maybe @KevinMenden, @emiller88, or potentially even you could help out here?

@Zethson
Member

Zethson commented Aug 9, 2021

The models need to be trained based on mlf-core (which enforces deterministic algorithms only)

How do you actually do that? just by linting?

A mix of linting, enforced containers, hardware architecture tracking, and logging of the training history (hyperparameters, obtained metrics, etc.). The latter is not necessarily relevant for inference.
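
For the training-history part, mlf-core projects build on MLflow, so that side roughly looks like the snippet below (the parameter names and values are made up, just to illustrate what gets logged):

# Rough illustration of logging the training history with MLflow
# (hyperparameters and per-epoch metrics); names and values are made up.
import mlflow

with mlflow.start_run():
    # hyperparameters
    mlflow.log_param("learning_rate", 1e-3)
    mlflow.log_param("epochs", 10)
    mlflow.log_param("general_seed", 0)

    # per-epoch metrics
    for epoch in range(10):
        train_loss = 1.0 / (epoch + 1)  # placeholder value
        mlflow.log_metric("train_loss", train_loss, step=epoch)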

@grst

grst commented Aug 9, 2021

Regarding scVI, this snippet seems to have done the trick (on the same hardware, of course). Running the same pipeline twice yielded exactly the same clustering + UMAP plot.

But I feel it would be best to add that to scVI directly.

def set_all_seeds(seed=0):
    import os
    import random
    import numpy as np
    import scvi
    import torch

    scvi.settings.seed = seed  # scvi-tools global seed
    os.environ["PYTHONHASHSEED"] = str(seed)  # Python general
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # required for deterministic cuBLAS kernels
    np.random.seed(seed)  # Numpy random
    random.seed(seed)  # Python random

    torch.manual_seed(seed)
    torch.use_deterministic_algorithms(True)  # error out on non-deterministic ops
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # For multiGPU


set_all_seeds()

EDIT: scVI already sets some seeds.
