Add example of using an MLflow model in a Nextflow pipeline #306
Comments
https://github.com/mlf-core/nextflow-lcep
But yeah, a module would be cool. The two projects complement each other.

On Wednesday, 10 March 2021, 05:00 PM +01:00, Edmund Miller (notifications@github.com) wrote:

    Nothing fancy, and not exactly in the scope of this project, but it might be helpful to the community.
    https://twitter.com/LukasHeumos/status/1369573166130081793?s=20
    Or maybe we should just make an nf-core module for MLflow models, @KevinMenden
Oh cool, I hadn't looked into all the repos deeply enough! I'm thinking a module for running the Python packages produced by mlf-core, and maybe one for system-intelligence.
@emiller88 yeah. There is a lot we can do to improve the bridging of mlf-core and nf-core. I might open a more detailed issue for that in a couple of weeks, but it has no priority at the moment. Step 1: mlf-core/system-intelligence#147. And this may not be possible...
To be honest, I'm not entirely convinced that an mlf-core module would make sense for nf-core. What would the other steps of that pipeline be? What's the advantage of wrapping that with Nextflow?
@KevinMenden Not necessarily mlf-core itself, but rather tools that perform predictions. What I could imagine (going wild here) is a single package that takes mlf-core-trained PyTorch/TensorFlow/XGBoost models as a parameter and outputs predictions as files. This package could be wrapped as a module.
Yeah, but that would imply that everything done with mlf-core somehow has similar inputs/outputs, and the file types will differ. Basically, a model trained with mlf-core is in the end just a PyTorch/TensorFlow model that can be loaded and used. And for mlf-core it is, in my opinion, very important not to add too many constraints/guidelines on how to code (looking at you, linter). It needs to be flexible enough that most ML projects can use it. I had some issues when doing the syncing; I need to write that down and think about it a bit though 😁 Anyway, if you keep it flexible enough (which you should), then I'm not sure whether a module to encapsulate all the models is doable / makes sense. Just my two cents 🙂
Yeah I would need to think about it more as well.
Yup, we're trying this. Less centralized and flexible are certainly the goals. After the preprint is out, @Imipenem and I will revisit the linter and also make it a bit more customizable. But user feedback, a.k.a. your feedback, is always appreciated ^_^
Cool :) Yes, I definitely want to go through the syncing process again and write down what I thought could annoy me if I were to build something with mlf-core.
Just dropping one of my use cases here: I use Nextflow to make my single-cell analyses reproducible. The pipeline chains a bunch of scripts and Jupyter notebooks together and generates HTML reports containing all results and figures. See also nf-core/modules#617 for the corresponding notebook modules. When the pipeline involves scVI for data integration, it would be nice to rely on mlf-core to ensure reproducibility of the model. A Nextflow module would be extremely nice for that, or alternatively a Python module to import that does all the seed-setting.
Hey Gregor, cool to see you here :)
I would be very happy to support scVI to ensure that all models are deterministic. There's just a couple of things to keep in mind.
Shouldn't PyTorch raise an exception in that case if
Such a module would be neat, but that depends on mlf-core/system-intelligence#147, I guess. I was just trying to set up the
How do you actually do that? Just by linting?
Yes.
Yeah, sorry about the missing Conda package. Honestly, I don't have the time in the near future to do that. Maybe @KevinMenden @emiller88 or potentially even you could help out here? |
A mix of linting, enforced containers, hardware architecture tracking, and logging of the training history (hyperparameters, obtained metrics, etc.). The latter is not necessarily relevant for inference.
Regarding scVI, this snippet seems to have done the trick (on the same hardware, of course). Running the same pipeline twice yielded exactly the same clustering + UMAP plot. But I feel it would be best to add that to scVI directly.

```python
def set_all_seeds(seed=0):
    import os
    import random

    import numpy as np
    import torch
    import scvi  # missing in the original snippet but needed below

    scvi.settings.seed = seed
    os.environ["PYTHONHASHSEED"] = str(seed)  # Python general
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # deterministic cuBLAS
    np.random.seed(seed)  # NumPy random
    random.seed(seed)  # Python random
    torch.manual_seed(seed)
    torch.use_deterministic_algorithms(True)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # for multi-GPU


set_all_seeds()
```

EDIT: scVI already sets some seeds.
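The reproducibility claim above ("running the same pipeline twice yielded exactly the same output") can be sanity-checked in miniature: with identical seeding, two runs must be bit-identical. This is a pure-Python illustration of the same principle, not the scVI pipeline itself; `tiny_run` is an invented stand-in for a seeded training run.

```python
# Miniature version of the reproducibility check described above:
# two runs with the same seed must produce identical "results".
import random


def tiny_run(seed):
    random.seed(seed)  # analogous to calling set_all_seeds(seed)
    return [random.random() for _ in range(5)]


assert tiny_run(0) == tiny_run(0)  # same seed -> identical output
assert tiny_run(0) != tiny_run(1)  # different seed -> different output
```

For the real pipeline, the equivalent check is rerunning it end to end and diffing the outputs, which is what the clustering + UMAP comparison above amounts to.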