Copy information of a model from MLFlow #52

Open
momegas opened this issue Dec 4, 2022 · 10 comments
Labels
discussion needed (This issue needs some discussion to move forward), enhancement (New feature or request), needs analysis (This issue needs analysis)

Comments

@momegas
Member

momegas commented Dec 4, 2022

Description

Since MLflow is an industry standard that many people use, it makes sense for Whitebox to integrate with it and use it as a data store, or something similar, while providing the functionality that is missing in the monitoring area of MLOps.

@momegas momegas added this to the Whitebox Roadmap milestone Jan 3, 2023
@momegas momegas added the "discussion needed" and "needs analysis" labels Jan 3, 2023
@momegas
Member Author

momegas commented Jan 3, 2023

@stavrostheocharis @gcharis @NickNtamp @sinnec

Here are some thoughts about the implementation of an MLflow integration.
Please give your feedback using the numbers below. I've thought about this so much that I may have tunnel vision by now, so other views would help.

  1. We need a way for users to copy a model's information from MLflow. This can be implemented directly in our SDK.
  2. Since we save a lot of information similar to what MLflow saves, we could migrate our database to a (non-accessible) MLflow instance running behind Whitebox. This would save a lot of implementation time (artefacts, S3, integrations, etc.) since we can reuse MLflow's integrations. We could also use experiment tracking to store our time series.
  3. If we go with number 2, would it make sense to let the user point to an existing MLflow instance? This would put both monitoring and experiments in one dashboard (I ran some tests there, and it's not as ugly as I thought).

@momegas momegas added the "enhancement" label Jan 3, 2023
@momegas momegas changed the title from "MLflow integration" to "[Roadmap] MLflow integration" Jan 4, 2023
@momegas
Member Author

momegas commented Jan 12, 2023

Bump 👋

@momegas momegas modified the milestones: 🐻‍❄️ Whitebox Roadmap, 😻 Q2 2023 Jan 18, 2023
@momegas
Member Author

momegas commented Jan 20, 2023

Bump 💥

@gcharis
Contributor

gcharis commented Jan 24, 2023

Aren't we talking about an MLflow plugin? That would make sense to me.

https://mlflow.org/docs/latest/plugins.html

@momegas
Member Author

momegas commented Jan 25, 2023

MLflow plugins are for MLflow to integrate with other tools, right? We need the opposite, I guess: Whitebox should be getting data from MLflow.

@NickNtamp
Contributor

Some thoughts from me as well on the points above:

  1. Correct. I don't know whether they or we have to copy something, but in any case we need a way to use the client's models that are stored in MLflow.
  2. I don't know what the optimal implementation is in terms of databases, but yes, I totally agree that we should take advantage of the data that MLflow creates.
  3. Generally, my thought is that Whitebox can act as an "extension" of MLflow, taking advantage of all the data that is used and created there!

@momegas
Member Author

momegas commented Jan 25, 2023

After all these days, I think we only need number 1, btw.

@momegas momegas changed the title from "[Roadmap] MLflow integration" to "MLflow integration" Feb 15, 2023
@momegas
Member Author

momegas commented Feb 17, 2023

For anyone taking this issue, let's go with option number 1 for now:

  1. We need a way for users to copy a model's information from MLflow. This can be implemented directly in our SDK.

I propose a method in the SDK that requests the model from MLflow. Keep in mind that we may need to point the SDK to the MLflow instance. I'm renaming this issue to something more relevant.
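
A minimal sketch of what such an SDK method could look like, assuming the model is registered in the MLflow Model Registry. The names `ModelInfo`, `make_mlflow_client`, and `copy_model_info` are hypothetical and not part of the Whitebox SDK; only `MlflowClient(tracking_uri=...)` and `MlflowClient.get_model_version(name, version)` are MLflow calls. The client is passed in so the method works with whichever MLflow instance the SDK is pointed at.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ModelInfo:
    """Hypothetical container for the fields Whitebox would copy from MLflow."""
    name: str
    version: str
    run_id: Optional[str]   # run that produced the model, if any
    source: Optional[str]   # artifact location of the model files


def make_mlflow_client(tracking_uri: str) -> Any:
    """Point the SDK at an MLflow tracking server (needs the mlflow package)."""
    # Imported lazily so the rest of the SDK does not require mlflow.
    from mlflow.tracking import MlflowClient

    return MlflowClient(tracking_uri=tracking_uri)


def copy_model_info(client: Any, name: str, version: str) -> ModelInfo:
    """Request one registered model version and keep the fields we care about."""
    mv = client.get_model_version(name, version)
    return ModelInfo(
        name=mv.name,
        version=str(mv.version),
        run_id=mv.run_id or None,  # empty when the version was not created from a run
        source=mv.source,
    )


# Example usage (the URL and model name are assumptions):
# client = make_mlflow_client("http://localhost:5000")
# info = copy_model_info(client, "churn-model", "1")
```

Which of these fields Whitebox actually needs, and whether models outside the registry should be supported, is still open for discussion.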

@momegas momegas changed the title MLflow integration Copy information of a model from MLFlow Feb 17, 2023
@stavrostheocharis
Contributor

I think @NickNtamp can share some thoughts on that, since he is going to look into MLflow a bit.

@NickNtamp
Contributor

Based on my investigation so far, MLflow can save/track the following for each experiment:

  1. artifacts: Mainly the requirements and dependencies in a YAML file. The model is also stored here in pkl format.
  2. metrics: Metrics for the experiment. These can be either custom or standardized via libraries (sklearn, etc.).
  3. params: The combination of hyperparameters used for the experiment.
  4. tags: Various files that are consumed by the MLflow API (versions, timestamps, etc.).
  5. some metadata in a YAML file

As far as I know, the only thing we can use (at least for now) is number 2, by replacing the functions that calculate the evaluation metrics here, from row 65 onward.
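
As a rough illustration of the idea above, the sketch below reads what a run has logged through the MLflow client and prefers the logged metrics over recomputing them. `collect_run_record` and `resolve_metrics` are hypothetical helper names, and the metric names in the usage comment are assumptions; `get_run`, `run.data.metrics`, `run.data.params`, `run.data.tags`, and `list_artifacts` are existing MLflow client calls.

```python
from typing import Any, Callable, Dict, Iterable


def collect_run_record(client: Any, run_id: str) -> Dict[str, Any]:
    """Gather the metrics, params, tags, and artifact paths logged for one run."""
    run = client.get_run(run_id)
    return {
        "metrics": dict(run.data.metrics),
        "params": dict(run.data.params),
        "tags": dict(run.data.tags),
        # list_artifacts returns entries for the run's artifact root
        "artifacts": [f.path for f in client.list_artifacts(run_id)],
    }


def resolve_metrics(
    logged: Dict[str, float],
    wanted: Iterable[str],
    compute: Callable[[str], float],
) -> Dict[str, float]:
    """Use a metric logged in MLflow when present; compute only the missing ones."""
    return {
        name: logged[name] if name in logged else compute(name)
        for name in wanted
    }


# Example usage (metric names and the fallback are assumptions):
# record = collect_run_record(client, run_id)
# metrics = resolve_metrics(record["metrics"], ["accuracy", "f1"], compute_metric)
```

Keeping a fallback means the existing evaluation functions stay in place for runs that did not log the metric in question.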


4 participants