Ease packaging and publishing process in python #180

Open
gbolmier opened this issue Jul 18, 2021 · 3 comments
Labels
area/model Model related functions, including model warehouse, model compression, model evaluation, etc. enhancement New feature or request kind/design Categorizes issue or PR as related to design. kind/feature Categorizes issue or PR as related to a new feature.

Comments

@gbolmier
Contributor

gbolmier commented Jul 18, 2021

/kind feature

What happened:

My team is working with the Kubeflow platform, and we're investigating using ormb to share and publish our ML models and other stateful artifacts such as transformers (e.g. standard scaler, PCA, TF-IDF vectorizer) on Harbor.

As far as I understand, publishing a stateful artifact after it has been fitted on data requires the following steps:

  • save the "fitted" artifact within an <artifact_name>/model/ directory
  • write an <artifact_name>/ormbfile.yaml artifact config file containing the artifact's metadata
  • run the ormb save and push commands to package and publish the stateful artifact
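The three steps above can be sketched in Python roughly as follows. This is a minimal sketch, assuming the `ormb` binary is on `PATH` and that `ormb save <dir> <ref>` followed by `ormb push <ref>` is the invocation form; the helper names `prepare_artifact_dir`, `ormb_commands` and `run_commands`, and the reference `harbor.example.com/project/scaler:v1`, are hypothetical.

```python
import os
import subprocess
from typing import List


def prepare_artifact_dir(root: str, artifact_name: str) -> str:
    """Create <root>/<artifact_name>/model/ and return <root>/<artifact_name>."""
    artifact_dir = os.path.join(root, artifact_name)
    # The fitted artifact (e.g. a serialized scaler) is expected under model/.
    os.makedirs(os.path.join(artifact_dir, "model"), exist_ok=True)
    return artifact_dir


def ormb_commands(artifact_dir: str, ref: str) -> List[List[str]]:
    """Build the save-then-push command lines.

    Assumed CLI form (hypothetical here): `ormb save <dir> <ref>`, then `ormb push <ref>`.
    """
    return [
        ["ormb", "save", artifact_dir, ref],
        ["ormb", "push", ref],
    ]


def run_commands(commands: List[List[str]]) -> None:
    """Run each command in order; check=True raises on the first non-zero exit."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Writing `ormbfile.yaml` between `prepare_artifact_dir` and `run_commands` is the step that currently has to be done by hand.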

Since some of the metadata either:

  • can only be known at runtime (e.g. creation datetime, size of the artifact, run-dependent hyperparameters, metrics)
  • or is better populated automatically at runtime (e.g. revision, framework and framework version)

the <artifact_name>/ormbfile.yaml artifact config file needs to be written or modified programmatically. Without any utilities, this step requires a lot of logic on the user side.
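A sketch of populating the runtime-dependent part of the config, assuming the keys `created`, `size`, `frameworkVersion`, `hyperparameters` and `metrics` (hypothetical here; they should be checked against the actual ormbfile.yaml schema). The file is written as JSON because every JSON document is also valid YAML 1.2, which keeps the sketch free of third-party dependencies. The helper `write_ormbfile` is hypothetical.

```python
import datetime
import json
import os
from importlib import metadata
from typing import Any, Dict, Optional


def directory_size(path: str) -> int:
    """Total size in bytes of all regular files under path (0 if path is missing)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def framework_version(dist_name: str) -> Optional[str]:
    """Installed version of a distribution such as 'scikit-learn', or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def write_ormbfile(artifact_dir: str, static: Dict[str, Any], framework_dist: str,
                   hyperparameters: Dict[str, Any],
                   metrics: Dict[str, float]) -> Dict[str, Any]:
    """Merge user-declared fields with runtime values and write ormbfile.yaml."""
    # Key names below are hypothetical, not a confirmed ormb schema.
    runtime = {
        # UTC timestamp, e.g. 2021-07-18T12:00:00+00:00
        "created": datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds"),
        "size": directory_size(os.path.join(artifact_dir, "model")),
        "frameworkVersion": framework_version(framework_dist),
        "hyperparameters": hyperparameters,
        "metrics": metrics,
    }
    config = {**static, **runtime}  # runtime values override stale static ones
    with open(os.path.join(artifact_dir, "ormbfile.yaml"), "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    return config
```

The user then only declares the static fields (kind, framework, description, ...) once, and everything that depends on the current run is filled in at save time.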

What you expected to happen:

A process for publishing stateful ML artifacts that is as convenient and automated as possible for the end user, i.e. the data scientist.

Maybe we could implement some utilities within the ormb Python SDK to make the process more convenient in practice.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

I'm not that familiar with image-based registries, the underlying concepts, and the tools of that ecosystem, so feel free to correct me or point me to any useful material.

@gaocegege
Member

Maybe we could implement some utilities within the ormb Python SDK to make the process more convenient in practice.

Yeah, I think so. We do not provide such an SDK yet, and I am not sure whether we should implement it in Python or Go. Are you using the Python SDK?

@gaocegege
Member

I'm not that familiar with image-based registries, the underlying concepts, and the tools of that ecosystem, so feel free to correct me or point me to any useful material.

No problem, I think your suggestion is great. If you want to know more about image registries or the features we used in Harbor, please have a look at https://github.com/goharbor/community/blob/master/proposals/enhanced-default-processor.md

@gaocegege gaocegege added area/model Model related functions, including model warehouse, model compression, model evaluation, etc. enhancement New feature or request kind/design Categorizes issue or PR as related to design. kind/feature Categorizes issue or PR as related to a new feature. labels Jul 21, 2021
@gbolmier
Contributor Author

Yeah, I think so. We do not provide such an SDK yet, and I am not sure whether we should implement it in Python or Go. Are you using the Python SDK?

Yes, I'm using the Python SDK, but to make it work on my Mac, I had to replace the downloaded pre-compiled binaries with ones compiled locally from source (see #181).

If it is implemented in Go, it would make sense to add a Python wrapper for data scientists. I haven't learnt Go yet, so I can't really say which option makes more sense. On my side, I could therefore contribute more if it's done in Python.
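If the logic lives in Go, the Python side could stay a thin wrapper around the binary. A minimal sketch, assuming a hypothetical `ORMB_BINARY` environment variable that lets users point at a locally compiled binary (as was needed on macOS in #181), with a fallback to `ormb` on `PATH`; `resolve_ormb_binary`, `ormb` and `OrmbNotFoundError` are hypothetical names, not the existing SDK's API.

```python
import os
import shutil
import subprocess
from typing import List, Optional


class OrmbNotFoundError(RuntimeError):
    """Raised when no usable ormb binary can be located."""


def resolve_ormb_binary(env_var: str = "ORMB_BINARY") -> str:
    """Prefer an explicit binary path from the (hypothetical) env var, else search PATH."""
    explicit = os.environ.get(env_var)
    if explicit:
        if os.path.isfile(explicit) and os.access(explicit, os.X_OK):
            return explicit
        raise OrmbNotFoundError(f"{env_var}={explicit!r} is not an executable file")
    found = shutil.which("ormb")
    if found is None:
        raise OrmbNotFoundError(f"ormb not found on PATH; set {env_var}")
    return found


def ormb(*args: str, binary: Optional[str] = None) -> str:
    """Run the CLI with args and return stdout; raises CalledProcessError on failure."""
    cmd: List[str] = [binary or resolve_ormb_binary(), *args]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

Keeping the wrapper this thin means the Go and Python code paths cannot drift apart; the trade-off is that Python users still depend on a working binary for their platform.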

No problem, I think your suggestion is great. If you want to know more about image registries or the features we used in Harbor, please have a look at https://github.com/goharbor/community/blob/master/proposals/enhanced-default-processor.md

Thanks a lot for the pointer @gaocegege :)


2 participants