Implement a Reusable E2E Kubeflow ML Lifecycle #3728

Open
wants to merge 7 commits into
base: master

Conversation

andreyvelich
Member

Based on our recent discussion with @franciscojavierarceo I updated the ML lifecycle diagram in the architecture guides: #3719 (comment)
We can re-use this ML lifecycle diagram in each Kubeflow Component and explain the user value of that component.

I like the existing diagrams, but they are a little bit out of date.
I am happy to improve my diagrams based on your feedback.

Also, I removed unused images.

/assign @franciscojavierarceo @kubeflow/kubeflow-steering-committee @thesuperzapper @StefanoFioravanzo @hbelmiro

/hold for review

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Contributor

Looks like a typo, should be Data Producers.

Member Author

Nice catch! @franciscojavierarceo I am wondering, should we add the Data Producers to the Offline Feature Store as well?
E.g. Spark ingests data from Data Producers and extracts features.

Contributor

It may also be useful to add Feature Extraction to the Offline Store to make it concrete how the offline store is used.

Contributor

Nice catch! @franciscojavierarceo I am wondering, should we add the Data Producers to the Offline Feature store as well?

Yeah, I think that's a great idea! That can get complicated if we get specific, but I think if we stay generic and create a box like we do for the online store, that works fine.

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
@StefanoFioravanzo
Member

@andreyvelich What exactly are you trying to accomplish? I didn't fully get this part

We can re-use this ML lifecycle diagram in each Kubeflow Component and explain the user value of that component.

Is this diagram supposed to be re-used by each component, and if so, how do you envision that?

@andreyvelich
Member Author

andreyvelich commented May 3, 2024

@andreyvelich What exactly are you trying to accomplish? I didn't fully get this part

We can re-use this ML lifecycle diagram in each Kubeflow Component and explain the user value of that component.

Is this diagram supposed to be re-used by each component, and if so, how do you envision that?

That's right. Please check these examples:

We can do the same for Model Registry, Spark Operator, and Notebooks if the other WGs agree with that.

What do you think about it, @StefanoFioravanzo?

@StefanoFioravanzo
Member

Oh OK, now I understand your approach, and I like it. You are proposing that we build a canonical Kubeflow ML lifecycle diagram and then highlight which parts of the diagram each component covers.

So, based on this, I propose two things:

  1. Rename this PR to better represent what we are doing (e.g. Implement a reusable E2E ML lifecycle diagram or something like that).
  2. Consider using and adapting an existing diagram. There are many open source E2E ML lifecycle diagrams that are widely used and promoted by large organizations. One option is to overlay Kubeflow and Kubeflow components on top of one of these.

If you want to keep the focus smaller and have a quicker iteration on the existing diagram, I am fine with it and you can ignore the two points above.

@StefanoFioravanzo
Member

cc @chasecadet can probably provide some good insight on this

@StefanoFioravanzo
Member

@andreyvelich a very good open source diagram that we can reuse is this one by the AI Infrastructure Alliance. See here https://github.com/ai-infrastructure-alliance/blueprints

There is no explicit license, but they do write in the README:

Please retain the AIIA Logo on the diagrams when you use them, otherwise you are free to modify them in any way you see fit.

I think this would be a pretty good starting point for a reusable diagram. They have an editable figma file, and even an interactive version. Take a look at all the folders, there's various versions.

We could fork the repository under the Kubeflow org and adapt it to the various components. If we want, we could embed the interactive diagram in our website. If we are unsure about licensing and reusability of that content, I can reach out to a couple of folks at AIIA.

@StefanoFioravanzo
Member

I can see us doing something similar to this interactive version https://ai-infrastructure-alliance.github.io/blueprints/interactive-stack-diagram/stack.html where each option is one of the Kubeflow components. So you can see how the entire Kubeflow platform (we can have an "all" picker) covers the E2E ML lifecycle, or how your a-la-carte choice of components does.

@andreyvelich andreyvelich changed the title from "Update Kubeflow ML Lifecycle" to "Implement a Reusable E2E Kubeflow ML lifecycle" on May 6, 2024
@andreyvelich
Member Author

rename this PR to better represent what we are doing (e.g. Implement a reusable E2E ML lifecycle diagram or something like that)

That makes sense, renamed it.

@andreyvelich andreyvelich changed the title from "Implement a Reusable E2E Kubeflow ML lifecycle" to "Implement a Reusable E2E Kubeflow ML Lifecycle" on May 6, 2024
@andreyvelich
Member Author

If you want to keep the focus smaller and have a quicker iteration on the existing diagram, I am fine with it and you can ignore the two points above.

To be honest, I have concerns with the existing diagram, since it was created ~5 years ago and is very out of date. E.g. it doesn't include model fine-tuning, which is the modern approach for model development, and it doesn't have an online feature store. WDYT @StefanoFioravanzo @franciscojavierarceo?

@andreyvelich
Member Author

a very good open source diagram that we can reuse is this one by the AI Infrastructure Alliance. See here https://github.com/ai-infrastructure-alliance/blueprints

I like their diagrams, but they look similar to what we have in this PR, don't they?

E.g. the differences:

  • We simplify data sources for Data ingestion with Spark.
  • We don't introduce lakehouse concepts for Data Lakes.
  • We don't have model monitoring in serving to re-train the model in production.

Maybe we can improve our diagram with additional stages?
WDYT @franciscojavierarceo @StefanoFioravanzo

@franciscojavierarceo
Contributor

I can see us doing something similar to this interactive version https://ai-infrastructure-alliance.github.io/blueprints/interactive-stack-diagram/stack.html where each option is one of the Kubeflow components. So you can see how the entire Kubeflow platform (we can have a "all" picker) covers the E2E ML lifecycle or based on your a-la-carte choice

I agree the old diagram is outdated.

I much prefer a diagram that reflects the view of a Data Scientist and the needs of their workflow, which the diagram you proposed does. I think the AI Infrastructure Alliance diagram presents things in a way that highlights the needs of different companies with different structures and, while that's helpful, I don't think it brings clarity on the value of Kubeflow.

@chasecadet

@StefanoFioravanzo finally getting to this! Before I say too much, I'd like to take a step back because, as we allll know, "tactics without vision is just noise before defeat". I like the idea of an ML diagram. I would love to know what our vision for these documents is and how we are approaching this: someone reads the diagram, learns X, then starts building using Y and delivers Z value to their project/org.

Allow me to free associate here a bit on what I think would be interesting. I like the idea of talking about use cases for specific components, but I struggle with the idea of telling users what to do. I want to help them envision using these tools and enable them to solve problems creatively. Another way to say this is that I would love it if users told us what they use these components for, in collaboration with our vision for these components. We as a community can provide guidance. If we act as a ground-truth authority on use cases, we might lose out on the value of new community members using the tools in powerful but unexpected ways that we can later integrate into more robust use cases.

Questions I'd love to have answers to are:

  • What are the common use cases?
  • What are some considerations?
  • What pitfalls do we see?
  • How might we run into issues using these solutions in ways not intended?

We can touch on, say, trying to use KFP without a training operator to run an XGBoost job vs. using and integrating the training operator, to show that you "can" do things in MANY ways but may lose out on overall value by trying to redo our engineering efforts through your own means.

That being said, stands on soap box
I love calling out the model development lifecycle according to this community and placing components within that lifecycle as suggestions. Some are more concrete than others (you can't use KServe to train a model), but it also shows that we have a flexible, composable, and integrated solution you can port anywhere to run MLOps at scale. I think @jbottum said it very well: the power of KF is more than just our components, it is the community. As we grow, we benefit from continuing to demonstrate the tribal community knowledge we are building and sharing with the world, so teams can "Go with the Kubeflow" knowing they are part of a community that is writing code with a purpose, using learnings from many orgs, communities, and perspectives to build a world-class MLOps solution that vastly democratizes access to ML/AI across the industry. Showing others "What's in it for them" with KF will bring them into the community, keep it healthy, and fuel the next generation of contributors as we go from incubation to graduation and beyond. hops off soap box

Maybe I missed the point of the CC. I also have a chapter on the model dev lifecycle in that class I built. I officially own the content, and we can use it however we see fit to create some MLOps-like documents.

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from andreyvelich. For more information see the Kubernetes Code Review Process.


@andreyvelich
Member Author

@StefanoFioravanzo @franciscojavierarceo I've made a few updates to the lifecycle diagram based on the feedback.
Does it look good to you?
I think we can merge this PR before the Kubeflow 1.9 release.

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
@franciscojavierarceo
Contributor

@StefanoFioravanzo @franciscojavierarceo I've made a few updates to the lifecycle diagram based on the feedback. Does it look good to you ? I think, we can merge this PR before Kubeflow 1.9 release.

Looks great!


4 participants