Implement a Reusable E2E Kubeflow ML Lifecycle #3728

andreyvelich · 2024-05-02T15:04:16Z

Based on our recent discussion with @franciscojavierarceo I updated the ML lifecycle diagram in the architecture guides: #3719 (comment)
We can re-use this ML lifecycle diagram in each Kubeflow Component and explain the user value of that component.

I like the existing diagrams, but they are a little bit out of date.
I am happy to improve my diagrams based on your feedback.

Also, I removed unused images.

/assign @franciscojavierarceo @kubeflow/kubeflow-steering-committee @thesuperzapper @StefanoFioravanzo @hbelmiro

/hold for review

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

franciscojavierarceo · 2024-05-02T15:23:54Z

content/en/docs/started/images/ml-lifecycle-kubeflow.drawio.svg

Looks like a typo, should be Data Producers.

Nice catch! @franciscojavierarceo I am wondering, should we add the Data Producers to the Offline Feature store as well?
E.g. Spark ingests data from Data Producers and extracts features.

It may also be useful to add Feature Extraction to the Offline Store to make concrete how the offline store is used.

Nice catch! @franciscojavierarceo I am wondering, should we add the Data Producers to the Offline Feature store as well?

Yeah, I think that's a great idea! That can get complicated if we get specific, but I think if we stay generic and create a box like we do for the online store, that works fine.

StefanoFioravanzo · 2024-05-03T10:42:40Z

@andreyvelich What exactly are you trying to accomplish? I didn't fully get this part

We can re-use this ML lifecycle diagram in each Kubeflow Component and explain the user value of that component.

Is this diagram supposed to be re-used by each component, and if so, how do you envision that?

andreyvelich · 2024-05-03T15:23:47Z

@andreyvelich What exactly are you trying to accomplish? I didn't fully get this part

We can re-use this ML lifecycle diagram in each Kubeflow Component and explain the user value of that component.

Is this diagram supposed to be re-used by each component, and if so, how do you envision that?

That's right. Please check these examples:

We can do the same for Model Registry, Spark Operator, Notebooks if other WGs agree with that.

What do you think about it, @StefanoFioravanzo?

StefanoFioravanzo · 2024-05-05T08:52:36Z

Oh, OK, now I understand your approach, and I like it. You are proposing we build a canonical Kubeflow ML lifecycle diagram and then highlight which parts of the diagram each component covers.

So, based on this, I propose two things:

rename this PR to better represent what we are doing (e.g. Implement a reusable E2E ML lifecycle diagram or something like that)
Consider using and adapting an existing diagram. There are many open source E2E ML lifecycle diagrams that are widely used and promoted by large organizations. One option is to overlay Kubeflow and Kubeflow components on top of one of these.

If you want to keep the focus smaller and have a quicker iteration on the existing diagram, I am fine with it and you can ignore the two points above.

StefanoFioravanzo · 2024-05-05T08:54:57Z

cc @chasecadet can probably provide some good insight on this

StefanoFioravanzo · 2024-05-05T09:10:38Z

@andreyvelich a very good open source diagram that we can reuse is this one by the AI Infrastructure Alliance. See here https://github.com/ai-infrastructure-alliance/blueprints

There is no explicit license, but they do write in the README:

Please retain the AIIA Logo on the diagrams when you use them, otherwise you are free to modify them in any way you see fit.

I think this would be a pretty good starting point for a reusable diagram. They have an editable figma file, and even an interactive version. Take a look at all the folders, there's various versions.

We could fork the repository under the Kubeflow org and adapt it to the various components. If we want, we could embed the interactive diagram in our website. If we are unsure about licensing and reusability of that content, I can reach out to a couple of folks at AIIA.

StefanoFioravanzo · 2024-05-05T09:13:15Z

I can see us doing something similar to this interactive version https://ai-infrastructure-alliance.github.io/blueprints/interactive-stack-diagram/stack.html where each option is one of the Kubeflow components. So you can see how the entire Kubeflow platform (we can have an "all" picker) covers the E2E ML lifecycle, or how your a-la-carte choice of components does.

andreyvelich · 2024-05-06T16:11:33Z

rename this PR to better represent what we are doing (e.g. Implement a reusable E2E ML lifecycle diagram or something like that)

That makes sense, renamed it.

andreyvelich · 2024-05-06T16:14:11Z

If you want to keep the focus smaller and have a quicker iteration on the existing diagram, I am fine with it and you can ignore the two points above.

To be honest, I have concerns with the existing diagram, since it was created ~5 years ago and is very out of date. E.g. it doesn't include model fine-tuning, which is the modern approach to model development, and it doesn't have an online feature store. WDYT @StefanoFioravanzo @franciscojavierarceo?

andreyvelich · 2024-05-06T16:23:20Z

a very good open source diagram that we can reuse is this one by the AI Infrastructure Alliance. See here https://github.com/ai-infrastructure-alliance/blueprints

I like their diagrams, but they look similar to what we have in this PR, don't they?

E.g. the differences:

We simplify data sources for Data ingestion with Spark.
We don't introduce lakehouse concepts for Data Lakes.
We don't have model monitoring in serving to re-train the model in production.

Maybe we can improve our diagram with additional stages?
WDYT @franciscojavierarceo @StefanoFioravanzo

franciscojavierarceo · 2024-05-06T16:23:32Z

I can see us doing something similar to this interactive version https://ai-infrastructure-alliance.github.io/blueprints/interactive-stack-diagram/stack.html where each option is one of the Kubeflow components. So you can see how the entire Kubeflow platform (we can have a "all" picker) covers the E2E ML lifecycle or based on your a-la-carte choice

I agree the old diagram is outdated.

I much prefer a diagram that reflects the view of a Data Scientist and the needs of their workflow, which the diagram you proposed does. The AI Infrastructure Alliance diagram, I think, emphasizes the needs of different companies with different structures and, while that's helpful, I don't think it brings clarity on the value of Kubeflow.

chasecadet · 2024-05-07T14:09:23Z

@StefanoFioravanzo finally getting to this! Before I say too much I'd like to take a step back because as we allll know "tactics without vision is just noise before defeat". I like the idea of an ML diagram. I would love to know what our vision for these documents is and how we are approaching this: someone reads the diagram, learns X, then starts building using Y and delivers Z value to their project/org.

Allow me to free associate here a bit on what I think would be interesting. I like the idea of talking about use cases for specific components, but I struggle with the idea of telling users what to do. I want to help them envision using these tools and enable them to solve problems creatively. Another way to say this: I would love it if users told us what they use these components for, in collaboration with our vision for these components. We as a community can provide guidance. If we act as a ground-truth authority on use cases, we might lose out on the value of new community members using the tools in powerful but unexpected ways that we can later integrate into more robust use cases.

Questions I'd love to have answers to are:

What are the common use cases?
What are some considerations?
What pitfalls do we see?
How might we run into issues using these solutions in ways not intended?

We can touch on, say, trying to use KFP without the Training Operator to run an XGBoost job vs. integrating the Training Operator, to show that you "can" do things in MANY ways but may lose out on overall value by redoing our engineering efforts through your own means.

That being said, stands on soap box
I love calling out the model development lifecycle according to this community and placing components within that lifecycle as suggestions. Some are more concrete than others (you can't use KServe to train a model), but it also shows that we have a flexible, composable, and integrated solution you can port anywhere to run MLOps at scale. I think @jbottum said it very well: the power of KF is more than just our components, it is the community. As we grow, we benefit from continuing to demonstrate the tribal community knowledge we are building and sharing with the world, so teams can "Go with the Kubeflow" knowing they are part of a community that is writing code with a purpose, using learnings from many orgs, communities, and perspectives to build a world-class MLOps solution that vastly democratizes access to ML/AI across the industry. Showing others "What's in it for them" when using KF will bring them into the community, ensure it stays healthy, and fuel the next generation of contributors as we go from incubation to graduation and beyond. hops off soap box

Maybe I missed the point of the CC. I also have a chapter in that class I built on the model dev lifecycle. I officially own the content, and we can use it however we see fit to create some MLOps-style documents.

google-oss-prow · 2024-05-23T15:48:58Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from andreyvelich. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

andreyvelich · 2024-05-23T15:49:54Z

@StefanoFioravanzo @franciscojavierarceo I've made a few updates to the lifecycle diagram based on the feedback.
Does it look good to you?
I think we can merge this PR before the Kubeflow 1.9 release.

franciscojavierarceo · 2024-05-23T16:58:42Z

@StefanoFioravanzo @franciscojavierarceo I've made a few updates to the lifecycle diagram based on the feedback. Does it look good to you? I think we can merge this PR before the Kubeflow 1.9 release.

Looks great!

andreyvelich added 2 commits May 2, 2024 15:53

Update Kubeflow ML Lifecycle

fa4b681

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

Delete unused images

4e03a73

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

google-oss-prow bot added the do-not-merge/hold label May 2, 2024

google-oss-prow bot requested review from alfsuse, cspavlou, johnugeorge and PatrickXYS May 2, 2024 15:04

google-oss-prow bot added the approved and size/L labels May 2, 2024

Fix Kubeflow ML Lifecycle image

9c00af0

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

franciscojavierarceo reviewed May 2, 2024

andreyvelich added 2 commits May 2, 2024 16:34

Fix Data Producers

60899db

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

Fix Data Producers in all diagrams

9fa9609

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

andreyvelich mentioned this pull request May 3, 2024

Update Kubeflow Installation with Standalone Mode #3724

Open

andreyvelich changed the title ~~Update Kubeflow ML Lifecycle~~ Implement a Reusable E2E Kubeflow ML lifecycle May 6, 2024

andreyvelich changed the title ~~Implement a Reusable E2E Kubeflow ML lifecycle~~ Implement a Reusable E2E Kubeflow ML Lifecycle May 6, 2024

andreyvelich mentioned this pull request May 10, 2024

add Model Registry doc to website #3698

Merged

4 tasks

Update Lifecycle Diagrams

02dae92

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

google-oss-prow bot removed the approved label May 23, 2024

Fix diagram

9421897

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
