Introduce ADR for CodeFlare operator redesign #9

astefanutti · 2023-08-30T15:51:30Z

No description provided.

KPostOffice

Thanks for the ADR. It looks good. Will there be an option in the configuration to have InstaScale disabled?

PCF-ADR-0007-operator-redesign.md

astefanutti · 2023-08-31T09:17:54Z

> Thanks for the ADR. It looks good. Will there be an option in the configuration to have InstaScale disabled?

Good catch, thanks! I completely forgot I wanted to detail that. Added with 4fea0e6. PTAL.
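As a minimal sketch of what such an option could look like, the operator could start the InstaScale controller only when a flag is set. The type names, field names, and controller names below are hypothetical and are not taken from the ADR or from commit 4fea0e6:

```go
package main

import "fmt"

// InstaScaleConfig is a hypothetical configuration section for the
// InstaScale controller. The real field names may differ.
type InstaScaleConfig struct {
	// Enabled controls whether the InstaScale controller is started.
	Enabled bool
}

// OperatorConfig is a hypothetical top-level operator configuration.
type OperatorConfig struct {
	InstaScale InstaScaleConfig
}

// EnabledControllers returns the names of the controllers the operator
// would start for the given configuration. In this sketch MCAD is always
// started, and InstaScale is opt-in.
func EnabledControllers(cfg OperatorConfig) []string {
	controllers := []string{"mcad"}
	if cfg.InstaScale.Enabled {
		controllers = append(controllers, "instascale")
	}
	return controllers
}

func main() {
	// With the zero-value configuration, InstaScale stays disabled.
	fmt.Println(EnabledControllers(OperatorConfig{}))

	// Explicitly enabling InstaScale adds its controller.
	cfg := OperatorConfig{InstaScale: InstaScaleConfig{Enabled: true}}
	fmt.Println(EnabledControllers(cfg))
}
```

Making the flag default to false in this sketch is an assumption; the actual default is whatever the ADR specifies.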

PCF-ADR-0007-operator-redesign.md

sutaakar · 2023-09-05T12:59:40Z

/lgtm

PCF-ADR-0007-operator-redesign.md

dimakis

In its current guise and for its purpose, this LGTM.

I do have a question though: how much complexity do you think would be involved in abstracting out the queuing/quota/scaling components and having a more modular design?

I think this would make the versioning of the subcomponents less of a hassle, but it may also mean that InstaScale could potentially be swapped out for the kube autoscaler -- with any cloud provider flavour -- if the stack were to be deployed in a vanilla kube env.

It may also mean that queuing etc. could potentially be swapped out for a component of the user's choice.

I feel like this would be extremely expensive from an engineering perspective, but seeing as @asm582 has plans to make MCAD more modular and to introduce features such as the renewal energy feature, it may be an idea to have this abstraction at the CFO as opposed to MCAD?

asm582 · 2023-09-06T19:04:53Z

/lgtm

asm582 · 2023-09-06T19:05:01Z

/approve

astefanutti · 2023-09-07T11:11:32Z

> In its current guise and for its purpose, this LGTM.
>
> I do have a question though: how much complexity do you think would be involved in abstracting out the queuing/quota/scaling components and having a more modular design?
>
> I think this would make the versioning of the subcomponents less of a hassle, but it may also mean that InstaScale could potentially be swapped out for the kube autoscaler -- with any cloud provider flavour -- if the stack were to be deployed in a vanilla kube env.
>
> It may also mean that queuing etc. could potentially be swapped out for a component of the user's choice.
>
> I feel like this would be extremely expensive from an engineering perspective, but seeing as @asm582 has plans to make MCAD more modular and to introduce features such as the renewal energy feature, it may be an idea to have this abstraction at the CFO as opposed to MCAD?

To answer these questions strictly within the scope of this ADR: one of its goals is to delegate the installation concern to the underlying platform, such as OpenDataHub. As a result, the aspects of modularity and polymorphism (the ability to swap one component for another) that intersect with installation will have to be implemented at the level of that platform. While this ADR proposal won't preclude some form of modularity and polymorphism, it'll lower the threshold of the module boundaries beyond which a change has to touch the installation, and hence be implemented by the platform rather than by the CodeFlare operator. Concretely, it'll still be possible to modularise controllers such as InstaScale or the quota manager, or finer-grained modules such as MCAD or backoff strategies, but swapping entire components, like Kueue instead of MCAD or the Kubernetes cluster autoscaler instead of InstaScale, will likely be best achieved at the platform level.
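To illustrate the kind of finer-grained modularity that could stay inside the operator, here is a minimal sketch of a swappable backoff strategy behind an interface. All names (`BackoffStrategy`, `ConstantBackoff`, `ExponentialBackoff`, `NewBackoff`) and the durations are hypothetical and are not part of MCAD or the CodeFlare operator:

```go
package main

import (
	"fmt"
	"time"
)

// BackoffStrategy is a hypothetical module boundary inside the operator:
// swapping implementations does not touch installation.
type BackoffStrategy interface {
	// Delay returns how long to wait before the given retry attempt
	// (attempt 0 is the first retry).
	Delay(attempt int) time.Duration
}

// ConstantBackoff always waits the same amount of time.
type ConstantBackoff struct{ Base time.Duration }

func (b ConstantBackoff) Delay(attempt int) time.Duration { return b.Base }

// ExponentialBackoff doubles the delay on each attempt, capped at Max.
type ExponentialBackoff struct{ Base, Max time.Duration }

func (b ExponentialBackoff) Delay(attempt int) time.Duration {
	d := b.Base
	for i := 0; i < attempt && d < b.Max; i++ {
		d *= 2
	}
	if d > b.Max {
		d = b.Max
	}
	return d
}

// NewBackoff selects a strategy by name, e.g. from operator configuration.
func NewBackoff(name string) (BackoffStrategy, error) {
	switch name {
	case "constant":
		return ConstantBackoff{Base: time.Second}, nil
	case "exponential":
		return ExponentialBackoff{Base: time.Second, Max: 30 * time.Second}, nil
	default:
		return nil, fmt.Errorf("unknown backoff strategy %q", name)
	}
}

func main() {
	strategy, err := NewBackoff("exponential")
	if err != nil {
		panic(err)
	}
	for attempt := 0; attempt < 4; attempt++ {
		fmt.Println(attempt, strategy.Delay(attempt))
	}
}
```

Swapping a whole component, such as Kueue for MCAD, would not fit behind an interface like this, because it changes what gets installed; that is the part left to the platform.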

To answer these questions beyond the scope of this ADR, I think they raise the fundamental question of what the value-add of the CodeFlare stack is. Modularity and polymorphism are technicalities that software engineers are quick to introduce, while end users are generally more interested in getting their job done, out of the box, than in figuring out which job scheduler or autoscaler they should use, or even in being aware of them. From that latter, end-user standpoint, I'd expect CodeFlare to provide an opinionated stack of best-in-class components that enables users to get their job done as easily as possible, out of the box. So if the only value behind supporting multiple components, and the ability to swap them, is to mitigate gaps in existing components, it may be a better option to make these components best-in-class, or to pick the ones that already are.

dimakis · 2023-09-07T11:38:01Z

Thanks very much for your detailed reply @astefanutti

> In its current guise and for its purpose, this LGTM.
> I do have a question though: how much complexity do you think would be involved in abstracting out the queuing/quota/scaling components and having a more modular design?
> I think this would make the versioning of the subcomponents less of a hassle, but it may also mean that InstaScale could potentially be swapped out for the kube autoscaler -- with any cloud provider flavour -- if the stack were to be deployed in a vanilla kube env.
> It may also mean that queuing etc. could potentially be swapped out for a component of the user's choice.
> I feel like this would be extremely expensive from an engineering perspective, but seeing as @asm582 has plans to make MCAD more modular and to introduce features such as the renewal energy feature, it may be an idea to have this abstraction at the CFO as opposed to MCAD?

> To answer these questions strictly within the scope of this ADR: one of its goals is to delegate the installation concern to the underlying platform, such as OpenDataHub. As a result, the aspects of modularity and polymorphism (the ability to swap one component for another) that intersect with installation will have to be implemented at the level of that platform. While this ADR proposal won't preclude some form of modularity and polymorphism, it'll lower the threshold of the module boundaries beyond which a change has to touch the installation, and hence be implemented by the platform rather than by the CodeFlare operator. Concretely, it'll still be possible to modularise controllers such as InstaScale or the quota manager, or finer-grained modules such as MCAD or backoff strategies, but swapping entire components, like Kueue instead of MCAD or the Kubernetes cluster autoscaler instead of InstaScale, will likely be best achieved at the platform level.

This is likely the right call.

> To answer these questions beyond the scope of this ADR, I think they raise the fundamental question of what the value-add of the CodeFlare stack is. Modularity and polymorphism are technicalities that software engineers are quick to introduce, while end users are generally more interested in getting their job done, out of the box, than in figuring out which job scheduler or autoscaler they should use, or even in being aware of them. From that latter, end-user standpoint, I'd expect CodeFlare to provide an opinionated stack of best-in-class components that enables users to get their job done as easily as possible, out of the box. So if the only value behind supporting multiple components, and the ability to swap them, is to mitigate gaps in existing components, it may be a better option to make these components best-in-class, or to pick the ones that already are.

My thoughts were not necessarily with the end user in mind, as to my mind the installation of the stack and its components is done by a devops/platform team or similar, not by the end user. Offering them the most customisable option may help in gaining adoption, as we could target a larger number of different environments. That, at least, was my train of thought here. Nonetheless, I see that this may be better done elsewhere. I just wanted to explore all options and gain your insight.

I'm happy with the direction this takes the CodeFlare stack.

dimakis · 2023-09-07T11:38:16Z

/approve

KPostOffice reviewed Aug 30, 2023

PCF-ADR-0007-operator-redesign.md (outdated)

anishasthana reviewed Aug 31, 2023

PCF-ADR-0007-operator-redesign.md (outdated)

PCF-ADR-0007-operator-redesign.md

astefanutti added 2 commits August 31, 2023 11:00

Introduce ADR for CodeFlare operator redesign

a0265cc

Add a config to enable InstaScale controller

4fea0e6

astefanutti force-pushed the pr-operator-refactor branch from 411b85a to 4fea0e6 Compare August 31, 2023 09:13

Remove mention to package config as OLM additional resource

6f24d97

Document existing users migration

a593348

anishasthana reviewed Aug 31, 2023

PCF-ADR-0007-operator-redesign.md

sutaakar reviewed Sep 5, 2023

PCF-ADR-0007-operator-redesign.md

asm582 reviewed Sep 6, 2023

PCF-ADR-0007-operator-redesign.md

astefanutti mentioned this pull request Sep 6, 2023

Add documentation on installing custom images opendatahub-io/distributed-workloads#67

Closed

3 tasks

dimakis reviewed Sep 6, 2023

anishasthana approved these changes Sep 7, 2023

anishasthana merged commit 36041f8 into project-codeflare:main Sep 7, 2023

astefanutti deleted the pr-operator-refactor branch September 8, 2023 07:32
