docs: RFC Supporting ODCR in Karpenter #5716
Conversation
Nice doc! Some high-level comments
designs/odcr.md
Outdated
_We are keeping the scope of this design very targeted so even if these could be things we eventually support, we aren't scoping them into this design_
- Supporting prioritization when launching nodes
- Supporting any pre-calculation of ODCR capacity utilization before node launch
Why do we think that pre-calculation should be out of scope here? How do we handle the case where we try to launch into a capacity reservation that doesn't have any available instance types and it fails?
I think it would be quite complicated to implement. Because checking whether capacity exists and then launching against it can't be done atomically, there is no guarantee that we won't run into capacity issues. I felt that allowing this to fail, and falling back to being unable to launch additional capacity from this node pool, would be reasonable and would push users toward using weights on different node pools to pick up any insufficient-capacity issues.
Sure, I just wonder if we can do this optimistically, with ICE-ing as a back-stop. I worry that if we don't pre-check, we'll just keep slamming the API every 3m because we think that the NodePool still has instance type availability.
Mm, that makes sense. I think it's reasonable to call the API to check. Do you think we should cache this as part of the existing instance type offering cache?
designs/odcr.md
Outdated
## Non-Goals

_We are keeping the scope of this design very targeted so even if these could be things we eventually support, we aren't scoping them into this design_
- Supporting prioritization when launching nodes
Doesn't Karpenter already support this through its NodePool "weight" concept?
I meant within a single node pool. I can update
designs/odcr.md
Outdated
Cons:
- This implementation is overly restrictive, causing it to be difficult to use. It's possible that the restrictions make it too hard for users to use EC2NodeClasses effectively.

Both a pro and a con:
In general, I think this is a good thing. When it comes to fallback and prioritization, I think this makes sense. You have a set of instance types that you prioritize for usage with the ODCR and then you can create a lower-weighted NodePool that has a broader set of instance types or perhaps uses the same instance type that you had for the ODCR but with spot and on-demand to try to get that type again just like from a normal request
Agreed
#### Labels

When a node is launched against a CapacityReservation we will expose Capacity Reservation
Anytime we are thinking about setting a label, we have to think if a user would select against it. Are you anticipating that users would select against the capacity reservation ID. I could perhaps see that, but I think that the reservation setting is significantly less valuable. Were you thinking of just having this there for observability information?
In either case, I think that we should call out how we expect users to use these new labels if we are going to propose them here.
Agree on the setting; I think ID is much more helpful than setting.
If you have a dedicated NodePool for an ODCR or set of ODCRs, it does seem like you would just want to label those nodes and have a node selector on that instead of the ODCR ID.
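A minimal sketch of that approach, assuming a hypothetical custom label (the label key and NodePool name are illustrative, not part of Karpenter's API): apply a plain label to the ODCR NodePool's nodes via its template, and have workloads select on that label rather than on the ODCR ID.

```yaml
# Illustrative only: label the ODCR-backed NodePool's nodes...
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: odcr-pool            # hypothetical name
spec:
  template:
    metadata:
      labels:
        example.com/odcr-pool: "true"   # hypothetical label key
---
# ...and select on that label from the workload, not the ODCR ID.
apiVersion: v1
kind: Pod
metadata:
  name: reserved-workload
spec:
  nodeSelector:
    example.com/odcr-pool: "true"
  containers:
    - name: app
      image: public.ecr.aws/docker/library/busybox:latest
```

This keeps the workload decoupled from which specific reservation (or resource group member) actually backed the node.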
How do I track that my nodeclaim is part of a targeted capacity reservation that no longer exists? If I have another targeted capacity reservation that has available capacity, I need to terminate my nodeclaim and bring up a new one on the unused capacity reservation.
designs/odcr.md
Outdated
#### Status

We will expose this information in the NodeClaim's, EC2NodeClass's and NodePool's Condition Status. _I know that currently NodeClasses and NodePool statuses do not have conditions, but I wanted to see if we are open to adding Conditions to these resources_
Yes, absolutely open to adding conditions, see: kubernetes-sigs/karpenter#493
designs/odcr.md
Outdated
```yaml
Status:
  Conditions:
    Last Transition Time: 2024-02-22T03:46:42Z
    Status: LimitExceeded
```
Do you think that we need this status condition if Karpenter is aware of the available capacity and it can just know when to fallback to other NodePools since it ran out of capacity?
I think so. From an operator's perspective, it would be good to know that this nodeclass has hit its limit and we are going to fall back to other node pools.
Co-authored-by: Jonathan Innis <jonathan.innis.ji@gmail.com>
Sorry for the slowdown. I will try to get back to this PR, but at the moment I got reshuffled at work so I need some time to get back to this.
designs/odcr.md
Outdated
```yaml
capacityReservationSpec:
  capacityReservationPreference: open | none | None # Cannot be defined if capacityReservationTarget is specified
  capacityReservationTarget: # Cannot be defined if capacityReservationPreference is specified
    capacityReservationId: String | None
    capacityReservationResourceGroupArn: String | None
```
Can we keep this similar to the existing design of how we find objects? It seems using selectorTerms is the current design (e.g. https://github.com/aws/karpenter-provider-aws/blob/main/pkg/apis/crds/karpenter.k8s.aws_ec2nodeclasses.yaml#L60-L102).
Suggested change:
```diff
- capacityReservationSpec:
-   capacityReservationPreference: open | none | None # Cannot be defined if capacityReservationTarget is specified
-   capacityReservationTarget: # Cannot be defined if capacityReservationPreference is specified
-     capacityReservationId: String | None
-     capacityReservationResourceGroupArn: String | None
+ capacityReservationSelectorTerms:
+   - tags:
+       karpenter.sh/discovery: "${CLUSTER_NAME}"
+       environment: test
+   - id: cr-1234567 # arn
+   - preference: open | none | None
+ capacityReservationResourceGroupSelectorTerms:
+   - tags:
+       karpenter.sh/discovery: "${CLUSTER_NAME}"
+       environment: test
+   - id: arn:aws:resource-groups:us-west-2:123456789012:group/my-resource-group
+   - preference: open | none | None
```
I'm skeptical of the need to specify multiple ODCRs that we can select on in a single EC2NodeClass. What are you imagining the use-case is for wanting to specify multiple ODCRs in an EC2NodeClass? The only one I can think of is multi-AZ support. Other than that -- I figure you get an ODCR because you have a particular type of capacity that you are planning to use for your workload and you want to target that specifically. Do we have an example of how a pod would target this capacity? Would we imagine that these pods target instance types with preferences and then fall back?
The use of multiple ODCRs is because additional capacity is (usually) allocated via a new ODCR. So if you are growing your capacity, you are doing so by adding additional ODCRs. This is why it's recommended to use the resource group, which acts as the container: as new capacity is allocated via additional ODCRs, those are associated to the resource group, which expands the pool of available capacity.
The use of the resource group helps to decouple the act of adding capacity from the consumers of the capacity (i.e. I don't need to keep updating my manifests when capacity is added). This might be less of a concern for Karpenter, but it can be quite disruptive and harder to consume with node groups that are utilizing launch templates.
designs/odcr.md
Outdated
The main failure scenario is when the Capacity Reservation limit is hit and no new nodes can be launched from any Capacity Reservation the launch template targets.

#### Status
For this, we need documentation stating that we will update the status of the EC2NodeClass for the found capacity reservations / resource groups, similar to how we do it for AMIs or Subnets.
designs/odcr.md
Outdated
Pros:
- This puts the onus of checking on the user to verify compatibility of their NodeClasses and requirements. It makes no assumption about what a Capacity Reservation can or can't do (open to checking if Capacity is at limit prior to calling CreateFleet).
- This forces users who wish to leverage fallback and prioritization to use NodePool weights or Capacity Reservation Groups rather than relying on EC2NodeClass.
This only works for Capacity Reservations. As far as I understand, Capacity Reservation Resource Groups have an automatic fallback to on-demand? For the initial design, are we for that reason only scoping in Capacity Reservations?
designs/odcr.md
Outdated
## Background

In AWS [ODCR](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capacity-reservations.html) allows users to reserve compute capacity to mitigate against the risk of being
Suggested change:
```diff
- In AWS [ODCR](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capacity-reservations.html) allows users to reserve compute capacity to mitigate against the risk of being
+ In AWS [ODCR](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capacity-reservations.html) allows users to reserve compute capacity to mitigate the risk of
```
designs/odcr.md
Outdated
## Background

In AWS [ODCR](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capacity-reservations.html) allows users to reserve compute capacity to mitigate against the risk of being
unabled to get on demand capacity. This is very helpful during seasonal holidays where higher traffic is expected.
Suggested change:
```diff
- unabled to get on demand capacity. This is very helpful during seasonal holidays where higher traffic is expected.
+ getting on-demand capacity. This is very helpful during seasonal holidays where higher traffic is expected.
```
designs/odcr.md
Outdated
## Background

In AWS [ODCR](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capacity-reservations.html) allows users to reserve compute capacity to mitigate against the risk of being
unabled to get on demand capacity. This is very helpful during seasonal holidays where higher traffic is expected.
Suggested change:
```diff
- unabled to get on demand capacity. This is very helpful during seasonal holidays where higher traffic is expected.
+ unabled to get on demand capacity. This is very helpful during seasonal holidays where higher traffic is expected or for reserving highly-desired instance types, like the `p5.48xlarge` or other large GPU instance types.
```
AWS also supports grouping Capacity Reservations into Capacity Reservation groups.
Both these entities are supported in Launch Template's CapacityReservationTarget [definitions](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ec2-launchtemplate-capacityreservationtarget.html).

## Goals
Does it make sense to lay out a use-cases section here for the need and usage of ODCRs? Knowing these use-cases will help set the context and also help us think about how we should design the solution
One additional goal I might (strongly) suggest - the ability to define "fairness" in some form when an ODCR is shared amongst multiple nodepools/clusters. The scenario:
- An ODCR for 500 instances is created
- That ODCR is shared across x-number of AWS accounts (say 5)
- Each account contains a cluster that should be entitled to 1/x (roughly) percent of the reservation at minimum (each cluster at minimum is guaranteed access to launch 100 instances)
The most basic form could be simply providing a max/ceiling on the number of instances a nodepool can launch from a reservation and let users set that value.
This is used in conjunction with the resource group - as more capacity is needed, more ODCRs are created and added into the "pool" of capacity, but that "pool" of capacity is shared amongst multiple Karpenter nodepools across clusters/accounts
Can't you already do this by having an ODCR NodePool with max vCPUs configured?
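For reference, the vCPU-cap workaround mentioned here can be expressed with Karpenter's existing NodePool `spec.limits` field. A rough illustration (names and numbers are made up; the vCPU limit is a proxy for an instance count, not a true per-reservation quota):

```yaml
# Illustrative sketch: cap a NodePool's share of a shared ODCR pool via limits.
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: odcr-pool              # hypothetical name
spec:
  limits:
    cpu: "6400"                # e.g. ~100 instances x 64 vCPUs each
  template:
    spec:
      nodeClassRef:
        name: odcr-nodeclass   # hypothetical EC2NodeClass targeting the ODCR
```

As noted later in the thread, this proxy breaks down for GPU workloads where vCPU count varies widely across instance families.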
> One additional goal I might (strongly) suggest - the ability to define "fairness" in some form when an ODCR is shared amongst multiple nodepools/clusters. The scenario:
> - An ODCR for 500 instances is created
> - That ODCR is shared across x-number of AWS accounts (say 5)
> - Each account contains a cluster that should be entitled to 1/x (roughly) percent of the reservation at minimum (each cluster at minimum is guaranteed access to launch 100 instances)
>
> The most basic form could be simply providing a max/ceiling on the number of instances a nodepool can launch from a reservation and let users set that value.
> This is used in conjunction with the resource group - as more capacity is needed, more ODCRs are created and added into the "pool" of capacity, but that "pool" of capacity is shared amongst multiple Karpenter nodepools across clusters/accounts
What you are describing is outside of the current capabilities of AWS itself, once those capabilities are available we can update or create another design document to incorporate this
> Can't you already do this by having an ODCR NodePool with max vCPUs configured?

Agreed, seems like a quick win, but it is management overhead for now.
> What you are describing is outside of the current capabilities of AWS itself

Which capability(s) exactly? You can share ODCRs across accounts.

> Can't you already do this by having an ODCR NodePool with max vCPUs configured?

In theory, yes - but for most of the workloads that are utilizing GPUs via ODCRs, you aren't using vCPU as a proxy for your GPU allotment. p5s have double the vCPU capacity of p4s - so not a great proxy value.
Not saying every use case has to be considered and implemented on the first go - but it seems we are missing the mark on ODCRs and how they are intended to be used:
- ODCRs should be added to a resource group and treated like a pool of compute resources. This pool of compute will expand/contract by adding/removing ODCRs
- ODCRs are not always 1:1 to a cluster/account/team (they rarely are 1:1) - again, it's a pool of resources and that pool may need to be divided/shared amongst clusters/accounts/teams
- If you have a fixed pool of available resources, you will inevitably need to consider priority and fairness in terms of resource allocation. On day 1 you may have two teams who each consume half the pool by verbally agreeing to use x number of instances. As workloads grow in size, as is the norm for ML workloads, teams will need more resources and now you need to give the whole pool of resources to one team at a time but still ensure teams have equal time with the pool of resources

This is an RFC so not looking for every configuration or use case to be provided initially, but just want to ensure we are looking at the use cases and workloads and thinking about the implementation holistically - at least ensuring we have considered those common use cases and can support them in the future without too much churn/disruption to the API
Those cases were considered, however we didn't choose to implement them on the first pass for a few reasons:
- Resource groups for capacity reservations make the problem considerably more complicated because of their fallback logic to OD if the Capacity Reservation Group (CRG)'s ODCRs can't satisfy the request. This requires more involved reconciliation, because at this time we assume any ODCR costs zero and has a fixed limit of instances. That isn't to say this won't be supported, but to move this project forward we needed to narrow in on a basic use case before we tackle the harder ones.
- Multi-account usage is supported in the selectors, however you are right that we lack any fairness. Once again, because we didn't get any early feedback from users, we didn't implement any fairness between "shared" ODCRs.
Please join Karpenter working group on June 6th 2PM PST to discuss (https://karpenter.sh/docs/contributing/working-group/)
designs/odcr.md
Outdated
- Support associating ODCR to EC2NodeClass
- Define Karpenter's behavior when launching nodes into Capacity Reservation
- Define Karpenter's behavior when encountering errors when attempting to launch nodes into Capacity Reservation
- Define Karpenter's behavior when consolidating nodes in Capacit Reservation
Suggested change:
```diff
- - Define Karpenter's behavior when consolidating nodes in Capacit Reservation
+ - Define Karpenter's behavior when consolidating nodes in Capacity Reservation
```
designs/odcr.md
Outdated
##### Consolidating Capacity Reserved Instances

During consolidation pricing does matter as it affects which candidate will be [prioritized](https://github.com/kubernetes-sigs/karpenter/blob/75826eb51589e546fffb594bfefa91f3850e6c82/pkg/controllers/disruption/consolidation.go#L156). Since all capacity instances are paid ahead of time, their cost is already incurred. Users would likely want to prioritize filling their capacity reservation first and then fall back to other instances. Because of this, reserved instances should likely show up as zero-dollar pricing when we calculate the instance pricing. Since each candidate is tied to a NodePool and NodeClass, we should be able to safely override the pricing per node under a capacity reservation.
I'd view this as a special type of "offering" in the InstanceTypes returned by the CloudProvider. This is an ODCR offering that has price 0.
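A minimal sketch of this idea, using hypothetical types rather than Karpenter's actual cloudprovider API: model the ODCR as a distinct offering priced at $0 (its cost is already sunk), so any price-based comparison during provisioning or consolidation naturally prefers filling the reservation before spot or on-demand.

```go
package main

import "fmt"

// Offering is a hypothetical stand-in for a cloudprovider instance-type
// offering; the "reserved" capacity type and ReservationID field are
// assumptions for illustration, not Karpenter's real API.
type Offering struct {
	CapacityType  string  // "on-demand", "spot", or a hypothetical "reserved"
	ReservationID string  // set only for reserved offerings
	Price         float64 // reserved offerings are modeled at $0 (sunk cost)
}

// cheapest returns the lowest-priced offering, which is how a price-driven
// scheduler would end up preferring the reservation.
func cheapest(offerings []Offering) Offering {
	best := offerings[0]
	for _, o := range offerings[1:] {
		if o.Price < best.Price {
			best = o
		}
	}
	return best
}

func main() {
	offerings := []Offering{
		{CapacityType: "on-demand", Price: 3.06},
		{CapacityType: "spot", Price: 0.92},
		{CapacityType: "reserved", ReservationID: "cr-1234567", Price: 0},
	}
	fmt.Println(cheapest(offerings).CapacityType) // the $0 reserved offering wins
}
```

The open question in the thread still applies: once the reservation is exhausted, that offering must be removed or marked unavailable so the $0 price doesn't keep attracting launches that will fail.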
Since the cost is already a sunk cost (bought and paid for up front, no savings to be had by consolidating), we still need to consider consolidation from the perspective of shared resources. Meaning, there are other consumers (nodepools) that will potentially want access to these resources if they are available
##### Consolidating into Capacity Reserved Instances

If we track Capacity Reservation usage, we can optimize the cluster configuration by moving non-Capacity Reserved instances into
I think this should just work out-of-the-box if the user has prioritization set-up with weights on their NodePools and we model the pricing correctly. I'd rather just assume that this works and then discuss what the interaction is going to look like.
Based on some of the usage patterns we've seen so far - a form of priority and fairness needs to be considered. We don't want one nodepool, or a few nodepools, consuming the majority of the reservation and leaving other nodepools starved for resources because they are unable to provision instances
What do you mean "move"? Do you mean between the NodePools? Because if the ODCR is open, it will automatically bind even if you didn't target it.
I think this meant multiple node pools have the same targeted ODCR.
#### Labels

When a node is launched against a CapacityReservation we will expose Capacity Reservation
information as labels `karpenter.k8s.aws/capacity-reservation-id`. This is helpful for users to identify those nodes that are being used
Open question: If we're tying one capacity reservation ID to an EC2NodeClass, we may not need this label at all because we can just use something like `karpenter.k8s.aws/ec2nodeclass` to know which CR is being used.
I think this is bringing in a conflicting detail that most users would not need to be aware of:
- If we add this label, then we are creating a 1:1 of capacity reservation to nodepool; this conflicts with the statements above about using resource groups where multiple ODCR IDs are added to the group, and the group is what is used for creating capacity, not the ODCRs directly
- This is not a "characteristic" a workload would typically look for in terms of label selectors. Workloads will look to target things such as OS platform, device manufacturer or model, etc. I would be curious to know the use cases where consumers are aware of the ODCR IDs and why they would use those for selectors over other characteristics provided by the instance details (i.e. - think NFD or GFD)
Is there a 1 ODCR per NodePool restriction? If so, that's not clear to me. What if your CR selector terms match multiple ODCRs?
The label is attached to the NodeClaim and Node, not the NodePool or EC2NodeClass. It's a fact that gets added to the nodeclaim when it was brought up within a capacity reservation.
designs/odcr.md
Outdated
#### Status

We will expose this information in the NodeClaim's, EC2NodeClass's and NodePool's Condition Status. _I know that currently NodeClasses and NodePool statuses do not have conditions, but I wanted to see if we are open to adding Conditions to these resources_
I think we may only need/want to expose this in the EC2NodeClass status condition. On top of that, this gets trickier depending on if we are talking about supporting a single capacity reservation or multiple. Obviously, only supporting one means that we can create a status condition to represent that one, but what does it mean when we have multiple? Do we just document the list of specified capacity reservations that aren't available?
designs/odcr.md
Outdated
We will avoid updating the [unavailableOfferingCache](https://github.com/aws/karpenter-provider-aws/blob/main/pkg/providers/instance/instance.go#L239C41-L239C58) because this pool is different from the rest of AWS. However we may want to create a new unavailable offering cache keyed against Capacity Reservations. _Not sure if we want to support this during the first iteration_

## Open Questions
- The UX of adding Capacity Reservation feels wrong because NodeClasses previously didn't fully restrict instance types, but with Capacity Reservation it kind of does. There isn't a good primitive in Karpenter to expose these kinds of restrictions (I think?), specifically around preflight or static analysis that tells you your NodePool may not be able to launch any nodes because of X, Y and Z in your EC2NodeClass. I believe this is already an issue where, if a NodeClass selects for x86 architecture AMIs but the NodePool allows for ARM architecture instance types, Karpenter may just quietly never spawn ARM instances?
There's a primitive. When you are resolving the GetInstanceTypes() call, you can just tell it not to return back certain instance types based on the EC2NodeClass used. We do have some slight mutation behavior around some EC2NodeClasses specified today already, like when using Windows, this scopes down the supported instance types that we return back.
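To illustrate that primitive, here is a hypothetical sketch (simplified types, not the real `GetInstanceTypes()` signature) of scoping the returned instance types down to what the EC2NodeClass's capacity reservations can actually satisfy, analogous to how Windows node classes already narrow the supported set:

```go
package main

import "fmt"

// InstanceType and CapacityReservation are simplified stand-ins for the
// provider's real types; only the fields needed for matching are shown.
type InstanceType struct {
	Name string
	Zone string
}

type CapacityReservation struct {
	InstanceType string
	Zone         string
}

// filterByReservations keeps only instance type offerings that at least
// one capacity reservation can satisfy (matching type and zone).
func filterByReservations(all []InstanceType, crs []CapacityReservation) []InstanceType {
	allowed := map[string]bool{}
	for _, cr := range crs {
		allowed[cr.InstanceType+"/"+cr.Zone] = true
	}
	var out []InstanceType
	for _, it := range all {
		if allowed[it.Name+"/"+it.Zone] {
			out = append(out, it)
		}
	}
	return out
}

func main() {
	all := []InstanceType{
		{Name: "p5.48xlarge", Zone: "us-west-2a"},
		{Name: "m5.large", Zone: "us-west-2b"},
	}
	crs := []CapacityReservation{{InstanceType: "p5.48xlarge", Zone: "us-west-2a"}}
	fmt.Println(len(filterByReservations(all, crs))) // only the reserved type survives
}
```

With this kind of filtering, the "quietly never spawns" failure mode becomes visible as an empty instance-type set, which Karpenter could surface in a status condition.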
In AWS [ODCR](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capacity-reservations.html) allows users to reserve compute capacity to mitigate against the risk of being
unabled to get on demand capacity. This is very helpful during seasonal holidays where higher traffic is expected.

### Capacity Reservations
FYI - there are (currently) two forms of reservation - on-demand capacity reservations (ODCR) and ML capacity block reservations (CBR) - either the terminology used here will need to be updated to be specific to ODCR, or the implementation should consider both forms of reservations
Agreed that it's worth considering, but it does also add the complication that capacity blocks are also their own capacity type (i.e. like Spot and OD).
The doc is specific to ODCR so for now we won't support CBR
https://pkg.go.dev/github.com/aws/aws-sdk-go-v2/service/ec2@v1.162.1/types#UsageClassType
- on-demand
- spot
- capacity-block
Please join Karpenter working group on June 6th 2PM PST to discuss (https://karpenter.sh/docs/contributing/working-group/)
designs/odcr.md
Outdated
However, in the case where a Capacity Reservation Group is used, it will fall back to On-Demand nodes when the Reservation Group limit is hit.

We won't introduce any other prioritization or fallback when launching nodes against EC2NodeClasses with a Capacity Reservation. This means it is the user's responsibility to ensure the Capacity Reservation associated with an EC2NodeClass covers the instance types, availability zones, and platform the EC2NodeClass should launch.
This also means that we will not allow an EC2NodeClass to create Spot instances when the user specified a Capacity Reservation.
why not?
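For reference on the Spot point: a user can already keep a NodePool from launching Spot by constraining the existing `karpenter.sh/capacity-type` requirement. A minimal sketch, assuming the v1beta1 NodePool API:

```yaml
# Sketch: constrain a NodePool to on-demand only (excludes "spot").
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: reserved-only
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
```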
designs/odcr.md
Outdated
Status: LimitExceeded
Type: CapacityReservation
```
The condition will reset if new nodes were able to launch and the Status will return to `Available`. It may also be helpful to show the
What about when new ODCR ID(s) are added to the resource group - shouldn't this trigger an update?
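For illustration, once capacity frees up and a launch succeeds, the condition quoted above would flip back to something like this (a sketch following the snippet's field names, not a final API):

```yaml
# Sketch: condition after a successful launch resets the earlier LimitExceeded state.
Status: Available
Type: CapacityReservation
```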
…dditional add odcr information from current poc implementation
In AWS, [ODCR](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capacity-reservations.html) allows users to reserve compute capacity to mitigate the risk of being
unable to obtain on-demand capacity. This is very helpful during seasonal holidays where higher traffic is expected.

### Capacity Reservations
Agreed that it's worth considering, but it does also add the complication that capacity blocks are also their own capacity type (i.e. like Spot and OD).
- Define Karpenter's behavior when encountering errors while attempting to launch nodes into a Capacity Reservation
- Define Karpenter's behavior when consolidating nodes to a Capacity Reservation when available

## Non-Goals
May want to add Capacity Blocks to this list. I think Karpenter should support Capacity Blocks, but you may want to separate that out (or else explicitly call it out in the goals).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is CBR part of ODCR? The doc's title is specific to ODCR so I didn't think we would include any info around CBR because that isn't an ODCR
CBR is a separate reservation process from ODCR. I think the main point is just being explicitly clear about what this RFC is addressing, and also clear in the terminology used throughout to avoid any confusion between the two forms. For example, `Capacity Reservation` is used generically throughout; it's not specific enough to disambiguate which form of capacity reservation is meant.
Capacity Blocks are a type of Capacity Reservation. It makes more sense for the API to be around "capacity reservations" (which is what the EC2 APIs are around) versus "ODCRs", which is shorthand for a capacity reservation that is typed to on-demand. I see that your RFC calls out "capacityReservationSelectorTerms", which does not sound ODCR-specific to me. As a result, it does not seem obvious to me that Capacity Blocks are out of scope, hence my comment that it should probably be in your non-goals list, and it is probably something to think about in the design (even if you don't implement it now) because I'm guessing some folks will be interested in Capacity Blocks support.
AWS also supports grouping Capacity Reservations into Capacity Reservation Groups.
Both of these entities are supported in the Launch Template's CapacityReservationTarget [definitions](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ec2-launchtemplate-capacityreservationtarget.html).

## Goals
Can't you already do this by having an ODCR NodePool with max vCPUs configured?
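For reference, the Launch Template `CapacityReservationTarget` mentioned in the quoted text has the following CloudFormation shape (a sketch; the reservation id and group ARN are placeholders, and a template targets either a single reservation or a group, not both):

```yaml
# Sketch of AWS::EC2::LaunchTemplate targeting a capacity reservation.
LaunchTemplateData:
  CapacityReservationSpecification:
    CapacityReservationTarget:
      CapacityReservationId: cr-0123456789abcdef0
      # or, to target a Capacity Reservation Group instead:
      # CapacityReservationResourceGroupArn: arn:aws:resource-groups:us-west-2:123456789012:group/my-cr-group
```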
# dedicated - The Capacity Reservation is created on single-tenant hardware that is dedicated to a single Amazon Web Services account.
tenancy: String | None
status:
  capacityReservations:
Are you purposely excluding start/end date? What about platform, owner id, reservation type, tags, create date, fleet id?
sure we can add all of them, basically the whole struct
totalInstanceCount: Integer
```

This closely follows (though it does not implement all fields) how EC2 [DescribeCapacityReservations](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeCapacityReservations.html) can filter.
I assume this is in reference to the selector terms? So I assume you're purposely excluding instance-match-criteria, placement-group-arn, and start/end dates?
instanceMatchCriteria is only needed if we support something outside of target for this first implementation
I don't see a use case where you filter based on start/end date for the selector terms, as we by default only filter for active ones.
### Supporting new capacity-type "capacity-reservation" for NodePool

Adding a new `karpenter.sh/capacity-type: capacity-reservation` allows us to have an EC2NodeClass that does not automatically fall back to `on-demand` if `capacity-reservation` is not available.
I would strongly suggest you incorporate the reservation type into this so you can differentiate an ODCR from a Capacity Block.
Are we sure capacity-type is the right way to model this? Right now, capacity-type aligns with EC2 instance metadata for market type. This would be a notable divergence from that existing philosophy. I'm wondering if we can't add a new top-level requirement instead of overloading capacity-type?
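To make the proposal concrete, a NodePool pinned to the proposed capacity type might look like this (a sketch; the `capacity-reservation` value is only proposed in this RFC, and the v1beta1 API group is an assumption):

```yaml
# Sketch: NodePool constrained to the RFC's proposed capacity type.
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: odcr-only
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["capacity-reservation"]  # proposed value, not in Karpenter today
```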
During consolidation, pricing does matter, as it affects which candidate will be [prioritized](https://github.com/kubernetes-sigs/karpenter/blob/75826eb51589e546fffb594bfefa91f3850e6c82/pkg/controllers/disruption/consolidation.go#L156). Since all reserved capacity instances are paid for ahead of time, their cost is already incurred. Users would likely want to fill their capacity reservation first and then fall back to other instances. Because of this, reserved instances should likely show up as zero-dollar pricing when we calculate instance pricing. Since each candidate is tied to a NodePool and EC2NodeClass, we should be able to safely override the pricing per node under a capacity reservation.
What about if the ODCR gets canceled? The instance will still be running but no longer "0 dollars" - will Karpenter poll to account for this?
sure, we can add all the properties from the struct https://pkg.go.dev/github.com/aws/aws-sdk-go-v2/service/ec2@v1.162.1/types#CapacityReservation
##### Consolidating into Capacity Reserved Instances

If we track Capacity Reservation usage, we can optimize the cluster configuration by moving non-Capacity Reserved instances into
What do you mean "move"? Do you mean between the NodePools? Because if the ODCR is open, it will automatically bind even if you didn't target it.
#### Labels

When a node is launched against a CapacityReservation we will expose Capacity Reservation
If you have a dedicated NodePool for an ODCR or set of ODCRs, it does seem like you would just want to label those nodes and have a node selector on that instead of the ODCR ID.
#### Labels

When a node is launched against a CapacityReservation, we will expose Capacity Reservation
information as the label `karpenter.k8s.aws/capacity-reservation-id`. This is helpful for users to identify those nodes that are being used
Is there a 1 ODCR per NodePool restriction? If so, that's not clear to me. What if your CR selector terms match multiple ODCRs??
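For illustration, selecting nodes by the proposed label might look like this on the workload side (a sketch; the label key comes from the RFC above, and the reservation id is a placeholder):

```yaml
# Sketch: pin a Pod to nodes backed by a specific capacity reservation.
apiVersion: v1
kind: Pod
metadata:
  name: reserved-workload
spec:
  nodeSelector:
    karpenter.k8s.aws/capacity-reservation-id: cr-0123456789abcdef0
```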
designs/odcr.md
Outdated
_We are keeping the scope of this design very targeted, so even if these could be things we eventually support, we aren't scoping them into this design_
- Supporting open Capacity Reservations. _The behaviour of nodes indirectly linked to an open ODCR can cause rotation, adding unnecessary node rotation into the cluster_
- Supporting changes in scaling behavior when an ODCR is associated to a NodeClass. _We won't bring up N nodes to match an N-node capacity reservation_
Is the idea here that we'd be too reactive? I'd like to understand a bit more why this is hard, and shouldn't be in the MVP.
If we do this, that means the ODCRs cannot be shared. This should be an additional feature, if really needed, outside of this initial approach.
metadata:
  name: example-node-class
spec:
  capacityReservationSelectorTerms:
Which AWS SDK Go struct type should this most closely match? https://pkg.go.dev/github.com/aws/aws-sdk-go-v2/service/ec2#CreateCapacityReservationInput or https://pkg.go.dev/github.com/aws/aws-sdk-go-v2/service/ec2#DescribeCapacityReservationFleetsInput?
The filtering of DescribeCapacityReservations isn't great, so I filter mostly after the fact using the properties available in the object
https://pkg.go.dev/github.com/aws/aws-sdk-go-v2/service/ec2@v1.162.1/types#CapacityReservation
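Putting the pieces from this thread together, the proposed selector might be used like this (a sketch of the API under discussion; field names are taken from the quoted spec, are not final, and all values are placeholders):

```yaml
# Sketch: EC2NodeClass selecting capacity reservations either by id or by attributes.
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: example-node-class
spec:
  capacityReservationSelectorTerms:
    # select a specific reservation by id
    - id: cr-0123456789abcdef0
    # or select by attributes, post-filtered client-side as described above
    - instancePlatform: Linux/UNIX
      tenancy: default
```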
designs/odcr.md
Outdated
# The platform of operating system for which the Capacity Reservation reserves capacity
id: String | None
# The type of operating system for which the Capacity Reservation reserves capacity
instancePlatform: String | None
# The type of operating system for which the Capacity Reservation reserves capacity
probably worth updating these comments
fixed
Starts the fix but doesn't completely fix #3042
Description
Narrowly scoped design doc for supporting ODCR in Karpenter.
How was this change tested?
Does this change impact docs?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.