
docs: RFC Supporting ODCR in Karpenter #5716

Open · wants to merge 25 commits into base: main

Conversation

@garvinp-stripe (Contributor) commented Feb 23, 2024

Starts the fix but doesn't completely fix #3042

Description
Narrowly scoped design doc for supporting ODCR in Karpenter.

How was this change tested?

Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

netlify bot commented Feb 23, 2024

Deploy Preview for karpenter-docs-prod canceled.

Name Link
🔨 Latest commit 6760e96
🔍 Latest deploy log https://app.netlify.com/sites/karpenter-docs-prod/deploys/65d7e03e3d1e3f00089b777d


netlify bot commented Feb 23, 2024

Deploy Preview for karpenter-docs-prod ready!

Name Link
🔨 Latest commit fca1abf
🔍 Latest deploy log https://app.netlify.com/sites/karpenter-docs-prod/deploys/665e39be4bf34d000850af24
😎 Deploy Preview https://deploy-preview-5716--karpenter-docs-prod.netlify.app

@garvinp-stripe garvinp-stripe marked this pull request as ready for review February 23, 2024 21:21
@garvinp-stripe garvinp-stripe requested a review from a team as a code owner February 23, 2024 21:21
@jonathan-innis (Contributor) left a comment

Nice doc! Some high-level comments

designs/odcr.md Outdated

_We are keeping the scope of this design very targeted so even if these could be things we eventually support, we aren't scoping them into this design_
- Supporting prioritization when launching nodes
- Supporting any pre-calculation of ODCR capacity utilization before node launch
Contributor

Why do we think that pre-calculation should be out of scope here? How do we handle the case where we try to launch into a capacity reservation that doesn't have any available instance types and it fails?

Contributor Author

I think it would be quite complicated to implement. Because checking whether capacity exists and then launching against it can't be done atomically, there is no guarantee that we won't run into capacity issues. I felt that allowing this to fail, and falling back to being unable to launch additional capacity from this node pool, would be reasonable and would push users toward using weights on different node pools to pick up any insufficient-capacity issues.

Contributor

Sure, I just wonder if we can do this optimistically, with ICE-ing as a back-stop. I worry that if we don't pre-check we'll just keep slamming the API because we think that the NodePool still has instance type availability every 3m.


Mm, that makes sense. I think it's reasonable to call the API to check. Do you think we should cache this as part of the existing instance type offering cache?

designs/odcr.md Outdated
## Non-Goals

_We are keeping the scope of this design very targeted so even if these could be things we eventually support, we aren't scoping them into this design_
- Supporting prioritization when launching nodes
Contributor

Doesn't Karpenter already support this through its NodePool "weight" concept?

Contributor Author

I meant within a single node pool. I can update

designs/odcr.md Outdated
Cons:
- This implementation is overly restrictive, making it difficult to use. It's possible that the restrictions make it too hard for users to use EC2NodeClasses effectively.

Both a pro and a con:
Contributor

In general, I think this is a good thing. When it comes to fallback and prioritization, I think this makes sense. You have a set of instance types that you prioritize for usage with the ODCR and then you can create a lower-weighted NodePool that has a broader set of instance types or perhaps uses the same instance type that you had for the ODCR but with spot and on-demand to try to get that type again just like from a normal request

Contributor Author

Agreed
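
For illustration, here is a minimal sketch of that weighted-fallback pattern using today's NodePool `weight` field. The NodePool and EC2NodeClass names are placeholders, and the assumption that `odcr-node-class` targets the reservation is the mechanism this RFC proposes, not an existing API:

```yaml
# Higher-weight NodePool evaluated first: restricted to the ODCR-backed EC2NodeClass.
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: odcr-priority
spec:
  weight: 100                        # higher weight wins when multiple NodePools match
  template:
    spec:
      nodeClassRef:
        name: odcr-node-class        # placeholder: EC2NodeClass selecting the capacity reservation
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
---
# Lower-weight NodePool used as fallback with a broader capacity selection.
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: fallback
spec:
  weight: 10
  template:
    spec:
      nodeClassRef:
        name: default-node-class     # placeholder: regular EC2NodeClass
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
```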


#### Labels

When a node is launched against a CapacityReservation we will expose Capacity Reservation
Contributor

Anytime we are thinking about setting a label, we have to think about whether a user would select against it. Are you anticipating that users would select against the capacity reservation ID? I could perhaps see that, but I think that the reservation setting is significantly less valuable. Were you thinking of just having this there for observability information?

In either case, I think that we should call out how we expect users to use these new labels if we are going to propose them here.


Agree on the setting; I think the ID is much more helpful than the setting.

Contributor

If you have a dedicated NodePool for an ODCR or set of ODCRs, it does seem like you would just want to label those nodes and have a node selector on that instead of the ODCR ID.

Contributor

How do I track that my nodeclaim is part of a targeted capacity reservation that no longer exists? If I have another targeted capacity reservation that has available capacity, I need to terminate my nodeclaim and bring up a new one on the unused capacity reservation.

designs/odcr.md Outdated

#### Status

We will expose this information in the NodeClaim's, EC2NodeClass's, and NodePool's Condition Status. _I know that currently NodeClasses and NodePool status do not have conditions, but I wanted to see if we are open to adding Conditions to these resources_
Contributor

Yes, absolutely open to adding conditions, see: kubernetes-sigs/karpenter#493

designs/odcr.md Outdated
Status:
  Conditions:
    Last Transition Time:  2024-02-22T03:46:42Z
    Status:                LimitExceeded
Contributor

Do you think that we need this status condition if Karpenter is aware of the available capacity and it can just know when to fallback to other NodePools since it ran out of capacity?

Contributor Author

I think so. From an operator's perspective it would be good to know that this nodeclass has currently hit its limit and we are going to fall back into other node pools.

@coveralls

Pull Request Test Coverage Report for Build 8025230145

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 82.579%

Totals Coverage Status
Change from base Build 8025066475: 0.0%
Covered Lines: 5053
Relevant Lines: 6119

💛 - Coveralls

garvinp-stripe and others added 3 commits March 20, 2024 10:55
Co-authored-by: Jonathan Innis <jonathan.innis.ji@gmail.com>
Co-authored-by: Jonathan Innis <jonathan.innis.ji@gmail.com>
@garvinp-stripe (Contributor Author)

Sorry for the slowdown. I will try to get back to this PR, but at the moment I got reshuffled at work, so I need some time to get back to this.

@jonathan-innis jonathan-innis self-assigned this Apr 16, 2024
designs/odcr.md Outdated
Comment on lines 48 to 52
capacityReservationSpec:
  capacityReservationPreference: open | none | None # Cannot be defined if capacityReservationTarget is specified
  capacityReservationTarget: # Cannot be defined if capacityReservationPreference is specified
    capacityReservationId: String | None
    capacityReservationResourceGroupArn: String | None
Contributor

Can we keep this similar to the existing design of how to find objects?
It seems selectorTerms is the current design (e.g.: https://github.com/aws/karpenter-provider-aws/blob/main/pkg/apis/crds/karpenter.k8s.aws_ec2nodeclasses.yaml#L60-L102)

Suggested change (replacing the capacityReservationSpec block above):

capacityReservationSelectorTerms:
  - tags:
      karpenter.sh/discovery: "${CLUSTER_NAME}"
      environment: test
  - id: cr-1234567 # arn
  - preference: open | none | None
capacityReservationResourceGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: "${CLUSTER_NAME}"
      environment: test
  - id: arn:aws:resource-groups:us-west-2:123456789012:group/my-resource-group
  - preference: open | none | None

Contributor

I'd be skeptical of the need to specify multiple ODCRs that we can select on in a single EC2NodeClass. What are you imagining the use-case is for wanting to specify multiple ODCRs in an EC2NodeClass? The only one I can think of is multi-AZ support. Other than that -- I figure you get an ODCR because you have a particular type of capacity that you are planning to use for your workload and you want to target that specifically. Do we have an example of how a pod would target this capacity? Would we imagine that these pods target instance types with preferences and then fall back?

Member

The use of multiple ODCRs is because additional capacity is (usually) allocated via a new ODCR. So if you are growing your capacity, you are doing so by adding additional ODCRs. This is why it's recommended to use the resource group, which acts as the container; as new capacity is allocated via additional ODCRs, those are associated to the resource group, which expands the pool of available capacity.

The use of the resource group helps to decouple the act of adding capacity from the consumers of the capacity (i.e. - I don't need to keep updating my manifests when capacity is added. Might be less of a concern for Karpenter, but can be quite disruptive and harder to consume with node groups that are utilizing launch templates)
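
To make that concrete, a sketch of an EC2NodeClass that selects the resource group rather than individual ODCRs, using the hypothetical `capacityReservationResourceGroupSelectorTerms` field from the suggestion above (not an existing API); growing capacity then means associating new ODCRs with the group, with no manifest change:

```yaml
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: odcr-pool
spec:
  amiFamily: AL2
  role: "KarpenterNodeRole-example"                  # placeholder role name
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  # Hypothetical field per the suggestion above: select the resource group that
  # contains the ODCRs, so adding or removing ODCRs never touches this manifest.
  capacityReservationResourceGroupSelectorTerms:
    - id: arn:aws:resource-groups:us-west-2:123456789012:group/my-resource-group
```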

designs/odcr.md Outdated

The main failure scenario is when the Capacity Reservation limit is hit and no new nodes can be launched from any Capacity Reservation the launch template targets.

#### Status
Contributor

For this, we need documentation that we will update the status of the EC2NodeClass with the found capacity reservations / resource groups, similar to how we do it for AMIs or Subnets.
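
A hedged sketch of what that discovered-reservation status could look like, mirroring how the EC2NodeClass status already lists resolved subnets and AMIs; the field names are illustrative only, loosely following the `status.capacityReservations` shape and the EC2 DescribeCapacityReservations fields referenced later in this RFC:

```yaml
status:
  subnets:
    - id: subnet-0123456789abcdef0
      zone: us-west-2a
  # Illustrative only: reservations discovered by the selector terms.
  capacityReservations:
    - id: cr-1234567890abcdef0
      instanceType: m5.large
      availabilityZone: us-west-2a
      availableInstanceCount: 8
      totalInstanceCount: 10
```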

designs/odcr.md Outdated

Pros:
- This puts the onus of checking on the user to verify compatibility of their NodeClasses and requirements. It makes no assumption about what a Capacity Reservation can do nor what it can't do (open to checking if Capacity is at its limit prior to calling CreateFleet).
- This forces users who wish to leverage fallback and prioritization to use NodePool weights or Capacity Reservation Group rather than relying on EC2NodeClass.
Contributor

This only works for Capacity Reservations.
As far as I understand, Capacity Reservation Resource Groups have an automatic fallback to on-demand? For the initial design, are we for that reason only scoping in Capacity Reservations?

designs/odcr.md Outdated

## Background

In AWS [ODCR](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capacity-reservations.html) allows users to reserve compute capacity to mitigate against the risk of being
Contributor

Suggested change
In AWS [ODCR](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capacity-reservations.html) allows users to reserve compute capacity to mitigate against the risk of being
In AWS [ODCR](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capacity-reservations.html) allows users to reserve compute capacity to mitigate the risk of

designs/odcr.md Outdated
## Background

In AWS [ODCR](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capacity-reservations.html) allows users to reserve compute capacity to mitigate against the risk of being
unabled to get on demand capacity. This is very helpful during seasonal holidays where higher traffic is expected.
Contributor

Suggested change
unabled to get on demand capacity. This is very helpful during seasonal holidays where higher traffic is expected.
getting on-demand capacity. This is very helpful during seasonal holidays where higher traffic is expected.

designs/odcr.md Outdated
## Background

In AWS [ODCR](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capacity-reservations.html) allows users to reserve compute capacity to mitigate against the risk of being
unabled to get on demand capacity. This is very helpful during seasonal holidays where higher traffic is expected.
Contributor

Suggested change
unabled to get on demand capacity. This is very helpful during seasonal holidays where higher traffic is expected.
unabled to get on demand capacity. This is very helpful during seasonal holidays where higher traffic is expected or for reserving highly-desired instance types, like the `p5.48xlarge` or other large GPU instance types.

AWS also supports grouping Capacity Reservation into Capacity Reservation groups.
Both these entities are supported in Launch Template's CapacityReservationTarget [definitions](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ec2-launchtemplate-capacityreservationtarget.html).

## Goals
Contributor

Does it make sense to lay out a use-cases section here for the need and usage of ODCRs? Knowing these use-cases will help set the context and also help us think about how we should design the solution

Member

One additional goal I might (strongly) suggest - the ability to define "fairness" in some form when an ODCR is shared amongst multiple nodepools/clusters. The scenario:

  • An ODCR for 500 instances is created
  • That ODCR is shared across x-number of AWS accounts (say 5)
  • Each account contains a cluster that should be entitled to roughly 1/x of the reservation at minimum (each cluster at minimum is guaranteed access to launch 100 instances)

The most basic form could be simply providing a max/ceiling on the number of instances a nodepool can launch from a reservation and let users set that value.

This is used in conjunction with the resource group - as more capacity is needed, more ODCRs are created and added into the "pool" of capacity, but that "pool" of capacity is shared amongst multiple Karpenter nodepools across clusters/accounts

Contributor

Can't you already do this by having an ODCR NodePool with max vCPUs configured?
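
For reference, a minimal sketch of that per-NodePool ceiling using the existing NodePool `limits` field (names are placeholders; whether vCPU is a good proxy is debated below):

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: odcr-share
spec:
  # Existing Karpenter behavior: provisioning stops once the NodePool's total
  # allocated resources reach these limits (a ceiling, not a guaranteed share).
  limits:
    cpu: "400"                       # e.g. 100 instances x 4 vCPU each
  template:
    spec:
      nodeClassRef:
        name: odcr-node-class        # placeholder EC2NodeClass targeting the reservation
```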

Contributor

One additional goal I might (strongly) suggest - the ability to define "fairness" in some form when an ODCR is shared amongst multiple nodepools/clusters. The scenario:

  • An ODCR for 500 instances is created
  • That ODCR is shared across x-number of AWS accounts (say 5)
  • Each account contains a cluster that should be entitled to roughly 1/x of the reservation at minimum (each cluster at minimum is guaranteed access to launch 100 instances)

The most basic form could be simply providing a max/ceiling on the number of instances a nodepool can launch from a reservation and let users set that value.

This is used in conjunction with the resource group - as more capacity is needed, more ODCRs are created and added into the "pool" of capacity, but that "pool" of capacity is shared amongst multiple Karpenter nodepools across clusters/accounts

What you are describing is outside of the current capabilities of AWS itself. Once those capabilities are available, we can update or create another design document to incorporate this.

Contributor

Can't you already do this by having an ODCR NodePool with max vCPUs configured?

Agreed, seems like a quick win, but is management overhead for now

Member

What you are describing is outside of the current capabilities of AWS itself

Which capability(s) exactly? You can share ODCRs across accounts

Can't you already do this by having an ODCR NodePool with max vCPUs configured?

In theory, yes - but for most of the workloads that are utilizing GPUs via ODCRs, you aren't using vCPU as a proxy for your GPU allotment. p5s have double the vCPU capacity of p4s - so not a great proxy value

Not saying every use case has to be considered and implemented on the first go - but it seems we are missing the mark on ODCRs and how they are intended to be used:

  • ODCRs should be added to a resource group and treated like a pool of compute resources. This pool of compute will expand/contract by adding/removing ODCRs
  • ODCRs are not always 1:1 to a cluster/account/team (they rarely are 1:1) - again, its a pool of resources and that pool may need to be divided/shared amongst clusters/accounts/teams
  • If you have a fixed pool of available resources, you will inevitably need to consider priority and fairness in terms of resource allocation. On day 1 you may have two teams who each consume half the pool by verbally agreeing to use x number of instances. As workloads grow in size, as is the norm for ML workloads, teams will need more resources and now you need to give the whole pool of resources to one team at a time but still ensure teams have equal time with the pool of resources

This is an RFC so not looking for every configuration or use case to be provided initially, but just want to ensure we are looking at the use cases and workloads and thinking about the implementation holistically - at least ensuring we have considered those common use cases and can support them in the future without too much churn/disruption to the API


Those cases were considered; however, we didn't choose to implement them on the first pass for a few reasons.

  • Resource groups for capacity reservations make the problem considerably more complicated because of their fallback logic to OD if the Capacity Reservation Group (CRG)'s ODCRs can't satisfy the request. This requires more complicated reconciliation, because at this time we assume any ODCR costs zero and has a fixed limit of instances. That isn't to say this won't be supported, but to move this project forward we needed to narrow in on a basic use case before we tackle the harder ones.
  • Multi-account usage is supported in the selectors; however, you are right that we lack any fairness. Once again, because we didn't get any early feedback from users, we didn't implement any fairness between "shared" ODCRs.

Contributor

Please join the Karpenter working group on June 6th at 2PM PST to discuss (https://karpenter.sh/docs/contributing/working-group/).

designs/odcr.md Outdated
- Support associating ODCR to EC2NodeClass
- Define Karpenter's behavior when launching nodes into Capacity Reservation
- Define Karpenter's behavior when encountering errors when attempting to launch nodes into Capacity Reservation
- Define Karpenter's behavior when consolidating nodes in Capacit Reservation
Contributor

Suggested change
- Define Karpenter's behavior when consolidating nodes in Capacit Reservation
- Define Karpenter's behavior when consolidating nodes in Capacity Reservation

designs/odcr.md Outdated
##### Consolidating Capacity Reserved Instances

During consolidation, pricing does matter as it affects which candidate will be [prioritized](https://github.com/kubernetes-sigs/karpenter/blob/75826eb51589e546fffb594bfefa91f3850e6c82/pkg/controllers/disruption/consolidation.go#L156). Since all capacity instances are paid ahead of time, their cost is already incurred. Users would likely want to prioritize filling their capacity reservation first, then fall back into other instances. Because of this, reserved instances should likely show up as 0-dollar pricing when we calculate instance pricing. Since each candidate is tied to a NodePool and NodeClass, we should be able to safely override the pricing per node under a capacity reservation.
Contributor

I'd view this as a special type of "offering" in the InstanceTypes returned by the CloudProvider. This is an ODCR offering that has price 0.

Member

Since the cost is already a sunk cost (bought and paid for up front, no savings to be had by consolidating), we still need to consider consolidation from the perspective of shared resources. Meaning, there are other consumers (nodepools) that will potentially want access to these resources if they are available


##### Consolidating into Capacity Reserved Instances

If we track Capacity Reservation usage, we can optimize the cluster configuration by moving non-Capacity Reserved instances into
Contributor

I think this should just work out-of-the-box if the user has prioritization set up with weights on their NodePools and we model the pricing correctly. I'd rather just assume that this works and then discuss what the interaction is going to look like.

Member

Based on some of the usage patterns we've seen so far - a form of priority and fairness needs to be considered. We don't want one nodepool, or a few nodepools, consuming the majority of the reservation and leaving other nodepools starved for resources because they are unable to provision instances

Contributor

What do you mean "move"? Do you mean between the NodePools? Because if the ODCR is open, it will automatically bind even if you didn't target it.


I think this meant multiple node pools having the same targeted ODCR.

#### Labels

When a node is launched against a CapacityReservation we will expose Capacity Reservation
information as labels `karpenter.k8s.aws/capacity-reservation-id`. This is helpful for users to identify those nodes that are being used
Contributor

Open question: If we're tying one capacity reservation ID to an EC2NodeClass, we may not need this label at all because we can just use something like karpenter.k8s.aws/ec2nodeclass to know which CR is being used.

Member

I think this is bringing in a conflicting detail that most users would not need to be aware of:

  1. If we add this label, then we are creating a 1:1 of capacity reservation to nodepool; this conflicts with the statements above about using resource groups where multiple ODCR IDs are added to the group, and the group is what is used for creating capacity, not the ODCRs directly
  2. This is not a "characteristic" a workload would typically look for in terms of label selectors. Workloads will look to target things such as OS platform, device manufacturer or model, etc. I would be curious to know the use cases where consumers are aware of the ODCR IDs and why they would use those for selectors over other characteristics provided by the instance details (i.e. - think NFD or GFD)

Contributor

Is there a 1 ODCR per NodePool restriction? If so, that's not clear to me. What if your CR selector terms match multiple ODCRs??

Contributor

The label is attached to the NodeClaim and Node, not the NodePool or EC2NodeClass. It's a fact that gets added to the NodeClaim when it was brought up within a capacity reservation.
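
As a sketch of how a workload could consume that node-level label once it exists (the label key is the one proposed in this RFC; the reservation ID is a placeholder):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: reserved-workload
spec:
  # Schedule only onto nodes Karpenter launched into this specific reservation.
  nodeSelector:
    karpenter.k8s.aws/capacity-reservation-id: cr-1234567890abcdef0
  containers:
    - name: app
      image: public.ecr.aws/docker/library/busybox:latest
      command: ["sleep", "infinity"]
```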

designs/odcr.md Outdated

#### Status

We will expose this information in the NodeClaim's, EC2NodeClass's, and NodePool's Condition Status. _I know that currently NodeClasses and NodePool status do not have conditions, but I wanted to see if we are open to adding Conditions to these resources_
Contributor

I think we may only need/want to expose this in the EC2NodeClass status condition. On top of that, this gets trickier depending on if we are talking about supporting a single capacity reservation or multiple. Obviously, only supporting one means that we can create a status condition to represent that one, but what does it mean when we have multiple? Do we just document the list of specified capacity reservations that aren't available?

designs/odcr.md Outdated
We will avoid updating [unavailableOfferingCache](https://github.com/aws/karpenter-provider-aws/blob/main/pkg/providers/instance/instance.go#L239C41-L239C58) because the pool is different from the rest of AWS. However, we may want to create a new unavailable offering cache keyed against Capacity Reservations. _Not sure if we want to support this during the first iteration_

## Open Questions
- The UX of adding a Capacity Reservation feels wrong because NodeClasses previously didn't fully restrict instance types, but with a Capacity Reservation it kind of does. There isn't a good primitive in Karpenter to expose these kinds of restrictions (I think?), specifically around preflight or static analysis that tells you your NodePool may not be able to launch any nodes because of X, Y and Z in your EC2NodeClass. I believe this is already an issue where, if a NodeClass selects for x86 architecture AMIs but the NodePool allows for ARM architecture instance types, Karpenter may just quietly never spawn ARM instances?
Contributor

There's a primitive. When you are resolving the GetInstanceTypes() call, you can just tell it not to return certain instance types based on the EC2NodeClass used. We do have some slight mutation behavior around some EC2NodeClasses specified today already; for example, when using Windows, this scopes down the supported instance types that we return.

In AWS [ODCR](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capacity-reservations.html) allows users to reserve compute capacity to mitigate against the risk of being
unabled to get on demand capacity. This is very helpful during seasonal holidays where higher traffic is expected.

### Capacity Reservations
Member

FYI - there are (currently) two forms of reservation - on-demand capacity reservations (ODCR) and ML capacity block reservations (CBR) - either the terminology used here will need to be updated to be specific to ODCR, or the implementation should consider both forms of reservations

Contributor

Agreed that it's worth considering, but it does also add the complication that capacity blocks are also their own capacity type (i.e. like Spot and OD).

@GnatorX commented Jun 3, 2024

The doc is specific to ODCR so for now we won't support CBR

Contributor

Please join the Karpenter working group on June 6th at 2PM PST to discuss (https://karpenter.sh/docs/contributing/working-group/).

designs/odcr.md Outdated
However in the case where a Capacity Reservation Group is used it will fall back into On Demand nodes when the Reservation Group limit is hit.

We won't introduce any other prioritization nor fallback when launching nodes against EC2NodeClasses with a Capacity Reservation. This means that it is the user's responsibility to ensure the Capacity Reservation associated to an EC2NodeClass includes the instance types, availability zone, and platform the EC2NodeClass should launch.
This also means that we will not allow an EC2NodeClass to create Spot instances when the user specified a Capacity Reservation.
Member

why not?

designs/odcr.md Outdated
    Status:                LimitExceeded
    Type:                  CapacityReservation
```
The condition will reset if new nodes were able to launch and the Status will return to `Available`. It may also be helpful to show the
Member

What about when new ODCR ID(s) are added to the resource group - shouldn't this trigger an update?

garvinp-stripe and others added 2 commits May 21, 2024 09:19

- Define Karpenter's behavior when encountering errors when attempting to launch nodes into Capacity Reservation
- Define Karpenter's behavior when consolidating nodes to Capacity Reservation when available

## Non-Goals
Contributor

May want to add Capacity Blocks to this list. I think Karpenter should support Capacity Blocks, but you may want to separate that out (or else explicitly call it out in the goals).

@GnatorX commented Jun 3, 2024

Is CBR part of ODCR? The doc's title is specific to ODCR so I didn't think we would include any info around CBR because that isn't an ODCR

Member

CBR is a separate reservation process from ODCR. I think the main point is just being explicitly clear about what this RFC is addressing, and also clear in the terminology used throughout to avoid any confusion between the two forms. For example, Capacity Reservation is used generically throughout; it's not specific enough to disambiguate which form of capacity reservation is meant.

Contributor

Capacity Blocks are a type of Capacity Reservation. It makes more sense for the API to be around "capacity reservations" (which is what the EC2 APIs are around) versus "ODCRs", which is a shorthand for a capacity reservation that is typed to on-demand. I see that your RFC calls out "capacityReservationSelectorTerms", which does not sound ODCR specific to me. As a result, it does not seem obvious to me that Capacity Blocks are out of scope, hence my comment that it should probably be in your non-goal list and is probably something to think about in the design (even if you don't implement it now) because I'm guessing some folks will be interested in Capacity Blocks support.


  # dedicated - The Capacity Reservation is created on single-tenant hardware that is dedicated to a single Amazon Web Services account.
  tenancy: String | None
status:
  capacityReservations:
Contributor

Are you purposely excluding start/end date? What about platform, owner id, reservation type, tags, create date, fleet id?

Contributor

Sure, we can add all of them, basically the whole struct.

totalInstanceCount: Integer
```

This follows closely (though it does not implement all fields) how EC2 [DescribeCapacityReservations](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeCapacityReservations.html) can filter.
Contributor

I assume this is in reference to the selector terms? So I assume you're purposely excluding instance-match-criteria, placement-group-arn, and start/end dates?

Contributor

instanceMatchCriteria is only needed if we support something other than targeted reservations for this first implementation

Contributor

I don't see a use case where you filter based on start/end date for the selector terms, as by default we only filter for active ones.


### Supporting new capacity-type "capacity-reservation" for NodePool

Adding a new `karpenter.sh/capacity-type: capacity-reservation` allows us to have an EC2NodeClass that does not automatically fall back to `on-demand` if `capacity-reservation` is not available.
Contributor

I would strongly suggest you incorporate the reservation type into this so you can differentiate an ODCR from a Capacity Block.

Are we sure capacity-type is the right way to model this? Right now, capacity-type aligns with EC2 instance metadata for market type. This would be a notable divergence from that existing philosophy. I'm wondering if we can't add a new top-level requirement instead of overloading capacity-type?
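
To make the trade-off concrete, a sketch of a NodePool that only requests the new value; note that `capacity-reservation` is a value proposed by this RFC, not one Karpenter supports today, and (per the comment above) it could instead become a separate requirement key:

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: reserved-only
spec:
  template:
    spec:
      nodeClassRef:
        name: odcr-node-class              # placeholder EC2NodeClass selecting the ODCR
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["capacity-reservation"] # proposed value; omitting on-demand avoids automatic fallback
```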

Comment on lines +171 to +172
During consolidation, pricing does matter as it affects which candidate will be [prioritized](https://github.com/kubernetes-sigs/karpenter/blob/75826eb51589e546fffb594bfefa91f3850e6c82/pkg/controllers/disruption/consolidation.go#L156). Since all capacity instances are paid ahead of time, their cost is already incurred. Users would likely want to prioritize filling their capacity reservation first, then fall back into other instances. Because of this, reserved instances should likely show up as 0-dollar pricing when we calculate instance pricing. Since each candidate is tied to a NodePool and EC2NodeClass, we should be able to safely override the pricing per node under a capacity reservation.
Contributor

What about if the ODCR gets canceled? The instance will still be running but no longer "0 dollars" - will Karpenter poll to account for this?




designs/odcr.md Outdated

_We are keeping the scope of this design very targeted so even if these could be things we eventually support, we aren't scoping them into this design_
- Supporting open Capacity Reservations. _The behaviour of nodes indirectly linked to an open ODCR can cause rotation, adding unnecessary node rotation into the cluster_
- Supporting changes in scaling behavior when ODCR is associated to a NodeClass. _We won't bring up N nodes to match an N node capacity reservation_
Contributor

Is the idea here that we'd be too reactive? I'd like to understand a bit more why this is hard, and shouldn't be in the MVP.

Contributor

If we do this, that means the ODCRs cannot be shared; this should be an additional feature, if really needed, outside of this initial approach.

metadata:
  name: example-node-class
spec:
  capacityReservationSelectorTerms:
Contributor

The filtering of DescribeCapacityReservations isn't great, so I filter mostly after the fact using the properties available in the object
https://pkg.go.dev/github.com/aws/aws-sdk-go-v2/service/ec2@v1.162.1/types#CapacityReservation

designs/odcr.md Outdated
Comment on lines 76 to 80
# The platform of operating system for which the Capacity Reservation reserves capacity
id: String | None
# The type of operating system for which the Capacity Reservation reserves capacity
instancePlatform: String | None
# The type of operating system for which the Capacity Reservation reserves capacity
Contributor

probably worth updating these comments

Contributor

fixed

Successfully merging this pull request may close these issues.

Karpenter support for prioritizing AWS Capacity Reservations