Karpenter support for prioritizing AWS Capacity Reservations #3042

Open
dkheyman opened this issue Dec 15, 2022 · 9 comments · May be fixed by #5716
Labels
feature New feature or request needs-design Design required v1.x Issues prioritized for post-1.0

Comments

@dkheyman

dkheyman commented Dec 15, 2022

Tell us about your request

As an EKS platform engineer, I would like to reduce my platform cost and increase resiliency to capacity outages by provisioning EC2 instances through the EC2 Fleet API with Capacity Reservations prioritized. Capacity Reservations are a mechanism for us to reserve capacity for particular instance types in particular AZs; we use them to reduce cost and to make acquiring capacity at large scale more reliable.

I would like to be able to specify Capacity Reservations in the node template, e.g. AWSTemplate.spec.on-demand-options.CapacityReservations, together with use-capacity-reservations-first, in order to make sure the instances that launch land in the targeted capacity reservations in that account. Without this support, we could end up paying for both the reserved capacity and the On-Demand instances launched through EC2 Fleet.
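For illustration only, a rough sketch of what such a field could look like on the node template. None of these fields exist in Karpenter today; the names below are hypothetical and simply mirror EC2 Fleet's on-demand options, and the reservation IDs are placeholders:

```yaml
# Hypothetical AWSNodeTemplate fields, sketched for illustration -- not an
# existing Karpenter API. Reservation IDs are placeholders.
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: odcr-backed
spec:
  subnetSelector:
    karpenter.sh/discovery: my-cluster
  securityGroupSelector:
    karpenter.sh/discovery: my-cluster
  onDemandOptions:                              # hypothetical field
    usageStrategy: use-capacity-reservations-first
    capacityReservations:                       # hypothetical: targeted ODCR IDs
      - cr-0123456789abcdef0
      - cr-0fedcba9876543210
```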

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

The problem is that this workload is critical and has a set SLA we need to meet. Capacity Reservations let us reach very large vCPU counts without worrying about capacity availability, and currently we cannot use Karpenter's scalability and other advantages at all.

Reserved Instances and open Capacity Reservations help, but they put the responsibility on the EKS platform engineer to continually monitor reservations and feed the AZs and empty reservation slots to the provisioner, which makes the application team less autonomous and adds complexity to managing the reservations. In addition, now that targeted reservations let specific components of an application draw on different reservation amounts within a single account, there is currently no way to specify a reservation ID in the provisioner either.

Are you currently working around this issue?

There is no workaround; we have no mechanism to use Karpenter because it does not support specifying Capacity Reservations. Cluster Autoscaler does not support this either, so there is currently no way of guaranteeing the use of CRs with EKS on AWS unless we own the EC2 Fleet calls ourselves.

Additional Context

No response

Attachments

No response

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@dkheyman dkheyman added the feature New feature or request label Dec 15, 2022
@spring1843
Contributor

We have something very similar explained here: https://karpenter.sh/preview/concepts/scheduling/#savings-plans-and-reserved-instances
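For reference, a minimal sketch of the pattern that page describes, assuming a v1alpha5 Provisioner and a reservation covering roughly 100 c4.large instances (all names and values illustrative): a higher-weight provisioner capped at the reserved capacity, with the default, lower-weight provisioner acting as the fallback once the limit is reached.

```yaml
# Sketch of the documented Reserved Instances pattern (v1alpha5 API).
# The limit roughly matches the reserved capacity: ~100 x c4.large = 200 vCPU.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: reserved
spec:
  weight: 50                      # evaluated before the default (weight 0) provisioner
  requirements:
    - key: node.kubernetes.io/instance-type
      operator: In
      values: ["c4.large"]
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["on-demand"]
  limits:
    resources:
      cpu: "200"                  # stop at the reserved capacity; fall back afterwards
  providerRef:
    name: default                 # assumes an AWSNodeTemplate named "default"
```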

@dkheyman
Author

Ryan,

While I appreciate this reserved instance approach, this won’t work with “targeted” capacity reservations because those need to be “pointed to” by the EC2 Fleet API. The current approach you mentioned also won’t factor in availability of capacity reservations by AZ even if the reservation is of type “open”.

If I reserve 20 c4.large instances in each of 5 AZs, 100 in total, Karpenter can launch those 100 instances in any AZ without regard for placing them evenly across the AZs. Similarly, if I had 10 reservations available across 4 AZs, there’s no mechanism for Karpenter to place instances in those AZs accordingly.

Let me know if I'm missing something.

Thanks,
David

@FernandoMiguel
Contributor

> if I had 10 reservations available across 4 AZs, there’s no mechanism for Karpenter to place them in those AZs accordingly.

Your provisioner can specify which AZs to deploy to. Worst case, you could have a provisioner per AZ.
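For example, a per-AZ provisioner could look roughly like this (v1alpha5 API; the zone, instance type, and limit are illustrative and would need to track the reservation actually held in that AZ):

```yaml
# Sketch: one provisioner pinned to a single AZ, capped at the capacity
# reserved there (~20 c4.large = 40 vCPU). Repeat per AZ with its own limit.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: reserved-eu-central-1a
spec:
  weight: 50
  requirements:
    - key: topology.kubernetes.io/zone
      operator: In
      values: ["eu-central-1a"]
    - key: node.kubernetes.io/instance-type
      operator: In
      values: ["c4.large"]
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["on-demand"]
  limits:
    resources:
      cpu: "40"
  providerRef:
    name: default                 # assumes an AWSNodeTemplate named "default"
```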

@dkheyman
Author

I understand that, but how do I know I have 10 reservations across 4 AZs? There's no way the provisioner would know those exist; as the infrastructure engineer, you would have to track how many are in each AZ, and that number can fluctuate as other workloads or components take up the reserved slots. Why not let EC2 Fleet figure it out, instead of putting the onus on the infrastructure engineers?

And this still does not address targeted Capacity Reservations. If you have multiple workloads, or components of a workload, in the same account, each with different reservations referenced by reservation ID, you are out of luck.

@yaroslav-nakonechnikov

To add some context: since we host our servers in Frankfurt, we see a lot of capacity issues.
We created Capacity Reservations and noticed that Karpenter ignores them: it creates instances of the requested instance type, but in a different AZ.
It would be great if there were a way to tell Karpenter to fill reserved capacity first, and only then fall back to cost- (or otherwise) optimized capacity.

@jeffspahr

I also need the ability to reference targeted On-Demand Capacity Reservations. We use ODCRs for instance types that are capacity constrained, where we need a guarantee of being able to launch those instances.

@billrayburn billrayburn added the v1 Issues requiring resolution by the v1 milestone label Sep 27, 2023
@billrayburn billrayburn added v1.x Issues prioritized for post-1.0 and removed v1 Issues requiring resolution by the v1 milestone labels Nov 22, 2023
@garvinp-stripe
Contributor

garvinp-stripe commented Feb 14, 2024

I think it makes sense to support capacity reservations, but I am not sure it makes sense to implement a priority within the same NodePool. I think Karpenter already has a solution for this with weights: you would give your ODCR NodePool a higher weight, and if that fails you would fall into the next NodePool (maybe spot), and so on. I think the prioritization is difficult to support because an ODCR is tied to a launch template, which Karpenter creates behind the scenes; to support this fallback behavior, Karpenter would have to maintain a set of launch templates, which I suspect is too complicated. Note that Spot is supported because CreateFleet supports choosing a more optimized allocation strategy.
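A rough sketch of that weight-based fallback with the v1beta1 API (names, limits, and instance types are illustrative): a higher-weight NodePool sized and shaped like the ODCR, and a lower-weight spot NodePool that takes over once the first pool's limit is exhausted. Note that the higher-weight pool can only be shaped like the reservation (instance type, zone, capacity type); it cannot actually target a specific reservation, which is what this issue asks for.

```yaml
# Higher-weight NodePool shaped like the ODCR; the cpu limit approximates the
# reserved capacity so Karpenter falls through once it is consumed.
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: odcr
spec:
  weight: 100
  limits:
    cpu: "200"
  template:
    spec:
      nodeClassRef:
        name: default             # assumes an EC2NodeClass named "default"
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["c4.large"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
---
# Lower-weight fallback NodePool (spot).
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot-fallback
spec:
  weight: 10
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
```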

@ellistarn
Contributor

Fwiw, we maintain many launch templates under the hood per node pool. It's required for things like labels and architectures.

That said, we model "preferential fallback" using the weight feature, so I don't think it's crazy to reuse that here. However, LT combinatorics don't seem like they'll be a design constraint on this question.

@garvinp-stripe
Contributor

garvinp-stripe commented Feb 16, 2024

> Fwiw, we maintain many launch templates under the hood per node pool. It's required for things like labels and architectures.

Yeah, I realize I don't actually know that code very well, and I suspected I was wrong after writing this. But I agree that it's likely better to push that logic to weights instead. I am writing up a brief design for adding capacity reservation support to EC2NodeClasses, so I wanted to make sure we want to keep the LTs under a node class clean and minimal, used only for launching the right node and not for prioritization.

@garvinp-stripe garvinp-stripe linked a pull request Feb 23, 2024 that will close this issue