Introduce a new autoscaling mode (`VPAAndHPA`) for Shoot Kubernetes API servers #9678

ialidzhikov · 2024-04-26T14:55:15Z

How to categorize this PR?

/area auto-scaling
/kind enhancement

What this PR does / why we need it:
This PR implements the proposal outlined in #9562 to replace HVPA for Shoot kube-apiservers with native VPA and HPA resources. For more details, see #9562.

Which issue(s) this PR fixes:
Part of #9562

Special notes for your reviewer:
All credit goes to @vlerenc and everyone involved for analysing the data and for creating the proposal on how we could drop HVPA in a relatively simple and efficient way with native VPA and HPA resources.

Release note:

A new feature gate named `VPAAndHPAForAPIServer` is introduced to gardenlet. When enabled, the Shoot Kubernetes API server is scaled simultaneously by VPA and HPA on the same metrics (CPU and memory usage). The new feature aims to replace the existing HVPA autoscaling mechanism for the Shoot Kubernetes API server.

gardener-prow · 2024-04-26T14:55:18Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

rfranzke · 2024-04-26T15:09:38Z

/assign

plkokanov · 2024-04-27T17:21:50Z

/assign

docs/concepts/kubernetes-apiserver.md

plkokanov

Thanks a lot for the quick PR and adding nice documentation!

WDYT about also enabling the new feature gate for the local setup and e2e tests?

docs/concepts/kubernetes-apiserver.md

ialidzhikov · 2024-05-14T07:14:21Z

@rfranzke @plkokanov @voelzmo could you PTAL? I would like to include this PR in 1.95.

rfranzke

I am fine, please proceed without me. Thank you!

voelzmo

Just one typo correction.

I don't want to block this just because of the discussions around behavior for alpha.control-plane.scaling.shoot.gardener.cloud/scale-down-disabled and copying over the resources from existing Deployments – if you feel we should merge it this way, we can go ahead.

pkg/component/kubernetes/apiserver/horizontalpodautoscaler.go

voelzmo · 2024-05-14T08:53:09Z

pkg/component/kubernetes/apiserver/horizontalpodautoscaler.go

+		if k.values.Autoscaling.ScaleDownDisabled && hpa.Spec.MinReplicas != nil {
+			// If scale-down is disabled and the HPA resource exists and HPA's spec.minReplicas is not nil,
+			// then minReplicas is max(spec.minReplicas, status.desiredReplicas).
+			// When scale-down is disabled, this allows operators to specify a custom value for HPA spec.minReplicas


Thanks for explaining the reasoning behind this change. So what exactly do we want as new behavior when alpha.control-plane.scaling.shoot.gardener.cloud/scale-down-disabled is set? Previously, we had HPA effectively turned off, because minReplicas == maxReplicas == 4.
Now, we allow our operators to set minReplicas on the HPA themselves, but still set maxReplicas to 4. Which values do we think make sense here? Given that the whole purpose is to stop scaling down, I'm not sure whether 1-3 are reasonable values.
You're right: We may or may not choose a different value as maxReplicas in the future, but I'm not sure how that influences our decision to make this change in behavior right now.

At the very least, we should document how this annotation works, how operators can interact with it (modify the HPA's minReplicas, not the Deployment's replicas), and note that horizontal downscaling does happen.

voelzmo · 2024-05-14T09:01:01Z

pkg/component/shared/kubeapiserver.go

+	// - When transitioning from HVPA to HPAAndVPA autoscaling mode, we need to preserve the kube-apiserver container resources
+	//   to avoid causing an unwanted rollout that might be breaking. Otherwise, we would scale down from the potentially
+	//   high resource requests (set by HVPA) to the initial resource requests in HPAAndVPA mode.
+	if deployment != nil {


If I understand this correctly, this would mean that we will never get rid of this piece of code, which was only introduced because that's how HVPA worked. To me, this is harder to understand and keep in mind as context for the future than clearly scoping this method to the single purpose of not resetting snowflakes.
Do we already have a dedicated test documenting the desired behavior for snowflakes? Just to make sure that future versions of us don't go in and just remove this method.

pkg/component/kubernetes/apiserver/hvpa.go

plkokanov · 2024-05-14T10:49:46Z

pkg/component/kubernetes/apiserver/horizontalpodautoscaler.go

+		if k.values.Autoscaling.ScaleDownDisabled && hpa.Spec.MinReplicas != nil {
+			// If scale-down is disabled and the HPA resource exists and HPA's spec.minReplicas is not nil,
+			// then minReplicas is max(spec.minReplicas, status.desiredReplicas).
+			// When scale-down is disabled, this allows operators to specify a custom value for HPA spec.minReplicas


Technically, shouldn't we also set maxReplicas equal to minReplicas, even with the current proposal? (Note that I don't mean equal to 4, just set maxReplicas to whatever the calculated value of minReplicas is.)
If I set minReplicas in the HPA to 3, maxReplicas would still be 4. Depending on when reconciliation happens and when desiredReplicas changes, there could still be a scale-down from 4 to 3 replicas even if the annotation is set:

1. Let's say that desiredReplicas is 3, then this code sets minReplicas to 3 and maxReplicas to 4.
2. Replicas get scaled up to 4.
3. Then, no reconciliation happens to set minReplicas to the new number of desiredReplicas (4) for some time.
4. Some time later (before the next reconciliation occurs), scale down happens and replicas are set back to 3.

plkokanov · 2024-05-14T13:36:35Z

/lgtm

gardener-prow · 2024-05-14T13:36:39Z

LGTM label has been added.

Git tree hash: 1817a7bb68aa6e97d51263dda8dc28649d648988

plkokanov · 2024-05-15T06:39:58Z

/test pull-gardener-e2e-kind-ha-single-zone

ialidzhikov · 2024-05-15T09:46:59Z

/approve

gardener-prow · 2024-05-15T09:47:07Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andrerun, ialidzhikov, voelzmo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [ialidzhikov]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

gardener-prow bot requested review from rfranzke and timebertt April 26, 2024 14:55

gardener-prow bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Apr 26, 2024

gardener-prow bot assigned rfranzke Apr 26, 2024

gardener-prow bot added cla: no Indicates the PR's author has not signed the cla-assistant.io CLA. cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. and removed cla: no Indicates the PR's author has not signed the cla-assistant.io CLA. labels Apr 26, 2024

gardener-prow bot assigned plkokanov Apr 27, 2024

vpnachev reviewed Apr 27, 2024

ialidzhikov force-pushed the enh/hvpa-alternative-for-apiserver branch from e721c46 to ad38c30 Compare April 29, 2024 06:45

gardener-prow bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 29, 2024

ialidzhikov force-pushed the enh/hvpa-alternative-for-apiserver branch 2 times, most recently from e3732e1 to 1b8d6c8 Compare April 29, 2024 08:20

plkokanov reviewed Apr 29, 2024

ialidzhikov force-pushed the enh/hvpa-alternative-for-apiserver branch from 1b8d6c8 to 95488be Compare April 29, 2024 11:37

ialidzhikov force-pushed the enh/hvpa-alternative-for-apiserver branch from 95488be to 0bbad7c Compare April 29, 2024 11:40

ialidzhikov marked this pull request as ready for review April 29, 2024 11:43

gardener-prow bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 29, 2024

gardener-prow bot requested a review from ary1992 April 29, 2024 11:44

gardener-prow bot removed the lgtm Indicates that a PR is ready to be merged. label May 10, 2024

gardener-prow bot requested review from andrerun and plkokanov May 10, 2024 12:59

ialidzhikov mentioned this pull request May 10, 2024

Add the VPAAndHPAForAPIServer feature gate for the gardener-operator #9735

Open

1 task

ialidzhikov force-pushed the enh/hvpa-alternative-for-apiserver branch 2 times, most recently from 6bc5c1d to d68b7a3 Compare May 13, 2024 12:46

ialidzhikov requested review from vpnachev and vlerenc May 13, 2024 12:47

Minor nits

756e37e

ialidzhikov force-pushed the enh/hvpa-alternative-for-apiserver branch from d68b7a3 to 756e37e Compare May 13, 2024 12:54

rfranzke reviewed May 14, 2024

voelzmo approved these changes May 14, 2024

gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label May 14, 2024

plkokanov requested changes May 14, 2024

gardener-prow bot removed the lgtm Indicates that a PR is ready to be merged. label May 14, 2024

gardener-prow bot requested review from plkokanov, rfranzke and voelzmo May 14, 2024 10:49

ialidzhikov added 2 commits May 14, 2024 16:22

Address review comments from voelzmo (3)

1433c5f

Address review comments from plkokanov

5da9401

gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label May 14, 2024

gardener-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 15, 2024

gardener-prow bot merged commit 6d6c06c into gardener:master May 15, 2024
18 checks passed

ialidzhikov deleted the enh/hvpa-alternative-for-apiserver branch May 15, 2024 12:01
