Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce a new autoscaling mode (VPAAndHPA) for Shoot Kubernetes API servers #9678

Merged

Conversation

ialidzhikov
Copy link
Member

@ialidzhikov ialidzhikov commented Apr 26, 2024

How to categorize this PR?

/area auto-scaling
/kind enhancement

What this PR does / why we need it:
This PR implements proposal outlined in #9562 to replace HVPA for Shoot kube-apiservers with a native VPA and HPA. For more details, see the proposal described in #9562

Which issue(s) this PR fixes:
Part of #9562

Special notes for your reviewer:
All credits goes to @vlerenc and all the involved people for analysing the data and for creating the proposal how we could drop HVPA in a relatively simple and efficient way with native VPA and HPA resources.

Release note:

A new feature gate named `VPAAndHPAForAPIServer` is introduced to gardenlet. When enabled, the Shoot Kubernetes API Server is scaled simultaneously by VPA and HPA on the same metric (CPU and memory usage). The new feature aims to replace the existing HVPA autoscaling mechanism for the Shoot Kubernetes API server.

Copy link
Contributor

gardener-prow bot commented Apr 26, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@gardener-prow gardener-prow bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. area/auto-scaling Auto-scaling (CA/HPA/VPA/HVPA, predominantly control plane, but also otherwise) related kind/enhancement Enhancement, improvement, extension needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Apr 26, 2024
@gardener-prow gardener-prow bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Apr 26, 2024
@rfranzke
Copy link
Member

/assign

@gardener-prow gardener-prow bot added cla: no Indicates the PR's author has not signed the cla-assistant.io CLA. cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. and removed cla: no Indicates the PR's author has not signed the cla-assistant.io CLA. labels Apr 26, 2024
@plkokanov
Copy link
Contributor

/assign

docs/concepts/kubernetes-apiserver.md Outdated Show resolved Hide resolved
docs/concepts/kubernetes-apiserver.md Outdated Show resolved Hide resolved
@ialidzhikov ialidzhikov force-pushed the enh/hvpa-alternative-for-apiserver branch from e721c46 to ad38c30 Compare April 29, 2024 06:45
@gardener-prow gardener-prow bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 29, 2024
@ialidzhikov ialidzhikov force-pushed the enh/hvpa-alternative-for-apiserver branch 2 times, most recently from e3732e1 to 1b8d6c8 Compare April 29, 2024 08:20
Copy link
Contributor

@plkokanov plkokanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the quick PR and adding nice documentation!

WDYT about also enabling the new feature gate for the local setup and e2e tests?

docs/concepts/kubernetes-apiserver.md Outdated Show resolved Hide resolved
docs/concepts/kubernetes-apiserver.md Outdated Show resolved Hide resolved
@ialidzhikov ialidzhikov force-pushed the enh/hvpa-alternative-for-apiserver branch from 1b8d6c8 to 95488be Compare April 29, 2024 11:37
@gardener-prow gardener-prow bot added cla: no Indicates the PR's author has not signed the cla-assistant.io CLA. cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. and removed cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. cla: no Indicates the PR's author has not signed the cla-assistant.io CLA. labels Apr 29, 2024
@ialidzhikov ialidzhikov force-pushed the enh/hvpa-alternative-for-apiserver branch from 95488be to 0bbad7c Compare April 29, 2024 11:40
@ialidzhikov ialidzhikov marked this pull request as ready for review April 29, 2024 11:43
@gardener-prow gardener-prow bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 29, 2024
@gardener-prow gardener-prow bot requested a review from ary1992 April 29, 2024 11:44
@gardener-prow gardener-prow bot removed the lgtm Indicates that a PR is ready to be merged. label May 10, 2024
@ialidzhikov ialidzhikov force-pushed the enh/hvpa-alternative-for-apiserver branch 2 times, most recently from 6bc5c1d to d68b7a3 Compare May 13, 2024 12:46
@ialidzhikov ialidzhikov force-pushed the enh/hvpa-alternative-for-apiserver branch from d68b7a3 to 756e37e Compare May 13, 2024 12:54
@ialidzhikov
Copy link
Member Author

@rfranzke @plkokanov @voelzmo could you PTAL? I would like include this PR in 1.95.

Copy link
Member

@rfranzke rfranzke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine, please proceed without me. Thank you!

Copy link
Member

@voelzmo voelzmo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one typo correction.

I don't want to block this just because of the discussions around behavior for alpha.control-plane.scaling.shoot.gardener.cloud/scale-down-disabled and copying over the resources from existing Deployments – if you feel we should merge it this way, we can go ahead.

if k.values.Autoscaling.ScaleDownDisabled && hpa.Spec.MinReplicas != nil {
// If scale-down is disabled and the HPA resource exists and HPA's spec.minReplicas is not nil,
// then minReplicas is max(spec.minReplicas, status.desiredReplcias).
// When scale-down is disabled, this allows operators to specify a custom value for HPA spec.minReplicas
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining the reasoning behind this change. So what exactly do we want as new behavior when alpha.control-plane.scaling.shoot.gardener.cloud/scale-down-disabled is set? Previously, we had HPA effectively turned off, because minReplicas == maxReplicas == 4.
Now, we allow our operators to set minReplicas on HPA themselves, still set maxReplicas as 4. Which values do we think make sense here? Given that the whole purpose is to stop scaling down I'm not sure if 1-3 are reasonable values?
You're right: We may or may not choose a different value as maxReplicas in the future, but I'm not sure how that influences our decision to make this change in behavior right now.

At the very least: We should document how this annotation works, how you can interact (modify the HPA's minReplicas, not the Deployment's replicas) and note that horizontal downscaling does happen.

// - When transitioning from HVPA to HPAAndVPA autoscaling mode, we need to preserve the kube-apiserver container resources
// to do not cause an unwanted rollout that might be breaking. Otherwise, we would scale down from the potentially
// high resource requests (set by HVPA) to the initial resource requests in HPAAndVPA mode.
if deployment != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this correctly, this would mean that we will never get rid of this piece of code, which was only introduced because that's how HVPA worked. To me, this is harder to understand and keep in mind as context for the future than clearly scoping this method to the single purpose of not resetting snowflakes.
Do we already have a dedicated test documenting the desired behavior for snowflakes? Just to make sure that future versions of us don't go in and just remove this method.

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label May 14, 2024
pkg/component/kubernetes/apiserver/hvpa.go Outdated Show resolved Hide resolved
pkg/component/kubernetes/apiserver/hvpa.go Outdated Show resolved Hide resolved
pkg/component/kubernetes/apiserver/hvpa.go Outdated Show resolved Hide resolved
if k.values.Autoscaling.ScaleDownDisabled && hpa.Spec.MinReplicas != nil {
// If scale-down is disabled and the HPA resource exists and HPA's spec.minReplicas is not nil,
// then minReplicas is max(spec.minReplicas, status.desiredReplcias).
// When scale-down is disabled, this allows operators to specify a custom value for HPA spec.minReplicas
Copy link
Contributor

@plkokanov plkokanov May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, shouldn't we also set maxReplicas to be equal to minReplicas, even with the current proposal (note, I don't mean equal to 4, just set maxReplicas to w/e the calculated value of minReplicas is)?
If I set the minReplicas in the HPA to 3, maxReplicas would still be 4. Depending on when reconciliation happens and when desiredReplicas changes there could still be scale down from 4 to 3 replicas even if the annotaiton is set:

  1. Let's say that desiredReplicas is 3, then this code sets minReplicas to 3 and maxReplicas to 4.
  2. Replicas get scaled up to 4.
  3. Then, no reconciliation happens to set minReplicas to the new number of desiredReplicas (4) for some time.
  4. Some time later (before the next reconciliation occurs), scale down happens and replicas are set back to 3.

@gardener-prow gardener-prow bot removed the lgtm Indicates that a PR is ready to be merged. label May 14, 2024
@plkokanov
Copy link
Contributor

/lgtm

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label May 14, 2024
Copy link
Contributor

gardener-prow bot commented May 14, 2024

LGTM label has been added.

Git tree hash: 1817a7bb68aa6e97d51263dda8dc28649d648988

@plkokanov
Copy link
Contributor

/test pull-gardener-e2e-kind-ha-single-zone

@ialidzhikov
Copy link
Member Author

/approve

Copy link
Contributor

gardener-prow bot commented May 15, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andrerun, ialidzhikov, voelzmo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gardener-prow gardener-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 15, 2024
@gardener-prow gardener-prow bot merged commit 6d6c06c into gardener:master May 15, 2024
18 checks passed
@ialidzhikov ialidzhikov deleted the enh/hvpa-alternative-for-apiserver branch May 15, 2024 12:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/auto-scaling Auto-scaling (CA/HPA/VPA/HVPA, predominantly control plane, but also otherwise) related cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. kind/enhancement Enhancement, improvement, extension lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants