Add the `VPAAndHPAForAPIServer` feature gate for the gardener-operator #9735

ialidzhikov · 2024-05-10T15:21:30Z

How to categorize this PR?

/area auto-scaling
/kind enhancement

What this PR does / why we need it:

Which issue(s) this PR fixes:
Part of #9562
A follow-up of #9678

Special notes for your reviewer:

This PR is based on #9678 (Introduce a new autoscaling mode (VPAAndHPA) for Shoot Kubernetes API servers) and was kept in draft state until #9678 was merged. The PR is now rebased after the merge of #9678.

Release note:

The `VPAAndHPAForAPIServer` feature gate is now also implemented for the gardener-operator. When enabled, the virtual-garden-kube-apiserver and gardener-apiserver are scaled simultaneously by VPA and HPA on the same metric (CPU and memory usage).

gardener-prow · 2024-05-10T15:21:34Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

gardener-prow · 2024-05-10T15:21:39Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from ialidzhikov. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

voelzmo · 2024-05-17T11:41:05Z

Hey @ialidzhikov, thanks for the PR! While looking at the changes, I was wondering if we're missing the removal code for HVPA, HPA and VPA objects for the cases when the autoscalingMode is changed? This seems to have been broken already for any switch between HVPA enabled or HVPA disabled, but we never saw this?
Once we merge this and the mode gets changed from HVPA to VPAAndHPA, I guess we would still have an HVPA object and the corresponding VPA and HPA objects created by the hvpa-controller, right?

ialidzhikov · 2024-05-20T12:53:42Z

> While looking at the changes, I was wondering if we're missing the removal code for HVPA, HPA and VPA objects for the cases when the autoscalingMode is changed? This seems to have been broken already for any switch between HVPA enabled or HVPA disabled, but we never saw this?
> Once we merge this and the mode gets changed from HVPA to VPAAndHPA, I guess we would still have an HVPA object and the corresponding VPA and HPA objects created by the hvpa-controller, right?

For the Kubernetes API server component (pkg/component/kubernetes/apiserver, used for the Shoot kube-apiserver and the virtual-garden-kube-apiserver): this component is NOT deployed via GRM but directly with a client. Hence, the component contains explicit client invocations that delete the objects that are no longer needed:

gardener/pkg/component/kubernetes/apiserver/hvpa.go

Lines 32 to 36 in 6d6c06c

    
    if k.values.Autoscaling.Mode != apiserver.AutoscalingModeHVPA ||
    	k.values.Autoscaling.Replicas == nil ||
    	*k.values.Autoscaling.Replicas == 0 {
    	return kubernetesutils.DeleteObject(ctx, k.client.Client(), hvpa)
    }

gardener/pkg/component/kubernetes/apiserver/verticalpodautoscaler.go

Lines 28 to 37 in 6d6c06c

    
    func (k *kubeAPIServer) reconcileVerticalPodAutoscaler(ctx context.Context, verticalPodAutoscaler *vpaautoscalingv1.VerticalPodAutoscaler, deployment *appsv1.Deployment) error {
    	switch k.values.Autoscaling.Mode {
    	case apiserver.AutoscalingModeHVPA:
    		return kubernetesutils.DeleteObject(ctx, k.client.Client(), verticalPodAutoscaler)
    	case apiserver.AutoscalingModeVPAAndHPA:
    		return k.reconcileVerticalPodAutoscalerInVPAAndHPAMode(ctx, verticalPodAutoscaler, deployment)
    	default:
    		return k.reconcileVerticalPodAutoscalerInBaselineMode(ctx, verticalPodAutoscaler, deployment)
    	}
    }

gardener/pkg/component/kubernetes/apiserver/horizontalpodautoscaler.go

Lines 38 to 50 in 6d6c06c

    
    func (k *kubeAPIServer) reconcileHorizontalPodAutoscaler(ctx context.Context, hpa *autoscalingv2.HorizontalPodAutoscaler, deployment *appsv1.Deployment) error {
    	if k.values.Autoscaling.Mode == apiserver.AutoscalingModeHVPA ||
    		k.values.Autoscaling.Replicas == nil ||
    		*k.values.Autoscaling.Replicas == 0 {
    		return kubernetesutils.DeleteObject(ctx, k.client.Client(), hpa)
    	}
    	if k.values.Autoscaling.Mode == apiserver.AutoscalingModeVPAAndHPA {
    		return k.reconcileHorizontalPodAutoscalerInVPAAndHPAMode(ctx, hpa, deployment)
    	}
    	return k.reconcileHorizontalPodAutoscalerInBaselineMode(ctx, hpa, deployment)
    }

For the gardener apiserver (pkg/component/gardener/apiserver) - this is a component deployed via GRM:

gardener/pkg/component/gardener/apiserver/apiserver.go

Lines 145 to 153 in 876f6f0

    
    runtimeResources, err := runtimeRegistry.AddAllAndSerialize(
    	g.podDisruptionBudget(),
    	g.serviceRuntime(),
    	g.horizontalPodAutoscaler(),
    	g.verticalPodAutoscaler(),
    	g.hvpa(),
    	g.deployment(secretCAETCD, secretETCDClient, secretGenericTokenKubeconfig, secretServer, secretAdmissionKubeconfigs, secretETCDEncryptionConfiguration, secretAuditWebhookKubeconfig, secretVirtualGardenAccess, configMapAuditPolicy, configMapAdmissionConfigs),
    	g.serviceMonitor(),
    )

Hence, for the gardener-apiserver component, returning nil from the verticalPodAutoscaler/horizontalPodAutoscaler/hvpa funcs is enough: GRM takes care of deleting the objects that are no longer desired.

rfranzke · 2024-05-21T07:24:11Z

/assign

rfranzke · 2024-05-21T11:56:32Z

docs/deployment/feature_gates.md

@@ -203,4 +203,4 @@ A *General Availability* (GA) feature is also referred to as a *stable* feature.
 | UseNamespacedCloudProfile       | `gardener-apiserver`              | Enables usage of `NamespacedCloudProfile`s in `Shoot`s.                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
 | ShootManagedIssuer              | `gardenlet`                       | Enables the shoot managed issuer functionality described in GEP 24.                                                                                                                                                                                                                                                                                                                                                                                                                                       |
 | VPAForETCD                      | `gardenlet`, `gardener-operator`  | Enables VPA for `etcd-main` and `etcd-events`, regardless of HVPA enablement.                                                                                                                                                                                                                                                                                                                                                                                                                             |
-| VPAAndHPAForAPIServer           | `gardenlet`                       | Enables an autoscaling mechanism for shoot kube-apiserver where it is scaled simultaneously by VPA and HPA on the same metric (CPU and memory usage). The pod-trashing cycle between VPA and HPA scaling on the same metric is avoided by configuring the HPA to scale on average usage (not on average utilization) and by picking the target average utilization values in sync with VPA's allowed maximums. The feature gate takes precedence over the `HVPA` feature gate when they are both enabled. |
+| VPAAndHPAForAPIServer           | `gardenlet`, `gardener-operator`  | Enables an autoscaling mechanism for shoot kube-apiserver where it is scaled simultaneously by VPA and HPA on the same metric (CPU and memory usage). The pod-trashing cycle between VPA and HPA scaling on the same metric is avoided by configuring the HPA to scale on average usage (not on average utilization) and by picking the target average utilization values in sync with VPA's allowed maximums. The feature gate takes precedence over the `HVPA` feature gate when they are both enabled. |


Suggested change

-| VPAAndHPAForAPIServer | `gardenlet`, `gardener-operator` | Enables an autoscaling mechanism for shoot kube-apiserver where it is scaled simultaneously by VPA and HPA on the same metric (CPU and memory usage). The pod-trashing cycle between VPA and HPA scaling on the same metric is avoided by configuring the HPA to scale on average usage (not on average utilization) and by picking the target average utilization values in sync with VPA's allowed maximums. The feature gate takes precedence over the `HVPA` feature gate when they are both enabled. |
+| VPAAndHPAForAPIServer | `gardenlet`, `gardener-operator` | Enables an autoscaling mechanism for `kube-apiserver` of shoot or virtual garden clusters, and the `gardener-apiserver`. They are scaled simultaneously by VPA and HPA on the same metric (CPU and memory usage). The pod-trashing cycle between VPA and HPA scaling on the same metric is avoided by configuring the HPA to scale on average usage (not on average utilization) and by picking the target average utilization values in sync with VPA's allowed maximums. The feature gate takes precedence over the `HVPA` feature gate when they are both enabled. |
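The precedence rule in the last sentence of the description can be sketched as a small mode-selection function. This is an illustration only: `autoscalingMode` and the plain `map[string]bool` of gates are hypothetical, and the gate names are taken from the table text rather than from the actual gardener-operator code.

```go
package main

import "fmt"

// AutoscalingMode mirrors the mode names used in this thread.
type AutoscalingMode string

const (
	AutoscalingModeBaseline  AutoscalingMode = "Baseline"
	AutoscalingModeHVPA      AutoscalingMode = "HVPA"
	AutoscalingModeVPAAndHPA AutoscalingMode = "VPAAndHPA"
)

// autoscalingMode is a hypothetical helper that picks the mode from a set of
// enabled feature gates. VPAAndHPAForAPIServer is checked first, so it wins
// when both gates are enabled.
func autoscalingMode(gates map[string]bool) AutoscalingMode {
	switch {
	case gates["VPAAndHPAForAPIServer"]:
		return AutoscalingModeVPAAndHPA
	case gates["HVPA"]:
		return AutoscalingModeHVPA
	default:
		return AutoscalingModeBaseline
	}
}

func main() {
	fmt.Println(autoscalingMode(map[string]bool{"HVPA": true, "VPAAndHPAForAPIServer": true}))
	fmt.Println(autoscalingMode(map[string]bool{"HVPA": true}))
	fmt.Println(autoscalingMode(nil))
}
```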

rfranzke · 2024-05-21T11:57:26Z

pkg/component/gardener/apiserver/hpa.go

+	return g.horizontalPodAutoscalerInVPAAndHPAMode()
+}
+
+func (g *gardenerAPIServer) horizontalPodAutoscalerInVPAAndHPAMode() *autoscalingv2.HorizontalPodAutoscaler {


Suggested change

return g.horizontalPodAutoscalerInVPAAndHPAMode()

}

func (g *gardenerAPIServer) horizontalPodAutoscalerInVPAAndHPAMode() *autoscalingv2.HorizontalPodAutoscaler {

rfranzke · 2024-05-21T11:57:45Z

pkg/component/gardener/apiserver/hpa.go

+	// The chosen value is 6 CPU: 1 CPU less than the VPA's maxAllowed 7 CPU in VPAAndHPA mode to have a headroom for the horizontal scaling.
+	hpaTargetAverageValueCPU := resource.MustParse("6")
+	// The chosen value is 24G: 4G less than the VPA's maxAllowed 28G in VPAAndHPA mode to have a headroom for the horizontal scaling.
+	hpaTargetAverageValueMemory := resource.MustParse("24G")


You could use ptr.To() instead of defining these variables here.
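A rough sketch of what this suggestion could look like follows. To stay dependency-free it defines a local generic `To` with the same shape as `ptr.To` from `k8s.io/utils/ptr`, and uses simplified, hypothetical `Quantity`, `MetricTarget`, and `mustParse` stand-ins instead of the real `resource.Quantity`, `autoscalingv2.MetricTarget`, and `resource.MustParse`. The values 6 and 24G are the ones from the diff above.

```go
package main

import "fmt"

// To returns a pointer to a copy of v. It is a local stand-in with the same
// shape as ptr.To from k8s.io/utils/ptr.
func To[T any](v T) *T { return &v }

// Quantity and MetricTarget are simplified, hypothetical stand-ins for the
// Kubernetes API types; they only carry what this sketch needs.
type Quantity string

type MetricTarget struct {
	Type         string
	AverageValue *Quantity
}

// mustParse stands in for resource.MustParse; it does no validation here.
func mustParse(s string) Quantity { return Quantity(s) }

// cpuTarget inlines the pointer creation instead of declaring a variable first.
func cpuTarget() MetricTarget {
	return MetricTarget{
		Type: "AverageValue",
		// 6 CPU: 1 CPU less than the VPA's maxAllowed 7 CPU, as headroom
		// for the horizontal scaling.
		AverageValue: To(mustParse("6")),
	}
}

// memoryTarget follows the same pattern for memory.
func memoryTarget() MetricTarget {
	return MetricTarget{
		Type: "AverageValue",
		// 24G: 4G less than the VPA's maxAllowed 28G, as headroom for the
		// horizontal scaling.
		AverageValue: To(mustParse("24G")),
	}
}

func main() {
	fmt.Println(*cpuTarget().AverageValue, *memoryTarget().AverageValue)
}
```

The trade-off is mostly stylistic: inlining removes two local variables, while the explanatory comments about the headroom move next to the fields they describe.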

gardener-prow bot added the cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. label May 10, 2024

gardener-prow bot requested review from rfranzke and ScheererJ May 10, 2024 15:21

gardener-prow bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 10, 2024

ialidzhikov force-pushed the enh/vpaandhpa-for-operator branch from a4651e6 to 25deeb4 Compare May 13, 2024 12:45

ialidzhikov added 5 commits May 15, 2024 15:05

Add the VPAAndHPAForAPIServer feature gate for the gardener-operator

307ec0b

Enable the VPAAndHPA autoscaling mode for the virtual-garden-kube-apiserver

22a40a8

Enable the VPAAndHPA autoscaling mode for the gardener-apiserver

876f6f0

Add docs for virtual-garden-apiserver and gardener-apiserver autoscaling

f032b09

Address review comments from vlerenc

656589f

ialidzhikov force-pushed the enh/vpaandhpa-for-operator branch from 8391e51 to 656589f Compare May 15, 2024 12:13

gardener-prow bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 15, 2024

ialidzhikov marked this pull request as ready for review May 15, 2024 13:16

gardener-prow bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 15, 2024

gardener-prow bot requested review from ary1992 and oliver-goetz May 15, 2024 13:16

gardener-prow bot assigned rfranzke May 21, 2024

rfranzke reviewed May 21, 2024
