Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change the default capacity of volumes for Etcd from 80Gi to 25Gi. #933

Merged
merged 4 commits into from
May 28, 2024

Conversation

renormalize
Copy link
Member

How to categorize this PR?

/area cost
/area storage
/kind technical-debt
/platform aws

What this PR does / why we need it:

This PR changes the capacity for volumes for Etcds from 80Gi to 25Gi.

  • Change the default capacity from 80Gi to 25Gi in charts/gardener-extension-provider-aws/values.yaml.

  • Change the default capacity from 80Gi to 25Gi in example/00-componentconfig.yaml.

  • Change tests to check if *etcd.Spec.StorageCapacity equals 25Gi instead of 80Gi.

  • Run make generate to generate the new chart in example/controller-registration.yaml.

Volumes of size 80Gi are unnecessary since moving from gp2 to gp3. See #932 for more.

Which issue(s) this PR fixes:
Fixes #932

Special notes for your reviewer:

/invite @shreyas-s-rao

Release note:

New `Etcd` `gp3` volumes are now created with `25Gi` capacity instead of `80Gi` to save on storage costs.

@gardener-robot gardener-robot added area/cost Cost related area/storage Storage related kind/technical-debt Something that is only solved on the surface, but requires more (re)work to be done properly needs/review Needs review platform/aws Amazon web services platform/infrastructure size/xs Size of pull request is tiny (see gardener-robot robot/bots/size.py) labels Apr 24, 2024
@gardener-robot-ci-2 gardener-robot-ci-2 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Apr 24, 2024
@gardener-robot-ci-1 gardener-robot-ci-1 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Apr 24, 2024
@renormalize renormalize marked this pull request as ready for review April 25, 2024 04:54
@renormalize renormalize requested review from a team as code owners April 25, 2024 04:54
Copy link
Contributor

@shreyas-s-rao shreyas-s-rao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@gardener-robot gardener-robot added reviewed/lgtm Has approval for merging and removed needs/review Needs review labels Apr 25, 2024
@gardener-robot-ci-2 gardener-robot-ci-2 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Apr 25, 2024
@shreyas-s-rao
Copy link
Contributor

/hold
We will need to test how druid would react to a change in Etcd Spec.StorageCapacity, since currently it does not automatically change the PVC template for the etcd statefulset (forbidden change).

@gardener-robot gardener-robot added the reviewed/do-not-merge Has no approval for merging as it may break things, be of poor quality or have (ext.) dependencies label Apr 25, 2024
@renormalize renormalize marked this pull request as draft April 28, 2024 14:15
@gardener-robot gardener-robot added size/s Size of pull request is small (see gardener-robot robot/bots/size.py) and removed size/xs Size of pull request is tiny (see gardener-robot robot/bots/size.py) labels Apr 28, 2024
@gardener-robot-ci-1 gardener-robot-ci-1 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Apr 28, 2024
@renormalize
Copy link
Member Author

@shreyas-s-rao
As discussed, I've added checks to see if the older Etcd has an 80Gi volume. If it does, resize to the new default 25Gi is not performed. For any other value, regular reconciliation occurs. Please let me know if you have any suggestions.

@shreyas-s-rao
Copy link
Contributor

@renormalize thanks for making the change. I would suggest that you simply check whether the oldEtcd is nil (ie, it's a new etcd), or if the oldEtcd.Spec.StorageCapacity is not set, then set it to 25Gi. Or else, don't mutate it. In other words, we want to set 25Gi only for new etcds, since right now druid does not allow changing storageCapacity for existing etcds yet.

@gardener-robot-ci-1 gardener-robot-ci-1 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label May 6, 2024
@gardener-robot-ci-2 gardener-robot-ci-2 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label May 6, 2024
@gardener-robot-ci-1 gardener-robot-ci-1 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label May 13, 2024
@gardener-robot-ci-1 gardener-robot-ci-1 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label May 13, 2024
Copy link
Contributor

@shreyas-s-rao shreyas-s-rao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@renormalize thanks for making the changes. Overall looks good. Just a few changes required in the tests. PTAL.

pkg/webhook/controlplaneexposure/ensurer_test.go Outdated Show resolved Hide resolved
pkg/webhook/controlplaneexposure/ensurer_test.go Outdated Show resolved Hide resolved
@gardener-robot gardener-robot added size/m Size of pull request is medium (see gardener-robot robot/bots/size.py) and removed size/s Size of pull request is small (see gardener-robot robot/bots/size.py) labels May 13, 2024
@gardener-robot-ci-2 gardener-robot-ci-2 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels May 13, 2024
Copy link
Contributor

@shreyas-s-rao shreyas-s-rao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@renormalize thanks for making the requested changes.

/lgtm

@renormalize renormalize marked this pull request as ready for review May 17, 2024 11:51
@renormalize
Copy link
Member Author

I've tested the changes on an extension-provider setup through @shreyas-s-rao's guidance.

I've done the following to test the changes:

  • Spawn the setup with gardener/gardener on master, with the latest release of gardener-extension-provider-aws.
  • Create 2 shoots, and hibernate one of them. Etcd volumes created with size 80Gi.
  • Change the chart and image of the controllerdeployment of provider-aws to chart generated in this PR, and the image that is built from this PR.
  • Check the volume sizes of the Etcds for the two spawned shoots. They did not change.
  • Create a new shoot. Etcd volume is created with size 25Gi.
  • Wake the hibernated shoot. Volume size did not change.
  • Reconcile all three shoots. Volume size did not change.
  • Change the capacity in the Etcd CR, to some value other than the defined value.
    • Old Etcds remain at 80Gi when changed to some other value.
    • New Etcds remain at 25Gi when changed to some other value.

Thus:

  • We've validated that older Etcds of running shoots do not get resized due to the changes, size remains at 80Gi.
  • We've validated that hibernated shoots which are woken up after the changes from this PR are deployed behave as expected, just like shoots which are already running with 80Gi.
  • New shoots that are created get deployed with volumes of size 25Gi.

/ping @gardener/gardener-extension-provider-aws-maintainers

@gardener-robot
Copy link

@gardener/gardener-extension-provider-aws-maintainers ℹ️ please take some time to help renormalize or redirect to someone else if you can't.

@renormalize
Copy link
Member Author

/ping @gardener/gardener-extension-provider-aws-maintainers Could you take a look, since this PR directly affects storage costs?

@gardener-robot
Copy link

@gardener/gardener-extension-provider-aws-maintainers ℹ️ please take some time to help renormalize or redirect to someone else if you can't.

* Change the default capacity from `80Gi` to `25Gi` in `charts/gardener-extension-provider-aws/values.yaml`.

* Change the default capacity from `80Gi` to `25Gi` in `example/00-componentconfig.yaml`.

* Change tests to check if `*etcd.Spec.StorageCapacity` equals `25Gi` instead of `80Gi`.

* Run `make generate` to generate the new chart in `example/controller-registration.yaml`.
…ed on reconcile.

* Add logic to `EnsureETCD()` to check if the old Etcd has a `80Gi` volume.
  If it does, the volumes are not resized. If it does not, regular
  reconiliation continues and whatever capacity specified by the ensurer is set.

* Adapted tests to handle this case.
* Test spec names in `pkg/webhook/controlplaneexposure/ensurer_test.go`
  changed according to the suggestions in the pull request.

* Introduce `checkOldETCDMain()` and `checkNewETCDMain()` in
  `pkg/webhook/controlplaneexposure/ensurer_test.go`.

* Enhance `checkETCDEvents()` in
  `pkg/webhook/controlplaneexposure/ensurer_test.go` by adding a
  storage capacity check.
@gardener-robot-ci-1 gardener-robot-ci-1 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels May 28, 2024
Copy link
Contributor

@kon-angelo kon-angelo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@kon-angelo kon-angelo merged commit b985c85 into gardener:master May 28, 2024
9 checks passed
@gardener-robot gardener-robot added the status/closed Issue is closed (either delivered or triaged) label May 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cost Cost related area/storage Storage related kind/technical-debt Something that is only solved on the surface, but requires more (re)work to be done properly needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) platform/aws Amazon web services platform/infrastructure reviewed/do-not-merge Has no approval for merging as it may break things, be of poor quality or have (ext.) dependencies reviewed/lgtm Has approval for merging size/m Size of pull request is medium (see gardener-robot robot/bots/size.py) status/closed Issue is closed (either delivered or triaged)
Projects
None yet
6 participants