Gardener fails to create shoots with > ~80 worker pools #9545

Open
7 of 8 tasks
hown3d opened this issue Apr 5, 2024 · 3 comments
Labels: area/ipcei (IPCEI, Important Project of Common European Interest), area/scalability (Scalability related), kind/bug (Bug), kind/epic (Large multi-story topic)

Comments

hown3d commented Apr 5, 2024

How to categorize this issue?

/area scalability
/kind bug

What happened:
When attempting to create a shoot with more than approximately 80 worker pools, the shoot gets stuck in the Create Processing state and reports the following error: Flow "Shoot cluster reconciliation" encountered task errors: [task "Configuring shoot worker pools" failed: retry failed with context deadline exceeded, last error: etcdserver: request is too large] Operation will be retried.

Upon investigation, we found that the Worker resource created for the shoot grows too large because each WorkerPool in it carries, inline, the userData required to bootstrap its machines. With enough pools, the Worker resource exceeds etcd's max-request-bytes limit of 1.5 MiB, so etcd rejects the write.
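A rough back-of-the-envelope check of that limit, as a small Go sketch. It assumes the Worker is persisted as JSON (as custom resources are), where a []byte field such as userData is base64-encoded and therefore takes about 4/3 of its raw size. The 14 KiB of raw userData per pool is a hypothetical figure chosen for illustration, not a measurement from this shoot, and every other field of the Worker is ignored, so the result is only a lower bound.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// etcdMaxRequestBytes is etcd's default --max-request-bytes (1.5 MiB).
const etcdMaxRequestBytes = 1572864

// inlineUserDataBytes returns the bytes that inline userData alone adds to
// the serialized Worker: encoding/json writes []byte as padded base64.
func inlineUserDataBytes(pools, rawSize int) int {
	return pools * base64.StdEncoding.EncodedLen(rawSize)
}

// firstPoolCountOverLimit returns the smallest pool count for which inline
// userData alone exceeds the etcd limit, or -1 if rawSize is not positive.
func firstPoolCountOverLimit(rawSize int) int {
	perPool := base64.StdEncoding.EncodedLen(rawSize)
	if perPool <= 0 {
		return -1
	}
	return etcdMaxRequestBytes/perPool + 1
}

func main() {
	const rawSize = 14 * 1024 // hypothetical raw userData size per pool
	fmt.Printf("per pool: %d bytes\n", inlineUserDataBytes(1, rawSize))
	fmt.Printf("limit exceeded from %d pools\n", firstPoolCountOverLimit(rawSize))
}
```

With the assumed 14 KiB per pool, each pool contributes 19,116 bytes and the inline userData alone crosses the 1,572,864-byte default from 83 pools onward, which is in the same range as the ~80 pools reported above; the other fields of a real Worker push the threshold lower.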

What you expected to happen:
The shoot should be successfully created.

How to reproduce it (as minimally and precisely as possible):
Create a shoot with a large number (80 to 90 or more) of worker pools.

Proposed Solution:
Replace the inline userData field in the WorkerPool type with a secretReference. This would align the Worker with the OperatingSystemConfig resource, which already stores its cloud_config in a secret. Refer to the OperatingSystemConfig documentation for more details.

Tasks:

Environment:

  • Gardener version: v1.82.3
  • Kubernetes version (use kubectl version): v1.26.14
  • Cloud provider or hardware configuration:
    • openstack-provider v1.39.2
    • coreos extension v1.21
  • Others:
    • etcd-druid v0.20.0
gardener-prow bot added the area/scalability and kind/bug labels on Apr 5, 2024

rfranzke commented May 8, 2024

We checked a Worker with more than 80 pools and found that referencing the userData instead of inlining it would reduce the Worker's size by ~90%.
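The effect can be illustrated with a Go sketch that marshals the same pool list twice, once with inline userData and once with a secret reference. The struct shapes, the 90 pools, and the 14 KiB of raw userData per pool are assumptions for illustration. Because the sketch drops every other per-pool field (machine type, volumes, labels, and so on), it reports a larger relative reduction than the ~90% measured on a real Worker.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// etcdMaxRequestBytes is etcd's default --max-request-bytes (1.5 MiB).
const etcdMaxRequestBytes = 1572864

// inlinePool keeps the bootstrap data inside the pool (hypothetical shape).
type inlinePool struct {
	Name     string `json:"name"`
	UserData []byte `json:"userData"` // encoding/json writes []byte as base64
}

// secretRef and referencedPool model the proposed reference (hypothetical shape).
type secretRef struct {
	Name string `json:"name"`
	Key  string `json:"key"`
}

type referencedPool struct {
	Name              string    `json:"name"`
	UserDataSecretRef secretRef `json:"userDataSecretRef"`
}

// buildPools creates n pools in both shapes with rawSize bytes of userData each.
func buildPools(n, rawSize int) ([]inlinePool, []referencedPool) {
	inline := make([]inlinePool, n)
	referenced := make([]referencedPool, n)
	for i := 0; i < n; i++ {
		name := fmt.Sprintf("pool-%d", i)
		inline[i] = inlinePool{Name: name, UserData: make([]byte, rawSize)}
		referenced[i] = referencedPool{
			Name:              name,
			UserDataSecretRef: secretRef{Name: "userdata-" + name, Key: "cloud_config"},
		}
	}
	return inline, referenced
}

// marshaledSize returns the length of v's JSON encoding.
func marshaledSize(v any) int {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return len(b)
}

func main() {
	inline, referenced := buildPools(90, 14*1024) // hypothetical sizes
	in, ref := marshaledSize(inline), marshaledSize(referenced)
	fmt.Printf("inline: %d bytes (over etcd limit: %v)\n", in, in > etcdMaxRequestBytes)
	fmt.Printf("referenced: %d bytes (%.1f%% smaller)\n", ref, 100*(1-float64(ref)/float64(in)))
}
```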

rfranzke commented May 8, 2024

/assign

rfranzke commented

For now, all related PRs have been opened. However, we have to wait until gardener/gardener@v1.100 has been released (end of July 2024) before we can finally remove the inlined, now deprecated .spec.userData field from the extensions.gardener.cloud/v1alpha1.Worker API. This gives extensions enough time to adapt to the API change.
Until then, this issue cannot be closed.
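During the deprecation window described above, an extension may receive Workers written either way. One way to handle that, sketched in Go with hypothetical simplified types (not the actual Gardener API or its extension library), is to prefer the secret reference and fall back to the deprecated inline field only when no reference is set.

```go
package main

import (
	"errors"
	"fmt"
)

// SecretKeySelector mimics a Kubernetes secret key reference (simplified).
type SecretKeySelector struct {
	Name string
	Key  string
}

// WorkerPool is a hypothetical transitional type carrying both fields.
type WorkerPool struct {
	Name string
	// Deprecated: inline bootstrap data, slated for removal.
	UserData []byte
	// UserDataSecretRef points at a Secret with the bootstrap data; nil if unset.
	UserDataSecretRef *SecretKeySelector
}

// userDataFor prefers the reference and falls back to the inline field.
func userDataFor(pool WorkerPool, secrets map[string]map[string][]byte) ([]byte, error) {
	if ref := pool.UserDataSecretRef; ref != nil {
		if value, ok := secrets[ref.Name][ref.Key]; ok {
			return value, nil
		}
		return nil, fmt.Errorf("pool %q: key %q not found in secret %q", pool.Name, ref.Key, ref.Name)
	}
	if len(pool.UserData) > 0 {
		return pool.UserData, nil
	}
	return nil, errors.New("pool " + pool.Name + " has no userData")
}

func main() {
	secrets := map[string]map[string][]byte{
		"pool-a-userdata": {"cloud_config": []byte("#cloud-config\n")},
	}
	pools := []WorkerPool{
		{Name: "pool-a", UserDataSecretRef: &SecretKeySelector{Name: "pool-a-userdata", Key: "cloud_config"}},
		{Name: "pool-b", UserData: []byte("#cloud-config\n")}, // as an older producer would write it
	}
	for _, p := range pools {
		data, err := userDataFor(p, secrets)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s: %d bytes\n", p.Name, len(data))
	}
}
```

If a reference is set but cannot be resolved, the sketch returns an error rather than silently using possibly stale inline data; that choice is an assumption, not something specified in this thread.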

rfranzke added the kind/epic and area/ipcei labels on May 23, 2024