[GCP Provider] Always clean-up on breaking errors when deploying K3s cluster on GCP #542
Let's revisit once the workflow is actually known to work.
Can we please open this again? We have the workflow working now and are also using ...

```
clm-autoscaling-up-7hhdf-kl958 knl-clm [DEBUG]: The generated new nodes YAML is: test-test-worker-xx:
clm-autoscaling-up-7hhdf-kl958 knl-clm nets:
clm-autoscaling-up-7hhdf-kl958 knl-clm - name: knl-test
clm-autoscaling-up-7hhdf-kl958 knl-clm ip: 192.168.x.x
clm-autoscaling-up-7hhdf-kl958 knl-clm public: false
clm-autoscaling-up-7hhdf-kl958 knl-clm cmds:
clm-autoscaling-up-7hhdf-kl958 knl-clm - bash /root/worker.sh --diskCount 2 --provider gcp
clm-autoscaling-up-7hhdf-kl958 knl-clm files:
clm-autoscaling-up-7hhdf-kl958 knl-clm - path: /root/worker.sh
clm-autoscaling-up-7hhdf-kl958 knl-clm currentdir: True
clm-autoscaling-up-7hhdf-kl958 knl-clm origin: ~/iac-conductor/kubernetes/deploy/bootstrapping/worker.sh
clm-autoscaling-up-7hhdf-kl958 knl-clm - path: /etc/udev/longhorn-data-disk-add.sh
clm-autoscaling-up-7hhdf-kl958 knl-clm currentdir: True
clm-autoscaling-up-7hhdf-kl958 knl-clm origin: ~/iac-conductor/kubernetes/deploy/cluster-configuration/storage/longhorn-data-disk-add.sh
clm-autoscaling-up-7hhdf-kl958 knl-clm - path: /etc/logrotate.d/rsyslog
clm-autoscaling-up-7hhdf-kl958 knl-clm currentdir: True
clm-autoscaling-up-7hhdf-kl958 knl-clm origin: ~/iac-conductor/kubernetes/deploy/bootstrapping/rsyslog
clm-autoscaling-up-7hhdf-kl958 knl-clm tags:
clm-autoscaling-up-7hhdf-kl958 knl-clm - ssh-enabled-server
clm-autoscaling-up-7hhdf-kl958 knl-clm - knl-test-node
clm-autoscaling-up-7hhdf-kl958 knl-clm [DEBUG]: The determined new worker count is going to be: 4.
clm-autoscaling-up-7hhdf-kl958 knl-clm Exception in thread Thread-1 (threaded_create_vm):
clm-autoscaling-up-7hhdf-kl958 knl-clm Traceback (most recent call last):
clm-autoscaling-up-7hhdf-kl958 knl-clm File "/usr/local/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
clm-autoscaling-up-7hhdf-kl958 knl-clm self.run()
clm-autoscaling-up-7hhdf-kl958 knl-clm File "/usr/local/lib/python3.11/threading.py", line 975, in run
clm-autoscaling-up-7hhdf-kl958 knl-clm self._target(*self._args, **self._kwargs)
clm-autoscaling-up-7hhdf-kl958 knl-clm File "/usr/local/lib/python3.11/site-packages/kvirt/config.py", line 3322, in threaded_create_vm
clm-autoscaling-up-7hhdf-kl958 knl-clm result = self.create_vm(name, profilename, overrides=currentoverrides, customprofile=profile, k=z,
clm-autoscaling-up-7hhdf-kl958 knl-clm ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
clm-autoscaling-up-7hhdf-kl958 knl-clm File "/usr/local/lib/python3.11/site-packages/kvirt/config.py", line 956, in create_vm
clm-autoscaling-up-7hhdf-kl958 knl-clm result = k.create(name=name, virttype=virttype, plan=plan, profile=profilename, flavor=flavor,
clm-autoscaling-up-7hhdf-kl958 knl-clm ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
clm-autoscaling-up-7hhdf-kl958 knl-clm File "/usr/local/lib/python3.11/site-packages/kvirt/providers/gcp/__init__.py", line 285, in create
clm-autoscaling-up-7hhdf-kl958 knl-clm conn.disks().insert(zone=zone, project=project, body=info).execute()
clm-autoscaling-up-7hhdf-kl958 knl-clm File "/usr/local/lib/python3.11/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
clm-autoscaling-up-7hhdf-kl958 knl-clm return wrapped(*args, **kwargs)
clm-autoscaling-up-7hhdf-kl958 knl-clm ^^^^^^^^^^^^^^^^^^^^^^^^
clm-autoscaling-up-7hhdf-kl958 knl-clm File "/usr/local/lib/python3.11/site-packages/googleapiclient/http.py", line 938, in execute
clm-autoscaling-up-7hhdf-kl958 knl-clm raise HttpError(resp, content, uri=self.uri)
clm-autoscaling-up-7hhdf-kl958 knl-clm googleapiclient.errors.HttpError: <HttpError 409 when requesting https://compute.googleapis.com/compute/v1/projects/kubernemlig-test/zones/europe ....
```

This could happen when remnants of a previous deploy (e.g. disks) still exist. So, same suggestion and wish as before: it would be great if these leftovers were cleaned up automatically. Thank you.
Further info: we usually see this when a VM previously existed and there are therefore dangling disks on the underlying provider, in this case GCE on GCP, which results in the node never coming up. To counter this situation we'll have to implement some logic in our auto-scaling code that checks whether there are any remnants of the worker that is to be scaled in (e.g. a VM or disks) on the underlying HCI.
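As an illustration only (not the author's code), here is a minimal sketch of what such a pre-check could look like when it goes straight at the GCE API, i.e. the same googleapiclient surface the traceback above passes through. The project, zone and worker name are hypothetical placeholders:

```python
# Hypothetical pre-scale-up check: does GCE still hold remnants (an instance
# or disks) of the worker we are about to (re)create?
# project, zone and worker name are placeholders, not taken from the issue.
from googleapiclient import discovery


def worker_remnants(project: str, zone: str, worker: str) -> list[str]:
    compute = discovery.build("compute", "v1")  # uses application default credentials
    leftovers = []
    # Disks whose name starts with the worker name (boot disk and data disks).
    disks = compute.disks().list(project=project, zone=zone).execute()
    leftovers += [d["name"] for d in disks.get("items", []) if d["name"].startswith(worker)]
    # The instance itself, in case it was never deleted.
    instances = compute.instances().list(project=project, zone=zone).execute()
    leftovers += [i["name"] for i in instances.get("items", []) if i["name"] == worker]
    return leftovers


# Example: only scale the worker up if nothing is left behind.
# if worker_remnants("kubernemlig-test", "europe-west1-b", "test-test-worker-xx"):
#     ...  # clean up the leftovers (or skip this worker) before creating the VM
```

Listing everything and filtering client-side keeps the sketch independent of the GCE filter syntax; it is only meant to show the shape of the check, not a finished implementation.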
Hard to counter this when ... I'll try parsing the plain output from `kcli list disks` with grep and so forth. Further, it would be great if there was a `kcli get disk DISK_NAME`, i.e. for one specific disk. Then I could check whether a given disk out of a set of disks previously used by a VM still exists on the underlying HCI, and get on with it. Thoughts @karmab?
Okay, filtering with grep was actually quite easy. So: ... Thanks.
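The snippet itself isn't preserved above; as a rough illustration of the approach described (grepping the plain output of `kcli list disks` for a specific disk name), here is a small sketch. The disk name is a hypothetical placeholder, and it assumes the plain-text listing contains the disk name somewhere on a line:

```python
# Rough equivalent of `kcli list disks | grep DISK_NAME`, callable from the
# auto-scaling code. Assumes the plain-text listing contains the disk name;
# the disk name below is a hypothetical placeholder.
import subprocess


def disk_exists(disk_name: str) -> bool:
    out = subprocess.run(["kcli", "list", "disks"],
                         capture_output=True, text=True, check=True)
    return any(disk_name in line for line in out.stdout.splitlines())


# if disk_exists("test-test-worker-xx-disk1"):
#     ...  # remnant of a previous deploy; clean it up before scaling up
```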
Sorry I didn't get back to this before...
We worked around this by: ...

However, I still think it would be great if this clean-up happened automatically. Deleting the cluster altogether is pretty tough when the feature having issues is our auto-scaling feature, so there's nothing wrong with the state of the cluster as such; it's more that there are leftovers on the underlying HCI.
I'm using kcli (version: 99.0, commit: d4befb7, 2023/05/15), and if I bump into a breaking error, e.g.: ...

This causes the deploy to stop/break because a previous deploy had a breaking error; apparently, the remnants of a broken deploy are not all automatically cleaned up by kcli. It would be lovely if that clean-up happened.
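For a sense of what the requested clean-up amounts to on GCP, here is a minimal sketch that removes a single dangling disk through the same compute API the provider uses. This is not kcli's own logic, and the project, zone and disk name are hypothetical placeholders:

```python
# Hypothetical manual clean-up of one leftover disk from a broken deploy.
# Not part of kcli; project, zone and disk name are placeholders.
from googleapiclient import discovery
from googleapiclient.errors import HttpError


def delete_dangling_disk(project: str, zone: str, disk: str) -> None:
    compute = discovery.build("compute", "v1")  # uses application default credentials
    try:
        compute.disks().delete(project=project, zone=zone, disk=disk).execute()
        print(f"requested deletion of leftover disk {disk}")
    except HttpError as err:
        if err.resp.status == 404:
            print(f"disk {disk} is already gone")  # nothing to clean up
        else:
            raise


# delete_dangling_disk("kubernemlig-test", "europe-west1-b", "test-test-worker-xx-disk1")
```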