Bug: The dsChecksum wasn't updated on the autoscaling request #1340

Open
JKBGIT1 opened this issue Apr 18, 2024 · 0 comments
Labels
bug Something isn't working groomed Task that everybody agrees to pass the gatekeeper

Comments


JKBGIT1 commented Apr 18, 2024

Current Behaviour

On the autoscaling request, Claudie updated only the desiredState and didn't change the dsChecksum. As a result, the autoscaler didn't work (see).

According to the logs, the cluster-autoscaler failed to retrieve the resource lock kube-system/cluster-autoscaler and then lost the master. The same error occurred in #1065 (comment).

$ kubectl logs -n claudie autoscaler-wox01-cluster-qy5w5zl-57f74c7dd5-v8gbd -c cluster-autoscaler -p
...
I0417 18:08:17.383459       1 static_autoscaler.go:673] Decreasing size of compute01-ccx23-auto-fy7ww3o, expected=7 current=6 delta=-1
I0417 18:08:17.383801       1 static_autoscaler.go:426] Some node group target size was fixed, skipping the iteration
I0417 18:08:27.485122       1 static_autoscaler.go:673] Decreasing size of compute01-ccx23-auto-fy7ww3o, expected=7 current=6 delta=-1
I0417 18:08:27.485366       1 static_autoscaler.go:426] Some node group target size was fixed, skipping the iteration
I0417 18:08:28.291313       1 node_instances_cache.go:156] Start refreshing cloud provider node instances cache
I0417 18:08:28.292135       1 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 787.249µs
E0417 18:08:33.691966       1 leaderelection.go:330] error retrieving resource lock kube-system/cluster-autoscaler: Get "https://loadbalancer.worldofpotter.eu:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cluster-autoscaler": context deadline exceeded
I0417 18:08:33.695145       1 leaderelection.go:283] failed to renew lease kube-system/cluster-autoscaler: timed out waiting for the condition
F0417 18:08:40.377907       1 main.go:578] lost master

Besides that, there was an error in the builder from 8 days ago. The ansibler produced the error due to a timeout on static nodes while installing the VPN. However, the InputManifest stayed in the DONE state the whole time... (see #1339)

One more thing: the cluster-autoscaler reports 6 nodes with an expected count of 7. On the other hand, the InputManifest record in Mongo has 7 autoscaled nodes in the currentState and 6 autoscaled nodes in the desiredState.

Expected Behaviour

Claudie should update the value of the dsChecksum when it updates the desiredState.

Steps To Reproduce

I don't know.

Anything else to note

The same error in the cluster-autoscaler #1065 (comment)

EDIT: there is a workaround for this error when it appears.

@JKBGIT1 JKBGIT1 added the bug Something isn't working label Apr 18, 2024
@Despire Despire added the groomed Task that everybody agrees to pass the gatekeeper label Apr 19, 2024