
Active-Active Zookeeper Operator Deployment #245

Open
hikhvar opened this issue Oct 19, 2021 · 4 comments
hikhvar (Contributor) commented Oct 19, 2021

Hello,
is it possible to run two instances of the zookeeper operator? I didn't find an option to enable any form of leader election. Many operators use leader election to allow deploying multiple instances and thereby mitigate node failures in the k8s cluster.
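For reference, the pattern I have in mind is the Lease-based leader election other operators ship with; a minimal sketch using client-go (Go, where the pattern is readily available; the lock name, namespace, and timings below are just placeholders) could look like this:

```go
// Minimal sketch of Lease-based leader election with client-go.
// Each replica runs this; only the current lease holder reconciles,
// the other replica stands by until the lease expires.
package main

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/klog/v2"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		klog.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Each replica identifies itself, e.g. by pod hostname.
	id, _ := os.Hostname()
	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Name:      "zookeeper-operator-lock", // placeholder lock name
			Namespace: "default",                 // placeholder namespace
		},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: id},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true,
		LeaseDuration:   15 * time.Second, // how long a lost lease blocks takeover
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// start the reconcile loop here
			},
			OnStoppedLeading: func() {
				klog.Info("lost leadership, shutting down")
			},
		},
	})
}
```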

nightkr (Member) commented Oct 20, 2021

Not currently (this is also blocked on an upstream issue, kube-rs/kube#485).

That said, the managed ZooKeeper should still be available, even if the operator managing it is down.

hikhvar (Contributor, Author) commented Oct 21, 2021

That is true. I think being able to deploy the zookeeper operator in an active-active fashion helps with these two design goals:

  • I want to recover faster in case of a single node failure. At the moment the pod is only rescheduled after the default 5 minutes. In a former project I observed this failure: the node became unready but did not disappear from the cluster. The pods were not deleted by the kubelet; they stayed in state Terminating, since the node was knocked out by a kernel error. As a result, the replicaset controller didn't create a new pod for our single-instance horizontal pod autoscaler (HPA). This effectively disabled horizontal pod autoscaling. If we had run the HPA active-active, the second, non-stuck instance would have become the active one.
  • I want to be sure that only a single operator is in control in case of an administrative error that creates two instances (e.g. scaling the operator deployment by accident).

I admit, both are edge cases. But from an operational safety perspective they are worth addressing.

nightkr (Member) commented Oct 21, 2021

Sadly, I'm not sure this would do much to help with goal 1, since you'd just end up stuck waiting for the lease to expire, rather than the pod being deleted.

Safe forced lease-takeover would require fencing, either at the networking layer or at the Kubernetes API layer. Neither of those is available out of the box in Kubernetes clusters.

hikhvar (Contributor, Author) commented Oct 21, 2021

Yes, but the lease will expire at some point. The pod was stuck in state Terminating for 11h, whereas a lease duration is in the range of minutes.
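To put numbers on it: a standby only has to wait until the current holder's Lease stops being renewed, and that wait is bounded by the lease duration (seconds to minutes), not by how long the dead pod lingers. Roughly, the check a standby performs looks like this (a sketch against the coordination API, reusing the placeholder lock name and namespace from the sketch above):

```go
package election

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// leaseExpired reports whether the named Lease is no longer being renewed,
// i.e. whether a standby may attempt takeover. Lock name and namespace are
// placeholders, e.g. "default"/"zookeeper-operator-lock".
func leaseExpired(ctx context.Context, client kubernetes.Interface, ns, name string) (bool, error) {
	lease, err := client.CoordinationV1().Leases(ns).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	if lease.Spec.RenewTime == nil || lease.Spec.LeaseDurationSeconds == nil {
		return true, nil
	}
	// The takeover delay is bounded by leaseDurationSeconds, not by how long
	// the stuck pod stays in Terminating.
	age := time.Since(lease.Spec.RenewTime.Time)
	return age > time.Duration(*lease.Spec.LeaseDurationSeconds)*time.Second, nil
}
```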

nightkr self-assigned this Feb 28, 2022