
The operator gets stuck in terminating state #2119

Open
pavelnikolov opened this issue Sep 17, 2019 · 0 comments

I have an issue with the operator that I am unable to reproduce consistently, but it keeps happening every now and again.
I am running a 3-node cluster on DigitalOcean's hosted Kubernetes:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:11:31Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.6", GitCommit:"96fac5cd13a5dc064f7d9f4f23030a6aeface6cc", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:16Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}

Here is my operator definition:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: etcd-operator
  namespace: etcd
spec:
  replicas: 1
  selector:
    matchLabels:
      name: etcd-operator
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        name: etcd-operator
    spec:
      containers:
        - name: etcd-operator
          image: quay.io/coreos/etcd-operator:v0.9.4
          command:
          - etcd-operator
          # Uncomment to act for resources in all namespaces. More information in doc/user/clusterwide.md
          #- -cluster-wide
          env:
          - name: MY_POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: MY_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          resources:
            limits:
              cpu: 300m
              memory: 200Mi
            requests:
              cpu: 50m
              memory: 50Mi
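
For reference, I apply this manifest and check that it rolled out with commands roughly like these (the file name is just what I happen to use locally):

$ kubectl apply -f etcd-operator-deployment.yaml
$ kubectl -n etcd rollout status deployment/etcd-operator
$ kubectl -n etcd get pods -l name=etcd-operator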

At some point a second operator pod appears; the first one loses the leader election and gets stuck in the Terminating state, with a final log message like this:

level=fatal msg="leader election lost"

What's really strange to me is that the deployment reports 2 out of 1 replicas. Any ideas why this might be happening?
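
In the meantime, this is roughly how I inspect the stuck pod and, when needed, clean it up (the pod name below is a placeholder):

$ kubectl -n etcd get deployment etcd-operator            # shows 2/1 replicas
$ kubectl -n etcd describe pod <stuck-pod-name>           # events for the pod stuck in Terminating
$ kubectl -n etcd delete pod <stuck-pod-name> --grace-period=0 --force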
