
minio-operator CrashLoopBackOff after kubectl minio init #1114

Closed
x3nb63 opened this issue May 9, 2022 · 13 comments

@x3nb63

x3nb63 commented May 9, 2022

Expected Behavior

The operator should be in the Running state.

Current Behavior

I followed the README.md setup instructions, basically kubectl minio init --namespace=io, which deployed two minio-operator pods; both fail to start with this:

I0509 15:06:01.470313       1 main.go:70] Starting MinIO Operator
I0509 15:06:01.959635       1 main.go:149] caBundle on CRD updated
I0509 15:06:01.960231       1 main-controller.go:239] Setting up event handlers
I0509 15:06:01.960408       1 leaderelection.go:243] attempting to acquire leader lease io/minio-operator-lock...
I0509 15:06:01.967512       1 main-controller.go:484] new leader elected: minio-operator-7d97cf97b4-rbt7p
I0509 15:07:06.036381       1 leaderelection.go:253] successfully acquired lease io/minio-operator-lock
I0509 15:07:06.036491       1 main-controller.go:465] minio-operator-7d97cf97b4-7t6gl: I've become the leader
I0509 15:07:06.036683       1 main-controller.go:377] Waiting for API to start
I0509 15:07:06.036722       1 main-controller.go:369] Starting HTTP Upgrade Tenant Image server
panic: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: an error on the server ("Internal Server Error: \"/apis/metrics.k8s.io/v1beta1?timeout=32s\": the server could not find the requested resource") has prevented the request from succeeding

goroutine 225 [running]:
github.com/minio/operator/pkg/controller/cluster/certificates.GetCertificatesAPIVersion.func1()
	github.com/minio/operator/pkg/controller/cluster/certificates/csr.go:100 +0x1e5
sync.(*Once).doSlow(0x16daca0, 0x1000000000001)
	sync/once.go:68 +0xd2
sync.(*Once).Do(...)
	sync/once.go:59
github.com/minio/operator/pkg/controller/cluster/certificates.GetCertificatesAPIVersion({0x1b18b30, 0xc00047cf20})
	github.com/minio/operator/pkg/controller/cluster/certificates/csr.go:89 +0x5e
github.com/minio/operator/pkg/controller/cluster.(*Controller).Start.func1.1()
	github.com/minio/operator/pkg/controller/cluster/main-controller.go:345 +0x4b
created by github.com/minio/operator/pkg/controller/cluster.(*Controller).Start.func1
	github.com/minio/operator/pkg/controller/cluster/main-controller.go:343 +0xc5

A search on that server-APIs error shows it is well known for clusters with apiservices in the Available=False state. But none on my cluster is in that state.

There is no configuration, and it's the first time I have tried the minio-operator on this cluster. The Kubernetes version is v1.22.2 on CentOS Stream 8, and kubectl minio version says v4.4.17.

I am stuck on how to debug this further.
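One way to double-check the Available=False theory is to list every registered APIService and filter on the AVAILABLE column. A sketch (the awk filter simply selects rows whose third column is not "True"):

```shell
# List every APIService; a stale or unavailable aggregated API breaks
# full API discovery for clients like the operator.
kubectl get apiservices

# Print only APIServices whose AVAILABLE column (3rd) is not "True".
# On a healthy cluster this prints nothing.
kubectl get apiservices --no-headers | awk '$3 != "True" {print $1}'
```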

@dvaldivia
Collaborator

Can you try installing in the default minio-operator namespace and tell us if it works? It may be a bug in init with a custom namespace.

@x3nb63
Author

x3nb63 commented May 10, 2022

Init with defaults looks the same:

$ kubectl minio init
namespace/minio-operator created
serviceaccount/minio-operator created
clusterrole.rbac.authorization.k8s.io/minio-operator-role unchanged
clusterrolebinding.rbac.authorization.k8s.io/minio-operator-binding configured
customresourcedefinition.apiextensions.k8s.io/tenants.minio.min.io configured
service/operator created
deployment.apps/minio-operator created
serviceaccount/console-sa created
clusterrole.rbac.authorization.k8s.io/console-sa-role unchanged
clusterrolebinding.rbac.authorization.k8s.io/console-sa-binding configured
configmap/console-env created
service/console created
deployment.apps/console created
-----------------

To open Operator UI, start a port forward using this command:

kubectl minio proxy -n minio-operator

-----------------

and the error also looks the same, to me at least:

$ kubectl logs -n minio-operator deploy/minio-operator
Found 2 pods, using pod/minio-operator-7d97cf97b4-mdkvl
I0510 07:48:48.596646       1 main.go:70] Starting MinIO Operator
I0510 07:48:49.193322       1 main.go:149] caBundle on CRD updated
I0510 07:48:49.194290       1 main-controller.go:239] Setting up event handlers
I0510 07:48:49.194513       1 leaderelection.go:243] attempting to acquire leader lease minio-operator/minio-operator-lock...
I0510 07:48:49.200266       1 main-controller.go:484] new leader elected: minio-operator-7d97cf97b4-2wp9k
I0510 07:51:22.389881       1 leaderelection.go:253] successfully acquired lease minio-operator/minio-operator-lock
I0510 07:51:22.390011       1 main-controller.go:465] minio-operator-7d97cf97b4-mdkvl: I've become the leader
I0510 07:51:22.390094       1 main-controller.go:377] Waiting for API to start
I0510 07:51:22.390134       1 main-controller.go:369] Starting HTTP Upgrade Tenant Image server
panic: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: an error on the server ("Internal Server Error: \"/apis/metrics.k8s.io/v1beta1?timeout=32s\": the server could not find the requested resource") has prevented the request from succeeding

goroutine 217 [running]:
github.com/minio/operator/pkg/controller/cluster/certificates.GetCertificatesAPIVersion.func1()
	github.com/minio/operator/pkg/controller/cluster/certificates/csr.go:100 +0x1e5
sync.(*Once).doSlow(0x0, 0x0)
	sync/once.go:68 +0xd2
sync.(*Once).Do(...)
	sync/once.go:59
github.com/minio/operator/pkg/controller/cluster/certificates.GetCertificatesAPIVersion({0x1b18b30, 0xc000502580})
	github.com/minio/operator/pkg/controller/cluster/certificates/csr.go:89 +0x5e
github.com/minio/operator/pkg/controller/cluster.(*Controller).Start.func1.1()
	github.com/minio/operator/pkg/controller/cluster/main-controller.go:345 +0x4b
created by github.com/minio/operator/pkg/controller/cluster.(*Controller).Start.func1
	github.com/minio/operator/pkg/controller/cluster/main-controller.go:343 +0xc5

@x3nb63
Author

x3nb63 commented May 10, 2022

Interestingly, with kubectl minio delete -n minio-operator the namespace object gets stuck in the Terminating state with pretty much the same problem:

status:
   conditions:
   - lastTransitionTime: "2022-05-10T07:55:23Z"
     message: 'Discovery failed for some groups, 1 failing: unable to retrieve the
       complete list of server APIs: metrics.k8s.io/v1beta1: an error on the server
       ("Internal Server Error: \"/apis/metrics.k8s.io/v1beta1?timeout=32s\": the server
       could not find the requested resource") has prevented the request from succeeding'
     reason: DiscoveryFailed

Inspecting the metrics-server in the kube-system namespace, I find tons of these:

E0228 00:24:36.812379       1 webhook.go:196] Failed to make webhook authorizer request: the server could not find the requested resource
E0228 00:24:36.812493       1 errors.go:77] the server could not find the requested resource

... not that I understand them ...

@x3nb63
Author

x3nb63 commented May 10, 2022

Could that be an incompatibility with this Kubernetes version (1.22.2)? I read many things about "webhook vs Go Metrics", which appears to be some feature of Go programs, or of the way things can be done...

(only guesswork; I have no clue about Go)

@x3nb63
Author

x3nb63 commented May 12, 2022

Any idea here?

I am out of ideas, and the only way I see to keep trying things is upgrading the cluster. That is, however, not planned right now... so I would need to schedule it, ask people, plan first...

@andrew-musoke

I have the same issue, running the Kubernetes version below. Installed with kubectl minio init -n minio:

Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6", GitCommit:"ad3338546da947756e8a88aa6822e9c11e7eac22", GitTreeState:"clean", BuildDate:"2022-04-21T03:15:11Z", GoVersion:"go1.17.9", Compiler:"gc", Platform:"linux/amd64"}

@harshavardhana
Member

> I have the same issue. Running below k8 version. installed with kubectl minio init -n minio

use minio-operator namespace @andrew-musoke

@andrew-musoke

Thanks @harshavardhana, it's working now. Is this a bug or expected behaviour?

@andrew-musoke

andrew-musoke commented May 17, 2022

@harshavardhana my bad, it worked for a short while and then the error came back. Currently the operator is in CrashLoopBackOff status. It was started with kubectl-minio init -n minio-operator:

I0517 19:41:19.427540       1 main-controller.go:369] Starting HTTP Upgrade Tenant Image server
panic: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
goroutine 145 [running]:
github.com/minio/operator/pkg/controller/cluster/certificates.GetCertificatesAPIVersion.func1()
	github.com/minio/operator/pkg/controller/cluster/certificates/csr.go:100 +0x1e5
sync.(*Once).doSlow(0x0, 0x0)
	sync/once.go:68 +0xd2
sync.(*Once).Do(...)
	sync/once.go:59
github.com/minio/operator/pkg/controller/cluster/certificates.GetCertificatesAPIVersion({0x1b18b30, 0xc0004fcdc0})
	github.com/minio/operator/pkg/controller/cluster/certificates/csr.go:89 +0x5e
github.com/minio/operator/pkg/controller/cluster.(*Controller).Start.func1.1()
	github.com/minio/operator/pkg/controller/cluster/main-controller.go:345 +0x4b
created by github.com/minio/operator/pkg/controller/cluster.(*Controller).Start.func1
	github.com/minio/operator/pkg/controller/cluster/main-controller.go:343 +0xc5

@x3nb63
Author

x3nb63 commented May 31, 2022

I upgraded the cluster to Kubernetes v1.24.1, and the original problem persists.

@x3nb63
Author

x3nb63 commented Jun 1, 2022

I learned Kubernetes has an open bug for 1.23.1 on this -> kubernetes/kubernetes#108657. There, one commenter points to the Helm project, where helm/helm#6361 was solved from their end (with helm/helm#6908)...

I don't understand what they actually do in these Helm issues; however, it reads to me like a common problem that operator-style programs run into again and again.

Interestingly, I see my minio-operator namespace stuck in the Terminating state, and it's because of the very same problem:

$ kubectl get namespace minio-operator -o yaml
Switched to context "kmaster".
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Namespace","metadata":{"annotations":{"operator.min.io/authors":"MinIO, Inc.","operator.min.io/license":"AGPLv3","operator.min.io/support":"https://subnet.min.io"},"name":"minio-operator"}}
    operator.min.io/authors: MinIO, Inc.
    operator.min.io/license: AGPLv3
    operator.min.io/support: https://subnet.min.io
  creationTimestamp: "2022-05-10T07:46:00Z"
  deletionTimestamp: "2022-05-10T07:55:18Z"
  labels:
    kubernetes.io/metadata.name: minio-operator
  name: minio-operator
  resourceVersion: "262508007"
  uid: 2fcddbbb-e3cc-4481-8566-7ebc1dedbb8a
spec:
  finalizers:
  - kubernetes
status:
  conditions:
  - lastTransitionTime: "2022-05-10T07:55:23Z"
    message: 'Discovery failed for some groups, 1 failing: unable to retrieve the
      complete list of server APIs: metrics.k8s.io/v1beta1: an error on the server
      ("Internal Server Error: \"/apis/metrics.k8s.io/v1beta1\": the server could
      not find the requested resource") has prevented the request from succeeding'
    reason: DiscoveryFailed
    status: "True"
    type: NamespaceDeletionDiscoveryFailure
...

Here I still don't get what is wrong with my API or api-resources. Any idea what "the requested resource" actually is? How to find out?
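For what it's worth, the "requested resource" in the panic is the discovery document for the metrics.k8s.io/v1beta1 group, and that URL can be queried directly. A sketch, with the group and version taken from the error message:

```shell
# The panic names the discovery URL it tried: /apis/metrics.k8s.io/v1beta1.
# Build that path from the group/version in the error and query it directly.
group="metrics.k8s.io"
version="v1beta1"
path="/apis/${group}/${version}"

# On a healthy cluster this returns the APIResourceList JSON for the group;
# here it should reproduce the "could not find the requested resource" error.
kubectl get --raw "$path"
```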

@x3nb63
Author

x3nb63 commented Jun 1, 2022

All right, I got the minio-operator pods to the Running state!

The culprit was an APIService object named v1beta1.metrics.k8s.io that had existed in my cluster for 2 years. Deleting it let the namespace termination finish and got the minio-operator pods to start.

... no clue if I deleted something necessary from the cluster, or if it's really a remnant of some old Kubernetes version?!
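For anyone hitting the same thing, a sketch of the check-then-delete: the APIService object name is the version and group from the panic message joined with a dot, and it should only be deleted if the backing service is genuinely gone or unused:

```shell
# The APIService name is "<version>.<group>" from the panic message:
# metrics.k8s.io/v1beta1 -> v1beta1.metrics.k8s.io
group_version="metrics.k8s.io/v1beta1"
name="$(echo "$group_version" | awk -F/ '{print $2 "." $1}')"

# Inspect it first: which Service does it point at, and why is it failing?
kubectl get apiservice "$name" -o yaml

# If the backing service no longer exists (a leftover registration),
# deleting it unblocks API discovery and stuck namespace deletion.
kubectl delete apiservice "$name"
```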

x3nb63 closed this as completed Jun 1, 2022
@mshanmu
Contributor

mshanmu commented Feb 7, 2023

I hit the same issue when upgrading Kubernetes to v1.22, because metrics-server v0.3 is not compatible with v1.22. See the compatibility matrix: https://github.com/kubernetes-sigs/metrics-server#compatibility-matrix

So I fixed it by upgrading metrics-server to v0.6.1.
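A sketch of the check and upgrade, assuming metrics-server was installed from the upstream manifest (the URL follows the pattern of kubernetes-sigs/metrics-server release assets):

```shell
# Read the deployed metrics-server version off the Deployment's image tag
# (the tag is everything after the last colon in the image reference).
image="$(kubectl -n kube-system get deploy metrics-server \
  -o jsonpath='{.spec.template.spec.containers[0].image}')"
tag="${image##*:}"
echo "metrics-server is at $tag"

# Apply the upstream manifest for a release that supports this cluster,
# per the compatibility matrix linked above:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.6.1/components.yaml
```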

5 participants