
Release "prometheus-operator" failed: rpc error: code = Canceled #6130

Closed
rnkhouse opened this issue Jul 31, 2019 · 71 comments
Closed

Release "prometheus-operator" failed: rpc error: code = Canceled #6130

rnkhouse opened this issue Jul 31, 2019 · 71 comments

Comments

@rnkhouse
Copy link

Describe the bug
When I try to install the Prometheus Operator on AKS with helm install stable/prometheus-operator --name prometheus-operator -f prometheus-operator-values.yaml, I get this error:

Release "prometheus-operator" failed: rpc error: code = Canceled

I checked the release history:

helm history prometheus-operator -o yaml
- chart: prometheus-operator-6.3.0
  description: 'Release "prometheus-operator" failed: rpc error: code = Canceled desc
    = grpc: the client connection is closing'
  revision: 1
  status: FAILED
  updated: Tue Jul 30 12:36:52 2019

Chart
[stable/prometheus-operator]

Additional Info
I am using the following commands before deploying the chart:

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/servicemonitor.crd.yaml

In the values file, createCustomResource is set to false.

Output of helm version:
Client: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}

Output of kubectl version:
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.4", GitCommit:"5ca598b4ba5abb89bb773071ce452e33fb66339d", GitTreeState:"clean", BuildDate:"2018-06-06T08:13:03Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.7", GitCommit:"4683545293d792934a7a7e12f2cc47d20b2dd01b", GitTreeState:"clean", BuildDate:"2019-06-06T01:39:30Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

Cloud Provider/Platform (AKS, GKE, Minikube etc.):
AKS

@janvdvegt

We have the same issue on minikube, so it does not seem to be specific to AKS.

@robinelfrink

We have the same issue on kubespray-deployed clusters.

@DLV111

DLV111 commented Sep 2, 2019

I'm also seeing the issue on both k8s 1.12.x and 1.13.x kubespray-deployed clusters in our automated pipeline, with a 100% failure rate. The previous version of prometheus-operator (0.30.1) works without issues.
The funny thing is that if I run the command manually instead of via the CD pipeline it works, so I'm a little confused as to what the cause would be.
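
One variable worth ruling out in a CD pipeline is the client-side wait timeout; this is only a guess, not a confirmed fix. Helm 2's --timeout is in seconds (default 300) and also applies to hook jobs:

helm install stable/prometheus-operator --name prometheus-operator --timeout 600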

@DLV111

DLV111 commented Sep 2, 2019

Saw there was an update to the prometheus chart today. I bumped it to

NAME                            CHART VERSION   APP VERSION
stable/prometheus-operator      6.8.0           0.32.0     

and I'm no longer seeing the issue.

@hickeyma
Contributor

hickeyma commented Sep 2, 2019

@rnkhouse Can you check with the latest chart version as mentioned by @dlevene1 in #6130 (comment)?

@PaulusTM

PaulusTM commented Sep 2, 2019

I have this same issue with version 6.8.1 on AKS.

NAME                      	CHART VERSION	APP VERSION
stable/prometheus-operator	6.8.1        	0.32.0
❯ helm version 
Client: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
 ❯ helm install -f prd.yaml --name prometheus --namespace monitoring stable/prometheus-operator 
Error: release prometheus failed: grpc: the client connection is closing
>>> elapsed time 1m56s

@zarvd

zarvd commented Sep 4, 2019

We have the same issue on kubespray-deployed clusters.

Kubernetes version: v1.4.1
Helm version:

Client: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.0", GitCommit:"05811b84a3f93603dd6c2fcfe57944dfa7ab7fd0", GitTreeState:"clean"}

Prometheus-operator version:

NAME                            CHART VERSION   APP VERSION
stable/prometheus-operator      6.8.1           0.32.0  

@will-beta

I have the same issue on AKS.

@bacongobbler
Member

Can anyone reproduce this issue in Helm 3, or does it propagate as a different error? My assumption is that with the removal of tiller this should no longer be an issue.

@will-beta

@bacongobbler This is still an issue in Helm 3.

bash$ helm install r-prometheus-operator stable/prometheus-operator --version 6.8.2 -f prometheus-operator/helm/prometheus-operator.yaml

manifest_sorter.go:179: info: skipping unknown hook: "crd-install"
Error: apiVersion "monitoring.coreos.com/v1" in prometheus-operator/templates/exporters/kube-controller-manager/servicemonitor.yaml is not available

@bacongobbler
Member

bacongobbler commented Sep 7, 2019

That seems to be a different issue from the one raised by the OP, though.

description: 'Release "prometheus-operator" failed: rpc error: code = Canceled desc
= grpc: the client connection is closing'

Can you check and see if you're using the latest beta release as well? That error was seemingly addressed in #6332, which was released in 3.0.0-beta.3. If not, can you open a new issue?

@will-beta

@bacongobbler I'm using the latest Helm v3.0.0-beta.3.

@ghost

ghost commented Sep 8, 2019

I had to go back to --version 6.7.3 to get it to install properly.

@robinelfrink

Our workaround is to keep the prometheus-operator image on v0.31.1.
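
For anyone trying the same pin via the chart, something like this should work; note that the values key prometheusOperator.image.tag is an assumption here, so check your chart version's values.yaml for the exact name:

helm upgrade prometheus-operator stable/prometheus-operator --set prometheusOperator.image.tag=v0.31.1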

@pyadminn

pyadminn commented Sep 10, 2019

helm.log
Also just encountered this issue on a Docker EE Kubernetes install.

After some fiddling with install options, --debug and such, I am now getting:

Error: release prom failed: context canceled

Edit: May try updating my helm version, currently at v2.12.3
Edit 2: Updated to 2.14.3 and still problematic:
grpc: the client connection is closing
Edit 3: Installed version 6.7.3 per the above suggestions to get things going again
Edit 4: Attached the tiller log for a failed install as helm.log

related: helm/charts#15977

@vsliouniaev

vsliouniaev commented Sep 12, 2019

After doing some digging with @cyp3d, it appears that the issue could be caused by a helm delete timeout that's too short for some clusters. I cannot reproduce the issue anywhere, so if someone who is experiencing this could validate a potential fix in the linked pull request branch, I would much appreciate it!

helm/charts#17090
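
For anyone willing to test it: one way to install the chart straight from the PR branch is to fetch GitHub's PR ref into a local checkout and install from the path (a sketch; the release name prom-test and the local branch name are placeholders):

git clone https://github.com/helm/charts.git && cd charts
git fetch origin pull/17090/head:pr-17090 && git checkout pr-17090
helm dependency update stable/prometheus-operator
helm install ./stable/prometheus-operator --name prom-test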

@xvzf
Contributor

xvzf commented Sep 13, 2019

Same here on several clusters created with kops on AWS.
No issues when running on K3s, though.

@vsliouniaev

@xvzf

Could you try the potential fix in this PR? helm/charts#17090

@pyadminn

I gave the PR a run-through and still get the same Error: release prom failed: context canceled
tiller.log

@xvzf
Contributor

xvzf commented Sep 13, 2019

@vsliouniaev Nope, does not fix the issue here

@vsliouniaev

Thanks for checking @xvzf and @pyadminn. I have made another change in the same PR. Could you see if this helps?

@pyadminn

pyadminn commented Sep 16, 2019

Just checked the updated PR; still seeing the following on our infra: Error: release prom failed: rpc error: code = Canceled desc = grpc: the client connection is closing

FYI we are on Kubernetes 1.14.3
Helm v2.14.3

@quantumhype

quantumhype commented Sep 20, 2019

I was able to get around this issue by following the 'Helm fails to create CRDs' section in readme.md. I'm not sure how they're related, but it worked.

Step 1: Manually create the CRDS

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/podmonitor.crd.yaml

Step 2:
Wait for CRDs to be created, which should only take a few seconds

Step 3:
Install the chart, but disable the CRD provisioning by setting prometheusOperator.createCustomResource=false

$ helm install --name my-release stable/prometheus-operator --set prometheusOperator.createCustomResource=false
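
If the short wait in step 2 ever proves flaky in automation, a more deterministic variant is to block until the API server reports each CRD as Established (a sketch using the CRD names from the manifests above):

for crd in alertmanagers podmonitors prometheuses prometheusrules servicemonitors; do
  kubectl wait --for condition=established --timeout=60s crd/${crd}.monitoring.coreos.com
done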

@xvzf
Contributor

xvzf commented Sep 23, 2019

@vsliouniaev Still the same issue! The workaround from lethalwire works, though.

@pyadminn

The lethalwire workaround resolved it for me as well.

@Typositoire

So, 4 days apart, the workaround worked and then stopped working; I had to use the CRD files from 0.32.0, not master.

@waynekhan

I tried this on chart v8.2.4: if prometheusOperator.admissionWebhooks=false, set prometheus.tlsProxy.enabled=false too.

Also, like vsliouniaev said, what do --debug and --dry-run say?
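
For reference, a dry run that renders the templates without touching the cluster looks like this in Helm 2 (matching the versions reported above):

helm install stable/prometheus-operator --name prometheus-operator --debug --dry-run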

@vsliouniaev

@truealex81 Since helm3 is meant to give more information about this, can you please post verbose logs from the install process?

@bacongobbler bacongobbler reopened this Nov 28, 2019
@sschne

sschne commented Nov 29, 2019

I am seeing the same issue deploying 8.2.4 on Azure AKS.

Helm Version:
version.BuildInfo{Version:"v3.0.0", GitCommit:"e29ce2a54e96cd02ccfce88bee4f58bb6e2a28b6", GitTreeState:"clean", GoVersion:"go1.13.4"}

Helm --debug produces this output:

install.go:148: [debug] Original chart version: ""
install.go:165: [debug] CHART PATH: /root/.cache/helm/repository/prometheus-operator-8.2.4.tgz
client.go:87: [debug] creating 1 resource(s)
client.go:87: [debug] creating 1 resource(s)
client.go:87: [debug] creating 1 resource(s)
client.go:87: [debug] creating 1 resource(s)
client.go:87: [debug] creating 1 resource(s)
install.go:139: [debug] Clearing discovery cache
wait.go:51: [debug] beginning wait for 5 resources with timeout of 1m0s
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ServiceAccount
client.go:245: [debug] serviceaccounts "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" PodSecurityPolicy
client.go:245: [debug] podsecuritypolicies.policy "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" RoleBinding
client.go:245: [debug] rolebindings.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" Role
client.go:245: [debug] roles.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRoleBinding
client.go:245: [debug] clusterrolebindings.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRole
client.go:245: [debug] clusterroles.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission-create" Job
client.go:245: [debug] jobs.batch "prometheus-operator-admission-create" not found
client.go:87: [debug] creating 1 resource(s)
client.go:420: [debug] Watching for changes to Job prometheus-operator-admission-create with timeout of 5m0s
client.go:445: [debug] Add/Modify event for prometheus-operator-admission-create: MODIFIED
client.go:484: [debug] prometheus-operator-admission-create: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:445: [debug] Add/Modify event for prometheus-operator-admission-create: MODIFIED
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ServiceAccount
client.go:220: [debug] Starting delete for "prometheus-operator-admission" PodSecurityPolicy
client.go:220: [debug] Starting delete for "prometheus-operator-admission" RoleBinding
client.go:220: [debug] Starting delete for "prometheus-operator-admission" Role
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRoleBinding
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRole
client.go:220: [debug] Starting delete for "prometheus-operator-admission-create" Job
client.go:87: [debug] creating 120 resource(s)
Error: context canceled

I can reproduce this reliably. If there is a way to get more verbose logs, please let me know and I'll post the output here.

@vsliouniaev

@pather87 thanks a lot!

Here's the order of what's meant to happen in the chart:

  1. CRDs are provisioned
  2. There is a pre-install;pre-upgrade job which runs a container to create a secret with certificates for the admission hooks. This job and its resources are cleaned up on success
  3. All the resources are created
  4. There is a post-install;post-upgrade job that runs a container to patch the created validatingwebhookconfiguration and mutatingwebhookconfiguration with the CA from the certificates created in step 2. This job and its resources are cleaned up on success

Could you please check if you have any failed jobs still present? From the logs it reads like you shouldn't because they were all successful.

Are there any other resources present in the cluster after the Error: context canceled happens?
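
A quick way to answer both questions (a sketch; adjust the namespace to wherever the release went):

kubectl get jobs -n monitoring | grep admission
kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations | grep prometheus-operator
kubectl get all -n monitoring -l release=prometheus-operator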

@willsilvano

Same here when installing prometheus-operator:

helm install prometheus-operator stable/prometheus-operator \
  --namespace=monitoring \
  --values=values.yaml

Error: rpc error: code = Canceled desc = grpc: the client connection is closing

@sschne

sschne commented Nov 29, 2019

@vsliouniaev thanks for your answer!

  1. There are no jobs lying around after the deployment.
  2. Deployments and services are present in the cluster after the deployment; see kubectl output:

kubectl get all -lrelease=prometheus-operator

NAME                                                     READY   STATUS    RESTARTS   AGE
pod/prometheus-operator-grafana-59d489899-4b5kd          2/2     Running   0          3m56s
pod/prometheus-operator-operator-8549bcd687-4kb2x        2/2     Running   0          3m56s
pod/prometheus-operator-prometheus-node-exporter-4km6x   1/1     Running   0          3m56s
pod/prometheus-operator-prometheus-node-exporter-7dgn6   1/1     Running   0          3m56s

NAME                                                   TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)            AGE
service/prometheus-operator-alertmanager               ClusterIP   xxx   <none>        9093/TCP           3m57s
service/prometheus-operator-grafana                    ClusterIP   xxx   <none>        80/TCP             3m57s
service/prometheus-operator-operator                   ClusterIP   xxx     <none>        8080/TCP,443/TCP   3m57s
service/prometheus-operator-prometheus                 ClusterIP   xxx   <none>        9090/TCP           3m57s
service/prometheus-operator-prometheus-node-exporter   ClusterIP   xxx    <none>        9100/TCP           3m57s

NAME                                                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prometheus-operator-prometheus-node-exporter   2         2         2       2            2           <none>          3m57s

NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-operator-grafana    1/1     1            1           3m57s
deployment.apps/prometheus-operator-operator   1/1     1            1           3m57s

NAME                                                      DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-operator-grafana-59d489899     1         1         1       3m57s
replicaset.apps/prometheus-operator-operator-8549bcd687   1         1         1       3m57s

NAME                                                             READY   AGE
statefulset.apps/alertmanager-prometheus-operator-alertmanager   1/1     3m44s
statefulset.apps/prometheus-prometheus-operator-prometheus       1/1     3m34s

@willsilvano

Installation with debug:

client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD alertmanagers.monitoring.coreos.com is already present. Skipping.
client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD podmonitors.monitoring.coreos.com is already present. Skipping.
client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD prometheuses.monitoring.coreos.com is already present. Skipping.
client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD prometheusrules.monitoring.coreos.com is already present. Skipping.
client.go:87: [debug] creating 1 resource(s)
install.go:126: [debug] CRD servicemonitors.monitoring.coreos.com is already present. Skipping.
install.go:139: [debug] Clearing discovery cache
wait.go:51: [debug] beginning wait for 0 resources with timeout of 1m0s
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRoleBinding
client.go:245: [debug] clusterrolebindings.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" RoleBinding
client.go:245: [debug] rolebindings.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRole
client.go:245: [debug] clusterroles.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ServiceAccount
client.go:245: [debug] serviceaccounts "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" PodSecurityPolicy
client.go:245: [debug] podsecuritypolicies.policy "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission" Role
client.go:245: [debug] roles.rbac.authorization.k8s.io "prometheus-operator-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prometheus-operator-admission-create" Job
client.go:245: [debug] jobs.batch "prometheus-operator-admission-create" not found
client.go:87: [debug] creating 1 resource(s)
client.go:420: [debug] Watching for changes to Job prometheus-operator-admission-create with timeout of 5m0s
client.go:445: [debug] Add/Modify event for prometheus-operator-admission-create: MODIFIED
client.go:484: [debug] prometheus-operator-admission-create: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:445: [debug] Add/Modify event for prometheus-operator-admission-create: MODIFIED
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRoleBinding
client.go:220: [debug] Starting delete for "prometheus-operator-admission" RoleBinding
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ClusterRole
client.go:220: [debug] Starting delete for "prometheus-operator-admission" ServiceAccount
client.go:220: [debug] Starting delete for "prometheus-operator-admission" PodSecurityPolicy
client.go:220: [debug] Starting delete for "prometheus-operator-admission" Role
client.go:220: [debug] Starting delete for "prometheus-operator-admission-create" Job
client.go:87: [debug] creating 122 resource(s)
Error: context canceled
helm.go:76: [debug] context canceled

Afterwards, I execute: kubectl get all -lrelease=prometheus-operator -A

NAMESPACE    NAME                                                     READY   STATUS    RESTARTS   AGE
monitoring   pod/prometheus-operator-grafana-d6676b794-r6cg9          2/2     Running   0          2m45s
monitoring   pod/prometheus-operator-operator-6584f4b5f5-wdkrx        2/2     Running   0          2m45s
monitoring   pod/prometheus-operator-prometheus-node-exporter-2g4tg   1/1     Running   0          2m45s
monitoring   pod/prometheus-operator-prometheus-node-exporter-798p5   1/1     Running   0          2m45s
monitoring   pod/prometheus-operator-prometheus-node-exporter-pvk5t   1/1     Running   0          2m45s
monitoring   pod/prometheus-operator-prometheus-node-exporter-r9j2r   1/1     Running   0          2m45s

NAMESPACE     NAME                                                   TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)            AGE
kube-system   service/prometheus-operator-coredns                    ClusterIP   None           <none>        9153/TCP           2m46s
kube-system   service/prometheus-operator-kube-controller-manager    ClusterIP   None           <none>        10252/TCP          2m46s
kube-system   service/prometheus-operator-kube-etcd                  ClusterIP   None           <none>        2379/TCP           2m46s
kube-system   service/prometheus-operator-kube-proxy                 ClusterIP   None           <none>        10249/TCP          2m46s
kube-system   service/prometheus-operator-kube-scheduler             ClusterIP   None           <none>        10251/TCP          2m46s
monitoring    service/prometheus-operator-alertmanager               ClusterIP   10.0.238.102   <none>        9093/TCP           2m46s
monitoring    service/prometheus-operator-grafana                    ClusterIP   10.0.16.19     <none>        80/TCP             2m46s
monitoring    service/prometheus-operator-operator                   ClusterIP   10.0.97.114    <none>        8080/TCP,443/TCP   2m45s
monitoring    service/prometheus-operator-prometheus                 ClusterIP   10.0.57.153    <none>        9090/TCP           2m46s
monitoring    service/prometheus-operator-prometheus-node-exporter   ClusterIP   10.0.83.30     <none>        9100/TCP           2m46s

NAMESPACE    NAME                                                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
monitoring   daemonset.apps/prometheus-operator-prometheus-node-exporter   4         4         4       4            4           <none>          2m46s

NAMESPACE    NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
monitoring   deployment.apps/prometheus-operator-grafana    1/1     1            1           2m46s
monitoring   deployment.apps/prometheus-operator-operator   1/1     1            1           2m46s

NAMESPACE    NAME                                                      DESIRED   CURRENT   READY   AGE
monitoring   replicaset.apps/prometheus-operator-grafana-d6676b794     1         1         1       2m46s
monitoring   replicaset.apps/prometheus-operator-operator-6584f4b5f5   1         1         1       2m46s

NAMESPACE    NAME                                                             READY   AGE
monitoring   statefulset.apps/alertmanager-prometheus-operator-alertmanager   1/1     2m40s
monitoring   statefulset.apps/prometheus-prometheus-operator-prometheus       1/1     2m30s

@sschne

sschne commented Nov 29, 2019

What I've also discovered while trying to work around this: the issue persists if I delete the chart and the CRDs afterwards and install the chart again, but it does not persist if I do not delete the CRDs.

I also tried installing the CRDs beforehand and doing a helm install --skip-crds, but the issue still persists. This is somewhat confusing.
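
For completeness, deleting the CRDs between attempts looks like this (names taken from the debug log above); note that this also deletes any custom resources of those kinds:

kubectl delete crd alertmanagers.monitoring.coreos.com podmonitors.monitoring.coreos.com \
  prometheuses.monitoring.coreos.com prometheusrules.monitoring.coreos.com \
  servicemonitors.monitoring.coreos.com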

@vsliouniaev

The next log line I would expect after this is about the post-install,post-upgrade hooks, but it does not appear in your case. I'm not certain what helm is waiting on here:

...
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" RoleBinding
client.go:245: [debug] rolebindings.rbac.authorization.k8s.io "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" Role
client.go:245: [debug] roles.rbac.authorization.k8s.io "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ClusterRole
client.go:245: [debug] clusterroles.rbac.authorization.k8s.io "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ServiceAccount
client.go:245: [debug] serviceaccounts "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ClusterRoleBinding
client.go:245: [debug] clusterrolebindings.rbac.authorization.k8s.io "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" PodSecurityPolicy
client.go:245: [debug] podsecuritypolicies.policy "prom-op-prometheus-operato-admission" not found
client.go:87: [debug] creating 1 resource(s)
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission-patch" Job
client.go:245: [debug] jobs.batch "prom-op-prometheus-operato-admission-patch" not found
client.go:87: [debug] creating 1 resource(s)
client.go:420: [debug] Watching for changes to Job prom-op-prometheus-operato-admission-patch with timeout of 5m0s
client.go:445: [debug] Add/Modify event for prom-op-prometheus-operato-admission-patch: MODIFIED
client.go:484: [debug] prom-op-prometheus-operato-admission-patch: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:445: [debug] Add/Modify event for prom-op-prometheus-operato-admission-patch: MODIFIED
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" RoleBinding
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" Role
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ClusterRole
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ServiceAccount
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" ClusterRoleBinding
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission" PodSecurityPolicy
client.go:220: [debug] Starting delete for "prom-op-prometheus-operato-admission-patch" Job

@truealex81

Manual CRD creation helps, at least on Azure.
First create the CRDs from this link: https://github.com/coreos/prometheus-operator/tree/release-0.34/example/prometheus-operator-crd
("kubectl create -f alertmanager.crd.yaml" and so on for all files.)
Then:
helm install prometheus-operator stable/prometheus-operator --namespace monitoring --version 8.2.4 --set prometheusOperator.createCustomResource=false
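
Spelling out the "and so on", the full set can be created in one loop (a sketch, assuming the *.crd.yaml file names in that release-0.34 directory):

for f in alertmanager podmonitor prometheus prometheusrule servicemonitor; do
  kubectl create -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.34/example/prometheus-operator-crd/${f}.crd.yaml
done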

@willsilvano

Thanks @truealex81! That works on Azure.

@bierhov

bierhov commented Dec 5, 2019

My env:
k8s 1.11.2, helm 2.13.1, tiller 2.13.1
prometheus-operator-5.5 (APP VERSION 0.29) is OK!

but:
prometheus-operator-8 (APP VERSION 0.32) has the same problem:
"context canceled" or "grpc: the client connection is closing"!

I guess the latest version of prometheus-operator is not compatible?!

@vsliouniaev

@bierhov please can you post the resources in the namespace after a failure?

@bierhov

bierhov commented Dec 5, 2019

Yes!
When I run "helm ls" I can see my prometheus-operator release status is "failed", but the namespace where I installed prometheus-operator has all the prometheus-operator resources.
However, the prometheus web UI can't get any data!

@vsliouniaev

Can you please post the resources though?

@bierhov

bierhov commented Dec 5, 2019

Can you please post the resources though?

Sorry, I can't reproduce it unless I remove my stable helm env and do it again!

@vsliouniaev

@bierhov do you have any failed jobs left after the install?

@bierhov

bierhov commented Dec 5, 2019

@bierhov do you have any failed jobs left after the install?

My k8s version is 1.11.2, helm and tiller version is 2.13.1.
If I install prometheus-operator version 8.x and run "helm ls", the job status is failed,
but if I install prometheus-operator version 5.x and run "helm ls", the job status is deployed!

@zomarg

zomarg commented Dec 12, 2019

Not reproducible using:

Kubernetes version: v1.13.12
Kubectl version: v1.16.2
Helm version: 3.0.1
Prometheus-operator version: 8.3.3

  1. Install CRDs manually:
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/podmonitor.crd.yaml
  2. Configure the operator to not create CRDs, either when installing, using

--set prometheusOperator.createCustomResource=false

or in values.yaml:

prometheusOperator:
  createCustomResource: false

@vsliouniaev

@gramozkrasniqi
What if you don't create CRDs manually? That's one of the workarounds for the issue

@zomarg

zomarg commented Dec 12, 2019

@vsliouniaev if you don't create them you will get the error.
But in the original issue, under Additional Info, @rnkhouse stated that he was creating the CRDs manually.

@alfonzso

We use prometheus-operator in our deployment. In a nutshell, we upgraded prom-op from 6.9.3 to 8.3.3 and it always failed with "Error: context canceled".
We also always install the CRDs before installing/upgrading prometheus-operator, and of course we didn't change or update these CRDs.

I tried to refresh the CRDs that 'github.com/helm/charts/tree/master/stable/prometheus-operator' mentions (like kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/alertmanager.crd.yaml), but these don't exist anymore.
After that I tried the ones from here: https://github.com/helm/charts/tree/master/stable/prometheus-operator/crds
But it failed again.

I almost gave up, but with these CRDs the helm deploy succeeded! yeyyyy
https://github.com/coreos/kube-prometheus/tree/master/manifests/setup

My setup:

Kubernetes version: v1.14.3
Kubectl version: v1.14.2
Helm version: 2.14.3
Prometheus-operator version: 8.3.3

Purge prometheus-operator from k8s!

Then:

kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0alertmanagerCustomResourceDefinition.yaml   
kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0podmonitorCustomResourceDefinition.yaml     
kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0prometheusCustomResourceDefinition.yaml     
kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0prometheusruleCustomResourceDefinition.yaml 
kubectl apply -f https://raw.githubusercontent.com/coreos/kube-prometheus/master/manifests/setup/prometheus-operator-0servicemonitorCustomResourceDefinition.yaml 
helm upgrade -i prom-op                               \
  --version 8.3.3                                     \
  --set prometheusOperator.createCustomResource=false \
  stable/prometheus-operator

That's all!

@pandvan

pandvan commented Dec 19, 2019

Does this mean that it's necessary to do a clean install and lose historical metrics data?

@truealex81

After upgrading AKS k8s to 1.15.5, helm to 3.0.1 and the prometheus-operator chart to 8.3.3, the problem is gone.

@infa-ddeore

infa-ddeore commented Jan 14, 2020

Our workaround is to keep the prometheus-operator image on v0.31.1.

This worked for me as well on AKS v1.14.8 with helm+tiller v2.16.1, changing the operator image to v0.31.1.

@cocuba

cocuba commented Jan 28, 2020

Manual CRD creation helps, at least on Azure.
First create the CRDs from this link: https://github.com/coreos/prometheus-operator/tree/release-0.34/example/prometheus-operator-crd
("kubectl create -f alertmanager.crd.yaml" and so on for all files.)
Then:
helm install prometheus-operator stable/prometheus-operator --namespace monitoring --version 8.2.4 --set prometheusOperator.createCustomResource=false

This works in Azure Kubernetes, thanks.

@Superset1986

I was able to get around this issue by following the 'Helm fails to create CRDs' section in readme.md. I'm not sure how they're related, but it worked.

Step 1: Manually create the CRDS

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/example/prometheus-operator-crd/podmonitor.crd.yaml

Step 2:
Wait for CRDs to be created, which should only take a few seconds

Step 3:
Install the chart, but disable the CRD provisioning by setting prometheusOperator.createCustomResource=false

$ helm install --name my-release stable/prometheus-operator --set prometheusOperator.createCustomResource=false

Thanks, this worked for me on an AKS cluster. I had to change the URLs for the CRDs:

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_alertmanagers.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_podmonitors.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_prometheuses.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_prometheusrules.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml --validate=false
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.37/example/prometheus-operator-crd/monitoring.coreos.com_thanosrulers.yaml --validate=false

helm install stable/prometheus-operator --name prometheus-operator --namespace monitoring --set prometheusOperator.createCustomResource=false

@bacongobbler
Member

Closing. Looks like this has since been resolved, according to the last three commenters. Thanks!
