brig error on deploy wire-server #261

Open
maaaaaaav opened this issue May 12, 2020 · 6 comments

maaaaaaav commented May 12, 2020

Hi there,

thanks again for all the help and assistance.

Currently trying to deploy wire-server using helm; everything is working fine except that the brig pods keep ending up in CrashLoopBackOff.

When I pull the logs, this is all I get:

wireadmin@wire-controller:~/wire-server-deploy/ansible$ kubectl logs brig-8674744bc7-ccbtf
{"logger":"cassandra.brig","msgs":["I","Known hosts: [datacenter1:rack1:172.16.32.31:9042,datacenter1:rack1:172.16.32.32:9042,datacenter1:rack1:172.16.32.33:9042]"]}
{"logger":"cassandra.brig","msgs":["I","New control connection: datacenter1:rack1:172.16.32.33:9042#<socket: 11>"]}
NAME                                  READY   STATUS             RESTARTS   AGE
brig-8674744bc7-ccbtf                 0/1     CrashLoopBackOff   6          7m58s
brig-8674744bc7-jlpgn                 0/1     CrashLoopBackOff   7          7m58s
brig-8674744bc7-mbh5m                 0/1     CrashLoopBackOff   7          7m58s
cannon-0                              1/1     Running            0          7m58s
cannon-1                              1/1     Running            0          7m58s
cannon-2                              1/1     Running            0          7m58s
cargohold-d474c7847-mpj7w             1/1     Running            0          7m58s
cargohold-d474c7847-phms7             1/1     Running            0          7m58s
cargohold-d474c7847-r4j8b             1/1     Running            0          7m58s
cassandra-migrations-g667z            0/1     Completed          0          8m7s
demo-smtp-84b7b85ff6-k2djh            1/1     Running            0          9h
elasticsearch-index-create-xnzwm      0/1     Completed          0          8m1s
fake-aws-dynamodb-84f87cd86b-dsz2v    2/2     Running            0          9h
fake-aws-s3-5468cdf989-fccm9          1/1     Running            0          9h
fake-aws-s3-reaper-7c6d9cddd6-ff8fn   1/1     Running            0          9h
fake-aws-sns-5c56774d95-dwcsw         2/2     Running            0          9h
fake-aws-sqs-554bbc684d-cqxzl         2/2     Running            0          9h
galley-87df7b65f-kp588                1/1     Running            0          7m58s
galley-87df7b65f-t7wtd                1/1     Running            0          7m58s
galley-87df7b65f-vhzpg                1/1     Running            0          7m58s
gundeck-f9bf469f9-b9rxt               1/1     Running            0          7m58s
gundeck-f9bf469f9-clff6               1/1     Running            0          7m58s
gundeck-f9bf469f9-gx8d4               1/1     Running            0          7m57s
nginz-77f7ff6f5d-5m94p                2/2     Running            1          7m58s
nginz-77f7ff6f5d-h7w5n                2/2     Running            1          7m58s
nginz-77f7ff6f5d-pwbzl                2/2     Running            1          7m58s
redis-ephemeral-69bb4885bb-qbmdw      1/1     Running            0          8h
spar-59fd5db594-gbsbz                 1/1     Running            0          7m58s
spar-59fd5db594-jclmh                 1/1     Running            0          7m58s
spar-59fd5db594-zvbl6                 1/1     Running            0          7m58s
webapp-6cb84759d9-wfhc9               1/1     Running            0          7m58s
wireadmin@wire-controller:~/wire-server-deploy/ansible$

Those are the correct IPs for my three Cassandra nodes, and they seem to be up fine. I'm using cassandra-external to point brig at them.
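(In case it's useful, a quick way to sanity-check that the nodes are reachable on 9042 from inside the cluster is a throwaway cqlsh pod; the cassandra image tag here is just an example:)

kubectl run cqlsh-check --rm -it --restart=Never --image=cassandra:3.11 -- cqlsh 172.16.32.31 9042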

Any guidance on what I should upload to help debug this would be much appreciated too.

Thanks!

@akshaymankar
Member

Hello @maaaaaaav, sometimes when a pod is in a crash loop, the logs you see are from just before it crashed. The logs you've added don't look like a failure. Can you please check the logs a couple more times and see if there is anything new?
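For example, with plain kubectl you can pull the previous container's logs and follow the current one until it crashes again:

kubectl logs --previous brig-8674744bc7-ccbtf    # logs from the previously crashed container
kubectl logs -f brig-8674744bc7-ccbtf            # follow the current container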

@ramesh8830

I had the same issue, especially when trying to configure our own SMTP server using #266. Below are the warnings and failure messages from kubectl describe pod (the exact command is sketched after the events).

Normal Scheduled 4m22s default-scheduler Successfully assigned production/brig-69969b5bdc-ndn8b to kubenode02
Warning Unhealthy 3m29s (x5 over 4m9s) kubelet, kubenode02 Readiness probe failed: Get http://10.233.65.172:8080/i/status: dial tcp 10.233.65.172:8080: connect: connection refused
Normal Pulling 3m23s (x3 over 4m20s) kubelet, kubenode02 Pulling image "quay.io/wire/brig:latest"
Warning Unhealthy 3m23s (x6 over 4m13s) kubelet, kubenode02 Liveness probe failed: Get http://10.233.65.172:8080/i/status: dial tcp 10.233.65.172:8080: connect: connection refused
Normal Killing 3m23s (x2 over 3m53s) kubelet, kubenode02 Container brig failed liveness probe, will be restarted
Normal Pulled 3m22s (x3 over 4m16s) kubelet, kubenode02 Successfully pulled image "quay.io/wire/brig:latest"
Normal Created 3m22s (x3 over 4m16s) kubelet, kubenode02 Created container brig
Normal Started 3m22s (x3 over 4m15s) kubelet, kubenode02 Started container brig
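
(For reference, the events above come from something like the following, using the pod name and namespace shown in the first event line:)

kubectl describe pod brig-69969b5bdc-ndn8b -n production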

@akshaymankar
Member

@ramesh8830 Do you also see nothing interesting in kubectl logs for the brig pods?

ramesh8830 commented May 20, 2020

Hi @akshaymankar

There is nothing in the kubectl logs for the brig pods. The brig pods never become Ready and eventually fall into CrashLoopBackOff.

I am also getting the same log @maaaaaaav reported in his post, showing all the Cassandra nodes.

wireadmin@wire-controller:~/wire-server-deploy/ansible$ kubectl logs brig-8674744bc7-ccbtf
{"logger":"cassandra.brig","msgs":["I","Known hosts: [datacenter1:rack1:172.16.32.31:9042,datacenter1:rack1:172.16.32.32:9042,datacenter1:rack1:172.16.32.33:9042]"]}
{"logger":"cassandra.brig","msgs":["I","New control connection: datacenter1:rack1:172.16.32.33:9042#<socket: 11>"]}

@akshaymankar
Member

 Warning Unhealthy 3m23s (x6 over 4m13s) kubelet, kubenode02 Liveness probe failed: Get http://10.233.65.172:8080/i/status: dial tcp 10.233.65.172:8080: connect: connection refused

This indicates that brig is taking some time to come up and K8s is not patient enough for that. Usually brig prints a line like this when it starts listening on the port:

I, Listening on 0.0.0.0:8080

I would make sure the pod is getting enough CPU/RAM. If that is the case, I would bump the brig log level to Debug or even Trace and see if anything shows up in the logs. Hope this helps!
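
A rough sketch of those checks (the brig.config.logLevel value path and the wire-server release/chart names are assumptions, verify them against your values.yaml):

kubectl top pod | grep brig    # actual CPU/RAM usage, needs metrics-server
helm upgrade wire-server wire/wire-server -f values.yaml --set brig.config.logLevel=Debug
kubectl logs -f deployment/brig | grep -i listening    # wait for the "Listening on" line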

@ramesh8830

It has enough CPU/RAM. The crashes happen only when we configure SMTP with a username and password other than the demo credentials. If I use the demo credentials for SMTP, the brig pods run successfully.
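
For reference, the kind of override involved looks roughly like this; the brig.config.smtp.* and brig.secrets.smtpPassword value paths are assumptions based on the brig chart, and the host, username, and password are placeholders:

helm upgrade wire-server wire/wire-server -f values.yaml \
  --set brig.config.smtp.host=smtp.example.com \
  --set brig.config.smtp.port=587 \
  --set brig.config.smtp.username=wire@example.com \
  --set brig.secrets.smtpPassword=changeme    # assumed value paths, check the chart's values.yaml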
