snmp-mongodb and snmp-redis-master in CrashLoopBackOff status #894

Closed · thasteve opened this issue Oct 19, 2023 · 11 comments

@thasteve

Issue during initial configuration. I followed the steps to run:
microk8s helm3 upgrade --install snmp -f values.yaml splunk-connect-for-snmp/splunk-connect-for-snmp --namespace=sc4snmp --create-namespace
After running microk8s kubectl get pods -n sc4snmp to verify the deployment, I see that snmp-redis-master-0 and snmp-mongodb-... are in CrashLoopBackOff status.

I'm running on an ESXi-hosted Rocky Linux VM with plenty of resources.
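
For reference, the crash details for each failing pod can be pulled like this (a sketch, using the redis pod name above; the same applies to the mongodb pod):

# Logs from the last crashed run, plus the recent events recorded for the pod
microk8s kubectl logs --previous snmp-redis-master-0 -n sc4snmp
microk8s kubectl describe pod snmp-redis-master-0 -n sc4snmp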

@ajasnosz
Collaborator

ajasnosz commented Oct 20, 2023

Could you share the output of microk8s kubectl describe pod <pod-name> -n sc4snmp and the events for both failing pods?
What version of sc4snmp are you trying to run?

@ajasnosz ajasnosz self-assigned this Oct 20, 2023
@thasteve
Author

When I search the repo for snmp I see the following app version. This is what I redeployed yesterday while trying to troubleshoot the issue.

NAME CHART VERSION APP VERSION DESCRIPTION
splunk-connect-for-snmp/splunk-connect-for-snmp 1.9.2 1.9.2 A Helm chart for SNMP Connect for SNMP

Below is the output of microk8s kubectl describe pod <pod-name> -n sc4snmp. I edited the node name and IP for security.

snmp-redis-master-0

Name: snmp-redis-master-0
Namespace: sc4snmp
Priority: 0
Service Account: snmp-redis
Node: snmp01.domain.com/192.168.1.2
Start Time: Thu, 19 Oct 2023 10:03:11 -0400
Labels: app.kubernetes.io/component=master
app.kubernetes.io/instance=snmp
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=redis
controller-revision-hash=snmp-redis-master-54465fc56
helm.sh/chart=redis-17.3.18
statefulset.kubernetes.io/pod-name=snmp-redis-master-0
Annotations: checksum/configmap: 04422870eebf6e73b372f1816da4f48d5d9a753f31c07f4e8decf26858647c5e
checksum/health: 230c16035014813c1ed5dca4b334fceead271bc2437e3adc38ba319bfa89ad67
checksum/scripts: 50226e5366a7aaef5c150dc915b32e209d9248fdf6ca19b9e2517edebe8aa072
checksum/secret: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
cni.projectcalico.org/containerID: a08e8a7e22b5416faf0e4db68bde97043cb51596f2eeac34327a0e74930aae83
cni.projectcalico.org/podIP: 10.1.89.19/32
cni.projectcalico.org/podIPs: 10.1.89.19/32
Status: Running
IP: 10.1.89.19
IPs:
IP: 10.1.89.19
Controlled By: StatefulSet/snmp-redis-master
Containers:
redis:
Container ID: containerd://f4744cfba8a15bfe85c910c3adb5d2a1b6c13fbbd66105ae944a1f1c0006e991
Image: docker.io/bitnami/redis:7.0.7-debian-11-r2
Image ID: docker.io/bitnami/redis@sha256:5481f3ce531dd4d756806491ef911c23eda0636dd9568eb654fbba4c6a854a9e
Port: 6379/TCP
Host Port: 0/TCP
Command:
/bin/bash
Args:
-c
/opt/bitnami/scripts/start-scripts/start-master.sh
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Fri, 20 Oct 2023 09:29:52 -0400
Finished: Fri, 20 Oct 2023 09:29:52 -0400
Ready: False
Restart Count: 278
Liveness: exec [sh -c /health/ping_liveness_local.sh 5] delay=20s timeout=6s period=5s #success=1 #failure=5
Readiness: exec [sh -c /health/ping_readiness_local.sh 1] delay=20s timeout=2s period=5s #success=1 #failure=5
Environment:
BITNAMI_DEBUG: false
REDIS_REPLICATION_MODE: master
ALLOW_EMPTY_PASSWORD: yes
REDIS_TLS_ENABLED: no
REDIS_PORT: 6379
Mounts:
/data from redis-data (rw)
/health from health (rw)
/opt/bitnami/redis/etc/ from redis-tmp-conf (rw)
/opt/bitnami/redis/mounted-etc from config (rw)
/opt/bitnami/scripts/start-scripts from start-scripts (rw)
/tmp from tmp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-kx7wn (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
redis-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: redis-data-snmp-redis-master-0
ReadOnly: false
start-scripts:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: snmp-redis-scripts
Optional: false
health:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: snmp-redis-health
Optional: false
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: snmp-redis-configuration
Optional: false
redis-tmp-conf:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit:
tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit:
kube-api-access-kx7wn:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
Warning BackOff 72s (x6937 over 23h) kubelet Back-off restarting failed container

snmp-mongodb-75b89b595f-qftj9

Name: snmp-mongodb-75b89b595f-qftj9
Namespace: sc4snmp
Priority: 0
Service Account: snmp-mongodb
Node: snmp01.domain.com/192.168.1.2
Start Time: Thu, 19 Oct 2023 10:03:11 -0400
Labels: app.kubernetes.io/component=mongodb
app.kubernetes.io/instance=snmp
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=mongodb
helm.sh/chart=mongodb-12.1.31
pod-template-hash=75b89b595f
Annotations: cni.projectcalico.org/containerID: f21b10d2a2c3e790bc3600f23bca0d6357dd0d8e51ce0832631807ce4c025622
cni.projectcalico.org/podIP: 10.1.89.18/32
cni.projectcalico.org/podIPs: 10.1.89.18/32
Status: Running
IP: 10.1.89.18
IPs:
IP: 10.1.89.18
Controlled By: ReplicaSet/snmp-mongodb-75b89b595f
Init Containers:
volume-permissions:
Container ID: containerd://9c7f0325183523ffdb3da1b859050630ffbe573ea07d6c3f0d71acf369368ccf
Image: docker.io/bitnami/bitnami-shell:11-debian-11-r21
Image ID: docker.io/bitnami/bitnami-shell@sha256:d05ec18b29aed67267a0a9c2c64c02594e6aa5791ccac2b7b1f5bab3f7ff7851
Port:
Host Port:
Command:
/bin/bash
Args:
-ec
mkdir -p /bitnami/mongodb/
chown 1001:1001 /bitnami/mongodb/
find /bitnami/mongodb/ -mindepth 1 -maxdepth 1 -not -name ".snapshot" -not -name "lost+found" | xargs -r chown -R 1001:1001

State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 19 Oct 2023 10:08:50 -0400
Finished: Thu, 19 Oct 2023 10:08:50 -0400
Ready: True
Restart Count: 0
Environment:
Mounts:
/bitnami/mongodb from datadir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8wx9z (ro)
Containers:
mongodb:
Container ID: containerd://0276593d96780786602b4bf6e8729ee68b8063ca61c8c7e13b37e1e4cb7b8e0b
Image: docker.io/bitnami/mongodb:5.0.10-debian-11-r3
Image ID: docker.io/bitnami/mongodb@sha256:563e1572db6c23a7bc5d8970d4cf06de1f1a80bd41c4b5e273a92bfa9f26d0f1
Port: 27017/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 132
Started: Fri, 20 Oct 2023 09:35:44 -0400
Finished: Fri, 20 Oct 2023 09:35:44 -0400
Ready: False
Restart Count: 279
Liveness: exec [/bitnami/scripts/ping-mongodb.sh] delay=30s timeout=10s period=20s #success=1 #failure=6
Readiness: exec [/bitnami/scripts/readiness-probe.sh] delay=5s timeout=5s period=10s #success=1 #failure=6
Environment:
BITNAMI_DEBUG: false
ALLOW_EMPTY_PASSWORD: yes
MONGODB_SYSTEM_LOG_VERBOSITY: 0
MONGODB_DISABLE_SYSTEM_LOG: no
MONGODB_DISABLE_JAVASCRIPT: no
MONGODB_ENABLE_JOURNAL: yes
MONGODB_PORT_NUMBER: 27017
MONGODB_ENABLE_IPV6: no
MONGODB_ENABLE_DIRECTORY_PER_DB: no
Mounts:
/bitnami/mongodb from datadir (rw)
/bitnami/scripts from common-scripts (rw)
/docker-entrypoint-initdb.d from custom-init-scripts (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8wx9z (ro)
metrics:
Container ID: containerd://a3a3b7e866af19c0e90adafb4c2584cbabb83f3c97b17a6b554bfae6bfd3ceac
Image: docker.io/bitnami/mongodb-exporter:0.33.0-debian-11-r9
Image ID: docker.io/bitnami/mongodb-exporter@sha256:078725e342e6c77343e121c1dc784a1bb38c38516814ab79ca8853a1385188c0
Port: 9216/TCP
Host Port: 0/TCP
Command:
/bin/bash
-ec
Args:
/bin/mongodb_exporter --collect-all --compatible-mode --web.listen-address ":9216" --mongodb.uri "mongodb://localhost:27017/admin?"

State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Fri, 20 Oct 2023 09:34:32 -0400
Finished: Fri, 20 Oct 2023 09:35:02 -0400
Ready: False
Restart Count: 461
Liveness: http-get http://:metrics/metrics delay=15s timeout=5s period=5s #success=1 #failure=3
Readiness: http-get http://:metrics/metrics delay=5s timeout=1s period=5s #success=1 #failure=3
Environment:
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8wx9z (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
common-scripts:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: snmp-mongodb-common-scripts
Optional: false
custom-init-scripts:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: snmp-mongodb-init-scripts
Optional: false
datadir:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: snmp-mongodb
ReadOnly: false
kube-api-access-8wx9z:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
Warning BackOff 61s (x8196 over 23h) kubelet Back-off restarting failed container

@ajasnosz
Collaborator

Can you reinstall sc4snmp and then collect the logs and PVC information for redis and mongo with these commands:

microk8s kubectl logs -f <pod-name>  -n sc4snmp
microk8s kubectl get pvc  -n sc4snmp
microk8s kubectl describe pvc/<pvc-name> -n sc4snmp

@thasteve
Author

Got it.

I uninstalled microk8s to remove all of the pods. I'm following this guide for installation - https://splunk.github.io/splunk-connect-for-snmp/main/gettingstarted/sc4snmp-installation/

Here are the requested outputs after reinstalling -

microk8s kubectl logs -f snmp-redis-master-0 -n sc4snmp

1:C 23 Oct 2023 16:39:46.377 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 23 Oct 2023 16:39:46.377 # Redis version=7.0.7, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 23 Oct 2023 16:39:46.377 # Configuration loaded
1:M 23 Oct 2023 16:39:46.378 * monotonic clock: POSIX clock_gettime
1:M 23 Oct 2023 16:39:46.378 * Running mode=standalone, port=6379.
1:M 23 Oct 2023 16:39:46.378 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 23 Oct 2023 16:39:46.378 # Server initialized
1:M 23 Oct 2023 16:39:46.379 # Can't open or create append-only dir appendonlydir: Permission denied

microk8s kubectl logs -f snmp-mongodb-75b89b595f-fbhbq -n sc4snmp

Defaulted container "mongodb" out of: mongodb, metrics, volume-permissions (init)
mongodb 16:58:06.38
mongodb 16:58:06.38 Welcome to the Bitnami mongodb container
mongodb 16:58:06.38 Subscribe to project updates by watching https://github.com/bitnami/containers
mongodb 16:58:06.38 Submit issues and feature requests at https://github.com/bitnami/containers/issues
mongodb 16:58:06.38
mongodb 16:58:06.39 INFO ==> ** Starting MongoDB setup **
mongodb 16:58:06.40 INFO ==> Validating settings in MONGODB_* env vars...
mongodb 16:58:06.66 WARN ==> You set the environment variable ALLOW_EMPTY_PASSWORD=yes. For safety reasons, do not use this flag in a production environment.
mongodb 16:58:06.68 INFO ==> Initializing MongoDB...
mongodb 16:58:06.70 INFO ==> Deploying MongoDB from scratch...
/opt/bitnami/scripts/libos.sh: line 336: 46 Illegal instruction (core dumped) "$@" > /dev/null 2>&1

microk8s kubectl get pvc -n sc4snmp

NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
snmp-mongodb Bound pvc-d6013ea1-49bc-4cb8-a508-2f32cf90f5ba 8Gi RWO microk8s-hostpath 67m
redis-data-snmp-redis-master-0 Bound pvc-5bbde51a-75a1-4ff1-a8ee-0322df00447d 8Gi RWO microk8s-hostpath 67m

microk8s kubectl describe pvc/redis-data-snmp-redis-master-0 -n sc4snmp

Name: redis-data-snmp-redis-master-0
Namespace: sc4snmp
StorageClass: microk8s-hostpath
Status: Bound
Volume: pvc-5bbde51a-75a1-4ff1-a8ee-0322df00447d
Labels: app.kubernetes.io/component=master
app.kubernetes.io/instance=snmp
app.kubernetes.io/name=redis
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: microk8s.io/hostpath
volume.kubernetes.io/selected-node: snmp01.domain.com
volume.kubernetes.io/storage-provisioner: microk8s.io/hostpath
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 8Gi
Access Modes: RWO
VolumeMode: Filesystem
Used By: snmp-redis-master-0
Events:

microk8s kubectl describe pvc/snmp-mongodb -n sc4snmp

Name: snmp-mongodb
Namespace: sc4snmp
StorageClass: microk8s-hostpath
Status: Bound
Volume: pvc-d6013ea1-49bc-4cb8-a508-2f32cf90f5ba
Labels: app.kubernetes.io/component=mongodb
app.kubernetes.io/instance=snmp
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=mongodb
helm.sh/chart=mongodb-12.1.31
Annotations: meta.helm.sh/release-name: snmp
meta.helm.sh/release-namespace: sc4snmp
pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: microk8s.io/hostpath
volume.kubernetes.io/selected-node: snmp01.domain.com
volume.kubernetes.io/storage-provisioner: microk8s.io/hostpath
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 8Gi
Access Modes: RWO
VolumeMode: Filesystem
Used By: snmp-mongodb-75b89b595f-fbhbq
Events:

@ajasnosz
Collaborator

From what I saw, the redis failure is a permissions issue, and in this issue (bitnami/charts#14327) it's mentioned that it can be solved by enabling volumePermissions. To do that, add the following to your values.yaml:

redis:
  volumePermissions: 
    enabled: true

Then reinstall sc4snmp; you can uninstall following the guide linked above.
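
A minimal reinstall sequence, assuming the default release and namespace names from the guide (note that deleting the PVCs wipes any stored data):

microk8s helm3 uninstall snmp -n sc4snmp
# Optional: remove the old volumes so they are re-provisioned with correct ownership
microk8s kubectl delete pvc --all -n sc4snmp
microk8s helm3 upgrade --install snmp -f values.yaml splunk-connect-for-snmp/splunk-connect-for-snmp --namespace=sc4snmp --create-namespace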

If that doesn't work, you can look at these links for more information:
https://stackoverflow.com/questions/55201167/redis-service-fails-with-permission-denied-on-append-file
helm/charts#5041

About mongodb: I found that MongoDB version 5 and above requires a CPU with AVX support. Please check whether your environment supports it, as we currently run MongoDB v6 in sc4snmp.
To check whether your CPU supports AVX, you can run:
lscpu | grep avx
or
cat /proc/cpuinfo and look for avx in the flags section.
You can see these issues for more information:
bitnami/charts#10255
bitnami/charts#12834
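
A compact variant of that check; it prints every AVX-family flag the guest sees, and empty output means AVX is not exposed:

# Prints avx, avx2, avx512*, etc.; no output means no AVX available to MongoDB
grep -o 'avx[^ ]*' /proc/cpuinfo | sort -u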

@thasteve
Author

> From what I saw, the redis failure is a permissions issue, and in this issue (bitnami/charts#14327) it's mentioned that it can be solved by enabling volumePermissions. To do that, add the following to your values.yaml:
>
> redis:
>   volumePermissions:
>     enabled: true

This worked. The redis pod is now running. I'm working with the VI team to understand why my virtual device doesn't have an AVX flag from the CPU. I'll update the issue when I hear back from them on a solution.

Thanks.

@thasteve
Author

> About mongodb: I found that MongoDB version 5 and above requires a CPU with AVX support. Please check whether your environment supports it, as we currently run MongoDB v6 in sc4snmp.
> To check whether your CPU supports AVX, you can run:
> lscpu | grep avx
> or
> cat /proc/cpuinfo and look for avx in the flags section.
> You can see these issues for more information:
> bitnami/charts#10255
> bitnami/charts#12834

Is there a way to downgrade the MongoDB package to version 4, or can you provide any information on doing so? It appears that was the solution for many people who were unable to get AVX support on a guest device. In our case, the virtualization platform will not pass the AVX instruction set from the VI host to the guest.

If it turns out we can't use an older MongoDB version, we may have to scrap this sc4snmp project.

@ajasnosz
Collaborator

The first option is to run the last version of sc4snmp that used MongoDB v4, which is 1.8.4.
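
For that option you can pin the chart version with Helm's --version flag; a sketch that assumes the chart repo is already added and that chart version 1.8.4 corresponds to app version 1.8.4:

microk8s helm3 upgrade --install snmp -f values.yaml splunk-connect-for-snmp/splunk-connect-for-snmp --version 1.8.4 --namespace=sc4snmp --create-namespace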

I'm not sure if the latest version of the code will be compatible with the older mongo, but you can try updating it.
To do that, download the repository and go to this directory:
cd splunk-connect-for-snmp/charts/splunk-connect-for-snmp/
Update Chart.yaml; the last Bitnami chart version running MongoDB v4 was 11.1.10:

dependencies:
  - name: mongodb
    version: ~11.1.10

Run:
microk8s helm3 dep update
Go back to the directory with the sc4snmp repository. Then, to load the new values into sc4snmp, run:
microk8s helm3 upgrade --install snmp -f values.yaml ~/splunk-connect-for-snmp/charts/splunk-connect-for-snmp/ --namespace=sc4snmp --create-namespace
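
To confirm the downgrade took effect, you can list the images the pods are actually running; a minimal sketch using kubectl's jsonpath output:

# Expect the mongodb image tag to be a 4.x release after the downgrade
microk8s kubectl get pods -n sc4snmp -o jsonpath='{..image}' | tr ' ' '\n' | grep mongodb | sort -u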

@thasteve
Author

> The first option is to run the last version of sc4snmp that used MongoDB v4, which is 1.8.4.

This got all of the pods running. However, there is now some fuss from the trap pods that appears to match a similar open issue from a year ago:

Still waiting for redis://snmp-redis-headless:6379/0 #629

When I run microk8s kubectl logs snmp-splunk-connect-for-snmp-trap-86d79cf9c5-jwb6j -n sc4snmp I see:

Still waiting for redis://snmp-redis-headless:6379/0 (3240s elapsed)
Still waiting for redis://snmp-redis-headless:6379/0 (3260s elapsed)
Still waiting for redis://snmp-redis-headless:6379/0 (3280s elapsed)
Still waiting for redis://snmp-redis-headless:6379/0 (3300s elapsed)
Still waiting for redis://snmp-redis-headless:6379/0 (3320s elapsed)

That's the case for both trap pods.

Curling my Splunk instance with the assigned HEC token generates an event. Sending a test SNMP trap shows up in tcpdump on the sc4snmp host. But the trap event does not get sent to Splunk Cloud, nor do I see the port 443 traffic that would suggest it's being sent.
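
The exact trap I sent isn't shown here; a minimal example with net-snmp, where the community string, target host, and OIDs are placeholders:

# Send netSnmpExampleHeartbeatNotification with one integer varbind (placeholders throughout)
snmptrap -v2c -c public <sc4snmp-host>:162 '' 1.3.6.1.4.1.8072.2.3.0.1 1.3.6.1.4.1.8072.2.3.2.1 i 123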

I'm thinking the trap pod isn't fully running and won't process trap events.

Here is the output of microk8s kubectl describe service snmp-redis-headless -n sc4snmp:

Name: snmp-redis-headless
Namespace: sc4snmp
Labels: app.kubernetes.io/instance=snmp
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=redis
helm.sh/chart=redis-16.8.10
Annotations: meta.helm.sh/release-name: snmp
meta.helm.sh/release-namespace: sc4snmp
Selector: app.kubernetes.io/instance=snmp,app.kubernetes.io/name=redis
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: None
IPs: None
Port: tcp-redis 6379/TCP
TargetPort: redis/TCP
Endpoints: 10.1.89.26:6379
Session Affinity: None
Events:

@thasteve
Author

I wanted to add that I have since been able to get the pods running with the latest code: the VI team was able to pass the AVX instruction set through to my SNMP server. However, after reinstalling, the issue above still exists, with the "Still waiting for redis-headless" log events from the trap pods, and SNMP traps are not getting sent to Splunk Cloud, I believe as a result of the "Still waiting..." issue.

@ajasnosz
Collaborator

The "Still waiting ..." most of the times is caused by the kubernetes dns issues. It was similar with the issue you referenced above.

  1. You can check if the addon dns is enabled.
    Run microk8s status and check for the dns section is enabled.
  2. Check if the coredns pod is up
    microk8s kubectl get pods -A
  3. Check the logs from coredns pod
    microk8s kubectl logs pod/coredns-<id> -n kube-system > coredns.log
    microk8s kubectl logs describe pod/coredns-<id> -n kube-system > coredns_describe.log
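
If all of that looks healthy, a quick in-cluster lookup can confirm whether the headless service name resolves; a minimal sketch, where the busybox image and the dnstest pod name are arbitrary choices:

# Throwaway pod that resolves the headless service FQDN and is removed afterwards
microk8s kubectl run dnstest --rm -it --restart=Never --image=busybox:1.36 -n sc4snmp -- nslookup snmp-redis-headless.sc4snmp.svc.cluster.local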
