chaoskube is restarting with status Running and CrashLoopBackOff #202

Open
ravikumar2000 opened this issue Apr 27, 2020 · 16 comments

@ravikumar2000

Every 2.0s: kubectl get deployments,pods --all-namespaces kmaster: Mon Apr 27 17:58:26 2020

NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
default deployment.apps/apache 0/1 1 0 2d7h
default deployment.apps/chaoskube-1587990114 0/1 1 0 6m28s
default deployment.apps/chaoskube-1587990210 0/1 1 0 4m55s
default deployment.apps/nginx 2/2 2 2 2d6h
kube-system deployment.apps/coredns 2/2 2 2 2d7h
more-apps deployment.apps/chaoskube 0/1 1 0 2d
more-apps deployment.apps/ghost 2/2 2 2 2d6h

NAMESPACE NAME READY STATUS RESTARTS AGE
default pod/apache-8454694d99-xgvrl 0/1 ImagePullBackOff 0 2d7h
default pod/chaoskube-1587990114-79c757f5cb-q8b8j 0/1 CrashLoopBackOff 5 6m28s
default pod/chaoskube-1587990210-795f6d7848-bs8dg 0/1 CrashLoopBackOff 4 4m55s
default pod/nginx-5ccf85b585-cv8m5 1/1 Running 0 2d6h
default pod/nginx-5ccf85b585-ws4xj 1/1 Running 0 2d6h
kube-system pod/coredns-66bff467f8-jnx5b 1/1 Running 0 2d7h
kube-system pod/coredns-66bff467f8-s9qp5 1/1 Running 0 2d7h
kube-system pod/etcd-kmaster 1/1 Running 0 2d7h
kube-system pod/kube-apiserver-kmaster 1/1 Running 0 2d7h
kube-system pod/kube-controller-manager-kmaster 1/1 Running 0 2d7h
kube-system pod/kube-flannel-ds-amd64-6sd7v 1/1 Running 0 2d7h
kube-system pod/kube-flannel-ds-amd64-9gg8w 1/1 Running 2 2d7h
kube-system pod/kube-flannel-ds-amd64-bqvbg 1/1 Running 0 2d7h
kube-system pod/kube-proxy-58zlr 1/1 Running 0 2d7h
kube-system pod/kube-proxy-dp6xb 1/1 Running 0 2d7h
kube-system pod/kube-proxy-vm75r 1/1 Running 0 2d7h
kube-system pod/kube-scheduler-kmaster 1/1 Running 0 2d7h
more-apps pod/chaoskube-56998c669c-kp5rx 0/1 CrashLoopBackOff 16 2d
more-apps pod/ghost-588cb7bd9f-746bg 1/1 Running 0 2d6h
more-apps pod/ghost-588cb7bd9f-9qtm6 1/1 Running 0 2d6h

Details of the deployment:
root@kmaster:~# helm install stable/chaoskube --generate-name
NAME: chaoskube-1587990114
LAST DEPLOYED: Mon Apr 27 17:51:58 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
chaoskube is running and will kill arbitrary pods every 10m.

You can follow the logs to see what chaoskube does:

POD=$(kubectl -n default get pods -l='app.kubernetes.io/instance=chaoskube-1587990114' --output=jsonpath='{.items[0].metadata.name}')
kubectl -n default logs -f $POD

You are running in dry-run mode. No pod is actually terminated.
root@kmaster:~# helm install stable/chaoskube --generate-name --debug --set dryRun=false
install.go:159: [debug] Original chart version: ""
install.go:176: [debug] CHART PATH: /root/.cache/helm/repository/chaoskube-3.1.4.tgz

client.go:108: [debug] creating 1 resource(s)
NAME: chaoskube-1587990210
LAST DEPLOYED: Mon Apr 27 17:53:31 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
USER-SUPPLIED VALUES:
dryRun: false

COMPUTED VALUES:
affinity: {}
annotations: null
debug: false
dryRun: false
excludedDaysOfYear: null
excludedPodNames: null
excludedTimesOfDay: null
excludedWeekdays: null
gracePeriod: -1s
image: quay.io/linki/chaoskube
imageTag: v0.14.0
includedPodNames: null
interval: 10m
labels: null
logFormat: null
metrics:
enabled: false
port: 8080
service:
port: 8080
type: ClusterIP
serviceMonitor:
additionalLabels: {}
enabled: false
minimumAge: 0s
name: chaoskube
namespaces: null
nodeSelector: {}
podAnnotations: {}
podLabels: {}
priorityClassName: ""
rbac:
create: false
serviceAccountName: default
replicas: 1
resources: {}
timezone: UTC
tolerations: []

HOOKS:
MANIFEST:

# Source: chaoskube/templates/deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: chaoskube-1587990210
  labels:
    app.kubernetes.io/name: chaoskube
    app.kubernetes.io/managed-by: "Helm"
    app.kubernetes.io/instance: "chaoskube-1587990210"
    helm.sh/chart: chaoskube-3.1.4
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: chaoskube
      app.kubernetes.io/instance: chaoskube-1587990210
  template:
    metadata:
      labels:
        app.kubernetes.io/name: chaoskube
        app.kubernetes.io/managed-by: "Helm"
        app.kubernetes.io/instance: "chaoskube-1587990210"
        helm.sh/chart: chaoskube-3.1.4
    spec:
      containers:
        - name: chaoskube
          image: quay.io/linki/chaoskube:v0.14.0
          args:
            - --interval=10m
            - --labels=
            - --annotations=
            - --namespaces=
            - --no-dry-run
            - --excluded-weekdays=
            - --excluded-times-of-day=
            - --excluded-days-of-year=
            - --timezone=UTC
            - --minimum-age=0s
            - --grace-period=-1s
            - --metrics-address=
          resources:
            {}
          securityContext:
            runAsNonRoot: true
            runAsUser: 65534
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
      serviceAccountName: "default"

NOTES:
chaoskube is running and will kill arbitrary pods every 10m.

You can follow the logs to see what chaoskube does:

POD=$(kubectl -n default get pods -l='app.kubernetes.io/instance=chaoskube-1587990210' --output=jsonpath='{.items[0].metadata.name}')
kubectl -n default logs -f $POD

root@kmaster:~# POD=$(kubectl -n default get pods -l='app.kubernetes.io/instance=chaoskube-1587990210' --output=jsonpath='{.items[0].metadata.name}')
root@kmaster:~# kubectl -n default logs -f $POD
Error from server (NotFound): the server could not find the requested resource ( pods/log chaoskube-1587990210-795f6d7848-bs8dg)
root@kmaster:~# kubectl describe pods chaoskube-1587990210-795f6d7848-bs8dg -n default
Name: chaoskube-1587990210-795f6d7848-bs8dg
Namespace: default
Priority: 0
Node: knode/10.0.3.15
Start Time: Mon, 27 Apr 2020 17:53:31 +0530
Labels: app.kubernetes.io/instance=chaoskube-1587990210
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=chaoskube
helm.sh/chart=chaoskube-3.1.4
pod-template-hash=795f6d7848
Annotations:
Status: Running
IP: 192.168.1.16
IPs:
IP: 192.168.1.16
Controlled By: ReplicaSet/chaoskube-1587990210-795f6d7848
Containers:
chaoskube:
Container ID: docker://189430f9e1c73b8d8f91fa83202c39a5c6b090cfd5847684b06cc1bad1a9fc8c
Image: quay.io/linki/chaoskube:v0.14.0
Image ID: docker-pullable://quay.io/linki/chaoskube@sha256:74a8314513d94da26d407f29d2dba621ec9e607f5b2abbe07c3f6a521e00c7a4
Port:
Host Port:
Args:
--interval=10m
--labels=
--annotations=
--namespaces=
--no-dry-run
--excluded-weekdays=
--excluded-times-of-day=
--excluded-days-of-year=
--timezone=UTC
--minimum-age=0s
--grace-period=-1s
--metrics-address=
State: Running
Started: Mon, 27 Apr 2020 17:54:46 +0530
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 27 Apr 2020 17:54:04 +0530
Finished: Mon, 27 Apr 2020 17:54:34 +0530
Ready: True
Restart Count: 2
Environment:
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-gdsnt (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-gdsnt:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-gdsnt
Optional: false
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message


Normal Scheduled 80s default-scheduler Successfully assigned default/chaoskube-1587990210-795f6d7848-bs8dg to knode
Warning BackOff 16s kubelet, knode Back-off restarting failed container
Normal Pulled 5s (x3 over 79s) kubelet, knode Container image "quay.io/linki/chaoskube:v0.14.0" already present on machine
Normal Created 5s (x3 over 78s) kubelet, knode Created container chaoskube
Normal Started 5s (x3 over 78s) kubelet, knode Started container chaoskube
root@kmaster:~# kubectl describe pods chaoskube-1587990210-795f6d7848-bs8dg -n default
Name: chaoskube-1587990210-795f6d7848-bs8dg
Namespace: default
Priority: 0
Node: knode/10.0.3.15
Start Time: Mon, 27 Apr 2020 17:53:31 +0530
Labels: app.kubernetes.io/instance=chaoskube-1587990210
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=chaoskube
helm.sh/chart=chaoskube-3.1.4
pod-template-hash=795f6d7848
Annotations:
Status: Running
IP: 192.168.1.16
IPs:
IP: 192.168.1.16
Controlled By: ReplicaSet/chaoskube-1587990210-795f6d7848
Containers:
chaoskube:
Container ID: docker://189430f9e1c73b8d8f91fa83202c39a5c6b090cfd5847684b06cc1bad1a9fc8c
Image: quay.io/linki/chaoskube:v0.14.0
Image ID: docker-pullable://quay.io/linki/chaoskube@sha256:74a8314513d94da26d407f29d2dba621ec9e607f5b2abbe07c3f6a521e00c7a4
Port:
Host Port:
Args:
--interval=10m
--labels=
--annotations=
--namespaces=
--no-dry-run
--excluded-weekdays=
--excluded-times-of-day=
--excluded-days-of-year=
--timezone=UTC
--minimum-age=0s
--grace-period=-1s
--metrics-address=
State: Running
Started: Mon, 27 Apr 2020 17:54:46 +0530
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 27 Apr 2020 17:54:04 +0530
Finished: Mon, 27 Apr 2020 17:54:34 +0530
Ready: True
Restart Count: 2
Environment:
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-gdsnt (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-gdsnt:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-gdsnt
Optional: false
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message


Normal Scheduled 85s default-scheduler Successfully assigned default/chaoskube-1587990210-795f6d7848-bs8dg to knode
Warning BackOff 21s kubelet, knode Back-off restarting failed container
Normal Pulled 10s (x3 over 84s) kubelet, knode Container image "quay.io/linki/chaoskube:v0.14.0" already present on machine
Normal Created 10s (x3 over 83s) kubelet, knode Created container chaoskube
Normal Started 10s (x3 over 83s) kubelet, knode Started container chaoskube

Appreciate your support on this issue...

@linki
Owner

linki commented Apr 29, 2020

Please let me know your output of kubectl version and kubectl -n default logs -f $POD. Try running chaoskube with the --debug flag as well.
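For reference, the chart exposes chaoskube's --debug flag through its debug value (it appears in the computed values shown elsewhere in this thread), so, assuming the release name from the earlier install, one way to enable it without reinstalling would be roughly:

helm upgrade chaoskube-1587990210 stable/chaoskube -n default --reuse-values --set debug=true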

@ravikumar2000
Author

Hi,
Please find the details:
root@kmaster:~# kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T11:56:40Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T11:48:36Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

root@kmaster:~# helm version --short
v3.2.0+ge11b7ce

root@kmaster:~# helm install chaoskube stable/chaoskube -n chaoskube --set dryRun=false --debug
install.go:159: [debug] Original chart version: ""
install.go:176: [debug] CHART PATH: /root/.cache/helm/repository/chaoskube-3.1.4.tgz

client.go:108: [debug] creating 1 resource(s)
NAME: chaoskube
LAST DEPLOYED: Wed Apr 29 14:33:53 2020
NAMESPACE: chaoskube
STATUS: deployed
REVISION: 1
TEST SUITE: None
USER-SUPPLIED VALUES:
dryRun: false

COMPUTED VALUES:
affinity: {}
annotations: null
debug: false
dryRun: false
excludedDaysOfYear: null
excludedPodNames: null
excludedTimesOfDay: null
excludedWeekdays: null
gracePeriod: -1s
image: quay.io/linki/chaoskube
imageTag: v0.14.0
includedPodNames: null
interval: 10m
labels: null
logFormat: null
metrics:
enabled: false
port: 8080
service:
port: 8080
type: ClusterIP
serviceMonitor:
additionalLabels: {}
enabled: false
minimumAge: 0s
name: chaoskube
namespaces: null
nodeSelector: {}
podAnnotations: {}
podLabels: {}
priorityClassName: ""
rbac:
create: false
serviceAccountName: default
replicas: 1
resources: {}
timezone: UTC
tolerations: []

HOOKS:
MANIFEST:

# Source: chaoskube/templates/deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: chaoskube
  labels:
    app.kubernetes.io/name: chaoskube
    app.kubernetes.io/managed-by: "Helm"
    app.kubernetes.io/instance: "chaoskube"
    helm.sh/chart: chaoskube-3.1.4
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: chaoskube
      app.kubernetes.io/instance: chaoskube
  template:
    metadata:
      labels:
        app.kubernetes.io/name: chaoskube
        app.kubernetes.io/managed-by: "Helm"
        app.kubernetes.io/instance: "chaoskube"
        helm.sh/chart: chaoskube-3.1.4
    spec:
      containers:
        - name: chaoskube
          image: quay.io/linki/chaoskube:v0.14.0
          args:
            - --interval=10m
            - --labels=
            - --annotations=
            - --namespaces=
            - --no-dry-run
            - --excluded-weekdays=
            - --excluded-times-of-day=
            - --excluded-days-of-year=
            - --timezone=UTC
            - --minimum-age=0s
            - --grace-period=-1s
            - --metrics-address=
          resources:
            {}
          securityContext:
            runAsNonRoot: true
            runAsUser: 65534
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
      serviceAccountName: "default"

NOTES:
chaoskube is running and will kill arbitrary pods every 10m.

You can follow the logs to see what chaoskube does:

POD=$(kubectl -n chaoskube get pods -l='app.kubernetes.io/instance=chaoskube' --output=jsonpath='{.items[0].metadata.name}')
kubectl -n chaoskube logs -f $POD

root@kmaster:~# POD=$(kubectl -n chaoskube get pods -l='app.kubernetes.io/instance=chaoskube' --output=jsonpath='{.items[0].metadata.name}')
root@kmaster:~# kubectl -n chaoskube logs -f $POD
Error from server (InternalError): Internal error occurred: Authorization error (user=kube-apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy)
root@kmaster:~# kubectl get pods,deployments --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
chaoskube pod/chaoskube-744fc5f769-ltw9v 1/1 Running 1 74s
default pod/apache-8454694d99-xgvrl 0/1 ImagePullBackOff 0 4d3h
default pod/chaoskube-68466fb9db-4p59b 1/1 Running 2 96s
default pod/nginx-deployment-6b474476c4-l5g5r 1/1 Running 0 4h40m
default pod/nginx-deployment-6b474476c4-t5jk8 1/1 Running 0 4h40m
kube-system pod/coredns-66bff467f8-bwggs 0/1 Running 0 4h11m
kube-system pod/coredns-66bff467f8-jnx5b 1/1 Terminating 0 4d4h
kube-system pod/coredns-66bff467f8-krmr4 0/1 Running 0 4h11m
kube-system pod/coredns-66bff467f8-s9qp5 1/1 Terminating 0 4d4h
kube-system pod/etcd-kmaster 1/1 Running 0 4d4h
kube-system pod/kube-apiserver-kmaster 1/1 Running 1 4d4h
kube-system pod/kube-controller-manager-kmaster 0/1 Error 3 4d4h
kube-system pod/kube-flannel-ds-amd64-6sd7v 1/1 Running 0 4d3h
kube-system pod/kube-flannel-ds-amd64-9gg8w 1/1 Running 2 4d3h
kube-system pod/kube-flannel-ds-amd64-bqvbg 1/1 Running 0 4d4h
kube-system pod/kube-proxy-58zlr 1/1 Running 0 4d3h
kube-system pod/kube-proxy-dp6xb 1/1 Running 0 4d3h
kube-system pod/kube-proxy-vm75r 1/1 Running 0 4d4h
kube-system pod/kube-scheduler-kmaster 1/1 Running 2 4d4h
monitoring pod/grafana-5c55845445-d74k6 1/1 Running 0 4h18m
monitoring pod/kube-state-metrics-957fd6c75-hwwf4 2/3 CrashLoopBackOff 48 4h18m
monitoring pod/node-exporter-98rb4 0/2 Pending 0 4h18m
monitoring pod/node-exporter-nq52g 2/2 Running 0 4h17m
monitoring pod/node-exporter-z4hqj 2/2 Running 0 4h17m
monitoring pod/prometheus-adapter-5949969998-4nd47 0/1 CrashLoopBackOff 47 4h18m
monitoring pod/prometheus-operator-574fd8ccd9-2q84p 1/2 CrashLoopBackOff 48 4h18m

NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
chaoskube deployment.apps/chaoskube 1/1 1 1 74s
default deployment.apps/apache 0/1 1 0 4d3h
default deployment.apps/chaoskube 1/1 1 1 96s
default deployment.apps/nginx-deployment 2/2 2 2 4h40m
kube-system deployment.apps/coredns 0/2 2 0 4d4h
monitoring deployment.apps/grafana 1/1 1 1 4h19m
monitoring deployment.apps/kube-state-metrics 0/1 1 0 4h19m
monitoring deployment.apps/prometheus-adapter 0/1 1 0 4h18m
monitoring deployment.apps/prometheus-operator 0/1 1 0 4h22m

root@kmaster:~# kubectl describe pod chaoskube-68466fb9db-4p59b
Name: chaoskube-68466fb9db-4p59b
Namespace: default
Priority: 0
Node: knode01/10.0.3.15
Start Time: Wed, 29 Apr 2020 14:32:59 +0530
Labels: app.kubernetes.io/instance=chaoskube
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=chaoskube
helm.sh/chart=chaoskube-3.1.4
pod-template-hash=68466fb9db
Annotations:
Status: Running
IP: 192.168.2.8
IPs:
IP: 192.168.2.8
Controlled By: ReplicaSet/chaoskube-68466fb9db
Containers:
chaoskube:
Container ID: docker://54e39e4c91699add269250ae494265812831e06c1f63ed94afd355cd3c8c88d3
Image: quay.io/linki/chaoskube:v0.14.0
Image ID: docker-pullable://quay.io/linki/chaoskube@sha256:74a8314513d94da26d407f29d2dba621ec9e607f5b2abbe07c3f6a521e00c7a4
Port:
Host Port:
Args:
--interval=10m
--labels=
--annotations=
--namespaces=
--excluded-weekdays=
--excluded-times-of-day=
--excluded-days-of-year=
--timezone=UTC
--minimum-age=0s
--grace-period=-1s
--metrics-address=
State: Running
Started: Wed, 29 Apr 2020 14:46:58 +0530
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 29 Apr 2020 14:46:02 +0530
Finished: Wed, 29 Apr 2020 14:46:32 +0530
Ready: True
Restart Count: 3
Environment:
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-gdsnt (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-gdsnt:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-gdsnt
Optional: false
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message


Normal Scheduled default-scheduler Successfully assigned default/chaoskube-68466fb9db-4p59b to knode01
Normal Pulling 3m5s kubelet, knode01 Pulling image "quay.io/linki/chaoskube:v0.14.0"
Normal Pulled 2m51s kubelet, knode01 Successfully pulled image "quay.io/linki/chaoskube:v0.14.0"
Warning BackOff (x3 over ) kubelet, knode01 Back-off restarting failed container
Normal Created (x4 over 2m51s) kubelet, knode01 Created container chaoskube
Normal Started (x4 over 2m51s) kubelet, knode01 Started container chaoskube
Normal Pulled (x3 over ) kubelet, knode01 Container image "quay.io/linki/chaoskube:v0.14.0" already present on machine

Please find the describe output for the chaoskube pod above, and let me know if you need any other details.

@linki
Owner

linki commented Apr 29, 2020

Please make sure you have permission to read the logs. Currently you get:

$ kubectl -n chaoskube logs -f $POD
Error from server (InternalError): Internal error occurred: Authorization error (user=kube-apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy)
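As a side note on that error: on kubeadm-provisioned clusters (an assumption here, since the cluster setup isn't shown) it usually means the API server's kubelet client identity is missing nodes/proxy rights, and one commonly used fix is to bind the built-in system:kubelet-api-admin ClusterRole to that user (the binding name below is only illustrative):

kubectl create clusterrolebinding kube-apiserver-kubelet-client --clusterrole=system:kubelet-api-admin --user=kube-apiserver-kubelet-client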

Please set the --debug flag on chaoskube, not on helm itself, with:

helm install chaoskube stable/chaoskube -n chaoskube --set dryRun=false,debug=true

@ravikumar2000
Author

I am getting the following error:
root@kmaster:~# helm install chaoskube stable/chaoskube -n chaoskube --set dryRun=false --set namespaces='!kube-system' --set labels=app-purpose=chaos --set interval=20s --debug=true
install.go:159: [debug] Original chart version: ""
install.go:176: [debug] CHART PATH: /root/.cache/helm/repository/chaoskube-3.1.4.tgz

Error: cannot re-use a name that is still in use
helm.go:84: [debug] cannot re-use a name that is still in use
helm.sh/helm/v3/pkg/action.(*Install).availableName
/home/circleci/helm.sh/helm/pkg/action/install.go:424
helm.sh/helm/v3/pkg/action.(*Install).Run
/home/circleci/helm.sh/helm/pkg/action/install.go:175
main.runInstall
/home/circleci/helm.sh/helm/cmd/helm/install.go:229
main.newInstallCmd.func1
/home/circleci/helm.sh/helm/cmd/helm/install.go:117
github.com/spf13/cobra.(*Command).execute
/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:842
github.com/spf13/cobra.(*Command).ExecuteC
/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:950
github.com/spf13/cobra.(*Command).Execute
/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:887
main.main
/home/circleci/helm.sh/helm/cmd/helm/helm.go:83
runtime.main
/usr/local/go/src/runtime/proc.go:203
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1357

@ravikumar2000
Author

Even with debug=true set, I am still not getting any logs:
root@kmaster:~# helm install chaoskube stable/chaoskube -n chaoskube --set dryRun=false --set namespaces='!kube-system' --set labels=app-purpose=chaos --set interval=20s --set debug=true
NAME: chaoskube
LAST DEPLOYED: Wed Apr 29 16:38:43 2020
NAMESPACE: chaoskube
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
chaoskube is running and will kill arbitrary pods every 20s.

You can follow the logs to see what chaoskube does:

POD=$(kubectl -n chaoskube get pods -l='app.kubernetes.io/instance=chaoskube' --output=jsonpath='{.items[0].metadata.name}')
kubectl -n chaoskube logs -f $POD

root@kmaster:~# POD=$(kubectl -n chaoskube get pods -l='app.kubernetes.io/instance=chaoskube' --output=jsonpath='{.items[0].metadata.name}')
root@kmaster:~# kubectl -n chaoskube logs -f $POD
Error from server (InternalError): Internal error occurred: Authorization error (user=kube-apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy)

@ravikumar2000
Author

Can I try chaoskube on a minikube cluster instead?
Please let me know the steps if you have them handy.

@ravikumar2000
Author

I restarted kubectl and the daemon services and tried again, but I am still getting these issues: I am unable to get the pod logs, and chaoskube keeps restarting instead of killing pods.

namespace/chaoskube created
root@kmaster:~# helm install chaoskube stable/chaoskube -n chaoskube --set dryRun=false --set namespaces='!kube-system' --set labels=app-purpose=chaos --set interval=20s --set debug=true
NAME: chaoskube
LAST DEPLOYED: Wed Apr 29 20:16:34 2020
NAMESPACE: chaoskube
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
chaoskube is running and will kill arbitrary pods every 20s.

You can follow the logs to see what chaoskube does:

POD=$(kubectl -n chaoskube get pods -l='app.kubernetes.io/instance=chaoskube' --output=jsonpath='{.items[0].metadata.name}')
kubectl -n chaoskube logs -f $POD

root@kmaster:~# POD=$(kubectl -n chaoskube get pods -l='app.kubernetes.io/instance=chaoskube' --output=jsonpath='{.items[0].metadata.name}')
root@kmaster:~# kubectl -n chaoskube logs -f $POD
Error from server (NotFound): the server could not find the requested resource ( pods/log chaoskube-66b94c48b6-wt7zl)

@ravikumar2000
Author

I ran it again with the release name chaoskube01; below are the details (with --debug):
root@kmaster:~# helm install chaoskube01 stable/chaoskube -n chaoskube --set dryRun=false --set namespaces='!kube-system' --set labels=app-purpose=chaos --set interval=20s --debug
install.go:159: [debug] Original chart version: ""
install.go:176: [debug] CHART PATH: /root/.cache/helm/repository/chaoskube-3.1.4.tgz

client.go:108: [debug] creating 1 resource(s)
NAME: chaoskube01
LAST DEPLOYED: Wed Apr 29 20:19:20 2020
NAMESPACE: chaoskube
STATUS: deployed
REVISION: 1
TEST SUITE: None
USER-SUPPLIED VALUES:
dryRun: false
interval: 20s
labels: app-purpose=chaos
namespaces: '!kube-system'

COMPUTED VALUES:
affinity: {}
annotations: null
debug: false
dryRun: false
excludedDaysOfYear: null
excludedPodNames: null
excludedTimesOfDay: null
excludedWeekdays: null
gracePeriod: -1s
image: quay.io/linki/chaoskube
imageTag: v0.14.0
includedPodNames: null
interval: 20s
labels: app-purpose=chaos
logFormat: null
metrics:
enabled: false
port: 8080
service:
port: 8080
type: ClusterIP
serviceMonitor:
additionalLabels: {}
enabled: false
minimumAge: 0s
name: chaoskube
namespaces: '!kube-system'
nodeSelector: {}
podAnnotations: {}
podLabels: {}
priorityClassName: ""
rbac:
create: false
serviceAccountName: default
replicas: 1
resources: {}
timezone: UTC
tolerations: []

HOOKS:
MANIFEST:

# Source: chaoskube/templates/deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: chaoskube01
  labels:
    app.kubernetes.io/name: chaoskube
    app.kubernetes.io/managed-by: "Helm"
    app.kubernetes.io/instance: "chaoskube01"
    helm.sh/chart: chaoskube-3.1.4
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: chaoskube
      app.kubernetes.io/instance: chaoskube01
  template:
    metadata:
      labels:
        app.kubernetes.io/name: chaoskube
        app.kubernetes.io/managed-by: "Helm"
        app.kubernetes.io/instance: "chaoskube01"
        helm.sh/chart: chaoskube-3.1.4
    spec:
      containers:
        - name: chaoskube
          image: quay.io/linki/chaoskube:v0.14.0
          args:
            - --interval=20s
            - --labels=app-purpose=chaos
            - --annotations=
            - --namespaces=!kube-system
            - --no-dry-run
            - --excluded-weekdays=
            - --excluded-times-of-day=
            - --excluded-days-of-year=
            - --timezone=UTC
            - --minimum-age=0s
            - --grace-period=-1s
            - --metrics-address=
          resources:
            {}
          securityContext:
            runAsNonRoot: true
            runAsUser: 65534
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
      serviceAccountName: "default"

NOTES:
chaoskube is running and will kill arbitrary pods every 20s.

You can follow the logs to see what chaoskube does:

POD=$(kubectl -n chaoskube get pods -l='app.kubernetes.io/instance=chaoskube01' --output=jsonpath='{.items[0].metadata.name}')
kubectl -n chaoskube logs -f $POD

root@kmaster:~# POD=$(kubectl -n chaoskube get pods -l='app.kubernetes.io/instance=chaoskube01' --output=jsonpath='{.items[0].metadata.name}')
root@kmaster:~# kubectl -n chaoskube logs -f $POD
Error from server (NotFound): the server could not find the requested resource ( pods/log chaoskube01-745fdf6db8-tlbdj)

@linki
Owner

linki commented May 5, 2020

Please capture the logs of chaoskube so we can see what's going wrong:

kubectl -n <namespace> logs -f <chaoskube pod>

Just use kubectl get pods --all-namespaces | grep chaos to find it.

@ravikumar2000
Author

I am getting timeout issues:
-bash-4.2$ kubectl get deployments,pods --all-namespaces
NAMESPACE NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
chaoskube deployment.extensions/chaoskube 1 1 1 0 39h
default deployment.extensions/nginx-shabeer 2 2 2 2 37h
kube-system deployment.extensions/coredns 2 2 2 2 11d
kube-system deployment.extensions/kubernetes-dashboard 1 1 1 1 11d
litmus deployment.extensions/chaos-operator-ce 1 1 1 1 38h
monitoring deployment.extensions/prometheus-operator 1 1 1 1 7d16h
nginx deployment.extensions/nginx 1 1 1 1 17h

NAMESPACE NAME READY STATUS RESTARTS AGE
chaoskube pod/chaoskube-7dcc48947b-vp4ns 0/1 CrashLoopBackOff 420 39h
default pod/mynginx 1/1 Running 0 12h
default pod/nginx-shabeer-85b6bf8c87-k5cdg 1/1 Running 0 37h
default pod/nginx-shabeer-85b6bf8c87-l4hh9 1/1 Running 0 37h
kube-system pod/coredns-6d5cc884f4-5f25h 1/1 Running 4 11d
kube-system pod/coredns-6d5cc884f4-ggmc2 1/1 Running 4 11d
kube-system pod/etcd-slc17twm 1/1 Running 3 11d
kube-system pod/kube-apiserver-slc17twm 1/1 Running 3 11d
kube-system pod/kube-controller-manager-slc17twm 1/1 Running 4 11d
kube-system pod/kube-flannel-ds-hjlfm 1/1 Running 0 11d
kube-system pod/kube-flannel-ds-mp227 1/1 Running 4 11d
kube-system pod/kube-flannel-ds-wfdt9 1/1 Running 0 11d
kube-system pod/kube-proxy-skc8t 1/1 Running 0 11d
kube-system pod/kube-proxy-smzzl 1/1 Running 0 11d
kube-system pod/kube-proxy-wblgs 1/1 Running 3 11d
kube-system pod/kube-scheduler-slc17twm 1/1 Running 4 11d
kube-system pod/kubernetes-dashboard-f6b58ff9c-lnj48 1/1 Running 4 11d
litmus pod/chaos-operator-ce-874897c4b-hhz48 1/1 Running 0 15h
monitoring pod/prometheus-operator-b4b6c96f8-z2gt9 2/2 Running 0 7d16h
nginx pod/nginx-dbddb74b8-9ms7f 1/1 Running 0 17h
-bash-4.2$ kubectl -n chaoskube logs -f chaoskube-7dcc48947b-vp4ns
time="2020-05-06T03:59:58Z" level=info msg="starting up" dryRun=false interval=20s version=v0.14.0
time="2020-05-06T04:00:28Z" level=fatal msg="failed to connect to cluster" err="Get https://10.96.0.1:443/version?timeout=32s: dial tcp 10.96.0.1:443: i/o timeout"
-bash-4.2$ kubectl get pods --all-namespaces | grep chaos
chaoskube chaoskube-7dcc48947b-vp4ns 0/1 CrashLoopBackOff 420 39h
litmus chaos-operator-ce-874897c4b-hhz48 1/1 Running 0 15h
-bash-4.2$

@ravikumar2000
Author

Hi Martin,

I deleted the chaoskube pod and it started running again, but now I am observing "forbidden" errors.
Please find the logs:
-bash-4.2$ kubectl logs -f chaoskube-7dcc48947b-w7fls -n chaoskube
time="2020-05-11T04:34:13Z" level=info msg="starting up" dryRun=false interval=20s version=v0.14.0
time="2020-05-11T04:34:13Z" level=info msg="connected to cluster" master="https://10.96.0.1:443" serverVersion=v1.12.10+1.0.11.el7
time="2020-05-11T04:34:13Z" level=info msg="setting pod filter" annotations= excludedPodNames="" includedPodNames="" labels="app-purpose=chaos" minimumAge=0s namespaces="!kube-system"
time="2020-05-11T04:34:13Z" level=info msg="setting quiet times" daysOfYear="[]" timesOfDay="[]" weekdays="[]"
time="2020-05-11T04:34:13Z" level=info msg="setting timezone" location=UTC name=UTC offset=0
time="2020-05-11T04:34:13Z" level=error msg="failed to terminate victim" err="pods is forbidden: User "system:serviceaccount:chaoskube:default" cannot list resource "pods" in API group "" at the cluster scope"
time="2020-05-11T04:34:33Z" level=error msg="failed to terminate victim" err="pods is forbidden: User "system:serviceaccount:chaoskube:default" cannot list resource "pods" in API group "" at the cluster scope"
time="2020-05-11T04:34:53Z" level=error msg="failed to terminate victim" err="pods is forbidden: User "system:serviceaccount:chaoskube:default" cannot list resource "pods" in API group "" at the cluster scope"
time="2020-05-11T04:35:13Z" level=error msg="failed to terminate victim" err="pods is forbidden: User "system:serviceaccount:chaoskube:default" cannot list resource "pods" in API group "" at the cluster scope"
time="2020-05-11T04:35:33Z" level=error msg="failed to terminate victim" err="pods is forbidden: User "system:serviceaccount:chaoskube:default" cannot list resource "pods" in API group "" at the cluster scope"
time="2020-05-11T04:35:53Z" level=error msg="failed to terminate victim" err="pods is forbidden: User "system:serviceaccount:chaoskube:default" cannot list resource "pods" in API group "" at the cluster scope"
time="2020-05-11T04:36:13Z" level=error msg="failed to terminate victim" err="pods is forbidden: User "system:serviceaccount:chaoskube:default" cannot list resource "pods" in API group "" at the cluster scope"

Appreciate your inputs on this.

Regards,
Ravi

@ravikumar2000
Author

Hi Martin,

chaoskube is still not killing random pods. I'd appreciate your help on this.

Regards,
Ravi

@linki
Owner

linki commented May 12, 2020

It doesn't have permission to connect to the Kubernetes API.

Install your Helm chart with --set rbac.create=true in order to create and use a ServiceAccount that has permissions to list and delete pods.
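For reference, with rbac.create=true the chart renders a dedicated ServiceAccount together with cluster-scoped permissions roughly like the sketch below; the exact names come from the release, so treat this as an approximation rather than the chart's literal manifest:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: chaoskube
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: chaoskube
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: chaoskube
subjects:
  - kind: ServiceAccount
    name: chaoskube
    namespace: chaoskube

The rendered ServiceAccount then replaces the default account in the pod spec, which is what gives chaoskube permission to list and delete pods.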

@ravikumar2000
Author

Hi Martin,

After setting rbac.create=true, I am getting an "unexpected EOF" error, but the pod is still running:
-bash-4.2$ helm install stable/chaoskube --set rbac.create=true --generate-name
NAME: chaoskube-1589295169
LAST DEPLOYED: Tue May 12 07:52:52 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
chaoskube is running and will kill arbitrary pods every 10m.

You can follow the logs to see what chaoskube does:

POD=$(kubectl -n default get pods -l='app.kubernetes.io/instance=chaoskube-1589295169' --output=jsonpath='{.items[0].metadata.name}')
kubectl -n default logs -f $POD

You are running in dry-run mode. No pod is actually terminated.
-bash-4.2$ POD=$(kubectl -n default get pods -l='app.kubernetes.io/instance=chaoskube-1589295169' --output=jsonpath='{.items[0].metadata.name}')
-bash-4.2$ kubectl -n default logs -f $POD
time="2020-05-12T14:52:54Z" level=info msg="starting up" dryRun=true interval=10m0s version=v0.14.0
time="2020-05-12T14:52:54Z" level=info msg="connected to cluster" master="https://10.96.0.1:443" serverVersion=v1.12.10+1.0.11.el7
time="2020-05-12T14:52:54Z" level=info msg="setting pod filter" annotations= excludedPodNames="" includedPodNames="" labels= minimumAge=0s namespaces=
time="2020-05-12T14:52:54Z" level=info msg="setting quiet times" daysOfYear="[]" timesOfDay="[]" weekdays="[]"
time="2020-05-12T14:52:54Z" level=info msg="setting timezone" location=UTC name=UTC offset=0
time="2020-05-12T14:52:54Z" level=info msg="terminating pod" name=nginx-deployment-7bdbfcbc9c-tr889 namespace=nginx
error: unexpected EOF

@ravikumar2000
Copy link
Author

Hi Martin,

I am only getting the "terminating pod" messages, but it is not actually killing the pods:
-bash-4.2$ kubectl get pods --all-namespaces | grep chaos
chaoskube chaoskube-844c5874bc-knxzx 1/1 Running 0 6h5m
default chaoskube-1589295169-c5bb5b85-h9kqw 1/1 Running 0 7m20s
litmus chaos-operator-ce-559656f698-v7ppg 1/1 Running 0 6h8m
-bash-4.2$ kubectl logs chaoskube-1589295169-c5bb5b85-h9kqw -f
time="2020-05-12T14:52:54Z" level=info msg="starting up" dryRun=true interval=10m0s version=v0.14.0
time="2020-05-12T14:52:54Z" level=info msg="connected to cluster" master="https://10.96.0.1:443" serverVersion=v1.12.10+1.0.11.el7
time="2020-05-12T14:52:54Z" level=info msg="setting pod filter" annotations= excludedPodNames="" includedPodNames="" labels= minimumAge=0s namespaces=
time="2020-05-12T14:52:54Z" level=info msg="setting quiet times" daysOfYear="[]" timesOfDay="[]" weekdays="[]"
time="2020-05-12T14:52:54Z" level=info msg="setting timezone" location=UTC name=UTC offset=0
time="2020-05-12T14:52:54Z" level=info msg="terminating pod" name=nginx-deployment-7bdbfcbc9c-tr889 namespace=nginx
time="2020-05-12T15:02:54Z" level=info msg="terminating pod" name=kube-flannel-ds-cggkb namespace=kube-system
time="2020-05-12T15:12:54Z" level=info msg="terminating pod" name=chaoskube-844c5874bc-knxzx namespace=chaoskube
nginx-deployment-7bdbfcbc9c-tr889

time="2020-05-12T15:22:54Z" level=info msg="terminating pod" name=kube-proxy-2qjcz namespace=kube-system
time="2020-05-12T15:32:54Z" level=info msg="terminating pod" name=kube-flannel-ds-tcx2d namespace=kube-system
rpc error: code = Unknown desc = Error: No such container: c81122bd1446bdc5ce820aef2786bae37038627fe322e11aeabd380f0d888a2a
-bash-4.2$ nginx-deployment-7bdbfcbc9c-tr889
-bash: nginx-deployment-7bdbfcbc9c-tr889: command not found
-bash-4.2$
-bash-4.2$
-bash-4.2$ kubectl get pods --all-namespaces | grep chaos
chaoskube chaoskube-844c5874bc-knxzx 1/1 Running 0 6h47m
litmus chaos-operator-ce-559656f698-v7ppg 1/1 Running 0 6h49m
-bash-4.2$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
chaoskube chaoskube-844c5874bc-knxzx 1/1 Running 0 6h47m
default shuttleproxy-c48598f74-kfm7p 1/1 Running 0 7h24m
kube-system coredns-6d5cc884f4-pdkwl 1/1 Running 0 7h56m
kube-system coredns-6d5cc884f4-ws6zt 1/1 Running 0 7h56m
kube-system etcd-slc17twm 1/1 Running 0 7h56m
kube-system kube-apiserver-slc17twm 1/1 Running 0 7h55m
kube-system kube-controller-manager-slc17twm 1/1 Running 1 7h55m
kube-system kube-flannel-ds-cggkb 1/1 Running 0 7h50m
kube-system kube-flannel-ds-tcx2d 1/1 Running 0 7h56m
kube-system kube-flannel-ds-znwzp 1/1 Running 0 7h51m
kube-system kube-proxy-2qjcz 1/1 Running 0 7h51m
kube-system kube-proxy-c6gct 1/1 Running 0 7h50m
kube-system kube-proxy-fbv4b 1/1 Running 0 7h56m
kube-system kube-scheduler-slc17twm 1/1 Running 1 7h55m
kube-system kubernetes-dashboard-f6b58ff9c-k29hw 1/1 Running 0 7h56m
litmus chaos-operator-ce-559656f698-v7ppg 1/1 Running 0 6h50m
mangle nfs-provisioner-586b498d77-82ntq 0/1 CrashLoopBackOff 34 172m
nginx nginx-deployment-7bdbfcbc9c-cqt4t 1/1 Running 0 6h45m
nginx nginx-deployment-7bdbfcbc9c-tr889 1/1 Running 0 6h45m

I can see that the nginx pods are still running.

Regards,
Ravi

@linki
Owner

linki commented May 12, 2020

Because you re-enabled dry-run mode:

time="2020-05-12T14:52:54Z" level=info msg="starting up" dryRun=true interval=10m0s version=v0.14.0

Make sure to use --set dryRun=false like you did before.
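Assuming the release name from the install above, one way to apply both settings to the existing release instead of reinstalling would be something like:

helm upgrade chaoskube-1589295169 stable/chaoskube -n default --set rbac.create=true --set dryRun=false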
