This repository has been archived by the owner on Jul 7, 2020. It is now read-only.

glusterfs pods will not deploy #341

Closed
Justluckyg opened this issue Sep 4, 2017 · 116 comments

@Justluckyg

Justluckyg commented Sep 4, 2017

I'm trying to deploy GlusterFS in a Kubernetes cluster that was deployed via kubespray. I have 3 VMs (bare metal running CentOS 7). I believe I followed all the prerequisites, but I'm getting the following once I run ./gk-deploy -g:

./gk-deploy -g
Welcome to the deployment tool for GlusterFS on Kubernetes and OpenShift.

Before getting started, this script has some requirements of the execution
environment and of the container platform that you should verify.

The client machine that will run this script must have:
 * Administrative access to an existing Kubernetes or OpenShift cluster
 * Access to a python interpreter 'python'

Each of the nodes that will host GlusterFS must also have appropriate firewall
rules for the required GlusterFS ports:
 * 2222  - sshd (if running GlusterFS in a pod)
 * 24007 - GlusterFS Management
 * 24008 - GlusterFS RDMA
 * 49152 to 49251 - Each brick for every volume on the host requires its own
   port. For every new brick, one new port will be used starting at 49152. We
   recommend a default range of 49152-49251 on each host, though you can adjust
   this to fit your needs.

The following kernel modules must be loaded:
 * dm_snapshot
 * dm_mirror
 * dm_thin_pool

For systems with SELinux, the following settings need to be considered:
 * virt_sandbox_use_fusefs should be enabled on each node to allow writing to
   remote GlusterFS volumes

In addition, for an OpenShift deployment you must:
 * Have 'cluster_admin' role on the administrative account doing the deployment
 * Add the 'default' and 'router' Service Accounts to the 'privileged' SCC
 * Have a router deployed that is configured to allow apps to access services
   running in the cluster

Do you wish to proceed with deployment?

[Y]es, [N]o? [Default: Y]: y
Using Kubernetes CLI.
2017-09-04 15:33:58.778503 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-04 15:33:58.778568 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-04 15:33:58.778582 I | proto: duplicate proto type registered: google.protobuf.Timestamp
Using namespace "default".
Checking for pre-existing resources...
  GlusterFS pods ... not found.
  deploy-heketi pod ... not found.
  heketi pod ... not found.
Creating initial resources ... 2017-09-04 15:34:07.986783 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-04 15:34:07.986853 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-04 15:34:07.986867 I | proto: duplicate proto type registered: google.protobuf.Timestamp
Error from server (AlreadyExists): error when creating "/root/gluster-kubernetes/deploy/kube-templates/heketi-service-account.yaml": serviceaccounts "heketi-service-account" already exists
2017-09-04 15:34:08.288683 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-04 15:34:08.288765 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-04 15:34:08.288779 I | proto: duplicate proto type registered: google.protobuf.Timestamp
Error from server (AlreadyExists): clusterrolebindings.rbac.authorization.k8s.io "heketi-sa-view" already exists
2017-09-04 15:34:08.479687 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-04 15:34:08.479766 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-04 15:34:08.479780 I | proto: duplicate proto type registered: google.protobuf.Timestamp
clusterrolebinding "heketi-sa-view" not labeled
OK
2017-09-04 15:34:08.751038 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-04 15:34:08.751103 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-04 15:34:08.751116 I | proto: duplicate proto type registered: google.protobuf.Timestamp
error: 'storagenode' already has a value (glusterfs), and --overwrite is false
Failed to label node 'sum-vm-1'
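For reference, a minimal sketch of how the kernel-module and SELinux prerequisites listed in the script output above can be checked on each CentOS 7 node (use of firewalld for the port rules is an assumption; adjust to whatever firewall you run):

# load and verify the required kernel modules
for m in dm_snapshot dm_mirror dm_thin_pool; do modprobe "$m"; done
lsmod | grep -E 'dm_snapshot|dm_mirror|dm_thin_pool'

# allow writing to remote GlusterFS volumes under SELinux
setsebool -P virt_sandbox_use_fusefs on

# open the GlusterFS ports (firewalld assumed)
firewall-cmd --permanent --add-port=2222/tcp --add-port=24007/tcp \
  --add-port=24008/tcp --add-port=49152-49251/tcp
firewall-cmd --reload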
@jarrpa
Contributor

jarrpa commented Sep 5, 2017

Looks like you've run this more than once? What happened the first time you ran it? Otherwise you somehow manually added a storagenode=glusterfs label on the node.

BTW, this error can be overcome if you have a recent enough version of this repo with #339 merged.
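As a quick manual check, a sketch of how to see which nodes already carry the label and remove it (node name taken from the log above):

kubectl get nodes -L storagenode
kubectl label node sum-vm-1 storagenode-   # trailing '-' removes the label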

@Justluckyg
Author

@jarrpa I was getting the same thing the first time I ran it. Do you mind elaborating on the last part about overcoming it?

@jarrpa
Contributor

jarrpa commented Sep 5, 2017

Given that your log ended with Failed to label node 'sum-vm-1', I thought that's what you were raising the issue about. The PR I linked gets rid of that message.

If you're raising the issue about the proto: duplicate proto lines, I have never seen that before and have no idea what it means. :( Did the deployment fail somehow?

@Justluckyg
Author

@jarrpa I tried it and was getting this...

 Using namespace "default".
Checking for pre-existing resources...
  GlusterFS pods ... not found.
  deploy-heketi pod ... found.
  heketi pod ... not found.
  gluster-s3 pod ... not found.

@jarrpa
Contributor

jarrpa commented Sep 5, 2017

...yes, that's normal output. Did the deployment fail somehow?

@Justluckyg
Author

@jarrpa

it failed :(
Error from server (AlreadyExists): error when creating "STDIN": daemonsets.extensions "glusterfs" already exists
Waiting for GlusterFS pods to start ... pods not found.

@jarrpa
Contributor

jarrpa commented Sep 5, 2017

...okay, reset your environment:

  • Run gk-deploy -gy --abort
  • Run rm -rf /etc/glusterfs /var/lib/glusterd on every node

Then run gk-deploy -gvy. If it fails, paste the full output here along with the output of kubectl get deploy,ds,po -o wide.
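Collected as a shell sketch (running the rm over ssh on each node is an assumption; node names are the ones from the log above):

# from the client machine
./gk-deploy -gy --abort

# clean up GlusterFS state on every node
for node in sum-vm-1 sum-vm-2 sum-vm-3; do
  ssh "$node" 'rm -rf /etc/glusterfs /var/lib/glusterd'
done

# redeploy verbosely and capture the cluster state
./gk-deploy -gvy
kubectl get deploy,ds,po -o wide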

@Justluckyg
Author

Hi @jarrpa, it failed:

./gk-deploy -gvy
Using Kubernetes CLI.
2017-09-05 10:58:04.307027 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 10:58:04.307092 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 10:58:04.307108 I | proto: duplicate proto type registered: google.protobuf.Timestamp

Checking status of namespace matching 'default':
default   Active    11d
Using namespace "default".
Checking for pre-existing resources...
  GlusterFS pods ...
Checking status of pods matching '--selector=glusterfs=pod':

Timed out waiting for pods matching '--selector=glusterfs=pod'.
not found.
  deploy-heketi pod ...
Checking status of pods matching '--selector=deploy-heketi=pod':

Timed out waiting for pods matching '--selector=deploy-heketi=pod'.
not found.
  heketi pod ...
Checking status of pods matching '--selector=heketi=pod':

Timed out waiting for pods matching '--selector=heketi=pod'.
not found.
  gluster-s3 pod ...
Checking status of pods matching '--selector=glusterfs=s3-pod':

Timed out waiting for pods matching '--selector=glusterfs=s3-pod'.
not found.
Creating initial resources ... /usr/local/bin/kubectl -n default create -f /root/gluster-kubernetes/deploy/kube-templates/heketi-service-account.yaml 2>&1
2017-09-05 10:58:15.580733 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 10:58:15.580803 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 10:58:15.580816 I | proto: duplicate proto type registered: google.protobuf.Timestamp
serviceaccount "heketi-service-account" created
/usr/local/bin/kubectl -n default create clusterrolebinding heketi-sa-view --clusterrole=edit --serviceaccount=default:heketi-service-account 2>&1
2017-09-05 10:58:15.882600 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 10:58:15.882667 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 10:58:15.882679 I | proto: duplicate proto type registered: google.protobuf.Timestamp
clusterrolebinding "heketi-sa-view" created
/usr/local/bin/kubectl -n default label --overwrite clusterrolebinding heketi-sa-view glusterfs=heketi-sa-view heketi=sa-view
2017-09-05 10:58:16.071163 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 10:58:16.071223 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 10:58:16.071237 I | proto: duplicate proto type registered: google.protobuf.Timestamp
clusterrolebinding "heketi-sa-view" labeled
OK
Marking 'sum-vm-1' as a GlusterFS node.
/usr/local/bin/kubectl -n default label nodes lenddo-vm-1 storagenode=glusterfs --overwrite 2>&1
2017-09-05 10:58:16.277147 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 10:58:16.277213 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 10:58:16.277226 I | proto: duplicate proto type registered: google.protobuf.Timestamp
node "sum-vm-1" labeled
Marking 'sum-vm-2' as a GlusterFS node.
/usr/local/bin/kubectl -n default label nodes lenddo-vm-2 storagenode=glusterfs --overwrite 2>&1
2017-09-05 10:58:16.503202 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 10:58:16.503260 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 10:58:16.503273 I | proto: duplicate proto type registered: google.protobuf.Timestamp
node "sum-vm-2" labeled
Marking 'lenddo-vm-3' as a GlusterFS node.
/usr/local/bin/kubectl -n default label nodes lenddo-vm-3 storagenode=glusterfs --overwrite 2>&1
2017-09-05 10:58:16.715662 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 10:58:16.715720 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 10:58:16.715733 I | proto: duplicate proto type registered: google.protobuf.Timestamp
node "sum-vm-3" labeled
Deploying GlusterFS pods.
sed -e 's/storagenode\: glusterfs/storagenode\: 'glusterfs'/g' /root/gluster-kubernetes/deploy/kube-templates/glusterfs-daemonset.yaml | /usr/local/bin/kubectl -n default create -f - 2>&1
2017-09-05 10:58:16.928411 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 10:58:16.928467 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 10:58:16.928479 I | proto: duplicate proto type registered: google.protobuf.Timestamp
daemonset "glusterfs" created
Waiting for GlusterFS pods to start ...
Checking status of pods matching '--selector=glusterfs=pod':
glusterfs-h996x   0/1       CrashLoopBackOff   5         5m
glusterfs-hmln9   0/1       CrashLoopBackOff   5         5m
Timed out waiting for pods matching '--selector=glusterfs=pod'.
pods not found.
kubectl get deploy,ds,po -o wide
2017-09-05 11:05:25.072539 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 11:05:25.072617 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 11:05:25.072632 I | proto: duplicate proto type registered: google.protobuf.Timestamp
NAME           DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE-SELECTOR           AGE       CONTAINER(S)   IMAGE(S)                        SELECTOR
ds/glusterfs   3         2         0         2            0           storagenode=glusterfs   7m        glusterfs      gluster/gluster-centos:latest   glusterfs=pod,glusterfs-node=pod

NAME                 READY     STATUS             RESTARTS   AGE       IP              NODE
po/glusterfs-h996x   0/1       CrashLoopBackOff   6          7m        192.168.1.240   sum-vm-1
po/glusterfs-hmln9   0/1       CrashLoopBackOff   6          7m        192.168.1.241   sum-vm-2

@jarrpa
Contributor

jarrpa commented Sep 5, 2017

...Hm. So two of your GlusterFS pods are failing, and the third one is missing entirely. Is there anything useful if you run "kubectl describe" on the daemonset or pods?
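For example (resource and pod names taken from the output above):

kubectl -n default describe ds glusterfs
kubectl -n default describe pod glusterfs-h996x
kubectl -n default describe pod glusterfs-hmln9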

@Justluckyg
Author

Justluckyg commented Sep 5, 2017

# kubectl describe pod glusterfs-h996x --namespace=default
2017-09-05 11:20:25.629923 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-05 11:20:25.629983 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-05 11:20:25.629997 I | proto: duplicate proto type registered: google.protobuf.Timestamp
Name:		glusterfs-h996x
Namespace:	default
Node:		sum-vm-1/192.168.1.240
Start Time:	Tue, 05 Sep 2017 10:58:17 +0800
Labels:		controller-revision-hash=1016952396
		glusterfs=pod
		glusterfs-node=pod
		pod-template-generation=1
Annotations:	kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"DaemonSet","namespace":"default","name":"glusterfs","uid":"10ecb06a-91e6-11e7-8884-000c29580815","apiVersi...
Status:		Running
IP:		192.168.1.240
Created By:	DaemonSet/glusterfs
Controlled By:	DaemonSet/glusterfs
Containers:
  glusterfs:
    Container ID:	docker://597ef206a63bbc4a6416163fd2c60d6eecd7b9c260507107f0a5bdfcc38eb75e
    Image:		gluster/gluster-centos:latest
    Image ID:		docker-pullable://gluster/gluster-centos@sha256:e3e2881af497bbd76e4d3de90a4359d8167aa8410db2c66196f0b99df6067cb2
    Port:		<none>
    State:		Waiting
      Reason:		CrashLoopBackOff
    Last State:		Terminated
      Reason:		Error
      Exit Code:	1
      Started:		Tue, 05 Sep 2017 11:19:23 +0800
      Finished:		Tue, 05 Sep 2017 11:19:23 +0800
    Ready:		False
    Restart Count:	9
    Requests:
      cpu:		100m
      memory:		100Mi
    Liveness:		exec [/bin/bash -c systemctl status glusterd.service] delay=40s timeout=3s period=25s #success=1 #failure=15
    Readiness:		exec [/bin/bash -c systemctl status glusterd.service] delay=40s timeout=3s period=25s #success=1 #failure=15
    Environment:	<none>
    Mounts:
      /dev from glusterfs-dev (rw)
      /etc/glusterfs from glusterfs-etc (rw)
      /etc/ssl from glusterfs-ssl (ro)
      /run from glusterfs-run (rw)
      /run/lvm from glusterfs-lvm (rw)
      /sys/fs/cgroup from glusterfs-cgroup (ro)
      /var/lib/glusterd from glusterfs-config (rw)
      /var/lib/heketi from glusterfs-heketi (rw)
      /var/lib/misc/glusterfsd from glusterfs-misc (rw)
      /var/log/glusterfs from glusterfs-logs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-1n7cc (ro)
Conditions:
  Type		Status
  Initialized 	True
  Ready 	False
  PodScheduled 	True
Volumes:
  glusterfs-heketi:
    Type:	HostPath (bare host directory volume)
    Path:	/var/lib/heketi
  glusterfs-run:
    Type:	EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  glusterfs-lvm:
    Type:	HostPath (bare host directory volume)
    Path:	/run/lvm
  glusterfs-etc:
    Type:	HostPath (bare host directory volume)
    Path:	/etc/glusterfs
  glusterfs-logs:
    Type:	HostPath (bare host directory volume)
    Path:	/var/log/glusterfs
  glusterfs-config:
    Type:	HostPath (bare host directory volume)
    Path:	/var/lib/glusterd
  glusterfs-dev:
    Type:	HostPath (bare host directory volume)
    Path:	/dev
  glusterfs-misc:
    Type:	HostPath (bare host directory volume)
    Path:	/var/lib/misc/glusterfsd
  glusterfs-cgroup:
    Type:	HostPath (bare host directory volume)
    Path:	/sys/fs/cgroup
  glusterfs-ssl:
    Type:	HostPath (bare host directory volume)
    Path:	/etc/ssl
  default-token-1n7cc:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-1n7cc
    Optional:	false
QoS Class:	Burstable
Node-Selectors:	storagenode=glusterfs
Tolerations:	node.alpha.kubernetes.io/notReady:NoExecute
		node.alpha.kubernetes.io/unreachable:NoExecute
Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath			Type		Reason			Message
  ---------	--------	-----	----			-------------			--------	------			-------
  22m		22m		2	kubelet, sum-vm-1					Normal		SuccessfulMountVolume	(combined from similar events): MountVolume.SetUp succeeded for volume "default-token-1n7cc"
  22m		22m		1	kubelet, sum-vm-1					Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "glusterfs-misc"
  22m		22m		1	kubelet, sum-vm-1					Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "glusterfs-dev"
  22m		22m		1	kubelet, sum-vm-1					Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "glusterfs-cgroup"
  22m		22m		1	kubelet, sum-vm-1					Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "glusterfs-ssl"
  22m		22m		1	kubelet, sum-vm-1					Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "glusterfs-heketi"
  22m		22m		1	kubelet, sum-vm-1					Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "glusterfs-lvm"
  22m		22m		1	kubelet, sum-vm-1					Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "glusterfs-etc"
  22m		22m		1	kubelet, sum-vm-1					Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "glusterfs-run"
  22m		22m		1	kubelet, sum-vm-1					Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "glusterfs-config"
  22m		1m		10	kubelet, sum-vm-1	spec.containers{glusterfs}	Normal		Pulled			Container image "gluster/gluster-centos:latest" already present on machine
  22m		1m		10	kubelet, sum-vm-1	spec.containers{glusterfs}	Normal		Created			Created container
  22m		1m		10	kubelet, sum-vm-1	spec.containers{glusterfs}	Normal		Started			Started container
  22m		13s		106	kubelet, sum-vm-1	spec.containers{glusterfs}	Warning		BackOff			Back-off restarting failed container
  22m		13s		106	kubelet, sum-vm-1					Warning		FailedSync		Error syncing pod

@jarrpa here's the output.

@erinboyd

erinboyd commented Sep 7, 2017

Does the OS you are using support systemd?

@erinboyd

erinboyd commented Sep 7, 2017

@Justluckyg can you do a describe on the pod? We are looking for an event that references 'dbus'.
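Something like the following would surface such an event (pod name taken from the earlier output; just a sketch):

kubectl -n default describe pod glusterfs-h996x | grep -i dbus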

@jarrpa
Contributor

jarrpa commented Sep 7, 2017

@erinboyd I don't see how the OS supporting systemd matters; systemd is in the container. And he just gave us the describe of the pod.

@Justluckyg sorry for the delay, can you also do a describe of the glusterfs daemonset when it reaches such a state?

@erinboyd

erinboyd commented Sep 7, 2017

@jarrpa we ran into a similar issue with the service broker integration on a non-RHEL OS. The error was bubbling up via the container...

@Justluckyg
Author

Justluckyg commented Sep 8, 2017

kubectl describe pod glusterfs-2dccz
2017-09-08 11:48:24.425706 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-08 11:48:24.425777 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-08 11:48:24.425790 I | proto: duplicate proto type registered: google.protobuf.Timestamp
Name:		glusterfs-2dccz
Namespace:	default
Node:		sum-vm-2/192.168.1.241
Start Time:	Fri, 08 Sep 2017 11:45:58 +0800
Labels:		controller-revision-hash=1016952396
		glusterfs=pod
		glusterfs-node=pod
		pod-template-generation=1
Annotations:	kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"DaemonSet","namespace":"default","name":"glusterfs","uid":"38954545-9448-11e7-8884-000c29580815","apiVersi...
Status:		Running
IP:		192.168.1.241
Created By:	DaemonSet/glusterfs
Controlled By:	DaemonSet/glusterfs
Containers:
  glusterfs:
    Container ID:	docker://46a4ffeef4c1a4682eb0ac780b49851c0384cc6e714fd8731467b052fb393f64
    Image:		gluster/gluster-centos:latest
    Image ID:		docker-pullable://gluster/gluster-centos@sha256:e3e2881af497bbd76e4d3de90a4359d8167aa8410db2c66196f0b99df6067cb2
    Port:		<none>
    State:		Waiting
      Reason:		CrashLoopBackOff
    Last State:		Terminated
      Reason:		Error
      Exit Code:	1
      Started:		Fri, 08 Sep 2017 11:47:18 +0800
      Finished:		Fri, 08 Sep 2017 11:47:18 +0800
    Ready:		False
    Restart Count:	4
    Requests:
      cpu:		100m
      memory:		100Mi
    Liveness:		exec [/bin/bash -c systemctl status glusterd.service] delay=40s timeout=3s period=25s #success=1 #failure=15
    Readiness:		exec [/bin/bash -c systemctl status glusterd.service] delay=40s timeout=3s period=25s #success=1 #failure=15
    Environment:	<none>
    Mounts:
      /dev from glusterfs-dev (rw)
      /etc/glusterfs from glusterfs-etc (rw)
      /etc/ssl from glusterfs-ssl (ro)
      /run from glusterfs-run (rw)
      /run/lvm from glusterfs-lvm (rw)
      /sys/fs/cgroup from glusterfs-cgroup (ro)
      /var/lib/glusterd from glusterfs-config (rw)
      /var/lib/heketi from glusterfs-heketi (rw)
      /var/lib/misc/glusterfsd from glusterfs-misc (rw)
      /var/log/glusterfs from glusterfs-logs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-1n7cc (ro)
Conditions:
  Type		Status
  Initialized 	True
  Ready 	False
  PodScheduled 	True
Volumes:
  glusterfs-heketi:
    Type:	HostPath (bare host directory volume)
    Path:	/var/lib/heketi
  glusterfs-run:
    Type:	EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  glusterfs-lvm:
    Type:	HostPath (bare host directory volume)
    Path:	/run/lvm
  glusterfs-etc:
    Type:	HostPath (bare host directory volume)
    Path:	/etc/glusterfs
  glusterfs-logs:
    Type:	HostPath (bare host directory volume)
    Path:	/var/log/glusterfs
  glusterfs-config:
    Type:	HostPath (bare host directory volume)
    Path:	/var/lib/glusterd
  glusterfs-dev:
    Type:	HostPath (bare host directory volume)
    Path:	/dev
  glusterfs-misc:
    Type:	HostPath (bare host directory volume)
    Path:	/var/lib/misc/glusterfsd
  glusterfs-cgroup:
    Type:	HostPath (bare host directory volume)
    Path:	/sys/fs/cgroup
  glusterfs-ssl:
    Type:	HostPath (bare host directory volume)
    Path:	/etc/ssl
  default-token-1n7cc:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-1n7cc
    Optional:	false
QoS Class:	Burstable
Node-Selectors:	storagenode=glusterfs
Tolerations:	node.alpha.kubernetes.io/notReady:NoExecute
		node.alpha.kubernetes.io/unreachable:NoExecute
Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath			Type		Reason			Message
  ---------	--------	-----	----			-------------			--------	------			-------
  2m		2m		1	kubelet, sum-vm-2					Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "glusterfs-etc"
  2m		2m		1	kubelet, sum-vm-2					Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "glusterfs-run"
  2m		2m		1	kubelet, sum-vm-2					Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "glusterfs-ssl"
  2m		2m		1	kubelet, sum-vm-2					Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "glusterfs-lvm"
  2m		2m		1	kubelet, sum-vm-2					Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "glusterfs-misc"
  2m		2m		1	kubelet, sum-vm-2					Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "glusterfs-dev"
  2m		2m		1	kubelet, sum-vm-2					Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "glusterfs-logs"
  2m		2m		1	kubelet, sum-vm-2					Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "glusterfs-cgroup"
  2m		2m		1	kubelet, sum-vm-2					Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "glusterfs-heketi"
  2m		2m		2	kubelet, sum-vm-2					Normal		SuccessfulMountVolume	(combined from similar events): MountVolume.SetUp succeeded for volume "default-token-1n7cc"
  2m		1m		5	kubelet, sum-vm-2	spec.containers{glusterfs}	Normal		Pulled			Container image "gluster/gluster-centos:latest" already present on machine
  2m		1m		5	kubelet, sum-vm-2	spec.containers{glusterfs}	Normal		Created			Created container
  2m		1m		5	kubelet, sum-vm-2	spec.containers{glusterfs}	Normal		Started			Started container
  2m		7s		13	kubelet, sum-vm-2	spec.containers{glusterfs}	Warning		BackOff			Back-off restarting failed container
  2m		7s		13	kubelet, sum-vm-2					Warning		FailedSync		Error syncing pod
cat /etc/*-release
CentOS Linux release 7.3.1611 (Core)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

CentOS Linux release 7.3.1611 (Core)
CentOS Linux release 7.3.1611 (Core)

@erinboyd @jarrpa please see the describe output above for a pod of the glusterfs daemonset; I'm running CentOS 7 (Core).

@erinboyd also to answer your question about systemd, yes.

[[ $(systemctl) =~ -\.mount ]] && echo yes || echo no
yes

@jarrpa
Contributor

jarrpa commented Sep 8, 2017

@Justluckyg That's not the daemonset, that's the pod. I'd like the output of kubectl describe ds glusterfs.

@Justluckyg
Author

@jarrpa oh, I didn't know that command; here you go:

kubectl describe ds glusterfs
2017-09-08 12:16:22.526270 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-08 12:16:22.526534 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-08 12:16:22.526548 I | proto: duplicate proto type registered: google.protobuf.Timestamp
Name:		glusterfs
Selector:	glusterfs=pod,glusterfs-node=pod
Node-Selector:	storagenode=glusterfs
Labels:		glusterfs=daemonset
Annotations:	description=GlusterFS DaemonSet
		tags=glusterfs
Desired Number of Nodes Scheduled: 3
Current Number of Nodes Scheduled: 3
Number of Nodes Scheduled with Up-to-date Pods: 3
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status:	3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:	glusterfs=pod
		glusterfs-node=pod
  Containers:
   glusterfs:
    Image:	gluster/gluster-centos:latest
    Port:	<none>
    Requests:
      cpu:		100m
      memory:		100Mi
    Liveness:		exec [/bin/bash -c systemctl status glusterd.service] delay=40s timeout=3s period=25s #success=1 #failure=15
    Readiness:		exec [/bin/bash -c systemctl status glusterd.service] delay=40s timeout=3s period=25s #success=1 #failure=15
    Environment:	<none>
    Mounts:
      /dev from glusterfs-dev (rw)
      /etc/glusterfs from glusterfs-etc (rw)
      /etc/ssl from glusterfs-ssl (ro)
      /run from glusterfs-run (rw)
      /run/lvm from glusterfs-lvm (rw)
      /sys/fs/cgroup from glusterfs-cgroup (ro)
      /var/lib/glusterd from glusterfs-config (rw)
      /var/lib/heketi from glusterfs-heketi (rw)
      /var/lib/misc/glusterfsd from glusterfs-misc (rw)
      /var/log/glusterfs from glusterfs-logs (rw)
  Volumes:
   glusterfs-heketi:
    Type:	HostPath (bare host directory volume)
    Path:	/var/lib/heketi
   glusterfs-run:
    Type:	EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
   glusterfs-lvm:
    Type:	HostPath (bare host directory volume)
    Path:	/run/lvm
   glusterfs-etc:
    Type:	HostPath (bare host directory volume)
    Path:	/etc/glusterfs
   glusterfs-logs:
    Type:	HostPath (bare host directory volume)
    Path:	/var/log/glusterfs
   glusterfs-config:
    Type:	HostPath (bare host directory volume)
    Path:	/var/lib/glusterd
   glusterfs-dev:
    Type:	HostPath (bare host directory volume)
    Path:	/dev
   glusterfs-misc:
    Type:	HostPath (bare host directory volume)
    Path:	/var/lib/misc/glusterfsd
   glusterfs-cgroup:
    Type:	HostPath (bare host directory volume)
    Path:	/sys/fs/cgroup
   glusterfs-ssl:
    Type:	HostPath (bare host directory volume)
    Path:	/etc/ssl
Events:
  FirstSeen	LastSeen	Count	From		SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----		-------------	--------	------			-------
  30m		30m		1	daemon-set			Normal		SuccessfulCreate	Created pod: glusterfs-vvr33
  30m		30m		1	daemon-set			Normal		SuccessfulCreate	Created pod: glusterfs-2dccz
  30m		30m		1	daemon-set			Normal		SuccessfulCreate	Created pod: glusterfs-2sgr0

@jarrpa
Contributor

jarrpa commented Sep 8, 2017

@Justluckyg The gluster containers write their logs to the /var/log/glusterfs directories of the nodes they're running on. Can you inspect one of the nodes and see if the glusterd.log file shows any useful error messages?
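For example, directly on one of the storage nodes (log path as stated above):

ls -la /var/log/glusterfs/
tail -n 50 /var/log/glusterfs/glusterd.log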

@Justluckyg
Author

Hi @jarrpa, there's nothing in /var/log/glusterfs; here's what I saw in /var/log/containers:

[root@sum-vm-1 containers]# tail glusterfs-2sgr0_default_glusterfs-0cba0fb47aadda22f5f0fe2aca8a260213a5849db32a7f0411e72f0e3dfe5847.log
{"log":"Couldn't find an alternative telinit implementation to spawn.\n","stream":"stderr","time":"2017-09-08T04:28:02.038319988Z"}
[root@sum-vm-1 containers]# cd ..
[root@sum-vm-1 log]# ls -la glusterfs/
total 4
drwxr-xr-x.  2 root root    6 Nov 15  2016 .
drwxr-xr-x. 12 root root 4096 Sep  4 11:28 ..

@jarrpa
Contributor

jarrpa commented Sep 8, 2017

What's your version of Kube?
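For example:

kubectl version    # client and server (Kube) versions
docker version     # Docker engine version on the node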

@jarrpa
Contributor

jarrpa commented Sep 8, 2017

You might be running into this: #298

@Justluckyg
Author

@jarrpa
Kubernetes v1.7.3+coreos.0

Docker version 1.13.1, build 092cba3

I did try downgrading to 1.12, but that version conflicts with the kubespray Ansible playbook I used to deploy the cluster: https://github.com/kubernetes-incubator/kubespray

@Justluckyg
Author

@jarrpa according to this, if I add the following to the container config, it will get systemd to run:

env:
  - name: SYSTEMD_IGNORE_CHROOT
    value: "1"
command:
  - /usr/lib/systemd/systemd
  - --system
How and where can I change it? Thanks!!

@jarrpa
Contributor

jarrpa commented Sep 8, 2017

Yes, but you don't want to do that. :) The systemd folks have said that they do not support running under --system configuration when it's not actually PID 1. Unfortunately, right now there is not a way around this. There are only two official workarounds: 1.) downgrade Docker, or 2.) Pass a flag to all kubelets in your cluster #298 (comment)

Your other option is to wait for another release of Kube v1.8, as this PR should also solve the problem: kubernetes/kubernetes#51634

Finally, if you want to get something working now, you can try out these experimental images I've put together more-or-less just for fun: #298 (comment)

@Justluckyg
Author

@jarrpa appreciate your response. When will 1.8 be released?
And as for your custom image, if I try to use that, I'll just replace the one I cloned initially, specifically the glusterfs-daemonset.yaml, right?

@jarrpa
Contributor

jarrpa commented Sep 9, 2017

Looks like no date has been set yet. And yeah, replace the glusterfs-daemonset.yaml file.

@Justluckyg
Author

@jarrpa hi again, I think I'm making some progress after cloning your YAML file.

I was getting the events below; how do I change the resource requests to something lower? I'm testing this on a desktop server only before testing in our production environment.

Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----			-------------	--------	------			-------
  6m		6m		1	daemon-set				Normal		SuccessfulCreate	Created pod: glusterfs-wc5hm
  6m		6m		1	daemon-set				Normal		SuccessfulCreate	Created pod: glusterfs-t35z8
  6m		2s		92	daemonset-controller			Warning		FailedPlacement		failed to place pod on "sum-vm-3": Node didn't have enough resource: cpu, requested: 100, used: 895, capacity: 900
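One way to lower the CPU request is to patch the daemonset in place; a sketch (the JSON path matches the daemonset template shown earlier, and 50m is only an example value):

kubectl -n default patch ds glusterfs --type=json -p \
  '[{"op":"replace","path":"/spec/template/spec/containers/0/resources/requests/cpu","value":"50m"}]'

The same requests: block can instead be lowered in deploy/kube-templates/glusterfs-daemonset.yaml before running gk-deploy.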

@Justluckyg
Author

@jarrpa but the 2 pods that have sufficient resources are still crashing; logs below:

[glusterd......] [2017-09-11 04:30:00.371392] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.10.5/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[glusterd......] [2017-09-11 04:30:00.371427] W [rpc-transport.c:287:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[glusterd......] [2017-09-11 04:30:00.371435] W [rpcsvc.c:1661:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[glusterd......] [2017-09-11 04:30:00.371441] E [MSGID: 106243] [glusterd.c:1720:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[glusterd......] [2017-09-11 04:30:02.063040] I [MSGID: 106513] [glusterd-store.c:2197:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 31004
[glusterd......] [2017-09-11 04:30:02.063186] I [MSGID: 106194] [glusterd-store.c:3776:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
[glusterd......] Final graph:
[glusterd......] +------------------------------------------------------------------------------+
[glusterd......] 1: volume management
[glusterd......] 2:     type mgmt/glusterd
[glusterd......] 3:     option rpc-auth.auth-glusterfs on
[glusterd......] 4:     option rpc-auth.auth-unix on
[glusterd......] 5:     option rpc-auth.auth-null on
[glusterd......] 6:     option rpc-auth-allow-insecure on
[glusterd......] 7:     option transport.socket.listen-backlog 128
[glusterd......] 8:     option event-threads 1
[glusterd......] 9:     option ping-timeout 0
[glusterd......] 10:     option transport.socket.read-fail-log off
[glusterd......] 11:     option transport.socket.keepalive-interval 2
[glusterd......] 12:     option transport.socket.keepalive-time 10
[glusterd......] 13:     option transport-type rdma
[glusterd......] 14:     option working-directory /var/lib/glusterd
[glusterd......] 15: end-volume
[glusterd......] 16:
[glusterd......] +------------------------------------------------------------------------------+
[glusterd......] [2017-09-11 04:30:02.064674] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[glusterd......] [2017-09-11 04:30:03.135422] I [MSGID: 106488] [glusterd-handler.c:1538:__glusterd_handle_cli_get_volume] 0-management: Received get vol req

@Justluckyg
Author

Justluckyg commented Sep 11, 2017

@jarrpa is having a working GlusterFS cluster a prerequisite before I can even run gk-deploy?

And are the instructions below the same?
https://wiki.centos.org/HowTos/GlusterFSonCentOS

One more rather "dumb" question: according to the requirements, each node should have a completely empty block device for storage. I'm running this on a test desktop server with a single disk, using vSphere ESXi to create the VMs and build a cluster from them. I only added a virtual disk to these nodes and declared those in the topology.json.

Am I required to use a totally separate physical block storage device for this to work?

@SaravanaStorageNetwork
Member

@Justluckyg

Is having a working GlusterFS cluster a prerequisite before I can even run gk-deploy?

Nope, gk-deploy helps in setting up the GlusterFS cluster.

Refer to this link for the prerequisites:
https://github.com/gluster/gluster-kubernetes/blob/master/deploy/gk-deploy#L514

Am I required to use a totally separate physical block storage device for this to work?

Not required. You can specify a virtual disk in topology.json.

@jarrpa
Contributor

jarrpa commented Oct 6, 2017

@Justluckyg As I suspected. Yes, follow the guidance we just gave: run the abort, then check whether there are any heketi svc or ep resources left. Report if there are any. Run the rm and wipefs -a commands on all storage nodes. Run the deploy again. If you get to this same state, with the copy job timing out, inspect the GlusterFS logs on the node being targeted for the mount. Also verify whether you can manually mount the gluster volume (mount -t glusterfs) from the node running the copy job.
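Roughly, as a shell sketch (the block device name is an assumption; use whatever is declared in topology.json):

# from the client machine
./gk-deploy -gy --abort
kubectl get svc,ep | grep heketi        # report anything left over

# on every storage node
rm -rf /etc/glusterfs /var/lib/glusterd
wipefs -a /dev/sdb                      # assumed device from topology.json

# then redeploy
./gk-deploy -gvy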

@Justluckyg
Author

Justluckyg commented Oct 6, 2017

@jarrpa I just did the above and was getting the same result. I manually deleted the heketi service that didn't get deleted by the gk-deploy command, but the heketi-storage copy job is still failing:

[root@lenddo-vm-1 ~]# kubectl describe pod heketi-storage-copy-job-gkszq

Name:		heketi-storage-copy-job-gkszq
Namespace:	default
Node:		lenddo-vm-2/192.168.1.241
Start Time:	Fri, 06 Oct 2017 12:08:03 +0800
Labels:		controller-uid=f2f74b50-aa4b-11e7-9eb4-000c29580815
		job-name=heketi-storage-copy-job
Annotations:	kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"Job","namespace":"default","name":"heketi-storage-copy-job","uid":"f2f74b50-aa4b-11e7-9eb4-000c29580815","...
Status:		Pending
IP:
Created By:	Job/heketi-storage-copy-job
Controlled By:	Job/heketi-storage-copy-job
Containers:
  heketi:
    Container ID:
    Image:		heketi/heketi:dev
    Image ID:
    Port:		<none>
    Command:
      cp
      /db/heketi.db
      /heketi
    State:		Waiting
      Reason:		ContainerCreating
    Ready:		False
    Restart Count:	0
    Environment:	<none>
    Mounts:
      /db from heketi-storage-secret (rw)
      /heketi from heketi-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-s24hk (ro)
Conditions:
  Type		Status
  Initialized 	True
  Ready 	False
  PodScheduled 	True
Volumes:
  heketi-storage:
    Type:		Glusterfs (a Glusterfs mount on the host that shares a pod's lifetime)
    EndpointsName:	heketi-storage-endpoints
    Path:		heketidbstorage
    ReadOnly:		false
  heketi-storage-secret:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	heketi-storage-secret
    Optional:	false
  default-token-s24hk:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-s24hk
    Optional:	false
QoS Class:	BestEffort
Node-Selectors:	<none>
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----			-------------	--------	------			-------
  5m		5m		1	default-scheduler			Normal		Scheduled		Successfully assigned heketi-storage-copy-job-gkszq to lenddo-vm-2
  5m		5m		1	kubelet, lenddo-vm-2			Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "default-token-s24hk"
  5m		5m		1	kubelet, lenddo-vm-2			Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "heketi-storage-secret"
  5m		5m		1	kubelet, lenddo-vm-2			Warning		FailedMount		MountVolume.SetUp failed for volume "heketi-storage" : glusterfs: mount failed: mount failed: exit status 1
Mounting command: mount
Mounting arguments: 192.168.1.242:heketidbstorage /var/lib/kubelet/pods/f2fc138d-aa4b-11e7-9eb4-000c29580815/volumes/kubernetes.io~glusterfs/heketi-storage glusterfs [log-level=ERROR log-file=/var/lib/kubelet/plugins/kubernetes.io/glusterfs/heketi-storage/heketi-storage-copy-job-gkszq-glusterfs.log backup-volfile-servers=192.168.1.240:192.168.1.241:192.168.1.242]
Output: ERROR: Mount point does not exist
Please specify a mount point
Usage:
man 8 /sbin/mount.glusterfs

 the following error information was pulled from the glusterfs log to help diagnose this issue:
/usr/sbin/glusterfs(+0x69a0)[0x7f0d59d799a0]
---------

  5m	1m	9	kubelet, lenddo-vm-2		Warning	FailedMount	MountVolume.SetUp failed for volume "heketi-storage" : stat /var/lib/kubelet/pods/f2fc138d-aa4b-11e7-9eb4-000c29580815/volumes/kubernetes.io~glusterfs/heketi-storage: transport endpoint is not connected
  3m	1m	2	kubelet, lenddo-vm-2		Warning	FailedMount	Unable to mount volumes for pod "heketi-storage-copy-job-gkszq_default(f2fc138d-aa4b-11e7-9eb4-000c29580815)": timeout expired waiting for volumes to attach/mount for pod "default"/"heketi-storage-copy-job-gkszq". list of unattached/unmounted volumes=[heketi-storage]
  3m	1m	2	kubelet, lenddo-vm-2		Warning	FailedSync	Error syncing pod

and this is the glusterd log from that node:

[2017-10-06 04:07:58.903816] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-10-06 04:07:58.918264] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600
[2017-10-06 04:07:58.918736] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-nfs: setting frame-timeout to 600
[2017-10-06 04:07:58.918837] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
[2017-10-06 04:07:58.918874] I [MSGID: 106568] [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: nfs service is stopped
[2017-10-06 04:07:58.919237] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-glustershd: setting frame-timeout to 600
[2017-10-06 04:07:58.919905] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: glustershd already stopped
[2017-10-06 04:07:58.919973] I [MSGID: 106568] [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: glustershd service is stopped
[2017-10-06 04:07:58.920005] I [MSGID: 106567] [glusterd-svc-mgmt.c:196:glusterd_svc_start] 0-management: Starting glustershd service
[2017-10-06 04:07:59.926164] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-quotad: setting frame-timeout to 600
[2017-10-06 04:07:59.926590] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-bitd: setting frame-timeout to 600
[2017-10-06 04:07:59.926736] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2017-10-06 04:07:59.926772] I [MSGID: 106568] [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: bitd service is stopped
[2017-10-06 04:07:59.926899] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-scrub: setting frame-timeout to 600
[2017-10-06 04:07:59.927069] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
[2017-10-06 04:07:59.927103] I [MSGID: 106568] [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: scrub service is stopped
[2017-10-06 04:08:02.146806] I [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcf4fa) [0x7f63cb75a4fa] -->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcefac) [0x7f63cb759fac] -->/lib64/libglusterfs.so.0(runner_log+0x105) [0x7f63d0781ee5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=heketidbstorage --first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2017-10-06 04:08:02.156404] E [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcf4fa) [0x7f63cb75a4fa] -->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcef52) [0x7f63cb759f52] -->/lib64/libglusterfs.so.0(runner_log+0x105) [0x7f63d0781ee5] ) 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=heketidbstorage --first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd


I did the mount -t glusterfs from node2; it didn't make any difference...

@jarrpa
Contributor

jarrpa commented Oct 6, 2017

@Justluckyg Okay, but you were able to mount, yes? And what was the exact name of the service you had to delete?

@Justluckyg
Author

@jarrpa with the mount -t glusterfs? I only did that after trying to run the gk-deploy -gvy script, while the heketi-storage copy job was in ContainerCreating.

The heketi service was the one I deleted.

@jarrpa
Contributor

jarrpa commented Oct 6, 2017

@Justluckyg Right, but it worked? And what was the EXACT name of the service, was it heketi-storage-endpoints?

@Justluckyg
Author

@jarrpa when I entered the mount -t glusterfs command it took it; there was no error. And yes for the service: heketi-storage-endpoints.

@jarrpa
Contributor

jarrpa commented Oct 6, 2017

@Justluckyg Idea: Try mounting the GlusterFS volume again, on a node not running the copy job pod, and see if you can do an ls on the mounted directory.

@Justluckyg
Author

@jarrpa should I just do mount -t glusterfs, or is there anything after that? Because when I do that on node 3 and run df -h, I don't see any newly mounted directory.

@jarrpa
Contributor

jarrpa commented Oct 6, 2017

@Justluckyg Yes: Create a new directory somewhere, then run mount -t glusterfs 192.168.1.242:heketidbstorage <SOME_DIR>. This should tell us if the GlusterFS volume can be accessed from the node you're working on. Afterwards, run umount <SOME_DIR> to unmount the volume.

@Justluckyg
Author

@jarrpa it's unable to mount:
[root@lenddo-vm-3 ~]# mkdir test
[root@lenddo-vm-3 ~]# mount -t glusterfs 192.168.1.242:heketidbstorage test
Mount failed. Please check the log file for more details.

@jarrpa
Contributor

jarrpa commented Oct 6, 2017

@Justluckyg Ah-ha! Try running the mount command again with -v.

@Justluckyg
Author

@jarrpa it didn't take it:

root@lenddo-vm-3 ~]# mount -v -t  glusterfs 192.168.1.242:heketidbstorage test
/sbin/mount.glusterfs: illegal option -- v
Usage: /sbin/mount.glusterfs <volumeserver>:<volumeid/volumeport> -o<options> <mountpoint>
Options:
man 8 /sbin/mount.glusterfs
To display the version number of the mount helper: /sbin/mount.glusterfs -V

@jarrpa
Contributor

jarrpa commented Oct 6, 2017

@Justluckyg Irritating... see if there are any logs in /var/log that mention heketidbstorage, something like grep -R heketidbstorage /var/log.

@Justluckyg
Author

@jarrpa there's a bunch; not sure which one will be helpful, so I put it here.

@jarrpa
Contributor

jarrpa commented Oct 6, 2017

@Justluckyg Good start. Look for errors towards the end of /var/log/glusterfs/bricks/var-lib-heketi-mounts-vg_bb208485241a154b8d3070d2da837a53-brick_80e7daa7eb94cae8e4d1c81ccbdad92b-brick.log.

@jarrpa
Contributor

jarrpa commented Oct 6, 2017

@Justluckyg Also /var/log/glusterfs/root-test.log.

@Justluckyg
Author

@jarrpa no errors there, all informational lines. But in glusterfs/glusterd.log I'm getting this (same on the node where the copy job is trying to create):

[2017-10-06 04:08:01.705565] I [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcf4fa) [0x7fa89dc9b4fa] -->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcefac) [0x7fa89dc9afac] -->/lib64/libglusterfs.so.0(runner_log+0x105) [0x7fa8a2cc2ee5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=heketidbstorage --first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2017-10-06 04:08:01.898044] E [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcf4fa) [0x7fa89dc9b4fa] -->/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so(+0xcef52) [0x7fa89dc9af52] -->/lib64/libglusterfs.so.0(runner_log+0x105) [0x7fa8a2cc2ee5] ) 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=heketidbstorage --first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[root@lenddo-vm-3 glusterfs]# cat /var/log/glusterfs/root-test.log
[2017-10-06 13:02:02.140216] I [MSGID: 100030] [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20 (args: /usr/sbin/glusterfs --volfile-server=192.168.1.242 --volfile-id=heketidbstorage /root/test)
pending frames:
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2017-10-06 13:02:02
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.20
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7f6a92360722]
/lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f6a92385ddd]
/lib64/libc.so.6(+0x35250)[0x7f6a909d4250]
/lib64/libglusterfs.so.0(gf_ports_reserved+0x142)[0x7f6a92386442]
/lib64/libglusterfs.so.0(gf_process_reserved_ports+0x7e)[0x7f6a923866be]
/usr/lib64/glusterfs/3.7.20/rpc-transport/socket.so(+0xc958)[0x7f6a86d50958]
/usr/lib64/glusterfs/3.7.20/rpc-transport/socket.so(client_bind+0x93)[0x7f6a86d50d83]
/usr/lib64/glusterfs/3.7.20/rpc-transport/socket.so(+0xa153)[0x7f6a86d4e153]
/lib64/libgfrpc.so.0(rpc_clnt_reconnect+0xc9)[0x7f6a9212de19]
/lib64/libgfrpc.so.0(rpc_clnt_start+0x39)[0x7f6a9212ded9]
/usr/sbin/glusterfs(glusterfs_mgmt_init+0x24c)[0x7f6a928493ac]
/usr/sbin/glusterfs(glusterfs_volumes_init+0x46)[0x7f6a928442b6]
/usr/sbin/glusterfs(main+0x810)[0x7f6a92840860]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f6a909c0b35]
/usr/sbin/glusterfs(+0x69a0)[0x7f6a928409a0]
---------

There is no root-test.log on the node where the copy job is trying to create (lenddo-vm-2).

@jarrpa
Contributor

jarrpa commented Oct 6, 2017

@Justluckyg You can ignore those errors, they're "normal". :)

The root-test log would only appear on nodes where you've mounted a gluster volume to /root/test.

Do me a favor and check the version of Gluster running in the pods (gluster --version) and the version of the GlusterFS FUSE client on your nodes (mount.glusterfs -V).
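For example (the pod name is taken from the earlier describe output; run the second command on each node):

kubectl -n default exec glusterfs-2dccz -- gluster --version
mount.glusterfs -V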

@Justluckyg
Author

@jarrpa same on all 3 nodes:

[root@lenddo-vm-3 glusterfs]# mount.glusterfs -V
glusterfs 3.7.20 built on Jan 30 2017 15:30:07
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.
[root@lenddo-vm-3 glusterfs]# gluster --version
glusterfs 3.7.20 built on Jan 30 2017 15:30:09
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

@jarrpa
Contributor

jarrpa commented Oct 6, 2017

@Justluckyg Hmm... that might be too old. Do you have any way of updating that?

@Justluckyg
Author

@jarrpa I noticed upgrading it is not as easy as just doing a yum upgrade, as it throws a lot of dependency errors.

@jarrpa
Contributor

jarrpa commented Oct 6, 2017

@Justluckyg Darn. Well, that's my current thinking, unfortunately. Can you try to see if you can resolve the dependency issues?

@Justluckyg
Author

@jarrpa sure, I'll try that. Do you have a preferred stable version?

@jarrpa
Contributor

jarrpa commented Oct 6, 2017

@Justluckyg Preferably at or newer than the version in the GlusterFS pods, which I think is 3.10.5.
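On CentOS 7 one possible route, offered only as an assumption (the CentOS Storage SIG repository, not something verified in this thread), would be:

yum install -y centos-release-gluster310   # Storage SIG repo for GlusterFS 3.10
yum update -y glusterfs-fuse glusterfs-libs glusterfs-client-xlators
mount.glusterfs -V                         # confirm the new client version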

@Justluckyg
Author

Whew... after a month of testing and your unwavering support, @jarrpa @SaravanaStorageNetwork, I finally completed the deployment and deployed my first storage PVC using your sample nginx hello-world application!
Your recommendation to upgrade Gluster ultimately resolved the issue where the heketi-storage pod was not creating, @jarrpa. I'm now running version 3.8.4 of the glusterfs-fuse package.

I can't thank you enough for helping me! Now I'm off to test this further, and with that I'm glad to close this issue now :)

@jarrpa
Contributor

jarrpa commented Oct 9, 2017

YEEEESSS!! HAH! :D Happy to hear that! Always feel free to come back any time you need additional help. Or if you just want to give us praise, we won't turn that down either. ;)

@verizonold

@jarrpa I have the right version of mount.glusterfs, and GlusterFS is running on 3 nodes. However, I still see the error: "Waiting for GlusterFS pods to start ... pods not found."

@jarrpa
Contributor

jarrpa commented Jun 11, 2018

@verizonold If you are still having trouble, please open a new issue and provide information about your environment and what you've done, as well as the output of kubectl logs <heketi_pod>.
