
wrong form. #679

Open
mikeatform opened this issue Apr 20, 2021 · 1 comment

mikeatform commented Apr 20, 2021

All the DNS names are in the hosts files, and I can ssh between the nodes.
Yesterday I just reset the offline nodes and things came back online.
Today, the same errors again.

[2021-04-20 16:03:47.685059] E [name.c:266:af_inet_client_get_remote_sockaddr] 0-glusterfs: DNS resolution failed on host srv-3:srv-4
[2021-04-20 16:03:50.685442] E [name.c:266:af_inet_client_get_remote_sockaddr] 0-glusterfs: DNS resolution failed on host srv-3:srv-4
[2021-04-20 16:03:50.725539] W [fuse-bridge.c:1276:fuse_attr_cbk] 0-glusterfs-fuse: 6885431: LOOKUP() / => -1 (Transport endpoint is not connected)
[2021-04-20 16:03:50.753676] I [fuse-bridge.c:6083:fuse_thread_proc] 0-fuse: initiating unmount of /shared
The message "E [MSGID: 101075] [common-utils.c:505:gf_resolve_ip6] 0-resolver: getaddrinfo failed (family:2) (Name or service not known)" repeated 17 times between [2021-04-20 16:02:59.633395] and [2021-04-20 16:03:50.685439]
[2021-04-20 16:03:50.753827] W [glusterfsd.c:1596:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e65) [0x7f14965bae65] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x5626eff99625] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x5626eff9948b] ) 0-: received signum (15), shutting down
[2021-04-20 16:03:50.753846] I [fuse-bridge.c:6871:fini] 0-fuse: Unmounting '/shared'.
[2021-04-20 16:03:50.753852] I [fuse-bridge.c:6876:fini] 0-fuse: Closing fuse connection to '/shared'.
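Both failures are in the client's name-resolution path (gf_resolve_ip6 / af_inet_client_get_remote_sockaddr), so a quick sanity check is whether each node can still resolve the peer hostnames through the same NSS path glibc uses. A minimal sketch, assuming the srv-* names from the log above:

```sh
# getent consults /etc/hosts as well as DNS (same lookup order glibc uses),
# unlike dig/nslookup, which query DNS directly.
for h in srv-3 srv-4; do
  getent hosts "$h" || echo "no resolution for $h"
done
```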

```
# gluster volume status
Status of volume: hpc-admin
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick serv-2:/DATA/hpc-admin/brick1         49152     0          Y       9181
Brick serv-3:/DATA/hpc-admin/brick1         49152     0          Y       10828
Brick serv-4:/DATA/hpc-admin/brick1         49152     0          Y       9264
Self-heal Daemon on localhost               N/A       N/A        Y       15218
Self-heal Daemon on serv-2                  N/A       N/A        Y       18495
Self-heal Daemon on serv-3                  N/A       N/A        Y       48312

Task Status of Volume hpc-admin
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: shared
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick serv-2:/DATA/shared/brick1            N/A       N/A        N       N/A
Brick serv-3:/DATA/shared/brick1            49153     0          Y       36391
Brick serv-4:/DATA/shared/brick1            N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       15218
Self-heal Daemon on serv-3                  N/A       N/A        Y       48312
Self-heal Daemon on serv-2                  N/A       N/A        Y       18495

Task Status of Volume shared
------------------------------------------------------------------------------
There are no active volume tasks
```
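Only the serv-3 brick of `shared` is online. One common recovery step (a sketch of the usual first move, not a fix for the underlying cause) is to force-start the volume, which respawns only the brick processes that are down, then check what is left to heal:

```sh
gluster volume start shared force   # restarts only the offline bricks
gluster volume status shared        # confirm the serv-2/serv-4 bricks are back
gluster volume heal shared info     # list entries still pending self-heal
```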

mikeatform (Author) commented:
The two nodes that aren't playing along both log this:
DATA-shared-brick1[16996]: [2021-04-20 17:48:28.186345] C [MSGID: 113081] [posix-common.c:639:posix_init] 0-shared-posix: Extended attribute not supported, exiting.
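That message comes from posix_init's startup probe: the brick process sets a test extended attribute on the brick root and exits if the filesystem refuses it. The same probe can be run by hand on the affected nodes (brick paths taken from the status output above; needs root):

```sh
# If this fails with "Operation not supported", the brick filesystem or its
# mount options reject trusted.* xattrs, which is what makes the brick
# process exit at startup.
setfattr -n trusted.glusterfs.test -v working /DATA/shared/brick1 && \
  getfattr -n trusted.glusterfs.test /DATA/shared/brick1
setfattr -x trusted.glusterfs.test /DATA/shared/brick1   # remove the test attr
```

If the probe fails, checking what the brick path is actually mounted on (e.g. `findmnt /DATA`) often explains it: a brick directory whose backing mount disappeared falls through to a filesystem without xattr support.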

@mikeatform mikeatform changed the title Working cluster with 2 volumes. spontaneously one volume can't resolve it's peer servers. wrong form. Apr 20, 2021