Unable to mount local volume at boot (Debian bullseye) #4308

Open
figure002 opened this issue Feb 13, 2024 · 3 comments

figure002 commented Feb 13, 2024

Description of problem:

I have a GlusterFS cluster with 3 nodes. Each node serves as both server and client, since every node runs Docker containers that use the GlusterFS volume.

My /etc/fstab:

/dev/mapper/xenon--vg-gluster /data/glusterfs/vol1/brick1 xfs defaults 0 0
localhost:/vol1 /mnt/gluster glusterfs defaults,_netdev 0 0

However, the volume fails to mount at boot. If I then log in via SSH and run mount -a, the mount works just fine.

The exact command to reproduce the issue:

ssh xenon
reboot
ssh xenon
df -h  # volume is not mounted
mount -a
df -h  # volume is successfully mounted
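
For reference, the failing mount corresponds to the systemd unit mnt-gluster.mount (as seen in the syslog below); its state after a reboot can be inspected with standard systemd commands, for example:

systemctl status mnt-gluster.mount
journalctl -b -u mnt-gluster.mount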

Expected results:

The volume should automatically mount at boot.

Mandatory info:

- The output of the gluster volume info command:

Volume Name: vol1
Type: Replicate
Volume ID: 59265c39-58ca-48b6-ad4c-ce3a6cbb4aa3
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: xenon:/data/glusterfs/vol1/brick1/brick
Brick2: neon:/data/glusterfs/vol1/brick1/brick
Brick3: helium:/data/glusterfs/vol1/brick1/brick
Options Reconfigured:
cluster.self-heal-daemon: enable
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
performance.client-io-threads: off

- The output of the gluster volume status command:

Status of volume: vol1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick xenon:/data/gluster
fs/vol1/brick1/brick                        52406     0          Y       1320 
Brick neon:/data/glusterf
s/vol1/brick1/brick                         52033     0          Y       1023 
Brick helium:/data/gluste
rfs/vol1/brick1/brick                       53215     0          Y       1387 
Self-heal Daemon on localhost               N/A       N/A        Y       1067 
Self-heal Daemon on helium                  N/A       N/A        Y       1428 
Self-heal Daemon on xenon                   N/A       N/A        Y       1362 
 
Task Status of Volume vol1
------------------------------------------------------------------------------
There are no active volume tasks

- The output of the gluster volume heal command:

$ sudo gluster volume heal vol1 statistics heal-count
Gathering count of entries to be healed on volume vol1 has been successful 

Brick xenon:/data/glusterfs/vol1/brick1/brick
Number of entries: 1

Brick neon:/data/glusterfs/vol1/brick1/brick
Number of entries: 1

Brick helium:/data/glusterfs/vol1/brick1/brick
Number of entries: 0

- Provide logs present on following locations of client and server nodes -

/var/log/syslog

Feb 13 17:12:31 neon systemd[1]: Started GlusterFS, a clustered file-system server.
Feb 13 17:12:31 neon systemd[1]: Mounting /mnt/gluster...
Feb 13 17:12:31 neon systemd[1]: Starting Docker Application Container Engine...
Feb 13 17:12:31 neon mount[927]: Mount failed. Check the log file  for more details.
Feb 13 17:12:31 neon systemd[1]: mnt-gluster.mount: Mount process exited, code=exited, status=1/FAILURE
Feb 13 17:12:31 neon systemd[1]: mnt-gluster.mount: Failed with result 'exit-code'.
Feb 13 17:12:31 neon systemd[1]: Failed to mount /mnt/gluster.
Feb 13 17:12:31 neon systemd[1]: Dependency failed for Remote File Systems.
Feb 13 17:12:31 neon systemd[1]: remote-fs.target: Job remote-fs.target/start failed with result 'dependency'.

/var/log/glusterfs/glusterd.log

[2024-02-12 23:04:50.261009 +0000] I [MSGID: 106496] [glusterd-handshake.c:922:__server_getspec] 0-management: Received mount request for volume /vol1 
[2024-02-13 16:06:51.057196 +0000] I [MSGID: 106061] [glusterd-utils.c:9576:glusterd_volume_status_copy_to_op_ctx_dict] 0-management: Dict get failed [{Key=count}] 
[2024-02-13 16:06:51.057795 +0000] I [MSGID: 106499] [glusterd-handler.c:4535:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vol1 
[2024-02-13 16:08:02.181366 +0000] I [MSGID: 106533] [glusterd-volume-ops.c:724:__glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume vol1 
[2024-02-13 16:08:39.034280 +0000] I [MSGID: 106496] [glusterd-handshake.c:922:__server_getspec] 0-management: Received mount request for volume vol1 
[2024-02-13 16:09:16.910247 +0000] I [MSGID: 106533] [glusterd-volume-ops.c:724:__glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume vol1 
[2024-02-13 16:10:07.142481 +0000] I [MSGID: 106496] [glusterd-handshake.c:922:__server_getspec] 0-management: Received mount request for volume vol1 
[2024-02-13 16:09:39.194735 +0000] I [MSGID: 106533] [glusterd-volume-ops.c:724:__glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume vol1 
[2024-02-13 16:10:40.504086 +0000] I [MSGID: 106496] [glusterd-handshake.c:922:__server_getspec] 0-management: Received mount request for volume vol1 
[2024-02-13 16:11:58.926563 +0000] W [glusterfsd.c:1427:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7f6e73b9cea7] -->/usr/sbin/glusterd(+0x12725) [0x5593754f3725] -->/usr/sbin/glusterd>
[2024-02-13 16:12:28.746919 +0000] I [MSGID: 100030] [glusterfsd.c:2872:main] 0-/usr/sbin/glusterd: Started running version [{arg=/usr/sbin/glusterd}, {version=11.1}, {cmdlinestr=/usr/sbin/glusterd -p /var/run/gl>
[2024-02-13 16:12:28.748213 +0000] I [glusterfsd.c:2562:daemonize] 0-glusterfs: Pid of current running process is 583
[2024-02-13 16:12:28.768104 +0000] I [MSGID: 0] [glusterfsd.c:1597:volfile_init] 0-glusterfsd-mgmt: volume not found, continuing with init 
[2024-02-13 16:12:28.839481 +0000] I [MSGID: 106479] [glusterd.c:1660:init] 0-management: Using /var/lib/glusterd as working directory 
[2024-02-13 16:12:28.839518 +0000] I [MSGID: 106479] [glusterd.c:1664:init] 0-management: Using /var/run/gluster as pid file working directory 
[2024-02-13 16:12:28.847678 +0000] I [socket.c:973:__socket_server_bind] 0-socket.management: process started listening on port (24007)
[2024-02-13 16:12:28.850741 +0000] I [socket.c:916:__socket_server_bind] 0-socket.management: closing (AF_UNIX) reuse check socket 13
[2024-02-13 16:12:28.851997 +0000] I [MSGID: 106059] [glusterd.c:1923:init] 0-management: max-port override: 60999 
[2024-02-13 16:12:29.129934 +0000] E [MSGID: 106061] [glusterd.c:597:glusterd_crt_georep_folders] 0-glusterd: Dict get failed [{Key=log-group}, {errno=2}, {error=No such file or directory}] 
[2024-02-13 16:12:31.514666 +0000] I [MSGID: 106513] [glusterd-store.c:2198:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 100000 
[2024-02-13 16:12:31.528340 +0000] W [MSGID: 106204] [glusterd-store.c:3273:glusterd_store_update_volinfo] 0-management: Unknown key: tier-enabled 
[2024-02-13 16:12:31.528503 +0000] W [MSGID: 106204] [glusterd-store.c:3273:glusterd_store_update_volinfo] 0-management: Unknown key: brick-0 
[2024-02-13 16:12:31.528531 +0000] W [MSGID: 106204] [glusterd-store.c:3273:glusterd_store_update_volinfo] 0-management: Unknown key: brick-1 
[2024-02-13 16:12:31.528565 +0000] W [MSGID: 106204] [glusterd-store.c:3273:glusterd_store_update_volinfo] 0-management: Unknown key: brick-2 
[2024-02-13 16:12:31.532496 +0000] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: 4bc4e6c4-5b8a-4445-9e2a-17582cbb4882 
[2024-02-13 16:12:31.560340 +0000] I [MSGID: 106498] [glusterd-handler.c:3794:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0 
[2024-02-13 16:12:31.563230 +0000] I [MSGID: 106498] [glusterd-handler.c:3794:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0 
[2024-02-13 16:12:31.563286 +0000] W [MSGID: 106061] [glusterd-handler.c:3589:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout 
[2024-02-13 16:12:31.563326 +0000] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2024-02-13 16:12:31.564245 +0000] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
Final graph:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option rpc-auth-allow-insecure on
  7:     option transport.listen-backlog 1024
  8:     option max-port 60999
  9:     option event-threads 1
 10:     option ping-timeout 0
 11:     option transport.socket.listen-port 24007
 12:     option transport.socket.read-fail-log off
 13:     option transport.socket.keepalive-interval 2
 14:     option transport.socket.keepalive-time 10
 15:     option transport-type socket
 16:     option working-directory /var/lib/glusterd
 17: end-volume
 18:  
+------------------------------------------------------------------------------+
[2024-02-13 16:12:31.564241 +0000] W [MSGID: 106061] [glusterd-handler.c:3589:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout 
[2024-02-13 16:12:31.569188 +0000] I [MSGID: 101188] [event-epoll.c:643:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=0}] 
[2024-02-13 16:12:31.654880 +0000] I [MSGID: 106496] [glusterd-handshake.c:922:__server_getspec] 0-management: Received mount request for volume /vol1 
[2024-02-13 16:12:31.661620 +0000] I [MSGID: 106493] [glusterd-rpc-ops.c:461:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: bd6e8bc3-bcf8-46c0-8c92-89e7e646e766, host: xenon, por>
[2024-02-13 16:12:31.671605 +0000] I [MSGID: 106163] [glusterd-handshake.c:1493:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 100000 
[2024-02-13 16:12:31.683010 +0000] I [MSGID: 106493] [glusterd-rpc-ops.c:461:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: 05219b5b-5f44-49f9-a594-2ecab369f9cf, host: helium, port: 0 
[2024-02-13 16:12:31.692899 +0000] I [glusterd-utils.c:6446:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/glusterfs/vol1/brick1/brick
[2024-02-13 16:12:31.693080 +0000] I [MSGID: 106492] [glusterd-handler.c:2896:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 05219b5b-5f44-49f9-a594-2ecab369f9cf 
[2024-02-13 16:12:31.693115 +0000] I [MSGID: 106502] [glusterd-handler.c:2943:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend 
[2024-02-13 16:12:31.693375 +0000] I [MSGID: 106490] [glusterd-handler.c:2691:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: bd6e8bc3-bcf8-46c0-8c92-89e7e646e766 
[2024-02-13 16:12:31.693485 +0000] E [MSGID: 106010] [glusterd-utils.c:3542:glusterd_compare_friend_volume] 0-management: Version of Cksums vol1 differ. local cksum = 1538809992, remote cksum = 2332600560 on peer>
[2024-02-13 16:12:31.693648 +0000] I [MSGID: 106493] [glusterd-handler.c:3982:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to xenon (0), ret: 0, op_ret: -1 
[2024-02-13 16:12:31.709791 +0000] I [MSGID: 106493] [glusterd-rpc-ops.c:675:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: 05219b5b-5f44-49f9-a594-2ecab369f9cf 
[2024-02-13 16:12:31.714733 +0000] I [MSGID: 106163] [glusterd-handshake.c:1493:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 100000 
[2024-02-13 16:12:31.723285 +0000] I [MSGID: 106496] [glusterd-handshake.c:922:__server_getspec] 0-management: Received mount request for volume vol1.neon.data-glusterfs-vol1-brick1-brick 
[2024-02-13 16:12:31.724044 +0000] I [MSGID: 106490] [glusterd-handler.c:2691:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 05219b5b-5f44-49f9-a594-2ecab369f9cf 
[2024-02-13 16:12:31.737725 +0000] I [MSGID: 106493] [glusterd-handler.c:3982:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to helium (0), ret: 0, op_ret: 0 
[2024-02-13 16:12:31.766989 +0000] I [MSGID: 106492] [glusterd-handler.c:2896:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 05219b5b-5f44-49f9-a594-2ecab369f9cf 
[2024-02-13 16:12:31.767029 +0000] I [MSGID: 106502] [glusterd-handler.c:2943:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend 
[2024-02-13 16:12:31.767108 +0000] I [MSGID: 106493] [glusterd-rpc-ops.c:675:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: 05219b5b-5f44-49f9-a594-2ecab369f9cf 
[2024-02-13 16:12:31.863685 +0000] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2024-02-13 16:12:31.915261 +0000] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-quotad: setting frame-timeout to 600
[2024-02-13 16:12:31.915448 +0000] I [MSGID: 106131] [glusterd-proc-mgmt.c:81:glusterd_proc_stop] 0-management: quotad already stopped 
[2024-02-13 16:12:31.915471 +0000] I [MSGID: 106568] [glusterd-svc-mgmt.c:262:glusterd_svc_stop] 0-management: quotad service is stopped 
[2024-02-13 16:12:31.915513 +0000] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-bitd: setting frame-timeout to 600
[2024-02-13 16:12:31.916113 +0000] I [MSGID: 106131] [glusterd-proc-mgmt.c:81:glusterd_proc_stop] 0-management: bitd already stopped 
[2024-02-13 16:12:31.916136 +0000] I [MSGID: 106568] [glusterd-svc-mgmt.c:262:glusterd_svc_stop] 0-management: bitd service is stopped 
[2024-02-13 16:12:31.916188 +0000] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-scrub: setting frame-timeout to 600
[2024-02-13 16:12:31.916349 +0000] I [MSGID: 106131] [glusterd-proc-mgmt.c:81:glusterd_proc_stop] 0-management: scrub already stopped 
[2024-02-13 16:12:31.916371 +0000] I [MSGID: 106568] [glusterd-svc-mgmt.c:262:glusterd_svc_stop] 0-management: scrub service is stopped 
[2024-02-13 16:12:31.916984 +0000] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600
[2024-02-13 16:12:31.917674 +0000] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-gfproxyd: setting frame-timeout to 600
[2024-02-13 16:12:31.928704 +0000] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-glustershd: setting frame-timeout to 600
[2024-02-13 16:12:32.933400 +0000] I [MSGID: 106496] [glusterd-handshake.c:922:__server_getspec] 0-management: Received mount request for volume shd/vol1 

/var/log/glusterfs/mnt-gluster.log

[2024-02-13 16:00:04.798773 +0000] I [socket.c:835:__socket_shutdown] 0-vol1-client-0: intentional socket shutdown(6)
[2024-02-13 16:02:10.976749 +0000] I [socket.c:835:__socket_shutdown] 0-vol1-client-0: intentional socket shutdown(6)
[2024-02-13 16:04:17.102808 +0000] I [socket.c:835:__socket_shutdown] 0-vol1-client-0: intentional socket shutdown(6)
[2024-02-13 16:06:23.229738 +0000] I [socket.c:835:__socket_shutdown] 0-vol1-client-0: intentional socket shutdown(6)
[2024-02-13 16:08:29.349078 +0000] I [socket.c:835:__socket_shutdown] 0-vol1-client-0: intentional socket shutdown(6)
[2024-02-13 16:10:35.464729 +0000] I [socket.c:835:__socket_shutdown] 0-vol1-client-0: intentional socket shutdown(6)
[2024-02-13 16:11:50.073610 +0000] W [glusterfsd.c:1427:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7fe871728ea7] -->/usr/sbin/glusterfs(+0x12725) [0x564a029dc725] -->/usr/sbin/glusterfs(cleanup_and_exit+0x57) [0x564a029d4d77] ) 0-: received signum (15), shutting down 
[2024-02-13 16:11:50.073672 +0000] I [fuse-bridge.c:7051:fini] 0-fuse: Unmounting '/mnt/gluster'.
[2024-02-13 16:11:50.074065 +0000] I [fuse-bridge.c:6308:fuse_thread_proc] 0-fuse: initiating unmount of /mnt/gluster
[2024-02-13 16:11:50.074385 +0000] I [fuse-bridge.c:7055:fini] 0-fuse: Closing fuse connection to '/mnt/gluster'.
[2024-02-13 16:12:31.640599 +0000] I [MSGID: 100030] [glusterfsd.c:2872:main] 0-/usr/sbin/glusterfs: Started running version [{arg=/usr/sbin/glusterfs}, {version=11.1}, {cmdlinestr=/usr/sbin/glusterfs --process-name fuse --volfile-server=localhost --volfile-id=/vol1 /mnt/gluster}] 
[2024-02-13 16:12:31.646405 +0000] I [glusterfsd.c:2562:daemonize] 0-glusterfs: Pid of current running process is 997
[2024-02-13 16:12:31.654643 +0000] I [MSGID: 101188] [event-epoll.c:643:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=0}] 
[2024-02-13 16:12:31.654708 +0000] I [MSGID: 101188] [event-epoll.c:643:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=1}] 
[2024-02-13 16:12:31.655640 +0000] I [glusterfsd-mgmt.c:2346:mgmt_getspec_cbk] 0-glusterfs: Received list of available volfile servers: xenon:24007 helium:24007 
[2024-02-13 16:12:31.685731 +0000] I [io-stats.c:3784:ios_sample_buf_size_configure] 0-vol1: Configure ios_sample_buf  size is 1024 because ios_sample_interval is 0
[2024-02-13 16:12:31.690764 +0000] I [MSGID: 114020] [client.c:2344:notify] 0-vol1-client-0: parent translators are ready, attempting connect on transport [] 
[2024-02-13 16:12:31.691551 +0000] I [MSGID: 114020] [client.c:2344:notify] 0-vol1-client-1: parent translators are ready, attempting connect on transport [] 
[2024-02-13 16:12:31.691953 +0000] I [MSGID: 114020] [client.c:2344:notify] 0-vol1-client-2: parent translators are ready, attempting connect on transport [] 
Final graph:
+------------------------------------------------------------------------------+
  1: volume vol1-client-0
  2:     type protocol/client
  3:     option ping-timeout 42
  4:     option remote-host xenon
  5:     option remote-subvolume /data/glusterfs/vol1/brick1/brick
  6:     option transport-type socket
  7:     option transport.address-family inet
  8:     option username f1e91cf1-2ed6-49b1-8acd-219ee712957e
  9:     option password 5fc710ba-079a-487b-b63a-aed73bede85a
 10:     option transport.socket.ssl-enabled off
 11:     option transport.tcp-user-timeout 0
 12:     option transport.socket.keepalive-time 20
 13:     option transport.socket.keepalive-interval 2
 14:     option transport.socket.keepalive-count 9
 15:     option strict-locks off
 16: end-volume
 17:  
 18: volume vol1-client-1
 19:     type protocol/client
 20:     option ping-timeout 42
 21:     option remote-host neon
 22:     option remote-subvolume /data/glusterfs/vol1/brick1/brick
 23:     option transport-type socket
 24:     option transport.address-family inet
 25:     option username f1e91cf1-2ed6-49b1-8acd-219ee712957e
 26:     option password 5fc710ba-079a-487b-b63a-aed73bede85a
 27:     option transport.socket.ssl-enabled off
 28:     option transport.tcp-user-timeout 0
 29:     option transport.socket.keepalive-time 20
 30:     option transport.socket.keepalive-interval 2
 31:     option transport.socket.keepalive-count 9
 32:     option strict-locks off
 33: end-volume
 34:  
 35: volume vol1-client-2
 36:     type protocol/client
 37:     option ping-timeout 42
 38:     option remote-host helium
 39:     option remote-subvolume /data/glusterfs/vol1/brick1/brick
 40:     option transport-type socket
 41:     option transport.address-family inet
 42:     option username f1e91cf1-2ed6-49b1-8acd-219ee712957e
 43:     option password 5fc710ba-079a-487b-b63a-aed73bede85a
 44:     option transport.socket.ssl-enabled off
 45:     option transport.tcp-user-timeout 0
 46:     option transport.socket.keepalive-time 20
 47:     option transport.socket.keepalive-interval 2
 48:     option transport.socket.keepalive-count 9
 49:     option strict-locks off
 50: end-volume
 51:  
 52: volume vol1-replicate-0
 53:     type cluster/replicate
 54:     option afr-pending-xattr vol1-client-0,vol1-client-1,vol1-client-2
 55:     option volume-id 59265c39-58ca-48b6-ad4c-ce3a6cbb4aa3
 56:     option granular-entry-heal on
 57:     option use-compound-fops off
 58:     option use-anonymous-inode yes
 59:     subvolumes vol1-client-0 vol1-client-1 vol1-client-2
 60: end-volume
 61:  
 62: volume vol1-dht
 63:     type cluster/distribute
 64:     option lock-migration off
 65:     option force-migration off
 66:     subvolumes vol1-replicate-0
 67: end-volume
 68:  
 69: volume vol1-utime
 70:     type features/utime
 71:     option noatime on
 72:     subvolumes vol1-dht
 73: end-volume
 74:  
 75: volume vol1-write-behind
 76:     type performance/write-behind
 77:     subvolumes vol1-utime
 78: end-volume
 79:  
 80: volume vol1-open-behind
 81:     type performance/open-behind
 82:     subvolumes vol1-write-behind
 83: end-volume
[2024-02-13 16:12:31.694026 +0000] E [MSGID: 114058] [client-handshake.c:946:client_query_portmap_cbk] 0-vol1-client-0: failed to get the port number for remote subvolume. Please run gluster volume status on server to see if brick process is running [] 
[2024-02-13 16:12:31.694065 +0000] I [socket.c:835:__socket_shutdown] 0-vol1-client-0: intentional socket shutdown(13)
[2024-02-13 16:12:31.694130 +0000] I [MSGID: 114018] [client.c:2242:client_rpc_notify] 0-vol1-client-0: disconnected from client, process will keep trying to connect glusterd until brick's port is available [{conn-name=vol1-client-0}] 
[2024-02-13 16:12:31.694149 +0000] E [MSGID: 108006] [afr-common.c:6103:__afr_handle_child_down_event] 0-vol1-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up. 
 84:  
 85: volume vol1-quick-read
 86:     type performance/quick-read
 87:     subvolumes vol1-open-behind
 88: end-volume
 89:  
 90: volume vol1-md-cache
 91:     type performance/md-cache
 92:     subvolumes vol1-quick-read
 93: end-volume
 94:  
 95: volume vol1
 96:     type debug/io-stats
 97:     option log-level INFO
 98:     option threads 16
[2024-02-13 16:12:31.694443 +0000] I [rpc-clnt.c:1899:rpc_clnt_reconfig] 0-vol1-client-2: changing port to 53215 (from 0)
[2024-02-13 16:12:31.694486 +0000] I [socket.c:835:__socket_shutdown] 0-vol1-client-2: intentional socket shutdown(15)
 99:     option latency-measurement off
100:     option count-fop-hits off
101:     option global-threading off
102:     subvolumes vol1-md-cache
103: end-volume
104:  
105: volume meta-autoload
106:     type meta
107:     subvolumes vol1
108: end-volume
109:  
+------------------------------------------------------------------------------+
[2024-02-13 16:12:31.695808 +0000] I [MSGID: 114057] [client-handshake.c:871:select_server_supported_programs] 0-vol1-client-2: Using Program [{Program-name=GlusterFS 4.x v1}, {Num=1298437}, {Version=400}] 
[2024-02-13 16:12:31.698027 +0000] I [MSGID: 114046] [client-handshake.c:621:client_setvolume_cbk] 0-vol1-client-2: Connected, attached to remote volume [{conn-name=vol1-client-2}, {remote_subvol=/data/glusterfs/vol1/brick1/brick}] 
[2024-02-13 16:12:31.698057 +0000] I [MSGID: 108005] [afr-common.c:6025:__afr_handle_child_up_event] 0-vol1-replicate-0: Subvolume 'vol1-client-2' came back up; going online. 
[2024-02-13 16:12:31.710676 +0000] E [MSGID: 114058] [client-handshake.c:946:client_query_portmap_cbk] 0-vol1-client-1: failed to get the port number for remote subvolume. Please run gluster volume status on server to see if brick process is running [] 
[2024-02-13 16:12:31.710802 +0000] I [socket.c:835:__socket_shutdown] 0-vol1-client-1: intentional socket shutdown(14)
[2024-02-13 16:12:31.710867 +0000] I [MSGID: 114018] [client.c:2242:client_rpc_notify] 0-vol1-client-1: disconnected from client, process will keep trying to connect glusterd until brick's port is available [{conn-name=vol1-client-1}] 
[2024-02-13 16:12:31.713338 +0000] I [fuse-bridge.c:5298:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.32
[2024-02-13 16:12:31.713359 +0000] I [fuse-bridge.c:5927:fuse_graph_sync] 0-fuse: switched to graph 0
[2024-02-13 16:12:31.715049 +0000] W [MSGID: 108027] [afr-common.c:2889:afr_attempt_readsubvol_set] 0-vol1-replicate-0: no read subvols for / 
[2024-02-13 16:12:31.715113 +0000] E [fuse-bridge.c:5365:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected)
[2024-02-13 16:12:31.715982 +0000] W [fuse-bridge.c:1403:fuse_attr_cbk] 0-glusterfs-fuse: 17: LOOKUP() / => -1 (Transport endpoint is not connected)
[2024-02-13 16:12:31.720745 +0000] W [fuse-bridge.c:1403:fuse_attr_cbk] 0-glusterfs-fuse: 18: LOOKUP() / => -1 (Transport endpoint is not connected)
[2024-02-13 16:12:31.721302 +0000] I [fuse-bridge.c:6308:fuse_thread_proc] 0-fuse: initiating unmount of /mnt/gluster
The message "W [MSGID: 108027] [afr-common.c:2889:afr_attempt_readsubvol_set] 0-vol1-replicate-0: no read subvols for /" repeated 2 times between [2024-02-13 16:12:31.715049 +0000] and [2024-02-13 16:12:31.720735 +0000]
[2024-02-13 16:12:31.722241 +0000] W [glusterfsd.c:1427:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7fe6c2524ea7] -->/usr/sbin/glusterfs(+0x12725) [0x56075d8d3725] -->/usr/sbin/glusterfs(cleanup_and_exit+0x57) [0x56075d8cbd77] ) 0-: received signum (15), shutting down 
[2024-02-13 16:12:31.722263 +0000] I [fuse-bridge.c:7051:fini] 0-fuse: Unmounting '/mnt/gluster'.
[2024-02-13 16:12:31.722273 +0000] I [fuse-bridge.c:7055:fini] 0-fuse: Closing fuse connection to '/mnt/gluster'.

/var/log/glusterfs/glustershd.log

[2024-02-13 16:12:31.940979 +0000] I [glusterfsd.c:2562:daemonize] 0-glusterfs: Pid of current running process is 1067
[2024-02-13 16:12:31.948767 +0000] I [socket.c:916:__socket_server_bind] 0-socket.glusterfsd: closing (AF_UNIX) reuse check socket 10
[2024-02-13 16:12:31.950140 +0000] I [MSGID: 101188] [event-epoll.c:643:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=0}] 
[2024-02-13 16:12:31.950202 +0000] I [MSGID: 101188] [event-epoll.c:643:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=1}] 
[2024-02-13 16:12:32.933733 +0000] I [glusterfsd-mgmt.c:2346:mgmt_getspec_cbk] 0-glusterfs: Received list of available volfile servers: xenon:24007 helium:24007 
[2024-02-13 16:12:32.935792 +0000] I [io-stats.c:3784:ios_sample_buf_size_configure] 0-vol1: Configure ios_sample_buf  size is 1024 because ios_sample_interval is 0
[2024-02-13 16:12:32.936230 +0000] W [inode.c:1701:inode_table_with_invalidator] 0-vol1-replicate-0: Set inode table size to minimum value of 65536 rather than the configured 128
[2024-02-13 16:12:32.938827 +0000] I [MSGID: 114020] [client.c:2344:notify] 0-vol1-client-0: parent translators are ready, attempting connect on transport [] 
[2024-02-13 16:12:32.939299 +0000] I [MSGID: 114020] [client.c:2344:notify] 0-vol1-client-1: parent translators are ready, attempting connect on transport [] 
[2024-02-13 16:12:32.940435 +0000] I [MSGID: 114020] [client.c:2344:notify] 0-vol1-client-2: parent translators are ready, attempting connect on transport [] 
Final graph:
+------------------------------------------------------------------------------+
  1: volume vol1-client-0
  2:     type protocol/client
  3:     option ping-timeout 42
  4:     option remote-host xenon
  5:     option remote-subvolume /data/glusterfs/vol1/brick1/brick
  6:     option transport-type socket
  7:     option transport.address-family inet
  8:     option username f1e91cf1-2ed6-49b1-8acd-219ee712957e
  9:     option password 5fc710ba-079a-487b-b63a-aed73bede85a
 10:     option transport.socket.ssl-enabled off
 11:     option transport.tcp-user-timeout 0
 12:     option transport.socket.keepalive-time 20
 13:     option transport.socket.keepalive-interval 2
 14:     option transport.socket.keepalive-count 9
 15:     option strict-locks off
 16: end-volume
 17:  
 18: volume vol1-client-1
 19:     type protocol/client
 20:     option ping-timeout 42
 21:     option remote-host neon
 22:     option remote-subvolume /data/glusterfs/vol1/brick1/brick
 23:     option transport-type socket
 24:     option transport.address-family inet
 25:     option username f1e91cf1-2ed6-49b1-8acd-219ee712957e
 26:     option password 5fc710ba-079a-487b-b63a-aed73bede85a
 27:     option transport.socket.ssl-enabled off
 28:     option transport.tcp-user-timeout 0
 29:     option transport.socket.keepalive-time 20
 30:     option transport.socket.keepalive-interval 2
 31:     option transport.socket.keepalive-count 9
 32:     option strict-locks off
 33: end-volume
 34:  
 35: volume vol1-client-2
 36:     type protocol/client
 37:     option ping-timeout 42
 38:     option remote-host helium
 39:     option remote-subvolume /data/glusterfs/vol1/brick1/brick
 40:     option transport-type socket
 41:     option transport.address-family inet
 42:     option username f1e91cf1-2ed6-49b1-8acd-219ee712957e
 43:     option password 5fc710ba-079a-487b-b63a-aed73bede85a
 44:     option transport.socket.ssl-enabled off
 45:     option transport.tcp-user-timeout 0
 46:     option transport.socket.keepalive-time 20
 47:     option transport.socket.keepalive-interval 2
 48:     option transport.socket.keepalive-count 9
 49:     option strict-locks off
 50: end-volume
 51:  
 52: volume vol1-replicate-0
 53:     type cluster/replicate
 54:     option node-uuid 4bc4e6c4-5b8a-4445-9e2a-17582cbb4882
 55:     option afr-pending-xattr vol1-client-0,vol1-client-1,vol1-client-2
 56:     option volume-id 59265c39-58ca-48b6-ad4c-ce3a6cbb4aa3
 57:     option background-self-heal-count 0
 58:     option metadata-self-heal on
 59:     option data-self-heal on
 60:     option entry-self-heal on
 61:     option self-heal-daemon enable
 62:     option granular-entry-heal on
 63:     option use-compound-fops off
 64:     option use-anonymous-inode yes
 65:     option iam-self-heal-daemon yes
 66:     subvolumes vol1-client-0 vol1-client-1 vol1-client-2
 67: end-volume
 68:  
 69: volume vol1
 70:     type debug/io-stats
 71:     option log-level INFO
 72:     option threads 16
 73:     subvolumes vol1-replicate-0
 74: end-volume
 75:  
+------------------------------------------------------------------------------+
[2024-02-13 16:12:32.942576 +0000] E [MSGID: 114058] [client-handshake.c:946:client_query_portmap_cbk] 0-vol1-client-0: failed to get the port number for remote subvolume. Please run gluster volume status on server to see if brick process is running [] 
[2024-02-13 16:12:32.942640 +0000] I [socket.c:835:__socket_shutdown] 0-vol1-client-0: intentional socket shutdown(13)
[2024-02-13 16:12:32.942774 +0000] I [MSGID: 114018] [client.c:2242:client_rpc_notify] 0-vol1-client-0: disconnected from client, process will keep trying to connect glusterd until brick's port is available [{conn-name=vol1-client-0}] 
[2024-02-13 16:12:32.943053 +0000] E [MSGID: 108006] [afr-common.c:6103:__afr_handle_child_down_event] 0-vol1-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up. 
[2024-02-13 16:12:32.943176 +0000] I [rpc-clnt.c:1899:rpc_clnt_reconfig] 0-vol1-client-2: changing port to 53215 (from 0)
[2024-02-13 16:12:32.943239 +0000] I [socket.c:835:__socket_shutdown] 0-vol1-client-2: intentional socket shutdown(15)
[2024-02-13 16:12:32.943859 +0000] I [rpc-clnt.c:1899:rpc_clnt_reconfig] 0-vol1-client-1: changing port to 52033 (from 0)
[2024-02-13 16:12:32.943922 +0000] I [socket.c:835:__socket_shutdown] 0-vol1-client-1: intentional socket shutdown(14)
[2024-02-13 16:12:32.944659 +0000] I [MSGID: 114057] [client-handshake.c:871:select_server_supported_programs] 0-vol1-client-2: Using Program [{Program-name=GlusterFS 4.x v1}, {Num=1298437}, {Version=400}] 
[2024-02-13 16:12:32.944724 +0000] I [MSGID: 114057] [client-handshake.c:871:select_server_supported_programs] 0-vol1-client-1: Using Program [{Program-name=GlusterFS 4.x v1}, {Num=1298437}, {Version=400}] 
[2024-02-13 16:12:32.945902 +0000] I [MSGID: 114046] [client-handshake.c:621:client_setvolume_cbk] 0-vol1-client-2: Connected, attached to remote volume [{conn-name=vol1-client-2}, {remote_subvol=/data/glusterfs/vol1/brick1/brick}] 
[2024-02-13 16:12:32.945945 +0000] I [MSGID: 108005] [afr-common.c:6025:__afr_handle_child_up_event] 0-vol1-replicate-0: Subvolume 'vol1-client-2' came back up; going online. 
[2024-02-13 16:12:32.984035 +0000] I [MSGID: 114046] [client-handshake.c:621:client_setvolume_cbk] 0-vol1-client-1: Connected, attached to remote volume [{conn-name=vol1-client-1}, {remote_subvol=/data/glusterfs/vol1/brick1/brick}] 
[2024-02-13 16:12:32.984079 +0000] I [MSGID: 108002] [afr-common.c:6404:afr_notify] 0-vol1-replicate-0: Client-quorum is met 
[2024-02-13 16:13:24.600239 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-vol1-client-2: remote operation failed. [{path=<gfid:1eeaa006-58a0-4b46-a9a9-02ecdc60f80f>}, {gfid=1eeaa006-58a0-4b46-a9a9-02ecdc60f80f}, {errno=2}, {error=No such file or directory}] 
[2024-02-13 16:13:32.596500 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-vol1-client-2: remote operation failed. [{path=<gfid:c2d60428-161a-49e0-8dad-6c625ba35870>}, {gfid=c2d60428-161a-49e0-8dad-6c625ba35870}, {errno=2}, {error=No such file or directory}] 
[2024-02-13 16:13:38.484030 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-vol1-client-2: remote operation failed. [{path=<gfid:edc8971f-11a5-42bd-8709-a852a66a4907>}, {gfid=edc8971f-11a5-42bd-8709-a852a66a4907}, {errno=2}, {error=No such file or directory}] 
[2024-02-13 16:13:40.656444 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-vol1-client-2: remote operation failed. [{path=<gfid:70138b51-ab8b-488d-a2ac-efbab2ad2418>}, {gfid=70138b51-ab8b-488d-a2ac-efbab2ad2418}, {errno=2}, {error=No such file or directory}] 
[2024-02-13 16:14:38.037780 +0000] I [socket.c:835:__socket_shutdown] 0-vol1-client-0: intentional socket shutdown(13)
[2024-02-13 16:16:44.169669 +0000] I [socket.c:835:__socket_shutdown] 0-vol1-client-0: intentional socket shutdown(13)
[2024-02-13 16:18:50.274405 +0000] I [socket.c:835:__socket_shutdown] 0-vol1-client-0: intentional socket shutdown(13)
[2024-02-13 16:20:56.385071 +0000] I [socket.c:835:__socket_shutdown] 0-vol1-client-0: intentional socket shutdown(13)
[2024-02-13 16:23:02.494493 +0000] I [socket.c:835:__socket_shutdown] 0-vol1-client-0: intentional socket shutdown(13)

Additional info:

In an attempt to fix the above issue, I created some systemd drop-in files to ensure that the volume isn't mounted until the GlusterFS service is started:

# /etc/systemd/system/mnt-gluster.mount.d/override.conf
[Unit]
After=glusterd.service
Wants=glusterd.service

# /etc/systemd/system/docker.service.d/override.conf
[Unit]
After=glusterd.service
Wants=glusterd.service

I can tell from the timestamps that the mount isn't attempted until GlusterFS has started, but it still fails. The logs above were captured with the drop-in files in place.
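
For completeness, another approach that is sometimes suggested for this class of problem is to defer the mount until first access with a systemd automount, so that nothing tries to mount the volume during early boot. A sketch of such an fstab entry (untested here; the extra options are standard systemd mount options, not GlusterFS-specific):

# /etc/fstab (hypothetical alternative entry)
localhost:/vol1 /mnt/gluster glusterfs defaults,_netdev,noauto,x-systemd.automount,x-systemd.requires=glusterd.service 0 0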

- The operating system / glusterfs version:

Debian GNU/Linux 11 (bullseye)

glusterfs-server:
Installed: 11.1-1

glusterfs-client:
Installed: 11.1-1

figure002 changed the title Unable to mount volume at boot (Debian bullseye) Unable to mount local GlusterFS volume at boot (Debian bullseye) Feb 14, 2024
figure002 changed the title Unable to mount local GlusterFS volume at boot (Debian bullseye) Unable to mount local volume at boot (Debian bullseye) Feb 14, 2024
aravindavk (Member) commented

This issue has been discussed many times on the mailing list (https://lists.gluster.org/pipermail/gluster-users/2015-March/021333.html) and in other issues. We need to come up with a proper solution for this use case.

Issue:

  • Glusterd is up, but the bricks are not ready yet (on the current node or on other nodes).
  • The mount process exits since no subvolumes are up.
[2024-02-13 16:12:31.713359 +0000] I [fuse-bridge.c:5927:fuse_graph_sync] 0-fuse: switched to graph 0
[2024-02-13 16:12:31.715049 +0000] W [MSGID: 108027] [afr-common.c:2889:afr_attempt_readsubvol_set] 0-vol1-replicate-0: no read subvols for / 
[2024-02-13 16:12:31.715113 +0000] E [fuse-bridge.c:5365:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected)
[2024-02-13 16:12:31.715982 +0000] W [fuse-bridge.c:1403:fuse_attr_cbk] 0-glusterfs-fuse: 17: LOOKUP() / => -1 (Transport endpoint is not connected)
[2024-02-13 16:12:31.720745 +0000] W [fuse-bridge.c:1403:fuse_attr_cbk] 0-glusterfs-fuse: 18: LOOKUP() / => -1 (Transport endpoint is not connected)
[2024-02-13 16:12:31.721302 +0000] I [fuse-bridge.c:6308:fuse_thread_proc] 0-fuse: initiating unmount of /mnt/gluster
The message "W [MSGID: 108027] [afr-common.c:2889:afr_attempt_readsubvol_set] 0-vol1-replicate-0: no read subvols for /" repeated 2 times between [2024-02-13 16:12:31.715049 +0000] and [2024-02-13 16:12:31.720735 +0000]
[2024-02-13 16:12:31.722241 +0000] W [glusterfsd.c:1427:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7fe6c2524ea7] -->/usr/sbin/glusterfs(+0x12725) [0x56075d8d3725] -->/usr/sbin/glusterfs(cleanup_and_exit+0x57) [0x56075d8cbd77] ) 0-: received signum (15), shutting down 
[2024-02-13 16:12:31.722263 +0000] I [fuse-bridge.c:7051:fini] 0-fuse: Unmounting '/mnt/gluster'.
[2024-02-13 16:12:31.722273 +0000] I [fuse-bridge.c:7055:fini] 0-fuse: Closing fuse connection to '/mnt/gluster'.

Expected:

  • The mount process should stay alive and keep retrying to connect to the bricks / re-check quorum.
  • The mount command should return success. If the volume is not usable, clients will get ENOTCONN during I/O anyway.

@amarts @pranithk what do you think?

mohit84 (Contributor) commented Feb 15, 2024

I think we can do it after making the option transport.socket.ignore-enoent configurable. As of now the option is hard-coded and used by glusterd: it calls connect in a loop, at most 5 times with a 1-second sleep between attempts, even if connect returns ENOENT.

adu99999 commented

We have many glusterd nodes, so we sleep 10 seconds and retry until all subvolumes are up, with a limit of 20 retries.
I don't know the common solution, but we use this approach and it works fine.
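
A minimal sketch of that kind of retry wrapper (a hypothetical helper script, not part of GlusterFS; the mount point and limits are only examples):

#!/bin/sh
# Keep retrying the fstab mount until the volume is mounted or the retry limit is hit.
MOUNTPOINT=/mnt/gluster
for i in $(seq 1 20); do
    mountpoint -q "$MOUNTPOINT" && exit 0   # already mounted
    mount "$MOUNTPOINT" && exit 0           # retry the fstab entry
    sleep 10
done
echo "giving up on $MOUNTPOINT after 20 attempts" >&2
exit 1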
