Strange issue with permission denied and bizarre mtime #4314

Open
Nuitari opened this issue Mar 14, 2024 · 5 comments

Nuitari commented Mar 14, 2024

Description of problem:

Randomly, we start getting permission denied errors accompanied by strange mtimes on the FUSE mount.

We could not find a way to reproduce the problem, and it happens on directories that have been present for multiple years.

The symptoms are always similar: the modification time (mtime) of the directory is set to some bizarre, inaccurate year.

From the FUSE mount point:

$ stat folder1
  File: folder1
  Size: 4096            Blocks: 8          IO Block: 131072 directory
Device: 36h/54d Inode: 12741275528126725710  Links: 2
Access: (0755/drwxr-xr-x)  Uid: (   33/www-data)   Gid: (   33/www-data)
Access: 1969-12-31 19:00:00.074592330 -0500
Modify: 4455343-06-23 16:55:20.000032721 -0400
Change: 1969-12-31 19:00:00.000000000 -0500
 Birth: -

$ ls -l folder1
ls: cannot open directory 'folder1': Permission denied

$ sudo ls -la folder1
total 10
drwxr-xr-x 2 www-data www-data 4096 Jun 23  4455343 .
drwxr-xr-x 3 www-data www-data 4096 May  6  4455343 ..
-rw-r--r-- 1 www-data www-data  170 Sep 30  2022 skin-bootstrap4.css
-rw-r--r-- 1 www-data www-data  904 Sep 30  2022 skin.css

From the brick folder (independent of which brick is checked):

$ ls -la /var/brick/folder1
total 32
drwxr-xr-x 2 www-data www-data 4096 May 10  2446 .
drwxr-xr-x 3 www-data www-data 4096 May 10  2446 ..
-rw-r--r-- 2 www-data www-data  170 Sep 30  2022 skin-bootstrap4.css
-rw-r--r-- 2 www-data www-data  904 Sep 30  2022 skin.css

$ stat /var/brick/folder1
  File: /var/brick/folder1
  Size: 4096            Blocks: 16         IO Block: 4096   directory
Device: fc03h/64515d    Inode: 3018351     Links: 2
Access: (0755/drwxr-xr-x)  Uid: (   33/www-data)   Gid: (   33/www-data)
Access: 1969-12-31 19:00:00.074592330 -0500
Modify: 2446-05-10 18:38:55.000000000 -0400
Change: 2024-01-31 01:33:04.035612866 -0500
 Birth: -

In the logs we see:

[2024-03-14 22:14:12.193346] E [MSGID: 114031] [client-rpc-fops_v2.c:2534:client4_0_opendir_cbk] 0-sharedProd-client-1: remote operation failed. Path: /folder1 (372c80c9-3769-4c40-b0d2-1a962f5efe4e) [Permission denied]
[2024-03-14 22:14:12.193739] E [MSGID: 114031] [client-rpc-fops_v2.c:2534:client4_0_opendir_cbk] 0-sharedProd-client-3: remote operation failed. Path: /folder1 (372c80c9-3769-4c40-b0d2-1a962f5efe4e) [Permission denied]
[2024-03-14 22:14:12.196071] E [MSGID: 114031] [client-rpc-fops_v2.c:2534:client4_0_opendir_cbk] 0-sharedProd-client-7: remote operation failed. Path: /folder1 (372c80c9-3769-4c40-b0d2-1a962f5efe4e) [Permission denied]
[2024-03-14 22:14:12.196081] E [MSGID: 114031] [client-rpc-fops_v2.c:2534:client4_0_opendir_cbk] 0-sharedProd-client-6: remote operation failed. Path: /folder1 (372c80c9-3769-4c40-b0d2-1a962f5efe4e) [Permission denied]
[2024-03-14 22:14:12.196227] E [MSGID: 114031] [client-rpc-fops_v2.c:2534:client4_0_opendir_cbk] 0-sharedProd-client-5: remote operation failed. Path: /folder1 (372c80c9-3769-4c40-b0d2-1a962f5efe4e) [Permission denied]
[2024-03-14 22:14:12.196258] E [MSGID: 114031] [client-rpc-fops_v2.c:2534:client4_0_opendir_cbk] 0-sharedProd-client-4: remote operation failed. Path: /folder1 (372c80c9-3769-4c40-b0d2-1a962f5efe4e) [Permission denied]
[2024-03-14 22:14:12.196361] W [fuse-bridge.c:1513:fuse_fd_cbk] 0-glusterfs-fuse: 1842104: OPENDIR() /folder1 => -1 (Permission denied)

Running sudo touch on the directory resets the timestamp, and the directory becomes accessible again.
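
For reference, one way to check whether the bogus timestamp is also recorded in GlusterFS's own metadata (and not only in the backend filesystem) is to dump the affected directory's extended attributes directly on a brick. This is only a diagnostic sketch: the exact xattrs present (for example trusted.glusterfs.mdata from the ctime feature) depend on the GlusterFS version and volume options, and the FUSE mount path below is hypothetical.

# On a brick host, as root (the trusted.* xattr namespace is not readable otherwise):
# dump all extended attributes of the affected directory in hex
getfattr -d -m . -e hex /var/brick/folder1

# compare the backend mtime with what the FUSE mount reports
stat -c '%n mtime=%y' /var/brick/folder1
stat -c '%n mtime=%y' /mnt/glusterfs/folder1    # hypothetical FUSE mount path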

Expected results:
Access as a normal user

Mandatory info:
- The output of the gluster volume info command:

Volume Name: sharedProd
Type: Replicate
Volume ID: 5955f185-5008-42cf-9cf2-aceff041c8f2
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 9 = 9
Transport-type: tcp
Bricks:
Brick1: srv1:/var/brick
Brick2: srv2:/var/brick
Brick3: srv3:/var/brick
Brick4: srv4:/var/brick
Brick5: srv5:/var/brick
Brick6: srv6:/var/brick
Brick7: srv7:/var/brick
Brick8: srv8:/var/brick
Brick9: srv9:/var/brick
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: true
storage.fips-mode-rchecksum: on
transport.address-family: inet
auth.allow: (lots of private IPs)
network.ping-timeout: 5
features.cache-invalidation: off
features.cache-invalidation-timeout: 60
performance.stat-prefetch: on
performance.cache-invalidation: false
performance.md-cache-timeout: 1
network.inode-lru-limit: 200000
cluster.shd-max-threads: 8
disperse.shd-wait-qlength: 2048
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on

- The output of the gluster volume status command:

Status of volume: sharedProd
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick srv1:/var/brick                       49152     0          Y       2287
Brick srv2:/var/brick                       49152     0          Y       2192
Brick srv3:/var/brick                       49152     0          Y       2393
Brick srv4:/var/brick                       49152     0          Y       3053
Brick srv5:/var/brick                       49152     0          Y       4698
Brick srv6:/var/brick                       49152     0          Y       1270
Brick srv7:/var/brick                       49152     0          Y       1920
Brick srv8:/var/brick                       49152     0          Y       1876
Brick srv9:/var/brick                       60618     0          Y       1473668
Self-heal Daemon on localhost               N/A       N/A        Y       2298
Self-heal Daemon on srv9                    N/A       N/A        Y       1473685
Self-heal Daemon on srv3                    N/A       N/A        Y       2404
Self-heal Daemon on srv2                    N/A       N/A        Y       2203
Self-heal Daemon on srv7                    N/A       N/A        Y       1931
Self-heal Daemon on srv4                    N/A       N/A        Y       3064
Self-heal Daemon on srv6                    N/A       N/A        Y       1324
Self-heal Daemon on srv5                    N/A       N/A        Y       4709
Self-heal Daemon on srv8                    N/A       N/A        Y       1887
 
Task Status of Volume sharedProd
------------------------------------------------------------------------------
There are no active volume tasks

- The output of the gluster volume heal command:

Usage:
volume heal <VOLNAME> [enable | disable | full |statistics [heal-count [replica <HOSTNAME:BRICKNAME>]] |info [summary | split-brain] |split-brain {bigger-file <FILE> | latest-mtime <FILE> |source-brick <HOSTNAME:BRICKNAME> [<FILE>]} |granular-entry-heal {enable | disable}]
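
The usage text above suggests the command was run without the volume name; following that usage text, the heal status for this volume would be queried with, for example:

gluster volume heal sharedProd info
gluster volume heal sharedProd info summary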

- Is there any crash? Provide the backtrace and coredump:
No crash, no coredumps

- The operating system / glusterfs version:
Mix of Ubuntu 20.04 and Ubuntu 22.04
glusterfs 10.1 on Ubuntu 22.04
glusterfs 7.2 on Ubuntu 20.04

The issue happens the same way on either version.

@aravindavk
Member

Number of Bricks: 1 x 9 = 9

The volume type looks wrong. Did you create the volume with a replica count of 9, or did you want to create a distributed replicate volume with a replica count of 3?

Please share the volume create command that was used here.

Use the command below to create a Distributed Replicate volume with replica count 3:

gluster volume create sharedProd replica 3 \
    srv1:/var/brick                        \
    srv2:/var/brick                        \
    srv3:/var/brick                        \
    srv4:/var/brick                        \
    srv5:/var/brick                        \
    srv6:/var/brick                        \
    srv7:/var/brick                        \
    srv8:/var/brick                        \
    srv9:/var/brick

Nuitari commented Mar 15, 2024

The goal is to have 9 replicas.
There is only about 20 GB of data, but we need high availability.

@aravindavk
Member

This is not a supported configuration; only replica counts 2 and 3 are tested and supported. You could explore a Disperse volume, which gives high availability and more usable space with the same number of bricks. For example, create a volume with 6 data bricks and 3 redundancy bricks: the volume size will be 6 x the size of each brick, and the volume stays available even if 3 nodes/bricks go down.
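
For reference, a dispersed layout like that could be created with something along the lines of the command below. This is only a sketch that reuses the brick layout and volume name from above for illustration (an existing volume with that name would have to be removed first, or a different name chosen); check the disperse/redundancy syntax against your GlusterFS version before using it.

gluster volume create sharedProd disperse 9 redundancy 3 \
    srv1:/var/brick                                      \
    srv2:/var/brick                                      \
    srv3:/var/brick                                      \
    srv4:/var/brick                                      \
    srv5:/var/brick                                      \
    srv6:/var/brick                                      \
    srv7:/var/brick                                      \
    srv8:/var/brick                                      \
    srv9:/var/brick

Here "disperse 9 redundancy 3" means 6 data bricks plus 3 redundancy bricks.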

@xhernandez / @pranithk Is it possible to have a redundancy count higher than the number of data bricks if high availability is more important than storage space?

@xhernandez
Contributor

xhernandez commented Mar 18, 2024 via email

Nuitari commented Mar 23, 2024

We also have a smaller testing environment:

Volume Name: shared1
Type: Replicate
Volume ID: 2073f548-b89a-4687-92f6-486ac661750b
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: testsrv1:/var/brick
Brick2: testsrv2:/var/brick
Brick3: testsrv3:/var/brick
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: true
storage.fips-mode-rchecksum: on
transport.address-family: inet
auth.allow: 10.0.0.0/8

The problem presents itself in the same way there.
All 3 nodes run glusterfs 10.1 on Ubuntu 22.04.
