Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ACL is not working for secondary groups, only works for primary groups. #1028

Open
Matt3o12 opened this issue Mar 26, 2022 · 6 comments
Open

Comments

@Matt3o12
Copy link

Matt3o12 commented Mar 26, 2022

Describe the bug

I cannot create directories when using ACL and the group is not the primary group. I can create the directory just fine if I either become the primary group (sudo -g group -s or if I create the directory in the real path).

I am using mergerfs 2.33.3.

To Reproduce

mount the file system like this:
/mnt/hdds/hdd* /mnt/media fuse.mergerfs allow_other,use_ino,cache.files=partial,dropcacheonclose=true,category.create=eppfrd,moveonenospc=true,posix_acl=true,fsname=media-merger,cache.readdir=true,noatime 0 2 (using fstab)

Set acl permissions:

setfacl -Rm g:hostwrite:rwx,d:g:hostwrite:rwx /mnt/hdds/hdd*

Ensure acl is working in mergerfs:

$  getfacl /mnt/media/
getfacl: Removing leading '/' from absolute path names
# file: mnt/media/
# owner: 100100
# group: hostwrite
user::rwx
group::r-x
group:2000:rwx
group:hostwrite:rwx
mask::rwx
other::r-x
default:user::rwx
default:group::r-x
default:group:2000:rwx
default:group:hostwrite:rwx
default:mask::rwx
default:other::r-x

Add group to user & verify that the group is not the primary:

$ usermod -aG hostwrite matt3o12
$ sudo -u matt3o12 id
uid=1000(matt3o12) gid=1000(matt3o12) groups=1000(matt3o12),27(sudo),50(staff),102000(hostwrite)

Note: gid must not be hostwrite

I cannot create directories:

$ mkdir /mnt/media/test
mkdir: cannot create directory ‘/mnt/media/test’: Permission denied
$ mkdir /mnt/hdds/hdd{0,1}/test
(succeeds without output)
$ sudo -g hostwrite mkdir test2
(also works just fine)
$ id && sudo -g hostwrite id
uid=1000(matt3o12) gid=1000(matt3o12) groups=1000(matt3o12),27(sudo),50(staff),102000(hostwrite)
uid=1000(matt3o12) gid=102000(hostwrite) groups=102000(hostwrite),27(sudo),50(staff),1000(matt3o12)

(in the second one, gid is hostwrite and everything works as expected)

System information:

  • OS, kernel version: uname -a: Linux pve 5.13.19-5-pve #1 SMP PVE 5.13.19-13 (Tue, 08 Mar 2022 07:32:25 +0100) x86_64 GNU/Linux (pve is proxmox, a custom debian bullseye distribution running an ubuntu kernel for zfs boot support)
  • mergerfs version: mergerfs -V: mergerfs version: 2.33.3
  • A strace of the application having a problem:

app.strace.txt

Additional context

Add any other context about the problem here.

@trapexit
Copy link
Owner

https://github.com/trapexit/mergerfs#supplemental-user-groups

Did you account for this?

@Matt3o12
Copy link
Author

Hello, thanks for your fast reply.

When I created this bug, I didn't account for yet and it is working on the main system now. However, I originally encountered that bug in LXC containers and it doesn't work there (when I wrote the bug, I wanted to keep the bug as simple as possible and I could reproduce it until I rebooted it, which is not the case for containers).

Inside the containers, I still cannot create mkdir:

# sudo -u matt3o12 bash
# id
uid=1000(matt3o12) gid=1000(matt3o12) groups=1000(matt3o12),2000(hostwrite)
# mkdir test
mkdir: cannot create directory 'test': Permission denied
# sudo -g hostwrite bash
# id
uid=0(root) gid=2000(hostwrite) groups=2000(hostwrite),0(root)
# mkdir test
(succeeds)

Inside LXC containers, UIDs and GIDs get mapped, so uid 1000 becomes 101000, and root (0) becomes 100000 and hostwrite (2000) becomes 102000. Note this behavior is different from docker where no uid mapping takes place. The mergerfs filesystem is shared via rbind mounts:

lxc.mount.entry: /mnt/media mnt/media none rbind,create=dir,optional 0 0

Here is the strace for the containers:

sudo -u matt3o12 cp /tmp/container.strace.txt /mnt/nas/homes/matt3o12/
sudo strace -fvTtt -s 256 -p 1390 -o /tmp/mergerfs.strace.txt

I am not sure how to assist you best here. Are you familiar with proxmox or at least LXC containers? Proxmox should not be required for the problem here but it makes managing LXC containers very easy.

My guess is that the problem is that the cache does not account for uid mapping properly. My best guess is that another user with that uid but different groups is already in the cache from another container. If that's the case, mergerfs needs to either: account for the cgroup namespace of the process when caching or disable caching for containers. This could also be a security issue because it would allow me to escape the ACL permissions if the cache picks up a uid with "better" secondary groups first.

Is there a way to dump the cache? I'd be happy to take a look. Thank for taking the time to read and understand my issue and for your hard on the mergerfs project! :)

@trapexit
Copy link
Owner

I'll have a closer look later but...

  1. You can't practically disable caching. As the docs mention the cost of querying supplemental groups is relatively very high. It's not something that is intended to be done possibly thousands of times a second.
  2. user namespace translation isn't something that can reasonably be managed either. The kernel includes the requestors uid and gid with every request. mergerfs doesn't know if something is or isn't being used in a container. Even so the translation is already handled by the kernel. mergerfs, running in the default namespace, will get the appropriate value. As with anything else it is on you to ensure that the real ids match up. Same as if you were using NFS or other network filesystem and needed users/groups to be shared. Your containers should share users as needed for your setup. If you create supplemental groups inside a container though mergerfs has no real way to look them up without entering that namespace and running those same expensive commands mentioned prior.

There isn't a way to dump the cache as it's extremely straight forward and there was never a need: https://github.com/trapexit/mergerfs/blob/master/src/gidcache.cpp

It simply calls get and set groups based on demand.

@moonman
Copy link

moonman commented Dec 26, 2022

I've encountered the same issue, unfortunately. I did find a workaround though.
In proxmox you can enable "Fuse" under Options -> Features. After you enable it, you will be able to use mergerfs in the container itself (instead of bindmount from the host). You will, however, have to bindmount all the storage drives/directories that mergerfs puts together, to the LXC container, which becomes a little messy. In the end I'm running mergerfs on the host and in the LXC container directly.

@trapexit
Copy link
Owner

It must be remembered that containers don't automatically share users/groups. They are different systems. The details have to be shared between them. This isn't a mergerfs thing. It's a basic multi system / multi container thing. When a request is made mergerfs has to look up the supplemental groups because that's not something the kernel manages. It's userspace. And so wherever mergerfs is is where it will pick up those supplemental groups.

For your average system you can bind in /etc/passwd, /etc/group, and /etc/nsswitch.conf

@moonman
Copy link

moonman commented Dec 26, 2022

Thank you for the explanation. I suppose because other in-kernel fs drivers or not running in userspace there are no issues with them. There is a way to map container users/groups to host's user/groups with lxc.idmap, so I suppose it is another way to work around the issue, it is however much more cumbersome as one needs to maintain both groups on the host and LXC container, and can be less secure because now a user/group in the LXC container may have some permissions on the host.

I'm perfectly happy with running mergerfs within the container. It has minimal overhead and I probably get to use a newer version of mergerfs at the same time as well (arch container vs debian host). I guess I thought there was a better way, but it looks like I was wrong. Cheers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants