Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TopoLVM should not use the /dev/topolvm devices but use lvm2 lv output as source of truth #902

Open
jakobmoellerdev opened this issue Apr 25, 2024 · 11 comments
Labels
bug Something isn't working

Comments

@jakobmoellerdev
Copy link
Contributor

Describe the bug

TopoLVM currently uses /dev/topolvm as a base for all its created devices. This goes against the lvm2 usage recommendations:

[...] Other software and scripts should use the /dev/VolumeGroupName/LogicalVolumeName format to reduce the chance of needing amendment when the software is updated. [...]

We currently do not use this recommendation and thus we have seen bugs occur that are caused by the mismatch of the devices in the Host and the devices in the topolvm-node and lvmd containers. (namely btrfs unmounting leading to a udev refresh)

What TopoLVM does currently is considered a userspace device rename and should be avoided at all costs if not synchronized via udev. We do not use or respect udev of the host system so we should not attempt to rename the devices naively.

This bug is problematic as we currently have live volumes in the wild that use /dev/topolvm so switching to the native lvm path is gonna be hard. However if we do not do this, we are at risk of getting broken everytime a udev rule is triggering a refresh of device mapper paths.

Additionally this bug leads to confusing behavior because we do not use the "path" variable that is returned by lvm2 when interacting with lvmd, causing the codebase to be unnecessarily complex. In addition in currently prohibits workflows like lvmd-based resizing of volume mounts as lvmd is not aware of the correct lvmd path.

Environments

  • Version: All known
  • OS: All known

To Reproduce
Steps to reproduce the behavior:

  1. Create a PVC with any StorageClass by TopoLVM
  2. Create a Pod consuming it, causing a Mount
  3. Open a terminal on the host node and inspect /dev, observe separate entries for /dev/vgname/lvname and /dev/topolvm based on wether viewing the container FS or the Host FS
  4. Trigger a udev rule for the device mapper that causes a rename based on /dev/mapper, observe it is not synchronized with the container and deleting the Pod/PVC fails. (this part will be fixed with our unmount behavior getting corrected in fix: btrfs unmounting + e2e tests #879)

Expected behavior
I expect TopoLVM as the defacto most popular lvm CSI driver to use default device conventions for interacting with lvm2 and not to create own devices.

Additional context
Add any other context about the problem here.

@jakobmoellerdev jakobmoellerdev added the bug Something isn't working label Apr 25, 2024
@jakobmoellerdev
Copy link
Contributor Author

Note that previously it was discussed to add a limitation to not touch /proc/mounts. I believe this is moot since we should not assume /proc/mounts to be static if it can be touched by other host processes.

@ushitora-anqou
Copy link
Contributor

Some questions about this issue:

@jakobmoellerdev
Copy link
Contributor Author

Some questions about this issue:

* I believe this issue was derived from [topolvm-node does not unmount btrfs volumes after running `parted -l` #877](https://github.com/topolvm/topolvm/issues/877), which was only about btrfs. This issue ([TopoLVM should not use the `/dev/topolvm` devices but use lvm2 lv output as source of truth #902](https://github.com/topolvm/topolvm/issues/902)) seems to be aimed at changing TopoLVM's behavior for all fs types, including ext4 and xfs. Am I correct?

This is correct, I believe this is a change that should target all devices managed by lvm2 in TopoLVM.

* What do you mean by "the "path" variable that is returned by lvm2?" Presumably [this one](https://github.com/topolvm/topolvm/blob/ea553c56a1b87dd5923156bec28814e5c9f79fc8/internal/lvmd/command/lvm_lvs.go#L38)?

Thats also correct, Im referring to the path communicated via lvs

lvs -o help
  Logical Volume Fields
  ---------------------
    lv_all                      - All fields in this section.
    lv_uuid                     - Unique identifier.
    lv_name                     - Name.  LVs created for internal use are enclosed in brackets.
    lv_full_name                - Full name of LV including its VG, namely VG/LV.
    lv_path                     - Full pathname for LV. Blank for internal LVs.
    lv_dm_path                  - Internal device-mapper pathname for LV (in /dev/mapper directory).
...
* What this issue proposes seems to be that we should fix topolvm-node to stop using `/dev/topolvm/volume` entirely and start using `/dev/vgname/lvname` instead. Am I correct again?

that is correct again. We can also use /dev/mapper under the assumption we know everything about the underlying distro but I do not recommend it.

@ushitora-anqou
Copy link
Contributor

Thank you for your reply. I'll discuss this issue with my colleagues next week (successive public holidays have started in Japan, so not all of us are working now).

@ushitora-anqou
Copy link
Contributor

Sorry for the late reply. I've discussed this with my colleagues. We agree that the current /dev/topolvm design isn't that good, but at the same time, we have some doubts that this issue is worth fixing. Here are some concerns:

  • To maintain compatibility, we can't stop supporting /dev/topolvm for a while. The new (fixed) TopoLVM will remove LVs in /dev/topolvm when they are unmounted and use /dev/vgname/lvname instead. In other words, TopoLVM must continue to support /dev/topolvm until all the LVs in /dev/topolvm are unmounted. Since TopoLVM is already being used in the wild, it will take a long time for all the users to stop using /dev/topolvm, and we can't drop its support until then.
  • Even though the LVM's documentation says we should use /dev/vgname/lvname, the kernel seems to use /dev/mapper/... according to /proc/mounts. So, we're not sure if it is correct to use /dev/vgname/lvname (if so, why doesn't the kernel use them?) Also, problems similar to topolvm-node does not unmount btrfs volumes after running parted -l #877 may arise again if we mistakenly use /proc/mounts somewhere in the future.

What do you think about these concerns?

@jakobmoellerdev
Copy link
Contributor Author

Since TopoLVM is already being used in the wild, it will take a long time for all the users to stop using /dev/topolvm, and we can't drop its support until then.

I agree, but we have to start somewhere if we want to move away from it eventually. I fully support the thought of maintaining backwards compatibility as long as possible.

Even though the LVM's documentation says we should use /dev/vgname/lvname, the kernel seems to use /dev/mapper/... according to /proc/mounts. So, we're not sure if it is correct to use /dev/vgname/lvname (if so, why doesn't the kernel use them?)

I think this can be answered via the lvm documentation (man lvm, section "VALID NAMES"):

[...] A directory bearing the name of each Volume Group is created under /dev when any of its Logical Volumes
are activated. Each active Logical Volume is accessible from this directory as a symbolic link leading to a device node. Links or nodes in
/dev/mapper are intended only for internal use and the precise format and escaping might change between releases and distributions. Other software
and scripts should use the /dev/VolumeGroupName/LogicalVolumeName format to reduce the chance of needing amendment when the software is updated.
Should you need to process the node names in /dev/mapper, you may use dmsetup splitname to separate out the original VG, LV and internal layer names.

@toshipp
Copy link
Contributor

toshipp commented May 16, 2024

@jakobmoellerdev
IIUC, your motivation for the issue is to solve the inconsistency between kernel-using device paths and topolvm-using ones, right? I am concerned that it cannot be solved even if we use /dev/VG/LV path because kernel uses /dev/mapper/ path.
Will you change topolvm to use /dev/VG/LV? If so, how do you solve this inconsistency?

@jakobmoellerdev
Copy link
Contributor Author

The motivation is to use the device paths that are controlled by lvm2 and reported in the lvs output. This is to have consistency with pretty much every lvm2 tooling around. It also allow simpler volume expansion since now the controller can rely purely on lvmd to do volume expansion without having to delegate to the node server (the only one knowing of the custom device paths at the moment).

Even if the kernel uses /dev/mapper/ that is irrelevant because topolvm should not even be aware of this. There is no need to use /dev/mapper/ because /dev/VG/LV will always be linked (by lvm2) to the appropriate /dev/mapper device. This means that calling unmount and linking on /dev/VG/LV will work even without making use of the kernels /dev/mapper paths. I wonder which logic path you are concerned about that could break?

@toshipp
Copy link
Contributor

toshipp commented May 16, 2024

If #879 hadn't been implemented, this change would have been a solution for the btrfs unmounting issue, and I wouldn't have any concerns. However, with #879 in place, this change serves only as an improvement. My worry is that it might lead to higher maintenance costs since we would need to keep both the new and the old code for compatibility reasons. I believe it would be wise to set a deadline for dropping the old code if we decide to go ahead with this.

@jakobmoellerdev
Copy link
Contributor Author

I think you make a very valid point, maybe in general we should introduce a policy to drop old maintained code paths after 1-2 minors? That being said, I am mostly afraid of any udev rules in our host os breaking TopoLVM and we cannot control our customers OS rules that much.

For a self-contained setup where the infra teams own everything (including the nodes) that is an option, but for us it really isn't good to have a potential break so deep in the driver.

@toshipp
Copy link
Contributor

toshipp commented May 17, 2024

I understood your point.

Regarding maintenance costs, we may not need to sustain the old code. We can unmount or resize the volume using /dev/VG/LV even if we use devices on /dev/topolvm for mount.
Here is my test.

$ sudo lvcreate -L1G -n test /dev/ubuntu-vg
$ ls -l /dev/ubuntu-vg/test
lrwxrwxrwx 1 root root 7 May 17 08:43 /dev/ubuntu-vg/test -> ../dm-2
$ sudo cp -a /dev/dm-2 device
$ sudo mkfs.ext4  device
$ sudo mount device mnt
$ cat /proc/mounts |grep device
/home/fukaya/t/device /home/fukaya/t/mnt ext4 rw,relatime 0 0
$ sudo blkid -p -s TYPE -s PTTYPE -o export /dev/ubuntu-vg/test
DEVNAME=/dev/ubuntu-vg/test
TYPE=ext4
$ sudo lvresize -L+1G /dev/ubuntu-vg/test
  Size of logical volume ubuntu-vg/test changed from 1.00 GiB (256 extents) to 2.00 GiB (512 extents).
  Logical volume ubuntu-vg/test successfully resized.
$ sudo resize2fs /dev/ubuntu-vg/test
resize2fs 1.46.5 (30-Dec-2021)
Filesystem at /dev/ubuntu-vg/test is mounted on /home/fukaya/t/mnt; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 1
The filesystem on /dev/ubuntu-vg/test is now 524288 (4k) blocks long.
$ df -h mnt
Filesystem             Size  Used Avail Use% Mounted on
/home/fukaya/t/device  2.0G   24K  1.9G   1% /home/fukaya/t/mnt
$ sudo umount mnt
$ cat /proc/mounts |grep -E "mnt|device"
$

So, I have no concern about this issue now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: In progress
Development

No branches or pull requests

3 participants