get_Partition_Count may not be all that useful on Solaris and Derivatives using ZFS #10

Open
szaydel opened this issue Mar 31, 2022 · 8 comments

@szaydel
Contributor

szaydel commented Mar 31, 2022

Problem

This function reads the /etc/mnttab file, looking for information about mounted partitions. However, on systems where ZFS is the dominant filesystem, which is likely the majority of Solaris, illumos, and derivatives, mnttab typically contains no references to individual drives or partitions, only ZFS mountpoints. A given drive may well be in a pool and in use, yet never appear in mnttab.
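
For illustration (pool and dataset names below are made up), mnttab on such a system typically contains only dataset-to-mountpoint entries of roughly this shape, with no /dev/dsk or /dev/rdsk paths at all:

rpool/ROOT/solaris    /        zfs    dev=...    1648700000
rpool/export          /export  zfs    dev=...    1648700000
tank/data             /data    zfs    dev=...    1648700000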

Expected behavior

If this functionality is to exist on Solaris and friends, I think it might be necessary to enhance it with checks for ZFS labels. I am not sure about the best approach here and have not had time to think it through; I just wanted to raise this in case it was not already considered.

How to reproduce

NA

Deployment information

Anything with ZFS, which at this point would include Linux, BSD, Solaris, illumos, etc.

Additional information

No response

@szaydel szaydel added the bug label Mar 31, 2022
@vonericsen
Contributor

I knew someone would ask about ZFS eventually 😄

The issue being solved when this code was added was erasing a drive that was still mounted.
On Linux, one of our labs observed that a drive formatted with ext4 and then erased could not be reformatted through GNOME Disks. It would dump an obscure error, and gparted showed odd behavior as well. I think we had to reboot, run the erase again, then reboot once more to clear this...way more complicated than anything a user should have to go through.
One of the cached files that tracks mounted disks was the problem; at least, whatever blkid was reading seemed to keep caching stale information. When the drive was formatted with exFAT or FAT32, the error did not occur.
So it seems like it only affected some file systems.
What I found while reproducing the issue was that unmounting the partition first and then erasing made the error go away, and it was easy to reformat with a new partition again. Checking the mnttab file (which varies slightly between Linux, FreeBSD, and Solaris) seemed like the easiest way to catch most cases, but I did know ZFS would be missed.
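
As a rough illustration of that kind of check, here is a minimal sketch of scanning /etc/mnttab on Solaris/illumos for entries belonging to one drive (the function name and device prefix are only examples); a whole-disk ZFS vdev would simply never match here, which is the gap this issue describes:

/* Minimal sketch, assuming the Solaris/illumos getmntent(3C) interface:
 * count mnttab entries whose device ("special") field belongs to one drive. */
#include <stdio.h>
#include <string.h>
#include <sys/mnttab.h>

static int count_mounted_partitions(const char *devPrefix) /* e.g. "/dev/dsk/c2t1d0" */
{
    int count = 0;
    FILE *mnt = fopen(MNTTAB, "r"); /* MNTTAB is "/etc/mnttab" */
    if (mnt == NULL)
        return -1;
    struct mnttab entry;
    while (getmntent(mnt, &entry) == 0) /* 0 means another entry was read */
    {
        if (entry.mnt_special != NULL &&
            strncmp(entry.mnt_special, devPrefix, strlen(devPrefix)) == 0)
        {
            ++count; /* a ZFS dataset mount ("rpool/...") never matches a /dev/dsk prefix */
        }
    }
    fclose(mnt);
    return count;
}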

I do not know whether this same kind of error will show up with ZFS partitions or not. I tested a few other partition types but only really saw the original error on ext4, and I do not recall at this point whether I tested ZFS.

I think it would be valuable to check for ZFS, at least to detect whether a drive has a file system and whether it is the boot device. These checks are currently only used by SeaTools and by some limited checks around erase operations in SeaChest, so missing ZFS should not prevent the tools from running, but a similar error could occur when erasing a ZFS disk.
I will have to research the best way to check for ZFS.

@szaydel
Contributor Author

szaydel commented Mar 31, 2022

I guess one of the assumptions being made is that the drive being operated on is not in use. :) I am not sure there will be any issues like the ones you described, though I suppose we cannot be absolutely sure. Thanks for all the context; it helps. More generally, I think detecting whether a drive has a filesystem, or is part of a filesystem like a ZFS pool, would be valuable.

@szaydel
Contributor Author

szaydel commented Apr 1, 2022

I was thinking a bit more about this, and maybe it is worthwhile to make a tiny tweak to what the programs output, so that instead of "Partition count for" it reads "Active partition count for" (or with some other word in place of "active"). It seems worth clarifying that this is not an actual partition count, because the current output can be misleading when the device in fact has one or more partitions.

@vonericsen
Contributor

Some operations should not be run while a partition is mounted; for others it is OK. It depends a lot on what is being done.

I like the idea of something like "active" or maybe "detected". I will think about the wording a bit to see if there is a better way to describe it to make sure it informs without misleading anyone.

@szaydel
Contributor Author

szaydel commented Apr 4, 2022

Yeah, terminology here is key. Right now I think the language is quite misleading and not really helpful. Thanks for giving this some thought.

vonericsen added a commit that referenced this issue Jun 22, 2022
…cture

Adding an anonymous union to begin the name change for an old variable and make the name clearer. Using "hasActiveFileSystem" to better note that it refers to something that is currently mounted.
This is far from perfect, but will cover most of the file systems out there.

[#10]

Signed-off-by: Tyler Erickson <tyler.erickson@seagate.com>
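
For context, the anonymous-union approach to renaming a field while keeping old code compiling can look roughly like this (the surrounding struct and the old field name are hypothetical; only hasActiveFileSystem comes from this commit):

/* Hypothetical sketch of a field rename via an anonymous union (C11) */
typedef struct _driveInformation /* hypothetical struct name */
{
    union
    {
        bool fileSystemInfo;      /* hypothetical old name, preserved for source compatibility */
        bool hasActiveFileSystem; /* new, clearer name: true when a mounted file system is found */
    };
    /* ...other fields... */
} driveInformation;
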
@vonericsen
Contributor

I have been continuing to think about this issue to figure out a solution.
I pushed a change to rename the variable to be slightly clearer. I decided that "active" made a lot of sense to use instead.

For ZFS, I did some reading on how it works, config files, and the various zfs and zpool commands to get an idea of what we can do.
It looks like the file /etc/zfs/zpool.cache could be parsed to check for drives containing a ZFS pool. I will need to do more research before we try this, including figuring out how to properly parse the file, but I think it could be a solution for systems using ZFS.
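
If that route is taken, one hedged sketch (assuming libnvpair is available; zpool.cache is a packed nvlist with one top-level pair per imported pool, and the file is optional so it may not exist) might look like this:

#include <stdio.h>
#include <stdlib.h>
#include <libnvpair.h>

static int list_cached_pools(const char *cachePath) /* e.g. "/etc/zfs/zpool.cache" */
{
    FILE *f = fopen(cachePath, "rb");
    if (f == NULL)
        return -1; /* the cache file is optional and may simply not exist */
    fseek(f, 0, SEEK_END);
    long len = ftell(f);
    rewind(f);
    if (len <= 0)
    {
        fclose(f);
        return -1;
    }
    char *buf = malloc((size_t)len);
    if (buf == NULL || fread(buf, 1, (size_t)len, f) != (size_t)len)
    {
        free(buf);
        fclose(f);
        return -1;
    }
    fclose(f);

    nvlist_t *cache = NULL;
    if (nvlist_unpack(buf, (size_t)len, &cache, 0) != 0)
    {
        free(buf);
        return -1;
    }
    /* each top-level pair's name is a pool name; its value is that pool's config nvlist */
    for (nvpair_t *pair = nvlist_next_nvpair(cache, NULL); pair != NULL;
         pair = nvlist_next_nvpair(cache, pair))
    {
        printf("cached pool: %s\n", nvpair_name(pair));
    }
    nvlist_free(cache);
    free(buf);
    return 0;
}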

@szaydel
Contributor Author

szaydel commented Jun 22, 2022

@vonericsen, it is possible to parse the file, yeah, but it is not guaranteed to exist; it is optional, though it exists by default. What might be more sensible is to figure out whether a given drive has a ZFS label, which would look something like this:

root@bsr-595529b8:~# zdb -l /dev/rdsk/c2t1d0s0
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 'p01'
    state: 0
    txg: 107675
    pool_guid: 3642064540761792299
    errata: 0
    hostid: 1211643264
    hostname: 'bsr-595529b8'
    top_guid: 12419761296629501954
    guid: 2040682562661304656
    vdev_children: 1
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 12419761296629501954
        metaslab_array: 68
        metaslab_shift: 29
        ashift: 9
        asize: 10724048896
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 2040682562661304656
            path: '/dev/dsk/c2t1d0s0'
            devid: 'id1,sd@n6000c29ca094599ccc159b2508fcfe08/a'
            phys_path: '/pci@0,0/pci15ad,1976@10/sd@1,0:a'
            whole_disk: 1
            DTL: 885
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 910833938474671540
            path: '/dev/dsk/c2t2d0s0'
            devid: 'id1,sd@n6000c2974567955b1151ff4984e0970b/a'
            phys_path: '/pci@0,0/pci15ad,1976@10/sd@2,0:a'
            whole_disk: 1
            DTL: 884
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 0 1 2 3

From this information, maybe it is then a matter of asking whether the given pool is imported. Or maybe we do not even need to go that far and can stop at the presence of a label. If a label is present, I think it is safe to say the drive is a member of a pool, whether active or offline; at that point it is a matter of checking whether the pool is imported, which could perhaps be done by inspecting /etc/mnttab.
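
One way to act on the label idea, as a rough sketch: shell out to zdb and treat a successful exit as "a label was found". This assumes zdb exits non-zero when it cannot read any label on the device, which should be confirmed on each platform, and the device path is just an example:

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

static bool device_has_zfs_label(const char *rawDevicePath) /* e.g. "/dev/rdsk/c2t1d0s0" */
{
    char command[512];
    snprintf(command, sizeof(command), "zdb -l %s > /dev/null 2>&1", rawDevicePath);
    /* assumption: zdb exits 0 when it dumps at least one valid label, non-zero otherwise */
    return system(command) == 0;
}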

@vonericsen
Contributor

I will look into the label as well to see how that detection could be added!
Thanks for this idea too!
