(WiP) zfs pool draft #289

Open · wants to merge 6 commits into master
Conversation

@cfcs commented Oct 28, 2019

This PR aims to introduce two new qvm-pool storage drivers with the following interface:

[user@dom0 ~]$ qvm-pool  --help-drivers
DRIVER         OPTIONS
zfs_encrypted  zpool_name, ask_password_domain, unload_timeout
zfs_zvol       block_devices
  • zfs_zvol, which implements VM storage as ZFS zvol volumes. In the current implementation it also handles zpool creation on each block_devices parameter, since that was the easiest way to get started from a clean slate when testing. If there are valid use-case arguments for splitting that off into a third module, that could be done (i.e. if anyone thinks it would be useful to implement a zvol pool on top of an existing dataset).
  • zfs_encrypted, which largely inherits from zfs_zvol but works on top of encrypted datasets inside zpool_name that act as "encryption key groups." The keys are optionally unloaded after a pool inactivity timeout (unload_timeout). The keys are loaded by passing a raw file descriptor to an invocation of zfs load-key {dataset}; the passphrase comes from the QRexec service qubes.AskPassword in a target domain (see the sketch after this list).
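A rough sketch of that key-loading step, assuming a run_service_for_stdio-style call like the one quoted later in this thread; the function and attribute names here (load_key, ask_password_domain) are illustrative, not the PR's actual code:

import asyncio
import os

async def load_key(app, ask_password_domain, dataset):
    """Feed a passphrase from qubes.AskPassword to `zfs load-key` via a pipe."""
    pw_vm = app.domains[ask_password_domain]
    read_fd, write_fd = os.pipe()
    try:
        # The dataset name is sent as context for the prompt; the qrexec call
        # writes the passphrase it collects to write_fd.
        await pw_vm.run_service_for_stdio(
            "qubes.AskPassword",
            input=dataset.encode() + b"\n",
            stdout=write_fd,
        )
        os.close(write_fd)
        write_fd = None
        proc = await asyncio.create_subprocess_exec(
            "zfs", "load-key", dataset, stdin=read_fd,
        )
        if await proc.wait() != 0:
            raise RuntimeError("zfs load-key failed for {}".format(dataset))
    finally:
        os.close(read_fd)
        if write_fd is not None:
            os.close(write_fd)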

When used, this results in datasets and zvols in this form:

$ zfs list -o name,type,encryption
qbs/encryption/qbsdev                          filesystem  aes-256-gcm
qbs/encryption/qbsdev/import                   filesystem  aes-256-gcm
qbs/encryption/qbsdev/import/zfsdev            filesystem  aes-256-gcm
qbs/encryption/qbsdev/vm                       filesystem  aes-256-gcm
qbs/encryption/qbsdev/vm/zfsdev                filesystem  aes-256-gcm
qbs/encryption/qbsdev/vm/zfsdev/private        volume      aes-256-gcm
qbs/encryption/qbsdev/vm/zfsdev/volatile       volume      aes-256-gcm
qbs/vm                                         filesystem  none
qbs/vm/plainbrowser                            filesystem  none
qbs/vm/plainbrowser/private                    volume      none
qbs/vm/plainbrowser/volatile                   volume      none

Adding those was a matter of:

$ qvm-pool -a qbs zfs_zvol -o 'block_devices=/dev/loopMYFILE'
$ qvm-pool -a qbsdev zfs_encrypted -o 'zpool_name=qbs'

For the purpose of key entry I am currently using this implementation of /etc/qubes-rpc/qubes.AskPassword in the ask_password_domain, which should probably be reworked (remember to chmod +x):

#!/bin/sh
read -r -N 100 -s -t 5 subject

# TODO why does this not work, does it need to strip newline?
# might be fixed now, according to @marmarek's suggestion in comment below:
# [ "${subject}" = "${subject//[^-/_a-z:A-Z0-9]}" ] || exit 1

export DISPLAY="${DISPLAY:-:0}"
zenity --forms \
  --text "${QREXEC_REMOTE_DOMAIN}:qubes.AskPassword" \
  --add-password "Password for ${subject//[^a-zA-Z]}" 2>>/tmp/askpw.log

This requires having ZFS installed from zfs-on-linux with the pyzfs3 module built.
To get that going (in addition to their instructions) I needed:

sudo qubes-dom0-update kernel-devel git sysstat bc python3-devel
./autogen.sh && \
./configure --with-config=srpm --enable-pyzfs && \
make -j3 pkg-utils rpm-dkms && \
sudo dnf install ./libuutil1-0.8.0-1.qbs4.0.x86_64.rpm     \
                 ./libzfs2-0.8.0-1.qbs4.0.x86_64.rpm       \
                 ./libnvpair1-0.8.0-1.qbs4.0.x86_64.rpm    \
                 ./python3-pyzfs-0.8.0-1.qbs4.0.noarch.rpm \
                 ./zfs-0.8.0-1.qbs4.0.x86_64.rpm

# not sure if we need these:
sudo dnf install  ./libzpool2-0.8.0-1.qbs4.0.x86_64.rpm \
                  ./libzfs2-devel-0.8.0-1.qbs4.0.x86_64.rpm

# install DKMS kernel module, this will take a while since it needs
# to compile a module for each dom0 kernel currently installed.
# may want to bring down the number of kernels:
#    rpm -qa kernel
#    sudo dnf remove kernel-4.14.74-1.pvops.qubes.x86_64
# pay attention that we want the "noarch" and not the "src" rpm:
sudo dnf install ./zfs-dkms-0.8.0-1.qbs4.0.noarch.rpm

This is merely a draft, as witnessed by the many TODO comments, but it does "work," and I thought it might be useful to place here for review, and in case someone wants to collaborate on this with me.
NOTABLY:

  • all the volume _import functions are probably horribly broken
  • the size reporting does not seem 100% accurate / implemented for all cases
  • currently no snapshotting takes place, even though there is some scaffolding in place for that
  • the "default" options for zpools/zvols/datasets are not uniformly applied
  • the suggestions for updates to the documentation are probably not 100% correct either

ping:

  • @Rudd-O who also seems to dabble in Qubes + ZFS
  • @tykling who provided invaluable help deciphering ZFS options, and might have further suggestions
  • @marmarek who provided a ton of help and suggestions re: the inner workings of Qubes and qubes-core-admin

@marmarek (Member)

# TODO why does this not work, does it need to strip newline?
# [ "${subject}" = "${subject}//[^-/_a-z:A-Z0-9]}" ] || exit 1

Extra } in the middle.

@marmarek (Member) left a comment

Added some comments, besides TODO already in the code.

Also a safety note: any unhandled exception while loading qubes.xml (__init__() of both the pool and volume classes, pool.init_volume()) will prevent qubesd from starting, which is very bad (it requires manually editing qubes.xml to fix). Avoid any possibility of such a case. One thing I'd be careful about is removing the base zfs pool while there are still encrypted pools based on it. I think you need to check for that in remove() explicitly.
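A minimal sketch of such a remove() guard, assuming the encrypted pools expose the name of their backing pool as zpool_name and that the pool object keeps a reference to app; the attribute names are assumptions, not the PR's actual code:

def remove(self):
    # Refuse to remove the base zfs pool while encrypted pools still build on it.
    dependants = [
        pool.name
        for pool in self.app.pools.values()
        if getattr(pool, "zpool_name", None) == self.name
    ]
    if dependants:
        raise qubes.storage.StoragePoolException(
            "cannot remove pool {!r}: encrypted pools {} depend on it".format(
                self.name, ", ".join(dependants)))
    # ... actual zpool teardown would follow here ...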

There is also a more generic remark: since this adds non-trivial dependencies, I'd prefer it in a separate package (with correct dependencies set). The way it is now, it's impossible to do it right (it would either always depend on the zfs packages, making it impossible to install without them, or be broken by default with unfriendly error messages).
But we can keep it here for development and review time, for convenience.

@@ -45,6 +45,7 @@ properties:
with its own state of the volume, or rather a snapshot of its template volume
(pointed by a :py:attr:`~qubes.storage.Volume.source` property). This can be
set to `True` only if a domain do have `template` property (AppVM and DispVM).
TODO should `source` be folded into this arg?
Member

I'm for it, but let's keep it compatible with the old API. For example: allow snap_on_start=Volume (and then assert that source is not set), but also allow snap_on_start=True, and then expect source to be set. Alternatively, add a new attribute replacing both, like snap_on_start_from.
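A small sketch of that backwards-compatible handling; the helper name and the exact config keys are assumptions for illustration:

import qubes.storage

def resolve_snapshot_origin(volume_config):
    """Accept both snap_on_start=Volume (new) and snap_on_start=True plus
    source (old), returning the origin volume or None."""
    snap = volume_config.get("snap_on_start", False)
    source = volume_config.get("source")
    if isinstance(snap, qubes.storage.Volume):
        assert source is None, "source must not be set when snap_on_start is a Volume"
        return snap
    if snap:
        assert source is not None, "snap_on_start=True requires source"
        return source
    return None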

doc/qubes-storage.rst (outdated; resolved)
- :py:meth:`~qubes.storage.Volume.import_data` - return a path the data should
be written to, to import volume data; for complex formats, this can be pipe
(connected to some data-importing process)
- :py:meth:`~qubes.storage.Volume.import_data_end` - finish data import
operation (cleanup temporary files etc); this methods is called always after
:py:meth:`~qubes.storage.Volume.import_data` regardless if operation was
successful or not
- :py:meth:`~qubes.storage.Volume.import_volume` - import data from another volume
- :py:meth:`~qubes.storage.Volume.import_volume` - import data from another volume. TODO this is used when self.snap_on_data=True and self.source exists
Member

No snap_on_data or snap_on_start. It is used when cloning a VM or otherwise importing data between already existing volumes (not importing data from "the outside").
It may be worth noting that even if the volume is created with the intention of its data being replaced by this function, volume.create() is still called before it.
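For this driver, a zvol-level import_volume() along those lines could look roughly like the sketch below; it shells out to the zfs CLI for brevity (the PR itself uses libzfs_core), and the snapshot naming is purely illustrative:

import asyncio

async def import_volume(self, src_volume):
    # Replace this volume's zvol with a clone of the source volume;
    # per the note above, create() has already been called on self.
    snap = "{}@import-{}".format(src_volume.vid, self.vid.replace("/", "-"))
    for cmd in (
        ("zfs", "snapshot", snap),
        ("zfs", "destroy", self.vid),       # drop the empty zvol from create()
        ("zfs", "clone", snap, self.vid),
    ):
        proc = await asyncio.create_subprocess_exec(*cmd)
        if await proc.wait() != 0:
            raise RuntimeError("import_volume step failed: {}".format(" ".join(cmd)))
    return self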

qubes/storage/zfs.py (outdated; resolved)
"""
self.log.warning("zfs init_volume() vm={} volume_config={}".format(
vm, volume_config
))
Member

log.debug? also, it's duplicated below
I won't comment on similar lines further below, but most of the log messages look more like debug than warnings (probably to be changed when moving out of the draft state).

Author

Agree, the .warning was to make sure they all showed up. I think most if not all of the self.log statements can be removed once this is more stable.

self.await_timeout()
)

async def await_timeout(self):
Member

This whole function looks overly complex. I'd have done it this way (see the sketch after this list):

  • when stopping the last volume (used_volumes becomes empty), schedule a key unload and save a reference to the scheduled task
  • when starting a volume, cancel the scheduled unload task (also handling the case where it wasn't scheduled)
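A minimal sketch of that schedule/cancel approach using loop.call_later; the attribute and method names (used_volumes, unload_keys, _unload_handle) are illustrative, not the PR's actual code:

import asyncio

class UnloadTimeoutSketch:
    unload_timeout = 300  # seconds of inactivity before unloading the keys

    def volume_started(self, vm):
        self.used_volumes.add(vm)
        handle = getattr(self, "_unload_handle", None)
        if handle is not None:       # a key unload was pending: cancel it
            handle.cancel()
            self._unload_handle = None

    def volume_stopped(self, vm):
        self.used_volumes.discard(vm)
        if not self.used_volumes:    # last volume gone: schedule the unload
            loop = asyncio.get_event_loop()
            self._unload_handle = loop.call_later(
                self.unload_timeout,
                lambda: asyncio.ensure_future(self.unload_keys()),
            )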

when no volumes are in use.
"""
self.used_volumes.discard(vm)
self.used_volumes_last_empty = time.clock_gettime(
Member

In this version of the await_timeout function, this looks wrong. The timestamp should only be updated if used_volumes is empty.

qubes/storage/zfs_encrypted.py (outdated; resolved)
"""
Returns the parent pool if found, otherwise raises an AssertionError.
This function also moonlights as our method to retrieve a handle for
'app' which we record and re-use when asking for passwords.
Member

As noted elsewhere, there is a better way: you can get it via Pool.init_volume(vm) -> vm.app, which is guaranteed to be called before any volume.start(), contrary to this function.
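In other words, something along these lines (the volume class name and the _app attribute are hypothetical):

def init_volume(self, vm, volume_config):
    # vm.app is available here, before any volume.start(); remember it for
    # the password prompts instead of walking up to the parent pool.
    if getattr(self, "_app", None) is None:
        self._app = vm.app
    volume_config["pool"] = self
    return ZFSEncryptedVolume(**volume_config)  # illustrative volume class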

@@ -24,6 +24,7 @@
from qubes.storage import pool_drivers
from qubes.storage.file import FilePool
from qubes.storage.reflink import ReflinkPool
from qubes.storage.zfs import ZFSQpool
Member

It's nice you import it here, but some actual tests would be even nicer ;)

Contributor

Ideally, there would be general tests for all storage pools. @marmarek thoughts?

@cfcs commented Oct 31, 2019

Thank you for the review, I will try to get around to addressing this at some point and keep this updated.

This PR is relevant to the discussion at QubesOS/qubes-issues#1293 (I forgot to mention that in the brief).

input=self.name.encode()+b'\n', # context for the prompt
stdout=pw_pipe_out)
# await pw_vm.run_service_for_stdio(
# TODO THIS used to work, now it fucking broke.
Member

If you target dom0, it needs R4.1 commit 50adc60. If you want to test on R4.0, you need to backport that commit.
Or maybe it's some other problem?

Author

That seems likely. It used to work on 4.0, but after updating earlier this week it stopped working with the exception that run_service_for_stdio for the AdminVM class was not defined.

@Rudd-O commented Jan 5, 2020

I'm extremely interested in this. Do you need funding to get this to pass tests and work? How much would it take to pique your interest?

@cfcs commented Jan 6, 2020

@Rudd-O I am working on this on my own time (and am using this patch set as-is on my laptop), but if you are able to eventually allocate funds to the Qubes project towards @marmarek's time spent reviewing and answering questions about the Qubes APIs, that would make me feel less guilty for taking up his time.

We also hit some sore spots re: the qvm-pool command-line interface that it would be great to have someone clean up.

@Rudd-O commented Jan 6, 2020

@marmarek would you like to discuss funding for this specific bug?

@Rudd-O commented Jan 6, 2020

(Lack of this is blocking me getting Qubes Network Server ported to Qubes 4.1 for a variety of reasons that would be super hard to explain shortly.)

@marmarek commented Jan 6, 2020

I'm fine with reviewing it whenever you let me know it's ready for the next round of review.

If you'd like to help financially, contribute to our OpenCollective, so we can fund more people to offload other tasks from me.

@Rudd-O commented Jan 10, 2020

Review comment here. The structure of the datasets is not quite what it should be. According to the inter-distro working group on getting ZFS on root file systems, the structure should be more or less like this:

rpool/ROOT/qubes <- the OS goes here, following the pattern
              root pool name, ROOT, OS name (ubuntu, fedora)
rpool/BOOT/qubes <- if you'll use /boot in ZFS, not sure its status
rpool/USERDATA/VMs <- this can be named whatever you want, doesn't even
              have to be on rpool, but a good default would
              obviously be called VMs or vms or vm
rpool/USERDATA/VMs/plainbrowser
rpool/USERDATA/VMs/plainbrowser/private
rpool/USERDATA/VMs/plainbrowser/volatile

The spec appears to be what will be followed by Fedora as well.

Ref: https://docs.google.com/document/d/1m9VWOjfmdbujV4AQWzWruiXYgb0W4Hu0D8uQC_BWKjk/edit

@Rudd-O commented Jan 10, 2020

@marmarek 1) how do I contribute to the OpenCollective? 2) how do I earmark contributions for this project?

@cfcs commented Jan 10, 2020

@Rudd-O to clarify, which datasets do you propose prefixing with /USERDATA/ ? The encrypted ones, when adding to an existing pool?

EDIT: Did you link to the right document? This one seems to be an unpublished draft detailing zfs-on-root for Ubuntu systems (neither of which applies here), and there's no mention of an inter-distro working group?

EDIT 2: I didn't have a use-case for it, but adding an unencrypted wrapper in addition to the encrypted one, and allowing both to take arguments like prefix=/USERDATA,zpool=rpool should be easy enough if that would work for you?

@Rudd-O commented Jan 11, 2020

Not sure I understand. Why do we need unencrypted anything?

@marmarek (Member)

@marmarek 1) how do I contribute to the OpenCollective?

https://opencollective.com/qubes-os

2) how do I earmark contributions for this project?

You can add a comment when you donate, but also read this: https://www.qubes-os.org/donate/#do-donors-get-personalized-support-or-special-feature-requests

@cfcs commented Jan 11, 2020

@Rudd-O I'm assuming that people are using full-disk encryption, so additional encryption keys to enter for e.g. your VM templates or your jazz music collection might be overkill UX-wise.

@Rudd-O commented Jan 13, 2020 via email

@Rudd-O commented Jan 13, 2020 via email

@DemiMarie (Contributor) left a comment

In addition to the various TODOs, there are several ways this code can be simplified quite a bit. Also, I suggest focusing on unencrypted pools, as QubesOS users generally use full-disk encryption.

pool=self.pool,
vid=self.vid,
rw=self.rw,
size=self._size)
Contributor

Suggested change
size=self._size)
size=self._size,
**kwargs)


self.save_on_stop = False
self.snap_on_start = False
self.rw = rw
Contributor

Suggested change
self.rw = rw

Comment on lines +608 to +611
if self.save_on_stop:
raise qubes.storage.StoragePoolException(
"TODO save_on_stop not supported"
)
Contributor

This code is more important than encryption.

Author

Unfortunately I didn't (and still don't) understand the expected semantics of save_on_stop. If you can lay it out in an issue on the other repo I'd be happy to discuss and/or implement it.
As for being more important than encryption: that's entirely subjective. I needed (and need) encrypted, incremental backups - that's why I wrote this. I did not (and do not) need snapshots on start/stop, so I didn't implement that.

Contributor

I had not considered the backups, sorry!

libzfs_core.lzc_create(
self.vid.encode(),
ds_type="zvol",
props=i_totally_know_python
Contributor

Suggested change
props=i_totally_know_python
props={b'volsize': self.size},

Comment on lines +655 to +657
#i_totally_know_python = DEFAULT_ZVOL_PROPS.copy()
i_totally_know_python = {}
i_totally_know_python[b"volsize"] = self.size
Contributor

Suggested change
#i_totally_know_python = DEFAULT_ZVOL_PROPS.copy()
i_totally_know_python = {}
i_totally_know_python[b"volsize"] = self.size

Comment on lines +650 to +653
if self.source:
# we're supposed to clone from this, maybe.
self.log.warning("zfs volume.create(): there is a .source!")
# TODO probably want lzc_clone in some cases.
Contributor

Suggested change
if self.source:
# we're supposed to clone from this, maybe.
self.log.warning("zfs volume.create(): there is a .source!")
# TODO probably want lzc_clone in some cases.

Comment on lines +486 to +495
i = run_command(
[
"zpool",
"list",
"-Hp",
"-o",
"allocated",
self.zfs_ns
]
)
Contributor

Suggested change
i = run_command(
[
"zpool",
"list",
"-Hp",
"-o",
"allocated",
self.zfs_ns
]
)
i = run_command(["zpool", "list", "-Hp", "-o", "allocated", self.zfs_ns])

Contributor

It would probably be more idiomatic if the use of lists for run_command were abolished and *args were used on the receiving function instead.
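A minimal sketch of that *args variant; run_command here stands in for the PR's subprocess wrapper, not its actual signature:

import asyncio
import subprocess

async def run_command(*args, input=None):
    # Run a command and return its stdout; raise on a non-zero exit code.
    proc = await asyncio.create_subprocess_exec(
        *args,
        stdin=subprocess.PIPE if input is not None else None,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    stdout, stderr = await proc.communicate(input)
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, args, stdout, stderr)
    return stdout

# Call sites then read without the list brackets:
#   size = await run_command("zpool", "list", "-Hp", "-o", "size", self.zfs_ns)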

Comment on lines +472 to +480
i = run_command(
[
"zpool",
"list",
"-Hp",
"-o",
"size",
self.zfs_ns,
])
Contributor

Suggested change
i = run_command(
[
"zpool",
"list",
"-Hp",
"-o",
"size",
self.zfs_ns,
])
i = run_command(["zpool", "list", "-Hp", "-o", "size", self.zfs_ns])

Comment on lines +349 to +431
# TODO if any of this stuff fails we should likely revert the
# whole thing to the previous state.

# TODO at least use the qubes_cmd_coro() here
# TODO get rid of sudo here if we can
environ = os.environ.copy()
environ["LC_ALL"] = "C.utf8"
p = yield from asyncio.create_subprocess_exec(
*[
"sudo",
"zpool",
"create",
*DEFAULT_ZPOOL_CMDLINE_OPTIONS,
self.zfs_ns,
*self.block_devices,
],
stdout=subprocess.DEVNULL,
stderr=subprocess.PIPE, # todo capture stderr
close_fds=True,
env=environ,
)
_, err = yield from p.communicate()
if p.returncode != 0:
raise qubes.storage.StoragePoolException(
"unable to create ZFS zpool:[{}] {}".format(p.returncode, err)
)

# Set up permissions for new zpool to avoid having to run other
# commands as root. This allows some basic protection from programming
# mistakes, since we can whitelist the operations that can be performed
# without root privileges:

# TODO should we attempt cleanup if any of this fails?

p = yield from asyncio.create_subprocess_exec(
*[
"sudo",
"zfs",
"allow",
"-ld", # this zpool + its descendants
"-g", # -g TODO
"qubes", # qubes gid / group TODO
("encryption,load-key,refreservation,create,destroy,mount,clone,snapshot,hold,"
"bookmark,send,diff,sync,volmode,rollback,receive,"
"volsize,volblocksize,volmode"
),
self.zfs_ns,
],
stdout=subprocess.DEVNULL,
stderr=subprocess.PIPE,
close_fds=True,
env=environ,
)
_, err = yield from p.communicate()
if p.returncode != 0:
raise qubes.storage.StoragePoolException(
"unable to set permissions for ZFS zpool: {}".format(err)
)

p = yield from asyncio.create_subprocess_exec(
*[
"sudo",
"zfs",
"allow",
"-d", # ONLY descendants (NOT this zpool itself)
"-g",
"qubes", # qubes gid / group
("refreservation,volmode,destroy,rollback,receive,volsize,"
"devices,volblocksize,volmode,encryption"),
self.zfs_ns,
],
stdout=subprocess.DEVNULL,
stderr=subprocess.PIPE,
close_fds=True,
env=environ,
)
_, err = yield from p.communicate()
if p.returncode != 0:
raise qubes.storage.StoragePoolException(
"unable to set child permissions for ZFS zpool: {}".format(
err
)
)
Contributor

Suggested change
# TODO if any of this stuff fails we should likely revert the
# whole thing to the previous state.
# TODO at least use the qubes_cmd_coro() here
# TODO get rid of sudo here if we can
environ = os.environ.copy()
environ["LC_ALL"] = "C.utf8"
p = yield from asyncio.create_subprocess_exec(
*[
"sudo",
"zpool",
"create",
*DEFAULT_ZPOOL_CMDLINE_OPTIONS,
self.zfs_ns,
*self.block_devices,
],
stdout=subprocess.DEVNULL,
stderr=subprocess.PIPE, # todo capture stderr
close_fds=True,
env=environ,
)
_, err = yield from p.communicate()
if p.returncode != 0:
raise qubes.storage.StoragePoolException(
"unable to create ZFS zpool:[{}] {}".format(p.returncode, err)
)
# Set up permissions for new zpool to avoid having to run other
# commands as root. This allows some basic protection from programming
# mistakes, since we can whitelist the operations that can be performed
# without root privileges:
# TODO should we attempt cleanup if any of this fails?
p = yield from asyncio.create_subprocess_exec(
*[
"sudo",
"zfs",
"allow",
"-ld", # this zpool + its descendants
"-g", # -g TODO
"qubes", # qubes gid / group TODO
("encryption,load-key,refreservation,create,destroy,mount,clone,snapshot,hold,"
"bookmark,send,diff,sync,volmode,rollback,receive,"
"volsize,volblocksize,volmode"
),
self.zfs_ns,
],
stdout=subprocess.DEVNULL,
stderr=subprocess.PIPE,
close_fds=True,
env=environ,
)
_, err = yield from p.communicate()
if p.returncode != 0:
raise qubes.storage.StoragePoolException(
"unable to set permissions for ZFS zpool: {}".format(err)
)
p = yield from asyncio.create_subprocess_exec(
*[
"sudo",
"zfs",
"allow",
"-d", # ONLY descendants (NOT this zpool itself)
"-g",
"qubes", # qubes gid / group
("refreservation,volmode,destroy,rollback,receive,volsize,"
"devices,volblocksize,volmode,encryption"),
self.zfs_ns,
],
stdout=subprocess.DEVNULL,
stderr=subprocess.PIPE,
close_fds=True,
env=environ,
)
_, err = yield from p.communicate()
if p.returncode != 0:
raise qubes.storage.StoragePoolException(
"unable to set child permissions for ZFS zpool: {}".format(
err
)
)

qubesd runs as root, so this code is not necessary.


@cfcs commented Mar 3, 2021

@DemiMarie Thank you for your review. I've since split the code out into its own package as advised by @marmarek, so this particular PR is abandoned. I've used this module as the primary pool driver on my personal laptop and have been quite happy with it.

You can find the revised code at: https://github.com/cfcs/qubes-storage-zfs

Twice I had to boot from a recovery disk because upstream API changes broke qubesd to the point where nothing worked. Copying changed code from a development vm to dom0 without mounting the file system turns out to not be easy when qubesd is broken. I've also had to manually edit the kernel header files because they either didn't compile (???) or were incompatible with the ZFS DKMS modules (that's to be expected I guess). So not quite ready for the primetime yet.

@DemiMarie (Contributor)

@DemiMarie Thank you for your review. I've since split the code out into its own package as advised by @marmarek, so this particular PR is abandoned. I've used this module as the primary pool driver on my personal laptop and have been quite happy with it.

You’re welcome! Even if you don’t intend to get this merged, it might be useful to have an open PR for code review purposes.

You can find the revised code at: https://github.com/cfcs/qubes-storage-zfs

Would you be interested in submitting it as a contrib package? I understand that it isn’t ready for prime time yet, but if/when that changes, this would allow users to obtain it via qubes-dom0-update. This would most likely require adding the DKMS package first, if I understand correctly.

Twice I had to boot from a recovery disk because upstream API changes broke qubesd to the point where nothing worked. Copying changed code from a development vm to dom0 without mounting the file system turns out to not be easy when qubesd is broken. I've also had to manually edit the kernel header files because they either didn't compile (???) or were incompatible with the ZFS DKMS modules (that's to be expected I guess). So not quite ready for the primetime yet.

I have actually had to manually edit stuff in dom0 before (mostly qubes.xml). So I understand! I wonder if the compilation errors are due to dom0 having an old version of GCC.

One other question: what are the advantages of this over the BTRFS pool? RAID-Z is the only one that comes to my mind, but there could easily be others I am missing.

@cfcs commented Mar 24, 2021

@DemiMarie Thank you for your review. I've since split the code out into its own package as advised by @marmarek, so this particular PR is abandoned. I've used this module as the primary pool driver on my personal laptop and have been quite happy with it.

You’re welcome! Even if you don’t intend to get this merged, it might be useful to have an open PR for code review purposes.

You can find the revised code at: https://github.com/cfcs/qubes-storage-zfs

Would you be interested in submitting it as a contrib package? I understand that it isn’t ready for prime time yet, but if/when that changes, this would allow users to obtain it via qubes-dom0-update. This would most likely require adding the DKMS package first, if I understand correctly.

That is the hope, that it can end up as a contrib package.

Like I said I've been using it on my personal Qubes installations since I posted this, and it's worked pretty well for me, but there are still some gotchas that need to be hashed out, and I'd like to have it better documented as well.

I'm kind of worried about releasing it as a contrib package and then having people unable to boot their VMs due to some kind of DKMS issue. Right now, if something goes wrong, all of qubesd stops working, which means file copy to/from dom0 stops working, and so does copy-paste, so it can be a grueling experience to recover from that situation, even if there's an upstream fix. I'd like to find a solution to that particular problem before encouraging other people to use it for anything but test systems.

Twice I had to boot from a recovery disk because upstream API changes broke qubesd to the point where nothing worked. Copying changed code from a development vm to dom0 without mounting the file system turns out to not be easy when qubesd is broken. I've also had to manually edit the kernel header files because they either didn't compile (???) or were incompatible with the ZFS DKMS modules (that's to be expected I guess). So not quite ready for the primetime yet.

I have actually had to manually edit stuff in dom0 before (mostly qubes.xml). So I understand! I wonder if the compilation errors are due to dom0 having an old version of GCC.

Not sure, but the compiler and header files being out of sync does seem like a probable reason. It's been a while since I had anything as drastic as that happen, though.

One other question: what are the advantages of this over the BTRFS pool? RAID-Z is the only one that comes to my mind, but there could easily be others I am missing.

The main feature over BTRFS is encryption, so your incremental backups are encrypted at rest and their integrity can be validated on the backup system without loading the key. The main drawback compared to BTRFS is that ZFS lacks a CoW file-cloning system call on Linux (BTRFS has that). That's not of much consequence for this particular codebase, since most of the magic happens with zvols, which do support CoW cloning, but that's my impression of the overall state; otherwise they're fairly similar. The ability to import ZFS data on FreeBSD systems is also an appealing factor.
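As an illustration of that backup property (not part of the driver itself): a raw incremental send of an encrypted zvol can be stored and verified on a backup host without the key ever being loaded there. The dataset and snapshot names below are hypothetical:

import subprocess

src = "qbs/encryption/qbsdev/vm/zfsdev/private"
subprocess.run(["zfs", "snapshot", src + "@backup-2"], check=True)

with open("/var/backups/zfsdev-private.incr", "wb") as out:
    # --raw sends the blocks still encrypted, exactly as stored on disk
    subprocess.run(
        ["zfs", "send", "--raw", "-i", src + "@backup-1", src + "@backup-2"],
        stdout=out,
        check=True,
    )
# On the backup host, `zfs receive` accepts the raw stream and a later
# `zpool scrub` can validate its integrity, all without `zfs load-key`.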

@tlaurion commented Jul 21, 2022

@cfcs : was the effort abandoned? Have you seen #7009 bounty offer for exactly this work/from @Rudd-O ?

zpool live deduplication is a big advantage over btrfs from what I've been reading. Please feel free to shed some light in the ongoing discussion of pool deduplication, here: https://forum.qubes-os.org/t/pool-level-deduplication/12654

@Rudd-O commented Jul 21, 2022

https://github.com/cfcs/qubes-storage-zfs

If this were released as a contrib package with the to-do items checked off such that I can trust my own systems with it, I would honor my promise of the bounty I offered before in #7009.

@cfcs commented Aug 11, 2022

@tlaurion (EDIT)

@cfcs : was the effort abandoned? Have you seen #7009 bounty offer for exactly this work/from @Rudd-O ?

Ah, no, I had not seen that.

In the sense that I use it every day on my own machines, the project is not abandoned, but it hasn't seen meaningful progress for a while now. This PR is effectively abandoned, since I deemed the approach @DemiMarie suggested of a plugin/contrib package better than merging it into qubes-core-admin (see @Rudd-O 's link to my package); but the contrib package does work and I update it when I run into things that annoy me.

Judging by the helpful issue posts on that repo, @DemiMarie has also taken it for a spin, but I don't know of any other users / testers.
There is also a fork by @ayakael here https://github.com/ayakael/qubes-storage-zfs that seems to do something with qubes-builder. I don't think they ever sent any pull requests, but I've picked up some of the patches and it's probably worth having a look at their changes if you're going to dig into this project.

It feels to me like there's some way to go before this would be something I'd recommend to people who are not comfortable fixing DKMS build errors; I've personally had a number of issues with the kernel headers/kernel being out of sync with the fc32 DKMS packages for ZFS. Basically you need to plan 1-2 days of downtime whenever you type qubes-dom0-update, and I don't have a good solution to that problem.

To sum up there are some things to deal with before I think it makes sense to offer this to non-expert users:

  • Figuring out a stable update story that doesn't cause massive pain on a daily basis (hard).
  • Fixing TODOs documented in the README (probably not so hard).
  • Reading up on the Qubes 4.0 -> 4.x changelogs and figuring out how the APIs are intended to be used. At the moment there's a number of things that can go wrong, and when they do go wrong, you need to manually stop&start qubesd. Potentially this can be solved by restructuring the code to do the "dangerous" stuff in places where qubesd presently expects exceptions; potentially this will require adding some try..except to qubes-core-admin. I felt figuring this out on my own without pestering the upstream Qubes devs was a hard task.
  • Finding a solution to some of the other problems, like the one where the CoW updates on top of the template's root image (which are thrown away on the next reboot anyway) are written to the template's pool instead of the VM's ZFS pool. This is probably also hard without spending time with upstream devs, or a lot of time reading through the code and documentation.

I'd be happy to review and merge patches to my repo, and I'd also be fine with someone forking it into their own contrib package, copying what they like and deleting the rest. As I've explained before, packaging is not really my strong side, so I doubt I will be doing any of that on my own in the near future. I basically did a lot of the easy lifting on this project to achieve what I wanted, and now I have a system that works for me, but I left the heavy lifting of polishing it up and making it mainstream-deployable as an exercise to the reader. I would be delighted to find such a reader here. :-)

@DemiMarie (Contributor)

Judging by the helpful issue posts on that repo, @DemiMarie has also taken it for a spin, but I don't know of any other users / testers.

I actually haven’t. All of my suggestions have come from manual review of the code.

  • Figuring out a stable update story that doesn't cause massive pain on a daily basis (hard).

The best approach I can think of is to ship a DKMS package that accompanies this package. Fedora 32 is EOL, so I am not surprised you are having problems with its DKMS package.

  • Reading up on the Qubes 4.0 -> 4.x changelogs and figuring out how the APIs are intended to be used. At the moment there's a number of things that can go wrong, and when they do go wrong, you need to manually stop&start qubesd. Potentially this can be solved by restructuring the code to do the "dangerous" stuff in places where qubesd presently expects exceptions; potentially this will require adding some try..except to qubes-core-admin. I felt figuring this out on my own without pestering the upstream Qubes devs was a hard task.

This is something that definitely needs to be worked on upstream. Writing a third-party storage driver should not be so difficult. I myself had a LOT of problems adding ephemeral volume encryption and ensuring that storage is cleaned up at system boot, so better documentation/cleaner APIs/etc would be amazing.

Feel free to pester me with any questions you have, so that I can improve the documentation.

  • Finding a solution to some of the other problems, like the one where the COW updates to the template's storage (that are thrown away on next reboot anyway) are written to the template's storage instead of the ZFS storage. This is probably also hard without spending time with upstream devs, or a lot of time reading through the code and documentation.

This is also a problem with all of the other storage drivers, sadly.

@ayakael commented Aug 12, 2022

@tlaurion I'm in about the same place as @cfcs as I use this driver every day, but haven't yet taken the time to clean it up. I did make a spec file, which you'll find in my fork. I did enough work to integrate building the zfs drivers within my qubes-builder setup, and I also build an updated version of zfs and zfs-dkms that hooks into my build pipeline. While I'm a package maintainer on other platforms, my knowledge of RPM isn't quite where it needs to be for me to be comfortable pushing anything upstream. I would like to find time to clean this up, though.

I would like to collaborate with y'all to get this in a good state, so don't hesitate to push tasks my way.

By the way, I push my built packages to the following repo: https://repo.gpg.nz/qubes/r4.1. I can't make any guarantees, though.

@cfcs commented Aug 14, 2022

@DemiMarie

Judging by the helpful issue posts on that repo, @DemiMarie has also taken it for a spin, but I don't know of any other users / testers.

I actually haven’t. All of my suggestions have come from manual review of the code.

Ah, okay! Well, thanks for doing that!

  • Figuring out a stable update story that doesn't cause massive pain on a daily basis (hard).

The best approach I can think of is to ship a DKMS package that accompanies this package. Fedora 32 is EOL so I am not surprised you are having problem with its DKMS package.

Yes, iirc I'm using fc34 or fc35, but the real issue is that kernel and kernel-sources and the DKMS modules are not updated in lock-step, and it would probably go a long way to have binary ZFS module packages instead of relying on compiling C code in each user's dom0.

  • Reading up on the Qubes 4.0 -> 4.x changelogs and figuring out how the APIs are intended to be used. At the moment there's a number of things that can go wrong, and when they do go wrong, you need to manually stop&start qubesd. Potentially this can be solved by restructuring the code to do the "dangerous" stuff in places where qubesd presently expects exceptions; potentially this will require adding some try..except to qubes-core-admin. I felt figuring this out on my own without pestering the upstream Qubes devs was a hard task.

This is something that definitely needs to be worked on upstream. Writing a third-party storage driver should not be so difficult. I myself had a LOT of problems adding ephemeral volume encryption and ensuring that storage is cleaned up at system boot, so better documentation/cleaner APIs/etc would be amazing.

Feel free to pester me with any questions you have, so that I can improve the documentation.

Thank you! I will take you up on that offer next time I nerdsnipe myself into working on this project.

  • Finding a solution to some of the other problems, like the one where the COW updates to the template's storage (that are thrown away on next reboot anyway) are written to the template's storage instead of the ZFS storage. This is probably also hard without spending time with upstream devs, or a lot of time reading through the code and documentation.

This is also a problem with all of the other storage drivers, sadly.

That is actually good news, though: it means a solution can hopefully be found independently of the people interested in ZFS integration?

@ayakael

I'm in about the same place as @cfcs as I use this driver every day, but havn't yet taken the time to clean it up. I did make a spec file, which you'll find in my fork. I did enough work to integrate building the zfs drivers within my qubes-builder setup, and I also build an updated version of zfs and zfs-dkms that hooks into my build pipeline. While I'm a package maintainer on other platforms, my knowledge of RPM isn't quite where it needs to be for me to be comfortable pushing anything upstream. I would like to find time to clean this up, though.

Ah, that is cool. I think the best solution in terms of absolutely avoiding DKMS build errors would be to ship a custom dom0 kernel package that includes a ZFS module (potentially as a loadable module so it's not running if not needed). Is that what you do with your qubes-builder setup?

Thanks for chiming in, I was happy to learn that I have a user! :-)

@ayakael commented Aug 14, 2022

@cfcs Indeed, the most foolproof solution is including zfs as a kernel module. That was my initial approach, but it was a lot of work maintaining a custom kernel, thus I moved to DKMS a while ago. My current approach is actually hooking backported versions of zfs and zfs-dkms into qubes-builder, which has avoided any DKMS issues. Thus, I do not rely on the out-of-date packages from Fedora's fc32 repo, but rather on my own repo where the built zfs / zfs-dkms spec files (which were taken from up-to-date Fedora repos) are hosted.

@Rudd-O commented Aug 21, 2022

Honestly I don't mind if this project doesn't deliver ZFS as part of the deliverables. I compile ZFS on my dom0s every time there is an update to either the kernel or ZFS (I run ZFS directly from master).

@DemiMarie (Contributor)

@cfcs: I would actually like to see this merged into Qubes core at some point, assuming @marmarek is okay with it. Even if we have problems shipping ZFS, I suspect that it should be possible to perform testing in CI.

@Rudd-O commented Aug 22, 2022

@DemiMarie seconded. If CI needs to run integration tests, the zfs and zpool commands can totally be mocked with actual output from a system that had ZFS installed.

@ayakael commented Aug 22, 2022

Honestly I don't mind if this project doesn't deliver ZFS as part of the deliverables. I compile ZFS on my dom0s every time there is an update to either the kernel or ZFS (I run ZFS directly from master).

It wouldn't be that much work for Qubes to have a backported zfs and zfs-dkms package available for the current dom0 repo. I suspect most of the errors @cfcs encounters are due to having to adapt the old zfs-dkms package from Fedora's EOL repo to current kernel versions shipped by Qubes.

@Rudd-O commented Aug 22, 2022

@ayakael what they're saying here is that they don't want to use the zfs-dkms package due to compilation in dom0 being necessary. I use the zfs-dkms package I roll from pristine upstream sources and that works very well for me.

@cfcs commented Aug 23, 2022

Honestly I don't mind if this project doesn't deliver ZFS as part of the deliverables. I compile ZFS on my dom0s every time there is an update to either the kernel or ZFS (I run ZFS directly from master).

It wouldn't be that much work for Qubes to have a backported zfs and zfs-dkms package available for the current dom0 repo. I suspect most of the errors @cfcs encounters are due to having to adapt the old zfs-dkms package from Fedora's EOL repo to current kernel versions shipped by Qubes.

I think I was maybe being dramatic re: the update story -- in practice it doesn't break that often, I just don't like to give people the impression that it's completely foolproof.

@cfcs commented Aug 23, 2022

@DemiMarie

@cfcs: I would actually like to see this merged into Qubes core at some point, assuming @marmarek is okay with it. Even if we have problems shipping ZFS, I suspect that it should be possible to perform testing in CI.

I think that the separate package approach is useful to highlight potential shortcomings, even if the separate package is then installed by default?

Anyway, we have a bunch of issues, some of which can be fixed in the separate package, and some of which require upstream work (I find the template CoW thing is the main thing I don't know how to fix).

It would also be nice to have a clear view of where the functions are allowed to / expected to throw exceptions.
The problem I had was something along the lines of this:

  • throw an exception in __init__ if the dataset doesn't exist (for example):
    1. this blows up (or used to, at least) qubesd, meaning the whole system locks up and you have to fiddle with removing the module before you can restart qubesd manually.
  • don't throw an exception in __init__, and instead throw it when trying to activate, for example; this is then handled gracefully by qubesd as a startup error:
    1. This blows up too, but in a different way: once __init__ has succeeded, Qubes expects to be able to read properties from the class.
    2. These will then be stored somewhere (qubes.xml?),
    3. and later they will be fed back into __init__.
    4. The problem is that the information might not be available in step 1 if the dataset doesn't exist, and that can cause trouble down the line.

Maybe the solution is to structure my module differently, but I did not find a good way to report properties that are not available when the pool is faulted, for example.
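One possible shape for that restructuring, sketched under the assumption that returning a harmless placeholder while the pool is faulted is acceptable; all names here are illustrative, not the actual driver code:

import subprocess
import qubes.storage

class LazyZFSPoolSketch:
    # Keep __init__ side-effect free and check the zpool lazily, so a missing
    # dataset cannot break qubes.xml loading.

    def __init__(self, *, name, zpool_name, **kwargs):
        self.name = name              # record configuration only
        self.zpool_name = zpool_name

    def _check_zpool(self):
        ok = subprocess.run(
            ["zfs", "list", self.zpool_name],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        ).returncode == 0
        if not ok:
            raise qubes.storage.StoragePoolException(
                "zpool {!r} is missing or faulted".format(self.zpool_name))

    @property
    def size(self):
        try:
            self._check_zpool()
        except qubes.storage.StoragePoolException:
            return 0                  # placeholder so qubesd can still start
        out = subprocess.run(
            ["zpool", "list", "-Hp", "-o", "size", self.zpool_name],
            capture_output=True, check=True,
        ).stdout
        return int(out)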

@Rudd-O commented Aug 23, 2022

I gave writing my own module a shot a while ago, and I gotta say the abstractions for the storage volumes just aren't there for me to understand how to implement one.

@DemiMarie (Contributor)

Anyway, we have a bunch of issues, some of which can be fixed in the separate package, and some of which require upstream work (I find the the template cow thing is the main thing I don't know how to fix).

Template CoW is 100% an upstream problem. No work required on your side.

@Rudd-O commented Aug 25, 2022

I assume ZFS-backed pools can always instaclone the current (or rather the pre-boot) snapshot of the template to produce a R/W block device for a VM using the template.

@cfcs commented Sep 29, 2022

I assume ZFS-backed pools can always instaclone the current (or rather the pre-boot) snapshot of the template to produce a R/W block device for a VM using the template.

The "problem" is that the r/w snapshot is not written to the qvm-pool of the VM. Whether it is written to the qvm-pool of the template, or whether it somehow incorrectly just gets dropped in a file-backed default pool for the VM, is something I never figured out.

Most of the time I don't really care where these are written; assuming both pools have enough space, it doesn't make much of a difference to the end-user. Given that they're only ever interpreted as sparse writes on top of an immutable base image, it would be nice if they would end up as sparse volume writes attributed to the VM since that helps with quotas and accounting. That would help you avoid scenarios where the thin provisioning blows up critical VMs (sys-net etc) when "less needed" VMs decide to overwrite the entire supposedly-sparse device, creating storage pressure in cases where the assumption that there is enough space doesn't hold.

In practice the main problem I have with it is that if the VM storage is encrypted, like the per-qvm-pool encryption that the ZFS qvm-pool implementation supports, it's really unintuitive and problematic (security-wise) that writes to e.g. /tmp/ are written to the template's potentially-not-encrypted-at-all storage. If there was an option to use ephemeral encryption for these writes (like has been discussed - and perhaps implemented? - for the volatile volume), that would solve the main issue.

@DemiMarie (Contributor)

I assume ZFS-backed pools can always instaclone the current (or rather the pre-boot) snapshot of the template to produce a R/W block device for a VM using the template.

The "problem" is that the r/w snapshot is not written to the qvm-pool of the VM. Whether it is written to the qvm-pool of the template, or whether it somehow incorrectly just gets dropped in a file-backed default pool for the VM, is something I never figured out.

It is placed on the VM’s template’s pool.

In practice the main problem I have with it is that if the VM storage is encrypted, like the per-qvm-pool encryption that the ZFS qvm-pool implementation supports, it's really unintuitive and problematic (security-wise) that writes to e.g. /tmp/ are written to the template's potentially-not-encrypted-at-all storage. If there was an option to use ephemeral encryption for these writes (like has been discussed - and perhaps implemented? - for the volatile volume), that would solve the main issue.

Volatile volume encryption (which includes swap) has been implemented. Encryption of the root volume has not yet been implemented. However, one can mark that volume as read-only, which will cause the VM to create a snapshot in the guest itself. This causes data that would be written to the root volume to be written to the volatile volume instead. @marmarek could this be made the default when volatile volume encryption is on?

@cfcs commented Sep 29, 2022

Volatile volume encryption (which includes swap) has been implemented. Encryption of the root volume has not yet been implemented. However, one can mark that volume as read-only, which will cause the VM to create a snapshot in the guest itself. This causes data that would be written to the root volume to be written to the volatile volume instead. @marmarek could this be made the default when volatile volume encryption is on?

Ah, awesome! It seems to me that would make a lot of sense (same rationale as encryption for volatile, basically), and that would take care of the security concern over the current behavior in terms of this pool driver. Eventually it'd be nice if it could write to the VM's storage, but I don't see that part as a blocker for this project.

@Rudd-O commented Mar 24, 2023

This can now be closed, right?
