(WiP) zfs pool draft #289

Open · wants to merge 6 commits into master
Conversation

@cfcs commented Oct 28, 2019

This PR aims to introduce two new qvm-pool storage drivers with the following interface:

[user@dom0 ~]$ qvm-pool  --help-drivers
DRIVER         OPTIONS
zfs_encrypted  zpool_name, ask_password_domain, unload_timeout
zfs_zvol       block_devices
  • zfs_zvol, which implements VM storage as ZFS zvol volumes. In the current implementation it also handles zpool creation on each block_devices parameter, since that was the easiest way to get started from a clean slate when testing. If there are valid use-case arguments for splitting that off into a third module, that could be done (i.e. if anyone thinks it would be useful to implement a zvol pool on top of an existing dataset).
  • zfs_encrypted, which largely inherits from zfs_zvol but works on top of encrypted datasets inside zpool_name that act as "encryption key groups." The keys are optionally unloaded after a pool inactivity timeout (unload_timeout). The keys are loaded by passing a raw file descriptor to an invocation of zfs load-key {dataset}; the passphrase comes from the QRexec service qubes.AskPassword in a target domain (see the sketch after this list).
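A rough sketch of that key-loading step, assuming a run_service_for_stdio-style call like the one quoted later in this thread; the function and attribute names here (load_key, ask_password_domain) are illustrative, not the PR's actual code:

import asyncio
import os

async def load_key(app, ask_password_domain, dataset):
    """Feed a passphrase from qubes.AskPassword to `zfs load-key` via a pipe."""
    pw_vm = app.domains[ask_password_domain]
    read_fd, write_fd = os.pipe()
    try:
        # The dataset name is sent as context for the prompt; the qrexec call
        # writes the passphrase it collects to write_fd.
        await pw_vm.run_service_for_stdio(
            "qubes.AskPassword",
            input=dataset.encode() + b"\n",
            stdout=write_fd,
        )
        os.close(write_fd)
        write_fd = None
        proc = await asyncio.create_subprocess_exec(
            "zfs", "load-key", dataset, stdin=read_fd,
        )
        if await proc.wait() != 0:
            raise RuntimeError("zfs load-key failed for {}".format(dataset))
    finally:
        os.close(read_fd)
        if write_fd is not None:
            os.close(write_fd)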

When used, this results in datasets and zvols in this form:

$ zfs list -o name,type,encryption
qbs/encryption/qbsdev                          filesystem  aes-256-gcm
qbs/encryption/qbsdev/import                   filesystem  aes-256-gcm
qbs/encryption/qbsdev/import/zfsdev            filesystem  aes-256-gcm
qbs/encryption/qbsdev/vm                       filesystem  aes-256-gcm
qbs/encryption/qbsdev/vm/zfsdev                filesystem  aes-256-gcm
qbs/encryption/qbsdev/vm/zfsdev/private        volume      aes-256-gcm
qbs/encryption/qbsdev/vm/zfsdev/volatile       volume      aes-256-gcm
qbs/vm                                         filesystem  none
qbs/vm/plainbrowser                            filesystem  none
qbs/vm/plainbrowser/private                    volume      none
qbs/vm/plainbrowser/volatile                   volume      none

Adding those was a matter of:

$ qvm-pool -a qbs zfs_zvol -o 'block_devices=/dev/loopMYFILE'
$ qvm-pool -a qbsdev zfs_encrypted -o 'zpool_name=qbs'

For the purpose of key entry I am currently using this implementation of /etc/qubes-rpc/qubes.AskPassword in the ask_password_domain, which should probably be reworked (remember to chmod +x):

#!/bin/sh
read -r -N 100 -s -t 5 subject

# TODO why does this not work, does it need to strip newline?
# might be fixed now, according to @marmarek's suggestion in comment below:
# [ "${subject}" = "${subject//[^-/_a-z:A-Z0-9]}" ] || exit 1

export DISPLAY="${DISPLAY:-:0}"
zenity --forms \
  --text "${QREXEC_REMOTE_DOMAIN}:qubes.AskPassword" \
  --add-password "Password for ${subject//[^a-zA-Z]}" 2>>/tmp/askpw.log

This requires having ZFS installed from zfs-on-linux with the pyzfs3 module built.
To get that going (in addition to their instructions) I needed:

sudo qubes-dom0-update kernel-devel git sysstat bc python3-devel
./autogen.sh && \
./configure --with-config=srpm --enable-pyzfs && \
make -j3 pkg-utils rpm-dkms && \
sudo dnf install ./libuutil1-0.8.0-1.qbs4.0.x86_64.rpm     \
                 ./libzfs2-0.8.0-1.qbs4.0.x86_64.rpm       \
                 ./libnvpair1-0.8.0-1.qbs4.0.x86_64.rpm    \
                 ./python3-pyzfs-0.8.0-1.qbs4.0.noarch.rpm \
                 ./zfs-0.8.0-1.qbs4.0.x86_64.rpm

# not sure if we need these:
sudo dnf install  ./libzpool2-0.8.0-1.qbs4.0.x86_64.rpm \
                  ./libzfs2-devel-0.8.0-1.qbs4.0.x86_64.rpm

# install DKMS kernel module, this will take a while since it needs
# to compile a module for each dom0 kernel currently installed.
# may want to bring down the number of kernels:
#    rpm -qa kernel
#    sudo dnf remove kernel-4.14.74-1.pvops.qubes.x86_64
# pay attention that we want the "noarch" and not the "src" rpm:
sudo dnf install ./zfs-dkms-0.8.0-1.qbs4.0.noarch.rpm

This is merely a draft, as witnessed by the many TODO comments, but it does "work," and I thought it might be useful to place here for review, and in case someone wants to collaborate on this with me.
NOTABLY:

  • all the volume _import functions are probably horribly broken
  • the size reporting does not seem 100% accurate / implemented for all cases
  • currently no snapshotting takes place, even though there is some scaffolding in place for that
  • the "default" options for zpools/zvols/datasets are not uniformly applied
  • the suggestions for updates to the documentation are probably not 100% correct either

ping:

  • @Rudd-O who also seems to dabble in Qubes + ZFS
  • @tykling who provided invaluable help deciphering ZFS options, and might have further suggestions
  • @marmarek who provided a ton of help and suggestions re: the inner workings of Qubes and qubes-core-admin

@marmarek (Member)

# TODO why does this not work, does it need to strip newline?
# [ "${subject}" = "${subject}//[^-/_a-z:A-Z0-9]}" ] || exit 1

Extra } in the middle.

@marmarek (Member) left a comment

Added some comments, besides TODO already in the code.

Also a safety note: any unhandled exception while loading qubes.xml (__init__() of both the pool and volume classes, pool.init_volume()) will prevent qubesd from starting, which is very bad (it requires manually editing qubes.xml to fix). Avoid any possibility of such a case. One thing I'd be careful about is removing the base zfs pool while there are still encrypted pools based on it. I think you need to check for that in remove() explicitly.
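A minimal sketch of such a remove() guard, assuming the encrypted pools expose the name of their backing pool as zpool_name and that the pool object keeps a reference to app; the attribute names are assumptions, not the PR's actual code:

def remove(self):
    # Refuse to remove the base zfs pool while encrypted pools still build on it.
    dependants = [
        pool.name
        for pool in self.app.pools.values()
        if getattr(pool, "zpool_name", None) == self.name
    ]
    if dependants:
        raise qubes.storage.StoragePoolException(
            "cannot remove pool {!r}: encrypted pools {} depend on it".format(
                self.name, ", ".join(dependants)))
    # ... actual zpool teardown would follow here ...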

There is also a more generic remark: since this adds non-trivial dependencies, I'd prefer it in a separate package (with correct dependencies set). The way it is now, it's impossible to do it right (it would either always depend on the zfs packages, making it impossible to install without them, or be broken by default with unfriendly error messages).
But we can keep it here for development and review time, for convenience.

@@ -45,6 +45,7 @@ properties:
with its own state of the volume, or rather a snapshot of its template volume
(pointed by a :py:attr:`~qubes.storage.Volume.source` property). This can be
set to `True` only if a domain do have `template` property (AppVM and DispVM).
TODO should `source` be folded into this arg?
Member

I'm for it, but let's keep it compatible with the old API. For example: allow snap_on_start=Volume (and then assert that source is not set), but also allow snap_on_start=True, and then expect source to be set. Alternatively, add a new attribute replacing both, like snap_on_start_from.
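A small sketch of that backwards-compatible handling; the helper name and the exact config keys are assumptions for illustration:

import qubes.storage

def resolve_snapshot_origin(volume_config):
    """Accept both snap_on_start=Volume (new) and snap_on_start=True plus
    source (old), returning the origin volume or None."""
    snap = volume_config.get("snap_on_start", False)
    source = volume_config.get("source")
    if isinstance(snap, qubes.storage.Volume):
        assert source is None, "source must not be set when snap_on_start is a Volume"
        return snap
    if snap:
        assert source is not None, "snap_on_start=True requires source"
        return source
    return None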

doc/qubes-storage.rst (outdated; resolved)
- :py:meth:`~qubes.storage.Volume.import_data` - return a path the data should
be written to, to import volume data; for complex formats, this can be pipe
(connected to some data-importing process)
- :py:meth:`~qubes.storage.Volume.import_data_end` - finish data import
operation (cleanup temporary files etc); this methods is called always after
:py:meth:`~qubes.storage.Volume.import_data` regardless if operation was
successful or not
- :py:meth:`~qubes.storage.Volume.import_volume` - import data from another volume
- :py:meth:`~qubes.storage.Volume.import_volume` - import data from another volume. TODO this is used when self.snap_on_data=True and self.source exists
Member

No snap_on_data or snap_on_start. It is used when cloning a VM or otherwise importing data between already existing volumes (not importing data from "the outside").
It may be worth noting that even if the volume is created with the intention of its data being replaced by this function, volume.create() is still called before it.
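For this driver, a zvol-level import_volume() along those lines could look roughly like the sketch below; it shells out to the zfs CLI for brevity (the PR itself uses libzfs_core), and the snapshot naming is purely illustrative:

import asyncio

async def import_volume(self, src_volume):
    # Replace this volume's zvol with a clone of the source volume;
    # per the note above, create() has already been called on self.
    snap = "{}@import-{}".format(src_volume.vid, self.vid.replace("/", "-"))
    for cmd in (
        ("zfs", "snapshot", snap),
        ("zfs", "destroy", self.vid),       # drop the empty zvol from create()
        ("zfs", "clone", snap, self.vid),
    ):
        proc = await asyncio.create_subprocess_exec(*cmd)
        if await proc.wait() != 0:
            raise RuntimeError("import_volume step failed: {}".format(" ".join(cmd)))
    return self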

qubes/storage/zfs.py (outdated; resolved)
"""
self.log.warning("zfs init_volume() vm={} volume_config={}".format(
vm, volume_config
))
Member

log.debug? also, it's duplicated below
I won't comment on similar lines further below, but most of the log messages look more like debug than warnings (probably to be changed when moving out of the draft state).

Author

Agree, the .warning was to make sure they all showed up. I think most if not all of the self.log statements can be removed once this is more stable.

self.await_timeout()
)

async def await_timeout(self):
Member

This whole function looks overly complex. I'd have done it this way (see the sketch after this list):

  • when stopping the last volume (used_volumes becomes empty), schedule a key unload and save a reference to the scheduled task
  • when starting a volume, cancel the scheduled unload task (also handling the case where it wasn't scheduled)
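A minimal sketch of that schedule/cancel approach using loop.call_later; the attribute and method names (used_volumes, unload_keys, _unload_handle) are illustrative, not the PR's actual code:

import asyncio

class UnloadTimeoutSketch:
    unload_timeout = 300  # seconds of inactivity before unloading the keys

    def volume_started(self, vm):
        self.used_volumes.add(vm)
        handle = getattr(self, "_unload_handle", None)
        if handle is not None:       # a key unload was pending: cancel it
            handle.cancel()
            self._unload_handle = None

    def volume_stopped(self, vm):
        self.used_volumes.discard(vm)
        if not self.used_volumes:    # last volume gone: schedule the unload
            loop = asyncio.get_event_loop()
            self._unload_handle = loop.call_later(
                self.unload_timeout,
                lambda: asyncio.ensure_future(self.unload_keys()),
            )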

when no volumes are in use.
"""
self.used_volumes.discard(vm)
self.used_volumes_last_empty = time.clock_gettime(
Member

In this version of the await_timeout function, this looks wrong. The timestamp should only be updated if used_volumes is empty.

qubes/storage/zfs_encrypted.py (outdated; resolved)
"""
Returns the parent pool if found, otherwise raises an AssertionError.
This function also moonlights as our method to retrieve a handle for
'app' which we record and re-use when asking for passwords.
Member

As noted elsewhere, there is a better way: you can get it via Pool.init_volume(vm) -> vm.app, which is guaranteed to be called before any volume.start(), contrary to this function.
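In other words, something along these lines (the volume class name and the _app attribute are hypothetical):

def init_volume(self, vm, volume_config):
    # vm.app is available here, before any volume.start(); remember it for
    # the password prompts instead of walking up to the parent pool.
    if getattr(self, "_app", None) is None:
        self._app = vm.app
    volume_config["pool"] = self
    return ZFSEncryptedVolume(**volume_config)  # illustrative volume class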

@@ -24,6 +24,7 @@
from qubes.storage import pool_drivers
from qubes.storage.file import FilePool
from qubes.storage.reflink import ReflinkPool
from qubes.storage.zfs import ZFSQpool
Member

It's nice you import it here, but some actual tests would be even nicer ;)

Contributor

Ideally, there would be general tests for all storage pools. @marmarek thoughts?

@cfcs commented Oct 31, 2019

Thank you for the review, I will try to get around to addressing this at some point and keep this updated.

This PR is relevant to the discussion at QubesOS/qubes-issues#1293 (I forgot to mention that in the brief).

input=self.name.encode()+b'\n', # context for the prompt
stdout=pw_pipe_out)
# await pw_vm.run_service_for_stdio(
# TODO THIS used to work, now it fucking broke.
Member

If you target dom0, it needs R4.1 commit 50adc60. If you want to test on R4.0, you need to backport that commit.
Or maybe it's some other problem?

Author

That seems likely. It used to work on 4.0, but after updating earlier this week it stopped working with the exception that run_service_for_stdio for the AdminVM class was not defined.

@Rudd-O commented Jan 5, 2020

I'm extremely interested in this. Do you need funding to get this to pass tests and work? How much would it take to pique your interest?

@cfcs commented Jan 6, 2020

@Rudd-O I am working on this on my own time (and am using this patch set as-is on my laptop), but if you are able to eventually allocate funds to the Qubes project towards @marmarek's time spent reviewing and answering questions about the Qubes APIs, that would make me feel less guilty for taking up his time.

We also hit some sore spots re: the qvm-pool command-line interface that it would be great to have someone clean up.

@Rudd-O commented Jan 6, 2020

@marmarek would you like to discuss funding for this specific bug?

@Rudd-O commented Jan 6, 2020

(Lack of this is blocking me getting Qubes Network Server ported to Qubes 4.1 for a variety of reasons that would be super hard to explain shortly.)

@marmarek commented Jan 6, 2020

I'm fine with reviewing it whenever you let me know it's ready for the next round of review.

If you'd like to help financially, contribute to our OpenCollective, so we can fund more people to offload other tasks from me.

@Rudd-O commented Jan 10, 2020

Review comment here. The structure of the datasets is not quite what it should be. According to the inter-distro working group on getting ZFS on root file systems, the structure should be more or less like this:

rpool/ROOT/qubes <- the OS goes here, following the pattern
              root pool name, ROOT, OS name (ubuntu, fedora)
rpool/BOOT/qubes <- if you'll use /boot in ZFS, not sure its status
rpool/USERDATA/VMs <- this can be named whatever you want, doesn't even
              have to be on rpool, but a good default would
              obviously be called VMs or vms or vm
rpool/USERDATA/VMs/plainbrowser
rpool/USERDATA/VMs/plainbrowser/private
rpool/USERDATA/VMs/plainbrowser/volatile

The spec appears to be what will be followed by Fedora as well.

Ref: https://docs.google.com/document/d/1m9VWOjfmdbujV4AQWzWruiXYgb0W4Hu0D8uQC_BWKjk/edit

@Rudd-O commented Jan 10, 2020

@marmarek 1) how do I contribute to the OpenCollective? 2) how do I earmark contributions for this project?

@cfcs commented Jan 10, 2020

@Rudd-O to clarify, which datasets do you propose prefixing with /USERDATA/ ? The encrypted ones, when adding to an existing pool?

EDIT: Did you link to the right document? This one seems to be an unpublished draft detailing zfs-on-root for Ubuntu systems (neither of which applies here), and there's no mention of an inter-distro working group?

EDIT 2: I didn't have a use-case for it, but adding an unencrypted wrapper in addition to the encrypted one, and allowing both to take arguments like prefix=/USERDATA,zpool=rpool should be easy enough if that would work for you?

@Rudd-O commented Jan 11, 2020

Not sure I understand. Why do we need unencrypted anything?

@marmarek (Member)

@marmarek 1) how do I contribute to the OpenCollective?

https://opencollective.com/qubes-os

2) how do I earmark contributions for this project?

You can add a comment when you donate, but also read this: https://www.qubes-os.org/donate/#do-donors-get-personalized-support-or-special-feature-requests

@cfcs commented Jan 11, 2020

@Rudd-O I'm assuming that people are using full-disk encryption, so additional encryption keys to enter for e.g. your VM templates or your jazz music collection might be overkill UX-wise.

@Rudd-O commented Jan 13, 2020 via email

@Rudd-O commented Jan 13, 2020 via email

@DemiMarie (Contributor) left a comment

In addition to the various TODOs, there are several ways this code can be simplified quite a bit. Also, I suggest focusing on unencrypted pools, as QubesOS users generally use full-disk encryption.

pool=self.pool,
vid=self.vid,
rw=self.rw,
size=self._size)
Contributor

Suggested change
size=self._size)
size=self._size,
**kwargs)


self.save_on_stop = False
self.snap_on_start = False
self.rw = rw
Contributor

Suggested change
self.rw = rw

Comment on lines +608 to +611
if self.save_on_stop:
raise qubes.storage.StoragePoolException(
"TODO save_on_stop not supported"
)
Contributor

This code is more important than encryption.

Author

Unfortunately I didn't (and still don't) understand the expected semantics of save_on_stop. If you can lay it out in an issue on the other repo I'd be happy to discuss and/or implement it.
As for being more important than encryption: that's entirely subjective. I needed (and need) encrypted, incremental backups - that's why I wrote this. I did not (and do not) need snapshots on start/stop, so I didn't implement that.

Contributor

I had not considered the backups, sorry!

libzfs_core.lzc_create(
self.vid.encode(),
ds_type="zvol",
props=i_totally_know_python
Contributor

Suggested change
props=i_totally_know_python
props={b'volsize': self.size},

Comment on lines +655 to +657
#i_totally_know_python = DEFAULT_ZVOL_PROPS.copy()
i_totally_know_python = {}
i_totally_know_python[b"volsize"] = self.size
Contributor

Suggested change
#i_totally_know_python = DEFAULT_ZVOL_PROPS.copy()
i_totally_know_python = {}
i_totally_know_python[b"volsize"] = self.size

Comment on lines +650 to +653
if self.source:
# we're supposed to clone from this, maybe.
self.log.warning("zfs volume.create(): there is a .source!")
# TODO probably want lzc_clone in some cases.
Contributor

Suggested change
if self.source:
# we're supposed to clone from this, maybe.
self.log.warning("zfs volume.create(): there is a .source!")
# TODO probably want lzc_clone in some cases.

Comment on lines +486 to +495
i = run_command(
[
"zpool",
"list",
"-Hp",
"-o",
"allocated",
self.zfs_ns
]
)
Contributor

Suggested change
i = run_command(
[
"zpool",
"list",
"-Hp",
"-o",
"allocated",
self.zfs_ns
]
)
i = run_command(["zpool", "list", "-Hp", "-o", "allocated", self.zfs_ns])

Contributor

It would probably be more idiomatic if the use of lists for run_command were abolished and *args were used on the receiving function instead.
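A minimal sketch of that *args variant; run_command here stands in for the PR's subprocess wrapper, not its actual signature:

import asyncio
import subprocess

async def run_command(*args, input=None):
    # Run a command and return its stdout; raise on a non-zero exit code.
    proc = await asyncio.create_subprocess_exec(
        *args,
        stdin=subprocess.PIPE if input is not None else None,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    stdout, stderr = await proc.communicate(input)
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, args, stdout, stderr)
    return stdout

# Call sites then read without the list brackets:
#   size = await run_command("zpool", "list", "-Hp", "-o", "size", self.zfs_ns)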

Comment on lines +472 to +480
i = run_command(
[
"zpool",
"list",
"-Hp",
"-o",
"size",
self.zfs_ns,
])
Contributor

Suggested change
i = run_command(
[
"zpool",
"list",
"-Hp",
"-o",
"size",
self.zfs_ns,
])
i = run_command(["zpool", "list", "-Hp", "-o", "size", self.zfs_ns])

Comment on lines +349 to +431
# TODO if any of this stuff fails we should likely revert the
# whole thing to the previous state.

# TODO at least use the qubes_cmd_coro() here
# TODO get rid of sudo here if we can
environ = os.environ.copy()
environ["LC_ALL"] = "C.utf8"
p = yield from asyncio.create_subprocess_exec(
*[
"sudo",
"zpool",
"create",
*DEFAULT_ZPOOL_CMDLINE_OPTIONS,
self.zfs_ns,
*self.block_devices,
],
stdout=subprocess.DEVNULL,
stderr=subprocess.PIPE, # todo capture stderr
close_fds=True,
env=environ,
)
_, err = yield from p.communicate()
if p.returncode != 0:
raise qubes.storage.StoragePoolException(
"unable to create ZFS zpool:[{}] {}".format(p.returncode, err)
)

# Set up permissions for new zpool to avoid having to run other
# commands as root. This allows some basic protection from programming
# mistakes, since we can whitelist the operations that can be performed
# without root privileges:

# TODO should we attempt cleanup if any of this fails?

p = yield from asyncio.create_subprocess_exec(
*[
"sudo",
"zfs",
"allow",
"-ld", # this zpool + its descendants
"-g", # -g TODO
"qubes", # qubes gid / group TODO
("encryption,load-key,refreservation,create,destroy,mount,clone,snapshot,hold,"
"bookmark,send,diff,sync,volmode,rollback,receive,"
"volsize,volblocksize,volmode"
),
self.zfs_ns,
],
stdout=subprocess.DEVNULL,
stderr=subprocess.PIPE,
close_fds=True,
env=environ,
)
_, err = yield from p.communicate()
if p.returncode != 0:
raise qubes.storage.StoragePoolException(
"unable to set permissions for ZFS zpool: {}".format(err)
)

p = yield from asyncio.create_subprocess_exec(
*[
"sudo",
"zfs",
"allow",
"-d", # ONLY descendants (NOT this zpool itself)
"-g",
"qubes", # qubes gid / group
("refreservation,volmode,destroy,rollback,receive,volsize,"
"devices,volblocksize,volmode,encryption"),
self.zfs_ns,
],
stdout=subprocess.DEVNULL,
stderr=subprocess.PIPE,
close_fds=True,
env=environ,
)
_, err = yield from p.communicate()
if p.returncode != 0:
raise qubes.storage.StoragePoolException(
"unable to set child permissions for ZFS zpool: {}".format(
err
)
)
Contributor

Suggested change
# TODO if any of this stuff fails we should likely revert the
# whole thing to the previous state.
# TODO at least use the qubes_cmd_coro() here
# TODO get rid of sudo here if we can
environ = os.environ.copy()
environ["LC_ALL"] = "C.utf8"
p = yield from asyncio.create_subprocess_exec(
*[
"sudo",
"zpool",
"create",
*DEFAULT_ZPOOL_CMDLINE_OPTIONS,
self.zfs_ns,
*self.block_devices,
],
stdout=subprocess.DEVNULL,
stderr=subprocess.PIPE, # todo capture stderr
close_fds=True,
env=environ,
)
_, err = yield from p.communicate()
if p.returncode != 0:
raise qubes.storage.StoragePoolException(
"unable to create ZFS zpool:[{}] {}".format(p.returncode, err)
)
# Set up permissions for new zpool to avoid having to run other
# commands as root. This allows some basic protection from programming
# mistakes, since we can whitelist the operations that can be performed
# without root privileges:
# TODO should we attempt cleanup if any of this fails?
p = yield from asyncio.create_subprocess_exec(
*[
"sudo",
"zfs",
"allow",
"-ld", # this zpool + its descendants
"-g", # -g TODO
"qubes", # qubes gid / group TODO
("encryption,load-key,refreservation,create,destroy,mount,clone,snapshot,hold,"
"bookmark,send,diff,sync,volmode,rollback,receive,"
"volsize,volblocksize,volmode"
),
self.zfs_ns,
],
stdout=subprocess.DEVNULL,
stderr=subprocess.PIPE,
close_fds=True,
env=environ,
)
_, err = yield from p.communicate()
if p.returncode != 0:
raise qubes.storage.StoragePoolException(
"unable to set permissions for ZFS zpool: {}".format(err)
)
p = yield from asyncio.create_subprocess_exec(
*[
"sudo",
"zfs",
"allow",
"-d", # ONLY descendants (NOT this zpool itself)
"-g",
"qubes", # qubes gid / group
("refreservation,volmode,destroy,rollback,receive,volsize,"
"devices,volblocksize,volmode,encryption"),
self.zfs_ns,
],
stdout=subprocess.DEVNULL,
stderr=subprocess.PIPE,
close_fds=True,
env=environ,
)
_, err = yield from p.communicate()
if p.returncode != 0:
raise qubes.storage.StoragePoolException(
"unable to set child permissions for ZFS zpool: {}".format(
err
)
)

qubesd runs as root, so this code is not necessary.


@cfcs commented Mar 3, 2021

@DemiMarie Thank you for your review. I've since split the code out into its own package as advised by @marmarek, so this particular PR is abandoned. I've used this module as the primary pool driver on my personal laptop and have been quite happy with it.

You can find the revised code at: https://github.com/cfcs/qubes-storage-zfs

Twice I had to boot from a recovery disk because upstream API changes broke qubesd to the point where nothing worked. Copying changed code from a development vm to dom0 without mounting the file system turns out to not be easy when qubesd is broken. I've also had to manually edit the kernel header files because they either didn't compile (???) or were incompatible with the ZFS DKMS modules (that's to be expected I guess). So not quite ready for the primetime yet.

@DemiMarie (Contributor)

@DemiMarie Thank you for your review. I've since split the code out into its own package as advised by @marmarek, so this particular PR is abandoned. I've used this module as the primary pool driver on my personal laptop and have been quite happy with it.

You’re welcome! Even if you don’t intend to get this merged, it might be useful to have an open PR for code review purposes.

You can find the revised code at: https://github.com/cfcs/qubes-storage-zfs

Would you be interested in submitting it as a contrib package? I understand that it isn’t ready for prime time yet, but if/when that changes, this would allow users to obtain it via qubes-dom0-update. This would most likely require adding the DKMS package first, if I understand correctly.

Twice I had to boot from a recovery disk because upstream API changes broke qubesd to the point where nothing worked. Copying changed code from a development vm to dom0 without mounting the file system turns out to not be easy when qubesd is broken. I've also had to manually edit the kernel header files because they either didn't compile (???) or were incompatible with the ZFS DKMS modules (that's to be expected I guess). So not quite ready for the primetime yet.

I have actually had to manually edit stuff in dom0 before (mostly qubes.xml). So I understand! I wonder if the compilation errors are due to dom0 having an old version of GCC.

One other question: what are the advantages of this over the BTRFS pool? RAID-Z is the only one that comes to my mind, but there could easily be others I am missing.

@cfcs commented Mar 24, 2021

@DemiMarie Thank you for your review. I've since split the code out into its own package as advised by @marmarek, so this particular PR is abandoned. I've used this module as the primary pool driver on my personal laptop and have been quite happy with it.

You’re welcome! Even if you don’t intend to get this merged, it might be useful to have an open PR for code review purposes.

You can find the revised code at: https://github.com/cfcs/qubes-storage-zfs

Would you be interested in submitting it as a contrib package? I understand that it isn’t ready for prime time yet, but if/when that changes, this would allow users to obtain it via qubes-dom0-update. This would most likely require adding the DKMS package first, if I understand correctly.

That is the hope, that it can end up as a contrib package.

Like I said I've been using it on my personal Qubes installations since I posted this, and it's worked pretty well for me, but there are still some gotchas that need to be hashed out, and I'd like to have it better documented as well.

I'm kind of worried about releasing it as a contrib package and then having people unable to boot their VMs due to some kind of DKMS issue. Right now, if something goes wrong, all of qubesd stops working, which means file copy to/from dom0 stops working, and so does copy-paste, so it can be a grueling experience to recover from that situation, even if there's an upstream fix. I'd like to find a solution to that particular problem before encouraging other people to use it for anything but test systems.

Twice I had to boot from a recovery disk because upstream API changes broke qubesd to the point where nothing worked. Copying changed code from a development vm to dom0 without mounting the file system turns out to not be easy when qubesd is broken. I've also had to manually edit the kernel header files because they either didn't compile (???) or were incompatible with the ZFS DKMS modules (that's to be expected I guess). So not quite ready for the primetime yet.

I have actually had to manually edit stuff in dom0 before (mostly qubes.xml). So I understand! I wonder if the compilation errors are due to dom0 having an old version of GCC.

Not sure, but the compiler and header files being out of sync does seem like a probable reason. It's been a while since I had anything as drastic as that happen, though.

One other question: what are the advantages of this over the BTRFS pool? RAID-Z is the only one that comes to my mind, but there could easily be others I am missing.

The main feature over BTRFS is encryption, so your incremental backups are encrypted at rest and their integrity can be validated on the backup system without loading the key. The main drawback compared to BTRFS is that ZFS lacks a CoW file-cloning system call on Linux (BTRFS has that). That's not of much consequence for this particular codebase, since most of the magic happens with zvols, which do support CoW cloning, but that's my impression of the overall state; otherwise they're fairly similar. The ability to import ZFS data on FreeBSD systems is also an appealing factor.
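As an illustration of that backup property (not part of the driver itself): a raw incremental send of an encrypted zvol can be stored and verified on a backup host without the key ever being loaded there. The dataset and snapshot names below are hypothetical:

import subprocess

src = "qbs/encryption/qbsdev/vm/zfsdev/private"
subprocess.run(["zfs", "snapshot", src + "@backup-2"], check=True)

with open("/var/backups/zfsdev-private.incr", "wb") as out:
    # --raw sends the blocks still encrypted, exactly as stored on disk
    subprocess.run(
        ["zfs", "send", "--raw", "-i", src + "@backup-1", src + "@backup-2"],
        stdout=out,
        check=True,
    )
# On the backup host, `zfs receive` accepts the raw stream and a later
# `zpool scrub` can validate its integrity, all without `zfs load-key`.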

@tlaurion commented Jul 21, 2022

@cfcs : was the effort abandoned? Have you seen #7009 bounty offer for exactly this work/from @Rudd-O ?

zpool live deduplication is a big advantage over btrfs from what I've been reading. Please feel free to shed some light in the ongoing discussion of pool deduplication, here: https://forum.qubes-os.org/t/pool-level-deduplication/12654

@Rudd-O commented Jul 21, 2022

https://github.com/cfcs/qubes-storage-zfs

If this were released as a contrib package with the to-do items checked off such that I can trust my own systems with it, I would honor my promise of the bounty I offered before in #7009.

@cfcs commented Aug 11, 2022

@tlaurion (EDIT)

@cfcs : was the effort abandoned? Have you seen #7009 bounty offer for exactly this work/from @Rudd-O ?

Ah, no, I had not seen that.

In the sense that I use it every day on my own machines, the project is not abandoned, but it hasn't seen meaningful progress for a while now. This PR is effectively abandoned, since I deemed the approach @DemiMarie suggested of a plugin/contrib package better than merging it into qubes-core-admin (see @Rudd-O 's link to my package); but the contrib package does work and I update it when I run into things that annoy me.

Judging by the helpful issue posts on that repo, @DemiMarie has also taken it for a spin, but I don't know of any other users / testers.
There is also a fork by @ayakael here https://github.com/ayakael/qubes-storage-zfs that seems to do something with qubes-builder. I don't think they ever sent any pull requests, but I've picked up some of the patches and it's probably worth having a look at their changes if you're going to dig into this project.

It feels to me like there's some way to go before this would be something I'd recommend to people who are not comfortable fixing DKMS build errors; I've personally had a number of issues with the kernel headers/kernel being out of sync with the fc32 DKMS packages for ZFS. Basically you need to plan 1-2 days of downtime whenever you type qubes-dom0-update, and I don't have a good solution to that problem.

To sum up there are some things to deal with before I think it makes sense to offer this to non-expert users:

  • Figuring out a stable update story that doesn't cause massive pain on a daily basis (hard).
  • Fixing TODOs documented in the README (probably not so hard).
  • Reading up on the Qubes 4.0 -> 4.x changelogs and figuring out how the APIs are intended to be used. At the moment there's a number of things that can go wrong, and when they do go wrong, you need to manually stop&start qubesd. Potentially this can be solved by restructuring the code to do the "dangerous" stuff in places where qubesd presently expects exceptions; potentially this will require adding some try..except to qubes-core-admin. I felt figuring this out on my own without pestering the upstream Qubes devs was a hard task.
  • Finding a solution to some of the other problems, like the one where the CoW updates on top of the template's root image (which are thrown away on the next reboot anyway) are written to the template's pool instead of the VM's ZFS pool. This is probably also hard without spending time with upstream devs, or a lot of time reading through the code and documentation.

I'd be happy to review and merge patches to my repo, and I'd also be fine with someone forking it into their own contrib package, copying what they like and deleting the rest. As I've explained before, packaging is not really my strong side, so I doubt I will be doing any of that on my own in the near future. I basically did a lot of the easy lifting on this project to achieve what I wanted, and now I have a system that works for me, but I left the heavy lifting of polishing it up and making it mainstream-deployable as an exercise to the reader. I would be delighted to find such a reader here. :-)

@DemiMarie (Contributor)

Judging by the helpful issue posts on that repo, @DemiMarie has also taken it for a spin, but I don't know of any other users / testers.

I actually haven’t. All of my suggestions have come from manual review of the code.

  • Figuring out a stable update story that doesn't cause massive pain on a daily basis (hard).

The best approach I can think of is to ship a DKMS package that accompanies this package. Fedora 32 is EOL, so I am not surprised you are having problems with its DKMS package.

  • Reading up on the Qubes 4.0 -> 4.x changelogs and figuring out how the APIs are intended to be used. At the moment there's a number of things that can go wrong, and when they do go wrong, you need to manually stop&start qubesd. Potentially this can be solved by restructuring the code to do the "dangerous" stuff in places where qubesd presently expects exceptions; potentially this will require adding some try..except to qubes-core-admin. I felt figuring this out on my own without pestering the upstream Qubes devs was a hard task.

This is something that definitely needs to be worked on upstream. Writing a third-party storage driver should not be so difficult. I myself had a LOT of problems adding ephemeral volume encryption and ensuring that storage is cleaned up at system boot, so better documentation/cleaner APIs/etc would be amazing.

Feel free to pester me with any questions you have, so that I can improve the documentation.

  • Finding a solution to some of the other problems, like the one where the COW updates to the template's storage (that are thrown away on next reboot anyway) are written to the template's storage instead of the ZFS storage. This is probably also hard without spending time with upstream devs, or a lot of time reading through the code and documentation.

This is also a problem with all of the other storage drivers, sadly.

@ayakael commented Aug 12, 2022

@tlaurion I'm in about the same place as @cfcs as I use this driver every day, but haven't yet taken the time to clean it up. I did make a spec file, which you'll find in my fork. I did enough work to integrate building the zfs drivers within my qubes-builder setup, and I also build an updated version of zfs and zfs-dkms that hooks into my build pipeline. While I'm a package maintainer on other platforms, my knowledge of RPM isn't quite where it needs to be for me to be comfortable pushing anything upstream. I would like to find time to clean this up, though.

I would like to collaborate with y'all to get this in a good state, so don't hesitate to push tasks my way.

By the way, I push my built packages to the following repo: https://repo.gpg.nz/qubes/r4.1. I can't make any guarantees, though.

@cfcs commented Aug 14, 2022

@DemiMarie

Judging by the helpful issue posts on that repo, @DemiMarie has also taken it for a spin, but I don't know of any other users / testers.

I actually haven’t. All of my suggestions have come from manual review of the code.

Ah, okay! Well, thanks for doing that!

  • Figuring out a stable update story that doesn't cause massive pain on a daily basis (hard).

The best approach I can think of is to ship a DKMS package that accompanies this package. Fedora 32 is EOL so I am not surprised you are having problem with its DKMS package.

Yes, iirc I'm using fc34 or fc35, but the real issue is that kernel and kernel-sources and the DKMS modules are not updated in lock-step, and it would probably go a long way to have binary ZFS module packages instead of relying on compiling C code in each user's dom0.

  • Reading up on the Qubes 4.0 -> 4.x changelogs and figuring out how the APIs are intended to be used. At the moment there's a number of things that can go wrong, and when they do go wrong, you need to manually stop&start qubesd. Potentially this can be solved by restructuring the code to do the "dangerous" stuff in places where qubesd presently expects exceptions; potentially this will require adding some try..except to qubes-core-admin. I felt figuring this out on my own without pestering the upstream Qubes devs was a hard task.

This is something that definitely needs to be worked on upstream. Writing a third-party storage driver should not be so difficult. I myself had a LOT of problems adding ephemeral volume encryption and ensuring that storage is cleaned up at system boot, so better documentation/cleaner APIs/etc would be amazing.

Feel free to pester me with any questions you have, so that I can improve the documentation.

Thank you! I will take you up on that offer next time I nerdsnipe myself into working on this project.

  • Finding a solution to some of the other problems, like the one where the COW updates to the template's storage (that are thrown away on next reboot anyway) are written to the template's storage instead of the ZFS storage. This is probably also hard without spending time with upstream devs, or a lot of time reading through the code and documentation.

This is also a problem with all of the other storage drivers, sadly.

That is actually good news, though: it means a solution can hopefully be found independently of the people interested in ZFS integration?

@ayakael

I'm in about the same place as @cfcs as I use this driver every day, but havn't yet taken the time to clean it up. I did make a spec file, which you'll find in my fork. I did enough work to integrate building the zfs drivers within my qubes-builder setup, and I also build an updated version of zfs and zfs-dkms that hooks into my build pipeline. While I'm a package maintainer on other platforms, my knowledge of RPM isn't quite where it needs to be for me to be comfortable pushing anything upstream. I would like to find time to clean this up, though.

Ah, that is cool. I think the best solution in terms of absolutely avoiding DKMS build errors would be to ship a custom dom0 kernel package that includes a ZFS module (potentially as a loadable module so it's not running if not needed). Is that what you do with your qubes-builder setup?

Thanks for chiming in, I was happy to learn that I have a user! :-)

@ayakael commented Aug 14, 2022

@cfcs Indeed, the most foolproof solution is including zfs as a kernel module. That was my initial approach, but it was a lot of work maintaining a custom kernel, thus I moved to DKMS a while ago. My current approach is actually hooking backported versions of zfs and zfs-dkms into qubes-builder, which has avoided any DKMS issues. Thus, I do not rely on the out-of-date packages from Fedora's fc32 repo, but rather on my own repo where the built zfs / zfs-dkms spec files (which were taken from up-to-date Fedora repos) are hosted.

@Rudd-O commented Aug 21, 2022

Honestly I don't mind if this project doesn't deliver ZFS as part of the deliverables. I compile ZFS on my dom0s every time there is an update to either the kernel or ZFS (I run ZFS directly from master).

@DemiMarie (Contributor)

@cfcs: I would actually like to see this merged into Qubes core at some point, assuming @marmarek is okay with it. Even if we have problems shipping ZFS, I suspect that it should be possible to perform testing in CI.

@Rudd-O commented Aug 22, 2022

@DemiMarie seconded. If CI needs to run integration tests, the zfs and zpool commands can totally be mocked with actual output from a system that had ZFS installed.

@ayakael commented Aug 22, 2022

Honestly I don't mind if this project doesn't deliver ZFS as part of the deliverables. I compile ZFS on my dom0s every time there is an update to either the kernel or ZFS (I run ZFS directly from master).

It wouldn't be that much work for Qubes to have a backported zfs and zfs-dkms package available for the current dom0 repo. I suspect most of the errors @cfcs encounters are due to having to adapt the old zfs-dkms package from Fedora's EOL repo to current kernel versions shipped by Qubes.

@Rudd-O commented Aug 22, 2022

@ayakael what they're saying here is that they don't want to use the zfs-dkms package due to compilation in dom0 being necessary. I use the zfs-dkms package I roll from pristine upstream sources and that works very well for me.

@cfcs commented Aug 23, 2022

Honestly I don't mind if this project doesn't deliver ZFS as part of the deliverables. I compile ZFS on my dom0s every time there is an update to either the kernel or ZFS (I run ZFS directly from master).

It wouldn't be that much work for Qubes to have a backported zfs and zfs-dkms package available for the current dom0 repo. I suspect most of the errors @cfcs encounters are due to having to adapt the old zfs-dkms package from Fedora's EOL repo to current kernel versions shipped by Qubes.

I think I was maybe being dramatic re: the update story -- in practice it doesn't break that often, I just don't like to give people the impression that it's completely foolproof.

@cfcs commented Aug 23, 2022

@DemiMarie

@cfcs: I would actually like to see this merged into Qubes core at some point, assuming @marmarek is okay with it. Even if we have problems shipping ZFS, I suspect that it should be possible to perform testing in CI.

I think that the separate package approach is useful to highlight potential shortcomings, even if the separate package is then installed by default?

Anyway, we have a bunch of issues, some of which can be fixed in the separate package, and some of which require upstream work (I find the template CoW thing is the main thing I don't know how to fix).

It would also be nice to have a clear view of where the functions are allowed to / expected to throw exceptions.
The problem I had was something along the lines of this:

  • throw an exception in __init__ if the dataset doesn't exist (for example):
    1. this blows up (or used to, at least) qubesd, meaning the whole system locks up and you have to fiddle with removing the module before you can restart qubesd manually.
  • don't throw an exception in __init__, and instead throw it when trying to activate, for example; this is then handled gracefully by qubesd as a startup error:
    1. This blows up too, but in a different way: once __init__ has succeeded, Qubes expects to be able to read properties from the class.
    2. These will then be stored somewhere (qubes.xml?),
    3. and later they will be fed back into __init__.
    4. The problem is that the information might not be available in step 1 if the dataset doesn't exist, and that can cause trouble down the line.

Maybe the solution is to structure my module differently, but I did not find a good way to report properties that are not available when the pool is faulted, for example.
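One possible shape for that restructuring, sketched under the assumption that returning a harmless placeholder while the pool is faulted is acceptable; all names here are illustrative, not the actual driver code:

import subprocess
import qubes.storage

class LazyZFSPoolSketch:
    # Keep __init__ side-effect free and check the zpool lazily, so a missing
    # dataset cannot break qubes.xml loading.

    def __init__(self, *, name, zpool_name, **kwargs):
        self.name = name              # record configuration only
        self.zpool_name = zpool_name

    def _check_zpool(self):
        ok = subprocess.run(
            ["zfs", "list", self.zpool_name],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        ).returncode == 0
        if not ok:
            raise qubes.storage.StoragePoolException(
                "zpool {!r} is missing or faulted".format(self.zpool_name))

    @property
    def size(self):
        try:
            self._check_zpool()
        except qubes.storage.StoragePoolException:
            return 0                  # placeholder so qubesd can still start
        out = subprocess.run(
            ["zpool", "list", "-Hp", "-o", "size", self.zpool_name],
            capture_output=True, check=True,
        ).stdout
        return int(out)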

@Rudd-O commented Aug 23, 2022

I gave writing my own module a shot a while ago, and I gotta say the abstractions for the storage volumes just aren't there for me to understand how to implement one.

@DemiMarie (Contributor)

Anyway, we have a bunch of issues, some of which can be fixed in the separate package, and some of which require upstream work (I find the the template cow thing is the main thing I don't know how to fix).

Template CoW is 100% an upstream problem. No work required on your side.

@Rudd-O commented Aug 25, 2022

I assume ZFS-backed pools can always instaclone the current (or rather the pre-boot) snapshot of the template to produce a R/W block device for a VM using the template.

@cfcs commented Sep 29, 2022

I assume ZFS-backed pools can always instaclone the current (or rather the pre-boot) snapshot of the template to produce a R/W block device for a VM using the template.

The "problem" is that the r/w snapshot is not written to the qvm-pool of the VM. Whether it is written to the qvm-pool of the template, or whether it somehow incorrectly just gets dropped in a file-backed default pool for the VM, is something I never figured out.

Most of the time I don't really care where these are written; assuming both pools have enough space, it doesn't make much of a difference to the end-user. Given that they're only ever interpreted as sparse writes on top of an immutable base image, it would be nice if they would end up as sparse volume writes attributed to the VM since that helps with quotas and accounting. That would help you avoid scenarios where the thin provisioning blows up critical VMs (sys-net etc) when "less needed" VMs decide to overwrite the entire supposedly-sparse device, creating storage pressure in cases where the assumption that there is enough space doesn't hold.

In practice the main problem I have with it is that if the VM storage is encrypted, like the per-qvm-pool encryption that the ZFS qvm-pool implementation supports, it's really unintuitive and problematic (security-wise) that writes to e.g. /tmp/ are written to the template's potentially-not-encrypted-at-all storage. If there was an option to use ephemeral encryption for these writes (like has been discussed - and perhaps implemented? - for the volatile volume), that would solve the main issue.

@DemiMarie (Contributor)

I assume ZFS-backed pools can always instaclone the current (or rather the pre-boot) snapshot of the template to produce a R/W block device for a VM using the template.

The "problem" is that the r/w snapshot is not written to the qvm-pool of the VM. Whether it is written to the qvm-pool of the template, or whether it somehow incorrectly just gets dropped in a file-backed default pool for the VM, is something I never figured out.

It is placed on the VM’s template’s pool.

In practice the main problem I have with it is that if the VM storage is encrypted, like the per-qvm-pool encryption that the ZFS qvm-pool implementation supports, it's really unintuitive and problematic (security-wise) that writes to e.g. /tmp/ are written to the template's potentially-not-encrypted-at-all storage. If there was an option to use ephemeral encryption for these writes (like has been discussed - and perhaps implemented? - for the volatile volume), that would solve the main issue.

Volatile volume encryption (which includes swap) has been implemented. Encryption of the root volume has not yet been implemented. However, one can mark that volume as read-only, which will cause the VM to create a snapshot in the guest itself. This causes data that would be written to the root volume to be written to the volatile volume instead. @marmarek could this be made the default when volatile volume encryption is on?

@cfcs commented Sep 29, 2022

Volatile volume encryption (which includes swap) has been implemented. Encryption of the root volume has not yet been implemented. However, one can mark that volume as read-only, which will cause the VM to create a snapshot in the guest itself. This causes data that would be written to the root volume to be written to the volatile volume instead. @marmarek could this be made the default when volatile volume encryption is on?

Ah, awesome! It seems to me that would make a lot of sense (same rationale as encryption for volatile, basically), and that would take care of the security concern over the current behavior in terms of this pool driver. Eventually it'd be nice if it could write to the VM's storage, but I don't see that part as a blocker for this project.

@Rudd-O commented Mar 24, 2023

This can now be closed, right?
