-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KVM: replace -drive by -blockdev and make HMP obsolete #1667
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into this @rbott!
At a quick glance, it looks good in principle. There are obviously some bits missing, please find my comments below - I admit I wrote them before reading the whole description where you already mention that these are known issues. Still, I view them as important points that I would like to see addressed.
I'll try to find some time in the following days for a more in-depth review.
Just a small request: if there's missing infrastructure that you decide to implement as part of this work (e.g. QMP event handling), please do so in new PRs against master. This leads to easier reviews and more self-contained code.
Cheers!
src/Ganeti/Constants.hs
Outdated
@@ -2731,7 +2731,7 @@ htHvmValidVifTypes = ConstantUtils.mkSet [htHvmVifIoemu, htHvmVifVif] | |||
-- * Disk types | |||
|
|||
htDiskIde :: String | |||
htDiskIde = "ide" | |||
htDiskIde = "ide-hd" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This affects serialization to/from the config, and thus will require a migration for config data. Since htDiskIde
is Ganeti's disk type, and it doesn't have to be the same as whatever QEMU uses for IDE, I would keep it as-is and just pass ide-hd
to QEMU where needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, I will change that back!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
src/Ganeti/Constants.hs
Outdated
htCacheDefault = "default" | ||
|
||
htCacheNone :: String | ||
htCacheNone = "none" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, dropping these means that we need to take care of upgrading existing configs and mapping previous values to the new ones (e.g. convert none
to writethrough
), otherwise deserialization will fail and/or instances will fail to start e.g. because of https://github.com/ganeti/ganeti/pull/1667/files#diff-cd6e55ca6245e02cbd788a3e5a43583e83e2de3816766ce2a98552cec0841df4R203.
We should also check and see if there's any upgrade that needs to be done at the runtime files of already-running instances (I'm not sure it's needed, but haven't checked).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this would require upgrade/downgrade code and of course lots of testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed (disk_cache reverted to the old list of accepted values)
# TODO: implement receiving of QMP events | ||
# We need to wait for the DEVICE_DELETED event via QMP before proceeding. | ||
# The old implementation using HMP only worked because it is so slow. | ||
time.sleep(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would really like to see even some hack for event handling here, rather than time.sleep
:-).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, entirely agree. That was more of a workaround to get things in a working state. OTOH it pretty much reflects what happened before (although not as explicit) and implementing events in QMP is probably way out of scope for this PR.
Thanks @rbott for your work. Without going into depth, I would prefer to keep old values for instance parameters I think it might be not enough to just support/translate In
The first column matches Ganeti's Also I think we can't drop the hassle of valid cache combinations with aio/disk template type. Especially for Without much investigation, initially I thought we need to use the table above? |
Thanks, I actually did not see that table while looking at the documentation. I guess I will add To safe yourself some time, please wait with further reviews until I have dealt with disk_type and disk_cache :-) |
58ca757
to
3efc5ed
Compare
I have udpated the PR:
|
I have also opened #1669 because I can not test all Ganeti disk_types (but they do not work on the stable-3.0 or master branch as well). But I don't think we should solve/address that as part of this PR. |
I have added one fix: The QA suite passes für Debian Bullseye. It fails on Debian Buster, because it tries to test |
184b503
to
9fdcdbb
Compare
CDROM code transitioned to -blockdev The latest push also updates the
Differences to old code:
Limitations of the code (which were already present in the old code):
|
8ade311
to
594c54f
Compare
floppy code transitioned to -blockdev The KVM hypervisor code has been extended with a Design Document for -blockdev I have also added a design document for 3.1, describing the code changes introduced with this PR. With this, we are now only lacking upgrade/downgrade code + proper testing of cluster upgrades. Could you please take another look, @apoikos @saschalucas? |
Thanks @rbott for your work. I'll have a look. I'm also going to make some tests, especially with |
doc/design-qemu-blockdev.rst
Outdated
Hot-Removing Disks | ||
++++++++++++++++++ | ||
|
||
Hot-adding a disk consists of two steps: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be Hot-removing, instead of Hot-adding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, absolutely. Thanks for spotting this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just tested the ceph-rbd userspace and got an error
cat /var/log/ganeti/kvm/test.vm.log
qemu-system-x86_64: -blockdev driver=raw,node-name=disk-222ec87a-b27d-4756,auto-read-only=off,cache.direct=off,cache.no-flush=off,discard=ignore,file.aio=threads,file.driver=host_device,file.filename=rbd:rbd/4fdf6419-ffd5-4dff-a60f-7a4274a00566.rbd.disk0: Could not open 'rbd:rbd/4fdf6419-ffd5-4dff-a60f-7a4274a00566.rbd.disk0': No such file or directory
doc/design-3.1.rst
Outdated
@@ -0,0 +1,7 @@ | |||
================= | |||
Ganeti 3.0 design |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ganeti 3.1 here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh, copy/paste error. Sorry :-)
Did you already have the time to run some of the checks @saschalucas? |
98cd436
to
a5d6fad
Compare
I have added migration code to take care of the obsolete The cluster upgrades well from 3.0.2 to the code in this PR. Of course it does not do anything to running instances with regards to storage devices/disks. Trying to hot-remove a disk from a running instance after the migration will fail, until the instance gets re-created with -blockdev parameters for drives. OTOH, re-creating an instance (stop/start or reboot) works exactly as intended. I yet have to test the following scenario (need multiple nodes for that):
If anyone reading this has the time to carry out the test themselves, please proceed :-) |
Sorry, very busy ATM. I'll try in the next days. |
60df0e1
to
3d1e964
Compare
I have updated this PR and added code to support gluster (both kernel- and userspace). @atta this also adds the groundwork to support other userspace storage backends (e.g. rbd). I still do not have a ceph setup at hand, but if you could provide me with an example rbd URI I could adopt the code to possibly also support rbd userspace access. QEMU seems to expect the following options: pool, image - and a bunch of optional parameters as well. For gluster, I simply extracted the required information from the gluster URL (gluster://host:port/volume/path) and filled the required qemu fields with that information. //Edit: nevermind, @atta's comment above actually show a rbd "URL":
Let's see what we can make out of this :-) |
80a838f
to
f93e638
Compare
Finally I had some time to revisit this PR. I have now added/verified support for blockdev with RBD storage (both userspace/kernelspace access). There is a lot of room for improvement in the code (especially with the additions for gluster & rbd) and I would like to revise it before we move forward with this. However, recent QA runs have also shown that with Debian Bookworm hotplugging with the legacy QEMU storage implementation (e.g. in stable-3.0) is broken again. This PR should fix this issue. However I suppose its way to big of a change to get included into Debian Bookworm at his point (we are already knee deep into the freeze/release schedule). I have improved the QA suite setup to also support RBD setups (only kernelspace access for now). This should make it easier to test. There is only one main problem left: I have not tested ext-storage yet. According to the documentation, "userspace" access is also supported:
This is some sort of an issue now. The current implementation would just pass the userspace URI over to qemu (which will either understand it or not). With the blockdev implementation ganeti actually needs to parse the URI, select the corresponding driver and fill whatever parameters might be required with the information it found in the URI. This somewhat undermines the sense of the ext storage provider. Unless we also change the ext-storage-provider model to require e.g. passing of valid JSON configuration for qemu instead an URI. What would be your take on this @atta @apoikos @saschalucas? To illustrate this a bit - this is how the RBD userspace access looks like in the JSON syntax for QMP (the commandline syntax is just a flattened-string-representation of the same): {
"driver":"raw",
"node-name":"disk-234d1e63-effa-45b4",
"discard":"ignore",
"cache":{
"direct":true,
"no-flush":false
},
"file":{
"driver":"rbd",
"pool":"ganeti",
"image":"891ef23c-068e-44f2-9636-1bafb5f48c87.rbd.disk4"
}
} In this case, |
thanks @rbott for working on this important topic. I hope to find some time for review soon.... |
This PR has finally left the draft stage :-) The latest revision contains the following changes:
I still have not tested this against an implementation of the ext-storage-interface. However, in theory this should work when the storage provider returns a block device (which is the default) and also should work when it returns an additional userspace URI, as long as it is either RBD or gluster. Other URIs could also work but need to be implemented in the I'd say this is really good for testing now. Go for it :-) |
ce590da
to
9e45d94
Compare
doc/design-qemu-blockdev.rst
Outdated
@@ -0,0 +1,175 @@ | |||
========================== | |||
Ganeti daemons refactoring |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this heading needs changing (copy-pasted from design-daemons.rst ?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks :-)
doc/design-qemu-blockdev.rst
Outdated
================ | ||
|
||
We have to eliminate all uses of the `-drive` parameter from the Ganeti codebase | ||
to ensure compatibility with future versions fo QEMU. With this, we can also |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/ fo / of /
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks :-)
This commit drops support for the deprecated -drive parameter of QEMU (in favor of -blockdev). It also switches disk hotplugging to use blockdevadd/blockdev-del via QMP which was the last part of Ganeti which made use of QEMU's human monitor (HMP). Along some adjustments are made to KVM parameters: * disk_aio now also supports io_uring on QEMU 5.0+ (default: threads) * disk_discard now only accepts ignore/unmap (default: ignore) Signed-off-by: Rudolph Bott <r@spam.wtf>
This PR replaces the deprecated
-drive
commandline parameter of QEMU with-blockdev
. It also switches hotplugging to useblockdev-add|del
via QMP and hence cuts the last ties from Ganeti to QEMU's human monitor (HMP).Please consider this PR work-in-progress!
This PR also touches several of KVMs parameters:
disk_type
ide
is now namedide-hd
disk_cache
now only acceptswritethrough
/writeback
(default:writeback
)disk_aio
now also supportsio_uring
when QEMU 5.0+ is present (default:threads
)disk_discard
now only acceptsignore
/unmap
(default:ignore
)Instead of changing parameters to reflect QEMU (e.g.ide
->ide-hd
) we could also try to keep parameters as they are and just translate in the code. What would you prefer? Regarding thedisk_cache
parameter, there are really only two sensible choices (dicated by the blockdev parametercache.direct=on|off
). I think it makes it easier for the user to not have Ganeti offer four different choices when there really are only two.What has been tested?
-blockdev
-blockdev
What is currently missing?
@apoikos @iustin @saschalucas it would be great if you can find some time to review so we can try to improve this PR. Once this is done, we can close #1542 and aim for a 3.1 release :-)