Running docker in systemd-nspawn #220

Open
kiesstein opened this issue Mar 6, 2024 · 10 comments

Comments

@kiesstein

I was trying to start the container inside a systemd-nspawn container (jail) with /dev/kvm bind-mounted into it.
Inside the jail I ran kvm-ok:

root@debian:/mnt/safe/docker/compose/windows$ kvm-ok 
INFO: /dev/kvm exists
KVM acceleration can be used

The jail is the Docker host, with the following compose config:

version: "3"
services:
  windows:
    image: dockurr/windows
    container_name: windows
    devices:
      - /dev/kvm
    cap_add:
      - NET_ADMIN
    ports:
      - 8006:8006
      - 3389:3389/tcp
      - 3389:3389/udp
    stop_grace_period: 2m
    restart: on-failure
    volumes:
      - /mnt/tank/all/kvm/win:/storage

But when I start the container, the VM just shuts down:

1933312K ........ ........ ........ ........ 41% 65.1M 37s

1966080K ........ ........ ........ ........ 42% 69.9M 36s

1998848K ........ ........ ........ ........ 42% 33.9M 36s


...



4718592K ........ ........ ........ ........ 99% 69.3M 0s

4751360K .....                              100% 38.2M=62s

❯ Extracting Windows 11 bootdisk...

❯ Extracting Windows 11 environment...

❯ Extracting Windows 11 setup...

❯ Extracting Windows 11 image...

❯ Adding XML file for automatic installation...

❯ Building Windows 11 image...

❯ Creating a 64G growable disk image in raw format...

❯ Booting Windows using QEMU emulator version 8.2.1 ...

3h3h3hBdsDxe: failed to load Boot0002 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0xA,0x0)/Scsi(0x0,0x0): Not Found

BdsDxe: loading Boot0001 "UEFI QEMU QEMU CD-ROM " from PciRoot(0x0)/Pci(0x5,0x0)/Scsi(0x0,0x0)

BdsDxe: starting Boot0001 "UEFI QEMU QEMU CD-ROM " from PciRoot(0x0)/Pci(0x5,0x0)/Scsi(0x0,0x0)

❯ Shutdown completed!

How can I get more logs, or what am I missing? Is it not possible to run this with a bind-mounted /dev/kvm or inside an nspawn jail?
I also installed qemu-kvm inside the jail:

sudo apt -y install qemu-kvm libvirt-daemon  bridge-utils virtinst libvirt-daemon-system

System: TrueNAS-Scale; Ryzen 1600x, SVM enabled
JailOS: Debian Bookworm

@kroese
Contributor

kroese commented Mar 7, 2024

Very strange. Normally when it shuts down, it should at least print an error message or the reason for shutting down. So it's very hard to tell what is happening right now.

@kroese
Contributor

kroese commented Mar 11, 2024

I have now created a new release (v2.05), which should provide more error info in your case. Can you please try it and see if it reports more? If not, can you try setting:

environment:
  CONSOLE: "Y"

in your compose file and see if it outputs more info?

@kiesstein
Author

Thanks for the effort, here are the results:

docker-compose:

version: "3"
services:
  windows:
    image: dockurr/windows
    container_name: windows
    devices:
      - /dev/kvm
    cap_add:
      - NET_ADMIN
    ports:
      - 8006:8006
      - 3389:3389/tcp
      - 3389:3389/udp
    stop_grace_period: 2m
    restart: on-failure
    environment:
      MANUAL: "Y"
      CONSOLE: "Y"
    command: sleep infinity

Logs:

❯ Starting Windows for Docker v2.05...

❯ For support visit https://github.com/dockur/windows

❯ Downloading Windows 11...

[i] Downloading Windows media from official Microsoft servers...

[i] Downloading Windows 11...

[+] Got latest ISO download link (valid for 24 hours): https://software.download.prss.microsoft.com/dbazure/Win11_23H2_English_x64v2.iso?t=1122dfc9-36be-439d-b4b6-f16ca36b00a0&P1=1710254095&P2=601&P3=2&P4=1FloUyZlyxr%2f5wEBOrtHw8aWR4zK7kco4KoNsCL4eStk6tPD69rPy2Hadw6JVrzsutwGagcJyx3HtpN%2f36aYySC3PxmgeUoZE3q4yx0vrEcKa9iM%2bZYfatQoL77g64zVOzy9XLHBEM%2fuC4PnukGLCJyRGi%2fYHYQnoJzY74rrhmxlRZM8H%2f4CFqtgkU5yAPt7gA4JZc88c6YF3Rkesio20jG95nSvftrvsRdMl3tcloayvf4h1ezzxknQVKpZfkuQNGgOa61eY5j6Sdd80zBj%2fJ3BIY4H3m3Br%2bR36DHmAPdDZzDnUIGw87xgXw6o6z%2f1QMjCNZsxb%2fBpKEJrBmJeuw
#                                                                          1.4%
#                                                                          1.6%
#                                                                          1.7%
...
######################################################################## 100.0%

[+] Successfully downloaded Windows image!

❯ Extracting Windows 11 image...

❯ Building Windows 11 image...

❯ Creating a 64G growable disk image in raw format...

❯ Booting Windows using QEMU emulator version 8.2.1 ...

char device redirected to /dev/pts/0 (label serial0)

Weird, now it doesn't even say that it is shutting down. With the previous version I could briefly see the same logs in the browser on port 8006. Now the container exits directly after "connecting to VNC", without the error messages.

Does the /dev/kvm device need some special permissions? On the TrueNAS host, in the nspawn container, and in the Docker container the device has the following permissions:

root@65350c78fe0e:/# ls -la /dev/kvm 
crw-rw---- 1 root 104 10, 232 Mar 11 14:43 /dev/kvm
root@65350c78fe0e:/# 
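
On the permissions question: the listing shows mode crw-rw---- with owner root and numeric group 104. A minimal sketch for inspecting this from a shell, assuming GNU stat is available; the helper names dev_owner_info and user_matches_dev are made up for illustration. Mode bits are only one layer: a process running as root normally passes them, so a failure to open the device can also come from the container's device policy.

```shell
#!/bin/sh
# Print mode, owner uid and owner gid of a character device (GNU stat format).
dev_owner_info() {
    dev=$1
    [ -c "$dev" ] || { echo "$dev: not a character device"; return 1; }
    # %n = name, %a = octal mode, %u = owner uid, %g = owner gid
    stat -c '%n mode=%a uid=%u gid=%g' "$dev"
}

# Hypothetical helper: succeeds if the current user is root, owns the
# file, or belongs to the file's group. It only looks at ownership and
# does not check cgroup device rules.
user_matches_dev() {
    dev=$1
    [ -e "$dev" ] || return 1
    uid=$(id -u)
    [ "$uid" -eq 0 ] && return 0
    [ "$uid" -eq "$(stat -c '%u' "$dev")" ] && return 0
    dev_gid=$(stat -c '%g' "$dev")
    for g in $(id -G); do
        [ "$g" -eq "$dev_gid" ] && return 0
    done
    return 1
}
```

Running dev_owner_info /dev/kvm followed by user_matches_dev /dev/kvm in each layer (host, jail, Docker container) would show whether ownership differs between them.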

Thank you!

@kiesstein
Author

So I did some digging and found out that the container also needs the device /dev/vhost-net.
I found this in the troubleshooting section of a different docker-kvm project:
https://github.com/BBVA/kvm?tab=readme-ov-file#notes--troubleshooting

So what I did was bind mount /dev/vhost-net into the nspawn container, and then QEMU was able to boot.

The weird thing is what happens when I enable the console for more debug output, as @kroese mentioned:

environment:
  CONSOLE: "Y"

The VM would not start, and only the following logs appeared with CONSOLE: "Y":

❯ Starting Windows for Docker v2.05...

❯ For support visit https://github.com/dockur/windows

❯ Booting Windows using QEMU emulator version 8.2.1 ...

char device redirected to /dev/pts/0 (label serial0)

... and when I remove it, it works, but the logs are more or less empty:

No log line matching the '' filter

I guess there is some kind of bug with that environment variable.
I don't know whether this only happens with my special setup, or whether it should be a separate issue.

@kroese
Contributor

kroese commented Mar 11, 2024

@kiesstein Hmm, very interesting find! In the past I had /dev/vhost-net in the example compose file, but it seemed unnecessary, as the container can create this device automatically via mknod commands because of the NET_ADMIN capability. So I removed /dev/vhost-net from the compose file to keep it short.

Also, it's weird that it does not complain about anything when the device is created, but just exits much later when QEMU is launched.

And /dev/vhost-net is completely optional unless you are using DHCP mode with macvlan. The default (bridge network) mode can also run without it, just with slightly less performance.

So I still don't completely understand what is going on in your case. But maybe I should just not create it automatically, except in macvlan mode, so that if it causes any problems, they do not occur in bridge mode...

Food for thought!

@kiesstein
Author

You are right @kroese: it could not use mknod to create the device, because the nspawn container needs the rights to do so.
I removed the bind mount of the device and gave the container rwm rights to /dev/vhost-net, i.e. I started the systemd-nspawn container with '--property=DeviceAllow=/dev/vhost-net rwm', and the Windows VM boots up without issues!

@kroese
Contributor

kroese commented Mar 11, 2024

Yes, but the problem in this case is that mknod didn't return any error on your system. If it had failed, the script would have launched QEMU without vhost-net, and everything would have been fine because it's optional.

But because mknod returned successfully, the script assumes the device is available and tries to use it.
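
The failure mode described here, where creating the node succeeds but the device is still unusable, suggests probing the device by opening it rather than trusting the mknod exit status. A minimal sketch of that idea, not the container's actual script: can_open_rw and vhost_option are hypothetical names, and vhost=on / vhost=off refer to the option of QEMU's tap -netdev.

```shell
#!/bin/sh
# Succeeds only if the path is a character device that can be opened
# for reading and writing by the current process.
can_open_rw() {
    [ -c "$1" ] || return 1
    # Open on fd 3 inside a subshell so the descriptor does not leak
    # into the caller; a failed open makes the subshell exit non-zero.
    ( exec 3<>"$1" ) 2>/dev/null
}

# Print the vhost setting to use, based on whether the device opens.
# The default path is /dev/vhost-net; another path can be passed as $1.
vhost_option() {
    if can_open_rw "${1:-/dev/vhost-net}"; then
        echo "vhost=on"
    else
        echo "vhost=off"
    fi
}
```

With a check like this, a node that exists but is blocked by the jail's device policy would fall back to the slower path without vhost instead of making QEMU exit.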

In any case, I will make some changes and just disable vhost-net unless somebody explicitly adds the device to their compose file.

@kroese
Contributor

kroese commented Mar 11, 2024

I created a new tag (v2.06). Could you do me a favor and test if this version works in your original situation (where you did not mount /dev/vhost-net yet)? To see if the original issue is now solved.

@kiesstein
Author

OK, so I tried to reproduce the problem with 2.05, testing with winxp, and I could not reproduce it.

I tried to install win11 again with --property='DeviceAllow=/dev/vhost-net rwm', but it still does not work. Maybe it is only an installation issue, or previously I only tried with winxp because it is faster to test; sadly I don't remember anymore.
So it is at least a win11 issue.
My earlier statement that it works with --property='DeviceAllow=/dev/vhost-net rwm' and win11 is false!

Next test: instead of --property='DeviceAllow=/dev/vhost-net rwm', bind mounting with --bind=/dev/vhost-net. Sadly, same issue.

Next was --capability=all without --property='DeviceAllow=/dev/vhost-net rwm' or --bind=/dev/vhost-net -> not working.

Next was --capability=all, --property='DeviceAllow=/dev/vhost-net rwm' and --bind=/dev/vhost-net -> not working.
I also ran modprobe vhost_vsock on the jail host because I remember doing that before, but no luck.
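
Note that vhost_vsock backs /dev/vhost-vsock, while /dev/vhost-net is provided by the vhost_net module. A minimal sketch for checking which of the relevant modules are visible, assuming the standard /sys/module layout; module_loaded and report_modules are hypothetical helper names, and the module list (kvm, kvm_amd for the Ryzen host, vhost_net) is an assumption about what matters here.

```shell
#!/bin/sh
# A directory under /sys/module exists for each loaded module
# (built-in code may or may not appear there).
# $1 = module name, $2 = sysfs module directory (defaults to /sys/module)
module_loaded() {
    [ -d "${2:-/sys/module}/$1" ]
}

# Print one status line per module of interest.
report_modules() {
    base=${1:-/sys/module}
    for m in kvm kvm_amd vhost_net; do
        if module_loaded "$m" "$base"; then
            echo "$m: present"
        else
            echo "$m: not found"
        fi
    done
}
```

Running report_modules on the TrueNAS host (where modules are actually loaded) would show whether vhost_net is there at all.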

Then I tested all bind mounts listed in this + vhost-net:

--bind=/dev/kvm --capability=all --bind=/dev/vhost-net --bind=/dev/fuse --bind=/dev/vsock --bind=/dev/vhost-vsock

-> not working.
I tried winxp with the same settings to make sure that at least still works, and it did.

Then I also tried passing vhost-net through in the compose file with win11 again, like:

devices:
  - /dev/kvm
  - /dev/vhost-net

And it still does not work. So I am no longer sure what exactly I did during my testing, but I can't get win11 running anymore with 2.05, and I do not remember whether it ever ran with win11.

Then I tried win10 with all the previously mentioned settings, and it did not work.
With win8 it did work! So the issue starts with win10.

I am sorry that I tested incorrectly (changing more than one setting at once, e.g. winxp).

Then I did a docker compose pull and restarted the container with win8.
The first boot after recreation did not work, but then I restarted the container again and it did boot. I don't know, maybe a fluke?

❯ Starting Windows for Docker v2.07...

❯ For support visit https://github.com/dockur/windows

❯ Booting Windows using QEMU emulator version 8.2.1 ...

3h3h3hBdsDxe: loading Boot0003 "Windows Boot Manager" from HD(1,GPT,3C9F9213-A787-474C-8772-B5A6D80C1E6B,0x800,0x40000)/\EFI\Microsoft\Boot\bootmgfw.efi

BdsDxe: starting Boot0003 "Windows Boot Manager" from HD(1,GPT,3C9F9213-A787-474C-8772-B5A6D80C1E6B,0x800,0x40000)/\EFI\Microsoft\Boot\bootmgfw.efi

❯ Shutdown completed!

❯ Starting Windows for Docker v2.07...

❯ For support visit https://github.com/dockur/windows

❯ Booting Windows using QEMU emulator version 8.2.1 ...

3h3h3hBdsDxe: loading Boot0003 "Windows Boot Manager" from HD(1,GPT,3C9F9213-A787-474C-8772-B5A6D80C1E6B,0x800,0x40000)/\EFI\Microsoft\Boot\bootmgfw.efi

BdsDxe: starting Boot0003 "Windows Boot Manager" from HD(1,GPT,3C9F9213-A787-474C-8772-B5A6D80C1E6B,0x800,0x40000)/\EFI\Microsoft\Boot\bootmgfw.efi

Then I tried win10 again without any bind mounts and so on, but it does not work with v2.07. It also does not work with all the bind mounts plus privileged: true and - /dev/vhost-net in the compose file.

In summary, none of the settings made a difference. The only thing I found out is that it works with winxp and win8 (I did not test win7, vista, ...) and does not work with win10 and win11.
So I am fairly certain that the devices or bind mounts are not the problem. Sadly, we are back to square one.

@kiesstein kiesstein reopened this Mar 14, 2024
@kroese
Contributor

kroese commented May 12, 2024

Can you try v3.05 with the privileged: true setting added to your compose file? This will set the ignore_msrs KVM parameter automatically, which might solve your issue.
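
To see whether the parameter is actually set, its value can be read from sysfs. A minimal sketch, assuming the kvm module exposes its parameters under /sys/module/kvm/parameters; kvm_param is a hypothetical helper name, and its second argument exists only so the function can be pointed at a different directory.

```shell
#!/bin/sh
# Print the value of a kvm module parameter, or "unknown" if the
# parameter file is not readable.
# $1 = parameter name, $2 = parameter directory (optional)
kvm_param() {
    file="${2:-/sys/module/kvm/parameters}/$1"
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "unknown"
        return 1
    fi
}
```

Calling kvm_param ignore_msrs prints Y or N when the file is readable, which makes it easy to compare the state before and after starting the container.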
