
Memory sharing between virtual machines #473

Open
wants to merge 1 commit into base: main from jkuro-add-wayland-shm-lenovo_rebase

Conversation

@jkuro-tii commented Feb 13, 2024

A draft PR for a memory-sharing solution used for socket-to-socket data transfer between virtual machines. It can
be used for Wayland displaying.

The purpose of this PR is to share the code and try to find answers to the problem that arose.

Jira ticket:
https://ssrc.atlassian.net/browse/SP-3805

Documentation:
https://ssrc.atlassian.net/wiki/spaces/~62552e6ffdb60b006927ad98/blog/2022/09/29/612958326/Memory+sharing+between+virtual+machines
https://ssrc.atlassian.net/wiki/spaces/~62552e6ffdb60b006927ad98/pages/825720835/Wayland+displaying+with+shared+memory

Scenario that executes properly

Lenovo P1 with Fedora 38 installed. For the display and application VMs, there are two
separate vm-debug virtual machines with the shared-memory driver and the memsocket application
additionally installed.

Actual
YouTube playback is smooth, up to a resolution of 1440p.
CPU consumption by the memsocket application is below 5%, usually 2-3%.

Regular Ghaf Lenovo X1 scenario

Build Ghaf for the Lenovo X1 target and run it. 
Connect to WiFi, run Chrome, and play YouTube videos.

Actual
Video is choppy; the memsocket application uses around 100% CPU.

Expected
YouTube playback is smooth, with a resolution of up to 1440p.

Another test was performed on a Lenovo X1 with Fedora installed, running virtual machines
extracted from the Ghaf build.
The result was the same as with Ghaf, i.e., video quality was unacceptable and CPU consumption
by the memsocket application was also high, around 100%.

Observations
Performance measurements were taken on both the GuiVM and the host.
The perf data files:
perf_host.tar.gz
perf_guivm.tar.gz

Results on GuiVM

```
Samples: 387K of event 'cpu-clock:pppH', Event count (approx.): 96985250000
  Children      Self  Command    Shared Object      Symbol
    99.96%     0.00%  memsocket  [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
    99.96%     0.00%  memsocket  [kernel.kallsyms]  [k] do_syscall_64
    99.84%     0.00%  memsocket  [kernel.kallsyms]  [k] __sock_sendmsg
    99.83%     0.00%  memsocket  [kernel.kallsyms]  [k] unix_stream_sendmsg
    99.82%     0.00%  memsocket  libc.so.6          [.] __send
    99.82%     0.00%  memsocket  [kernel.kallsyms]  [k] __x64_sys_sendto
    99.81%     0.00%  memsocket  [kernel.kallsyms]  [k] __sys_sendto
    99.71%     0.00%  memsocket  [kernel.kallsyms]  [k] skb_copy_datagram_from_iter
    99.70%     0.01%  memsocket  [kernel.kallsyms]  [k] _copy_from_iter
    99.69%    99.55%  memsocket  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string
    89.30%     0.00%  memsocket  [kernel.kallsyms]  [k] copy_page_from_iter
     0.14%     0.12%  memsocket  [kernel.kallsyms]  [k] __softirqentry_text_start
     0.14%     0.00%  memsocket  [kernel.kallsyms]  [k] __irq_exit_rcu
```

Results on host

```
Samples: 152K of event 'cpu_atom/cycles/', Event count (approx.): 5979663105
  Children      Self  Command          Shared Object      Symbol
    78.00%     0.00%  .qemu-system-x8  libc.so.6          [.] __GI___ioctl
    76.53%     0.00%  .qemu-system-x8  [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
    76.53%     0.02%  .qemu-system-x8  [kernel.kallsyms]  [k] do_syscall_64
    70.29%     0.00%  .qemu-system-x8  [kernel.kallsyms]  [k] __x64_sys_ioctl
    70.28%     0.00%  .qemu-system-x8  [kvm]              [k] kvm_vcpu_ioctl
    70.27%     2.07%  .qemu-system-x8  [kvm]              [k] kvm_arch_vcpu_ioctl_run
    31.98%     2.91%  .qemu-system-x8  [kvm_intel]        [k] vmx_vcpu_run
    21.27%    21.20%  .qemu-system-x8  [kernel.kallsyms]  [k] native_write_msr
    17.35%     0.80%  .qemu-system-x8  [kvm]              [k] kvm_vcpu_halt
    13.89%     0.19%  .qemu-system-x8  [kvm]              [k] kvm_vcpu_block
    12.63%     0.23%  .qemu-system-x8  [kernel.kallsyms]  [k] schedule
    12.41%     0.41%  .qemu-system-x8  [kernel.kallsyms]  [k] __schedule
    10.47%     0.33%  .qemu-system-x8  [kvm]              [k] kvm_load_host_xsave_state
    10.19%     0.17%  .qemu-system-x8  [kvm]              [k] kvm_load_guest_xsave_state
```

Both measurements indicate that the time is spent in the kernel's __sock_sendmsg path,
when the send() function is used to write a buffer located in shared memory into
the Wayland socket.
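To cross-check this on a running system, one can also attach perf counters to the memsocket process while a video is playing; a minimal sketch, assuming perf and pgrep are available inside the GuiVM (process name and duration are just examples):

```bash
# Count default perf events (task-clock, context switches, cycles,
# instructions, ...) for the running memsocket process for 10 seconds.
# Substitute the PID from "systemctl status --user memsocket.service"
# if pgrep matches nothing.
sudo perf stat -p "$(pgrep -o memsocket)" -- sleep 10
```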

@jkuro-tii jkuro-tii temporarily deployed to internal-build-workflow February 13, 2024 10:17 — with GitHub Actions Inactive
@jkuro-tii jkuro-tii force-pushed the jkuro-add-wayland-shm-lenovo_rebase branch from d6017d1 to eef36be on February 13, 2024 10:19
@jkuro-tii jkuro-tii temporarily deployed to internal-build-workflow February 13, 2024 10:19 — with GitHub Actions Inactive
@jenninikko

Are you planning to add documentation to the repository?

@vilvo left a comment

  1. Please write the PR description according to the template - https://github.com/tiiuae/ghaf/blob/main/.github/pull_request_template.md
  2. Please fill in the PR template checklist, describe testing, and remove the "Draft" status.
  3. Re-request review.

@jkuro-tii

The purpose of this PR is to share the code and to work together on solving the mentioned problem of high kernel overhead.

@jkuro-tii

Getting perf data: perf is a standard Linux tool (https://perf.wiki.kernel.org/index.php/Main_Page). In this PR the perf component is included in the host, GuiVM and AppVM VMs. The first stage of using perf is collecting execution data for a certain period of time (e.g. sudo perf record -a -g -p PID); the data is written into the perf.data file. The data can then be visualized using the sudo perf report subcommand. perf report uses /proc/kallsyms to obtain information about kernel symbols. If you want to process the data offline, copy /proc/kallsyms and refer to it with the perf report --kallsyms=<file> option. The files that I attached contain both the perf.data and the kernel symbols files.

To check the memsocket application performance on the GuiVM:

  1. Find the PID of the memsocket service: systemctl status --user memsocket.service | grep "Main PID"
  2. Collect perf data: sudo perf record -a -g -p PID
  3. Process it with perf report.

To check GuiVM's qemu performance on the host:

  1. Start terminal
  2. Login to host: ssh 192.168.101.2
  3. Find PID of qemu instance running GuiVM: systemctl status microvm@gui-vm.service | grep "Main PID"
  4. Collect perf data: sudo perf record -a -g -p PID
  5. Process it with perf report.

All files kept in the /home/ghaf directory on the host are stored on the Ghaf boot disk, which can be used to access them from another computer. For example, files from the GuiVM can be copied to the host using scp _file_ ghaf@192.168.101.2: and accessed offline.
Sample reading about perf:
1, 2
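Putting the host-side steps above together, a minimal sketch of collecting the data for offline analysis (the duration and file names are examples, not part of the PR):

```bash
# On the host: record 30 seconds of call-graph samples for the gui-vm QEMU.
PID=$(systemctl show -p MainPID --value microvm@gui-vm.service)
sudo perf record -a -g -p "$PID" -- sleep 30

# Keep the kernel symbol table next to perf.data so the profile can be
# decoded on another machine.
sudo cp /proc/kallsyms kallsyms.host

# Later, offline:
#   perf report -i perf.data --kallsyms=kallsyms.host
```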

@vilvo commented Feb 14, 2024

> To check the memsocket application performance on GuiVM

Thanks. I can reproduce your perf measurement and perf report analysis in the Youtube playing scenario. In addition, the user experience is terrible.

Behind the link in the description, there is an interesting mention of:

> Result
> Throughput socket to socket:
> 2.8 GB/s = 22.4 Gb/s
> using ssh: 140 MB/s = 1.1 Gb/s
>
> Lenovo Thinkpad P1 Intel i7 @2.5 GHz
>
> Sending 1 MiB buffer 8 K times (total 80 GiB sent)

Can you please share a reproducible iperf test description? I'd like to reproduce it and establish a Lenovo X1 Carbon Gen 11 baseline throughput. Another thing we could then check is parametrizing the buffer size to find where we hit the ceiling.

@jkuro-tii

I did iperf measurements with the unsock tool, in order to tunnel TCP traffic over the socket and shared memory.

Test configuration:
chromium-vm: iperf -c with unsock -> [socket] -> memsocket -> [shared memory]
guivm: [shared memory] -> memsocket -> [socket] -> iperf -s with unsock

Performance achieved:
Lenovo X1 with Ghaf: ~90 Mbits/sec
Lenovo P1 with Fedora Linux and two Ghaf vm-debug machines (my development environment): ~32 Gbit/s

result.client_x1.log
result.server_x1.log
result.server.log
result.client.log
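For reference, the client invocation (quoted by @vilvo below) pairs roughly with a matching server; a sketch under the assumption that libunsock is built locally and memsocket already exposes a socket in the current directory (library path and port are placeholders):

```bash
# Receiving VM: iperf server, with unsock's LD_PRELOAD shim mapping the
# TCP listen address onto a Unix socket in UNSOCK_DIR.
UNSOCK_ADDR=127.0.0.1 UNSOCK_DIR="$PWD" \
  LD_PRELOAD=~/unsock/libunsock.so.1.1.0 iperf -s -p 1234

# Sending VM: iperf client over the same shared-memory-backed socket.
UNSOCK_ADDR=127.0.0.1 UNSOCK_DIR="$PWD" \
  LD_PRELOAD=~/unsock/libunsock.so.1.1.0 iperf -c 127.0.0.1 -p 1234
```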

@vilvo commented Feb 19, 2024

> I did iperf measurements with the unsock tool, in order to tunnel TCP traffic over the socket and shared memory.
>
> Test configuration: chromium-vm: iperf -c with unsock -> [socket] -> memsocket -> [shared memory]; guivm: [shared memory] -> memsocket -> [socket] -> iperf -s with unsock
>
> Performance achieved: Lenovo X1 with Ghaf: ~90 Mbits/sec; Lenovo P1 with Fedora Linux and two Ghaf vm-debug machines (my development environment): ~32 Gbit/s
>
> result.client_x1.log result.server_x1.log [result.server.log](https://github.com/tiiuae/ghaf/files/14308083/result.server.log) result.client.log

[ghaf@ghaf-host:~/shmsockproxy/app]$ UNSOCK_ADDR=127.0.0.1 UNSOCK_DIR=`pwd` LD_PRELOAD=~/unsock/libunsock.so.1.1.0 iperf -c 127.0.0.1 -p 1234

This indicates you have compiled some dependency in the current working directory (pwd?) of the shmsockproxy app, to use some version (which git commit hash?) of libunsock with iperf. From the above instructions it's unclear how to replicate the test setup, and I'd rather not guess.

Optionally, the test tools you have used could be Nix-packaged and included in the PR's -debug build.

Would it make sense to add some simple unit throughput test through memsocket instead? What do you think?

@vilvo commented Feb 19, 2024

If I go a bit further and try to compile memsocket on ghaf-host, I get a lot of gcc warnings about -Wformat-extra-args and -Wformat-overflow= (just a small snippet below).

[ghaf@ghaf-host:~]$ nix-shell -p git stdenv

[nix-shell:~]$ cd shmsockproxy/app/

[nix-shell:~/shmsockproxy/app]$ gcc memsocket.c
memsocket.c: In function ‘server_init’:
memsocket.c:175:8: warning: too many arguments for format [-Wformat-extra-args]
  175 |   INFO("server initialized", "");
...
memsocket.c: In function ‘wayland_connect’:
memsocket.c:50:19: warning: ‘%s’ directive writing up to 255 bytes into a region of size 235 [-Wformat-overflow=]
...

Edit. git was used to clone:

[nix-shell:~]$ git clone https://github.com/tiiuae/shmsockproxy.git
[nix-shell:~]$ cd shmsockproxy/
[nix-shell:~/shmsockproxy]$ git log --oneline -1
8b1bf60 (HEAD -> main, origin/main, origin/HEAD) Removed forking, as systemd sends SIGTERM

@vilvo commented Feb 19, 2024

Also, I think this is relevant to the performance comparison between the X1 and the desktop - #340 (review).
So we can now use the Ghaf internal NVMe instead of an external SSD over USB3.

@jkuro-tii

> If I try a bit further, to compile the memsocket on ghaf-host I get a lot of gcc warnings on -Wformat-extra-args and -Wformat-overflow= (just a small snippet below).
>
> memsocket.c: In function ‘wayland_connect’:
> memsocket.c:50:19: warning: ‘%s’ directive writing up to 255 bytes into a region of size 235 [-Wformat-overflow=]
> ...
>
> Edit. `git` was used to clone:
>
> [nix-shell:~]$ git clone https://github.com/tiiuae/shmsockproxy.git
> [nix-shell:~]$ cd shmsockproxy/

Fixed.

@jkuro-tii

The data path is:

chromium -> [socket] -> waypipe -> [socket] -> memsocket -> [shared memory] -> memsocket send() -> [socket] -> waypipe -> wayland

> If I try a bit further, to compile the memsocket on ghaf-host I get a lot of gcc warnings on -Wformat-extra-args and -Wformat-overflow= (just a small snippet below).
>
> [ghaf@ghaf-host:~]$ nix-shell -p git stdenv
>
> [nix-shell:~]$ cd shmsockproxy/app/
>
> [nix-shell:~/shmsockproxy/app]$ gcc memsocket.c
> memsocket.c: In function ‘server_init’:
> memsocket.c:175:8: warning: too many arguments for format [-Wformat-extra-args]
>   175 |   INFO("server initialized", "");
> ...
> memsocket.c: In function ‘wayland_connect’:
> memsocket.c:50:19: warning: ‘%s’ directive writing up to 255 bytes into a region of size 235 [-Wformat-overflow=]
> ...
>
> Edit. git was used to clone:
>
> [nix-shell:~]$ git clone https://github.com/tiiuae/shmsockproxy.git
> [nix-shell:~]$ cd shmsockproxy/
> [nix-shell:~/shmsockproxy]$ git log --oneline -1
> 8b1bf60 (HEAD -> main, origin/main, origin/HEAD) Removed forking, as systemd sends SIGTERM

I added a binary library and a few commands by hand into the chromium and gui VMs. Is it OK if I attach them here, or do you want to include them in the Ghaf build?

@vilvo commented Feb 19, 2024

> I added a binary library and a few commands by hand into the chromium and gui VMs. Is it OK if I attach them here, or do you want to include them in the Ghaf build?

This command:

[ghaf@ghaf-host:~/shmsockproxy/app]$ UNSOCK_ADDR=127.0.0.1 UNSOCK_DIR=`pwd` LD_PRELOAD=~/unsock/libunsock.so.1.1.0 iperf -c 127.0.0.1 -p 1234

from here indicates that some commands are also required on ghaf-host, not only on chromium-vm and gui-vm, if one wants to reproduce the iperf throughput measurements. Now I'm not quite sure whether that's the case or not. So I still have some trouble following the testing procedure, and whether we are talking about the integrated YouTube-streaming case or the iperf measurement case. It would probably be clearer if we added a reproducible test setup to the Ghaf -debug build and clarified the instructions.

@jkuro-tii commented Feb 19, 2024

The memsocket app in the gui-vm forwards all the data coming from shared memory to Waypipe, so we can't run socket tests in a regular setup. If we want to perform them, memsocket must be stopped in the gui-vm (sudo systemctl stop --user memsocket.service) and then started by hand with a custom socket path. The socket it creates can then be used by libunsock + iperf.
I uploaded sample scripts - update the memsocket binary location for your build (e.g. from systemctl status --user memsocket.service); I hope they'll clarify how to do it. Otherwise I'll prepare the build.
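As a rough outline of that manual setup (the flags, socket path, and numeric argument follow the sample sessions later in this thread; the exact meaning of each memsocket option is defined by the scripts in the PR):

```bash
# In both peer VMs: stop the service that normally owns the channel,
# otherwise all data is forwarded to Waypipe's socket.
systemctl stop --user memsocket.service

# Receiving VM (e.g. gui-vm): expose a test socket backed by the shared memory.
memsocket -c ./test.sock &

# Sending VM (e.g. chromium-vm): attach to the same shared memory and create
# the sender-side socket (the trailing number is used as in the examples below).
memsocket -s ./test.sock 2 &

# The two ./test.sock endpoints can now be used by libunsock + iperf.
```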

iperf.tar.gz
I also ran:
time dd if=/dev/zero of=/dev/ivshmem
time dd of=/dev/null if=/dev/ivshmem
in the Chromium and Gui VMs. It seems something is wrong with the GuiVM: dd achieves 17.7 MB/s there, whereas on the Chromium VM it's 1.6 GB/s. The kernel configuration of both machines is identical.
This is in line with playing YouTube: the high overhead happens only on the GuiVM.

@vilvo commented Feb 19, 2024

> The memsocket app in the gui-vm forwards all the data coming from shared memory to Waypipe, so we can't run socket tests in a regular setup. If we want to perform them, memsocket must be stopped in the gui-vm (sudo systemctl stop --user memsocket.service) and then started by hand with a custom socket path. The socket it creates can then be used by libunsock + iperf. I uploaded sample scripts - update the memsocket binary location for your build (e.g. from systemctl status --user memsocket.service); I hope they'll clarify how to do it. Otherwise I'll prepare the build.

Ok, thanks. That confirms my understanding of the real scenario vs. the perf test scenario. I tried to rebase your PR onto the main branch to install it to NVMe (instead of the external SSD) using the installer that was merged on Friday, but there was a merge conflict. Would you mind resolving that and updating the draft PR branch?

> iperf.tar.gz I also ran: time dd if=/dev/zero of=/dev/ivshmem and time dd of=/dev/null if=/dev/ivshmem in the Chromium and Gui VMs. It seems something is wrong with the GuiVM - dd achieves 17.7 MB/s, whereas on the Chromium VM it's 1.6 GB/s. The kernel configuration of both machines is identical. This is in line with playing YouTube: the high overhead happens only on the GuiVM.

This is a good approach - super simple, and it confirms the scenario. So there's nothing throttling the write on the chromium-vm side; we get good throughput. The reader number indicates it can't handle reading as fast. If the same happens in the YouTube scenario (e.g. lost frames), I'm guessing the decoder will request resending of the frames, which is not only visible as jittery playback but should also be visible as increased network usage. Another scenario to test is the other way round: dd so that the gui-vm writes and the chromium-vm reads. The problem with the dd approach is that we don't see how much of the sent data we are losing. If it was done using a known big file (instead of /dev/zero) - e.g. a randomly generated 20 GB file with a calculated md5sum - we could control the block size and number of iterations, regenerate the read data into a file, and calculate a corresponding checksum.
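A minimal sketch of that idea, assuming the manual test-socket setup described above is used as the transport (sizes, file names and the checksum tool are only examples):

```bash
# Sender side: generate a known payload and record its checksum
# (1 GiB here for illustration; the comment above suggests ~20 GB).
dd if=/dev/urandom of=payload.bin bs=1M count=1024
sha256sum payload.bin | tee payload.sha256

# ... transfer payload.bin over the shared-memory channel in fixed-size
# blocks and reassemble it on the receiver as received.bin ...

# Receiver side: checksum the reassembled file and compare with the
# sender's value.
sha256sum received.bin
```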

@jkuro-tii jkuro-tii force-pushed the jkuro-add-wayland-shm-lenovo_rebase branch from eef36be to 6224584 on February 20, 2024 08:21
@jkuro-tii jkuro-tii temporarily deployed to internal-build-workflow February 20, 2024 08:21 — with GitHub Actions Inactive
@jkuro-tii

It doesn't seem we are facing data loss. The connection between the remote and local Waypipe instances is multichannel and data is sent in both directions. After a portion of data is sent from the AppVM, the GuiVM sends a short packet of data, probably an acknowledgement; if it is not received by the AppVM, there is no further transmission. Also, if we faced data loss, GUI elements - windows, menus, scroll bars - would be damaged, and we don't see that.

I've pushed my branch rebased with main.

@vilvo commented Feb 20, 2024

> I've pushed my branch rebased with main.

Thanks, I installed your rebased draft branch build to X1 NVMe with:

nix build .#packages.x86_64-linux.lenovo-x1-carbon-gen11-debug-installer
dd if=result/iso/nixos-23.11.20231218.3a9928d-x86_64-linux.iso ...
<boot and install to nvme0n1>

The problem with the dd method is that the numbers tell very little. Writing stops with:

[ghaf@gui-vm:~]$ time dd if=/dev/zero of=/dev/ivshmem
dd: writing to '/dev/ivshmem': No space left on device
32769+0 records in
32768+0 records out
16777216 bytes (17 MB, 16 MiB) copied, 0.688331 s, 24.4 MB/s

real	0m0.692s
user	0m0.000s
sys	0m0.690s

@vilvo commented Feb 20, 2024

During the YouTube streaming scenario there's very heavy load on the QEMU processes, as seen from htop on ghaf-host.
The message-signaled interrupts on chromium-vm and gui-vm are roughly ~50-60/s, as seen with watch -n1 "cat /proc/interrupts".
(screenshot: qemu-load-with-youtube-streaming)
When you stop the streaming, there's no QEMU load.

@vilvo commented Feb 20, 2024

We're not using aio=native with QEMU. Please check
https://hex.ro/wp/blog/kvm-qemu-high-cpu-load-but-low-cpu-usage/
I checked strace during streaming and we've got the same symptom - the output is littered with events=POLLIN, and there is a ton of anon_inode:[eventfd] entries under the QEMU /proc/<pid>/fd.
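For context, aio=native is a per-drive QEMU option; a hypothetical fragment of what the drive definition could look like (image path and device are placeholders - note that aio=native has to be combined with cache=none / O_DIRECT, while QEMU defaults to aio=threads):

```bash
# Hypothetical fragment of a QEMU command line using Linux native AIO for
# the disk instead of the default thread pool.
-drive file=/path/to/disk.img,if=none,id=drive1,cache=none,aio=native \
-device virtio-blk-pci,drive=drive1
```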

@vilvo commented Feb 20, 2024

And this presentation, linked from the blog post in the previous comment, is extremely relevant for optimizing our case - https://vmsplice.net/~stefan/stefanha-kvm-forum-2017.pdf

@jkuro-tii commented Feb 21, 2024

I designed a utility for testing the connection.
On one side (e.g. chromium-vm) it sends a file to a socket; on the other side (e.g. gui-vm) it receives it, without writing anything. The transmission is protected by a CRC-8 checksum.
Both sockets are connected by the memsocket app.
It's possible to specify how much data to send, and to force a broken CRC (by adding anything as a 4th parameter), e.g.:

memtest socket_path /dev/random 10M
memtest socket_path /dev/random 10M anything # for sending a broken CRC

First start it on the receiving side (any of the VMs):

memsocket -c ./test.socket & # run it once
memtest ./test.socket

Now file(s) can be sent from another VM:

memsocket -s ./test.socket 2 & # run it once
memtest ./test.socket /dev/random 10M
memtest ./test.socket /dev/random 10M wrong_crc

It prints some statistics.
Before using it, it's necessary to stop the memsocket service (systemctl stop --user memsocket.service) on the gui-vm and the other peer VM (e.g. chromium)! The service directs all data to Waypipe's socket, so it must be stopped.

Sample sessions:

[ghaf@chromium-vm:~]$ memtest
Usage: memtest socket_path for receiving.
memtest socket_path input_file [size CRC_error] for sending.
Before using stop the memsocket service: 'systemctl stop --user memsocket.service'.
For receiving run e.g.: 'memsocket -c ./test.sock &; memtest ./test.sock'.
To send a file run on other VM: 'memsocket -s ./test.sock 3 &; memtest ./test.sock /dev/random 10M.'
To force wrong CRC on sending: 'memtest ./test.sock /dev/random 10M xxx'

[ghaf@chromium-vm:~]$ systemctl stop --user memsocket.service

[ghaf@chromium-vm:~]$ memsocket -s ./test.sock 3 &
[1] 776

[ghaf@chromium-vm:~]$ memtest ./test.sock /dev/random 10M
Sent/received 10485761 bytes 2 Mbytes/sec
real 5.030s
user 0.010s
sys 0.030s

[ghaf@chromium-vm:~]$ memtest ./test.sock /dev/random 100M
Sent/received 104857601 bytes 4 Mbytes/sec
real 27.640s
user 0.250s
sys 0.320s

[ghaf@chromium-vm:~]$ memtest ./test.sock /dev/random 100M
Sent/received 104857601 bytes 244 Mbytes/sec
real 0.410s
user 0.200s
sys 0.200s

[ghaf@chromium-vm:~]$ memtest ./test.sock /dev/random 100M ui # to break the CRC
Sent/received 104857601 bytes 263 Mbytes/sec
real 0.380s
user 0.170s
sys 0.200s

[ghaf@chromium-vm:~]$ memtest ./test.sock /dev/random 100M
Sent/received 104857601 bytes 263 Mbytes/sec
real 0.380s
user 0.190s
sys 0.180s

Usage: memtest socket_path for receiving.
memtest socket_path input_file [size CRC_error] for sending.
Before using stop the memsocket service: 'systemctl stop --user memsocket.service'.
For receiving run e.g.: 'memsocket -c ./test.sock &; memtest ./test.sock'.
To send a file run on other VM: 'memsocket -s ./test.sock 3 &; memtest ./test.sock /dev/random 10M.'
To force wrong CRC on sending: 'memtest ./test.sock /dev/random 10M xxx'

[ghaf@zathura-vm:~]$ systemctl stop --user memsocket.service

[ghaf@zathura-vm:~]$ memsocket -c ./test.sock &
[1] 670

[ghaf@zathura-vm:~]$ memtest ./test.sock
Waiting for a connection.
Connected...
Sent/received 104857602 bytes 159 Mbytes/sec
real 0.630s
user 0.270s
sys 0.030s
Read 104857601 bytes. CRC OK
Connected...
Sent/received 104857602 bytes 244 Mbytes/sec
real 0.410s
user 0.200s
sys 0.010s
Read 104857601 bytes. CRC OK
Connected...
Sent/received 104857602 bytes 263 Mbytes/sec
real 0.380s
user 0.180s
sys 0.020s
Read 104857601 bytes. TRAMISSION ERROR!
Connected...
Sent/received 104857602 bytes 263 Mbytes/sec
real 0.380s
user 0.190s
sys 0.010s
Read 104857601 bytes. CRC OK

@jkuro-tii jkuro-tii closed this Feb 21, 2024
@jkuro-tii jkuro-tii reopened this Feb 21, 2024
@jkuro-tii jkuro-tii temporarily deployed to internal-build-workflow February 21, 2024 10:16 — with GitHub Actions Inactive
@jkuro-tii

I noticed that sending data between the gui-vm and any other machine (zathura-vm, chromium-vm) is significantly slower than sending between app VMs (e.g. zathura-vm and chromium-vm).

On my desktop I use qemu version 8.0.5. The full run command line (it's a Ghaf vm-debug build):
exec /nix/store/ppfkk3h94b3s3vhifp55q80kr5y8bzvc-qemu-host-cpu-only-8.0.5/bin/qemu-kvm -cpu max \
  -name ghaf-host \
  -m 1024 \
  -smp 1 \
  -device virtio-rng-pci \
  -net nic,netdev=user.0,model=virtio -netdev user,id=user.0,"$QEMU_NET_OPTS" \
  -virtfs local,path=/nix/store,security_model=none,mount_tag=nix-store \
  -virtfs local,path="${SHARED_DIR:-$TMPDIR/xchg}",security_model=none,mount_tag=shared \
  -virtfs local,path="$TMPDIR"/xchg,security_model=none,mount_tag=xchg \
  -drive cache=writeback,file="$NIX_DISK_IMAGE",id=drive1,if=none,index=1,werror=report -device virtio-blk-pci,bootindex=1,drive=drive1 \
  -device virtio-keyboard \
  -usb \
  -device usb-tablet,bus=usb-bus.0 \
  -kernel ${NIXPKGS_QEMU_KERNEL_ghaf_host:-/nix/store/sgv6vmzrbkghz8a0hr05garms07cqz2m-nixos-system-ghaf-host-23.05.20230927.5cfafa1/kernel} \
  -initrd /nix/store/sgv6vmzrbkghz8a0hr05garms07cqz2m-nixos-system-ghaf-host-23.05.20230927.5cfafa1/initrd \
  -append "$(cat /nix/store/sgv6vmzrbkghz8a0hr05garms07cqz2m-nixos-system-ghaf-host-23.05.20230927.5cfafa1/kernel-params) init=/nix/store/sgv6vmzrbkghz8a0hr05garms07cqz2m-nixos-system-ghaf-host-23.05.20230927.5cfafa1/init regInfo=/nix/store/w7b65bipqayzsi6w2ksjvd1yi7m8b0x3-closure-info/registration console=ttyS0,115200n8 console=tty0 $QEMU_KERNEL_PARAMS" \
  $QEMU_OPTS \
  "$@"

@jkuro-tii

I ran my tool in several configurations. Results:

  • there is no data loss or data corruption in any scenario
  • transfers between chromium-vm and zathura-vm are fast - ~250 MBytes/s
  • transfers between any of the above VMs and gui-vm are very slow, hardly reaching 12 MB/s

Shutting down Weston alone gave no improvement, but doing that and eliminating the GPU passthrough solved the problem. After that, transfers to/from the gui-vm reach the same high speed as the inter-app-VM transfers.

I'll continue with different virtual PCI device setups and try to use QEMU's own monitor to find the bottleneck.

@jkuro-tii commented Feb 26, 2024

The driver already contains interrupt counters. I verified the number of interrupts sent by one VM against the numbers listed in /proc/interrupts. They are in line, i.e. the number of interrupts raised by one VM matches the number in /proc/interrupts in the other VM, as well as the count kept by the device driver. This means there is no interrupt loop and no missed interrupts.

Delays and huge CPU load happen only when the GPU is passed through. Without it, the CPU load is less than 5%.
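A quick way to double-check the interrupt rate from inside a VM is to diff /proc/interrupts over a fixed interval; a small sketch, assuming the ivshmem vectors can be matched by name in /proc/interrupts (the grep pattern may need adjusting to the actual device name):

```bash
# Snapshot the matching interrupt counters, wait, and show what changed.
grep -i ivshmem /proc/interrupts > /tmp/irq.before
sleep 10
grep -i ivshmem /proc/interrupts > /tmp/irq.after
diff /tmp/irq.before /tmp/irq.after
```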

@jkuro-tii jkuro-tii temporarily deployed to internal-build-workflow March 19, 2024 11:09 — with GitHub Actions Inactive
@jkuro-tii jkuro-tii had a problem deploying to external-build-workflow March 19, 2024 11:09 — with GitHub Actions Failure
@jkuro-tii jkuro-tii force-pushed the jkuro-add-wayland-shm-lenovo_rebase branch from 379f0cf to d8e1c2a on March 19, 2024 11:11
@jkuro-tii jkuro-tii temporarily deployed to internal-build-workflow March 19, 2024 11:11 — with GitHub Actions Inactive
@jkuro-tii jkuro-tii had a problem deploying to external-build-workflow March 19, 2024 11:11 — with GitHub Actions Failure
@jkuro-tii jkuro-tii changed the title from "Memory sharing for wayland displaying" to "Memory sharing between virtual machines" on Apr 5, 2024
@jkuro-tii jkuro-tii force-pushed the jkuro-add-wayland-shm-lenovo_rebase branch from d8e1c2a to f0ca97a on April 5, 2024 09:03
@jkuro-tii jkuro-tii temporarily deployed to internal-build-workflow April 5, 2024 09:03 — with GitHub Actions Inactive
@jkuro-tii jkuro-tii had a problem deploying to external-build-workflow April 5, 2024 09:03 — with GitHub Actions Failure
@jkuro-tii commented Apr 5, 2024

Added some improvements:

  • allocating the shared memory area using huge pages (2 MB size) - a QEMU fragment is sketched below
  • adding to the QEMU ivshmem driver the ability to map the above memory directly into the VM physical address space (currently it is mapped into the PCI memory area); the physical memory address is set on the kernel command line.
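For context, the hugepage-backed allocation in the first bullet can be expressed with standard QEMU options; a sketch of the host-side fragment, assuming a /dev/hugepages mount and an example region size (the direct mapping into guest physical address space from the second bullet is a custom ivshmem modification and is not shown here):

```bash
# Hypothetical QEMU fragment: back the shared-memory region with 2 MiB huge
# pages and expose it to the guests via the stock ivshmem-plain device.
-object memory-backend-file,id=shmmem,mem-path=/dev/hugepages,size=32M,share=on \
-device ivshmem-plain,memdev=shmmem
```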

@jkuro-tii jkuro-tii force-pushed the jkuro-add-wayland-shm-lenovo_rebase branch from f0ca97a to 5ea3c2c on April 5, 2024 09:20
@jkuro-tii jkuro-tii temporarily deployed to internal-build-workflow April 5, 2024 09:20 — with GitHub Actions Inactive
@jkuro-tii jkuro-tii had a problem deploying to external-build-workflow April 5, 2024 09:21 — with GitHub Actions Failure
@jkuro-tii jkuro-tii marked this pull request as ready for review April 5, 2024 09:46
@jkuro-tii jkuro-tii force-pushed the jkuro-add-wayland-shm-lenovo_rebase branch from 5ea3c2c to 2967781 on April 23, 2024 10:21
@jkuro-tii jkuro-tii temporarily deployed to internal-build-workflow April 23, 2024 10:21 — with GitHub Actions Inactive
@jkuro-tii jkuro-tii had a problem deploying to external-build-workflow April 23, 2024 10:21 — with GitHub Actions Failure
Signed-off-by: Jaroslaw Kurowski <jaroslaw.kurowski@tii.ae>
@jkuro-tii jkuro-tii force-pushed the jkuro-add-wayland-shm-lenovo_rebase branch from 2967781 to dec7369 on April 25, 2024 06:24
@jkuro-tii jkuro-tii temporarily deployed to internal-build-workflow April 25, 2024 06:24 — with GitHub Actions Inactive
@jkuro-tii jkuro-tii had a problem deploying to external-build-workflow April 25, 2024 06:24 — with GitHub Actions Failure