Memory sharing between virtual machines #473
base: main
Conversation
Force-pushed from d6017d1 to eef36be
Are you planning to add documentation to the repository?
- Please write the PR description according to the template: https://github.com/tiiuae/ghaf/blob/main/.github/pull_request_template.md
- Please fill in the PR template checklist, describe testing, and remove the "Draft" status.
- Re-request review
The purpose of this PR is to share the code and work together on solving the mentioned problem of high kernel overhead.
Getting perf data: perf is a standard Linux tool (https://perf.wiki.kernel.org/index.php/Main_Page). To check the memsocket application performance on GuiVM:
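A minimal sketch of how this could be captured (the 30-second window and resolving the PID via pidof are my assumptions, not commands from this PR):

```
# profile the running memsocket process for 30 seconds, then inspect the report
perf record -g -p "$(pidof memsocket)" -- sleep 30
perf report
```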
To check GuiVM's qemu performance on the host:
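A corresponding sketch for the host side (the pgrep pattern used to find the GuiVM's qemu process is an assumption):

```
# on the host: attach perf to the qemu process backing the GuiVM
perf record -g -p "$(pgrep -f 'qemu.*gui-vm')" -- sleep 30
perf report
```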
All files are kept in the
Thanks. I can reproduce your results. Behind the link in the description, there is an interesting mention of
Can you please share reproducible steps?
I did iperf measurements with the unsock tool, in order to tunnel TCP traffic over a socket and shared memory.
Test configuration:
Performance achieved: result.client_x1.log
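For readers unfamiliar with unsock: it is an LD_PRELOAD shim that transparently redirects TCP connections to unix domain sockets. A rough sketch of the kind of invocation described above (the UNSOCK_DIR path, the library path, and the timing flag are illustrative; only the tool names come from this comment):

```
# hypothetical setup: iperf3 talks TCP, unsock rewrites it to unix sockets,
# and memsocket carries those sockets over shared memory between the VMs
export UNSOCK_DIR=/tmp/unsock
LD_PRELOAD=/usr/lib/libunsock.so iperf3 -s &                 # server side
LD_PRELOAD=/usr/lib/libunsock.so iperf3 -c 127.0.0.1 -t 30   # client side, in the other VM
```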
This indicates you have compiled some dependency in the current working directory.
Optionally, the test tools you have used could be nix-packaged and included in the PR.
Would it make sense to add a simple unit throughput test through memsocket instead (see the sketch below)? What do you think?
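A rough sketch of what such a unit throughput test could look like, reusing the memsocket flags that appear later in this thread, and assuming socat and dd are available in the VMs (the socket path and sizes are illustrative):

```
# sending VM: memsocket exposes a unix socket backed by shared memory (slot 3);
# dd prints the achieved throughput when the transfer finishes
memsocket -s /tmp/bench.sock 3 &
dd if=/dev/zero bs=1M count=1000 | socat -u - UNIX-CONNECT:/tmp/bench.sock

# receiving VM: recreate the socket and drain it
memsocket -c /tmp/bench.sock &
socat -u UNIX-CONNECT:/tmp/bench.sock - | dd of=/dev/null bs=1M
```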
If I try a bit further to compile the
Edit.
Also, I think this is relevant to the performance comparison between X1 and desktop: #340 (review)
Fixed.
chromium -> [socket] -> waypipe -> [socket] -> memsocket -> [shared memory] -> memsocket send() -> [socket] -> waypipe -> wayland
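For reference, the wiring of that chain could look roughly as follows; the socket paths are illustrative and the waypipe invocations follow standard waypipe usage, while the memsocket flags mirror the sample sessions later in this thread:

```
# app VM: chromium renders through waypipe into a unix socket that memsocket
# forwards over shared memory
memsocket -s /tmp/display.sock 3 &
waypipe --socket /tmp/display.sock server -- chromium

# gui VM: memsocket recreates the socket and waypipe replays the protocol
# to the local Wayland compositor
memsocket -c /tmp/display.sock &
waypipe --socket /tmp/display.sock client
```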
I added a binary library and a few commands by hand into the chromium and gui VMs. Is it OK if I attach them here, or do you want them included in the Ghaf build?
This command:
from here indicates that some commands are also required on |
The iperf.tar.gz
OK, thanks. That confirms my understanding of the real scenario vs. the perf test scenario. I tried to rebase your PR with the main branch.
This is a good approach. Super simple, and it confirms the scenario. So there's nothing throttling the write on
Force-pushed from eef36be to 6224584
It doesn't seem we are facing data loss. The connection between the remote and waypipe instances is multichannel, and data is sent in both directions. After a portion of data is sent from the AppVM, the GuiVM sends a short packet back, probably a confirmation; if it is not received by the AppVM, there is no further transmission. Also, if we were facing data loss, GUI elements such as windows, menus, and scroll bars would be damaged, and we don't see that. I've pushed my branch rebased with main.
Thanks, I installed your rebased draft branch build to X1 NVMe with:
The problem with
We're not using
And the presentation behind the blog post link in the previous comment is extremely relevant for optimizing our case: https://vmsplice.net/~stefan/stefanha-kvm-forum-2017.pdf
I designed a utility for testing the connection.
First start it on the receiving side (any of the VMs); then file(s) can be sent from another VM.
It prints some statistics. Sample sessions:

```
[ghaf@chromium-vm:~]$ memsocket -s ./test.sock 3 &
[ghaf@chromium-vm:~]$ memtest ./test.sock /dev/random 10M
[ghaf@chromium-vm:~]$ memtest ./test.sock /dev/random 100M   # to break the CRC
[ghaf@chromium-vm:~]$ memtest ./test.sock /dev/random 100M
```

Usage: memtest socket_path is for receiving:

```
[ghaf@zathura-vm:~]$ systemctl stop --user memsocket.service
[ghaf@zathura-vm:~]$ memsocket -c ./test.sock &
[ghaf@zathura-vm:~]$ memtest ./test.sock
```
I noticed that sending data between gui-vm and any other machine (zathura-vm, chromium-vm) is significantly slower than sending between app VMs (e.g. zathura-vm and chromium-vm). On my desktop I use qemu version 8.0.5. The full run command line (it's a Ghaf vm-debug build):
I ran my tool in several configurations. Results:
Shutting down weston alone gave no improvement, but doing that and eliminating the GPU passthrough solved the problem. After that, transfers to/from gui-vm have the same high speed as those between app VMs. I'll continue with different virtual PCI device setups, and try to use QEMU's own monitor to find the bottleneck.
The driver already contains interrupt counters. I verified the number of interrupts sent by one VM against the numbers listed in /proc/interrupts. They are in line, i.e. the number of interrupts raised by one VM equals the number in /proc/interrupts in the other VM, as well as the count kept by the device driver. This means there is no interrupt loop and no missed interrupts. Delays and huge CPU load happen only when the GPU is passed through; without it, the CPU load is less than 5%.
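A quick way to watch those counters in both VMs (the ivshmem name under /proc/interrupts is an assumption; adjust the pattern to whatever name the driver registers):

```
# refresh the shared-memory device's interrupt counters once per second
watch -n1 "grep -i ivshmem /proc/interrupts"
```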
Force-pushed from 379f0cf to d8e1c2a
Force-pushed from d8e1c2a to f0ca97a
Added some improvements:
Force-pushed from f0ca97a to 5ea3c2c
Force-pushed from 5ea3c2c to 2967781
Signed-off-by: Jaroslaw Kurowski <jaroslaw.kurowski@tii.ae>
Force-pushed from 2967781 to dec7369
A draft PR for a memory-sharing solution used for socket-to-socket data transfer between virtual machines. It may be used for Wayland displaying.
The purpose of this PR is to share the code and try to find answers to the problem that arose.
Jira ticket:
https://ssrc.atlassian.net/browse/SP-3805
Documentation:
https://ssrc.atlassian.net/wiki/spaces/~62552e6ffdb60b006927ad98/blog/2022/09/29/612958326/Memory+sharing+between+virtual+machines
https://ssrc.atlassian.net/wiki/spaces/~62552e6ffdb60b006927ad98/pages/825720835/Wayland+displaying+with+shared+memory
Scenario that executes properly
Lenovo P1 with Fedora FC38 installed. For the display and application VMs, there are two separate vm-debug virtual machines with the shared-memory driver and the memsocket application additionally installed.
Actual
Playing YouTube is smooth, up to a resolution of 1440.
CPU consumption by the memsocket application is below 5%, usually 2-3%.
Regular Ghaf Lenovo X1 scenario
Build Ghaf for the Lenovo X1 target and run it.
Connect to WiFi, run Chrome, and play YouTube videos.
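The build step above could look like this sketch (the flake output name for the X1 target is an assumption; check the repository for the exact target name):

```
# build the Lenovo X1 debug image from a ghaf repository checkout
nix build .#lenovo-x1-carbon-gen11-debug
```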
Actual
Video is choppy, and the memsocket application uses around 100% CPU.
Expected
Playing YouTube is smooth, with a resolution of up to 1440.
Another test was performed with a Lenovo X1 installed with Fedora, running virtual machines extracted from the Ghaf build.
The result was the same as with Ghaf, i.e., video quality was unacceptable and CPU consumption
by the memsocket application was high, also around 100%.
Observations
Performance measurement was done on both the GuiVM and the host.
The perf data files:
perf_host.tar.gz
perf_guivm.tar.gz
Results on GuiVM
```
Samples: 387K of event 'cpu-clock:pppH', Event count (approx.): 96985250000
  Children      Self  Command    Shared Object      Symbol
     0.14%     0.12%  memsocket  [kernel.kallsyms]  [k] __softirqentry_text_start
     0.14%     0.00%  memsocket  [kernel.kallsyms]  [k] __irq_exit_rcu
```
Results on host
```
Samples: 152K of event 'cpu_atom/cycles/', Event count (approx.): 5979663105
```
Both measurements indicate that the time is spent executing the kernel's __sock_sendmsg function (reached via the send() syscall) when a buffer located in shared memory is written into the Wayland socket.
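One way to corroborate this from user space, as a sketch (assumes strace is available in the GuiVM):

```
# summarize memsocket's syscall time while video is playing; the send family
# (sendmsg/sendto) should dominate if __sock_sendmsg is the hot path
strace -c -f -p "$(pidof memsocket)"
# interrupt with Ctrl-C after a few seconds to print the per-syscall summary
```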