Performance Discrepancies On Similar Hardware #2988

Open · geoffreymoyerweatherford opened this issue Jan 6, 2023 · 4 comments

@geoffreymoyerweatherford

We are currently comparing two similar hardware platforms containing the same software build, and the comparison has yielded significantly different performance results. According to the lscpu dump, EVE's dom0 on the E3845 is running as "para" while EVE's dom0 on the E3940 is "full". Additionally, we are running a Windows 10 VM as a guest and noticed that the more vCPUs we allocate on the E3940, the lower the performance; with 2 vCPUs allocated, the Windows 10 VM is unusable. Performance on the E3845 has remained consistent, which leads me to believe this issue has to do with Xen being in a "para" versus "full" state.

Does anyone know how/when/where the Virtualization Type is set?
Is there a way to manually set the Virtualization Type?
Any other suggestions based on this issue?
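
For reference, the "para"/"full" value that lscpu reports can be cross-checked from inside dom0. A minimal sketch (the guest_type sysfs node assumes a kernel with the Xen sysfs driver, which 5.10 should have):

    # Field lscpu reports as "Virtualization type"
    lscpu | grep -i 'virtualization'

    # Ask the kernel directly how this domain runs under Xen
    cat /sys/hypervisor/type          # prints "xen" when running under Xen
    cat /sys/hypervisor/guest_type    # prints PV, PVH or HVM on newer kernels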

Models and Versions:
Architecture: AMD64
Processor: Intel Atom E3845 and Intel Atom E3940
EVE Version: 8.11.0
Linux Kernel: 5.10.121 with SMP
Hypervisor: Xen
Xen Version: 4.15

Other Notes:
conf/grub.cfg was updated to use 4 CPUs for hv_ctrd, hv_eve, and hv_dom0

Thank You

[Screenshot from 2023-01-06 11-15-21]

[Screenshot from 2023-01-06 14-16-54]

@eriknordmark
Contributor

@geoffreymoyerweatherford I see that E3940 has VT-d while E3845 does not support VT-d, which explains why the latter needs to use paravirtualized device drivers. That might make some difference in I/O performance, but I don't know how I/O intensive your application is.
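
For reference, one way to confirm what Xen itself concluded about VT-d on each box is to query the hypervisor from dom0; illustrative commands, and the exact output wording varies by Xen version:

    # Xen advertises directed I/O in its capability list when VT-d is usable
    xl info | grep virt_caps           # e.g. includes "hvm_directio" with VT-d

    # The Xen boot log also records whether I/O virtualisation was enabled
    xl dmesg | grep -iE 'virtualisation|iommu'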

I don't understand why you did:
conf/grub.cfg was updated to use 4 CPUs for hv_ctrd, hv_eve, and hv_dom0

That will leave fewer CPU cores available to the guest VMs, since you are assigning more cores to the EVE orchestration services. Was there an issue that made you change the default of 1?
Note that the CPU allocation for the guest VMs is done using the EVE API from the controller.

@geoffreymoyerweatherford
Author

geoffreymoyerweatherford commented Jan 7, 2023

Thank you for the response @eriknordmark

Some Rationale:

  • The reason I tweaked the conf/grub.cfg hv_dom0 value was that the 3 remaining CPUs never seemed to be released on bootup. With the default of 1, dmesg says we are booting in SMP mode but only 1 CPU is available. This was the case on both the E3940 and the E3845. I thought dom0 would need to release these to make them available to the guests; please correct me if this is wrong.
  • The reason I tweaked the conf/grub.cfg hv_ctrd value was to increase the max vCPUs available to guest VMs and containers. I assumed these ran as containerd-shims; please correct me if this is wrong.
  • conf/grub.cfg hv_eve can be returned to its default (a sketch of the override file follows this list).
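
For completeness, a sketch of the kind of override being discussed. This is illustrative only: the set_global mechanism and the hv_*_cpu_settings names are assumptions based on how this thread refers to conf/grub.cfg, so the exact keys and default values should be checked against the grub.cfg shipped with the EVE build in use:

    # conf/grub.cfg override (illustrative; verify variable names and defaults
    # against your EVE build before relying on them)
    set_global hv_dom0_cpu_settings "dom0_max_vcpus=1 dom0_vcpus_pin"   # dom0 back to 1 CPU
    set_global hv_eve_cpu_settings  "eve_max_vcpus=1"                   # EVE services back to 1
    set_global hv_ctrd_cpu_settings "ctrd_max_vcpus=1"                  # containerd back to 1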

There was an issue on the E3940: when we allocated 2 vCPUs via the EVE API and used the default values of 1 in the grub.cfg, the Windows OS never properly booted. It would halt during its initial load and freeze on the "spinning wheel". When we allocated 1 vCPU via the EVE API and used the default values of 1 in the grub.cfg, the Windows OS did boot; however, we ideally need more cores for this VM. We were able to allocate 2 vCPUs once we bumped the default values from 1 to 4 in the grub.cfg, but performance suffered significantly in the VM. That makes more sense now with your description above of the EVE services using more cycles.

We did not see the same issue when performing this same test on the E3845 using the same constraints. We were able to allocate 2 vCPUs via the EVE API and use the default values of 1 in the grub.cfg.

Are you aware of a way to disable the VT-d capabilities on the E3940 processor (without disabling it in the BIOS)? That option is not available in our BIOS version.

Thanks Again.

@eriknordmark
Contributor

  • The reason I tweaked the conf/grub.cfg hv_dom0 value was that the 3 remaining CPUs never seemed to be released on bootup. With the default of 1, dmesg says we are booting in SMP mode but only 1 CPU is available. This was the case on both the E3940 and the E3845. I thought dom0 would need to release these to make them available to the guests; please correct me if this is wrong.

With Xen, the output from Linux running in dom0 only refers to the resources (in this case CPUs) allocated to dom0.
The Xen hypervisor keeps the other CPUs, and they are then used by guest domains.
(Note that KVM works completely differently in this respect.)
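
A quick way to see that split is to ask Xen rather than the dom0 kernel; these are standard xl toolstack commands run from dom0:

    nproc                        # CPUs visible to the dom0 kernel only
    xl info | grep nr_cpus       # physical CPUs owned by the Xen hypervisor
    xl vcpu-list                 # vCPU-to-pCPU assignment for every domain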

  • The reason I tweaked the conf/grub.cfg hv_ctrd value was to increase the max vCPUs available to guest VMs and containers. I assumed these ran as containerd-shims; please correct me if this is wrong.

No, they are not. The application instances you deploy on EVE-OS are always deployed on the hypervisor, whether the application instances are VM images or OCI containers. The use of containerd inside EVE-OS is an implementation detail (we use linuxkit as a way to build and run the different components inside EVE-OS).
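
Put differently, a deployed application instance shows up as its own Xen domain alongside dom0. Illustrative output shape only; the domain name and numbers below are placeholders:

    xl list
    # Name            ID   Mem  VCPUs  State  Time(s)
    # Domain-0         0   800      1  r-----   123.4
    # <app-instance>   1  4096      2  -b----    56.7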

  • conf/grub.cfg hv_eve can be returned to its default.

There was an issue on the E3940: when we allocated 2 vCPUs via the EVE API and used the default values of 1 in the grub.cfg, the Windows OS never properly booted. It would halt during its initial load and freeze on the "spinning wheel". When we allocated 1 vCPU via the EVE API and used the default values of 1 in the grub.cfg, the Windows OS did boot; however, we ideally need more cores for this VM. We were able to allocate 2 vCPUs once we bumped the default values from 1 to 4 in the grub.cfg, but performance suffered significantly in the VM. That makes more sense now with your description above of the EVE services using more cycles.

Is there a particular reason you are using Xen and not KVM? I think KVM is what is commonly used today.

My understanding is that if you use Xen without VT-d you'll need to have the paravirtualized device drivers in your Windows image; I don't think they are present by default. (And it is possible that the pv drivers are not needed with KVM on the same hardware since the Qemu setup is different.)

We did not see the same issue when performing this same test on the E3845 using the same constraints. We were able to allocate 2 vCPUs via the EVE API and use the default values of 1 in the grub.cfg.

That is very odd, since I'd expect the absence of VT-d to be more problematic; perhaps Xen+qemu does emulation on the fly?

Are you aware of a way to disable the VT-d capabilities on the E3940 processor (without disabling it in the BIOS)? That option is not available in our BIOS version.

I haven't seen any BIOS where one can enable/disable VT-d; some older BIOS required enabling VT-x since it wasn't enabled by default.
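
As an aside, and strictly as an assumption to verify rather than anything EVE documents in this thread: Xen itself accepts a boolean iommu option on its command line, so if the goal is only to test with the IOMMU out of the picture, appending it to the Xen boot line via the grub override should achieve that from Xen's point of view. The hook name below is hypothetical:

    # Illustrative only -- "hv_extra_args" is a placeholder; check how your EVE
    # build lets you append options to the Xen command line in conf/grub.cfg
    set_global hv_extra_args "iommu=no"

    # After reboot, confirm from dom0 what Xen did with the IOMMU
    xl dmesg | grep -i iommu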

@geoffreymoyerweatherford
Author

geoffreymoyerweatherford commented Jan 9, 2023

Is there a particular reason you are using Xen and not KVM? I think KVM is what is commonly used today.

We are using Xen since it was the only hypervisor that booted Windows for us. When we tried KVM (and ACRN), the Windows VM entered a repair loop and then the Advanced Settings/Recovery wizard; it never got beyond either of those states. I also noticed the BSOD message SYSTEM_THREAD_EXCEPTION_NOT_HANDLED.
