AVX KVM Xsave issue #958
@mattsinc @v-ramadas These are the notes I took when diagnosing this issue. I thought they might be helpful.
Thanks!
FYI: @nmosier pointed out to me via email that the XCR registers aren't being checkpointed. I attempted a hacky workaround that sets XCR0 to its pre-checkpoint value, but unfortunately the error persists. So while checkpointing the XCRs still needs to be implemented, there is more work to be done.
The extended control registers were updated neither in the KVM thread context nor in the KVM state. This caused problems when checkpointing, since XCR0 reverted to its default value instead of the value it held before the checkpoint. As a result, multiple applications crashed: with XCR0 incorrect, instructions they executed became illegal instructions. This commit adds XCR0 as a misc register, similar to the existing x86 control registers, and adds all of the helper functions to access and set its value. It also adds support for updating the KVM CPU's state with the register value and for updating the thread context's misc reg value, so that XCR0 is checkpointed along with the other misc regs. Note that this does *not* add support for XSAVE of the AVX state (i.e., the upper 128 bits of the YMM registers). It does, however, fix the immediate problem in issue gem5#958. A checkpoint upgrader is also provided to add the default value of XCR0 if the checkpoint tag is missing. Change-Id: I97456c8b57cbc7b381bd4be94944ce6567a43c76
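To illustrate the checkpointing change described in that commit message, here is a minimal C++ sketch. It is not gem5 code: `Checkpoint`, `MiscRegs`, and `upgradeCheckpoint` are hypothetical names, the checkpoint is modeled as a plain name-to-value map, and the XCR0 default is assumed to be the architectural reset value 0x1 (only the x87 bit set). Synchronizing the value with the KVM vCPU is left out; Linux exposes the KVM_GET_XCRS and KVM_SET_XCRS ioctls for that.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical checkpoint representation: a flat name -> value map.
// This is NOT gem5's checkpoint format; it only models the idea.
using Checkpoint = std::map<std::string, std::uint64_t>;

// Assumed default: the architectural reset value of XCR0, in which only
// bit 0 (x87 state) is set.
constexpr std::uint64_t kXcr0ResetValue = 0x1;

// Hypothetical subset of the x86 misc register file. XCR0 is stored
// alongside the control registers so it is checkpointed with them.
struct MiscRegs {
    std::uint64_t cr0 = 0;
    std::uint64_t cr4 = 0;
    std::uint64_t xcr0 = kXcr0ResetValue;

    // Save every misc register, including XCR0.
    void serialize(Checkpoint &cpt) const {
        cpt["cr0"] = cr0;
        cpt["cr4"] = cr4;
        cpt["xcr0"] = xcr0;
    }

    // Restore every misc register. Expects an upgraded checkpoint;
    // std::map::at throws std::out_of_range if a key is missing.
    void unserialize(const Checkpoint &cpt) {
        cr0 = cpt.at("cr0");
        cr4 = cpt.at("cr4");
        xcr0 = cpt.at("xcr0");
        // XCR0 bit 0 (x87) must always be set.
        assert((xcr0 & 0x1) != 0);
    }
};

// Hypothetical upgrader: checkpoints taken before XCR0 was saved have
// no "xcr0" entry, so add the default. Existing entries are untouched.
void upgradeCheckpoint(Checkpoint &cpt) {
    cpt.emplace("xcr0", kXcr0ResetValue);
}
```

For example, a guest that set XCR0 to 0x7 (x87, SSE, and AVX state) gets 0x7 back after `serialize` followed by `unserialize`, while an older checkpoint without an `xcr0` entry restores with 0x1 once `upgradeCheckpoint` has run.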
Describe the bug
The AVX / YMM register state is not saved or restored in gem5 with the X86KvmCPU, leading to crashes on checkpoint restoration when AVX is enabled in CPUID.
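Why an incorrect XCR0 turns AVX code into illegal instructions: VEX-encoded (AVX) instructions raise #UD unless CR4.OSXSAVE (bit 18) is set and XCR0 enables both SSE state (bit 1) and AVX state (bit 2). A guest kernel that sees AVX in CPUID typically enables those XCR0 bits; if a restore leaves XCR0 at its reset value 0x1, later AVX instructions fault. The helper `avxUsable` below is a hypothetical illustration of that architectural check, not gem5 or Linux code.

```cpp
#include <cstdint>

// XCR0 state-component bits.
constexpr std::uint64_t kXcr0X87 = 1ULL << 0;  // x87 FPU state
constexpr std::uint64_t kXcr0Sse = 1ULL << 1;  // SSE state (XMM registers)
constexpr std::uint64_t kXcr0Avx = 1ULL << 2;  // AVX state (upper 128 bits of YMM)

// CR4.OSXSAVE (bit 18): the OS has enabled XSAVE and XGETBV/XSETBV.
constexpr std::uint64_t kCr4OsXsave = 1ULL << 18;

// Hypothetical helper: true if VEX-encoded (AVX) instructions can execute
// without #UD, i.e. CR4.OSXSAVE is set and XCR0 enables SSE and AVX state.
constexpr bool avxUsable(std::uint64_t cr4, std::uint64_t xcr0) {
    return (cr4 & kCr4OsXsave) != 0 &&
           (xcr0 & (kXcr0Sse | kXcr0Avx)) == (kXcr0Sse | kXcr0Avx);
}

// XCR0 = x87 | SSE | AVX (0x7): AVX instructions are legal.
static_assert(avxUsable(kCr4OsXsave, kXcr0X87 | kXcr0Sse | kXcr0Avx),
              "AVX enabled");
// XCR0 back at its reset value 0x1: AVX instructions raise #UD.
static_assert(!avxUsable(kCr4OsXsave, kXcr0X87), "AVX disabled");
```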
Affects version
develop @ 0c684f2d331e47570f47e980307977284666582e
gem5 Modifications
No modification
To Reproduce
This is easiest to reproduce using a full system GPU configuration, as it enables AVX by default and supports checkpoint/restore. This requires the VEGA_X86 build. The application does not matter here, so a blank shell script can be used as the application.
nproc
Note: There is currently a workaround command line option that disables AVX to avoid this issue. With the --disable-avx command line option added, this error should not occur.
Terminal Output
Expected behavior
There should be no kernel backtrace dumps.
Host Operating System
Ubuntu 20.04
Host ISA
amd64
Compiler used
gcc 9.4.0
Additional information
Manual "backtrace" from Linux KVM call:
https://elixir.bootlin.com/linux/v5.4/source/arch/x86/kvm/x86.c#L3442
https://elixir.bootlin.com/linux/v5.4/source/arch/x86/kernel/fpu/core.c#L338
https://elixir.bootlin.com/linux/v5.4/source/arch/x86/include/asm/fpu/internal.h#L534
https://elixir.bootlin.com/linux/v5.4/source/arch/x86/include/asm/fpu/internal.h#L457
https://elixir.bootlin.com/linux/v5.4/source/arch/x86/include/asm/fpu/internal.h#L445
https://elixir.bootlin.com/linux/v5.4/source/arch/x86/include/asm/fpu/internal.h#L338
https://elixir.bootlin.com/linux/v5.4/source/arch/x86/include/asm/fpu/internal.h#L260
https://elixir.bootlin.com/linux/v5.4/source/arch/x86/include/asm/asm.h#L153
https://elixir.bootlin.com/linux/v5.4/source/arch/x86/mm/extable.c#L106