Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arch-x86: bug in movss instruction #893

Closed
Nils-TUD opened this issue Feb 26, 2024 · 5 comments · Fixed by #1024
Closed

arch-x86: bug in movss instruction #893

Nils-TUD opened this issue Feb 26, 2024 · 5 comments · Fixed by #1024
Labels
arch-x86 The X86 ISA bug

Comments

@Nils-TUD
Copy link

Describe the bug

According to c9x.me the movss instruction (F3 0F 10) copies the lowest 32-bit of one XMM register to the lowest 32-bit of another XMM register. However, gem5 copies to lower 64-bit, producing wrong results.

Affects version
Current stable version (v23.1)

gem5 Modifications
None

To Reproduce
Let's consider the following program in test.S:

.text
.global _start
_start:
    mov $0x1122334455667788,%rax
    mov $0x12345678,%rcx
    movq %rax,%xmm0
    movq %rcx,%xmm1
    movss %xmm1,%xmm0

    // gem5 shutdown
    mov $0,%rdi
    .long   0x0021040F
1:  jmp 1b

Build it:

$ gcc -nostdlib -o test test.S

Run it on gem5:

$ ./build/X86/gem5.opt --debug-flags=Exec configs/learning_gem5/part1/two_level.py ./test > log.txt

Terminal Output
The relevant part of log.txt is:

116219000: system.cpu: T0 : 0x1000 @_start    : mov     rax, 0x1122334455667788
116219000: system.cpu: T0 : 0x1000 @_start. 0 :   MOV_R_I : limm   rax, 0x1122334455667788 : IntAlu :  D=0x1122334455667788
116223000: system.cpu: T0 : 0x100a @_start+10    : mov  rdi, 0x12345678
116223000: system.cpu: T0 : 0x100a @_start+10. 0 :   MOV_R_I : limm   rcx, 0x12345678 : IntAlu :  D=0x0000000012345678
116225000: system.cpu: T0 : 0x1011 @_start+17    : movd XMM0
116225000: system.cpu: T0 : 0x1011 @_start+17. 0 :   MOVD_XMM_R : mov2fp   %xmm0_low, rax, 0 : SimdMisc :  D=0x1122334455667788
116225000: system.cpu: T0 : 0x1011 @_start+17. 1 :   MOVD_XMM_R : lfpimm   %xmm0_high, 0 : FloatAdd :  D=0x0000000000000000
116229000: system.cpu: T0 : 0x1016 @_start+22    : movd XMM1
116229000: system.cpu: T0 : 0x1016 @_start+22. 0 :   MOVD_XMM_R : mov2fp   %xmm1_low, rcx, 0 : SimdMisc :  D=0x0000000012345678
116229000: system.cpu: T0 : 0x1016 @_start+22. 1 :   MOVD_XMM_R : lfpimm   %xmm1_high, 0 : FloatAdd :  D=0x0000000000000000
116231000: system.cpu: T0 : 0x101b @_start+27    : movss        XMM0
116231000: system.cpu: T0 : 0x101b @_start+27. 0 :   MOVSS_XMM_XMM : movfp   %xmm0_low, %xmm1_low : IntAlu :  D=0x0000000012345678

Expected behavior
According to the spec, the result in $xmm0 should be 0x1122334412345678 instead of 0x12345678. And that's also what I get when running it natively via $ gdb ./test:

...
1: x/i $pc
=> 0x555555555000 <_start>:     movabs $0x1122334455667788,%rax
(gdb) si
0x000055555555500a in _start ()
1: x/i $pc
=> 0x55555555500a <_start+10>:  mov    $0x12345678,%rcx
(gdb)
0x0000555555555011 in _start ()
1: x/i $pc
=> 0x555555555011 <_start+17>:  movq   %rax,%xmm0
(gdb)
0x0000555555555016 in _start ()
1: x/i $pc
=> 0x555555555016 <_start+22>:  movq   %rcx,%xmm1
(gdb)
0x000055555555501b in _start ()
1: x/i $pc
=> 0x55555555501b <_start+27>:  movss  %xmm1,%xmm0
(gdb)
0x000055555555501f in _start ()
1: x/i $pc
=> 0x55555555501f <_start+31>:  mov    $0x0,%rdi
(gdb) p/x $xmm0
$1 = {v8_bfloat16 = {0x5678, 0x1234, 0x3344, 0x1122, 0x0, 0x0, 0x0, 0x0}, v8_half = {0x5678, 0x1234, 0x3344, 0x1122, 0x0,
    0x0, 0x0, 0x0}, v4_float = {0x12345678, 0x11223344, 0x0, 0x0}, v2_double = {0x1122334412345678, 0x0}, v16_int8 = {0x78,
    0x56, 0x34, 0x12, 0x44, 0x33, 0x22, 0x11, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int16 = {0x5678, 0x1234, 0x3344,
    0x1122, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x12345678, 0x11223344, 0x0, 0x0}, v2_int64 = {0x1122334412345678, 0x0},
  uint128 = 0x1122334412345678}

Host Operating System
Ubuntu 22.04

Host ISA
x86-64

Compiler used
gcc 11.4.0

Additional information
The current implementation of movss looks like this:

def macroop MOVSS_XMM_XMM {
    movfp xmml, xmmlm, dataSize=4
};

I think the fix should be to make movfp only access the lower 32-bit of both XMM register. However, AFAICS there is currently no infrastructure to access the lower 32-bit part of an XMM register (only xmml and xmmh for the lower and higher 64-bit, respectively). Am I missing something or is the way to go here adding a new symbol like xmmll or so and a corresponding accessor function to float_reg?

@Nils-TUD Nils-TUD added the bug label Feb 26, 2024
@ivanaamit ivanaamit added the arch-x86 The X86 ISA label Mar 5, 2024
@ivanaamit
Copy link
Contributor

@Nils-TUD, your proposed change sounds good. Please feel free to submit that change to the gem5 repo. Thank you.

@ivanaamit
Copy link
Contributor

@Nils-TUD, this is a friendly reminder to create a PR with your proposed change. Thanks.

@Nils-TUD
Copy link
Author

Thanks for the reminder! I still have that on my todo list, but have to find the time. That might still take a few weeks, I'm afraid.

@EmilyG2413
Copy link

Hello, my name is Emily and I’m working with my partner Lukas on a final project for our Virtualization class. We are Computer Science at the University of Texas at Austin and currently enrolled in CS360V Virtualization, taught by Dr. Chidambaram (VJ). As part of our coursework, we are required to contribute to a few issues within an open-source repository relevant to the class.

After reviewing the gem5 repository, we believe that this particular issue aligns well with our objectives and skill set and were wondering if we may be assigned to this issue. If that's possible, before proceeding, we wanted to ask if there are any additional details or considerations regarding this issue that we should be aware of. We are committed to completing all three issues by May 6th to the best of our abilities.

Thank you for considering our request,
Emily and Lukas

@Nils-TUD
Copy link
Author

Nils-TUD commented Apr 8, 2024

That's more than fine from my side! :) Thanks a lot! I haven't looked deeper into this issue and therefore don't have more information than already given above.

@BobbyRBruce BobbyRBruce linked a pull request Apr 15, 2024 that will close this issue
ivanaamit added a commit that referenced this issue Apr 18, 2024
Movfp instruction did not account for only copying the lower half of src
register if dataSize is 4.
GitHub Issue: #893 
I used the test code in issue #893 to verify the fix is working.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-x86 The X86 ISA bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants