arch-x86: bug in movss instruction #893

Nils-TUD · 2024-02-26T16:33:06Z

Describe the bug

According to c9x.me the movss instruction (F3 0F 10) copies the lowest 32-bit of one XMM register to the lowest 32-bit of another XMM register. However, gem5 copies to lower 64-bit, producing wrong results.

Affects version
Current stable version (v23.1)

gem5 Modifications
None

To Reproduce
Let's consider the following program in test.S:

.text
.global _start
_start:
    mov $0x1122334455667788,%rax
    mov $0x12345678,%rcx
    movq %rax,%xmm0
    movq %rcx,%xmm1
    movss %xmm1,%xmm0

    // gem5 shutdown
    mov $0,%rdi
    .long   0x0021040F
1:  jmp 1b

Build it:

$ gcc -nostdlib -o test test.S

Run it on gem5:

$ ./build/X86/gem5.opt --debug-flags=Exec configs/learning_gem5/part1/two_level.py ./test > log.txt

Terminal Output
The relevant part of log.txt is:

116219000: system.cpu: T0 : 0x1000 @_start    : mov     rax, 0x1122334455667788
116219000: system.cpu: T0 : 0x1000 @_start. 0 :   MOV_R_I : limm   rax, 0x1122334455667788 : IntAlu :  D=0x1122334455667788
116223000: system.cpu: T0 : 0x100a @_start+10    : mov  rdi, 0x12345678
116223000: system.cpu: T0 : 0x100a @_start+10. 0 :   MOV_R_I : limm   rcx, 0x12345678 : IntAlu :  D=0x0000000012345678
116225000: system.cpu: T0 : 0x1011 @_start+17    : movd XMM0
116225000: system.cpu: T0 : 0x1011 @_start+17. 0 :   MOVD_XMM_R : mov2fp   %xmm0_low, rax, 0 : SimdMisc :  D=0x1122334455667788
116225000: system.cpu: T0 : 0x1011 @_start+17. 1 :   MOVD_XMM_R : lfpimm   %xmm0_high, 0 : FloatAdd :  D=0x0000000000000000
116229000: system.cpu: T0 : 0x1016 @_start+22    : movd XMM1
116229000: system.cpu: T0 : 0x1016 @_start+22. 0 :   MOVD_XMM_R : mov2fp   %xmm1_low, rcx, 0 : SimdMisc :  D=0x0000000012345678
116229000: system.cpu: T0 : 0x1016 @_start+22. 1 :   MOVD_XMM_R : lfpimm   %xmm1_high, 0 : FloatAdd :  D=0x0000000000000000
116231000: system.cpu: T0 : 0x101b @_start+27    : movss        XMM0
116231000: system.cpu: T0 : 0x101b @_start+27. 0 :   MOVSS_XMM_XMM : movfp   %xmm0_low, %xmm1_low : IntAlu :  D=0x0000000012345678

Expected behavior
According to the spec, the result in $xmm0 should be 0x1122334412345678 instead of 0x12345678. And that's also what I get when running it natively via $ gdb ./test:

...
1: x/i $pc
=> 0x555555555000 <_start>:     movabs $0x1122334455667788,%rax
(gdb) si
0x000055555555500a in _start ()
1: x/i $pc
=> 0x55555555500a <_start+10>:  mov    $0x12345678,%rcx
(gdb)
0x0000555555555011 in _start ()
1: x/i $pc
=> 0x555555555011 <_start+17>:  movq   %rax,%xmm0
(gdb)
0x0000555555555016 in _start ()
1: x/i $pc
=> 0x555555555016 <_start+22>:  movq   %rcx,%xmm1
(gdb)
0x000055555555501b in _start ()
1: x/i $pc
=> 0x55555555501b <_start+27>:  movss  %xmm1,%xmm0
(gdb)
0x000055555555501f in _start ()
1: x/i $pc
=> 0x55555555501f <_start+31>:  mov    $0x0,%rdi
(gdb) p/x $xmm0
$1 = {v8_bfloat16 = {0x5678, 0x1234, 0x3344, 0x1122, 0x0, 0x0, 0x0, 0x0}, v8_half = {0x5678, 0x1234, 0x3344, 0x1122, 0x0,
    0x0, 0x0, 0x0}, v4_float = {0x12345678, 0x11223344, 0x0, 0x0}, v2_double = {0x1122334412345678, 0x0}, v16_int8 = {0x78,
    0x56, 0x34, 0x12, 0x44, 0x33, 0x22, 0x11, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int16 = {0x5678, 0x1234, 0x3344,
    0x1122, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x12345678, 0x11223344, 0x0, 0x0}, v2_int64 = {0x1122334412345678, 0x0},
  uint128 = 0x1122334412345678}

Host Operating System
Ubuntu 22.04

Host ISA
x86-64

Compiler used
gcc 11.4.0

Additional information
The current implementation of movss looks like this:

def macroop MOVSS_XMM_XMM {
    movfp xmml, xmmlm, dataSize=4
};

I think the fix should be to make movfp only access the lower 32-bit of both XMM register. However, AFAICS there is currently no infrastructure to access the lower 32-bit part of an XMM register (only xmml and xmmh for the lower and higher 64-bit, respectively). Am I missing something or is the way to go here adding a new symbol like xmmll or so and a corresponding accessor function to float_reg?

The text was updated successfully, but these errors were encountered:

ivanaamit · 2024-03-08T16:50:24Z

@Nils-TUD, your proposed change sounds good. Please feel free to submit that change to the gem5 repo. Thank you.

ivanaamit · 2024-03-28T18:12:54Z

@Nils-TUD, this is a friendly reminder to create a PR with your proposed change. Thanks.

Nils-TUD · 2024-03-28T18:56:38Z

Thanks for the reminder! I still have that on my todo list, but have to find the time. That might still take a few weeks, I'm afraid.

EmilyG2413 · 2024-04-04T02:51:29Z

Hello, my name is Emily and I’m working with my partner Lukas on a final project for our Virtualization class. We are Computer Science at the University of Texas at Austin and currently enrolled in CS360V Virtualization, taught by Dr. Chidambaram (VJ). As part of our coursework, we are required to contribute to a few issues within an open-source repository relevant to the class.

After reviewing the gem5 repository, we believe that this particular issue aligns well with our objectives and skill set and were wondering if we may be assigned to this issue. If that's possible, before proceeding, we wanted to ask if there are any additional details or considerations regarding this issue that we should be aware of. We are committed to completing all three issues by May 6th to the best of our abilities.

Thank you for considering our request,
Emily and Lukas

Nils-TUD · 2024-04-08T07:13:12Z

That's more than fine from my side! :) Thanks a lot! I haven't looked deeper into this issue and therefore don't have more information than already given above.

Movfp instruction did not account for only copying the lower half of src register if dataSize is 4. GitHub Issue: #893 I used the test code in issue #893 to verify the fix is working.

Nils-TUD added the bug label Feb 26, 2024

ivanaamit added the arch-x86 The X86 ISA label Mar 5, 2024

Lukas-Zenick mentioned this issue Apr 14, 2024

arch-x86: Movfp account for dataSize=4 #1024

Merged

BobbyRBruce linked a pull request Apr 15, 2024 that will close this issue

arch-x86: Movfp account for dataSize=4 #1024

Merged

ivanaamit closed this as completed May 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

arch-x86: bug in movss instruction #893

arch-x86: bug in movss instruction #893

Nils-TUD commented Feb 26, 2024

ivanaamit commented Mar 8, 2024

ivanaamit commented Mar 28, 2024

Nils-TUD commented Mar 28, 2024

EmilyG2413 commented Apr 4, 2024

Nils-TUD commented Apr 8, 2024

arch-x86: bug in movss instruction #893

arch-x86: bug in movss instruction #893

Comments

Nils-TUD commented Feb 26, 2024

ivanaamit commented Mar 8, 2024

ivanaamit commented Mar 28, 2024

Nils-TUD commented Mar 28, 2024

EmilyG2413 commented Apr 4, 2024

Nils-TUD commented Apr 8, 2024