
cpu-o3: executing MFENCE in two parallel SMT threads causes hang #1049

Open
nmosier opened this issue Apr 20, 2024 · 3 comments · May be fixed by #1052

nmosier (Contributor) commented Apr 20, 2024

Describe the bug
Executing an x86 mfence instruction in each of two sibling SMT threads on the O3 CPU causes the CPU to get stuck: one of the mfences reaches the head of the ROB but never becomes ready to execute.

This simple assembly proof of concept (mfence.asm) triggers the bug:

        global _start

_start:
        mov dword [rsp], eax    ; a memory store preceding the fence
        mfence                  ; hangs when both SMT threads execute it

        mov eax, 60             ; SYS_exit
        mov edi, 0              ; exit status 0
        syscall

Affects version
develop @ c54039d

gem5 Modifications
None

To Reproduce

  1. Build gem5: scons build/X86/gem5.opt
  2. Assemble the POC: nasm -felf64 -o mfence.o mfence.asm && ld -o mfence mfence.o
  3. Run it under SMT: ./build/X86/gem5.opt configs/deprecated/example/se.py --cpu-type=X86O3CPU --smt --caches -c './mfence;./mfence'

Terminal Output

$ timeout -s INT 10 ./build/X86/gem5.opt configs/deprecated/example/se.py --cpu-type=X86O3CPU -c './mfence;./mfence' --smt --caches 
gem5 Simulator System.  https://www.gem5.org
gem5 is copyrighted software; use the --copyright option for details.

gem5 version DEVELOP-FOR-24.0
gem5 compiled Apr 20 2024 20:38:46
gem5 started Apr 20 2024 21:17:07
gem5 executing on cafe-cet, pid 1395698
command line: ./build/X86/gem5.opt configs/deprecated/example/se.py --cpu-type=X86O3CPU -c './mfence;./mfence' --smt --caches

warn: The se.py script is deprecated. It will be removed in future releases of  gem5.
Global frequency set at 1000000000000 ticks per second
warn: No dot file generated. Please install pydot to generate the dot file and pdf.
src/mem/dram_interface.cc:690: warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (512 Mbytes)
src/arch/x86/linux/se_workload.cc:76: warn: Unknown operating system; assuming Linux.
src/arch/x86/linux/se_workload.cc:76: warn: Unknown operating system; assuming Linux.
src/base/statistics.hh:279: warn: One of the stats is a legacy stat. Legacy stat is a stat that does not belong to any statistics::Group. Legacy stat is deprecated.
system.remote_gdb: Listening for connections on port 7000
src/sim/power_state.cc:105: warn: PowerState: Already in the requested power state, request ignored
**** REAL SIMULATION ****
src/sim/simulate.cc:199: info: Entering event queue @ 0.  Starting simulation...
Exiting @ tick 15376263500 because user interrupt received

As the output shows, the simulation is still running when the 10-second timeout fires; the two threads never both exit.

Expected behavior
Both threads should exit immediately.

Host Operating System
Ubuntu 22.04

Host ISA
X86

Compiler used
GCC 11.4.0

Additional information
None yet

nmosier added the bug label Apr 20, 2024
nmosier (Contributor, Author) commented Apr 20, 2024

I think I've found the root cause of the issue. Thread 0 exits in the same cycle that thread 1's MFENCE instruction [sn:74] is sent from IEW to commit (via the toCommit time buffer). Thread 0's exit triggers the call chain CPU::exitThreads() -> CPU::haltContext() -> CPU::removeThread(), and this code in removeThread() causes the problem:

void
CPU::removeThread(ThreadID tid)
{
     .....
    // Flush out any old data from the time buffers.
    for (int i = 0; i < timeBuffer.getSize(); ++i) {
        timeBuffer.advance();
        fetchQueue.advance();
        decodeQueue.advance();
        renameQueue.advance();
        // Advancing past the live entries loses any completed in-flight
        // instructions headed for commit, even those belonging to other
        // threads!
        iewQueue.advance();
    }
    ....


nmosier (Contributor, Author) commented Apr 20, 2024

One fix would be to clear only the exiting thread's state from each time buffer (or just from the iewQueue, since that is what causes this hang), rather than nuking the contents of all the time buffers for every thread.
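
A minimal standalone sketch of that idea (an illustration only, not gem5 source; IEWStruct, insts, and MaxThreads here are simplified stand-ins for the real comm structs in src/cpu/o3/comm.hh):

// Instead of advancing the whole time buffer (which drops every thread's
// in-flight entries), walk each buffered struct and clear only the
// exiting thread's slots.
#include <array>
#include <cstddef>
#include <iostream>
#include <vector>

using ThreadID = int;
constexpr std::size_t MaxThreads = 2;

struct IEWStruct {
    // Per-thread payload: completed instruction sequence numbers headed
    // to commit (a stand-in for gem5's per-thread instruction arrays).
    std::array<std::vector<int>, MaxThreads> insts;

    // Clear only one thread's state, leaving sibling threads untouched.
    void clearStates(ThreadID tid) { insts[tid].clear(); }
};

int main()
{
    // A tiny "time buffer": several stages' worth of in-flight structs.
    std::vector<IEWStruct> iewQueue(5);
    iewQueue[0].insts[0] = {70, 71};   // thread 0's in-flight work
    iewQueue[0].insts[1] = {74};       // thread 1's MFENCE, en route to commit

    // Thread 0 exits: scrub only its entries from every buffered struct.
    for (auto &entry : iewQueue)
        entry.clearStates(/*tid=*/0);

    // Thread 1's instruction survives and can still be committed.
    std::cout << "thread 1 entries surviving: "
              << iewQueue[0].insts[1].size() << '\n';   // prints 1
}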

nmosier added a commit to nmosier/gem5 that referenced this issue Apr 21, 2024
Fix gem5#1049. Clear only thread-specific state from the O3 CPU time
buffers, rather than clearing *all* state for *all* threads.

Specifically, this patch adds a `clearStates()` method for all O3 time
buffer structs in src/cpu/o3/comm.hh. Upon thread exit,
`CPU::removeThread()` now invokes this method for all structs in each
time buffer, rather than flushing out the time buffers (which nukes
the states for all threads, not just the exiting one).

Change-Id: I48cd50e39b3cd5e0068ddfc19a03d9bbd3f31bd5
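
Based on that description, a self-contained model of the patched flush loop (again just a sketch, not the actual PR #1052 diff; this TimeBuffer is a toy stand-in for gem5's real class in src/cpu/timebuf.hh, and the member names are assumptions):

#include <vector>

using ThreadID = int;

// Simplified stand-in for gem5's TimeBuffer ring buffer.
template <typename T>
struct TimeBuffer {
    std::vector<T> slots;
    explicit TimeBuffer(int size) : slots(size) {}
    int getSize() const { return static_cast<int>(slots.size()); }
    T &access(int i) { return slots[i]; }
};

struct IEWComm {
    bool squash[2] = {false, false};   // per-thread flags, stand-in payload
    void clearStates(ThreadID tid) { squash[tid] = false; }
};

// Sketch of the fixed flush: call clearStates(tid) on every slot of the
// buffer, replacing the old advance() loop that discarded every thread's
// entries.
void
removeThreadState(TimeBuffer<IEWComm> &iewQueue, ThreadID tid)
{
    for (int i = 0; i < iewQueue.getSize(); ++i)
        iewQueue.access(i).clearStates(tid);
}

int main()
{
    TimeBuffer<IEWComm> iewQueue(5);
    iewQueue.access(0).squash[1] = true;    // sibling thread's live state
    removeThreadState(iewQueue, /*tid=*/0); // thread 0 exits
    // iewQueue.access(0).squash[1] is still true: thread 1 is unaffected.
}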
BobbyRBruce (Member) commented

Thanks @nmosier! I think your solution is correct.
