
cpu-o3: executing MFENCE in two parallel SMT threads causes hang #1049

Open
nmosier opened this issue Apr 20, 2024 · 3 comments · May be fixed by #1052

nmosier (Contributor) commented Apr 20, 2024

Describe the bug
Executing an x86 mfence instruction in each of two sibling SMT threads on the O3 CPU causes the CPU to get stuck: one of the mfences reaches the head of the ROB but never becomes ready to execute.

This simple assembly proof of concept (mfence.asm) triggers the bug:

        global _start

_start:
        mov dword [rsp], eax    ; a memory store preceding the fence
        mfence                  ; hangs when both SMT threads execute it

        mov eax, 60             ; SYS_exit
        mov edi, 0              ; exit status 0
        syscall

Affects version
develop @ c54039d

gem5 Modifications
None

To Reproduce

  1. Build gem5: scons build/X86/gem5.opt
  2. Assemble the POC: nasm -felf64 -o mfence.o mfence.asm && ld -o mfence mfence.o
  3. Run it under SMT: ./build/X86/gem5.opt configs/deprecated/example/se.py --cpu-type=X86O3CPU --smt --caches -c './mfence;./mfence'

Terminal Output

$ timeout -s INT 10 ./build/X86/gem5.opt configs/deprecated/example/se.py --cpu-type=X86O3CPU -c './mfence;./mfence' --smt --caches 
gem5 Simulator System.  https://www.gem5.org
gem5 is copyrighted software; use the --copyright option for details.

gem5 version DEVELOP-FOR-24.0
gem5 compiled Apr 20 2024 20:38:46
gem5 started Apr 20 2024 21:17:07
gem5 executing on cafe-cet, pid 1395698
command line: ./build/X86/gem5.opt configs/deprecated/example/se.py --cpu-type=X86O3CPU -c './mfence;./mfence' --smt --caches

warn: The se.py script is deprecated. It will be removed in future releases of  gem5.
Global frequency set at 1000000000000 ticks per second
warn: No dot file generated. Please install pydot to generate the dot file and pdf.
src/mem/dram_interface.cc:690: warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (512 Mbytes)
src/arch/x86/linux/se_workload.cc:76: warn: Unknown operating system; assuming Linux.
src/arch/x86/linux/se_workload.cc:76: warn: Unknown operating system; assuming Linux.
src/base/statistics.hh:279: warn: One of the stats is a legacy stat. Legacy stat is a stat that does not belong to any statistics::Group. Legacy stat is deprecated.
system.remote_gdb: Listening for connections on port 7000
src/sim/power_state.cc:105: warn: PowerState: Already in the requested power state, request ignored
**** REAL SIMULATION ****
src/sim/simulate.cc:199: info: Entering event queue @ 0.  Starting simulation...
Exiting @ tick 15376263500 because user interrupt received

As the output shows, the simulation is still running when the 10-second timeout fires; the two threads never both exit.

Expected behavior
Both threads should exit immediately.

Host Operating System
Ubuntu 22.04

Host ISA
X86

Compiler used
GCC 11.4.0

Additional information
None yet

nmosier added the bug label Apr 20, 2024
nmosier (Contributor, Author) commented Apr 20, 2024

I think I've found the root cause of the issue. Thread 0 exits in the same cycle that thread 1's MFENCE instruction [sn:74] is sent from IEW to commit (via the toCommit time buffer). Thread 0's exit triggers the call chain CPU::exitThreads() -> CPU::haltContext() -> CPU::removeThread(), and this code in removeThread() causes the problem:

void
CPU::removeThread(ThreadID tid)
{
     .....
    // Flush out any old data from the time buffers.
    for (int i = 0; i < timeBuffer.getSize(); ++i) {
        timeBuffer.advance();
        fetchQueue.advance();
        decodeQueue.advance();
        renameQueue.advance();
        // Advancing past the live entries loses any completed in-flight
        // instructions headed for commit, even those belonging to other
        // threads!
        iewQueue.advance();
    }
    ....


nmosier (Contributor, Author) commented Apr 20, 2024

One fix would be to clear only the exiting thread's state from each time buffer (or just from the iewQueue, since that is what causes this hang), rather than nuking the contents of all the time buffers for every thread.
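
A minimal standalone sketch of that idea (an illustration only, not gem5 source; IEWStruct, insts, and MaxThreads here are simplified stand-ins for the real comm structs in src/cpu/o3/comm.hh):

// Instead of advancing the whole time buffer (which drops every thread's
// in-flight entries), walk each buffered struct and clear only the
// exiting thread's slots.
#include <array>
#include <cstddef>
#include <iostream>
#include <vector>

using ThreadID = int;
constexpr std::size_t MaxThreads = 2;

struct IEWStruct {
    // Per-thread payload: completed instruction sequence numbers headed
    // to commit (a stand-in for gem5's per-thread instruction arrays).
    std::array<std::vector<int>, MaxThreads> insts;

    // Clear only one thread's state, leaving sibling threads untouched.
    void clearStates(ThreadID tid) { insts[tid].clear(); }
};

int main()
{
    // A tiny "time buffer": several stages' worth of in-flight structs.
    std::vector<IEWStruct> iewQueue(5);
    iewQueue[0].insts[0] = {70, 71};   // thread 0's in-flight work
    iewQueue[0].insts[1] = {74};       // thread 1's MFENCE, en route to commit

    // Thread 0 exits: scrub only its entries from every buffered struct.
    for (auto &entry : iewQueue)
        entry.clearStates(/*tid=*/0);

    // Thread 1's instruction survives and can still be committed.
    std::cout << "thread 1 entries surviving: "
              << iewQueue[0].insts[1].size() << '\n';   // prints 1
}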

nmosier added a commit to nmosier/gem5 that referenced this issue Apr 21, 2024
Fix gem5#1049. Clear only thread-specific state from the O3 CPU time
buffers, rather than clearing *all* state for *all* threads.

Specifically, this patch adds a `clearStates()` method for all O3 time
buffer structs in src/cpu/o3/comm.hh. Upon thread exit,
`CPU::removeThread()` now invokes this method for all structs in each
time buffer, rather than flushing out the time buffers (which nukes
the states for all threads, not just the exiting one).

Change-Id: I48cd50e39b3cd5e0068ddfc19a03d9bbd3f31bd5
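
Based on that description, a self-contained model of the patched flush loop (again just a sketch, not the actual PR #1052 diff; this TimeBuffer is a toy stand-in for gem5's real class in src/cpu/timebuf.hh, and the member names are assumptions):

#include <vector>

using ThreadID = int;

// Simplified stand-in for gem5's TimeBuffer ring buffer.
template <typename T>
struct TimeBuffer {
    std::vector<T> slots;
    explicit TimeBuffer(int size) : slots(size) {}
    int getSize() const { return static_cast<int>(slots.size()); }
    T &access(int i) { return slots[i]; }
};

struct IEWComm {
    bool squash[2] = {false, false};   // per-thread flags, stand-in payload
    void clearStates(ThreadID tid) { squash[tid] = false; }
};

// Sketch of the fixed flush: call clearStates(tid) on every slot of the
// buffer, replacing the old advance() loop that discarded every thread's
// entries.
void
removeThreadState(TimeBuffer<IEWComm> &iewQueue, ThreadID tid)
{
    for (int i = 0; i < iewQueue.getSize(); ++i)
        iewQueue.access(i).clearStates(tid);
}

int main()
{
    TimeBuffer<IEWComm> iewQueue(5);
    iewQueue.access(0).squash[1] = true;    // sibling thread's live state
    removeThreadState(iewQueue, /*tid=*/0); // thread 0 exits
    // iewQueue.access(0).squash[1] is still true: thread 1 is unaffected.
}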
BobbyRBruce (Member) commented

Thanks @nmosier! I think your solution is correct.
